## Saturday, October 18, 2008

### Fast pixel-averaging

I don't know why it took me so long to realize that there's an easy, fast way to obtain the average of two RGB pixel values. (An RGB pixel is commonly represented as a 32-bit integer, with 8 bits per channel. Let's assume the top 8 bits aren't used.)

Properly averaging the red, green, and blue components of two pixels requires parsing those 8-bit values out of each pixel, adding them together, dividing by two, and crafting a new pixel out of the new red, green, and blue values. Or at least that's the naive way of doing things. In code (I'll show it in JavaScript, but it looks much the same in C or Java):

```javascript
// The horribly inefficient naive way:
function average( a,b ) {
   var REDMASK   = 0x00ff0000;
   var GREENMASK = 0x0000ff00;
   var BLUEMASK  = 0x000000ff;

   var aRed   = a & REDMASK;
   var aGreen = a & GREENMASK;
   var aBlue  = a & BLUEMASK;

   var bRed   = b & REDMASK;
   var bGreen = b & GREENMASK;
   var bBlue  = b & BLUEMASK;

   var aveRed   = (aRed + bRed) >> 1;
   var aveGreen = (aGreen + bGreen) >> 1;
   var aveBlue  = (aBlue + bBlue) >> 1;

   return aveRed | aveGreen | aveBlue;
}
```

That's a lot of code to average two 32-bit values, but remember that red, green, and blue values (8 bits each) have to live in their own swim lanes. You can't allow overflow.

Here's the much cleaner, less obvious, hugely faster way:

```javascript
// the fast way:
MASK7BITS = 0x00fefeff;

function ave( a,b ) {
   a &= MASK7BITS;
   b &= MASK7BITS;
   return (a+b)>>1;
}
```

The key intuition here is that you want to clear the bottom bit of the red and green channels in order to make room for overflow from the green and blue "adds."
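To make the carry behavior concrete, here's a small demonstration of my own (using the fully-masked `0x00fefefe` variant discussed in the comments below; the function name is mine):

```javascript
// Demonstration of why the mask matters. Without it, a channel's low bit
// slides into its neighbor's high bit on the shift; with it, the cleared
// low bit absorbs the carry harmlessly.
var MASK7BITS = 0x00fefefe; // low bit of every channel cleared

function aveMasked(a, b) {
  return ((a & MASK7BITS) + (b & MASK7BITS)) >> 1;
}

// Unmasked: averaging green = 1 with green = 0 leaks into blue.
console.log(((0x000100 + 0x000000) >> 1).toString(16)); // "80" -- blue is now 128!

// Masked: green's low bit is dropped before the add, so nothing leaks.
console.log(aveMasked(0x000100, 0x000000).toString(16)); // "0"
```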

Of course, in the real world, you would inline this code rather than use it as a function. (In a loop that's processing 800 x 600 pixels you surely don't want to call a function hundreds of thousands of times.)

Similar mask-based techniques can be used for adding and subtracting pixel values. Overflow is handled differently, though (left as an exercise for the reader).
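One possible sketch of that exercise (my own take, not the author's): for a per-channel saturating add, the same masks can expose each channel's carry-out, which is then spread into a `0xff` clamp for that channel.

```javascript
// Sketch of a per-channel saturating add. Channels whose true sum exceeds
// 255 clamp to 255 instead of spilling into the neighboring channel.
function addSaturate(a, b) {
  var low  = (a & 0x7f7f7f) + (b & 0x7f7f7f);   // low-7-bit sums; bit 7 of each byte holds the carry
  var h    = (a ^ b) & 0x808080;                // high bits that differ between the inputs
  var sum  = low ^ h;                           // per-channel sum modulo 256
  var over = ((a & b) & 0x808080) | (h & low);  // channels whose true sum exceeded 255
  var mask = (over << 1) - (over >> 7);         // expand each overflow bit into 0xff for that channel
  return sum | mask;
}
```

The last line's mask trick turns each set `0x80` overflow bit into a full `0xff` byte; channels that didn't overflow pass through unchanged.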

1. Hmmm, I think both of these are a bit off:

```
>>> average(0x030303, 0).toString(16)
"18181"
```

(3, 3, 3) averaged with (0, 0, 0) should not be (0x1, 0x81, 0x81). When the average function does its shifts, it allows the low bit of the red and blue channels to slide over into the high bit of the neighboring channel.

And the ave function loses some precision:

```
>>> ave(0x030303, 0x030303).toString(16)
"20203"
```

You do lose some information when you mask off the low bit of red and green.

2. *it allows the low bit of the red and blue channels to slide over into the high bit of the neighboring channel.*

er, I mean red and green channels.

3. Thanks for the input, Mitch. There is indeed quantization loss. The fast algorithm is tantamount to "quantize every color to 7 bits, then add the two inputs and divide by two." The quantization step (the mask) maps each channel to the even values 0, 2, 4, ..., 254 instead of the full range 0, 1, 2, ..., 255. For consistency the mask should probably be shown as 0x00fefefe instead of 0x00fefeff.

The tradeoff is between accepting the precision loss (mapping 8-bit colors to a 7-bit palette before "averaging" them) in exchange for high performance, and keeping full precision but running slower. I prefer the extra performance, because your eye can't distinguish between an image averaged the arithmetically "correct" way and one averaged the fast way.

As for the first algorithm, you're right, it's not correct; it allows bit-spillback as you describe. Doh!

4. A fair enough tradeoff, but is it really necessary?

You could get full precision while still handling the channels in parallel by handling the high bits separately:

```javascript
function ave2(a, b) {
  var lowbits = (((a & 0x7f7f7f) + (b & 0x7f7f7f)) & 0xfefefe);
  var highbits = (a & 0x808080) + (b & 0x808080);
  return (lowbits + highbits) >> 1;
}
```

Assuming I haven't made a mistake there, then it can be done with 5 ANDs, 0 ORs, 3 additions, and 1 shift. Compared to the "ave" function, it's doing 3 more ANDs and 2 more additions, in exchange for an extra bit of precision.

Since it's probably all in registers though (at least in C), I wonder how much of a speed difference it actually makes. Is the bit-twiddling the bottleneck, or is it memory bandwidth? Are you actually using this? I'd be curious to see the results if you do a benchmark.
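For the record, there's also a classic SWAR identity (a well-known bit trick, not from this thread) that gives exact per-channel floor averages with even fewer operations than `ave2`. Since `a + b = ((a & b) << 1) + (a ^ b)`, the average is the AND term plus half the XOR term:

```javascript
// Exact per-channel average via a + b = ((a & b) << 1) + (a ^ b).
// The AND term holds the bits both pixels share (already effectively
// doubled); the XOR term holds the bits that differ. Halving the XOR
// term and masking with 0x7f7f7f keeps each channel's shifted bit from
// leaking into its neighbor's lane.
function aveExact(a, b) {
  return (a & b) + (((a ^ b) >> 1) & 0x7f7f7f);
}
```

That's 2 ANDs, 1 XOR, 1 addition, and 1 shift, with no precision loss.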
