[math-fun] 32-bit reversal speed tests
I tested several 32-bit reversors for speed. Arndt32BitReverseA: 0.598 like Freed but without explicit constants Arndt32BitReverseB: 0.492 uses 31=5*3+1+5*3, 32=31+1 Freed32BitReverse: 0.484 uses 32,16,8,4,2,1 WDS32BitReverseA: 0.526 uses 32=27+5; 5 handled using multiply/mask tricks WDS32BitReverseB: 0.500 uses 32=27+5; 5 handled via 32-byte table lookup The winner is the comparatively boring: uint32 Freed32BitReverse(uint32 v){ //Edwin Freed in Dr. Dobb's Journal 1983. v = ((v >> 1) & 0x55555555) | ((v & 0x55555555) << 1); // swap odd and even bits v = ((v >> 2) & 0x33333333) | ((v & 0x33333333) << 2); // swap consecutive pairs v = ((v >> 4) & 0x0F0F0F0F) | ((v & 0x0F0F0F0F) << 4); // swap nybbles v = ((v >> 8) & 0x00FF00FF) | ((v & 0x00FF00FF) << 8); // swap bytes v = ( v >> 16 ) | ( v << 16); //swap 2-byte long pairs return(v); } -- Warren D. Smith http://RangeVoting.org <-- add your endorsement (by clicking "endorse" as 1st step)
* Warren D Smith <warren.wds@gmail.com> [Apr 09. 2014 18:51]:
On 4/9/14, Warren D Smith <warren.wds@gmail.com> wrote:
I tested several 32-bit reversors for speed.
--and, the winning 32-bit reversor is actually slower than Arndt's 64-bit reversor that uses assembler "bswap" inside.
Total futility.
If, in their infinite wisdom, the CPU makers would have added a "reverse bits in every byte", we'd be free from this business for all practical purposes. Exactly _zero_ register pressure. The additional wiring/gates cannot possibly be nontrivial. Now intel is looking at 512 bit regs and still their instruction set's anti-orthogonality is a poo fest that makes even the bad old x86 instruction set look structured. Humanity weeps. Let's write down (yes, yes, futility; all below ignoring bswap) for some specific n. Butterfly paradigm from now on. We can do 63 bits in 4 steps, as we have seen. 126 goes by [ 63, 63 ] in just one more step (swap those two halves in one more step, total now 5 steps). 127 goes by [ 63, 1, 63 ] in just one more step (swap those two halves in one more step, total now 5 steps). 128 goes by [ 1, 63, 63, 1 ] in just two steps (swap those two 63's in one more step, and those two 1's in one more, total now 6 steps). That is likely the first power of 2 that one can do in less then log_2(n) steps. But then, these butterfly steps cost more than "cut out by masks and shift-or together" as used in the straight forward routines. 129 (thanks for asking) goes by [ 1, 63, 1, 63, 1 ], again 6 steps. Yours futile, jj
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
* Joerg Arndt <arndt@jjj.de> [Apr 10. 2014 16:28]:
[...]
129 (thanks for asking) goes by [ 1, 63, 1, 63, 1 ], again 6 steps.
Nope, 129 takes only 5, via [ 43 43 43 ], where 43 takes 4 via [ 21 1 21 ], where 21 takes 3 via [ 9 3 9 ]. Surprisingly, the first deviation from the OEIS sequences occurs quite a bit earlier than I thought, ... *** drum roll *** uint revbin17(uint x) { uint z; z = (x ^ (x >> 2)) & 0b00100010010010001; x ^= z ^ (z << 2); z = (x ^ (x >> 4)) & 0b00001110000000111; x ^= z ^ (z << 4); z = (x ^ (x >> 10)) & 0b00000000001111111; x ^= z ^ (z << 10); return x; } Best, jj
[...]
participants (2)
-
Joerg Arndt -
Warren D Smith