* Warren D Smith <warren.wds@gmail.com> [Apr 09. 2014 18:51]:
On 4/9/14, Warren D Smith <warren.wds@gmail.com> wrote:
I tested several 32-bit reversors for speed.
--and, the winning 32-bit reversor is actually slower than Arndt's 64-bit reversor that uses assembler "bswap" inside.
Total futility.
If, in their infinite wisdom, the CPU makers would have added a "reverse bits in every byte", we'd be free from this business for all practical purposes. Exactly _zero_ register pressure. The additional wiring/gates cannot possibly be nontrivial. Now intel is looking at 512 bit regs and still their instruction set's anti-orthogonality is a poo fest that makes even the bad old x86 instruction set look structured. Humanity weeps. Let's write down (yes, yes, futility; all below ignoring bswap) for some specific n. Butterfly paradigm from now on. We can do 63 bits in 4 steps, as we have seen. 126 goes by [ 63, 63 ] in just one more step (swap those two halves in one more step, total now 5 steps). 127 goes by [ 63, 1, 63 ] in just one more step (swap those two halves in one more step, total now 5 steps). 128 goes by [ 1, 63, 63, 1 ] in just two steps (swap those two 63's in one more step, and those two 1's in one more, total now 6 steps). That is likely the first power of 2 that one can do in less then log_2(n) steps. But then, these butterfly steps cost more than "cut out by masks and shift-or together" as used in the straight forward routines. 129 (thanks for asking) goes by [ 1, 63, 1, 63, 1 ], again 6 steps. Yours futile, jj
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun