[math-fun] Arndt bit reversal 32 request
OK, another try. 32 = 27+5. Do the 27-reversal using 27=3*3*3 in "3 steps." Do the 5-reversal using a 32-entry lookup table in "1 step". Finally glue them together in reverse order. Sounds like "5 steps," which J.Arndt will reject as too many. However, really the 5+27 --> 27+5 swappage is done via a 5-hop rotation, which is got by a 5-hop left shift, a 27-hop right shift, and an OR. The word "and" here can be replaced by "and a table lookup, then". As a result, the whole algorithm is really i. 27-reverse in 3 steps. ii. 5+27 --> 27+5 swappage while inserting a 5->5 table lookup. This a 4 steps plus a table lookup, which is kind of "4 and a half steps." Plus..., you know you can do two 32-bit reversals at once using 64-bit reversal so on average faster :) ---------------------- Meanwhile, in case you care, I made a set of routines for dealing with 8-byte arrays as single 64-bit words, such as vector add, subtract, scale, additive shift by constant, subtract-with-saturation (negative results become 0), and sum-of-all-bytes, Many of these need to assume all byte values lie in [0..127]. Only the last two are nontrivial. -- Warren D. Smith http://RangeVoting.org <-- add your endorsement (by clicking "endorse" as 1st step)
However, really the 5+27 --> 27+5 swappage is done via a 5-hop rotation, which is got by a 5-hop left shift, a 27-hop right shift, and an OR. The word "and" here can be replaced by "and a table lookup, then". As a result, the whole algorithm is really
i. 27-reverse in 3 steps. ii. 5+27 --> 27+5 swappage while inserting a 5->5 table lookup. This a 4 steps plus a table lookup, which is kind of "4 and a half steps."
--Also, as another slimy alternative, the 5-bit reversal can be done via multiply by 100001000010000100001 binary, then mask, then another multiply by a constant, which results in the 5 bits magically moving from least-signif to most-signif end of the 32-bit word in reversed order. (No table lookup or shift needed.) The thing is, the 5 and the 27 will be being dealt with in parallel on a lot of processors, so that effectively the routine will take fewer steps.
participants (1)
-
Warren D Smith