[math-fun] bit-reversal for words of length 3^k, and a hundred E-bucks
Before I forget about it: The usual technique is - swapping pairs of single bits - swapping pairs of two bits - swapping pairs of four bits etc. til end. For words of length 3, one just has to swap the first and last bit. For words of length 3, one can do as just said for the length-3 sub-words then swap the first and last runs of 3 bits. Etc., 3^k bits reversed in k steps! Naively doing this with words of length not of the form 3^k leads to deletion of initial ones, BAD! Now one has to truncate masks to avoid losing ones. When you look into it, there is a truly huge amount of possible ways to do the task. Here is a way of reversing words of 2^k-1 bits: 1 bit: nop 3 bits: swap first and last 7 bits: as before for first and last 3 bits (one bits stays in the middle), then swap first and runs of 3 bits. 2^(k+1)-1 bits: as 2^k-1 for the first and last 2^k-1 bits, (one bits stays in the middle), then swap first and runs of 2^k-1 bits. Doing better is (for me, so far) a matter of creativity, the neat routine I gave for 64 bits uses the fact that 64 - 1 is divisible by 9. Here is a strategy for a (slightly different) routine, just as fine as the one I have given: Split the 63 low bits as [27][9][27] <--= [b] means "run of b bits" now use the 3^k idea to reverse all runs of 9 bits (two steps). At the next step, the two runs of 27 are completed (the middle run has to be left alone). At the last step, swap the two 27-bit runs. Done in four steps, or five for 64 bits with just an additional rotation by one. I didn't manage to find anything for 32 bits (neither 31 bits) better than this. I offer 100 Euro for a routine for 32 bits (or 31, your choice) that uses one op less than my routine for 64 (or 63 when you pick 31 above). Best, jj
* Joerg Arndt <arndt@jjj.de> [Apr 07. 2014 07:22]:
[...]
I didn't manage to find anything for 32 bits (neither 31 bits) better than this.
I offer 100 Euro for a routine for 32 bits (or 31, your choice) that uses one op less than my routine for 64 (or 63 when you pick 31 above).
It is indeed possible! For 32 bits: uint jj3_revbin32_4a(uint x) { uint z; #if 1 z = (x ^ (x >> 1)) & 0x4a52a529U; x ^= z ^ (z << 1); z = (x ^ (x >> 3)) & 0x18c60c63U; x ^= z ^ (z << 3); z = (x ^ (x >> 10)) & 0x003e001fU; x ^= z ^ (z << 10); z = (x ^ (x >> 17)) & 0x00007fffU; x ^= z ^ (z << 17); #else // same code using GCC's binary literals z = (x ^ (x >> 1)) & 0b01001010010100101010010100101001U; x ^= z ^ (z << 1); z = (x ^ (x >> 3)) & 0b00011000110001100000110001100011U; x ^= z ^ (z << 3); z = (x ^ (x >> 10)) & 0b00000000001111100000000000011111U; x ^= z ^ (z << 10); z = (x ^ (x >> 17)) & 0b00000000000000000111111111111111U; x ^= z ^ (z << 17); #endif return x; } Just 4 steps! And so we can do 64 in 5 steps: static inline ulong jj3_revbin64_5(ulong x) { ulong z; z = (x ^ (x >> 1)) & 0x4a52a5294a52a529UL; x ^= z ^ (z << 1); z = (x ^ (x >> 3)) & 0x18c60c6318c60c63UL; x ^= z ^ (z << 3); z = (x ^ (x >> 10)) & 0x003e001f003e001fUL; x ^= z ^ (z << 10); z = (x ^ (x >> 17)) & 0x00007fff00007fffUL; x ^= z ^ (z << 17); z = (x ^ (x >> 32)) & 0x00000000ffffffffUL; x ^= z ^ (z << 32); return x; } Variants for 32 bits are (from here on binary literals only) uint jj3_revbin32_4b(uint x) { uint z; z = (x ^ (x >> 1)) & 0b01010000101000010100001010000101U; x ^= z ^ (z << 1); z = (x ^ (x >> 2)) & 0b00110010011001001100100110010011U; x ^= z ^ (z << 2); z = (x ^ (x >> 7)) & 0b00000001111000000011100000001111U; x ^= z ^ (z << 7); z = (x ^ (x >> 21)) & 0b00000000000000000000011111111111U; x ^= z ^ (z << 21); return x; } uint jj3_revbin32_4c(uint x) { uint z; z = (x ^ (x >> 1)) & 0b01001001001001001001001001001001U; x ^= z ^ (z << 1); z = (x ^ (x >> 3)) & 0b00011000100011000100011000100011U; x ^= z ^ (z << 3); z = (x ^ (x >> 9)) & 0b00000000011111000000000000011111U; x ^= z ^ (z << 9); z = (x ^ (x >> 18)) & 0b00000000000000000011111111111111U; x ^= z ^ (z << 18); return x; } uint jj3_revbin32_4d(uint x) { uint z; z = (x ^ (x >> 2)) & 0b00100100010010010010010010001001U; x ^= z ^ (z << 2); z = (x ^ (x >> 3)) & 0b00011100001110001110001110000111U; x ^= z ^ (z << 3); z = (x ^ (x >> 7)) & 0b00000001111110000000000000111111U; x ^= z ^ (z << 7); z = (x ^ (x >> 19)) & 0b00000000000000000001111111111111U; x ^= z ^ (z << 19); return x; } Now we have seen 4 possible variants (different sets of shifts), there are at most 29 variants, with possible shift sets 1: [ 1 2 7 21 ] 2: [ 1 2 8 20 ] 3: [ 1 3 6 21 ] 4: [ 1 3 7 20 ] 5: [ 1 3 8 19 ] 6: [ 1 3 9 18 ] 7: [ 1 3 10 17 ] 8: [ 1 4 6 20 ] 9: [ 1 4 7 19 ] 10: [ 1 4 8 18 ] 11: [ 1 4 10 16 ] 12: [ 1 4 12 14 ] 13: [ 2 3 6 20 ] 14: [ 2 3 7 19 ] 15: [ 2 3 8 18 ] 16: [ 2 3 9 17 ] 17: [ 2 3 10 16 ] 18: [ 2 3 12 14 ] 19: [ 2 4 5 20 ] 20: [ 2 4 7 18 ] 21: [ 2 4 10 15 ] 22: [ 2 4 11 14 ] 23: [ 2 4 12 13 ] 24: [ 2 5 6 18 ] 25: [ 2 6 7 16 ] 26: [ 2 6 8 15 ] 27: [ 2 6 9 14 ] 28: [ 2 6 10 13 ] 29: [ 2 6 11 12 ] (I have checked that some of them cannot work). No such routine reversing 64 bits in 4 steps exists. For 5 steps there are at most 481 variants. For 16 bits in 3 steps the possible shift sets are 1: [ 1 4 10 ] 2: [ 2 3 10 ] 3: [ 2 6 7 ] But (modulo mistake) none of these work, so we apparently need 4 steps. Now I see that http://oeis.org/A125173 list all my (polished) values for the minumum number of steps. Note the comment there. If anyone can access D. E. Knuth, Problem 11264, Amer. Math. Monthly 114(2007)77. I'd be grateful for a copy. The JSTOR link is http://www.jstor.org/stable/i27642112 Best, jj
participants (1)
-
Joerg Arndt