Re: [math-fun] carryless multiplication (or other techniques?) for bit-permuting
I've investigated Warren's permutation problem in several different guises. I studied how to implement arbitrary permutations in APL using the APL transpose & restructure operations. If we have an n-dimensional array in APL, transpose can exchange indices; if we have a vector in APL, restructure can give it a new interpretation as an n-dimensional array. I also studied how to implement arbitrary permutations on the top k elements of a stack (in a stack machine). And then there's also Warren's bit-permutation problem. The basic problem in all of these permutation problems is straight-forward information theory: how many bits does it take to specify which permutation you're talking about in the first place. If all permutations are equally likely than we have log(n!) ~ O(n*log(n)) bits. So even if you wanted to specify an arbitrary permutation of the 64 bits in a 64-bit word, you would need 296 bits (= log2(64!)). At 10:33 AM 6/8/2012, Warren Smith wrote:
It is not so easy to permute bits, actually.
Say you have a W-bit word (W=64?). You want to permute the lower-significance K bits in some arbitrary known manner. How do you do it? Here is one way:
Step 1: multiply by 100001000010000... where the period is K. Carryless or regular multiply both work here and below.
Step 2: AND with some magic bit mask with one 1-bit per K-chunk chosen to get exactly one copy of each original bit, one per chunk, ordered in arbitrary known order.
Step 3: multiply by the correct magic sparse-1's number so that the correct permuted K bits magically appear in the most-signif K bits of the product word. It is possible here that if you are using regular (not carryless) multiply, you will need about log(K) buffer bits between the top K bits used for output, and the bottom K*K bits used for input.
Step 4: shift right by W-K+BuffSize hops to reposition in original K bit locations.
This works if BuffSize+K*K<=W, so if W=64 we can permute 8 bits with carryless, and maybe somewhat fewer with regular, multiply.
That sucks. I mean, to permute 8 bits I could've just used a 256-byte lookup table.
Looking in the FxtBook, http://www.jjj.de/fxt/fxtbook.pdf section 1.29 fails to answer the question though it does suggest future machine instructions that would.
This is assuredly true. On the other hand, being able to select bits and copy them/or them quickly is pretty difficult for current CPUs yet pretty important. Knuth spends a lot of time discussing how to do it using currently available ops, and frankly, it's just tough. Being able to select and copy 10 bits at a time (10 x 6 bits for index) from a 64-bit word into another would let you do 10-bit permutations in one instruction, 20 bits in two, 30 in 3, etc. Using the remaining bits to select granularity size so you could copy bit pairs, nybbles, bytes, etc. might be a cool hack. (SSE does have a byte permute instruction.) Some small but not insignificant fraction of the 35 CPU years needed for my recent Rubik work went to accessing a lookup table and doing a few shifts and ors to do what should be a pretty simple 24-bit permutation. On the other hand, only a few of us are really asking for such an instruction, and a lookup table can sometimes get you within shouting distance. Furthermore, it's not like C compilers are going to be automatically generating code for use cases of this anytime soon. -tom On Fri, Jun 8, 2012 at 1:05 PM, Henry Baker <hbaker1@pipeline.com> wrote:
I've investigated Warren's permutation problem in several different guises.
I studied how to implement arbitrary permutations in APL using the APL transpose & restructure operations. If we have an n-dimensional array in APL, transpose can exchange indices; if we have a vector in APL, restructure can give it a new interpretation as an n-dimensional array.
I also studied how to implement arbitrary permutations on the top k elements of a stack (in a stack machine).
And then there's also Warren's bit-permutation problem.
The basic problem in all of these permutation problems is straight-forward information theory: how many bits does it take to specify which permutation you're talking about in the first place. If all permutations are equally likely than we have log(n!) ~ O(n*log(n)) bits.
So even if you wanted to specify an arbitrary permutation of the 64 bits in a 64-bit word, you would need 296 bits (= log2(64!)).
At 10:33 AM 6/8/2012, Warren Smith wrote:
It is not so easy to permute bits, actually.
Say you have a W-bit word (W=64?). You want to permute the lower-significance K bits in some arbitrary known manner. How do you do it? Here is one way:
Step 1: multiply by 100001000010000... where the period is K. Carryless or regular multiply both work here and below.
Step 2: AND with some magic bit mask with one 1-bit per K-chunk chosen to get exactly one copy of each original bit, one per chunk, ordered in arbitrary known order.
Step 3: multiply by the correct magic sparse-1's number so that the correct permuted K bits magically appear in the most-signif K bits of the product word. It is possible here that if you are using regular (not carryless) multiply, you will need about log(K) buffer bits between the top K bits used for output, and the bottom K*K bits used for input.
Step 4: shift right by W-K+BuffSize hops to reposition in original K bit locations.
This works if BuffSize+K*K<=W, so if W=64 we can permute 8 bits with carryless, and maybe somewhat fewer with regular, multiply.
That sucks. I mean, to permute 8 bits I could've just used a 256-byte lookup table.
Looking in the FxtBook, http://www.jjj.de/fxt/fxtbook.pdf section 1.29 fails to answer the question though it does suggest future machine instructions that would.
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
-- -- http://cube20.org/ -- http://golly.sf.net/ --
participants (2)
-
Henry Baker -
Tom Rokicki