* Joerg Arndt <arndt@jjj.de> [Apr 19. 2014 07:37]:
* Warren D Smith <warren.wds@gmail.com> [Apr 18. 2014 18:50]:
[...]
Henry Warren's book "Hacker's delight" already discussed the 8x8 matrix transpose, which is one of these 7 problems. He ended up with the following incredible algorithm:
uint64 TransposeBitBd(BitBd x){ //Adapted from Warren:: Hacker's delight. uint64 t; t = (x ^ (x >> 7)) & U64(0x00AA00AA00AA00AA); x = x ^ t ^ (t << 7); t = (x ^ (x >> 14)) & U64(0x0000CCCC0000CCCC); x = x ^ t ^ (t << 14); t = (x ^ (x >> 28)) & U64(0x00000000F0F0F0F0); x = x ^ t ^ (t << 28); return(x); }
Good luck figuring it out.
[...]
It is also a good idea to look at the binary bit masks: ulong transpose_bit_board(ulong x) { // from "Hacker's delight": transpose8rS64(...) from file transpose8.c ulong t; t = (x ^ (x >> 7)) & 0x00aa00aa00aa00aaUL; x = x ^ t ^ (t << 7); // mask = 0b0000000010101010000000001010101000000000101010100000000010101010UL t = (x ^ (x >> 14)) & 0x0000cccc0000ccccUL; x = x ^ t ^ (t << 14); // mask = 0b0000000000000000110011001100110000000000000000001100110011001100UL t = (x ^ (x >> 28)) & 0x00000000f0f0f0f0UL; x = x ^ t ^ (t << 28); // mask = 0b0000000000000000000000000000000011110000111100001111000011110000UL return x; } Now break the masks up into boards (dots for zeros) ........ 1.1.1.1. ........ 1.1.1.1. ........ 1.1.1.1. ........ 1.1.1.1. ........ ........ 11..11.. 11..11.. ........ ........ 11..11.. 11..11.. ........ ........ ........ ........ 1111.... 1111.... 1111.... 1111.... And that should make things evident. Use a quarter of the first two masks as 16 bit words to get the routine I asked for in my prior message. Btw. the largest n such that an n x n matrix can be transposed in three steps (assuming words of n^2 bits) should be 14. 16 x 16 (256 bit words) sadly need 4. Best, jj