[math-fun] matrix multiply over GF(2) -- yet another machine instruction we wish we had
Be nice if one could create an NxN bit-matrix (for example N = machine's wordsize) and then multiply it (mod 2) by an N-element bit vector (e.g. a machine word) in 1 instruction. This would help with cryptography, error-correcting codes, and probably a lot of other stuff. Just saying. -- Warren D. Smith http://RangeVoting.org <-- add your endorsement (by clicking "endorse" as 1st step)
On 4/1/16, Warren D Smith <warren.wds@gmail.com> wrote:
> Be nice if one could create an NxN bit-matrix (for example N=machine's wordsize) and then multiply it (mod 2) by an N-element bit vector (e.g machine word) in 1 instruction.
> This would help with cryptography, error correcting codes, and probably a lot of other stuff.
--such as permuting the bits in a word, to name another application. Also, if the XORs used to add up the sums mod 2 were replaced by plain ORs, then that other kind of matrix-vector "multiply" would be excellent for graph-connectivity calculations.
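In the absence of such an instruction, both "multiplies" can be emulated a row at a time in software. A minimal sketch in Python (the word size, function names, and row-per-word layout are my own choices for illustration, not anything from the thread): a row-times-vector dot product mod 2 is just the parity of the popcount of an AND, and the OR variant merely tests the AND for being nonzero.

```python
# Emulate an N x N bit-matrix times an N-bit vector, one word per row.
# Bit j of rows[i] is the matrix entry M[i][j]; x is the input vector.

def gf2_matvec(rows, x):
    """y = M.x over GF(2): bit i of y = popcount(rows[i] & x) mod 2."""
    y = 0
    for i, row in enumerate(rows):
        y |= (bin(row & x).count("1") & 1) << i
    return y

def bool_matvec(rows, x):
    """Same shape with OR as the 'addition': bit i of y = 1 iff
    rows[i] & x is nonzero -- one step of a graph-reachability sweep."""
    y = 0
    for i, row in enumerate(rows):
        y |= (1 if row & x else 0) << i
    return y

# Sanity check: with the identity matrix, both return x unchanged.
N = 8
identity = [1 << i for i in range(N)]
```

Iterating `bool_matvec` with a graph's adjacency rows (ORed with the identity so each vertex reaches itself) spreads a set of start vertices to everything reachable within k steps after k iterations, which is the connectivity use Warren mentions.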
This "binary matrix multiply" is one of the key instructions requested by the NSA in the supercomputers designed by Cray etc. It's somewhat remarkable that it has not been implemented in more mainstream architectures. I'd be surprised if it were missing from the NVIDIA instruction set, for example, since they are the successors of the supercomputer crowd.
On Apr 1, 2016, at 1:46 PM, Warren D Smith <warren.wds@gmail.com> wrote:
> On 4/1/16, Warren D Smith <warren.wds@gmail.com> wrote:
>> Be nice if one could create an NxN bit-matrix (for example N=machine's wordsize) and then multiply it (mod 2) by an N-element bit vector (e.g machine word) in 1 instruction.
>> This would help with cryptography, error correcting codes, and probably a lot of other stuff.
> --such as permuting bits in a word, would be another application.
> Also, if the XORs that were used to add up the sums mod 2, were replaced by plain ORs, then that other kind of matrix-vector "multiply" would be excellent for graph connectivity calculations.
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com https://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
Here's Intel's latest & greatest:

http://www.theregister.co.uk/2016/03/31/intel_broadwell_ep_xeon_e5_2600_v4/
Intel's Broadwell Xeon E5-2600 v4 chips: So what's in it for you, smartie-pants coders. New instructions, transactions, virtualization features and more.

---

Knuth also proposed some cool bit vector/matrix instructions, but I can't find the link just now. Vaughan Pratt showed that very long bit vector ("Boolean vector") instructions can be surprisingly powerful.

---

Intel, AMD, nVidia may very well have undocumented and/or firmware-additional instructions specially designed by NSA, but unavailable to the unwashed. If NOBUS (Google it) is going to mean anything for brute force attacks, then NOBUS brute force attacks might as well have a 10x advantage over everyone else's.

nVidia's new ARM chips have a 'soft' architecture, in which machine instructions are merely 'suggestions' about which operations to really perform. Although this 'advance' is touted as a way to increase performance -- e.g., in automotive video applications for driverless cars -- it is likely the world's least secure architecture. God only knows what's really going on inside these chips.

At 10:57 AM 4/1/2016, Tom Knight wrote:
> This "binary matrix multiply" is one of the key instructions requested by the NSA in the supercomputers designed by Cray etc. It's somewhat remarkable that it has not been implemented in more mainstream architectures. I'd be surprised if it were missing from the NVIDIA instruction set, for example, since they are the successors of the supercomputer crowd.
On 01/04/2016 18:46, Warren D Smith wrote:
As Henry Baker says, Knuth has proposed something very much along these lines: see http://mmix.cs.hm.edu/doc/instructions-en.html#Bit_Operations and look in the section headed "Mix and Match". His MXOR operation is an 8x8 matrix multiplication over GF(2), and his MOR operation is the same with OR instead of XOR for the addition. It's matrix * matrix rather than matrix * vector, and the size is chosen so that the whole matrix (rather than just one row or column) fits in a (64-bit) machine word. -- g
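The semantics Gareth describes can be modeled directly in software. A naive Python sketch of an MXOR-style 8x8 product over GF(2), plus the MOR variant (note the row-major bit packing here -- bit (i, j) at position 8*i + j -- is my own assumption for illustration; Knuth's MMIX spec fixes its own byte and bit ordering, which may differ):

```python
# Model of an MXOR-style operation: both 64-bit operands are read as
# 8x8 bit matrices, with matrix entry (i, j) at bit position 8*i + j.

def bit(M, i, j):
    return (M >> (8 * i + j)) & 1

def mxor(A, B):
    """C = A.B over GF(2): C[i][j] = XOR over k of A[i][k] & B[k][j]."""
    C = 0
    for i in range(8):
        for j in range(8):
            b = 0
            for k in range(8):
                b ^= bit(A, i, k) & bit(B, k, j)
            C |= b << (8 * i + j)
    return C

def mor(A, B):
    """MOR analogue: the same product with OR as the 'addition'."""
    C = 0
    for i in range(8):
        for j in range(8):
            b = 0
            for k in range(8):
                b |= bit(A, i, k) & bit(B, k, j)
            C |= b << (8 * i + j)
    return C

# The 8x8 identity under this packing: bits 0, 9, 18, ..., 63.
IDENTITY = 0x8040201008040201
```

Multiplying by a permutation matrix with `mor` shuffles whole rows, which is the bit/byte-permutation use Warren raised earlier in the thread.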
Gareth wrote:
> As Henry Baker says, Knuth has proposed something very much along these lines: see http://mmix.cs.hm.edu/doc/instructions-en.html#Bit_Operations and look in the section headed "Mix and Match". His MXOR operation is an 8x8 matrix multiplication over GF(2), and his MOR operation is the same with OR instead of XOR for the addition.
> It's matrix * matrix rather than matrix * vector, and the size is chosen so that the whole matrix (rather than just one row or column) fits in a (64-bit) machine word.
Since MXOR doesn't exist, what's the most efficient way to emulate it using existing instructions (representing matrices as uint64s)? I tried the obvious place, http://www.hackersdelight.org/hdcode.htm, to no avail. That contains a nice implementation to transpose such a matrix, but not to multiply matrices.

A.B^T is easier to calculate than A.B, I believe. In particular, you can take the eight uint64s, where >>> indicates cyclic shift:

    C_0 = A & B
    C_1 = A & (B >>> 8)
    C_2 = A & (B >>> 16)
    ...
    C_7 = A & (B >>> 56)

and apply the following operations in parallel (using SSE or AVX):

    C_i ^= (C_i >> 1)
    C_i ^= (C_i >> 2)
    C_i ^= (C_i >> 4)
    C_i &= 1

which will result in 64 bytes, each of which contains either 0 or 1. We want to pack these bits into a single uint64, like so:

    C = C_0 | (C_1 << 1) | (C_2 << 2) | ... | (C_7 << 7)

and possibly do some annoying permutation on these bits.

Just out of interest, *why* isn't MXOR in the Intel instruction set? It doesn't seem too complicated to implement, and it's useful both in cryptography and data analysis (see 'persistent homology').

Best wishes,

Adam P. Goucher
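Adam's cyclic-shift scheme can be written out as scalar Python to check it end to end. This is a sketch only: the thread imagines doing the eight words in parallel with SSE/AVX, whereas this loops; the names, the per-byte mask (the scalar reading of his "C_i &= 1"), and the final index shuffle are my own working-out of the "annoying permutation" he alludes to.

```python
MASK64 = (1 << 64) - 1
LOW_BITS = 0x0101010101010101  # the low bit of each of the 8 bytes

def rotr64(x, r):
    """Cyclic right shift of a 64-bit word (Adam's >>>)."""
    r &= 63
    return ((x >> r) | (x << (64 - r))) & MASK64

def mxor_transpose(A, B):
    """C = A.B^T over GF(2), with matrix entry (i, j) at bit 8*i + j,
    so C[i][j] = parity(row_A[i] & row_B[j])."""
    result = 0
    for d in range(8):
        # Byte i of C_d holds row_A[i] & row_B[(i + d) % 8].
        C = A & rotr64(B, 8 * d)
        # XOR-fold so the low bit of byte i becomes that byte's parity.
        C ^= C >> 1
        C ^= C >> 2
        C ^= C >> 4
        C &= LOW_BITS
        # The permutation: the parity in byte i is entry (i, (i + d) % 8).
        for i in range(8):
            result |= ((C >> (8 * i)) & 1) << (8 * i + ((i + d) & 7))
    return result

IDENTITY = 0x8040201008040201  # the 8x8 identity in this packing
```

Since the identity matrix is symmetric, `mxor_transpose(A, IDENTITY)` should return A unchanged, and `mxor_transpose(IDENTITY, B)` yields the transpose of B -- a handy cross-check against the Hacker's Delight transpose routine Adam mentions.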
participants (5)
- Adam P. Goucher
- Gareth McCaughan
- Henry Baker
- Tom Knight
- Warren D Smith