After hearing about the new "carry-less multiplication" instructions coming soon to computers near you, it occurred to me that even without hardware help, such operations perhaps aren't so slow as I initially thought. Karatsuba squaring. p(x) is a polynomial over GF(2), where a "1" bit at bit position k represents 1*x^k. Adding is easy: p(x)+q(x) (mod 2) = p(x) xor q(x). Mod 2, subtraction is the same as addition. 2*p(x) (mod 2) = 0. If we split p(x) = q(x)*x^m+r(x) into two pieces q(x),r(x) of approximately the same size, q(x) is the "high part", and r(x) is the "low part". Then p(x)^2 = (q(x)*x^m+r(x))^2 = q(x)^2*x^2m + 2*q(x)*r(x)*x^m + r(x)^2 = q(x)^2*x^2m + r(x)^2 (2*anything = 0 mod 2, so middle part drops out.) So Karatsuba squaring is almost trivial: break the poly into 2 ~ equal-sized pieces, square each piece, and then put them together with a shift & xor. However, the ease of squaring backfires, as we can no longer use squaring to emulate multiplication: a*b = [(a+b)^2-(a-b)^2]/4 unfortunately in GF(2) becomes: a*b = [(a xor b)^2 xor (a xor b)^2]/0 = 0/0 ! So we need the traditional Karatsuba multiplication method: p(x) = q(x)*x^m+r(x) s(x) = t(x)*x^m+u(x) p(x)*s(x) = (q(x)*x^m + r(x)) * (t(x)*x^m + u(x)) = q(x)*t(x)*x^2m + ((q(x)+r(x))*(t(x)+u(x))-q(x)*t(x)-r(x)*u(x))*x^m + r(x)*u(x) = q(x)*t(x)*x^2m xor ((q(x) xor r(x))*(t(x) xor u(x)) xor q(x)*t(x) xor r(x)*u(x))*x^m xor r(x)*u(x) ----- Question: Is it possible to emulate multiplication using p(x)^3 or p(x)^5 (mod 2) ?