Fast Divisionless Determinant and Characteristic Polynomial of a Matrix
Warren D. Smith, Nov. 2011

By "characteristic polynomial" of a matrix M we shall mean

  charpoly(M,x) = det(I*(1-x) + M*x) = det(I - [I-M]*x)

which is not exactly the same as the usual definition, but the two are easily interconverted. When x=1 this is det(M), and when x=0 it is det(I)=1.

Consider the 2nX2n block matrix (with nXn blocks)

  M = [E F]
      [G H]

One form of the Schur identity is

  det(M) = det(E*H - E*G*inverse(E)*F) = det(E) * det(H - G*inverse(E)*F)

which leads to one way to evaluate det(M) in a number of steps of the same order as an nXn matrix multiplication plus an nXn matrix inversion. [And it is impossible to evaluate det(M) in a smaller-order number of steps, as can be shown using the fact that the gradient of det is the adjugate, so we could use det plus algorithmic differentiation to compute the matrix inverse fast.] If M happens to be the identity matrix, this recursive process generates smaller identity matrices at every step along the way.

Here, however, we shall be interested in no-division algorithms. Let D = I-E. Applying the Schur identity to the blocks of I - [I-M]*x gives

  charpoly(M,x) = charpoly(E,x) * det( I - [I-H]*x - G*inverse(I-D*x)*F*x*x )
                = charpoly(E,x) * det( I - [I-H]*x - G*[I + x*D + x^2*D^2 + x^3*D^3 + ...]*F*x*x )

where in the infinite series it suffices to stop after terms in x^n are reached (or stop anyplace further on): this is a formal Maclaurin series which we know must be a polynomial in x of degree n, so higher powers of x are all going to come out with coefficient 0 in the end and we do not need to calculate them.

Now use the product

  1 + y + y^2 + y^3 + ... + y^127 = (1+y^64)(1+y^32)(1+y^16)(1+y^8)(1+y^4)(1+y^2)(1+y)

to evaluate the series fast (and use repeated squaring, of course). We then have

  charpoly(M,x) = charpoly(E,x) * det( I - [I-H]*x - G*[I+x*D][I+x^2*D^2][I+x^4*D^4]...[I+x^P*D^P]*F*x*x )

where P is any power of 2 with 2P-1 >= n (so the truncated product agrees with the series through x^n). This halves the size of the matrix, and after log(n) recursive levels we bottom out at 1X1 matrices. When x-->0, this recursive process generates identity matrices every step of the way.
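The factorization above can be checked concretely. The following is a minimal sketch (my illustration, not part of the note; it assumes sympy is available) verifying, on one fixed 4X4 integer matrix, that charpoly(M,x) = det(I*(1-x)+M*x) behaves as claimed at x=0 and x=1, and that the Schur-style factorization with the divisionless truncated-product inverse reproduces it after truncation at degree n:

```python
# Check the charpoly definition and the divisionless Schur factorization
# on a small fixed example, using sympy (assumed available).
import sympy as sp

x = sp.symbols('x')
n = 4
M = sp.Matrix([[ 2, -4,  1, -5],
               [ 5, -1,  4, -2],
               [-3,  2, -4,  1],
               [ 0,  5, -1,  4]])
I = sp.eye(n)

cp = sp.expand((I*(1 - x) + M*x).det())   # charpoly(M,x) as defined above
assert cp.subs(x, 1) == M.det()           # x=1 gives det(M)
assert cp.subs(x, 0) == 1                 # x=0 gives det(I)=1

# 2x2 blocks E,F,G,H and D = I - E
E, F, G, H = M[:2, :2], M[:2, 2:], M[2:, :2], M[2:, 2:]
I2 = sp.eye(2)
D = I2 - E

# Divisionless inverse: (I - D*x)^(-1) agrees with the product
# (I+xD)(I+x^2 D^2)(I+x^4 D^4) through terms in x^7, enough since n=4.
S = (I2 + x*D) * (I2 + x**2*D**2) * (I2 + x**4*D**4)

cpE = sp.expand((I2 - D*x).det())         # charpoly(E,x)
rhs = sp.expand(cpE * (I2 - (I2 - H)*x - x**2 * G*S*F).det())

# The true charpoly has degree <= n, so truncating rhs at x^n recovers it.
truncated = sp.expand(sp.rem(rhs, x**(n + 1), x))
assert sp.expand(truncated - cp) == 0
```

The truncation step is where the "higher powers come out with coefficient 0" argument does its work: the product S only matches inverse(I-D*x) through x^7, but that suffices because everything past x^n is guaranteed to cancel.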
Then for general x we get Maclaurin series whose constant terms are identity matrices every step of the way, which enables us to compute the needed matrix inverse using the 1/(I-K*x) Maclaurin-expansion-as-a-product trick above, each step of the way.

To multiply two nXn matrices with scalar entries takes MatMult(n) = O(n^2.376) steps. [Don Coppersmith & Shmuel Winograd: Matrix multiplication via arithmetic progressions, J. Symbolic Computation 9,3 (1990) 251-280.] To multiply two nXn matrices with kth-degree-polynomial(x) entries takes MatMult(n)*k + O(k*logk*n^2) steps, which is O(k*n^2.376) steps if logk < n^0.376, by using FFT-based fast convolution to multiply polynomials with O(k) coefficient-multiplications. The product has log(n) terms and is multiplied in tree fashion. The whole recursive scheme's runtime basically obeys

  T(2n) = 2*T(n) + O(MatMult(n)*n*logn)

hence T(n) = O(MatMult(n)*n*log(n)) = O(n^3.38).

THEOREM (DIVISIONLESS DET & CHARPOLY). What we have called the "characteristic polynomial" of an nXn matrix M can be computed (i.e. all its coefficients computed) in O(n^3.38) arithmetic operations, by a scheme which never divides two matrix entries; it only uses +, -, and *, and employs O(log(n)) weird scalar constants inside the FFT. [Then the determinant of M is just this polynomial evaluated at x=1.]

This runtime is a factor O(n) slower than the number of steps required for det(M) if division is allowed. Also note, our divisionless algorithm is massively parallelizable. Also note, by storing the "computation tree" as we proceed, then afterwards "backing down the tree" from the root (det) to the leaves (matrix entries) using the chain rule, we can compute the partial derivatives of the det with respect to all the matrix entries, i.e. compute the matrix adjugate (also called the classical adjoint), also in this same time bound and also without use of any division -- but this needs a lot of memory.
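The polynomial-entry matrix multiplication bound rests on FFT-based fast convolution for the entry products. Here is a minimal sketch of that primitive (my illustration, assuming numpy; a serious implementation would worry about numerical rounding or use a number-theoretic transform):

```python
# Multiply two polynomials, given as coefficient lists (lowest degree
# first), via FFT-based convolution in O(k log k) scalar operations.
import numpy as np

def polymul_fft(a, b):
    m = len(a) + len(b) - 1        # number of coefficients in the product
    size = 1
    while size < m:
        size *= 2                  # FFT length: next power of 2 >= m
    fa = np.fft.rfft(a, size)      # pointwise transform of each factor...
    fb = np.fft.rfft(b, size)
    prod = np.fft.irfft(fa * fb, size)[:m]   # ...multiply, transform back
    return np.rint(prod).astype(int)         # exact for modest integers

# (1 + 2x + x^2) * (1 + x) = 1 + 3x + 3x^2 + x^3
assert list(polymul_fft([1, 2, 1], [1, 1])) == [1, 3, 3, 1]
```

The "weird scalar constants" in the theorem are exactly the roots of unity hiding inside these transforms; the rest of the scheme uses only +, -, and * on matrix entries.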
(See the "reverse" method in http://en.wikipedia.org/wiki/Automatic_differentiation .)

It also is possible to achieve O(n^3)-step determinant evaluation using only ONE division, by triangularizing the matrix using row linear combinations, keeping track of the multiplication factor as you go. Det(triangular matrix) is then trivial (the product of the diagonal entries), and at the end you perform the one division, by the product of the multiplication factors. (But this method is a lot less massively parallelizable.) Indeed, you can achieve O(MatMult(n)) steps by using rational numbers to represent intermediate results in a fast det algorithm, then doing the one division in the final rational number.

It seems a bit odd that reducing from one to zero divisions costs you this much extra work. Or perhaps it doesn't, if you're smarter.
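The one-division triangularization can be sketched in a few lines. This is my illustration of the idea, not code from the note; it assumes integer entries and that every pivot is nonzero (no pivoting, for simplicity):

```python
# One-division determinant: triangularize with division-free row updates
# row_i := p*row_i - A[i][k]*row_k.  Each such update multiplies the
# determinant by the pivot p, which we record in `factor`; at the end,
# det = (product of diagonal entries) / factor -- the single division,
# which is exact in integer arithmetic.
def det_one_division(A):
    A = [row[:] for row in A]           # work on a copy
    n = len(A)
    factor = 1                          # accumulated det-scaling factor
    for k in range(n):
        p = A[k][k]                     # pivot (assumed nonzero)
        for i in range(k + 1, n):
            m = A[i][k]
            A[i] = [p*A[i][j] - m*A[k][j] for j in range(n)]
            factor *= p                 # this row update scaled det by p
    prod_diag = 1
    for k in range(n):
        prod_diag *= A[k][k]
    return prod_diag // factor          # the single (exact) division

assert det_one_division([[2, 1], [1, 3]]) == 5
assert det_one_division([[1, 2, 3], [4, 5, 6], [7, 8, 10]]) == -3
```

Note the row update avoids dividing by the pivot at the cost of swelling the intermediate entries; that swell is what the rational-number variant mentioned above manages more cleverly.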