Re: [math-fun] Interesting matrix problem
Re: The 'textbook' solution using SVD's is looking better all the time, as the SVD *automatically* finds the *closest* rank-1 decomposition of the input matrix, which doesn't even have to pretend to have been a rank-1 matrix.

Indeed, I tried a Jacobi-like SVD algorithm on some of these purported rank-1 matrices, and it converges extremely quickly because -- unlike with non-rank-1 matrices -- each of these Jacobi/Givens rotations zeros out an entire row and column, and furthermore, these rotations don't screw up the previously zeroed rows and columns.

A major downside to the SVD algorithm on these rank-1 matrices is that the row and column produced have been *normalized* to 1, and the *scale factor* -- the first (the largest, and hopefully only non-zero) singular value -- is generally *irrational*. The other (non-'principal') singular values characterize the 'noise', which might be helpful if you're trying to reduce this noise.

At 12:39 PM 2/6/2021, Henry Baker wrote:
Oops!!
There still is a possibility of *worsening* the error if the noise causes a sign change in one of the non-principal rows or columns.
A more robust solution would be to compute the correlation of the other rows/columns with the principal rows/columns, and choose the sign suggested by the correlation.
If the abs value of the correlation is too low, don't bother including it in the sum, as it is 'obviously' all noise.
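For concreteness, here is a minimal NumPy sketch of this correlation test; the helper name signed_row_sum and the cutoff tau are arbitrary illustrative choices (not from the thread), and columns would be handled symmetrically:

import numpy as np

def signed_row_sum(M, tau=0.1):
    """Accumulate rows with signs chosen by correlation against the
    principal row; rows whose |correlation| falls below tau are
    treated as pure noise and skipped."""
    pi, pj = np.unravel_index(np.argmax(np.abs(M)), M.shape)
    principal = M[pi, :]
    total = np.zeros(M.shape[1])
    for row in M:
        denom = np.linalg.norm(row) * np.linalg.norm(principal)
        corr = row @ principal / denom if denom > 0 else 0.0
        if abs(corr) >= tau:              # 'obviously' all noise otherwise
            total += np.sign(corr) * row  # flip so the rows reinforce
    return total                          # estimates V's direction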
The 'textbook' solution using SVD's is looking better all the time, as the SVD *automatically* finds the *closest* rank-1 decomposition of the input matrix, which doesn't even have to pretend to have been a rank-1 matrix.
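In NumPy, that textbook route is only a few lines; this sketch splits the first singular value evenly between the two factors, which is just one choice of the free scale (the function name is illustrative):

import numpy as np

def rank1_factors_svd(M):
    """Closest rank-1 factorization: outer(U, V) minimizes the
    Frobenius distance to M (Eckart-Young)."""
    u, s, vt = np.linalg.svd(M)
    scale = np.sqrt(s[0])   # the 'irrational' scale factor, split evenly
    return scale * u[:, 0], scale * vt[0, :]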
At 09:34 AM 2/6/2021, Henry Baker wrote:
"Find the element in the matrix with the largest absolute value. Use the row and column containing that element, and ignore everything else."
The following idea might rescue another few bits from the noise:
Call the elt with the max abs value "maxelt"; note that maxelt keeps its sign.
Call the row and column with "maxelt" the "principal row" and the "principal" column. Let 'pi' be the row# for the principal row and 'pj' be the col# for the principal col. Note that maxelt = M[pi,pj].
Sum the rows as follows:
rowsums : rowsums + row(M,i)*signum(M[i,pj])*signum(maxelt)
Note that M[i,pj]*signum(M[i,pj]) = abs(M[i,pj]).
This makes the sign of the elt of the pj'th column of the i'th row *match* the sign of maxelt, and thereby avoids cancellation.
Also, since we aren't scaling these other (smaller) rows up, we won't be amplifying the noise.
Ditto for the column sums:
colsums : colsums + col(M,j)*signum(M[pi,j])*signum(maxelt)
Now use the row sums and column sums (suitably scaled) as our answer.
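A minimal NumPy sketch of this whole scheme (the name rank1_by_signed_sums is illustrative; the final rescaling is one choice of the free scale, made so the reconstruction is exact at the maxelt position):

import numpy as np

def rank1_by_signed_sums(M):
    """Sum sign-matched rows and columns, then rescale so that the
    reconstructed (pi, pj) entry equals maxelt."""
    pi, pj = np.unravel_index(np.argmax(np.abs(M)), M.shape)
    s = np.sign(M[pi, pj])                  # signum(maxelt)
    rowsums = (np.sign(M[:, pj]) * s) @ M   # estimates V's direction
    colsums = M @ (np.sign(M[pi, :]) * s)   # estimates U's direction
    colsums *= M[pi, pj] / (colsums[pi] * rowsums[pj])  # want / have
    return colsums, rowsums                 # U, V with outer(U, V) ~ M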
At 08:37 PM 2/4/2021, Andy Latto wrote:
Given that we know that the error is largest for the smallest elements, and all the rows are scalar multiples of any given row (and equivalently for the columns), is this algorithm significantly better than:
Find the element in the matrix with the largest absolute value.
Use the row and column containing that element, and ignore everything else.
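(As a baseline for such comparisons, here is this algorithm as a short NumPy sketch; the function name is illustrative, and Maximilian gives a PARI/GP version of the same idea later in the thread:)

import numpy as np

def rank1_by_largest_element(M):
    """Use the row and column through the entry of largest |value|;
    dividing the column by that entry makes outer(U, V) match M
    exactly on the principal row and column."""
    pi, pj = np.unravel_index(np.argmax(np.abs(M)), M.shape)
    return M[:, pj] / M[pi, pj], M[pi, :]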
Andy
On Thu, Feb 4, 2021 at 9:21 PM Henry Baker <hbaker1@pipeline.com> wrote:
We first show how to solve the problem when M is the outer product of *non-negative real* vectors.
Lemma 1.
Suppose we have a matrix M that purports to have the following structure:
[a b c d]' [p q r s] =
[ a p  a q  a r  a s ]
[ b p  b q  b r  b s ]
[ c p  c q  c r  c s ]
[ d p  d q  d r  d s ]
where >>> all of a,b,c,d,p,q,r,s >= 0 <<<
The row sums are
[a (s + r + q + p), b (s + r + q + p), c (s + r + q + p), d (s + r + q + p)]
The column sums are
[(d + c + b + a) p, (d + c + b + a) q, (d + c + b + a) r, (d + c + b + a) s]
The total matrix sum is
(d + c + b + a) (s + r + q + p)
If M is the zero matrix, then M is the outer product of 2 zero vectors.
If M isn't the zero matrix, then at least one row sum is >0, at least one col sum is >0, and the total sum is >0.
So we construct U' V = M as follows.
U = the row sums:
[a (s + r + q + p), b (s + r + q + p), c (s + r + q + p), d (s + r + q + p)]
V = the column sums/(total sum):
[ p/(s + r + q + p), q/(s + r + q + p), r/(s + r + q + p), s/(s + r + q + p) ]
Or, we can split the scale factor (total sum) evenly:
U = row sums/sqrt(total sum):
[ a sqrt(s + r + q + p)/sqrt(d + c + b + a),
  b sqrt(s + r + q + p)/sqrt(d + c + b + a),
  c sqrt(s + r + q + p)/sqrt(d + c + b + a),
  d sqrt(s + r + q + p)/sqrt(d + c + b + a) ]
V = col sums/sqrt(total sum):
[ sqrt(d + c + b + a) p/sqrt(s + r + q + p),
  sqrt(d + c + b + a) q/sqrt(s + r + q + p),
  sqrt(d + c + b + a) r/sqrt(s + r + q + p),
  sqrt(d + c + b + a) s/sqrt(s + r + q + p) ]
So these scaled row-sums and scaled col-sums are the vectors we seek. QED
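A minimal NumPy sketch of this construction for nonnegative M (the name rank1_nonneg is illustrative; it uses the unsymmetric split U = row sums, V = column sums / total sum):

import numpy as np

def rank1_nonneg(M):
    """Lemma 1: for entrywise nonnegative M = u' v, the row sums give
    u (up to scale) and the column sums over the total sum give v."""
    total = M.sum()
    if total == 0:                # M is the zero matrix
        return np.zeros(M.shape[0]), np.zeros(M.shape[1])
    return M.sum(axis=1), M.sum(axis=0) / total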
NB: It should now be clear why the row-sums and col-sums won't work when we include negative elements; e.g., there exist matrices M whose row-sums and col-sums are all *zero* (take M = [1 -1]' [1 -1]), so the procedure above fails completely.

---
Now we handle the general case, where M consists of both positive and negative entries.
We first find a,b,c,d>=0 and p,q,r,s>=0 s.t. [a b c d]' [p q r s] = abs(M), where abs(M) is M with all of its entries replaced by their absolute values.
Now we assign +- *signs* to the elements a,b,c,d,p,q,r,s of U,V using the following scheme.
Process the non-negative vectors U=[a b c d] and V=[p q r s] in descending order of absolute value.
The largest element of M in absolute value is the product of the largest elements of U and V. Assume that this element is |b|*|r| = |M[i,j]|.
We need to assign *signs* to b and to r, s.t. (b*r) = M[i,j], i.e., so that the sign(b)*sign(r)=sign(M[i,j]).
Since this is the first assignment, we can arbitrarily choose sign(b)=+1, and then set sign(r)=sign(M[i,j]).
We then process each of the other elements of U and V in descending order of absolute values, choosing signs for the remaining elements of U and the remaining elements of V so they are *consistent* with the previous choices of signs.
If any M[i,j]=0, it has no effect on any choice of signs for elements of U,V.
Due to round-off or other errors, near the end of this process -- i.e., when the elements of U and V are very close to zero -- we may find inconsistencies. However, these inconsistencies are extremely small (assuming that the input matrix was indeed produced as an outer product), and by processing the elements of U,V in descending order of absolute value, we ensure that the signs are consistent for the *strongest signals*, so that any inconsistencies are due only to small round-off errors.
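A sketch of this sign-recovery pass in NumPy (assuming nonnegative factors U, V of abs(M), e.g. from the row-sum/col-sum construction above; the helper name assign_signs is illustrative):

import numpy as np

def assign_signs(U, V, M):
    """Sign the largest elements of U and V first (one sign is a free
    choice), then walk the rest in descending order of absolute value,
    making each new sign consistent with sign(M[i, j]) against the
    already-signed partner carrying the strongest signal.  Any
    inconsistencies are thereby pushed into the near-zero elements."""
    su, sv = np.zeros(len(U)), np.zeros(len(V))
    bi, bj = np.argmax(np.abs(U)), np.argmax(np.abs(V))
    su[bi] = 1.0                                       # arbitrary choice
    sv[bj] = np.sign(M[bi, bj]) if M[bi, bj] != 0 else 1.0
    items = sorted([('U', i, U[i]) for i in range(len(U)) if i != bi] +
                   [('V', j, V[j]) for j in range(len(V)) if j != bj],
                   key=lambda t: -abs(t[2]))           # descending |value|
    for which, k, _ in items:
        if which == 'U':
            js = np.nonzero(sv)[0]                     # signed V indices
            j = js[np.argmax(np.abs(M[k, js]))]
            s = np.sign(M[k, j]) * sv[j]
        else:
            iis = np.nonzero(su)[0]                    # signed U indices
            i = iis[np.argmax(np.abs(M[iis, k]))]
            s = np.sign(M[i, k]) * su[i]
        (su if which == 'U' else sv)[k] = s if s != 0 else 1.0  # M entry 0: no info
    return su * U, sv * V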
At 05:02 PM 2/3/2021, Henry Baker wrote:
Dan asks: what is the question?
Find the (real) vectors U,V whose outer product makes the (real) matrix; i.e.,
U' V = M
Clearly, for any real alpha,
(U/alpha)' (alpha*V) = M
so the vectors are determined only up to a single constant factor.
Once again, the '=' wants to be as close as possible within the constraints of FP arithmetic.
At 03:23 PM 2/3/2021, Henry Baker wrote:
Input: an nxn(*) matrix M of floating point numbers (positive, negative or zero) which is *supposed* to have rank exactly one, i.e., it is the outer product of two n-element *unknown* vectors, and hence every row has a common factor and every column has a common factor, which factor can be positive, negative or zero.
(*) The matrix doesn't have to be square, but it was easier to describe the square version of the problem.
Unfortunately, due to floating point approximations and other issues, the values of M have small errors, so the columns and rows are not *exact* multiples of their common factors; and if a common factor is supposed to be zero, it may actually be a number exceedingly close to zero, down at the bottom of the exponent range.
NB: Due to *underflow*, floating point algebra has *zero divisors*, i.e., there exist x/=0, y/=0, s.t. x*y=0.
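A two-line Python demonstration of such a zero divisor in IEEE double precision (1e-200 * 1e-200 = 1e-400 lies far below the smallest subnormal, about 4.9e-324, so the product rounds to zero):

x = y = 1e-200
print(x != 0 and y != 0 and x * y == 0)   # prints True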
Such a problem can trivially be posed as an SVD problem, but I wanted a more direct solution, because I wanted to get the 'best' fit to the data within the limits of the floating point arithmetic.
I have a quite elegant and reasonably efficient solution, but I wanted to see if anyone else had the same idea.

--
Andy.Latto@pobox.com
On Sun, Feb 7, 2021 at 16:57, Henry Baker <hbaker1@pipeline.com> wrote:
Re: The 'textbook' solution using SVD's is looking better all the time,
Better than what? Do you have a "test suite", and if so (or otherwise), what do you get comparing that method with the "find the largest element..." solution or the "add all rows/cols modulo sign flip" proposal? I would like to see such data/results.

I would be glad to provide my algorithm(s) in any current programming language. Here they are in PARI/GP:

/* Simple variant: find the index (r,c) of the maximal element
   m = max{abs(M[i,j])}, and return U = col. c and V' = row r of M,
   suitably normalized: both (col. c and row r) have M[r,c] as a
   component, so we divide one of them by M[r,c] to get indeed
   U V' = M.  One might also divide both by sqrt(m), one of them
   with the sign of M[r,c] if it is negative. */
getUVsimple(M) = {
  my(i);
  vecmax(abs(M), &i);
  [ M[,i[2]] / M[i[1],i[2]], M[i[1],] ]
}

/* "Average" variant: sum over rows & cols with possible sign flips
   to avoid cancellation.  A linear combination of the rows is
   equivalent to left multiplication by a row vector; similarly for
   the columns, with right multiplication by a column vector. */
getUVsum(M) = {
  my(i, U, V);
  vecmax(abs(M), &i);  \\ use largest element as reference for sign
  U = M * apply(sign, M[i[1],])~;
  V = apply(sign, M[,i[2]])~ * M;
  U *= M[i[1],i[2]] / (U[i[1]] * V[i[2]]);  \\ normalize: want / have
  [U, V]
}

Both of these seem to work well for ill-conditioned matrices. I tried M = U V' with U and V having large and small elements of different signs, but I did not test with ("explicitly") perturbed M, and I also don't have the "textbook/SVD solution(s)" ready for use.

- Maximilian