[math-fun] Interesting matrix problem
Input: an nxn(*) matrix M of floating point numbers (positive, negative or zero) which is *supposed* to have rank exactly one, i.e., it is the outer product of two n-element *unknown* vectors, and hence every row has a common factor and every column has a common factor, which factor can be positive, negative or zero.

(*) The matrix doesn't have to be square, but it was easier to describe the square version of the problem.

Unfortunately, due to floating point approximations and other issues, the values of M have small errors, so the columns and rows are not *exact* multiples of their common factors, and if the common factor is supposed to be zero, it may actually be a number exceedingly close to zero but at the minimum exponent range.

NB: Due to *underflow*, floating point algebra has *zero divisors*, i.e., there exist x/=0, y/=0, s.t. x*y=0.

Such a problem can trivially be posed as an SVD problem, but I wanted a more direct solution, because I wanted to get the 'best' fit to the data within the limits of the floating point arithmetic.

I have a quite elegant and reasonably efficient solution, but I wanted to see if anyone else had the same idea.
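(A quick numeric illustration of the underflow point, as a minimal Python sketch with IEEE-754 doubles; the particular values are just examples:)

    # Two nonzero doubles whose product underflows to exact zero:
    x = 1e-200
    y = 1e-200
    print(x != 0.0, y != 0.0)   # True True
    print(x * y)                # 0.0 -- 1e-400 is below the smallest subnormal, so it rounds to zero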
What is the question? —Dan
On Wednesday, 3 February 2021, at 3:23 PM, Henry Baker <hbaker1@pipeline.com> wrote:
Input: an nxn(*) matrix M of floating point numbers (positive, negative or zero) which is *supposed* to have rank exactly one, i.e., it is the outer product of two n-element *unknown* vectors, and hence every row has a common factor and every column has a common factor, which factor can be positive, negative or zero.
(*) The matrix doesn't have to be square, but it was easier to describe the square version of the problem
Unfortunately, due to floating point approximations and other issues, the values of M have small errors, so the columns and rows are not *exact* multiples of their common factors, and if the common factor is supposed to be zero, it may actually be a number exceedingly close to zero but at the minimum exponent range.
NB: Due to *underflow*, floating point algebra has *zero divisors*, i.e., there exist x/=0, y/=0, s.t. x*y=0.
Such a problem can trivially be posed as an SVD problem, but I wanted a more direct solution, because I wanted to get the 'best' fit to the data within the limits of the floating point arithmetic.
I have a quite elegant and reasonably efficient solution, but I wanted to see if anyone else had the same idea.
Dan asks: what is the question?

Find the (real) vectors U,V whose outer product makes the (real) matrix; i.e.,

    U' V = M

Clearly, for any real alpha,

    (U/alpha)' (alpha*V) = M

so the vectors are determined only up to a single constant factor. Once again, the '=' wants to be as close as possible within the constraints of FP arithmetic.

At 03:23 PM 2/3/2021, Henry Baker wrote:
Input: an nxn(*) matrix M of floating point numbers (positive, negative or zero) which is *supposed* to have rank exactly one, i.e., it is the outer product of two n-element *unknown* vectors, and hence every row has a common factor and every column has a common factor, which factor can be positive, negative or zero.
(*) The matrix doesn't have to be square, but it was easier to describe the square version of the problem
Unfortunately, due to floating point approximations and other issues, the values of M have small errors, so the columns and rows are not *exact* multiples of their common factors, and if the common factor is supposed to be zero, it may actually be a number exceedingly close to zero but at the minimum exponent range.
NB: Due to *underflow*, floating point algebra has *zero divisors*, i.e., there exist x/=0, y/=0, s.t. x*y=0.
Such a problem can trivially be posed as an SVD problem, but I wanted a more direct solution, because I wanted to get the 'best' fit to the data within the limits of the floating point arithmetic.
I have a quite elegant and reasonably efficient solution, but I wanted to see if anyone else had the same idea.
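(To make the scale ambiguity concrete: a minimal numpy sketch, with arbitrary example vectors rather than Henry's data:)

    import numpy as np

    rng = np.random.default_rng(0)
    U = rng.normal(size=5)             # unknown vector U
    V = rng.normal(size=5)             # unknown vector V
    M = np.outer(U, V)                 # rank-one matrix M = U' V (column times row)

    alpha = 3.7
    M2 = np.outer(U / alpha, alpha * V)
    print(np.allclose(M, M2))          # True: (U/alpha, alpha*V) reproduces the same M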
-----
Find the (real) vectors U,V whose outer product makes the (real) matrix; i.e., U' V = M
Clearly, for any real alpha, (U/alpha)' (alpha*V) = M so the vectors are determined only up to a single constant factor.
-----

I'm probably missing the obvious, but I wondered about that. It's true that Henry is starting from the assumption that there exists a solution.

But instead: Suppose we want to try to solve for x, y, z, w the equation

    (x y)' (z w) = (A B)
                   (C D)

I.e., the four equations xz = A, xw = B, yz = C, yw = D for randomly chosen A, B, C, D (real or complex).

When does there exist a solution?

And what about the general case in higher dimensions:

Let v, w be unknown vectors in K^n (K = R or C) with v' w = M, an arbitrary n x n matrix of constants in K.

When does there exist a solution for v and w ???

—Dan
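(For reference, the standard criterion: a 2x2 matrix with rows (A B) and (C D) is an outer product exactly when it has rank at most one, i.e. AD = BC, and in general v' w = M is solvable iff rank(M) <= 1, i.e. every 2x2 minor of M vanishes. A minimal numpy sketch of the 2x2 check, with illustrative values:)

    import numpy as np

    A, B, C, D = 2.0, 6.0, 1.0, 3.0               # illustrative values with AD == BC
    M = np.array([[A, B], [C, D]])

    solvable = np.isclose(A * D, B * C)           # rank <= 1 criterion for the 2x2 case
    print(solvable)                               # True

    # e.g. (x, y) = (A, C), (z, w) = (1, B/A) reproduces M whenever A != 0
    x, y, z, w = A, C, 1.0, B / A
    print(np.allclose(np.outer([x, y], [z, w]), M))   # True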
Re Dan's questions:

As I pointed out in my original post, this problem can be solved with SVD (Singular Value Decomposition) of the input matrix. The answer is the first rows of the left & right SVD factors. The other rows of these orthogonal factors correspond to the non-principal singular values, which would be zero with no roundoff error, and extremely small with round-off error.

What I wanted was a very simple (and hopefully completely *rational* -- not possible with SVD) solution (which I have demonstrated).

At 06:39 PM 2/3/2021, Dan Asimov wrote:
----- Find the (real) vectors U,V whose outer product makes the (real) matrix; i.e., U' V = M Clearly, for any real alpha, (U/alpha)' (alpha*V) = M so the vectors are determined only up to a single constant factor. ----- I'm probably missing the obvious, but I wondered about that.
It's true that Henry is starting from the assumption that there exists a solution.
But instead: Suppose we want to try to solve for x, y, z, w the equation

    (x y)' (z w) = (A B)
                   (C D)
I.e., the four equations xz = A, xw = B, yz = C, yw = D for randomly chosen A, B, C, D (real or complex).
When does there exist a solution?
And what about the general case in higher dimensions:
Let v, w be unknown vectors in K^n (K = R or C) with v' w = M, an arbitrary n x n matrix of constants in K.
When does there exist a solution for v and w ???
—Dan
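(For comparison with the SVD route mentioned above: a minimal numpy sketch that pulls a best rank-one factorization out of the singular value decomposition; the test matrix is an arbitrary example:)

    import numpy as np

    rng = np.random.default_rng(1)
    u_true = rng.normal(size=4)
    v_true = rng.normal(size=6)
    M = np.outer(u_true, v_true) + 1e-12 * rng.normal(size=(4, 6))   # nearly rank one

    # numpy returns M = U @ diag(s) @ Vt; the best rank-one fit is s[0] * outer(U[:, 0], Vt[0, :])
    U, s, Vt = np.linalg.svd(M)
    u = s[0] * U[:, 0]
    v = Vt[0, :]
    print(np.allclose(np.outer(u, v), M, atol=1e-10))   # True, up to the perturbation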
On 03/02/2021 23:23, Henry Baker wrote:
Input: an nxn(*) matrix M of floating point numbers (positive, negative or zero) which is *supposed* to have rank exactly one, ... Unfortunately, due to floating point approximations and other issues, the values of M have small errors,
and asks: How can you find the vectors whose outer product M approximately is?

The obvious approach is to remark that if M = uv' + h (h small) then Mb = (v.b)u + hb, which is approximately a multiple of u, and M'a = (u.a)v + h'a, which is approximately a multiple of v. So starting with any vector b not perpendicular to v, we can iterate

    a := (Mb)/b^2
    b := (M'a)/a^2.

(Why the b^2 and a^2 scaling factors? See the end of the next paragraph.)

Before asking how fast this process converges (on the face of it, it seems obviously first-order) we should ask what we expect it to converge _to_. Clearly not, in general, the true u,v, but we might hope for convergence to the u,v for which uv' is as close as possible to M. If we take this in the least-squares sense then the resulting derivative=0 equations say that u = Mv/v^2 and v = M'u/u^2; that is, that hv = 0 and h'u = 0.

We still don't expect convergence to exactly u,v because we could equally converge to ku, v/k for some scalar k. But suppose that, given our current a,b, we pick the scaling of u,v to make a = u+d with d as small as possible. In particular, this implies u.d = 0. Then our new approximation to v is

    M'(u+d)/(u+d)^2 = (vu'+h')(u+d)/(u+d)^2 = (u^2 v + h'd)/(u^2 + d^2)

where the various cross-terms vanish because u.d = 0 and h'u = 0. So to first order in d, this is v + h'd/u^2 = v + e, say. A little calculation shows that |e|/|v| is at most R/|u||v| times |d|/|u|, where R is the 2-norm of h; of course |u||v| is the 2-norm of uv'. So our errors go down by at worst a factor of about |h|/|uv'| at each iteration.

If the errors in M are as small as the words "floating point approximations and other issues" suggest, it shouldn't take more than a few iterations for this to converge to machine accuracy. (Again, convergence will be not to the true u,v but to whichever u,v actually yield the best approximation to the perturbed M.)

-- g
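(A direct transcription of the iteration above into numpy — a sketch under the stated assumption that the starting b is not perpendicular to v, not Gareth's actual code:)

    import numpy as np

    def rank_one_fit(M, iters=5):
        """Alternate a := Mb/|b|^2, b := M'a/|a|^2, as described above."""
        b = np.ones(M.shape[1])           # any starting vector not perpendicular to v
        for _ in range(iters):
            a = M @ b / (b @ b)
            b = M.T @ a / (a @ a)
        return a, b                       # M is approximately outer(a, b)

    rng = np.random.default_rng(2)
    u = rng.normal(size=5)
    v = rng.normal(size=7)
    M = np.outer(u, v) + 1e-10 * rng.normal(size=(5, 7))
    a, b = rank_one_fit(M)
    print(np.max(np.abs(np.outer(a, b) - M)))   # of the order of the perturbation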
On Wed, Feb 3, 2021 at 7:24 PM, Henry Baker <hbaker1@pipeline.com> wrote:
Input: an nxn(*) matrix M of floating point numbers (positive, negative or zero) which is *supposed* to have rank exactly one, i.e., it is the outer product of two n-element *unknown* vectors (...)
When M[i,j] = U[i] V[j], i.e., M = U V' where U and V are column vectors / matrices and ' = transpose, then the rows of M are M[i,.] = U[i] V' and the columns of M are M[.,j] = V[j] U. So any row of M, or any (nonzero) linear combination of these, is a possible choice for V', and similarly for columns of M and U.

In general, the larger the numbers, the less important should be the influence of small rounding errors. So a valid choice could be to add up all of the rows, with signs flipped so that all the rows have the same "direction" (to avoid cancellations), to get a component-wise average, weighted by the respective norm, for V', and similarly for U.

I have a quite elegant and reasonably efficient solution, but I wanted to see if anyone else had the same idea.
Does the above go in the same direction as your idea, Henry? - Maximilian
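(One possible reading of the above as a numpy sketch — the per-row sign is taken from the column of largest overall magnitude, in the spirit of the later remark about looking at the largest component; details such as that choice are guesses, not the actual method:)

    import numpy as np

    def estimate_v(M):
        """Sum the rows of M with signs flipped so that they all point the same way.

        Each row is (approximately) U[i] * V', so aligning the signs and summing
        gives a norm-weighted average proportional to V'."""
        k = np.argmax(np.abs(M).sum(axis=0))   # column with the largest overall magnitude
        signs = np.sign(M[:, k])
        signs[signs == 0] = 1.0                # treat exact zeros as positive
        return (signs[:, None] * M).sum(axis=0)

    def estimate_u(M):
        return estimate_v(M.T)                 # same construction on the columns

    rng = np.random.default_rng(3)
    u = rng.normal(size=5)
    v = rng.normal(size=6)
    M = np.outer(u, v) + 1e-12 * rng.normal(size=(5, 6))
    U_est, V_est = estimate_u(M), estimate_v(M)
    P = np.outer(U_est, V_est)
    scale = (M * P).sum() / (P * P).sum()      # least-squares scale (U,V only fixed up to a factor)
    print(np.allclose(scale * P, M, atol=1e-9))   # True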
On Thu, Feb 4, 2021 at 2:02 PM M F Hasler <mhasler@dsi972.fr> wrote:
On Wed, Feb 3, 2021 at 7:24 PM, Henry Baker <hbaker1@pipeline.com> wrote:
Input: an nxn(*) matrix M of floating point numbers (positive, negative or zero) which is *supposed* to have rank exactly one, i.e., it is the outer product of two n-element *unknown* vectors (...)
When M[i,j] = U[i] V[j], i.e., M = U V' where U and V are column vectors / matrices and ' = transpose, then the rows of M are M[i,.] = U[i] V' and the columns of M are M[.,j] = V[j] U. So any row of M, or any (nonzero) linear combination of these, is a possible choice for V', and similar for columns of M and U. In general, the larger the numbers, the less important should be the influence of small rounding errors.
So a valid choice could be to add up all of the rows, with signs flipped so that all the rows have the same "direction" (to avoid cancellations), to get a component-wise average, weighted by the respective norm, for V', and similar for U.
The problem with this is how you define "same direction" in n-dimensional space. You can't just look at the signs; two vectors may have the same sign in one coordinate and different signs in a second coordinate. If you flip one so that they have the same second coordinate, they have different first coordinates.

A slightly more sophisticated approach would be to say that two vectors are in the "same direction" if their dot product is positive, so if their dot product is negative, flip one of them. But this doesn't work for more than two vectors, because the "same direction" relation is not transitive.

Andy
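(A tiny numeric illustration of the non-transitivity point, with arbitrary example vectors:)

    import numpy as np

    # "same direction" as positive dot product is not transitive:
    a = np.array([1.0, 0.0])
    b = np.array([1.0, 2.0])
    c = np.array([-1.0, 1.0])
    print(a @ b > 0, b @ c > 0, a @ c > 0)   # True True False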
I have a quite elegant and reasonably efficient solution,
but I wanted to see if anyone else had the same idea.
Does the above go in the same direction as your idea, Henry?
- Maximilian
-- Andy.Latto@pobox.com
On Thu, 4 Feb 2021, 15:50 Andy Latto, <andy.latto@pobox.com> wrote:
The problem with this is how you define "same direction" in n-dimensional space. You can't just look at the signs; two vectors may have the same sign in one coordinate and different signs in a second coordinate.
No they can't. I first had written a more detailed description of how to flip the sign, but I have discarded it in favour of this simpler wording, which is indeed unambiguous. Convince yourself that either all components have the same sign (as the respective component of the other vector) or all have the opposite signs of those of the other vector, when the two vectors are proportional (aka collinear).

But FWIW, you can (and should) look just at the sign of the component which has the largest absolute value. (For the components close to zero, accumulated rounding and similar errors could indeed give an incorrect sign, viz. –0 ≈ +0. But the sign of the largest components must be significant, unless the whole data has become random noise.)

- Maximilian

If you flip
one so that they have the same second coordinate, they have different first coordinates.
A slightly more sophisticated approach would be to say that two vectors are in the "same direction" if their dot product is positive, so if their dot product is negative, flip one of them.
But this doesn't work for more than two vectors, because the "same direction" relation is not transitive.
Andy
I have a quite elegant and reasonably efficient solution,
but I wanted to see if anyone else had the same idea.
Does the above go in the same direction of your idea, Henry ?
- Maximilian
participants (5)
- Andy Latto
- Dan Asimov
- Gareth McCaughan
- Henry Baker
- M F Hasler