[math-fun] least-squares derivation ?
On 10/4/2018 9:21 AM, Henry Baker wrote:

In common discussions of least squares, the parameters (m,b) are estimated for the equation y = m*x+b using as data various datapoints [x1,y1], [x2,y2], [x3,y3], etc.

For example, in Wikipedia (where m=beta2 and b=beta1):

https://en.wikipedia.org/wiki/Linear_least_squares#Example

So far, so good.

Now, if I merely exchange x and y, then my equation is x = m'*y+b', where m' should be 1/m and b' should be -b/m. (Let's ignore the case where the best m=0.)

However, if I then estimate (m',b') using the same least squares method, I don't get (1/m, -b/m)!

So either I'm doing something wrong, or perhaps there is a more symmetric least squares method that treats x and y symmetrically?

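To see the asymmetry concretely, here is a minimal numpy sketch (synthetic data, not from the thread): it fits y on x and then x on y with ordinary least squares and shows that the second slope is not the reciprocal of the first.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 10.0, 50)
    y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)   # noisy line, slope 2, intercept 1

    m, b = np.polyfit(x, y, 1)        # ordinary least squares: y regressed on x
    mp, bp = np.polyfit(y, x, 1)      # same method with the roles of x and y exchanged

    print("y on x:  m  =", m, "  b  =", b)
    print("x on y:  m' =", mp, "  b' =", bp)
    print("expected if symmetric: 1/m =", 1.0 / m, "  -b/m =", -b / m)
    # m' and 1/m disagree; they coincide only if the points lie exactly on a line.
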
On Oct 4, 2018, at 12:28 PM, Tom Duff <td@pixar.com> wrote:

Standard least squares measures the *vertical* distance to the line from each point and minimizes the sum of the squares. Interchanging x and y is equivalent to switching to measuring the *horizontal* distance. You can instead measure the perpendicular distance, which is invariant under interchanging x and y (or any other isometry, for that matter), but the solution is correspondingly hairier.

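For reference, a short sketch of the perpendicular-distance ("total least squares") computation, in my own notation (x-bar, y-bar are the means and S_xx, S_yy, S_xy the centered second moments); this is the standard derivation, not something quoted from the thread.

    % Minimize the sum of squared perpendicular distances to the line y = m x + b.
    \begin{align*}
      E(m,b) &= \sum_i \frac{(y_i - m x_i - b)^2}{1+m^2},
        \qquad \frac{\partial E}{\partial b} = 0 \;\Rightarrow\; b = \bar{y} - m\bar{x},\\
      E(m) &= \frac{S_{yy} - 2 m S_{xy} + m^2 S_{xx}}{1+m^2},
        \qquad \frac{dE}{dm} = 0 \;\Rightarrow\; S_{xy}\,m^2 + (S_{xx}-S_{yy})\,m - S_{xy} = 0,\\
      m &= \frac{S_{yy}-S_{xx} + \sqrt{(S_{yy}-S_{xx})^2 + 4 S_{xy}^2}}{2\,S_{xy}}.
    \end{align*}
    % The two roots of the quadratic multiply to -1 (the two critical lines are
    % perpendicular), and swapping x and y swaps S_xx and S_yy, which turns the
    % minimizing root m into 1/m; so this fit treats x and y symmetrically.
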
On 10/4/2018 9:41 AM, Lucas, Stephen K - lucassk wrote:

And let's not forget that there are good statistical reasons for minimizing the sums of squares of vertical distances, since this gives you the best estimate if y is actually a linear function of x plus Gaussian noise. If you assumed the noise was in the x data, not the y, then minimizing horizontal distance would be appropriate, but almost always it is the y variable that is considered noisy, while x is known exactly.

Finding the line of best fit that minimizes perpendicular distance is indeed hairier, and is something that is considered when fitting curves to data (higher order polynomials or circle arcs, for example). This is moving away from the original statistical reason for why the linear line of best fit is so useful, but sometimes you just want to fit a curve. However, I vaguely remember reading somewhere that minimizing perpendicular distance tends to give you best-fit curves that underestimate the curvature, and more sophisticated methods are required for arbitrary curves. Unfortunately, I can't find the reference at this time.

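A quick numerical illustration of the point about where the noise lives (my own sketch, synthetic data): when the noise is actually in x, regressing y on x systematically flattens the slope, the classical attenuation effect.

    import numpy as np

    rng = np.random.default_rng(1)
    true_m, true_b = 2.0, 1.0
    x_true = np.linspace(0.0, 10.0, 200)
    y = true_m * x_true + true_b                                # exact y
    x_obs = x_true + rng.normal(scale=1.5, size=x_true.size)    # noise in x only

    m_fit, b_fit = np.polyfit(x_obs, y, 1)                      # OLS of y on the noisy x
    shrink = np.var(x_true) / (np.var(x_true) + 1.5 ** 2)       # expected attenuation factor
    print("true slope:", true_m, " fitted slope:", m_fit, " roughly:", true_m * shrink)
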
If you really want to do it right, you need to weight the error in the x-direction and the error in the y-direction inversely as the precision of the data point in those directions. There's a book by Wolberg, "Prediction Analysis", which goes into all this in detail.

Brent

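For the ordinary (vertical-residual) case, per-point weighting is easy to do; a minimal sketch assuming each point comes with its own standard deviation sigma_i in y (numpy's polyfit takes weights, and the usual choice for Gaussian errors is w_i = 1/sigma_i, since the weights multiply the residuals before squaring).

    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(0.0, 10.0, 40)
    sigma = rng.uniform(0.5, 3.0, size=x.size)     # per-point uncertainty in y
    y = 2.0 * x + 1.0 + rng.normal(scale=sigma)

    m_w, b_w = np.polyfit(x, y, 1, w=1.0 / sigma)  # weighted: minimizes sum((w*(y - m*x - b))**2)
    m_u, b_u = np.polyfit(x, y, 1)                 # unweighted, for comparison
    print("weighted:  ", m_w, b_w)
    print("unweighted:", m_u, b_u)
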
On Thursday, October 4, 2018, 12:19:20 PM PDT, Brent Meeker <meekerdb@verizon.net> wrote:

The difference is that the error is measured in the y-direction in one case and in the x-direction in the other. If you do a least-squares fit using the distance from the data point to the fitting line, then you get the same fit when you interchange x and y.

Brent

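To back up the symmetry claim numerically, here is a small sketch (mine, with synthetic data) that fits the perpendicular-distance line via the principal direction of the centered data and checks that swapping x and y gives the reciprocal slope and the correspondingly transformed intercept.

    import numpy as np

    def tls_line(x, y):
        # Perpendicular-distance ("total least squares") fit of y = m*x + b.
        xm, ym = x.mean(), y.mean()
        A = np.column_stack([x - xm, y - ym])
        _, _, vt = np.linalg.svd(A, full_matrices=False)
        dx, dy = vt[0]                 # direction of largest spread = direction of the fitted line
        m = dy / dx
        return m, ym - m * xm          # the line passes through the centroid

    rng = np.random.default_rng(3)
    x = np.linspace(0.0, 10.0, 60)
    y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)
    x = x + rng.normal(scale=1.0, size=x.size)     # noise in both coordinates

    m, b = tls_line(x, y)
    mp, bp = tls_line(y, x)
    print("m =", m, "  1/m'   =", 1.0 / mp)        # these agree, unlike the two OLS fits
    print("b =", b, "  -b'/m' =", -bp / mp)
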
On 10/4/2018 12:29 PM, Eugene Salamin via math-fun wrote:

But the distance from data point to fitting line is not invariant under a change of units, say inches to centimeters. Furthermore, the abscissa might be in apples, while the ordinate is in oranges. Then what's the distance?

-- Gene

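Gene's objection is easy to see numerically. The sketch below (mine, with synthetic data and a hypothetical inches-to-centimeters conversion) uses the closed-form perpendicular-distance slope from the derivation sketched above: rescaling x changes the fitted line, whereas ordinary least squares simply rescales along with the data.

    import numpy as np

    def tls_slope(x, y):
        u, v = x - x.mean(), y - y.mean()
        sxx, syy, sxy = u @ u, v @ v, u @ v
        return (syy - sxx + np.hypot(syy - sxx, 2.0 * sxy)) / (2.0 * sxy)

    rng = np.random.default_rng(4)
    x = np.linspace(1.0, 10.0, 50) + rng.normal(scale=0.8, size=50)
    y = 2.0 * x + 1.0 + rng.normal(scale=0.8, size=50)

    print(tls_slope(x, y), 2.54 * tls_slope(2.54 * x, y))                 # differ: unit-dependent
    print(np.polyfit(x, y, 1)[0], 2.54 * np.polyfit(2.54 * x, y, 1)[0])   # equal: OLS is equivariant
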
On Thursday, October 4, 2018, 12:50:05 PM PDT, Tomas Rokicki <rokicki@gmail.com> wrote:

Yep, scaling is critical. One trick used frequently is to scale all dimensions so the variance is one.

-tom
--
http://cube20.org/
http://golly.sf.net/

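For what it's worth, scaling both coordinates to unit variance and then doing the perpendicular fit has a simple closed form back in the original units: the slope is sign(S_xy) * s_y / s_x, sometimes called reduced major axis or geometric-mean regression. A small sketch of that, mine, with synthetic data:

    import numpy as np

    rng = np.random.default_rng(5)
    x = np.linspace(0.0, 10.0, 80) + rng.normal(scale=0.7, size=80)
    y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=80)

    r = np.corrcoef(x, y)[0, 1]
    m = np.sign(r) * y.std() / x.std()      # slope after standardizing, mapped back to original units
    b = y.mean() - m * x.mean()
    print("standardized (RMA) slope:", m, " intercept:", b)

    # It equals the geometric mean of the two OLS slopes (y on x, and 1 over x on y):
    m_yx = np.polyfit(x, y, 1)[0]
    m_xy = np.polyfit(y, x, 1)[0]
    print("geometric mean of OLS slopes:", np.sign(r) * np.sqrt(m_yx / m_xy))
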
But what if the variances differ from point to point, or each point has its own (x,y) correlation matrix? In standard least squares, one simply weights each point by 1/variance.

-- Gene

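One common way to fold per-point uncertainties in both coordinates into the fit, a sketch only and not necessarily what Wolberg's book does, is the "effective variance" chi-square: each vertical residual is divided by sy_i^2 + (m*sx_i)^2 and the sum is minimized numerically.

    import numpy as np
    from scipy.optimize import minimize

    def chi2(params, x, y, sx, sy):
        m, b = params
        # For independent Gaussian errors in x and y, the variance of y - (m*x + b)
        # is sy**2 + (m*sx)**2, the "effective variance".
        return np.sum((y - m * x - b) ** 2 / (sy ** 2 + (m * sx) ** 2))

    rng = np.random.default_rng(6)
    n = 50
    x_true = np.linspace(0.0, 10.0, n)
    sx = np.full(n, 0.5)
    sy = rng.uniform(0.5, 2.0, n)
    x = x_true + rng.normal(scale=sx)
    y = 2.0 * x_true + 1.0 + rng.normal(scale=sy)

    start = np.polyfit(x, y, 1)                    # OLS as the starting guess
    res = minimize(chi2, start, args=(x, y, sx, sy), method="Nelder-Mead")
    print("m, b =", res.x)
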
That's why you need to make the distances dimensionless, by dividing the differences by the standard deviation of the data points in each direction.

Brent

On 10/4/2018 5:47 PM, Henry Baker wrote:

I've been trying to improve the calculations of a 1906 -- yes, 112 years ago -- paper, where the author (Harvard Electrical Engineering Professor Arthur Edwin Kennelly) "eyeballed" the best-fitting line.

The "ordinary least squares" (OLS) method produces a slope of 1.0146, while my interpretation of the "geometric" method (http://www.mathpages.com/home/kmath110.htm) produces a slope of 1.0192. The author eyeballed a slope of 1.125, which is pretty far off the mark, although, to be somewhat fair, he was erroneously attributing the *same* slope to 3 distinct data sets:

  dataset #1: OLS 1.0146, my geometric calculation 1.0192
  dataset #2: OLS 1.1369, my geometric calculation 1.1702
  dataset #3: OLS 1.2072, my geometric calculation 1.1720

Notice how the "geometric" calculation produces much closer agreement between datasets #2 and #3; my only conclusion is that dataset #1 has more substantial errors, or more significant differences from the other two, than can be handled by this data analysis.

I "standardized" the X and Y values to have similar ranges and variances, so that modelling a Euclidean rigid rotation of the (x,y) data to obtain the best-fitting slope is somewhat "reasonable".

As you can see from the results above, both the OLS and the "geometric" methods are considerably more accurate than eyeballing; this is particularly important, since the slope being computed appears as an *exponent* in the process model.

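Since the slope here is the exponent of a power law read off a log-log plot, a minimal sketch of that kind of fit (synthetic data, not Kennelly's): fit log y against log x, and the slope is the exponent.

    import numpy as np

    rng = np.random.default_rng(7)
    x = np.linspace(1.0, 100.0, 60)
    y = 3.0 * x ** 1.02 * np.exp(rng.normal(scale=0.05, size=x.size))   # y = A * x^k, log-normal noise

    k, logA = np.polyfit(np.log(x), np.log(y), 1)    # OLS on the log-log data minimizes log-errors
    print("exponent k =", k, " prefactor A =", np.exp(logA))
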
So this is a linear fit to data on a log-log plot. You are minimizing the log-errors, I hope. While worrying about the best fit, you might do a bootstrap estimate of the error in the fitted parameters, to see how much precision you can expect.

Brent

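A minimal sketch of that bootstrap (mine, again with synthetic log-log data): resample the (x, y) pairs with replacement, refit each resample, and read the spread of the fitted slopes as the uncertainty.

    import numpy as np

    rng = np.random.default_rng(8)
    x = np.linspace(1.0, 100.0, 60)
    y = 3.0 * x ** 1.02 * np.exp(rng.normal(scale=0.05, size=x.size))
    lx, ly = np.log(x), np.log(y)

    slopes = []
    for _ in range(2000):
        idx = rng.integers(0, x.size, size=x.size)        # resample pairs with replacement
        slopes.append(np.polyfit(lx[idx], ly[idx], 1)[0])
    print("slope =", np.polyfit(lx, ly, 1)[0], "+/-", np.std(slopes))
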
On 10/4/2018 1:06 PM, Henry Baker wrote:

Upon reflection(!!), it occurred to me that the "ordinary" least squares solution is a pretty decent approximation to the "correct"/symmetrical slope, so one could simply *rotate* the entire x-y plane about the centroid of the data points until this slope is *zero*, and then run the "ordinary" least squares algorithm again. Since I believe that this iteration achieves cubic(?) convergence, at most 2-3 iterations should be required for any reasonable problem.

However, the following method requires exactly 1 iteration, and hence should be considerably faster for large numbers of points:

http://www.mathpages.com/home/kmath110.htm

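A sketch of that rotate-and-refit iteration as I read it (my own code; the cubic-convergence claim is not tested here): rotate the centered data so the current OLS slope goes to zero, refit, and accumulate the rotation angle. Its fixed point is a principal axis of the data, i.e. the perpendicular-distance fit.

    import numpy as np

    def rotated_fit(x, y, iters=5):
        xm, ym = x.mean(), y.mean()
        u, v = x - xm, y - ym
        theta = 0.0
        for _ in range(iters):
            a = np.arctan(np.polyfit(u, v, 1)[0])   # angle of the current OLS slope
            theta += a
            c, s = np.cos(-a), np.sin(-a)           # rotate so the slope becomes ~zero
            u, v = c * u - s * v, s * u + c * v
        m = np.tan(theta)                           # slope of the fitted line in the original frame
        return m, ym - m * xm                       # line through the centroid

    rng = np.random.default_rng(9)
    x = np.linspace(0.0, 10.0, 60) + rng.normal(scale=0.8, size=60)
    y = 2.0 * x + 1.0 + rng.normal(scale=0.8, size=60)
    print(rotated_fit(x, y))                        # matches the perpendicular-distance fit
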
These methods talk about "the variance of the x and y variables" as though every data point has the same uncertainty. In most cases you have estimates of the uncertainty in different data points; some may even have repeated measurements at the same nominal x value. A good fit needs to weigh the points differently.

Brent

participants (6)

- Brent Meeker
- Eugene Salamin
- Henry Baker
- Lucas, Stephen K - lucassk
- Tom Duff
- Tomas Rokicki