[math-fun] Curve-fitting methods?
The following xkcd comic (#2048) is funny, but it brings up a good point: are there any good methods for deciding when a particular curve fits a particular set of data points?
Should statistics & spreadsheet programs utilize such a method to warn users that the fit they've chosen isn't very good?
I've thought of using color along the curve to mark the portions where the fit is good (perhaps "green") and those where it isn't so good (perhaps "red"). A *spline* curve could also indicate its tension by means of color, for example. (Of course, none of this is going to help those who are color blind!)
https://xkcd.com/2048/
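Henry's coloring idea could be made concrete in several ways. The sketch below is one hypothetical version, not a feature of any existing program. It assumes a straight-line model, sorted x values, and a known measurement error `sigma` that is the same for every point (all assumptions). It colors the curve near each point by whether the residuals in that neighbourhood average out to zero within their standard error. The helper names `fit_line` and `color_along_fit`, the window of 5 points and the threshold of 2 are illustrative choices.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def color_along_fit(xs, ys, sigma, window=5, z_max=2.0):
    """Label each data point 'green' or 'red' by the local misfit of a line.

    Assumes xs is sorted and every y has the same known error sigma.
    The mean residual of up to `window` neighbouring points is compared
    with its standard error sigma/sqrt(k); a run of same-sign residuals
    (systematic misfit) pushes |z| above z_max and turns that stretch red.
    """
    a, b = fit_line(xs, ys)
    res = [y - (a + b * x) for x, y in zip(xs, ys)]
    half = window // 2
    colors = []
    for i in range(len(res)):
        local = res[max(0, i - half): i + half + 1]
        k = len(local)
        z = (sum(local) / k) / (sigma / math.sqrt(k))
        colors.append("green" if abs(z) <= z_max else "red")
    return colors
```

Fitting a line to the points of y = x² for x = 0, ..., 10 with `sigma=1.0` marks every point red, because the residuals come in long same-sign runs. Exactly linear data comes out all green. A plotting program could then draw the stretch of curve near each point in that point's color.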
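If the physics of the problem leads one to suspect a power law (or exponential), it is tempting to fit a straight line on a log-log (or log-linear) plot in Excel. This does not weight all points (and their uncertainties) properly. Least squares on log y approximately minimizes the *relative* residuals (y - f)/f rather than the absolute ones, so when the errors are of roughly constant absolute size, the small values of y get too much weight. Definitely needs a warning.
--R
[In reply to Henry Baker <hbaker1@pipeline.com>, Sun, Sep 23, 2018 at 10:34 AM]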
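Here is a minimal sketch of the weighting issue Richard describes. It assumes a power law y = c·x^p with positive data and, optionally, known absolute errors `sigmas`. The names `weighted_line_fit` and `fit_power_law` are hypothetical.

Without `sigmas`, every point counts equally in log space, which is what a straight-line fit on the log-log plot does. With `sigmas`, first-order error propagation gives Var(ln y_i) ≈ (σ_i/y_i)², so each point is weighted by (y_i/σ_i)². This is still an approximation. A nonlinear least-squares fit in the original coordinates (for example `scipy.optimize.curve_fit` with its `sigma` argument) is the more direct alternative.

```python
import math


def weighted_line_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x: minimizes sum w_i*(y_i - a - b*x_i)**2."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_power_law(xs, ys, sigmas=None):
    """Fit y = c * x**p by a straight line through (ln x, ln y); returns (c, p).

    sigmas=None: every point gets equal weight in log space (the plain
    log-log fit). With absolute errors sigmas, each point is weighted by
    (y_i / sigma_i)**2, the first-order inverse variance of ln y_i
    (using the observed y_i, a further approximation).
    All xs and ys must be positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    if sigmas is None:
        ws = [1.0] * len(xs)
    else:
        ws = [(y / s) ** 2 for y, s in zip(ys, sigmas)]
    log_c, p = weighted_line_fit(lx, ly, ws)
    return math.exp(log_c), p
```

With exact data y = 3x² at x = 1, ..., 5, both versions return c = 3, p = 2. Suppose instead the first point is off by one unit (y = 4 instead of 3), with σ = 1 for every point. The unweighted log-log exponent then drops to about 1.83, while the weighted one stays near 1.98.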
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com https://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
Two good general methods are:
1. Perturbing the data slightly (since all real data is noisy) and seeing whether this changes your curve fit wildly. If the parameters of the fit are very sensitive to the data, you shouldn't trust it.
2. Cross-validation: hide some of the data points, fit the curve to the rest, and then see how well your fit recovers the ones you hid.
- Cris
[In reply to Henry Baker <hbaker1@pipeline.com>, Sep 23, 2018, 8:32 AM]
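Cris's two checks can be sketched in a few lines of plain Python. Several parts of this are illustrative assumptions rather than a standard API:
- The model family is polynomials of a chosen degree, fitted by least squares via the normal equations. That is fine for low degrees but numerically fragile for high ones.
- Method 1 jitters each y by Gaussian noise of size `noise`.
- Method 2 uses leave-one-out, hiding one point at a time, as the particular form of cross-validation.

The names `polyfit`, `polyval`, `perturbation_spread` and `loo_cv_error` are hypothetical.

```python
import random


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    m = degree + 1
    # Augmented normal-equation matrix [X^T X | X^T y].
    a = [[sum(x ** (j + k) for x in xs) for k in range(m)]
         + [sum(y * x ** j for x, y in zip(xs, ys))] for j in range(m)]
    for col in range(m):  # Gaussian elimination with partial pivoting
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m + 1):
                a[r][c] -= f * a[col][c]
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        coeffs[r] = (a[r][m] - sum(a[r][c] * coeffs[c] for c in range(r + 1, m))) / a[r][r]
    return coeffs


def polyval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ..."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def perturbation_spread(xs, ys, degree, noise, trials=200, seed=0):
    """Method 1: refit after jittering each y; return the std. dev. of each coefficient."""
    rng = random.Random(seed)
    fits = [polyfit(xs, [y + rng.gauss(0.0, noise) for y in ys], degree)
            for _ in range(trials)]
    spreads = []
    for i in range(degree + 1):
        vals = [f[i] for f in fits]
        mean = sum(vals) / trials
        spreads.append((sum((v - mean) ** 2 for v in vals) / (trials - 1)) ** 0.5)
    return spreads


def loo_cv_error(xs, ys, degree):
    """Method 2: leave-one-out cross-validation; mean squared error on held-out points."""
    total = 0.0
    for i in range(len(xs)):
        coeffs = polyfit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], degree)
        total += (ys[i] - polyval(coeffs, xs[i])) ** 2
    return total / len(xs)
```

For the points of y = x² at x = 0, ..., 9, the leave-one-out error is essentially zero for a quadratic but large for a straight line. Likewise, when nearly linear data is jittered, the linear coefficient of a degree-5 fit wanders far more than the slope of a straight-line fit. That is the kind of instability method 1 is meant to expose.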
And you can go even further than method 2: in case you don't have any good theory of the scatter, you can use bootstrap estimates of the scatter in the fit parameters.
Brent
[In reply to Cris Moore, 9/23/2018 8:58 AM]
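Here is a minimal sketch of the bootstrap Brent mentions, in the "pairs" (case-resampling) form. It resamples the (x, y) points with replacement, refits, and reads the scatter of the fitted slope off the resulting distribution, so no model of the noise is required. The straight-line model, the 2,000 resamples and the 95% percentile interval are assumptions for illustration. `fit_line` and `bootstrap_slope` are hypothetical names. Resampling residuals instead of points is a common alternative.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_slope(xs, ys, n_boot=2000, seed=0):
    """Pairs bootstrap: return (standard error, (2.5%, 97.5%) percentiles) of the slope."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample points with replacement
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # all x equal: slope undefined, draw again
        slopes.append(fit_line(bx, [ys[i] for i in idx])[1])
    slopes.sort()
    mean = sum(slopes) / n_boot
    se = (sum((s - mean) ** 2 for s in slopes) / (n_boot - 1)) ** 0.5
    return se, (slopes[int(0.025 * n_boot)], slopes[int(0.975 * n_boot) - 1])
```

On exactly linear data every resample gives the same slope, so the reported standard error is zero. On noisy data it reports a nonzero spread and a proper interval.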
Yes! I finally used the bootstrap myself in a recent computational physics paper. Very cool. I learned about it from Cosma Shalizi here: http://amsci.sigmaxi.org/shuttle.php?dest=node/2798
- Cris
[In reply to Brent Meeker <meekerdb@verizon.net>, Sep 23, 2018, 9:08 PM]
It depends somewhat on the purpose of the curve. Is it just going to be used as the best guess within the range of points? Or is it going to be used to estimate a parameter that will be used in further predictions? And what counts as a "good fit" depends on what the curve represents. If it's a fairly sharply defined relation, then a lot of scatter means there's noise and the measurement is poor. But if the analysis is just exploratory and you don't even know whether there's a relation, then a lot of scatter may mean there just isn't any relation. You know you have a good fit when you have a relation based on a causal theory, the theory includes estimates of the uncertainty, and your data is consistent with both. Note that this means the data can have *too little scatter* to be consistent with the theory: a familiar occurrence in freshman physics labs.
Brent
[In reply to Henry Baker, 9/23/2018 7:32 AM]
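Brent's criterion (data consistent with a theory that includes its own uncertainty estimates) is often checked with a chi-square statistic, which flags both too much and too little scatter. The sketch below makes a few assumptions:
- The model values and per-point uncertainties `sigmas` are given.
- `n_params` parameters were fitted.
- Chi-square tail probabilities are approximated with the Wilson-Hilferty cube-root transformation, so that only the standard library is needed.

`chi_square_check` and the two-sided 5% level are illustrative choices.

```python
import math
from statistics import NormalDist


def chi_square_check(ys, model_ys, sigmas, n_params, alpha=0.05):
    """Return (chi2, dof, verdict) comparing the scatter with the stated uncertainties.

    Tail probabilities use the Wilson-Hilferty approximation: (chi2/dof)**(1/3)
    is roughly normal with mean 1 - 2/(9*dof) and variance 2/(9*dof).
    """
    chi2 = sum(((y - m) / s) ** 2 for y, m, s in zip(ys, model_ys, sigmas))
    dof = len(ys) - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    var = 2 / (9 * dof)
    z = ((chi2 / dof) ** (1 / 3) - (1 - var)) / math.sqrt(var)
    p_low = NormalDist().cdf(z)   # chance of scatter this small or smaller
    p_high = 1.0 - p_low          # chance of scatter this large or larger
    if p_high < alpha / 2:
        verdict = "too much scatter"
    elif p_low < alpha / 2:
        verdict = "too little scatter"
    else:
        verdict = "consistent"
    return chi2, dof, verdict
```

Take 20 points, 2 fitted parameters and unit uncertainties:
- Residuals of exactly ±1 come out "consistent".
- Residuals of ±3 give "too much scatter".
- Residuals of ±0.1 give "too little scatter", the freshman-lab situation.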
participants (4)
- Brent Meeker
- Cris Moore
- Henry Baker
- Richard Howard