Re: [math-fun] Curve-fitting methods ?
At 08:24 AM 9/23/2018, Richard Howard wrote:
If the physics of the problem leads one to suspect a power law (or exponential), it is tempting to fit a straight line on a log-log (or log-linear) plot in Excel.
This does not weight all points (and their uncertainties) properly.
I can imagine that the error bounds on measurement errors don't scale properly after taking logs, but if the measurement errors are small compared with the underlying process variation (assuming that it is, indeed, a power law), then the weighting shouldn't be that far off. Or is there some other issue that I'm missing here?
Least-squares fits on log-log plots often lead to the wrong exponent for power laws, since the data points out in the tail are noisier. One can also use a maximum-likelihood approach: see https://arxiv.org/abs/0706.1062
- Cris
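For reference, the maximum-likelihood estimate given in the linked paper for a continuous power law p(x) ~ x^(-alpha), x >= xmin, is alpha_hat = 1 + n / sum(ln(x_i / xmin)), with standard error of roughly (alpha_hat - 1) / sqrt(n). What follows is only a minimal Python sketch, assuming xmin is known in advance (the paper also estimates xmin from the data, which is left out here). The function names and the synthetic test data are made up for illustration.

```python
import math
import random

def sample_power_law(n, alpha, xmin, rng):
    """Draw n samples from p(x) ~ x**(-alpha), x >= xmin (inverse-transform sampling)."""
    # 1 - rng.random() lies in (0, 1], so the power is always well defined.
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def mle_exponent(xs, xmin):
    """Continuous power-law MLE with xmin assumed known; returns (alpha_hat, std_err)."""
    tail = [x for x in xs if x >= xmin]
    n = len(tail)
    alpha_hat = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    std_err = (alpha_hat - 1.0) / math.sqrt(n)  # leading-order standard error
    return alpha_hat, std_err

# Hypothetical test data: 20000 draws with true exponent 2.5.
rng = random.Random(0)
xs = sample_power_law(20000, alpha=2.5, xmin=1.0, rng=rng)
alpha_hat, std_err = mle_exponent(xs, xmin=1.0)
print(f"alpha_hat = {alpha_hat:.3f} +/- {std_err:.3f}")
```

No binning and no log-log regression are involved, so the noisy tail points do not get undue weight.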
On Sep 23, 2018, at 12:08 PM, Henry Baker <hbaker1@pipeline.com> wrote:
I can imagine that the error bounds on measurement errors don't scale properly after taking logs, but if the measurement errors are small compared with the underlying process variation (assuming that it is, indeed, a power law), then the weighting shouldn't be that far off.
Or is there some other issue that I'm missing here?
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com https://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
This article looks spot-on. Thanks for the link, Cris!!
At 11:40 AM 9/23/2018, Cris Moore wrote:
least-squares fits on log-log plots often lead to the wrong exponent for power laws, since the data points out in the tail are noisier. one can also use a maximum-likelihood approach: see https://arxiv.org/abs/0706.1062
- Cris
Very nice paper!
The authors take 24 published datasets that claim to represent "power laws" and re-examine them using a consistent, sophisticated protocol. They also generate *synthetic* data sets with the *same statistics* to see how hard it is to distinguish these power-law data from other standard probabilistic processes. (In this sense, they're aping cryptanalysis, where one is typically trying to distinguish a given dataset from a uniformly random dataset.)
It would be interesting to apply some of these more sophisticated tests to computer cache performance data (none of the 24 examples in the paper were based on caching data), but I'm not the person to do this.
Many of the techniques in this paper aren't relevant to non-frequency-based, non-fat-tail datasets -- e.g., force v. velocity data in fluid flow, where we also have power laws -- just not probability-of-occurrence power laws.
One thing this paper does seem to recommend: estimate your model & parameters, then generate synthetic data from that model with those parameters. If the resulting dataset doesn't look at all like your original dataset, you're barking up the wrong tree!
Yes, this is the kind of posterior model checking that we often forget to do.
- Cris
On Sep 23, 2018, at 6:06 PM, Henry Baker <hbaker1@pipeline.com> wrote:
One thing this paper does seem to recommend: estimate your model & parameters, then generate synthetic data from that model with those parameters. If the resulting dataset doesn't look at all like your original dataset, you're barking up the wrong tree!
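A minimal Python sketch of the check described above, in the spirit of the paper's goodness-of-fit test: fit the exponent, measure the Kolmogorov-Smirnov distance between the data and the fitted model, then refit many synthetic datasets drawn from the fitted model and see how often they look at least as bad. This is a simplified illustration, assuming xmin is fixed and known (the paper re-estimates it for every synthetic dataset). All function names and both test datasets are hypothetical.

```python
import math
import random

def sample_power_law(n, alpha, xmin, rng):
    """Draw n samples from p(x) ~ x**(-alpha), x >= xmin."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def mle_exponent(xs, xmin):
    """Continuous power-law MLE of the exponent, xmin assumed known."""
    return 1.0 + len(xs) / sum(math.log(x / xmin) for x in xs)

def ks_distance(xs, alpha, xmin):
    """Largest gap between the empirical CDF and the model CDF 1 - (x/xmin)**(1 - alpha)."""
    ordered = sorted(xs)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, abs(model - i / n), abs(model - (i + 1) / n))
    return gap

def synthetic_check(xs, xmin, n_sims, rng):
    """Return (alpha_hat, p), p = fraction of synthetic datasets fitting no better than xs."""
    alpha_hat = mle_exponent(xs, xmin)
    d_obs = ks_distance(xs, alpha_hat, xmin)
    worse = 0
    for _ in range(n_sims):
        fake = sample_power_law(len(xs), alpha_hat, xmin, rng)
        if ks_distance(fake, mle_exponent(fake, xmin), xmin) >= d_obs:
            worse += 1
    return alpha_hat, worse / n_sims

rng = random.Random(1)
power_data = sample_power_law(1000, 2.5, 1.0, rng)            # genuinely power-law
expo_data = [1.0 + rng.expovariate(1.0) for _ in range(1000)]  # shifted exponential

alpha_pl, p_pl = synthetic_check(power_data, 1.0, 200, rng)
alpha_ex, p_ex = synthetic_check(expo_data, 1.0, 200, rng)
print(f"power-law data:   alpha_hat = {alpha_pl:.2f}, p = {p_pl:.2f}")
print(f"exponential data: alpha_hat = {alpha_ex:.2f}, p = {p_ex:.2f}")
```

A small p means the synthetic datasets almost never look as bad as the real one, i.e. the fitted power law is the wrong tree. The paper suggests rejecting the power-law hypothesis when p falls at or below 0.1.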
Power laws commonly occur as sampling estimates of distributions. This means the inherent errors are on the order of the square root of the count, so small counts have proportionately large errors. On a log-log plot the error bars are big near the bottom and small near the top. The fitting algorithm needs to weight each point inversely to its error. It also matters whether you are doing a maximum-likelihood fit, a least-squares fit, or something else. That's why it is best to have a theory of the measurement scatter as well as of the underlying relation.
Brent
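To make the weighting point concrete: if a count N has an error of about sqrt(N), then to first order the error on ln N is sqrt(N)/N = 1/sqrt(N), so an inverse-variance weight for that point on a log-log plot is simply N. The Python sketch below compares an unweighted and a count-weighted straight-line fit on synthetic counts. It assumes Poisson-like counting noise and first-order error propagation, and the helper name, the value range 1..30, and the sample size are all made up for illustration.

```python
import math
import random
from collections import Counter

def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares line y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b

# Synthetic counts: 20000 draws from values 1..30 with probability ~ v**-2.
rng = random.Random(2)
values = list(range(1, 31))
draws = rng.choices(values, weights=[v ** -2.0 for v in values], k=20000)
counts = Counter(draws)

# Empty bins have no logarithm, so they are skipped here (itself a source of bias).
kept = [v for v in values if counts[v] > 0]
log_v = [math.log(v) for v in kept]
log_n = [math.log(counts[v]) for v in kept]

# Unweighted fit: every point counts equally, noisy tail included.
_, slope_plain = weighted_line_fit(log_v, log_n, [1.0] * len(kept))
# Weighted fit: weight = 1 / var(ln N) ~ N under the counting-noise assumption.
_, slope_weighted = weighted_line_fit(log_v, log_n, [float(counts[v]) for v in kept])

print(f"unweighted slope: {slope_plain:.3f}")
print(f"weighted slope:   {slope_weighted:.3f}")
```

Both slopes estimate the true value of -2 here; the weighted fit leans on the well-measured points at the top of the plot instead of the scattered ones at the bottom. This is still a least-squares fit on binned counts, not the maximum-likelihood approach from the linked paper.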
participants (3): Brent Meeker, Cris Moore, Henry Baker