Very nice paper! The authors take 24 published datasets that claim to represent "power laws" and re-examine them using a consistent, sophisticated protocol. They also generate *synthetic* datasets with the *same statistics* to see how hard it is to distinguish the power-law data from other standard probabilistic processes. (In this sense, they're aping cryptanalysis, where one typically tries to distinguish a given dataset from a uniformly random one.)

It would be interesting to apply some of these more sophisticated tests to computer cache performance data (none of the 24 examples in the paper were based on caching data), but I'm not the person to do this. Many of the techniques in this paper aren't relevant to non-frequency-based, non-fat-tail datasets -- e.g., force v. velocity data in fluid flow, where we also have power laws, just not probability-of-occurrence power laws.

One thing this paper does seem to recommend: estimate your model and parameters, then generate synthetic data from that model with those parameters. If the resulting dataset doesn't look at all like your original dataset, you're barking up the wrong tree!
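A minimal sketch of that fit-then-simulate check, assuming continuous data with a power-law tail above a known x_min. It uses only numpy/scipy; the full Clauset-Shalizi-Newman recipe also estimates x_min from the data and bootstraps a proper p-value, and the "dataset" below is itself synthetic, purely for illustration.

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

def fit_alpha(x, x_min):
    # Continuous MLE exponent: alpha = 1 + n / sum(ln(x / x_min))
    x = x[x >= x_min]
    return 1.0 + len(x) / np.sum(np.log(x / x_min))

def sample_power_law(alpha, x_min, n, rng):
    # Inverse-CDF sampling from p(x) ~ x^(-alpha) for x >= x_min
    u = rng.random(n)
    return x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))

# Stand-in for your empirical dataset (synthetic here, so the check passes).
x_min = 1.0
data = sample_power_law(2.5, x_min, 5000, rng)

alpha_hat = fit_alpha(data, x_min)
synthetic = sample_power_law(alpha_hat, x_min, len(data), rng)

# If the fitted model is plausible, real and synthetic samples should be
# statistically indistinguishable; a tiny p-value means the wrong tree.
stat, p = ks_2samp(data, synthetic)
print(f"alpha_hat = {alpha_hat:.3f}, KS = {stat:.4f}, p = {p:.3f}")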
At 11:40 AM 9/23/2018, Cris Moore wrote:

least-squares fits on log-log plots often lead to the wrong exponent for power laws, since the data points out in the tail are noisier. one can also use a maximum-likelihood approach: see https://arxiv.org/abs/0706.1062
- Cris
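A minimal sketch of the maximum-likelihood estimator from that paper, next to the naive least-squares fit to a log-log histogram; the sample, x_min = 1, and the binning choices are all illustrative.

import numpy as np

rng = np.random.default_rng(1)
alpha_true, x_min, n = 2.5, 1.0, 10000
x = x_min * (1.0 - rng.random(n)) ** (-1.0 / (alpha_true - 1.0))

# MLE from the paper: alpha = 1 + n / sum(ln(x / x_min))
alpha_mle = 1.0 + n / np.sum(np.log(x / x_min))

# Naive approach: histogram the data, then fit log(count) vs log(x).
counts, edges = np.histogram(x, bins=50)
centers = 0.5 * (edges[:-1] + edges[1:])
mask = counts > 0                     # empty bins have no logarithm
slope, _ = np.polyfit(np.log(centers[mask]), np.log(counts[mask]), 1)
alpha_ls = -slope

print(f"true {alpha_true}, MLE {alpha_mle:.3f}, log-log LS {alpha_ls:.3f}")

The MLE typically lands close to 2.5, while the regression estimate gets pulled around by the sparse, noisy bins in the tail -- exactly the problem Cris describes.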
On Sep 23, 2018, at 12:08 PM, Henry Baker <hbaker1@pipeline.com> wrote:

At 08:24 AM 9/23/2018, Richard Howard wrote:
If the physics of the problem leads one to suspect a power law (or exponential), it is tempting to fit a straight line on a log-log (or log-linear) plot in Excel.
This does not weight all points (and their uncertainties) properly.
I can imagine that the error bars on the measurements don't scale properly after taking logs, but if the measurement errors are small compared with the underlying process variation (assuming that it is, indeed, a power law), then the weighting shouldn't be that far off.
Or is there some other issue that I'm missing here?
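One way to put numbers on that: by the delta method, a constant absolute error sigma on a measured value y turns into sd[log y] ~ sigma / y after taking logs, so the small-y points out in the tail end up with the largest error bars on a log plot. A quick numeric check of that scaling (the constant-sigma setup is hypothetical, purely for illustration):

import numpy as np

rng = np.random.default_rng(2)
sigma = 1.0  # same absolute measurement error on every point
for y in (1000.0, 100.0, 10.0, 2.0):
    noisy = y + sigma * rng.standard_normal(100_000)
    noisy = noisy[noisy > 0]          # logs need positive values
    print(f"y = {y:6.0f}   sd[log y]: empirical = {np.log(noisy).std():.4f}, "
          f"delta method = {sigma / y:.4f}")

For large y the two agree and the log-space error is tiny; for small y it blows up (and the delta-method approximation itself starts to fail). So even with small measurement errors, an unweighted straight-line fit on the log-log plot over-trusts exactly the noisiest points.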