From: Joerg Arndt <arndt@jjj.de> To: math-fun <math-fun@mailman.xmission.com> Subject: [math-fun] PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation. Message-ID: <20160319171653.GA4808@jjj.de> Content-Type: text/plain; charset=us-ascii
As some of us are interested in random number generation, I'd like to point to https://www.cs.hmc.edu/~oneill/ I'll unlike find time to read the paper (preprint as of now), so comments are welcome.
--well, she sure devotes a lot of attention to "marketing" it, but I rather wish she'd devote more attention to concisely explaining it. Her actual very very long paper (submitted to ACMTOMS) is here http://www.pcg-random.org/pdf/toms-oneill-pcg-family-v1.02.pdf She ignores a lot of published good fast bigcrushable generators in her "comparison with everybody else" which makes her look better than she really is. For example, she ignored S.Vigna's "shootout" http://xorshift.di.unimi.it/, R.P.Brent's XorGens, and Mark A. Overton: Dr.Dobbs Journal (May+June 2011). In my opinion, here's what the world wants: period = 2^280 to 2^512 because cuberoot(period) is enough to violate certain "coupon collector" tests and squareroot(period) is enough to violate birthday tests (for any generator whatsoever) hence these need to be inaccessibly large. She already blows it by not realizing that, proposing for general use period<=2^64, which is absurdly stupid. But in her defense, most others do not realize it either. The world of randomness authors and implementors contains a lot of stupidity. state = a number of bits approx equal to log(period)/log(2) because otherwise you are being embarrassingly inefficient about your memory usage and will pay in speed due to eating too much cache. For example, the "mersenne twister" is an embarrassment to humanity for that reason. randomness: at least good enough to pass the bigcrush test suite. (Most library randgens aren't.) Preferably also a combination of several unrelated-looking generation methods. Most kinds of generators are theoretically bad, meaning I can cook up a reasonably natural test they fail. For example, every LCG is bad due to matrix rank tests and hyperplane spacings. Every generator that outputs the expansion of a very long-period-decimal (in some radix) rational is bad because of continued fraction tests. Every linear generator using matrix powers over GF2 is bad again because of matrix rank tests. You can hide those flaws well enough to escape the limited scrutiny of BigCrush, but the flaws are still inherently there. "Cryptographically secure" is a somewhat vague term meaning no statistical test that runs in a reasonably small amount of time, can reveal nonrandomness. This currently can only be proven (in the cases where it can be) with the aid of conjectures including P/=NP, e.g. hardness of discrete logs. It also can be conjectured directly, e.g. if you cannot think of any efficient way to crack it, you conjecture it is crypto-secure. My present view is that a good compromise is this. Combine several exceptionally fast, simple, unrelated, and well understood generators with relatively-prime periods, such as Vigna, a Weyl generator, and an Overton generator. That way you are assured high period and can also assure passing bigcrush. Now on top of that, add a crypto-secure generator which is called only occasionally so it will not slow things down much. It also should be noted that essentially any invertible map you do to an N-bit state space, will have period of order 2^N... usually... although you will rarely be able to prove that. For example "Feistel" tranformations. That fact makes it quite easy to design high period, fast, space-efficient generators that are crypto-secure. The problem is, you will not know when you have designed them, versus when you have designed crap. It is kind of like being in a room full of 95% gold bricks, but you cannot distinguish the gold bricks from the 5% fake bricks. O'Neill's fastest two generators are "PCG XSL RR 128/64 (MCG)" with period=2^126, "PCG RXS M XS 64 (LCG) " with period 2^64, and "PCG XSL RR RR 128 (LCG)" with period=2^128. What are they? I'm not sure. Here is actual C code for one of her simplest generators // *Really* minimal PCG32 code / (c) 2014 M.E. O'Neill / pcg-random.org typedef struct { uint64_t state; uint64_t inc; } pcg32_random_t; uint32_t pcg32_random_r(pcg32_random_t* rng){ uint64_t oldstate = rng->state; // Advance internal state: rng->state = oldstate * 6364136223846793005ULL + (rng->inc|1); // Calculate output function (XSH RR), uses old state for max ILP: uint32_t xorshifted = ((oldstate >> 18u) ^ oldstate) >> 27u; uint32_t rot = oldstate >> 59u; return (xorshifted >> rot) | (xorshifted << ((-rot) & 31)); } so as you can see, it iterates "state" using an LCG with modulus 2^64, which is a pretty sucky thing to do. That presumably has period=2^64 albeit its rightmost bits will have far shorter periods, e.g. the rightmost bit will have period=2. It then outputs a disguised version of that state, disguised by performing some bit shifts and XORs, including a "random length" bit-rotation, and also outputting only 32 bits. This is fast, if you have a fast 64->128 multiplier. Otherwise, it is not so fast. She of course did not test on the latter kind of hardware since that would have made her look bad. I like the way she coded her random rotation primitive.