In that "best paper award" paper, John K. Salmon, Mark A. Moraes, Ron O. Dror, and David E. Shaw: Parallel Random Numbers: As Easy as 1, 2, 3 http://www.thesalmons.org/john/random123/papers/random123sc11.pdf they recommend random number generators based on 1) increment a counter 2) encrypt the counter via multiple rounds of "philox" transformations. A philox transform (which is a bijection) inputs two w-bit machine words L and R and outputs L' and R' where L' = R*M mod 2^w R' = floor(R*M / 2^w) XOR L XOR K where M is an odd constant and K is a constant (M and K can vary in different philoxes and they choose their Ms and Ks via a search for good ones that cause high "avalanching"). If we have a 64-bit machine and use L1,R1,L2,R2 for 256 bits in all in 4 words, then philox(L1,R1); //uses M1 which is fixed and K1 which increments by a constant philox(L2,R2); //uses M2 which is fixed and K2 which increments by a constant swap(R1,R2); is a "round". They find 7 rounds suffices to get immunity to the "crush U01" randomness test suite. One nice feature is you can tell it to use more rounds for less speed but more randomness (tunable). They also have other PRNGs based on the threefish and AES ciphers, but they are much more complicated to describe. CRITICISMS: Unfortunately, the flow of information in a philox transformation exhibits unidirectionality from the least- toward the most-significant bits. That is: Hi bits of R cannot affect lo bits of L'. Hi bits of L cannot affect lo bits of R'. Bits of L cannot affect any bits in L'. Therefore, the least-signif bits of L and R are going to be "less random" than its most-sig bits. Fortunately hi bits of R can affect lo bits of R' preventing complete death, but this gets the job done too slowly: it pathetically takes 3 rounds for hi bits of L to affect lo bits of (the new) L! To address this stupid deficiency, one might therefore consider interspersing philox transforms with these other kind of bijective transform: R = R XOR shiftright(R, H); and/or L = L XOR shiftright(L, H); and/or R = R XOR shiftright(L, H); where the H are (possibly varying) constants between 0.4 and 0.9 times the wordsize in bits. Furthermore to take advantage of multiply-and-accumulate when your computer has it, I'd recommend slightly redefining philox as L' = R*M mod 2^w R' = (floor(R*M / 2^w) + K mod 2^w) XOR L which is still a bijection. Furthermore, the idea of "increment a counter" is a bit too extreme (it leaves hi order bits almost always the same); we instead could increment one word by any odd constant (not 1) and increment a second word by a different odd constant (except not when the first word happens to equal 0). Something like this still has the same period as a counter but it changes a lot more bits every whack. Given that, you might want to dispense with K. AND SO.... My "better than best paper" award please? :)