[math-fun] reference for a simple algorithm

older
Re: [math-fun] I dream in math now.

Joerg Arndt

16 Feb 2010 16 Feb '10

7:16 a.m.

[a homework question] The following method is straightforward (from my thesis (and from my own brain)): --------------------- An algorithm for generating a random object $x$ where $x \in U \subset V$ ($U$ is the subset of all objects in $V$ that have a required property) is as follows. 1) Generate a random object $x \in V$. 2) If $x \in U$ then return it ($x$ is of the required type). 3) Go to step 1. --------------------- Now I have been asked (by one examiner) for a reference for this method, but surprisingly I cannot find anything. So trivial no one ever bothered to write down? Anyone? Btw. I call this thing the "rejection method". The example (square in a circle) at the bottom of the page http://en.wikipedia.org/wiki/Rejection_sampling would suggest that my algorithm is a special case of what is described on this page. However, I cannot see how it can be specialised from the Algorithm given just above it.

Show replies by date

Victor Miller

16 Feb 16 Feb

10:08 a.m.

Joerg, I think that the reference given in that page (which is also in von Neumann's collected works) J. von Neumann, "Various techniques used in connection with random digits. Monte Carlo methods", Nat. Bureau Standards, 12 (1951), pp. 36–38. is what you want. For example, Knuth if TAOCP vol. 2 page 120 referers to it (but surprisingly for Knuth doesn't give the exact reference). Victor On Tue, Feb 16, 2010 at 9:17 AM, Joerg Arndt <arndt@jjj.de> wrote:

...

[a homework question]

The following method is straightforward (from my thesis (and from my own brain)):

--------------------- An algorithm for generating a random object $x$ where $x \in U \subset V$ ($U$ is the subset of all objects in $V$ that have a required property) is as follows.

1) Generate a random object $x \in V$. 2) If $x \in U$ then return it ($x$ is of the required type). 3) Go to step 1. ---------------------

Now I have been asked (by one examiner) for a reference for this method, but surprisingly I cannot find anything.

So trivial no one ever bothered to write down?

Anyone?

Btw. I call this thing the "rejection method".

The example (square in a circle) at the bottom of the page http://en.wikipedia.org/wiki/Rejection_sampling would suggest that my algorithm is a special case of what is described on this page. However, I cannot see how it can be specialised from the Algorithm given just above it.

_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun

Joerg Arndt

11:10 a.m.

Thanks a lot! I cannot access neither the Neumann paper nor Knuth's TAOCP (because these are on some ship following me to Europe). So I'll just cite Neumann (first time citing a paper I haven't actually seen 8-) ) * Victor Miller <victorsmiller@gmail.com> [Feb 16. 2010 19:03]:

...

Joerg, I think that the reference given in that page (which is also in von Neumann's collected works) J. von Neumann, "Various techniques used in connection with random digits. Monte Carlo methods", Nat. Bureau Standards, 12 (1951), pp. 36–38.

is what you want. For example, Knuth if TAOCP vol. 2 page 120 referers to it (but surprisingly for Knuth doesn't give the exact reference).

Victor

[...]

Fred lunnon

2:31 p.m.

My first reaction was that this all seemed rather trivial --- closer inspection reveals interesting and useful stuff here, that ought to be more widely known about! WFL On 2/16/10, Joerg Arndt <arndt@jjj.de> wrote:

...

Thanks a lot!

I cannot access neither the Neumann paper nor Knuth's TAOCP (because these are on some ship following me to Europe).

So I'll just cite Neumann (first time citing a paper I haven't actually seen 8-) )

* Victor Miller <victorsmiller@gmail.com> [Feb 16. 2010 19:03]:

...
Joerg, I think that the reference given in that page (which is also in von Neumann's collected works) J. von Neumann, "Various techniques used in connection with random digits. Monte Carlo methods", Nat. Bureau Standards, 12 (1951), pp. 36–38.

is what you want. For example, Knuth if TAOCP vol. 2 page 120 referers to it (but surprisingly for Knuth doesn't give the exact reference).

Victor

...
[...]

_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun

Allan Wechsler

2:41 p.m.

Something is troubling me about this. You might say it gives me measure-heebie-jeebies. Obviously it works fine when the spaces being sampled are finite. When the spaces are infinite, however, there needs to be a something-or-other -- a measure? a sigma-algebra? -- in order for it to be well-defined to sample the larger space. Furthermore, the measure of the small space in the big space has to be biggish to make the algorithm efficient. (You don't want to have to sample the big space a trillion times to get one hit in the small space.) The thing that is really bothering me, though, is the fear that although the small space might be a subset of the big one, the natural measures on the two might disagree, so that evenly-distributed samples of the big space might map into unevenly-distributed samples of the small. Perhaps somebody can convince me that this failure mode never occurs in practice. Or perhaps somebody can tell me that I've gone wrong much earlier in my thinking. On Tue, Feb 16, 2010 at 4:30 PM, Fred lunnon <fred.lunnon@gmail.com> wrote:

...

My first reaction was that this all seemed rather trivial --- closer inspection reveals interesting and useful stuff here, that ought to be more widely known about! WFL

On 2/16/10, Joerg Arndt <arndt@jjj.de> wrote:

...
Thanks a lot!

I cannot access neither the Neumann paper nor Knuth's TAOCP (because these are on some ship following me to Europe).

So I'll just cite Neumann (first time citing a paper I haven't actually seen 8-) )

* Victor Miller <victorsmiller@gmail.com> [Feb 16. 2010 19:03]:

...
Joerg, I think that the reference given in that page (which is also in von Neumann's collected works) J. von Neumann, "Various techniques used in connection with random digits. Monte Carlo methods", Nat. Bureau Standards, 12 (1951), pp. 36–38.

is what you want. For example, Knuth if TAOCP vol. 2 page 120 referers to it (but surprisingly for Knuth doesn't give the exact reference).

Victor

...
[...]

_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun

_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun

Joerg Arndt

17 Feb 17 Feb

6 a.m.

* Allan Wechsler <acwacw@gmail.com> [Feb 17. 2010 08:49]:

...

[...]

Obviously it works fine when the spaces being sampled are finite.

My application for this is to generate (uniformly random) connected permutations. For large n only approx n!/n of all permutations are _not_ connected (and the test for connectedness is fast), so the resulting algorithm is very good in practice, cf. http://www.jjj.de/pub/ (thesis-draft, sect. 8.2)

...

When the spaces are infinite, however, there needs to be a something-or-other -- a measure? a sigma-algebra? -- in order for it to be well-defined to sample the larger space. Furthermore, the measure of the small space in the big space has to be biggish to make the algorithm efficient. (You don't want to have to sample the big space a trillion times to get one hit in the small space.)

Efficiency in my finite universe: -------------------- The method is efficient if 1) generating a random object x ∈ V is fast, 2) the test whether x ∈ U is fast, and 3) |V|/|U| (the expected number of iterations) is not too large. -------------------- (let's see whether the UTF-8 chars for \in appear on the list)

...

The thing that is really bothering me, though, is the fear that although the small space might be a subset of the big one, the natural measures on the two might disagree, so that evenly-distributed samples of the big space might map into unevenly-distributed samples of the small. Perhaps somebody can convince me that this failure mode never occurs in practice. Or perhaps somebody can tell me that I've gone wrong much earlier in my thinking.

[...]

I'd be surprised if uniformity could fail in U. However, being clueless in these matters, I'd only be a bit be surprised.

Schroeppel, Richard

10:04 a.m.

What's "connected"? Everything in one big cycle? --Rich ________________________________________ From: math-fun-bounces@mailman.xmission.com [math-fun-bounces@mailman.xmission.com] On Behalf Of Joerg Arndt [arndt@jjj.de] Sent: Wednesday, February 17, 2010 6:01 AM To: math-fun Subject: Re: [math-fun] reference for a simple algorithm * Allan Wechsler <acwacw@gmail.com> [Feb 17. 2010 08:49]:

...

[...]

Obviously it works fine when the spaces being sampled are finite.

...

When the spaces are infinite, however, there needs to be a something-or-other -- a measure? a sigma-algebra? -- in order for it to be well-defined to sample the larger space. Furthermore, the measure of the small space in the big space has to be biggish to make the algorithm efficient. (You don't want to have to sample the big space a trillion times to get one hit in the small space.)

...

The thing that is really bothering me, though, is the fear that although the small space might be a subset of the big one, the natural measures on the two might disagree, so that evenly-distributed samples of the big space might map into unevenly-distributed samples of the small. Perhaps somebody can convince me that this failure mode never occurs in practice. Or perhaps somebody can tell me that I've gone wrong much earlier in my thinking.

[...]

I'd be surprised if uniformity could fail in U. However, being clueless in these matters, I'd only be a bit be surprised. _______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun

Joerg Arndt

12:01 p.m.

* Schroeppel, Richard <rschroe@sandia.gov> [Feb 17. 2010 19:49]:

...

What's "connected"? Everything in one big cycle? --Rich

A permutation is "connected" if no proper prefix is mapped to itself. Test: compute maxima of all proper prefixes (cost is O(n)) and check the current max (for, say, length-k prefix) is greater than k (assuming you permute [1,2,...,n]). The name "connected" may reflect that a non-connected permutation can be written as concat of two perms ( P1(prefix) . P2(suffix) ), but then maybe it's just another name as stupid as "root system". "one big cycle" perms are "cyclic" perms (which, together with "all permutations" are trivial to create unbiased).

Allan Wechsler

1:12 p.m.

OK, let me see if I understand this. We write a permutation P on the integers [1...n] as (P(1), P(2), ..., P(n)). If we cut this sequence off prematurely, at only k elements, k < n, we get a proper prefix. This is not typically a permutation itself, but sometimes it is; if this is the case, the permutation is not connected. A connected permutation is one that has no "cut"; for all k < n, the prefix of length k contains an integer greater than k. The reason it took me a long time to figure out what we were talking about is that this is not really a group-theoretic property, and my reflex is to think of permutations as elements of a group; in the algebraic context, the objects being permuted are abstract frobs, which can be labeled with integers or letters or anything else. Another way to say this is that you can have two similar permutations, indistinguishable algebraically, only one of which is "connected". If a permutation is not the identity, it is always similar to a connected permutation; to prove this, pick any moving element x, and label x as "1" and P(x) as "n". Then the representation of P begins with an n, guaranteeing connectedness. On Wed, Feb 17, 2010 at 2:02 PM, Joerg Arndt <arndt@jjj.de> wrote:

...

* Schroeppel, Richard <rschroe@sandia.gov> [Feb 17. 2010 19:49]:

...
What's "connected"? Everything in one big cycle? --Rich

A permutation is "connected" if no proper prefix is mapped to itself.

Test: compute maxima of all proper prefixes (cost is O(n)) and check the current max (for, say, length-k prefix) is greater than k (assuming you permute [1,2,...,n]).

The name "connected" may reflect that a non-connected permutation can be written as concat of two perms ( P1(prefix) . P2(suffix) ), but then maybe it's just another name as stupid as "root system".

"one big cycle" perms are "cyclic" perms (which, together with "all permutations" are trivial to create unbiased).

_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun

Joerg Arndt

18 Feb 18 Feb

1:12 a.m.

* Allan Wechsler <acwacw@gmail.com> [Feb 18. 2010 08:21]:

...

OK, let me see if I understand this. We write a permutation P on the integers [1...n] as (P(1), P(2), ..., P(n)). If we cut this sequence off prematurely, at only k elements, k < n, we get a proper prefix. This is not typically a permutation itself, but sometimes it is; if this is the case, the permutation is not connected. A connected permutation is one that has no "cut"; for all k < n, the prefix of length k contains an integer greater than k.

Correct.

...

The reason it took me a long time to figure out what we were talking about is that this is not really a group-theoretic property, and my reflex is to think of permutations as elements of a group;

I see. There are plenty of things considered with permutations (of letters 1,2,...,n) that are anything but algebraic, such as rises and falls (e.g. in up-down perms where a0 < a1 > a2 < a3 > a4 <... ). Playful combinatorialists 8-))

...

in the algebraic context, the objects being permuted are abstract frobs, which can be labeled with integers or letters or anything else. Another way to say this is that you can have two similar permutations, indistinguishable algebraically, only one of which is "connected".

Similar permutations A,B, are such that there is some perm P such that A = P * B * P^{-1}. Each equiv classe contains all perms with identical cycle spectrum (corresponding to an integer partition). Now fix the partition, say n = M_1*L_1+M_2*L_2+...+M_u*L_u (M_k cycles of length L_k). Now you could choose as a representative for the class that is determined by the partition above as the (non-connected) perm that maps the M_1 prefixes of length L_1 to itself etc.

...

If a permutation is not the identity, it is always similar to a connected permutation; to prove this, pick any moving element x, and label x as "1" and P(x) as "n". Then the representation of P begins with an n, guaranteeing connectedness.

Yes: if there is at least one nontrivial cycle in a permutation then its class contains one where the first position is n (it contains the cycle (n -> 1 -> ...) ) [I may have been sloppy here as I didn't distinguish the permutations (maps) from actions (on the unit perm, an arrangement of letters)]

...

On Wed, Feb 17, 2010 at 2:02 PM, Joerg Arndt <arndt@jjj.de> wrote:

...
* Schroeppel, Richard <rschroe@sandia.gov> [Feb 17. 2010 19:49]:

...
What's "connected"? Everything in one big cycle? --Rich

A permutation is "connected" if no proper prefix is mapped to itself.

Test: compute maxima of all proper prefixes (cost is O(n)) and check the current max (for, say, length-k prefix) is greater than k (assuming you permute [1,2,...,n]).

The name "connected" may reflect that a non-connected permutation can be written as concat of two perms ( P1(prefix) . P2(suffix) ), but then maybe it's just another name as stupid as "root system".

"one big cycle" perms are "cyclic" perms (which, together with "all permutations" are trivial to create unbiased).

Victor Miller

17 Feb 17 Feb

7:04 a.m.

There are lots of variations of rejection sampling. In particular if it happens that the measures on the two spaces are different then usually another independent random variable is used to "correct" it. The one thing that's necessary is that the target measure is absolutely continuous with respect to the other. So if U is the target space and V is space that you can easily generate random elements of, and you know a function f with values in [0,1], so that the V probability of S a subset of U is integral_S f(x) d mu(x), where mu(x) is the measure on U, So pick x at random with respect to mu, then evaluate f(x) and accept x by generating a random y in [0,1] and accepting if f(x) <= y. Victor On Tue, Feb 16, 2010 at 4:41 PM, Allan Wechsler <acwacw@gmail.com> wrote:

...

Something is troubling me about this. You might say it gives me measure-heebie-jeebies.

Obviously it works fine when the spaces being sampled are finite.

When the spaces are infinite, however, there needs to be a something-or-other -- a measure? a sigma-algebra? -- in order for it to be well-defined to sample the larger space. Furthermore, the measure of the small space in the big space has to be biggish to make the algorithm efficient. (You don't want to have to sample the big space a trillion times to get one hit in the small space.)

The thing that is really bothering me, though, is the fear that although the small space might be a subset of the big one, the natural measures on the two might disagree, so that evenly-distributed samples of the big space might map into unevenly-distributed samples of the small. Perhaps somebody can convince me that this failure mode never occurs in practice. Or perhaps somebody can tell me that I've gone wrong much earlier in my thinking.

On Tue, Feb 16, 2010 at 4:30 PM, Fred lunnon <fred.lunnon@gmail.com> wrote:

...
My first reaction was that this all seemed rather trivial --- closer inspection reveals interesting and useful stuff here, that ought to be more widely known about! WFL

On 2/16/10, Joerg Arndt <arndt@jjj.de> wrote:

...
Thanks a lot!

I cannot access neither the Neumann paper nor Knuth's TAOCP (because these are on some ship following me to Europe).

So I'll just cite Neumann (first time citing a paper I haven't actually seen 8-) )

* Victor Miller <victorsmiller@gmail.com> [Feb 16. 2010 19:03]:

...
Joerg, I think that the reference given in that page (which is also in > von Neumann's collected works) J. von Neumann, "Various techniques > used in connection with random digits. Monte Carlo methods", Nat. > Bureau Standards, 12 (1951), pp. 36–38. > > is what you want. For example, Knuth if TAOCP vol. 2 page 120 > referers to it (but surprisingly for Knuth doesn't give the exact > reference). > > Victor >

...
[...]

_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun

_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun

_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun

5755

Age (days ago)

5757

Last active (days ago)

List overview

Download

10 comments

5 participants

participants (5)

Allan Wechsler
Fred lunnon
Joerg Arndt
Schroeppel, Richard
Victor Miller

[math-fun] reference for a simple algorithm

tags

participants (5)