Re: [math-fun] World's hardest sudoku puzzle?
Steve Rowley wrote:

<< . . . On the other hand, if you mean a "typical" sudoku in the sense that
it's randomly selected, then we can MEASURE that! Assuming that "difficulty"
translates into computer time, which is of course dubious... . . . >>

There ought to be an objective measure of the *paucity* of useful information
readily provided by the diagram.

IDEA: If (i,j) is empty, let P(i,j) be the subset of {1,2,...,9} that does NOT
appear in the union of row i, column j, and the 3x3 box containing (i,j); if
(i,j) is filled with the number k, then P(i,j) = {k}. (P stands for "Possible
set".)

Let s(i,j) := #(P(i,j)).

I propose that a good intrinsic measure of the difficulty of a Sudoku is the
*geometric* mean of the 81 s(i,j)'s, since the number of possibilities is
multiplicative.

It would be interesting to see how this measure compares with the time it
takes software to solve a puzzle.

--Dan
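Dan's proposed measure is straightforward to compute directly from its definition. A minimal Python sketch (the function names are mine; 0 denotes an empty cell):

```python
from math import prod

def possible_set(grid, i, j):
    """Dan's P(i,j): digits absent from row i, column j, and the 3x3 box
    containing (i,j); for a filled cell it is just {k}."""
    if grid[i][j] != 0:
        return {grid[i][j]}
    used = set(grid[i]) | {grid[r][j] for r in range(9)}
    bi, bj = 3 * (i // 3), 3 * (j // 3)
    used |= {grid[r][c] for r in range(bi, bi + 3) for c in range(bj, bj + 3)}
    return set(range(1, 10)) - used

def difficulty(grid):
    """Geometric mean of the 81 possibility-set sizes s(i,j) = #(P(i,j))."""
    sizes = [len(possible_set(grid, i, j)) for i in range(9) for j in range(9)]
    return prod(sizes) ** (1 / 81)
```

On an empty grid every s(i,j) = 9, so the score is 9; on a completely filled grid every s(i,j) = 1, so the score is 1. Any puzzle lands between those extremes.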
You can find other difficult sudoku puzzles here, with discussion of sudoku
ratings: http://www.sudoku.com/forums/viewtopic.php?t=4212

It seems that AI Escargot is in any case the most difficult (currently known)
puzzle, whatever the rating.

Christian.
For n objects, the fraction of permutations that are derangements is about
1/e. Does this extend to an infinite number of objects; i.e., is the
probability of picking a permutation "at random" that is a derangement equal
to 1/e?

--Bill C.
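The finite fact Bill cites is easy to verify with the standard recurrence D(n) = (n-1)(D(n-1) + D(n-2)). A quick Python sketch:

```python
from math import factorial, e

def derangements(n):
    """Count permutations of n objects with no fixed point, via the
    recurrence D(n) = (n-1)*(D(n-1) + D(n-2)), D(0) = 1, D(1) = 0."""
    D = [1, 0]
    for k in range(2, n + 1):
        D.append((k - 1) * (D[k - 1] + D[k - 2]))
    return D[n]

def derangement_fraction(n):
    """Fraction of the n! permutations that are derangements; -> 1/e fast."""
    return derangements(n) / factorial(n)
```

Already at n = 10 the fraction agrees with 1/e to about 8 decimal places, since the error term alternates and shrinks like 1/(n+1)!.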
Picking an infinite permutation at random is not a well-defined concept. It
comes down to the similar but simpler problem of picking a (positive) integer
at random. Each integer must have probability zero (lim_{n->infinity} 1/n) of
being chosen; but that means you can't ever pick one. For a random infinite
permutation p, p(1) needs to be a positive integer picked at random.

If you choose a probability distribution on infinite permutations, the
probability that one will be a derangement will depend on which distribution
you choose. There really isn't any obvious choice for the distribution.

Franklin T. Adams-Watters
Well, that might be true, but is there any sense (maybe measure theory?), even
if I don't have a nicely defined concept of picking a value at random, in
which I can still talk about the relative probability?

--Bill C.
On 11/10/06, Cordwell, William R <wrcordw@sandia.gov> wrote:
Well, that might be true, but is there any sense (maybe measure theory?), even if I don't have a nicely defined concept of picking a value at random, where I can still talk about the relative probability?
--Bill C.
You can talk about the density of primes near an integer, something that is
often called "the probability that a randomly chosen number is prime" even
though that statement doesn't parse. You could therefore talk about the
density of derangements near a given permutation, where the distance between
permutations is the number of transpositions required to transform one into
the other.

--
Mike Stay
metaweta@gmail.com
http://math.ucr.edu/~mike
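As an illustration of the loose usage Mike describes, one can compare the empirical density of primes in a window around N against the prime-number-theorem heuristic 1/ln N. A rough sketch (helper names are mine):

```python
from math import log, isqrt

def is_prime(n):
    """Trial division; fine for the modest window sizes used here."""
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

def prime_density(center, halfwidth):
    """Fraction of integers in [center-halfwidth, center+halfwidth] that
    are prime -- the 'density of primes near an integer'."""
    lo, hi = center - halfwidth, center + halfwidth
    return sum(is_prime(k) for k in range(lo, hi + 1)) / (hi - lo + 1)
```

Near N = 10^6 the measured density comes out close to 1/ln(10^6) ≈ 0.072, which is the sense in which "the probability that a random number is prime" gets used despite not parsing literally.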
Yes, but most (in some sense) permutations are infinitely far apart by this
metric. Or, to put it another way, if a given infinite permutation has
infinitely many fixed points, every permutation in any neighborhood of it has
infinitely many fixed points, and thus is not a derangement.

Further, there are infinitely many permutations one transposition away from a
given infinite permutation, and trying to pick one at random gets us right
back to our problem of picking a random integer. (We can get around this
problem by making the distance be the maximum value at which the permutations
differ, but this doesn't help with the first problem.)

Fundamentally, this approach won't work. It requires that each neighborhood be
finite, but that every permutation be in some neighborhood. This is
inconsistent with the fact that there are uncountably many permutations. (You
could instead ask for a measure on each neighborhood, but then you are really
getting a measure on the set of all infinite permutations. If you have that,
you don't need the neighborhoods.)

Franklin T. Adams-Watters
Not really; at least none that I know of. My last paragraph was really an
answer to this question: "There really isn't any obvious choice for the
distribution."

I don't think measure theory will help. I don't know of any interesting
measures on functions from integers to integers; even if we did choose one,
the permutations would almost have to have measure zero. A measure on the
permutations themselves seems even less likely.

One can build a sequence of distributions based on a parameterized
distribution. Let P(N,x) be a distribution on positive integers that becomes
"flat" as N->infinity; the x here is a random variable on the interval (0,1).
One could, for example, take P(N,x) = floor(-N*log(x)).

Choose a parameter n. Start with p completely undefined. For each k from 1 to
infinity, if p(k) is not yet defined, choose a random number x_k and define
p(k) = b_(P(n*k,x_k)), where b is the list of all numbers not yet in the range
of p. Then, if there is as yet no m with p(m) = k, choose a random number y_k,
take m = c_(P(n*k,y_k)), where c is the list of numbers for which p is not yet
defined, and set p(m) = k. Now take the probability that this permutation is a
derangement, and take the limit as n->infinity.

Note that there are still two arbitrary things about this procedure. The
choice of P is obviously one; but it also isn't clear that we should use
P(n*k,...) instead of, perhaps, P(n+k,...) or P(n^k,...) or whatever. (We
can't use P(n,...), because then there would be the same finite chance that 1
is selected at each step, and thus the probability of a derangement would be
0.)

It might be that, under suitable assumptions about P, you could show that this
limit is 1/e. Even if you did, I don't know how much it would really tell you.

Franklin T. Adams-Watters
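Franklin's two-phase construction can be transcribed almost literally, though of course only a finite prefix p(1..K) can ever be sampled, so this illustrates the procedure rather than computing the limit. A hedged sketch (all names are mine; I draw x from (0,1] rather than (0,1) to keep log defined, and I treat P(N,x) as a 0-based index):

```python
import random
from math import log, floor

def P(N, x):
    """Franklin's example distribution: floor(-N*log(x)), x uniform on (0,1]."""
    return floor(-N * log(x))

def nth_missing(used, idx):
    """The idx-th (0-based) smallest positive integer not in `used`."""
    count, m = 0, 1
    while True:
        if m not in used:
            if count == idx:
                return m
            count += 1
        m += 1

def sample_prefix(n, K, rng):
    """Run Franklin's two-phase step for k = 1..K: first define p(k) if
    needed (index into b = numbers not yet in the range of p), then ensure
    k is in the range (index into c = numbers where p is undefined)."""
    p, inv = {}, {}               # partial permutation and its inverse
    for k in range(1, K + 1):
        if k not in p:
            v = nth_missing(set(inv), P(n * k, 1.0 - rng.random()))
            p[k], inv[v] = v, k
        if k not in inv:
            m = nth_missing(set(p), P(n * k, 1.0 - rng.random()))
            p[m], inv[k] = k, m
    return p

def fixed_point_free_rate(n, K, trials, seed=0):
    """Fraction of sampled prefixes with no fixed point among defined values
    -- only a crude finite stand-in for the derangement probability."""
    rng = random.Random(seed)
    hits = sum(all(v != k for k, v in sample_prefix(n, K, rng).items())
               for _ in range(trials))
    return hits / trials
```

By construction each sampled prefix is injective and is defined on, and covers, all of 1..K, which is exactly what the two phases guarantee.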
What about permutations p in which |x-p(x)| <= n?
Under any sort of reasonable distribution, such permutations will have
infinitely many fixed points with probability 1.

Franklin T. Adams-Watters
The following paper is probably relevant to this discussion: Fulman and
Guralnick, "Derangements in simple and primitive groups",
http://front.math.ucdavis.edu/math.GR/0208022. In the paper they discuss the
proportion of derangements even in some infinite groups.

--
Victor S. Miller | " ... Meanwhile, those of us who can compute can hardly
victor@idaccr.org | be expected to keep writing papers saying 'I can do the
CCR, Princeton, NJ | following useless calculation in 2 seconds', and indeed
08540 USA | what editor would publish them?" -- Oliver Atkin
Summary: (1) AI Escargot is indeed hard, (2) it's nowhere near the hardest findable in a couple hours, and (3) we don't have a good metric for predicting hardness.
Date: Fri, 10 Nov 2006 01:14:15 -0500 (GMT-05:00) From: Daniel Asimov <dasimov@earthlink.net>
IDEA: If (i,j) is empty, let P(i,j) be the subset of {1,2,...,9} that does NOT appear in the union of row i, column j, and the 3x3 box containing (i,j); if (i,j) is filled with the number k, then P(i,j) = {k}. (P stands for "Possible set".)
Let s(i,j) := #(P(i,j)).
I propose that a good intrinsic measure of the difficulty of a Sudoku is the *geometric* mean of the 81 s(i,j)'s, since the number of possibilities is multiplicative.
It would be interesting to see how this measure compares with the time it takes software to solve a puzzle.
I looked a little deeper at this, and had some fun with R programming.

I took a sample of 500 randomly generated sudokus, each of which had 58
blanks, so that it was in the same general class as AI Escargot. Each sudoku
was solved 5 times, and I took the median solution time for each sudoku as
representative of how long that particular one took to solve. (This turned
out to be overkill; all 5 times were always pretty close.) The timing data
and R programs are available from me on request.

Now let's look at how those solution times (in sec) were distributed over the
sample of 500:

---------------------------------------------------------
> with(result, quantile(median, probs = seq(0, 1, by = 0.25)))
    0%    25%    50%    75%   100%
  2.85   5.30   6.13   7.17 930.74
---------------------------------------------------------
That is, the solution time ranged from just under 3 sec to around 15 min,
with most rather tightly clustered around 6 sec. What we learn from this:

* Most randomly chosen sudokus are of comparable difficulty, i.e., the
  results were pretty tight about the median.

* AI Escargot, which clocks in at around 30 sec, is indeed harder than most.
  In fact, it's at about the 95th percentile, if you look at the spreadsheet
  of all the sample timings. WAY harder than most.

* On the other hand, there exist sudokus which are much more fiendishly
  difficult. (They are, however, rare.) For example, the worst case in the
  sample was this, taking about 15 min to solve:

---------------------------------------------------------
> printSudoku(hardestSudokuInSample(result))
  +-------+-------+-------+
  |       | 7 3   |       |
  |     6 | 8 7   |       |
  | 3 7   |       |       |
  +-------+-------+-------+
  |       | 7     |       |
  | 2     | 5     | 3 8 7 |
  | 7     |       |       |
  +-------+-------+-------+
  | 2 1   | 6     |       |
  |       | 3     | 4     |
  | 3     | 2     | 5 6   |
  +-------+-------+-------+
(Solution also available on request. :-) This is "hard" only in the sense
that the algorithm in the R sudoku library -- whatever it is -- takes
especially long. I have no idea if a human would find it hard, or would find
it hard in comparison to AI Escargot, or... whatever.

* Then I looked at your (Dan's) suggestion about a difficulty metric; let's
  call that "difficulty" in what follows. It seemed pretty cool, but alas,
  the data say otherwise:

  - The difficulty scores are approximately normally distributed, with
    relatively narrow spread:

    ---------------------------------------------------------
    > quantile(difficulty, probs = seq(0, 1, by = 0.25))
          0%      25%      50%      75%     100%
    2.520093 2.732545 2.795985 2.857816 3.118089
    ---------------------------------------------------------

  - The difficulty score of AI Escargot is 2.56, i.e., well BELOW the median
    difficulty?!

  - The difficulty of the worst-case example above is 2.95.

  - The correlation of difficulty with median time to solve is only 0.085.
    Thus the difficulty score explains only about 0.7% of the variance of
    the times... i.e., very little at all.

  - A plot of time to solve vs. difficulty is almost a horizontal line,
    i.e., no predictive power.

  - A linear regression model which tries to predict time from difficulty
    score (each centered around its mean and scaled by its standard
    deviation, i.e., as Z scores to get them on the same scale) with slope
    beta and intercept alpha:

      time - mean(time)          difficulty - mean(difficulty)
      ----------------- = beta * ----------------------------- + alpha
           sd(time)                     sd(difficulty)

    ... confirms that there is no predictive value. :-( The regression is
    significant, but explains almost none of the variance (R^2 = 0.006):

    ---------------------------------------------------------
    > summary(lm(scale(result$"median") ~ scale(difficulty)))

    Call:
    lm(formula = scale(result$median) ~ scale(difficulty))

    Residuals:
        Min      1Q  Median      3Q     Max
    -0.4249 -0.1566 -0.0966 -0.0273 27.4646

    Coefficients:
                        Estimate Std. Error    t value Pr(>|t|)
    (Intercept)       -8.659e-17  3.152e-02  -2.75e-15   1.0000
    scale(difficulty)  8.570e-02  3.154e-02      2.717   0.0067 **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.9968 on 998 degrees of freedom
    Multiple R-Squared: 0.007344, Adjusted R-squared: 0.006349
    F-statistic: 7.383 on 1 and 998 DF, p-value: 0.006697
    ---------------------------------------------------------

--
Steve Rowley <sgr@alum.mit.edu>  http://alum.mit.edu/www/sgr/  Skype: sgr000
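For simple (one-predictor) linear regression, R^2 is exactly the squared Pearson correlation, which is why a correlation of 0.085 translates to roughly 0.085^2 ≈ 0.007, i.e., 0.7% of variance explained. A small self-contained check of that identity (plain Python, not the R session above):

```python
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def r_squared(xs, ys):
    """R^2 of the least-squares line y = a + b*x: the fraction of the
    variance of y explained by the fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot
```

On any data set, r_squared(xs, ys) equals pearson_r(xs, ys)**2 up to floating-point error, so a tiny correlation necessarily means a near-useless regression.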
I don't comment often here, but two comments.

1. The sudoku you posted has more than one solution.

2. Hardest in the context under discussion usually means "hardest for a
person". This correlates a bit with backtracking difficulty, but it only
correlates. Making a precise measure of "human difficulty" is difficult.

Doug
David Eppstein has made a stab at trying to rate the difficulty of a Sudoku
for humans in "Nonrepetitive Paths and Cycles in Graphs with Application to
Sudoku", http://arxiv.org/abs/cs.DS/0507053.

I think that the key to understanding this is in looking for algorithms whose
space is very limited. I believe that a well-trained human is willing to do a
fair amount of computation as long as the number of bits he has to remember
in his short-term memory isn't too large (this is assuming that the only
things allowed to be written down are the deduced numbers to be put in a
cell).

I've found that translating a Sudoku to a satisfiability problem and then
giving it to any one of the good SAT solvers yields a solution in under 1
second (often much less). But all of them use far more storage than a human
would be willing to use.

--
Victor S. Miller
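For reference, one standard reduction of the Sudoku constraints to CNF (not necessarily the encoding Victor used) assigns a boolean variable to each (cell, digit) pair, giving 9x9x9 = 729 variables; the givens of a particular puzzle would be added as unit clauses. A sketch that builds the constraint clauses:

```python
from itertools import combinations

def var(r, c, d):
    """DIMACS-style variable index for 'cell (r,c) holds digit d',
    with r, c in 0..8 and d in 1..9; indices run 1..729."""
    return 81 * r + 9 * c + d

def sudoku_cnf():
    """Clauses for the blank grid: each clause is a list of literals,
    negative meaning negated, as a SAT solver would consume them."""
    clauses = []
    cells = [(r, c) for r in range(9) for c in range(9)]
    # Each cell holds at least one digit...
    for r, c in cells:
        clauses.append([var(r, c, d) for d in range(1, 10)])
    # ...and at most one (pairwise exclusion).
    for r, c in cells:
        for d1, d2 in combinations(range(1, 10), 2):
            clauses.append([-var(r, c, d1), -var(r, c, d2)])
    # Each digit appears somewhere in every row, column, and 3x3 box.
    for d in range(1, 10):
        for i in range(9):
            clauses.append([var(i, c, d) for c in range(9)])        # row i
            clauses.append([var(r, i, d) for r in range(9)])        # column i
            br, bc = 3 * (i // 3), 3 * (i % 3)
            clauses.append([var(br + r, bc + c, d)
                            for r in range(3) for c in range(3)])   # box i
    return clauses
```

The counts come out to 81 + 81*36 + 243 = 3240 clauses before any clues are asserted, which gives a feel for why solvers chew through these instances so quickly while using far more working state than a human would.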
Date: Wed, 15 Nov 2006 22:50:05 -0600 (CST) From: Douglas Bowman <bowman@math.niu.edu>
I don't comment often here, but two comments.
1. The sudoku you posted has more than one solution.
Yes, Wouter Meeson pointed this out in private email by supplying another solution.
2. Hardest in the context under discussion usually means "hardest for a
person". This correlates a bit with backtracking difficulty, but it only
correlates. Making a precise measure of "human difficulty" is difficult.
Indeed. That's why I was explicit about saying I used only computer time, and that Dan's suggested metric didn't seem to mean much in terms of computer time. -- Steve Rowley <sgr@alum.mit.edu> http://alum.mit.edu/www/sgr/ Skype: sgr000
participants (9)

- Christian Boyer
- Cordwell, William R
- Daniel Asimov
- David Wilson
- Douglas Bowman
- franktaw@netscape.net
- Mike Stay
- Steve Rowley
- victor@idaccr.org