Summary: (1) AI Escargot is indeed hard, (2) it's nowhere near the hardest puzzle findable in a couple of hours, and (3) we don't have a good metric for predicting hardness.
Date: Fri, 10 Nov 2006 01:14:15 -0500 (GMT-05:00)
From: Daniel Asimov <dasimov@earthlink.net>
IDEA: If cell (i,j) is empty, let P(i,j) be the set of digits in {1,2,...,9} that do NOT appear in the union of row i, column j, and the 3x3 box containing (i,j); if (i,j) is filled with the number k, then P(i,j) = {k}. (P stands for "Possible set".)
Let s(i,j) := #(P(i,j)).
I propose that a good intrinsic measure of the difficulty of a Sudoku is the *geometric* mean of the 81 s(i,j)'s, since the number of possibilities is multiplicative: the product of all the s(i,j)'s bounds the naive search space, and the geometric mean is just the 81st root of that product.
It would be interesting to see how this measure compares with the time it takes software to solve a puzzle.
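To make this concrete, here is a minimal sketch of the score in R. (This code is mine, not Dan's; it assumes the puzzle is a 9x9 matrix z with blanks coded as 0, the convention the R sudoku package uses, and the name difficultyScore is mine as well.)
---------------------------------------------------------
## Sketch of Dan's proposed score: s(i,j) = #P(i,j) per cell,
## then the geometric mean over all 81 cells.
difficultyScore <- function(z) {
  s <- matrix(1L, 9, 9)                      # filled cells: P(i,j) = {k}, s = 1
  for (i in 1:9) for (j in 1:9) {
    if (z[i, j] == 0) {                      # blank cell: count candidates
      bi <- 3 * ((i - 1) %/% 3) + 1:3        # rows of the enclosing 3x3 box
      bj <- 3 * ((j - 1) %/% 3) + 1:3        # cols of the enclosing 3x3 box
      seen <- c(z[i, ], z[, j], z[bi, bj])   # row i, column j, and the box
      s[i, j] <- length(setdiff(1:9, seen))  # s(i,j) = #P(i,j)
    }
  }
  exp(mean(log(s)))                          # geometric mean of the 81 s(i,j)'s
}
---------------------------------------------------------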
I looked a little deeper at this, and had some fun with R programming. I took a sample of 500 randomly generated sudokus, each of which had 58 blanks, so that it was in the same general class as AI Escargot. Each sudoku was solved 5 times, and I took the median solution time for each sudoku as representative of how long that particular one took to solve. (This turned out to be overkill; all 5 times were always pretty close.) The timing data and R programs are available from me on request; the sketch below shows the general shape of the procedure.
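(A reconstruction, not the original script: generateSudoku() and solveSudoku() are from the CRAN sudoku package, but the exact arguments and the seed are my guesses.)
---------------------------------------------------------
library(sudoku)

## Median elapsed time over `reps` timed solves of one puzzle.
timePuzzle <- function(z, reps = 5) {
  median(replicate(reps, system.time(solveSudoku(z))["elapsed"]))
}

set.seed(2006)                               # arbitrary, for reproducibility
puzzles <- replicate(500, generateSudoku(Nblank = 58, print.it = FALSE),
                     simplify = FALSE)
result <- data.frame(median = vapply(puzzles, timePuzzle, numeric(1)))
---------------------------------------------------------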
Now let's look at how those solution times (in sec) were distributed over the sample of 500:
---------------------------------------------------------
> with(result, quantile(median, probs = seq(0, 1, by = 0.25)))
     0%     25%     50%     75%    100% 
   2.85    5.30    6.13    7.17  930.74 
---------------------------------------------------------
That is, the solution time ranged from just under 3 sec to around 15 min, with most rather tightly clustered around 6 sec. What we learn from this:

* Most randomly chosen sudokus are of comparable difficulty, i.e., the results were pretty tight about the median.

* AI Escargot, which clocks in at around 30 sec, is indeed harder than most. In fact, it's at about the 95th percentile, if you look at the spreadsheet of all the sample timings. WAY harder than most. (See the quick check just after this list.)

* On the other hand, there exist sudokus which are much more fiendishly difficult. (They are, however, rare.)
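That quick percentile check, for what it's worth (my snippet, not part of the original analysis; it assumes the result data frame built above):
---------------------------------------------------------
## ecdf() returns the empirical CDF of the sample's median solve times;
## evaluating it at 30 sec gives the fraction of puzzles solved faster.
100 * ecdf(result$median)(30)                # ~95, per the sample timings
---------------------------------------------------------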
For example, the worst case in the sample was this, taking about 15 min to solve:
---------------------------------------------------------
> printSudoku(hardestSudokuInSample(result))
+-------+-------+-------+
|       | 7 3   |       |
|       |     6 | 8 7   |
| 3 7   |       |       |
+-------+-------+-------+
|       |   7   |       |
|   2   |   5   | 3 8 7 |
| 7     |       |       |
+-------+-------+-------+
| 2 1   |   6   |       |
|       |     3 | 4     |
|   3   | 2     |   5 6 |
+-------+-------+-------+
---------------------------------------------------------
(Solution also available on request. :-) This is "hard" only in the sense that the algorithm in the R sudoku library -- whatever it is -- takes especially long. I have no idea if a human would find it hard, or would find it hard in comparison to AI Escargot, or ... whatever.

* Then I looked at your (Dan's) suggestion about a difficulty metric; let's call that "difficulty" in what follows. It seemed pretty cool, but alas, the data say otherwise:

  - The difficulty scores are approximately normally distributed, with relatively narrow spread:
---------------------------------------------------------
> quantile(difficulty, probs = seq(0, 1, by = 0.25))
      0%      25%      50%      75%     100% 
2.520093 2.732545 2.795985 2.857816 3.118089 
---------------------------------------------------------

  - The difficulty score of AI Escargot is 2.56, i.e., well BELOW the median difficulty?!

  - The difficulty of the worst-case example above is 2.95.

  - The correlation of difficulty with median time to solve is only 0.085. Thus the difficulty score explains only about 0.7% of the variance of the times... i.e., very little at all.

  - A plot of time to solve vs difficulty is almost a horizontal line, i.e., no predictive power.

  - A linear regression model which tries to predict time from difficulty score (each centered around its mean and scaled by its standard deviation, i.e., as Z scores to get them on the same scale), with slope beta and intercept alpha:

        time - mean(time)          difficulty - mean(difficulty)
        ----------------- = beta * ----------------------------- + alpha
             sd(time)                     sd(difficulty)

    ... confirms that there is no predictive value. :-( The regression is significant, but explains almost none of the variance (adjusted R^2 = 0.006):
---------------------------------------------------------
> summary(lm(scale(result$"median") ~ scale(difficulty)))

Call:
lm(formula = scale(result$median) ~ scale(difficulty))

Residuals:
    Min      1Q  Median      3Q     Max 
-0.4249 -0.1566 -0.0966 -0.0273 27.4646 

Coefficients:
                    Estimate Std. Error   t value Pr(>|t|)   
(Intercept)       -8.659e-17  3.152e-02 -2.75e-15   1.0000   
scale(difficulty)  8.570e-02  3.154e-02     2.717   0.0067 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9968 on 998 degrees of freedom
Multiple R-Squared: 0.007344, Adjusted R-squared: 0.006349 
F-statistic: 7.383 on 1 and 998 DF, p-value: 0.006697 
---------------------------------------------------------

-- 
Steve Rowley <sgr@alum.mit.edu>  http://alum.mit.edu/www/sgr/  Skype: sgr000