Me yesterday:
Their way to solve the game is to know all values of P(i,j,k), your probability of winning, where (i,j,k) = (#pts for you, #pts for the other player, #pts you've accumulated so far on this turn, which you'd risk by rolling again). [...] I do believe that if you knew the correct roll/don't-roll decisions, you could solve for the probabilities. But it seems to me that you could have some set of P()s and a sub-optimal set of roll/don't-roll decisions which are compatible with one another, but suboptimal globally.
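To make the fixed-point structure concrete, here is one way the equations might look. This is a sketch under the assumption that the game is standard Pig (roll a d6; a 1 forfeits the turn total and passes the turn; otherwise roll again or hold to bank) with target score G, none of which is spelled out in the thread:

P(i,j,k) = max( hold, roll ), where
  hold = 1 - P(j, i+k, 0)
  roll = (1/6)(1 - P(j, i, 0)) + sum over r=2..6 of (1/6) * [ 1 if i+k+r >= G, else P(i, j, k+r) ]

Each P(i,j,k) is a weighted average (after the max) of the values at its successor states, which is exactly the property the uniqueness argument below leans on.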
Me today: Nope, in fact, the set of equations really does have a unique solution. Take the difference between a hypothetical suboptimal solution and the true optimum. This error function likewise has the property that the value at each game state is some weighted average of the values at its successors. That means that from any state with maximal error, you can only reach other states with maximal error. But eventually all states reach a win condition, where even incorrect solutions must agree that your probability of winning is 1. So the maximal error is zero.

The usual proof that these "Bellman equations" have a unique solution requires that you include a parameter gamma<1 and have rewards n turns in the future discounted by a multiplicative factor of gamma^n. Then the maximal error is gamma times the weighted average of the error at that state's successors, hence all errors are zero. Here we've set gamma=1, but we can appeal to the boundary conditions to get zero error again.

--Michael Kleber -- It is very dark and after 2000. If you continue you are likely to be eaten by a bleen.
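The argument above says that with gamma=1 the boundary conditions (terminal win states) still force a unique fixed point, so plain value iteration should converge to it from any starting guess. A minimal sketch, assuming standard Pig rules (d6; rolling a 1 forfeits the turn total and passes the turn; otherwise roll again or hold to bank; first to GOAL wins) and a small GOAL so pure Python converges quickly; the function name and the choice to require at least one roll per turn are my own:

```python
def solve_pig(goal=20, eps=1e-10):
    """Compute P(i, j, k) = win probability under optimal play, by value
    iteration on the Bellman equations with gamma = 1."""
    # States (i, j, k): my banked score, opponent's banked score,
    # points accumulated (at risk) this turn.  i + k < goal always.
    states = [(i, j, k)
              for i in range(goal)
              for j in range(goal)
              for k in range(goal - i)]
    # Any starting guess works: the fixed point is unique.
    P = {s: 0.5 for s in states}

    def backup(i, j, k):
        # Hold: bank k, opponent moves from (j, i+k, 0).  We require at
        # least one roll per turn (an assumption), so holding at k = 0
        # is disallowed rather than modeled as a free pass.
        hold = 1.0 - P[(j, i + k, 0)] if k > 0 else 0.0
        # Roll: a 1 (prob 1/6) hands over the turn with nothing banked;
        # 2..6 either win outright or grow the turn total.
        roll = (1.0 - P[(j, i, 0)]) / 6.0
        for r in range(2, 7):
            roll += (1.0 if i + k + r >= goal else P[(i, j, k + r)]) / 6.0
        return max(hold, roll)

    while True:
        delta = 0.0
        for s in states:
            new = backup(*s)
            delta = max(delta, abs(new - P[s]))
            P[s] = new
        if delta < eps:
            return P
```

Running it on a small board shows the expected behavior: every value lands strictly between 0 and 1, and the first player has an edge from the symmetric start.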