Me today:
Nope; in fact, the set of equations really does have a unique solution. Take the difference between a hypothetical alternative solution and the true optimum. This error function likewise has the property that its value at each game state is a weighted average of its values at the successor states. That means that from any state with maximal error, you can only reach other states with maximal error. But every line of play eventually reaches a win condition, where even an incorrect solution must agree that your probability of winning is 1, so the error there is zero. Hence the maximal error is zero.
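The argument above can be written out in symbols (my notation, not from the original post): let P be the true solution, P' another solution of the same equations, and e = P' - P the error.

```latex
% For a non-terminal state s with successor weights w_{s,t}, \sum_t w_{s,t} = 1,
% both solutions satisfy the same averaging equations (the choice of optimal
% decision only helps, since |\max_a x_a - \max_a y_a| \le \max_a |x_a - y_a|):
|e(s)| \;\le\; \sum_{t \in \mathrm{succ}(s)} w_{s,t}\,|e(t)|
       \;\le\; \max_{t \in \mathrm{succ}(s)} |e(t)|.
% Let M = \max_s |e(s)|.  If |e(s)| = M, the weighted average forces
% |e(t)| = M for every successor t with w_{s,t} > 0.  But every line of play
% ends at a terminal state s^* with P'(s^*) = P(s^*) = 1, i.e. e(s^*) = 0,
% so M = 0 and the solution is unique.
```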
One way to compute the P(i, j, k) is to start with some arbitrary values (say, all 0), compute the correct decisions based on these P(i, j, k), then compute revised values of P(i, j, k) based on those decisions by taking the appropriate linear combinations of the old P(i, j, k), and iterate. The way to see that this is guaranteed to converge to the proper values is to realize that the nth iteration corresponds to calculating the exact probabilities and strategies for the game "Pig, modified with the rule that if the game takes longer than n rolls, Player 1 wins". I wrote a solver for "one-checker-each backgammon, using the doubling cube" this way, and it converged quite rapidly. Convergence might take a bit longer for Pig, since a typical game is longer, but it seems clear that modifying the game by declaring me the winner after 1000 rolls would not materially affect the outcome. -- Andy.Latto@pobox.com
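The iteration described above can be sketched in a few lines of Python. This is my own illustration, not Andy Latto's solver; it assumes the standard Pig rules (roll a die each turn: a 1 forfeits the turn total and passes the turn, 2-6 add to the turn total, and you may hold to bank it) and takes P(i, j, k) to be the probability that the player about to act wins, with banked score i against the opponent's j and turn total k. The real game plays to 100; the goal is a parameter so the demo converges quickly.

```python
def solve_pig(goal=100, tol=1e-7):
    """Value iteration for Pig: repeat 'best decision given current P,
    then recompute P from those decisions' until the values stop moving."""
    # Non-terminal states: both scores below goal, and i + k still below goal.
    states = [(i, j, k)
              for i in range(goal)
              for j in range(goal)
              for k in range(goal - i)]
    P = {s: 0.0 for s in states}  # arbitrary starting values (all 0)

    while True:
        delta = 0.0
        for (i, j, k) in states:
            # Roll: a 1 loses the turn total and passes the turn;
            # 2..6 add to the turn total, winning outright if i+k+d >= goal.
            roll = (1.0 - P[(j, i, 0)]) / 6.0
            for d in range(2, 7):
                roll += (1.0 if i + k + d >= goal
                         else P[(i, j, k + d)]) / 6.0
            # Hold: bank the turn total, then it is the opponent's turn.
            hold = 1.0 - P[(j, i + k, 0)]
            # You must roll at least once each turn (k == 0 means no banking).
            new = max(roll, hold) if k > 0 else roll
            delta = max(delta, abs(new - P[(i, j, k)]))
            P[(i, j, k)] = new
        if delta < tol:
            return P

P = solve_pig(goal=10)         # tiny goal so the sketch runs in seconds
print(round(P[(0, 0, 0)], 4))  # first player's winning probability
```

Each full sweep corresponds to one step of the truncated-game picture: after n sweeps, P holds (approximately) the exact values for the game cut off after n decisions.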