Me yesterday:
Their way to solve the game is to know all values of P(i,j,k), your probability of winning, where (i,j,k) = (#pts for you, #pts for the other player, #pts you've accumulated so far on this turn, which you'd risk by rolling again). [...] I do believe that if you knew the correct roll/don't-roll decisions, you could solve for the probabilities. But it seems to me that you could have some set of P()s and a sub-optimal set of roll/don't-roll decisions which are compatible with one another, but suboptimal globally.
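To make the fixed-point structure concrete, here is one way the equations might look. This is a sketch under the assumption that the game is standard Pig (roll a d6; a 1 forfeits the turn total and passes the turn; otherwise roll again or hold to bank) with target score G, none of which is spelled out in the thread:

P(i,j,k) = max( hold, roll ), where
  hold = 1 - P(j, i+k, 0)
  roll = (1/6)(1 - P(j, i, 0)) + sum over r=2..6 of (1/6) * [ 1 if i+k+r >= G, else P(i, j, k+r) ]

Each P(i,j,k) is a weighted average (after the max) of the values at its successor states, which is exactly the property the uniqueness argument below leans on.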
Me today: Nope, in fact, the set of equations really does have a unique solution. Take the difference between a hypothetical suboptimal solution and the true optimum. This error function likewise has the property that the value at each game state is some weighted average of the values at its successors. That means that from any state with maximal error, you can only reach other states with maximal error. But eventually all states reach a win condition, where even incorrect solutions must agree that your probability of winning is 1. So the maximal error is zero.

The usual proof that these "Bellman equations" have a unique solution requires that you include a parameter gamma<1 and have rewards n turns in the future discounted by a multiplicative factor of gamma^n. Then the maximal error is gamma times the weighted average of the error at that state's successors, hence all errors are zero. Here we've set gamma=1, but we can appeal to the boundary conditions to get zero error again.

--Michael Kleber -- It is very dark and after 2000. If you continue you are likely to be eaten by a bleen.
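The argument above says that with gamma=1 the boundary conditions (terminal win states) still force a unique fixed point, so plain value iteration should converge to it from any starting guess. A minimal sketch, assuming standard Pig rules (d6; rolling a 1 forfeits the turn total and passes the turn; otherwise roll again or hold to bank; first to GOAL wins) and a small GOAL so pure Python converges quickly; the function name and the choice to require at least one roll per turn are my own:

```python
def solve_pig(goal=20, eps=1e-10):
    """Compute P(i, j, k) = win probability under optimal play, by value
    iteration on the Bellman equations with gamma = 1."""
    # States (i, j, k): my banked score, opponent's banked score,
    # points accumulated (at risk) this turn.  i + k < goal always.
    states = [(i, j, k)
              for i in range(goal)
              for j in range(goal)
              for k in range(goal - i)]
    # Any starting guess works: the fixed point is unique.
    P = {s: 0.5 for s in states}

    def backup(i, j, k):
        # Hold: bank k, opponent moves from (j, i+k, 0).  We require at
        # least one roll per turn (an assumption), so holding at k = 0
        # is disallowed rather than modeled as a free pass.
        hold = 1.0 - P[(j, i + k, 0)] if k > 0 else 0.0
        # Roll: a 1 (prob 1/6) hands over the turn with nothing banked;
        # 2..6 either win outright or grow the turn total.
        roll = (1.0 - P[(j, i, 0)]) / 6.0
        for r in range(2, 7):
            roll += (1.0 if i + k + r >= goal else P[(i, j, k + r)]) / 6.0
        return max(hold, roll)

    while True:
        delta = 0.0
        for s in states:
            new = backup(*s)
            delta = max(delta, abs(new - P[s]))
            P[s] = new
        if delta < eps:
            return P
```

Running it on a small board shows the expected behavior: every value lands strictly between 0 and 1, and the first player has an edge from the symmetric start.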