Re: [math-fun] Vague question about modeling human skill at specific activities

10 Feb 2012

As someone who has played competitive poker, backgammon, and bridge,
I've thought about this a lot, mostly in the context of games that
have a definite luck component, since people are always asking
questions like "which game has more skill?", or "Is poker mostly a
game of skill or of luck?", and I've tried to figure out whether there
is even a precise way to frame such questions, let alone answer them.

In the context of games with a chance component, you can't just talk
about an "essentially zero" chance to win, since anyone can sometimes
beat anyone. So you have to choose some arbitrary threshold, say 95%,
and say that I'm "exactly one level better than you" if I beat you 95%
of the time. As long as we choose this threshold the same way for all
the games we're considering, we can still hope to make meaningful
intergame comparisons on things like number of levels. However, as Tom
points out,

On Fri, Feb 10, 2012 at 2:01 PM, Tom Rokicki <rokicki@gmail.com> wrote:
> ...
> I think the number of "levels" depends largely on how much
> variance we expect in performance from game to game, and
> how much the game counteracts that variance through
> "repeated measurement".

the length of the contest creates another obstacle to comparison. You
have to be a lot more skillful to be ahead after an hour of poker, or
win a 7-point match in backgammon, 95% of the time than you do to be
ahead after 10 hours of poker, or win a 21-point match in backgammon,
95% of the time. So unless you want to reach conclusions like "soccer
has less opportunity for skillful play than 'best-2-out-of-3 soccer
games', as can be seen by the fact that it has fewer levels", you want
to normalize by making the measure of a level something like "I am
exactly one level better than you if I win a 5-hour contest 90% of the
time".
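
To put rough numbers on this length effect, here is a minimal sketch. It
assumes a contest is a best-of-n series of independent games, each won
with the same fixed probability, which is a simplification that fits
neither a poker session nor a backgammon match exactly. The function
names are made up for the illustration.

```python
from math import comb

def series_win_prob(p, n_games):
    """Probability of winning a majority of n_games (odd) independent games,
    each won with probability p. Illustrative model only."""
    need = n_games // 2 + 1
    return sum(comb(n_games, k) * p**k * (1 - p)**(n_games - k)
               for k in range(need, n_games + 1))

def per_game_edge_for_level(threshold, n_games, tol=1e-9):
    """Smallest per-game win probability (found by bisection) for which the
    series is won at least `threshold` of the time."""
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if series_win_prob(mid, n_games) < threshold:
            lo = mid
        else:
            hi = mid
    return hi

# Per-game edge needed to be "one level better" at a 95% threshold,
# for contests of increasing length.
for n in (1, 3, 21):
    print(n, round(per_game_edge_for_level(0.95, n), 3))
```

Under these assumptions the per-game edge needed for one level shrinks
as the series gets longer, so the same range of players spans more
levels in the longer contest.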

A third obstacle to answering the question of how many levels of play
a game has is how to define the top and bottom of the scales. The top
is easy, since there are only a few reasonable choices; you can set
the top level as the level of the best human player in the world, or
the best human or computer player in the world, or perfect play
(though the last will only give you speculation, not data, unless the
game has been solved).

But defining the bottom level of a game is much less clear. You
mention the cutoff of "all players who know how to play and have a
racket." But the number of
levels of play is going to depend a lot on whether you count players
like me, who could barely play back in high school and haven't picked
up a racket since. And you get strange cultural influences on your
measurement: I suspect that there are more poker levels than
backgammon levels because almost everyone owns a deck of cards and knows the
rules of poker, while those who own a backgammon board and know the
rules are likely to be people with some interest and aptitude for
games.
The worst Go player in Japan is probably considerably worse than the
worst Go player in the US, because people who are bad at Go are much
more likely to have a Go board and know the rules if they live in
Japan than if they live in the US.

A related question concerns the ability to make rating systems with
good predictive value. Even in games where the ability to beat someone
is transitive, the correct rating system depends on "how transitive"
the game is. Let's suppose that we want a rating difference of 100
points to mean winning two games out of three. Then we structure the
ratings so that if you and I play a long series of games, and I win
2/3 of them, our ratings will converge to ratings with a difference of
100. But now in designing a rating system, you have to choose a value
of p such that if I play a long series of games against a player and
win a fraction p of them, our rating difference will converge to 200.
So it seems
to me that to design a rating system, you have to answer the question
"If A beats B 2/3 of the time, and B beats C 2/3 of the time, what
fraction of the time will A beat C?" I don't see any a priori reason
this number shouldn't vary from game to game, and if it doesn't match
the number chosen by the designer of the rating system, then either
playing a player 100 points lower, or 200 points lower, will tend to
change your rating, and any stable equilibrium will depend not only on
the relative skill of the players, but on how often they play
opponents at various skill disparities.
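
To make the "A beats B 2/3, B beats C 2/3" question concrete, here is a
small sketch comparing two possible answers. Both are modeling
assumptions of mine, not facts about any particular game: one supposes
that win odds multiply (the logistic form commonly used in Elo-style
systems), the other that skill gaps add on a normal-distribution scale.
The function names are hypothetical.

```python
from statistics import NormalDist

def logistic_compose(p_ab, p_bc):
    """P(A beats C) assuming win odds multiply (logistic assumption)."""
    odds = (p_ab / (1 - p_ab)) * (p_bc / (1 - p_bc))
    return odds / (1 + odds)

def normal_compose(p_ab, p_bc):
    """P(A beats C) assuming skill gaps add on a normal (probit) scale."""
    std = NormalDist()
    return std.cdf(std.inv_cdf(p_ab) + std.inv_cdf(p_bc))

p = 2 / 3
print("logistic:", logistic_compose(p, p))
print("normal:  ", normal_compose(p, p))
```

The two assumptions give slightly different values of p for the
200-point gap, and nothing guarantees that a real game follows either
one.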

Andy

Andy Latto