Re: [math-fun] Ken Thompson's compiler hack --- NN applicability?

23 Jun 2020

      Specifically, this is with regard to Leela Chess Zero (lc0).  If you 
look at many of its games against Stockfish, especially older ones, 
you get the impression that this happens:

1.  A locked position develops, and neither side makes progress.  Both 
engines' evaluations are mildly in favor of lc0.

2.  Lc0 starts shuffling, i.e. making moves that do not improve its 
position (and also do not make it worse).  Stockfish does the same.

3.  But, eventually, Stockfish's evaluation shows a big advantage for 
lc0, that lc0 does not yet see.  Stockfish reacts.

4.  Lc0 then perceives the reaction, plays into the new weakness, and 
eventually wins.

So, how do you know the lc0 nets that get promoted as strong are not 
encoding, in themselves, the hints their own winning sides need to 
survive the training regime?  That is, is the self play training process 
optimizing for wins achieved unilaterally, or collaboratively?

Of course, this example is specific to lc0 only in that I first thought 
this was possible while watching those particular games.  However, I am 
interested in the general concept.

An issue I see is that, without an audit trail, finding whether an NN 
did this on its own is going to be very difficult.  I'd imagine a way to 
ameliorate this problem is diversity (as in biological diversity, and 
for the same reasons).
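
To make the diversity idea concrete, here is a minimal sketch of one 
possible check, assuming you can play both the promoted net and its 
predecessor against a pool of independently trained opponents.  The 
function names are hypothetical and are not part of lc0 or any real 
tool.  It compares the Elo gain measured inside the lineage with the 
gain measured against the outside pool.  Elo differences are not 
exactly additive, so treat a gap as a hint, not as proof.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score rate (win=1, draw=0.5, loss=0).

    Standard logistic Elo model; the score must lie strictly between 0 and 1.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score rate must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

def lineage_inflation(cand_vs_pred, cand_vs_pool, pred_vs_pool):
    """Elo gain seen inside the lineage minus the gain seen by outsiders.

    cand_vs_pred: candidate's score rate against its own predecessor
    cand_vs_pool: candidate's score rate against the independent pool
    pred_vs_pool: predecessor's score rate against the same pool
    (all three are hypothetical inputs you would have to measure)
    """
    internal_gain = elo_diff(cand_vs_pred)
    external_gain = elo_diff(cand_vs_pool) - elo_diff(pred_vs_pool)
    return internal_gain - external_gain

if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    gap = lineage_inflation(cand_vs_pred=0.60, cand_vs_pool=0.51, pred_vs_pool=0.50)
    print(f"internal gain exceeds external gain by {gap:.1f} Elo")
```

A large positive gap would mean the new net beats its own ancestors by 
more than outsiders can confirm, which is the pattern you would expect 
if the losing side were helping.  It would not tell you why.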

I'm interested to hear more informed opinions on this.  Are there any?

On 6/23/20 12:13, Allan Wechsler wrote:
...
There certainly are concerns of this kind, which can mostly be allayed by
proper training regimen. You'd have to be a bit more explicit about the
training regimen you envisage, before specific concerns about the
contestants gaming your regimen could be addressed. In particular: how does
a contestant determine that it is on the "losing side"?

On Tue, Jun 23, 2020 at 2:59 PM Andres Valloud <ten@smallinteger.com> wrote:
...
Hi, suppose you're training a neural network via self-play.  It looks
like it's getting stronger.  How do you know the versions that get
promoted do not also encode, in themselves, by chance, a collaboration
mechanism that helps them win?

That is, how do you know the strongest nets do not also help the winning
side win when they play the losing side?

How do you know they are not implementing Thompson's compiler hack?

Andres.
_______________________________________________
math-fun mailing list
math-fun@mailman.xmission.com
https://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun