Specifically, this is with regard to Leela Chess Zero. If you look at many games played against Stockfish, especially older ones, you get the impression that this happens:

1. A locked position develops; neither side makes progress. Both evaluations are mildly in favor of lc0.
2. Lc0 starts shuffling, i.e. making moves that do not improve its position (and also do not make it worse). Stockfish does the same.
3. But, eventually, Stockfish's evaluation shows a big advantage for lc0 that lc0 does not yet see. Stockfish reacts.
4. Lc0 then perceives the reaction, plays into the new weakness, and eventually wins.

So, how do you know the lc0 nets that get promoted as strong are not encoding, in themselves, the hints their own winning sides need to survive the training regime? That is, is the self-play training process optimizing for wins achieved unilaterally, or collaboratively?

Of course, this example is specific to lc0 in that I first thought this was possible while watching those particular games. However, I am interested in the general concept. An issue I see is that, without an audit trail, finding out whether an NN did this on its own is going to be very difficult. I'd imagine a way to ameliorate this problem is diversity (as in biological diversity, and for the same reasons).

I'm interested to hear more informed opinions on this. Are there any?

On 6/23/20 12:13, Allan Wechsler wrote:
There certainly are concerns of this kind, which can mostly be allayed by a proper training regimen. You'd have to be a bit more explicit about the training regimen you envisage before specific concerns about the contestants gaming your regimen could be addressed. In particular: how does a contestant determine that it is on the "losing side"?
On Tue, Jun 23, 2020 at 2:59 PM Andres Valloud <ten@smallinteger.com> wrote:
Hi, suppose you're training a neural network via self-play. It looks like it's getting stronger. How do you know the versions that get promoted do not also encode, in themselves, by chance, a collaboration mechanism that helps them win?
That is, how do you know the strongest nets do not also help the winning side win when they play the losing side?
How do you know they are not implementing Thompson's compiler hack?
Andres.
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com https://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun