Hi,

On 6/23/20 12:47, Allan Wechsler wrote:
> I am not "more informed", so perhaps you should discount my thoughts on the subject.
Ok, sure :).
> I also don't yet understand the cheat or exploit that has you worried.
Hmmm... I'm not seeing it necessarily as a cheat or exploit, but rather as an unintended consequence of training. That is, one would like to believe training produces quality X, but it actually produces quality X + Y --- you test for X and it's there, but how do you test for Y *not* being there too? With software, you could conceivably look at the source code and do something about that. With an NN? Now what?
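As a toy illustration of that testing gap (everything here is made up --- the policies, the "hidden" input --- nothing resembling lc0 or Stockfish): two policies can agree on every position a finite test suite visits and still differ elsewhere, so the suite certifies X while saying nothing about Y.

```python
# Hypothetical sketch: quality X ("plays the expected move on every tested
# position") is verified; quality Y (divergent behavior on an untested
# position) slips through, because no finite test suite covers all inputs.

def policy_clean(position: int) -> int:
    return position % 7                    # stand-in for "choose a move"

def policy_with_y(position: int) -> int:
    if position == 123_456_789:            # hidden behavior, never tested
        return -1
    return position % 7

test_suite = range(10_000)

# The test for X passes for both policies...
assert all(policy_clean(p) == policy_with_y(p) for p in test_suite)

# ...yet Y is there, visible only on an input the suite never exercises.
assert policy_clean(123_456_789) != policy_with_y(123_456_789)
```

The sketch is trivial on purpose: with source code you could at least grep for the special-cased input; with a trained network there is no analogous place to look.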
> LCZero trains against itself, and that training is complete (and the network frozen, no?) by the time it faces Stockfish. If I understand you correctly, you are worried that LCZero is responding to a change in Stockfish's mood, interpreting the change as a sign that Stockfish has noticed that it is losing. LCZero then changes strategy itself to take advantage of Stockfish's inferred weakness.
I thought it was more like this: "lc0 shuffling" means "hey, other side, I have no idea what to do here --- help me win, please, so we both survive training?". By shuffling, it causes (in this case) Stockfish to shuffle, and eventually that shuffling causes Stockfish to react, which lc0 interprets as *itself* helping *itself* win --- a good trait in training, because nets that win, survive. That is, such self-helping behavior could be selected for by the training.
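The selection argument above can be sketched as a toy genetic algorithm (purely illustrative, nothing resembling lc0's actual training pipeline; the fitness bonus is an assumption of the toy model): if a heritable "signal-and-help" trait even slightly increases the chance of surviving selection, it spreads through the population without anyone ever having tested for it.

```python
import random

random.seed(0)
POP, GENS = 200, 60

def fitness(genome):
    strength, signals = genome
    # Assumption of the toy model: the signal-and-help trait converts some
    # drawn-out games into wins, a small edge the selection step rewards.
    return strength + (0.1 if signals else 0.0)

def mutate(genome):
    strength, signals = genome
    strength += random.gauss(0, 0.02)      # ordinary drift in playing strength
    if random.random() < 0.01:             # rare mutation flips the hidden trait
        signals = not signals
    return (strength, signals)

# Start with a population that is strong but signal-free.
pop = [(random.gauss(1.0, 0.05), False) for _ in range(POP)]

for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    survivors = pop[:POP // 2]             # "nets that win, survive"
    pop = survivors + [mutate(random.choice(survivors))
                       for _ in range(POP - len(survivors))]

freq = sum(1 for _, signals in pop if signals) / POP
print(f"signal-trait frequency after selection: {freq:.2f}")
```

The point of the sketch is only that nothing in the loop checks for the trait; it is carried along because it correlates with winning.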
> (a) What exactly is wrong with this? A strong human chess player constantly strives to interpret their opponent's mood, both by watching the action on the board and by watching the opponent's face and mannerisms. If Alice sees Bob flinch, and infers that she has an advantage she has not noticed to that point, she is likely to press, to look harder for weaknesses. Shame on Bob for not having a better poker-face, and likewise, shame on Stockfish for betraying its judgement so transparently in its play.
Like I said, I do not care that anything called 'lc0' or 'stockfish' or 'fruit loops the color of neon banana peel' is playing. I am using this only as an illustration of the general phenomenon whereby self-play training could conceivably select for unintended behavioral traits. And then the problem is: how do you prevent that, and how do you know for a fact that it didn't happen?
> (b) How could LCZero conceivably learn this trick? It never sees Stockfish play until its training is over and its network locked down. If part of the training regimen were to play millions of games against Stockfish, I would understand the problem (but again, shame on Stockfish for playing transparently like that).
As I explained above, it doesn't happen all the time. Sometimes there is shuffling, and after a long while lc0 seemingly realizes the game was a draw all along. So, no help => draw. Let me emphasize again: I couldn't care less what the names of the chess-playing programs are, or even that they are playing chess. I am interested in the phenomenon of unintended traits selected for by self-play training.
> (c) There is no "trusted compiler" in which to install the feared Thompsonian backdoor, unless you mean the LCZero "engine" (the part that serves as a fixed interpreter for the variable neural net).
The nets are "trusted" to be strong because they win (and demonstrably so). But there is no way to tell what the nets are doing in order to "win" during self-play training. How do you know the strongest nets do not cheat against themselves in self-play, by helping themselves win in ways that are imperceptible to others and that other nets cannot match? How do you know they are not backdooring their own wins to some extent? Their survival depends on winning. How would you detect that this is happening? And if you can't detect it, why wouldn't it happen?
> In what way could such a cheat work? I'm not seeing enough similar pieces to justify the analogy with Thompson's exploit. (Thompson equipped his gimmicked compiler to specially detect when it was compiling two different programs, the login handler and the compiler itself. On all other programs it behaved as advertised.)
Right, it's an incomplete analogy. The part I'm focusing on is that in Thompson's compiler case, you could at least disassemble the compiler binary and find the unintended code. How do you do that with an NN?

Andres.