Hi,

On 6/23/20 12:47, Allan Wechsler wrote:
> I am not "more informed", so perhaps you should discount my thoughts on the subject.
Ok, sure :).
> I also don't yet understand the cheat or exploit that has you worried.
Hmmm... I'm not seeing it necessarily as a cheat or exploit, but rather as an unintended consequence of training. That is, one would like to believe training produces quality X, but it actually produces quality X + Y --- you test for X and it's there, but how do you test for Y *not* being there too? With software, you could conceivably look at the source code and do something about that. With an NN? Now what?
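As a toy illustration of that testing gap (everything here is made up --- the policies, the "hidden" input --- nothing resembling lc0 or Stockfish): two policies can agree on every position a finite test suite visits and still differ elsewhere, so the suite certifies X while saying nothing about Y.

```python
# Hypothetical sketch: quality X ("plays the expected move on every tested
# position") is verified; quality Y (divergent behavior on an untested
# position) slips through, because no finite test suite covers all inputs.

def policy_clean(position: int) -> int:
    return position % 7                    # stand-in for "choose a move"

def policy_with_y(position: int) -> int:
    if position == 123_456_789:            # hidden behavior, never tested
        return -1
    return position % 7

test_suite = range(10_000)

# The test for X passes for both policies...
assert all(policy_clean(p) == policy_with_y(p) for p in test_suite)

# ...yet Y is there, visible only on an input the suite never exercises.
assert policy_clean(123_456_789) != policy_with_y(123_456_789)
```

The sketch is trivial on purpose: with source code you could at least grep for the special-cased input; with a trained network there is no analogous place to look.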
> LCZero trains against itself, and that training is complete (and the network frozen, no?) by the time it faces Stockfish. If I understand you correctly, you are worried that LCZero is responding to a change in Stockfish's mood, interpreting the change as a sign that Stockfish has noticed that it is losing. LCZero then changes strategy itself to take advantage of Stockfish's inferred weakness.
I thought it was more like this: "lc0 shuffling" means "hey, other side, I have no idea what to do here --- help me win, please, so we both survive training?". By shuffling, it causes (in this case) Stockfish to shuffle, and eventually that shuffling causes Stockfish to react, which lc0 interprets as *itself* helping *itself* win --- a good trait in training, because nets that win, survive. That is, such self-helping behavior could be selected for by the training.
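The selection argument above can be sketched as a toy genetic algorithm (purely illustrative, nothing resembling lc0's actual training pipeline; the fitness bonus is an assumption of the toy model): if a heritable "signal-and-help" trait even slightly increases the chance of surviving selection, it spreads through the population without anyone ever having tested for it.

```python
import random

random.seed(0)
POP, GENS = 200, 60

def fitness(genome):
    strength, signals = genome
    # Assumption of the toy model: the signal-and-help trait converts some
    # drawn-out games into wins, a small edge the selection step rewards.
    return strength + (0.1 if signals else 0.0)

def mutate(genome):
    strength, signals = genome
    strength += random.gauss(0, 0.02)      # ordinary drift in playing strength
    if random.random() < 0.01:             # rare mutation flips the hidden trait
        signals = not signals
    return (strength, signals)

# Start with a population that is strong but signal-free.
pop = [(random.gauss(1.0, 0.05), False) for _ in range(POP)]

for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    survivors = pop[:POP // 2]             # "nets that win, survive"
    pop = survivors + [mutate(random.choice(survivors))
                       for _ in range(POP - len(survivors))]

freq = sum(1 for _, signals in pop if signals) / POP
print(f"signal-trait frequency after selection: {freq:.2f}")
```

The point of the sketch is only that nothing in the loop checks for the trait; it is carried along because it correlates with winning.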
> (a) What exactly is wrong with this? A strong human chess player constantly strives to interpret their opponent's mood, both by watching the action on the board and by watching the opponent's face and mannerisms. If Alice sees Bob flinch, and infers that she has an advantage she has not noticed to that point, she is likely to press, to look harder for weaknesses. Shame on Bob for not having a better poker-face, and likewise, shame on Stockfish for betraying its judgement so transparently in its play.
Like I said, I do not care that anything called 'lc0' or 'stockfish' or 'fruit loops the color of neon banana peel' is playing. I am using this only as an illustration of the general phenomenon whereby self-play training could conceivably select for unintended behavioral traits. And then the problem is: how do you prevent that, and how do you know for a fact that it didn't happen?
> (b) How could LCZero conceivably learn this trick? It never sees Stockfish play until its training is over and its network locked down. If part of the training regimen were to play millions of games against Stockfish, I would understand the problem (but again, shame on Stockfish for playing transparently like that).
As I explained above, it doesn't happen all the time. Sometimes there is shuffling, and after a long while lc0 seemingly realizes the game was a draw all along. So, no help => draw. Let me emphasize again: I couldn't care less what the names of the chess-playing programs are, or even that they are playing chess. I am interested in the phenomenon of unintended traits selected for by self-play training.
> (c) There is no "trusted compiler" in which to install the feared Thompsonian backdoor, unless you mean the LCZero "engine" (the part that serves as a fixed interpreter for the variable neural net).
The nets are "trusted" to be strong because they win (and demonstrably so). But there is no way to tell what the nets are doing in order to "win" during self-play training. How do you know the strongest nets do not cheat against themselves in self-play, by helping themselves win in ways that are imperceptible to others and that other nets cannot match? How do you know they are not backdooring their own wins to some extent? Their survival depends on winning. How would you detect that this is happening? And if you can't detect it, why wouldn't it happen?
> In what way could such a cheat work? I'm not seeing enough similar pieces to justify the analogy with Thompson's exploit. (Thompson equipped his gimmicked compiler to specially detect when it was compiling two different programs, the login handler and the compiler itself. On all other programs it behaved as advertised.)
Right, it's an incomplete analogy. The part I'm focusing on is that in Thompson's compiler case, you could at least disassemble the compiler binary and find the unintended code. How do you do that with an NN?

Andres.