Re: [math-fun] Ken Thompson's compiler hack --- NN applicability?
Before going deep into this... What do you mean by "looks like it's getting stronger," and what do you mean by "helps [them] win?" Helps *which of the two* identical players to win? Win against whom? How could they be changing things that outsiders couldn't see? What could they learn to do except continue to seem to be playing the game? I can imagine, say with chess or go, the NN learning to play a fragile but impressive-looking version of the game. Neither side realizes they are making bad moves, they just have a sort of superstitious view of the game that the other side keeps reinforcing. And the game is confusing enough that human observers can't tell it's crazy. But I can't see how the NN could tell whether it was confusing outside observers, and if it can't tell whether it's succeeding, I don't see how a collaboration strategy could be evolutionarilly maintained. The longer time went on, the more likely the fragile game would be noticed. Or the player itself would notice and suddenly go back to learning the basics. Thompson's hack works because everyone trusts the compiler to referee its own development process. Then Thompson hacks the referee. After that, the hack part never learns. So you would need a situation where people have put an NN in charge of running its own training process. Even then it's harder than Thompson's hack, but I'm going to stop being evil now. There is a C compiler whose compiled code has been automatically proven to match the semantics of its source code. I don't know whether the prover was compiled with that very same C compiler. --Steve
From: Andres Valloud <ten@smallinteger.com> Date: 6/23/20, 2:58 PM
Hi, suppose you're training a neural network via self play. It looks like it's getting stronger. How do you know the versions that get promoted do not also encode, in themselves, by chance, a collaboration mechanism that helps then win?
That is, how do you know the strongest nets do not also help the winning side win when they play the losing side?
How do you know they are not implementing Thompson's compiler hack?
Hi, On 6/24/20 12:08, Steve Witham wrote:
Before going deep into this... What do you mean by "looks like it's getting stronger,"
I mean that the NN seems to be getting stronger as per the self-play training regime.
and what do you mean by "helps [them] win?"
I mean that, since only the NNs that win will survive the self-play training regime, the NNs that help themselves win will have an evolutionary advantage over those than don't. This is fascinating because it appears to imply that emergent collaboration beats individualism.
Helps *which of the two* identical players to win? Win against whom?
The "them" referred to the NNs that help themselves win in self-play. In self-play, you'd have the same network playing against itself. So, I meant that a network that helps the winning side win could get promoted preferentially over networks that obstinately refuse to lose.
How could they be changing things that outsiders couldn't see? What could they learn to do except continue to seem to be playing the game?
See the examples posted by others to this thread. It does happen.
Thompson's hack works because everyone trusts the compiler to referee its own development process. Then Thompson hacks the referee. After that, the hack part never learns. So you would need a situation where people have put an NN in charge of running its own training process. Even then it's harder than Thompson's hack, but I'm going to stop being evil now.
Hmmm, I think you could still have people in charge of the training process, as the posted examples show. The issue here is that the refereeing is done by performance, rather than by auditing whether the method is sound. And determining whether the method is actually sound is hard, because I hear that in general terms it's from very difficult to impossible to audit what an NN has actually learned.
There is a C compiler whose compiled code has been automatically proven to match the semantics of its source code. I don't know whether the prover was compiled with that very same C compiler.
Are you referring to the verified optimizing C compiler Airbus produced? Andres.
participants (2)
-
Andres Valloud -
Steve Witham