Monday, 02 July

On the difficulty of trying again

Chess

I don't even know how I ended up here. Something - an article, or maybe a video - prompted me to watch some commentary on the chess match between AlphaZero and Stockfish. AlphaZero is the AI system from DeepMind (part of Google/Alphabet) - it combines deep learning software with Google's proprietary TPU hardware. Stockfish is (was) the best chess program in the world. Ever since the 1997 match between the then world champion Garry Kasparov and Deep Blue, playing chess against computers has been pointless if the goal is to actually beat them.

Of course, that's because computers are not like people - once you create great software, you can copy it, distribute it and improve it - even if it's illegal or whatever, you can do it if you want to. Humans not so much - the best way a human can (so far) pass on a lifetime of experience before dying is through the slow, inefficient and incomplete multi-level interface of brain-words-mouth-air-ear-words-brain. Or a similar path in writing. And errors occur at each interface - your thoughts and emotions are transcribed imperfectly into words, the brains on the two sides understand the words differently, and transcribing them back into similar feelings is difficult. We call this empathy. It's something like json.dumps(yaml.safe_load('{}')) - it may work perfectly sometimes, but most of the time it's impossible to get the same message out.
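The lossy round trip can be sketched in a few lines. This is only an illustration, and it swaps YAML for Python's standard json module on both sides so it runs without extra packages; the "thought" being sent is made up.

```python
import json

def round_trip(message):
    """Encode a Python object as JSON text, then decode it back."""
    return json.loads(json.dumps(message))

# A "thought" with structure that JSON cannot express exactly:
# an integer key and a tuple value.
thought = {1: ("joy", "fear")}
received = round_trip(thought)

print(thought)              # what the sender meant
print(received)             # what the receiver reconstructed
print(thought == received)  # did the message survive intact?
```

The key comes back as a string and the tuple comes back as a list, so the receiver holds something similar but not identical. A message made only of strings and lists would have survived untouched, which is the "works perfectly sometimes" case.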

Through thousands of generations of painful trial and error we were able to invent science, engineering and machines that are now beating us in some areas. And as argued above, once computers are able to beat us, they will always be able to beat us - because they can copy their collective knowledge. If I have a child, there is no way to transfer all the human knowledge that has been acquired. A tiny fraction takes a lifetime. A tiny, tiny, tiny fraction. How tiny? Play a game - go and click on Wikipedia's random article page. Do this a hundred times. Give yourself a point if you have even heard anything about the title. Give yourself two points if you know what the title means. Three if you can say anything beyond the description. Five if you consider yourself to know quite a bit about it. You may score 10-15 points if you are lucky. Generally, you would score around 5. And the number of articles on Wikipedia is still something you can imagine - in the English Wikipedia it's around 5.5 million.

Okay, back on point - chess and AI. Even though it's now meaningless to play chess against computers, people still enjoy analyzing the games between people and machines and between machines themselves. Chess is an unsolved game - it's unknown whether there is a perfect strategy that guarantees a win (or at least a draw), the way there is for much smaller games like Tic-Tac-Toe. And most probably we will never actually solve it in the mathematical sense. The only thing we can do is optimize solutions by trial and error. So far, the trial and error has been inspired by the human grandmasters and the experience they have gathered through hundreds of years of playing chess, recording and analyzing the games. By fine-tuning the values of the pieces, converting chess positions into numbers that express the advantage of one side over the other, analyzing the possible moves at each turn, searching deep into the game tree with an almost brute-force (but pruned) minimax algorithm, and doing all of that quickly on fast hardware - we were able to beat humans. Computers don't feel emotions, don't get caught in the moment, yada-yada.
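To make the pruned search concrete, here is a minimal sketch of minimax with alpha-beta pruning on Tic-Tac-Toe, a game small enough to search to the very end. It is a toy, not how any chess engine is actually written: for chess the search has to stop at some depth and fall back on the kind of position scoring described above. All names here are my own.

```python
from typing import List, Optional

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board: List[str]) -> Optional[str]:
    """Return 'X' or 'O' if that side has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board: List[str], player: str,
            alpha: int = -2, beta: int = 2) -> int:
    """Value of the position for X with `player` to move:
    +1 means X wins, 0 means draw, -1 means O wins."""
    won = winner(board)
    if won is not None:
        return 1 if won == "X" else -1
    if " " not in board:
        return 0  # board is full and nobody won
    best = -2 if player == "X" else 2
    for i in range(9):
        if board[i] != " ":
            continue
        board[i] = player  # try the move
        score = minimax(board, "O" if player == "X" else "X", alpha, beta)
        board[i] = " "     # undo the move
        if player == "X":  # X picks the highest value
            best = max(best, score)
            alpha = max(alpha, best)
        else:              # O picks the lowest value
            best = min(best, score)
            beta = min(beta, best)
        if alpha >= beta:
            break  # prune: the opponent would never allow this line
    return best

empty_board = [" "] * 9
print(minimax(empty_board, "X"))  # value of the empty board under perfect play
```

This prints 0: with perfect play from both sides Tic-Tac-Toe is a draw. Nobody can run the same exhaustive search for chess, which is why engines rely on pruning, depth limits and hand-tuned evaluation.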

AlphaZero is a new category.

Why? Well, for one, according to the paper, AlphaZero did not have the advantage of those hundreds of years - collectively, hundreds of thousands of human years - of grandmaster play, fine-tuned parameters and optimized search trees. But it did have the advantage of playing against different versions of itself for the equivalent of hundreds of thousands of human years.

Yup. It started as a tabula rasa with only the rules of chess, as you would teach them to a beginner. No "develop a strong center in the beginning", no "castle early", not even "keep a material advantage". It was up to the machine to figure that out. Basically the idea as I understand it now is this (I have yet to get into the "Machine learning" thing, so I'm simplifying because I don't know more yet) - encode the rules in old-fashioned code and add a neural network that assigns weights - initially random - to decisions based on the outcome of win/draw/lose. Then copy yourself with a few slightly tweaked weights. And then play the two versions against each other a hundred times. The one that wins survives and saves its weights. A new copy is made with slightly tweaked parameters and played against the winner. And so on for millions and millions of games until there is one version you actually want to enter into a competition. Brute-forced trial and error.
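The loop as described in this post (copy, tweak, play a hundred games, keep the winner) can be sketched as below. Treat it as an illustration of that simplified picture only, not of how AlphaZero is actually trained: as far as I can tell from the paper, the real system learns from self-play guided by tree search and updates its network by gradient descent. The "game" is a made-up stand-in where skill is just closeness to a hidden set of ideal weights, and every name and number is hypothetical.

```python
import random

def strength(weights):
    """Toy stand-in for playing skill: closer to a hidden ideal is stronger."""
    hidden_ideal = [0.3, -0.7, 0.5]  # made up for this illustration
    return -sum((w - t) ** 2 for w, t in zip(weights, hidden_ideal))

def play_game(a, b, rng):
    """Return True if player `a` wins one noisy game against `b`."""
    # The stronger player wins more often, but never with certainty.
    edge = strength(a) - strength(b)
    return rng.random() < 0.5 + max(-0.45, min(0.45, edge))

def play_match(a, b, rng, games=100):
    """Count how many of `games` games player `a` wins against `b`."""
    return sum(play_game(a, b, rng) for _ in range(games))

def mutate(weights, rng, step=0.1):
    """Copy the weights and nudge each one by a small random amount."""
    return [w + rng.uniform(-step, step) for w in weights]

def train(generations=200, seed=0):
    """Repeatedly pit a tweaked copy against the champion; the winner survives."""
    rng = random.Random(seed)
    champion = [rng.uniform(-1, 1) for _ in range(3)]  # random starting weights
    for _ in range(generations):
        challenger = mutate(champion, rng)
        if play_match(challenger, champion, rng) > 50:
            champion = challenger
    return champion

champion = train()
print(champion, strength(champion))
```

Because single games are noisy, a slightly worse challenger can occasionally take over, which is why each pairing is settled over a hundred games rather than one.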

As I watched some of the interviews in which the grandmasters of old - Garry Kasparov, Anatoly Karpov, Viswanathan Anand - commented on AlphaZero, they were quite amazed at the games the machine played, calling its play "alien". Of course it's alien, we haven't seen intelligence greater than human (so far). By definition, it's alien. And again remember - it played not against a mere mortal human, but against the best computer program up to that point. Sometimes it would completely sacrifice "common rules" like material advantage for positional advantage, or play lines that Stockfish would see as errors but later, in its analysis, Stockfish would come around and see them as winning indeed. As a child, I remember being fascinated by chess and the chess masters. I thought they had superhuman brains, able to think so much better than the average Joe that they saw things we can't see. Now I feel they are pretty much the average Joe, or me, looking at the machines in the same way.

The bigger picture

I feel curiosity and complete awe but, more interestingly, resistance when learning about these advancements. What I gather from this is that the best way for something to become the best at whatever it's doing is continuous trial and error. Humans are not the best subjects for that. We feel pain when we lose. Sometimes the pain is too great for us to move on. Do computers feel that when you give them -1 point for losing a game? "Don't be silly, computers don't feel!". And probably that's why they can learn so quickly and not "fall into depression". Or maybe computer depression is when a neural network goes so bad that instead of improving itself, it somehow enters a vicious cycle - is that even possible? I don't know, but I am curious to learn.

I went on a run after these videos and started thinking about the implications of this. Here it is - a machine trained from scratch for about 4 hours learned to beat the best of the best in a small domain like chess. It learned by continuous trial, tweak, play, lose, adapt, try again. It beat "learn from history". It was able to "live" thousands of generations of human lives completely dedicated to chess. In four hours. Or ten - doesn't matter, it's a very small amount of time compared to human lives. This is what exponential means. When you look at the great history of the Universe, life and human development (I recently read Sapiens) - it's all an exponential curve. It took a very long time for single-celled life to develop, then a very long time for multicellular life to develop, but compared to that everything else is an explosion. Watch this video, which squashes the development of the Universe into 10 minutes. Multicellular life doesn't appear until 1 minute before the end, and then everything happens in that last minute - marine life, plants, trees, lizards, dinosaurs. Primates are in the last 2 seconds, humans - in a blip of the last second. Scale the whole of human history into that 10-minute video, and computers will be in the last blip in the same way.
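The scaling behind the 10-minute video is simple arithmetic, sketched here. The only number not taken from this post is the age of the Universe, for which I assume the commonly cited estimate of about 13.8 billion years; the function names and the 100-year example are my own.

```python
# Assumption: the commonly cited age of the Universe, about 13.8 billion years.
UNIVERSE_AGE_YEARS = 13.8e9
VIDEO_SECONDS = 10 * 60  # the 10-minute video from the text

def years_for(video_seconds):
    """How many real years a stretch of the video stands for."""
    return video_seconds / VIDEO_SECONDS * UNIVERSE_AGE_YEARS

def seconds_before_end(years_ago):
    """How far from the end of the video an event `years_ago` would land."""
    return years_ago / UNIVERSE_AGE_YEARS * VIDEO_SECONDS

print(f"1 second of video = {years_for(1):,.0f} years")
print(f"last minute       = {years_for(60):,.0f} years")
print(f"last 2 seconds    = {years_for(2):,.0f} years")
# Hypothetical example input: something that happened 100 years ago.
print(f"100 years ago     = {seconds_before_end(100):.7f} s before the end")
```

Under that assumption one second of video stands for roughly 23 million years, so anything measured in decades or centuries lands within a few millionths of a second of the end.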

And then chess is the same way. Hundreds or thousands of years of play. Versus 4 hours of training of a machine.

The philosophy

It's difficult for humans to fail. It feels painful and if it happens too many times, you avoid failure.

But failure is the best way to learn. Think of it this way...

What if it wasn't millions of neural network failures spent on discovering chess? What if the neural networks could live your life a million times, each time trying something a little different, not caring if they lose or die sometimes? What would be your checkmate? What would be the best version of yourself if failing wasn't painful?

What if you anticipated pain but didn't care?

Humans avoid pain and seek pleasure. But what if you anticipated the pain, accepted it as one more iteration, and knew that this is the best way to learn, as proven by millions of generations of a computer program?

Food for thought...