AlphaGo won 8 straight games against two of the top players of one of the most computationally complex perfect-information games out there: Go.
Its complexity stems from the sheer number of possible scenarios that can play out over a game.
I thought that AlphaGo wasn’t really a breakthrough over the usual playing techniques, although I failed to say so in public – maybe I should cultivate the habit of predicting things publicly.
In any case, these are the advantages I think AlphaGo has over other Go AIs:
- More computing resources (memory, processing power)
- Access to better players
The first one is obvious. The second one, maybe not.
So here’s my take on the workings of AlphaGo: it combines the Monte Carlo Tree Search algorithm with a neural network or some other kind of pattern-matching mechanism.
The pattern-matching mechanism, particularly if it is a neural network, would benefit from playing a lot of games: it could learn to prefer analyzing a particular “branch”, or sequence of moves – roughly speaking, to pursue a train of thought – in a way that makes it likelier to win.
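To make that idea concrete, here is a minimal sketch of how a learned prior could bias which branch a tree search explores next. This is my own illustration, not AlphaGo’s actual code; the function name, the constant `c`, and the toy numbers are all my assumptions. It uses a PUCT-style selection score, where a move the pattern matcher likes gets a bigger exploration bonus:

```python
import math

def puct_score(child_value, child_visits, parent_visits, prior, c=1.5):
    """PUCT-style selection score: current value estimate plus a
    prior-weighted exploration bonus. A strong learned prior steers
    the search toward a branch before many simulations are spent on it."""
    exploration = c * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return child_value + exploration

# Toy numbers: two moves with identical value estimates and visit counts.
# The move the "network" prefers (prior 0.6 vs 0.1) scores higher, so the
# search pursues that train of thought first.
a = puct_score(child_value=0.5, child_visits=10, parent_visits=100, prior=0.6)
b = puct_score(child_value=0.5, child_visits=10, parent_visits=100, prior=0.1)
print(a > b)  # True
```

The point is that the pattern matcher doesn’t replace the search; it just decides where the search spends its time.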
If a pattern-matching algorithm plays only poor players, it will learn to beat them, but it won’t know how to beat good players. If it plays only a certain kind of game – say, an opponent that always plays in a similar pattern – then it has gaps in its game: entire situations for which it lacks a good heuristic.
Playing good players means that the tool explores many of the best techniques and lines frequented by good players, thus becoming better at choosing how to play against them.
Now, you may think, “well, then it won’t be able to beat poor players, right?” But you’d be wrong. Playing well implies thinking many turns into the future, the patterns for winning are already there, and there are techniques to secure your position. A poor player won’t be able to foresee much of that, while a good AI would; and even if the AI couldn’t bias itself towards the smartest plays, it can choose decent ones, which is enough to beat a poor player.
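The lookahead point can be shown on a toy game, with no learned bias at all. Below is my own sketch (names and the game are mine, nothing to do with AlphaGo’s internals): simple Nim, where players take 1–3 stones and taking the last stone wins. A depth-limited search with zero positional knowledge still crushes a shallow player, purely by seeing further ahead:

```python
def negamax(pile, depth):
    """Value of the position for the player to move in simple Nim
    (take 1-3 stones; taking the last stone wins). Depth-limited:
    positions beyond the horizon are scored 0 (unknown)."""
    if pile == 0:
        return -1  # the previous player took the last stone; we lost
    if depth == 0:
        return 0   # horizon reached: can't foresee the outcome
    return max(-negamax(pile - take, depth - 1)
               for take in (1, 2, 3) if take <= pile)

def best_move(pile, depth):
    """Pick the take that maximizes our value under a given search depth."""
    return max((t for t in (1, 2, 3) if t <= pile),
               key=lambda t: -negamax(pile - t, depth - 1))

# From a pile of 6, deep search finds the winning move (leave a multiple
# of 4); shallow search sees every move as equal and picks a loser.
print(best_move(6, depth=6))  # 2 (winning: leaves 4)
print(best_move(6, depth=1))  # 1 (losing: leaves 5)
```

No pattern knowledge is encoded anywhere here; the depth of the search alone separates the two players.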
In essence, then, I think AlphaGo has two mechanisms: one biased towards plays that look good immediately, and another biased towards plays that look good statistically over many games.
I didn’t think all of this through before the fourth game of AlphaGo vs Lee Sedol; I just had a hunch that you wouldn’t actually need a revolutionary AI to beat people at Go. I may still be proven wrong if and when a paper about AlphaGo describes its inner workings.
What makes me feel more certain about it is this:
> Mistake was on move 79, but only came to that realisation on around move 87
>
> – Demis Hassabis, CEO of DeepMind
And he had access to the data.
After move 87 or so, AlphaGo went haywire.
While I was about to post this, I found that the tweets by Demis Hassabis confirm my suspicions:
> The neural nets were trained through self-play so there will be gaps in their knowledge, which is why we are here: to test AlphaGo to the limit
So, while it’s amazing to see a computer outperform a person at a well-specified, perfect-information game… I think it plays at an advantage! Because Lee Sedol’s mind wasn’t trained on playing computers, but on people. I think that by exposing himself to a dozen or so further games against AlphaGo, Lee Sedol could start routinely beating it.
This reminds me of the image-recognition neural nets that mistake static-like images for all kinds of animals. You can find sets of images on which humans will routinely outperform them. And since in Go the “picture” is the result of your and your opponent’s plays, you can routinely set up such images yourself.
Let’s see what happens in today’s games, anyway. Fun times to be alive in.