DeepMind's engineers didn't rest on their laurels and have since built a new AI called AlphaGo Zero which is, well, insanely good. Only 4.9 million games of self-play were needed to train Zero, compared with the original AlphaGo's 30 million.
The game of Go is a confined problem, with fixed rules, a clear definition of when the game ends and stones placed on the intersections of a grid, albeit one with a mind-numbing number of possible variations. It also seems that AlphaGo Zero doesn't need "rollouts": rather than playing a position out to the end of the game with quick, more or less random moves to judge how good it is, it decides which search paths to prune based on what it has learned in earlier play about the moves and overall board positions that lead to wins.
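To make that idea concrete, here is a minimal sketch, assuming a hypothetical `policy_value_fn` that stands in for the trained neural network, of how an AlphaGo-style tree search can work without rollouts: a learned value estimate scores each leaf, and learned move probabilities steer the search toward promising branches while the rest are, in effect, pruned.

```python
import math

# A minimal sketch (not DeepMind's code) of searching without rollouts:
# a learned value estimate scores each leaf instead of a random playout,
# and learned move priors steer the search so unpromising branches are
# rarely visited, i.e. effectively pruned. `policy_value_fn` is a
# hypothetical stand-in for the trained neural network.

class Node:
    def __init__(self, prior):
        self.prior = prior        # P(s, a): move probability from the policy
        self.visit_count = 0      # N(s, a)
        self.value_sum = 0.0      # W(s, a)
        self.children = {}        # move -> Node

    def value(self):              # Q(s, a): mean value of this branch
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node, c_puct=1.5):
    """Pick the child maximising Q + U (the PUCT rule used in AlphaGo-style search)."""
    total_visits = sum(child.visit_count for child in node.children.values())
    best_move, best_child, best_score = None, None, -float("inf")
    for move, child in node.children.items():
        u = c_puct * child.prior * math.sqrt(total_visits + 1) / (1 + child.visit_count)
        if child.value() + u > best_score:
            best_move, best_child, best_score = move, child, child.value() + u
    return best_move, best_child

def expand_and_evaluate(leaf, state, policy_value_fn):
    """Expand a leaf and return the network's value estimate for `state`:
    this single evaluation replaces a random rollout to the end of the game."""
    priors, value = policy_value_fn(state)   # assumed: {move: prob}, scalar in [-1, 1]
    for move, p in priors.items():
        leaf.children[move] = Node(prior=p)
    return value
```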
So the people at DeepMind decided to make a Go-playing AI that could teach itself how to play. The goal of the game is simple enough: use your stones to surround a larger part of the board than your opponent. The rules of good play, however, cannot easily be explained or written down in code.
Go exemplifies many of the difficulties faced by artificial intelligence: a challenging decision-making task, an intractable search space, and an optimal solution so complex it appears infeasible to approximate directly with a policy or value function.
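To put a rough number on "intractable": a common back-of-the-envelope estimate (not a DeepMind figure) assumes roughly 250 legal moves per turn and a typical game of about 150 moves, which already gives a game tree of around 10^360 possibilities, far more than the number of atoms in the observable universe.

```python
from math import log10

# Back-of-the-envelope estimate of the size of Go's game tree, assuming an
# average of roughly 250 legal moves per turn and a typical game length of
# roughly 150 moves. These are commonly cited approximations, not DeepMind figures.
branching_factor = 250
game_length = 150
log10_games = game_length * log10(branching_factor)
print(f"roughly 10^{log10_games:.0f} possible games")   # about 10^360
```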
Since Go was invented, thousands of strategy books have been published, but even so there are many creative moves that have never been explored. The original AlphaGo's artificial neural network was trained on a vast library of games played by human masters.
"We are now past the point where we debate the gap between the capability of AlphaGo and humans".
Humans have still got this, some might say. DeepMind begs to differ: "AlphaGo Zero is now the strongest version of our program and shows how much progress we can make even with less computing power and zero use of human data".
AlphaGo wasn't the best Go player on the planet for very long. AlphaGo Zero was left entirely to its own devices: it starts from random moves, with only the board and the stones as inputs and no human data whatsoever. Both AlphaGo Zero and the earlier AlphaGo Master require only a single machine with four TPUs.
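As a rough illustration, and emphatically not DeepMind's code, the tabula rasa recipe reduces to a loop like the one below: the program plays games against itself, starting from what is effectively random play, and the outcomes of those games are the only training signal. `play_one_game` and `update_model` are hypothetical callables supplied by the caller.

```python
# A highly simplified sketch (again, not DeepMind's code) of learning purely
# from self-play with no human data. `play_one_game` and `update_model` are
# hypothetical callables supplied by the caller; the loop only shows the shape
# of tabula rasa training: play against yourself, learn from the outcome, repeat.

def train_from_scratch(model, play_one_game, update_model,
                       iterations=1000, games_per_iteration=100):
    for _ in range(iterations):
        examples = []
        for _ in range(games_per_iteration):
            # The only inputs are the board positions seen during the game;
            # at the start the model is untrained, so its play is essentially random.
            positions, outcome = play_one_game(model)
            examples.extend((position, outcome) for position in positions)
        model = update_model(model, examples)   # improve the model that picks moves
    return model
```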
The version of AlphaGo that beat Lee Sedol (called AlphaGo Lee) was taught through a kind of supervised learning, in which it digested the records of millions of human Go moves, as well as through self-play.
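That supervised stage amounts to imitation: predict the move a strong human made in a given position. A minimal sketch of the idea, with hypothetical `human_games`, `policy_model` and `update` stand-ins rather than DeepMind's actual pipeline, might look like this.

```python
# A rough sketch of the supervised idea behind AlphaGo Lee's first training
# stage: learn to imitate strong humans by predicting the next move in recorded
# games. `human_games`, `policy_model` and `update` are hypothetical stand-ins,
# not DeepMind's pipeline.

def build_training_set(human_games):
    """Turn recorded games into (position, expert_move) pairs."""
    examples = []
    for game in human_games:              # each game: a list of (position, move) pairs
        examples.extend(game)
    return examples

def supervised_pass(policy_model, examples, update):
    """One pass of imitation learning: nudge the model toward the human's move."""
    for position, expert_move in examples:
        predicted = policy_model(position)            # distribution over legal moves
        update(policy_model, predicted, expert_move)  # e.g. cross-entropy against the expert move
    return policy_model
```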
After just 40 days, AlphaGo Zero did the unthinkable and achieved a success rate of 90 per cent against the original AlphaGo.
In doing so, it surpassed the performance of all previous versions, including those which beat the world champions Lee Sedol and Ke Jie, becoming arguably the strongest Go player of all time. In less than two months, this machine went from tabula rasa to reinventing Go. An AI trained on human games, by contrast, is bounded by pre-existing human knowledge and so inherits human limits. Sequences of "laddered" stones, played in a staircase-like pattern across the board, are among the first things that humans learn when practicing the game, yet AlphaGo Zero only came to grips with them relatively late in its training.
Though extremely impressive, AlphaGo Zero won't replace humans anytime soon.
Mok added that general trends in the Go world are now being influenced by AlphaGo's playing style.
The ability of artificial intelligence to teach itself could help answer some of the questions that are currently beyond scientific understanding. As such, it could prove incredibly useful and lucrative, revolutionizing everything from investing to medicine. "If similar techniques can be applied to other structured problems, such as protein folding, reducing energy consumption or searching for revolutionary new materials, the resulting breakthroughs have the potential to positively impact society", the company writes in its blog post.