Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. Hold on to your papers, because this work on AlphaGo is absolute insanity.

In the game of Go, the players put stones on a board, where the objective is to surround more territory than the opponent. This is a beautiful game that is particularly interesting for AI research, because the space of possible moves is vastly larger than in chess, which means that any sort of exhaustive search is out of the question, and we have to resort to smart algorithms that can identify a small number of strong moves within this stupendously large search space.

The first incarnation of DeepMind's Go AI, AlphaGo, uses a combination of a policy network that is responsible for predicting the moves, and a value network that predicts the likely winner of the game from a given position. These are both deep neural networks, and they are combined with a technique called Monte Carlo Tree Search to narrow down the search in this enormous space; a simplified sketch of this combination follows below.

This algorithm started out with a bootstrapping process where it was shown thousands of human games that were used to learn the basics of Go. Based on this, it is clear that such an algorithm can learn to be as good as formidable human players. But the big question was: how could it possibly become even better than the professionals it has observed? How could the disciple become better than its master? The solution is that after it has learned what it can from these games, it plays against itself many, many times to improve its skills. This second phase is the main part of the training and takes the most time.

Let's call this base algorithm AlphaGo Fan; it was used to play against Fan Hui, a 2-dan European Go champion, who was defeated 5 to 0. This was a historic moment: the first time an AI beat a professional Go player without a handicap. Fan Hui described his experience as playing against a very strong and stable player, and he also mentioned that the algorithm felt very human-like. Some in the Go community voiced their doubts and noted that the algorithm would never be able to beat Lee Sedol, a 9-dan world champion and winner of 18 international titles. Just to give you an intuition of the difference, based on their Elo ratings, Lee Sedol would be expected to beat Fan Hui 97 times out of 100 games.

So a few months later, DeepMind organized a huge media event where they challenged him to play against AlphaGo. This was a slightly modified version of the base algorithm that used a deeper neural network with more layers and was trained with more resources than the previous version. There was also an algorithmic change to the policy networks; the details are available in the paper in the description. It is a great read, make sure to have a look. Let's call this algorithm AlphaGo Lee. This event was watched all around the world and can perhaps be compared to Kasparov's public chess games against Deep Blue. I have the fondest memories of waking up super early in the morning, jumping out of bed in excitement to watch all these Go matches. And in a long and nail-biting series, Lee Sedol was defeated 4 to 1 by the AI.

With significantly less media attention, the next phase came bearing the name AlphaGo Master, which used around ten times fewer tensor processing units than AlphaGo Lee and became an even stronger player.
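To make the combination described above a bit more concrete, here is a heavily simplified, self-contained Python sketch of Monte Carlo Tree Search guided by a policy and a value estimate. This is not DeepMind's code: the game is a toy Nim variant, the networks are stubbed out with uniform move priors and a neutral value, and the function names (`dummy_policy_value`, `simulate`, `search`) are invented for illustration. Only the PUCT-style selection rule and the value backup follow the general AlphaGo recipe.

```python
import math

# Stand-in for the policy and value networks: uniform move priors and a
# neutral value estimate. In AlphaGo, these come from deep neural networks.
def dummy_policy_value(state):
    moves = legal_moves(state)
    priors = {m: 1.0 / len(moves) for m in moves}
    return priors, 0.0  # (prior over moves, value in [-1, 1] for the player to move)

# A tiny Nim variant keeps the sketch runnable: state is (stones, player),
# players alternately take 1-3 stones, and whoever takes the last stone wins.
def legal_moves(state):
    stones, _ = state
    return [m for m in (1, 2, 3) if m <= stones]

def apply_move(state, move):
    stones, player = state
    return (stones - move, -player)

def terminal_value(state):
    # With no stones left, the player who just moved took the last stone and
    # won, so the position is a certain loss (-1) for the player now to move.
    stones, _ = state
    return -1.0 if stones == 0 else None

class Node:
    def __init__(self, prior):
        self.prior = prior    # P(s, a): prior probability from the policy
        self.visits = 0       # N(s, a): visit count
        self.value_sum = 0.0  # W(s, a): accumulated value
        self.children = {}    # move -> Node

    def q(self):  # Q(s, a): mean value of this move so far
        return self.value_sum / self.visits if self.visits else 0.0

def puct_score(parent, child, c_puct=1.5):
    # AlphaGo-style PUCT: exploit the mean value Q, explore along the prior.
    explore = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q() + explore

def simulate(node, state):
    """One simulation: descend the tree by PUCT, expand a leaf using the
    policy/value function, and back the value estimate up the path."""
    value = terminal_value(state)
    if value is None and not node.children:
        priors, value = dummy_policy_value(state)  # expand the leaf
        for move, p in priors.items():
            node.children[move] = Node(p)
    elif value is None:
        move = max(node.children, key=lambda m: puct_score(node, node.children[m]))
        # The sign flips: a good position for the opponent is bad for us.
        value = -simulate(node.children[move], apply_move(state, move))
    node.visits += 1
    node.value_sum += value
    return value

def search(state, num_simulations=800):
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        simulate(root, state)
    # Play the most-visited root move, as AlphaGo does when it plays for real.
    return max(root.children, key=lambda m: root.children[m].visits)

if __name__ == "__main__":
    print("chosen move:", search((10, 1)))  # 10 stones, player 1 to move
```

In the real system, the `dummy_policy_value` stub would be a forward pass through the trained networks, and during self-play the visit counts gathered at the root also become the training targets for the policy network.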
AlphaGo Master played against human professionals online in January 2017 and won all 60 matches it played. This is insanity, but if you think that's it, well, hold on to your papers now. In this newest work, AlphaGo has reached its next form: AlphaGo Zero. This variant does not have access to any human games in the first phase and learns entirely through self-play. It starts out from absolutely nothing, with just the knowledge of the rules of the game. It was trained for 40 days, and by day 3, it reached the level of AlphaGo Lee, which is above world-champion level. Around day 21, it hit the level of AlphaGo Master, which is practically unbeatable for any human being. And get this: at 40 days, this version surpassed all previous AlphaGo versions and defeated the previously published world-beating version 100 to 0. This has kept me up for several nights now, and I am completely out of words.

In this version, the two neural networks are fused into one network, which can be trained more efficiently; a rough sketch of such a two-headed network follows below. It is beautiful to see these curves as they show this neural network starting from a random initialization. It knows the rules, but beyond that, it is completely clueless about the game itself, and it rapidly becomes practically unbeatable. And I left the best part for last: it uses only a single machine.

I think it is fair to say that this is history unfolding before our eyes. What a time to be alive! Congratulations to the DeepMind team for this remarkable achievement. As for me, I love talking about research to a wider audience, and it is a true privilege to be able to tell these stories to you. Thank you very much for your generous support on Patreon, which enables me to spend more and more time on what I love most. Absolutely amazing. And now, I know it's a bit redundant, but from muscle memory, I'll sign off the usual way. Thanks for watching and for your generous support, and I'll see you next time!
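For the curious, here is what "two networks fused into one" can look like in code: a minimal PyTorch sketch of a network with one shared trunk and separate policy and value heads. The shallow trunk and the layer sizes are invented for readability (the actual AlphaGo Zero network is a much deeper residual tower), so treat this as an illustration of the two-headed idea rather than a reimplementation.

```python
import torch
import torch.nn as nn

class TwoHeadedNet(nn.Module):
    """Toy illustration of a fused policy/value network: one shared trunk,
    two output heads. Sizes are chosen for readability, not from the paper."""
    def __init__(self, board_planes=17, board_size=19, channels=64):
        super().__init__()
        # Shared trunk: both heads read the same learned board features.
        self.trunk = nn.Sequential(
            nn.Conv2d(board_planes, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )
        # Policy head: one logit per board point, plus one for the pass move.
        self.policy_head = nn.Sequential(
            nn.Conv2d(channels, 2, kernel_size=1),
            nn.Flatten(),
            nn.Linear(2 * board_size * board_size, board_size * board_size + 1),
        )
        # Value head: a single scalar in [-1, 1] predicting the game outcome.
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Flatten(),
            nn.Linear(board_size * board_size, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Tanh(),
        )

    def forward(self, x):
        features = self.trunk(x)
        return self.policy_head(features), self.value_head(features)

if __name__ == "__main__":
    net = TwoHeadedNet()
    board = torch.zeros(1, 17, 19, 19)  # a batch of one encoded position
    policy_logits, value = net(board)
    print(policy_logits.shape, value.shape)  # torch.Size([1, 362]) torch.Size([1, 1])
```

The two heads are trained jointly from self-play data with a single combined loss: the policy head learns to match the visit counts produced by the tree search, while the value head learns to predict the eventual outcome of the game.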