Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. A quick recap for the Fellow Scholars out there who missed some of our earlier episodes. A neural network is a machine learning technique that was inspired by the human brain. It is not a brain simulation by any stretch of the imagination, but it was inspired by the inner workings of the human brain. We can train it on input and output pairs, such as images and descriptions of whether each image depicts a mug or a bus. The goal is that after training, we can give unknown images to the network and expect it to recognize whether there is a mug or a bus in them.

It may happen that during training, the neural network seems to be doing quite well, but when we provide the unknown images, it falters and almost never gets the answer right. This is the problem of overfitting, and intuitively, it is a bit like students who prepare for an exam not by acquiring useful knowledge, but by memorizing answers from the textbook instead. No wonder their results will be rubbish on a real exam!

But no worries, because we have dropout, which is a spectacular way of creating diligent students. This is a technique where we create a network in which each of the neurons has a chance of being disabled, a network that is filled with unreliable units. And I really want you to think about this. If we could have a system with perfectly reliable units, we should probably never go for one that is built from less reliable units instead. What is more, this piece of work proposes that we should cripple our systems and seemingly make them worse on purpose. This sounds like a travesty. Why would anyone want to try anything like this? And what is really amazing is that these unreliable units can potentially build a much more useful system that is less prone to overfitting.

If we want to win competitions, we have to train many models and average them, as we have seen with the Netflix Prize-winning algorithm in an earlier episode. It also relates back to the committee-of-doctors example: a committee is usually more useful than asking just one doctor. And the absolutely amazing thing is that this is exactly what dropout gives us. It gives us the average of a very large number of possible neural networks, and we only have to train one network that we cripple here and there to obtain that. Without dropout, this procedure would normally take years or similarly exorbitant timeframes to compute, and would also raise all kinds of pesky problems we really don't want to deal with.

In the spirit of modesty, let's say that if we are struggling with overfitting, we could do a lot worse than using dropout. It indeed teaches slacking students how to do their homework properly. Please keep in mind that using dropout also leads to longer training times; in my experience between 2x and 10x, but of course, it heavily depends on other external factors. So it is indeed true that dropout is slow compared to training one network, but it is blazing fast at what it actually approximates, which is training an exponential number of models. I think dropout is one of the greatest examples of the beauty and the perils of research, where sometimes the most counterintuitive ideas give us the best solutions. Thanks for watching, and for your generous support, and I'll see you next time!
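The episode shows no code, but as a rough illustration of the mechanism described above, here is a minimal sketch of so-called inverted dropout in NumPy. This is one common formulation, not necessarily the exact one from the paper; the function name, drop probability, and layer sizes are made up for illustration. Scaling the surviving units by 1/(1 - drop_prob) during training means the network can be used unchanged at test time, which is how a single trained network approximates the average of the many "thinned" networks sampled during training.

```python
import numpy as np

def dropout_forward(activations, drop_prob=0.5, training=True, rng=None):
    """Inverted dropout applied to one layer's activations.

    During training, each unit is zeroed out independently with probability
    drop_prob, and the survivors are scaled up by 1 / (1 - drop_prob).
    At test time the activations are returned untouched, which approximates
    averaging the predictions of the exponentially many thinned networks.
    """
    if not training or drop_prob == 0.0:
        return activations
    rng = rng or np.random.default_rng()
    keep_prob = 1.0 - drop_prob
    # Each unit survives independently with probability keep_prob.
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

# Toy usage: a hidden layer of 8 units for a batch of 4 examples.
hidden = np.random.default_rng(0).standard_normal((4, 8))
print(dropout_forward(hidden, drop_prob=0.5, training=True))   # roughly half the units zeroed
print(dropout_forward(hidden, drop_prob=0.5, training=False))  # unchanged at test time
```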