JOSH GORDON: Last episode we trained an image classifier using TensorFlow for Poets, and this time, we'll write one using TF.Learn. The problem we'll start on today is classifying handwritten digits from the MNIST dataset, and writing a simple classifier for these is often considered the Hello World of computer vision. Now MNIST is a multi-class classification problem. Given an image of a digit, our job will be to predict which one it is. I wrote an IPython notebook for this episode, and you can find a link to it in the description. And to make it easier for you to configure your environment, I'll start with a quick screencast of installing TensorFlow using Docker.

First, here's an outline of what we'll cover. I'll show you how to download the dataset and visualize images. Next, we'll train a classifier, evaluate it, and use it to make predictions on new images. Then we'll visualize the weights the classifier learns to gain intuition for how it works under the hood.

Let's start by installing TensorFlow. You can find installation instructions for Docker linked from the Getting Started page on TensorFlow.org, and I'll start this screencast assuming you've just finished downloading and installing Docker itself but haven't started installing TensorFlow. Starting from a fresh install of Docker, the first thing to do is open the Docker Quickstart Terminal. When this appears, you'll see an IP address just below the whale. Copy it down; we'll need it later.

Next, we'll launch a Docker container with a TensorFlow image. The image is hosted on Docker Hub, and there's a link to that in the description. The image contains TensorFlow with all its dependencies properly configured, and here's the command we'll use to download and launch the image. But first, let's choose the version we want. The versions are on this page, and we'll use the latest release. Now we can copy-paste the command into the terminal and add a colon with the version number. If this is the first time you've run the image, it'll be downloaded automatically, and on subsequent runs, it'll be cached locally. The image starts automatically, and by default, it runs a notebook server. All that's left for us to do is open up a browser and point it to the IP we jotted down earlier, on port 8888. Now we have an IPython notebook that we can experiment with in our browser, served by the container. You can find the notebook for this episode in the description and upload it through the UI.

OK, now on to code. Here are the imports we'll use. I'll use matplotlib to display images, and, of course, we'll use TF.Learn to train the classifier. All of these are installed with the image. Next, we'll download the MNIST dataset, and we have a nice one-liner for that. The dataset contains thousands of labeled images of handwritten digits. It's pre-divided into train, which is 55,000 examples, and test, which is 10,000. Let's visualize a few of these to get a feel for the data. This code displays an image along with its label, and you might notice I'm reshaping the image; I'll explain why in a bit. The first image from the testing set is a seven, and you can see the example index as well as the label. Here's the second image. Now both of these are clearly drawn, but there's a variety of different handwriting samples in this dataset. Here's an image that's harder to recognize. These images are low resolution, just 28 by 28 pixels in grayscale. Also note they're properly segmented.
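The notebook itself isn't reproduced in the transcript, so here's a minimal sketch of the download-and-display steps it describes, assuming the tf.contrib.learn API from the TensorFlow 1.x era (the module path, `load_dataset`, and the `display` helper name are assumptions; they changed or were removed in later releases):

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.contrib import learn  # TF.Learn (TensorFlow 1.x era)

# The one-liner: downloads MNIST, pre-divided into 55,000 train / 10,000 test.
mnist = learn.datasets.load_dataset('mnist')
data = mnist.train.images                                 # each image is a flat 784-vector
labels = np.asarray(mnist.train.labels, dtype=np.int32)
test_data = mnist.test.images
test_labels = np.asarray(mnist.test.labels, dtype=np.int32)

def display(i):
    """Show test example i, reshaping the flat 784-vector back to 28x28 to draw it."""
    img = test_data[i].reshape((28, 28))
    plt.title('Example %d. Label: %d' % (i, test_labels[i]))
    plt.imshow(img, cmap=plt.cm.gray_r)
    plt.show()

display(0)  # the first test image: a seven
```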
That means each image contains exactly one digit.

Now let's talk about the features we'll use. When we're working with images, we use the raw pixels as features. That's because extracting useful features from images, like textures and shapes, is hard. Now a 28 by 28 image has 784 pixels, so we have 784 features. And here, we're using the flattened representation of the image. To flatten an image means to convert it from a 2D array to a 1D array by unstacking the rows and lining them up. That's why we had to reshape this array to display it earlier.

Now we can initialize the classifier, and here, we'll use a linear classifier. We'll provide two parameters. The first indicates how many classes we have, and there are 10, one for each type of digit. The second informs the classifier about the features we'll use.

Now I'll draw a quick diagram of a linear classifier to give you a high-level preview of how it works under the hood. You can think of the classifier as adding up the evidence that the image is each type of digit. The input nodes are on the top, represented by Xs, and the output nodes are on the bottom, represented by Ys. We have one input node for each feature, or pixel, in the image, and one output node for each digit the image could represent. Here, we have 784 inputs and 10 outputs; I've just drawn a few of them so everything fits on the screen. Now the inputs and outputs are fully connected, and each of these edges has a weight.

When we classify an image, you can think of each pixel as going on a journey. First, it flows into its input node, and next, it travels along the edges. Along the way, it's multiplied by the weight on the edge, and the output nodes gather evidence that the image we're classifying represents each type of digit. The more evidence we gather, say on the eight output, the more likely it is the image is an eight. To calculate how much evidence we have, we sum the values of the pixel intensities multiplied by the weights. Then we can predict that the image belongs to the output node with the most evidence.

The important part is the weights, and by setting them properly, we can get accurate classifications. We begin with random weights, then gradually adjust them towards better values. This happens inside the fit method. Once we have a trained model, we can evaluate it. Using the evaluate method, we see that it correctly classifies about 90% of the test set. We can also make predictions on individual images. Here's one that it correctly classifies, and here's one that it gets wrong.

Now I want to show you how to visualize the weights the classifier learns. Here, positive weights are drawn in red, and negative weights are drawn in blue. So what do these weights tell us? Well, to understand that, I'll show four images of ones. They're all drawn slightly differently, but take a look at the middle pixel. Notice that it's filled in on every image. When that pixel is filled in, it's evidence that the image we're looking at is a one, so we'd expect a high weight on that edge. Now let's take a look at four zeros. Notice that the middle pixel is empty. Although there are lots of ways to draw zeros, if that middle pixel is filled in, it's evidence against the image being a zero, so we'd expect a negative weight on that edge. And looking at the images of the weights, we can almost see outlines of the digits drawn in red for each class. We were able to visualize these because we started with 784 pixels, and we learned 10 weights for each, one for each type of digit.
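Again as a hedged sketch rather than the notebook's exact code: here are the initialize/fit/evaluate/predict steps and the weight visualization just described, using the same era's tf.contrib.learn API (the `infer_real_valued_columns_from_input` helper, the `weights_` attribute, and the `batch_size`/`steps` values are assumptions from older releases, not guaranteed to match the notebook). It continues from the `data`, `labels`, and test arrays loaded above:

```python
# Tell the classifier about our features: 784 real-valued pixel intensities.
feature_columns = learn.infer_real_valued_columns_from_input(data)

# A linear classifier with 10 classes, one for each type of digit.
classifier = learn.LinearClassifier(n_classes=10, feature_columns=feature_columns)

# fit() starts from random weights and gradually adjusts them toward better values.
classifier.fit(data, labels, batch_size=100, steps=1000)

# evaluate() reports accuracy on the 10,000-image test set (about 90% here).
print(classifier.evaluate(test_data, test_labels)["accuracy"])

# Predict the class of an individual image; depending on the release,
# predict() returned an array or a generator, so wrap it in list().
prediction = list(classifier.predict(test_data[0:1]))
print("Predicted: %d, label: %d" % (prediction[0], test_labels[0]))

# Visualize the learned weights: 10 weights per pixel means a (784, 10) matrix,
# so each column reshapes back into a 28x28 image, one per class.
weights = classifier.weights_  # attribute name from older tf.contrib.learn releases
f, axes = plt.subplots(2, 5, figsize=(10, 4))
for i, ax in enumerate(axes.flat):
    # A diverging colormap: positive weights in red, negative in blue.
    ax.imshow(weights.T[i].reshape(28, 28), cmap=plt.cm.seismic)
    ax.set_title(i)
    ax.set_xticks(())
    ax.set_yticks(())
plt.show()
```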
We then reshape the weights into a 2D array, one 28 by 28 image per class.

OK, that's it for now. Of course, there's lots more to learn about this, and I put my favorite links in the description. Coming up next time, we'll experiment with deep learning, and I'll cover in more detail what we introduced here today. Thanks very much for watching, and I'll see you then. [MUSIC PLAYING]