  • JOSH GORDON: Last episode we trained

  • an image classifier using TensorFlow for Poets,

  • and this time, we'll write one using TF.Learn.

  • The problem we'll start on today is

  • classifying handwritten digits from the MNIST dataset,

  • and writing a simple classifier for these is often

  • considered the Hello World of computer vision.

  • Now MNIST is a multi-class classification problem.

  • Given an image of a digit, our job

  • will be to predict which one it is.

  • I wrote an IPython notebook for this episode,

  • and you can find a link to it in the description.

  • And to make it easier for you to configure your environment,

  • I'll start with a quick screencast of installing

  • TensorFlow using Docker.

  • First, here's an outline of what we'll cover.

  • I'll show you how to download the dataset

  • and visualize images.

  • Next, we'll train a classifier, evaluate it,

  • and use it to make predictions on new images.

  • Then we'll visualize the weights the classifier learns

  • to gain intuition for how it works under the hood.

  • Let's start by installing TensorFlow.

  • You can find installation instructions

  • for Docker linked from the Getting Started page

  • on TensorFlow.org, and I'll start this screencast

  • assuming you've just finished downloading and installing

  • Docker itself but haven't started installing TensorFlow.

  • Starting from a fresh install of Docker, the first thing to do

  • is open the Docker Quickstart terminal.

  • And when this appears, you'll see an IP address just

  • below the whale.

  • Copy it down.

  • We'll need it later.

  • Next, we'll launch a Docker container

  • with a TensorFlow image.

  • The image is hosted on Docker hub,

  • and there's a link to that in the description.

  • The image contains TensorFlow with all its dependencies

  • properly configured, and here's the command

  • we'll use to download and launch the image.

  • But first, let's choose the version we want.

  • The versions are on this page, and we'll

  • use the latest release.

  • Now we can copy-paste the command into a terminal

  • and add a colon with the version number.

  • If this is the first time you've run the image,

  • it'll be downloaded automatically.

  • And on subsequent runs, it'll be cached locally.

  • The image starts automatically, and by default, it

  • runs a notebook server.

  • All that's left for us to do is to open up a browser

  • and point it to the IP we jotted down earlier on port 8888.

  • And now we have an IPython notebook

  • that we can experiment with in our browser served

  • by the container.

  • You can find the notebook for this episode in the description

  • and upload it through the UI.

  • OK.

  • Now onto code.

  • Here are the imports we'll use.

  • I'll use matplotlib to display images, and, of course,

  • we'll use TF.Learn to train the classifier.

  • All of these are installed with the image.

  • Next, we'll download the MNIST dataset,

  • and we have a nice one-liner for that.

  • The dataset contains thousands of labeled images

  • of handwritten digits.

  • It's pre-divided into train, which is 55,000,

  • and test, which is 10,000.

  • Let's visualize a few of these to get a feel.

  • This code displays an image along with its label,

  • and you might notice I'm reshaping the image,

  • and I'll explain why in a bit.

  • The first image from the testing set is a seven,

  • and you can see the example index as well as the label.

  • Here's the second image.

  • Now both of these are clearly drawn,

  • but there's a variety of different handwriting

  • samples in this dataset.

  • Here's an image that's harder to recognize.

  • These images are low resolution, just 28

  • by 28 pixels in grayscale.

  • Also note they're properly segmented.

  • That means each image contains exactly one digit.

  • Now let's talk about the features we'll use.

  • When we're working with images, we

  • use the raw pixels as features.

  • That's because extracting useful features

  • from images, like textures and shapes, is hard.

  • Now a 28 by 28 image has 784 pixels,

  • so we have 784 features.

  • And here, we're using the flattened representation

  • of the image.

  • To flatten an image means to convert it from a 2D array

  • to a 1D array by unstacking the rows and lining them up.

  • That's why we had to reshape this array

  • to display it earlier.
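The flattening described above can be sketched with NumPy. This is only an illustration: the tiny 3 by 3 array and the variable names are assumptions made for the example, not code from the episode's notebook.

```python
import numpy as np

# A tiny 3x3 "image" stands in for a 28x28 MNIST digit (hypothetical data).
image_2d = np.array([[0, 1, 0],
                     [0, 1, 0],
                     [0, 1, 0]])

# Flatten: unstack the rows and line them up into a 1D array.
flat = image_2d.reshape(-1)        # shape (9,)

# Reshape back to 2D, which is what displaying the image requires.
restored = flat.reshape(3, 3)

# The same operation on an MNIST-sized array yields 784 features.
mnist_like = np.zeros((28, 28))
mnist_flat = mnist_like.reshape(-1)  # shape (784,)
```

Flattening and reshaping are inverses here, so no pixel information is lost. Only the 2D layout is set aside while the classifier works on the 1D feature vector.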

  • Now we can initialize the classifier,

  • and here, we'll use a linear classifier.

  • We'll provide two parameters.

  • The first indicates how many classes we have,

  • and there are 10, one for each type of digit.

  • The second informs the classifier

  • about the features we'll use.

  • Now I'll draw a quick diagram of a linear classifier

  • to give you a high-level preview of how it works under the hood.

  • You could think of the classifier

  • as adding up the evidence that the image is

  • each type of digit.

  • The input nodes are on the top, represented by Xes,

  • and the output nodes are on the bottom represented by Ys.

  • We have one input node for each feature or pixel in the image

  • and one output node for each digit

  • the image could represent.

  • Here, we have 784 inputs and 10 outputs.

  • I've just drawn a few of them, so everything

  • fits on the screen.

  • Now the inputs and outputs are fully connected,

  • and each of these edges has a weight.

  • When we classify an image, you can think of each pixel

  • as going on a journey.

  • First, it flows into its input node,

  • and next, it travels along the edges.

  • Along the way, it's multiplied by the weight on the edge,

  • and the output nodes gather evidence

  • that the image we're classifying represents each type of digit.

  • The more evidence we gather, say on the eight output,

  • the more likely it is the image is an eight.

  • And to calculate how much evidence we have,

  • we sum the value of the pixel intensities multiplied

  • by the weights.

  • Then we can predict that the image belongs to the output

  • node with the most evidence.
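A minimal sketch of this "adding up the evidence" step, assuming the weights are stored as a 784 by 10 NumPy array. The random weights and pixels, and the `predict` helper, are hypothetical stand-ins for illustration. Like the diagram, the sketch leaves out the bias term that linear classifiers usually also include.

```python
import numpy as np

rng = np.random.default_rng(0)
num_pixels, num_classes = 784, 10

# Hypothetical stand-ins: one weight per (pixel, digit) edge, one flattened image.
weights = rng.normal(scale=0.01, size=(num_pixels, num_classes))
pixels = rng.random(num_pixels)

def predict(pixels, weights):
    """Sum pixel intensity times edge weight for each output node."""
    evidence = pixels @ weights          # shape (10,), one total per digit
    return int(np.argmax(evidence)), evidence

digit, evidence = predict(pixels, weights)
```

The matrix product computes all ten sums at once: `evidence[k]` is the sum over every pixel of its intensity times the weight on the edge leading to output node `k`.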

  • The important part is the weights,

  • and by setting them properly, we can

  • get accurate classifications.

  • We begin with random weights, then gradually adjust them

  • towards better values.

  • And this happens inside the fit method.
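To give a rough idea of what "gradually adjusting the weights" can look like, here is a sketch of softmax regression trained by plain gradient descent on made-up toy data. It is an assumption-laden illustration, not the internals of TF.Learn's fit method: the `fit` and `softmax` functions, the learning rate, the step count, and the toy dataset are all invented for the example.

```python
import numpy as np

def softmax(z):
    """Turn rows of evidence into rows of probabilities."""
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit(X, y, num_classes, steps=200, lr=0.5, seed=0):
    """Hypothetical trainer: start from small random weights, then nudge them."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X.shape[1], num_classes))
    onehot = np.eye(num_classes)[y]
    for _ in range(steps):
        probs = softmax(X @ W)
        grad = X.T @ (probs - onehot) / len(X)  # cross-entropy gradient
        W -= lr * grad                          # step towards better values
    return W

# Toy data: 3 classes, 4 features; feature k is bright for class k.
rng = np.random.default_rng(1)
y_toy = rng.integers(0, 3, size=300)
X_toy = rng.random((300, 4)) * 0.1
X_toy[np.arange(300), y_toy] += 1.0

W_toy = fit(X_toy, y_toy, num_classes=3)
train_acc = float((np.argmax(X_toy @ W_toy, axis=1) == y_toy).mean())
```

Each step compares the predicted probabilities with the true labels and moves every weight a little in the direction that reduces the error, which is the "gradually adjust" idea in miniature.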

  • Once we have a trained model, we can evaluate it.

  • Using the evaluate method, we see

  • that it correctly classifies about 90% of the test set.
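The accuracy figure is simply the fraction of test images whose predicted digit matches the label. A small sketch, with a hypothetical `accuracy` helper and made-up predictions rather than the evaluate method itself:

```python
import numpy as np

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    return float((predicted == actual).mean())

# Made-up example: four of five predictions are correct.
score = accuracy([7, 2, 1, 0, 4], [7, 2, 1, 0, 9])
```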

  • We can also make predictions on individual images.

  • Here's one that it correctly classifies, and here's

  • one that it gets wrong.

  • Now I want to show you how to visualize the weights

  • the classifier learns.

  • Here, positive weights are drawn in red,

  • and negative weights are drawn in blue.

  • So what do these weights tell us?

  • Well, to understand that, I'll show four images of ones.

  • They're all drawn slightly differently,

  • but take a look at the middle pixel.

  • Notice that it's filled in on every image.

  • When that pixel is filled in, it's

  • evidence that the image we're looking at is a one,

  • so we'd expect a high weight on that edge.

  • Now let's take a look at four zeros.

  • Notice that the middle pixel is empty.

  • Although there's lots of ways to draw zeros,

  • if that middle pixel is filled in,

  • it's evidence against the image being a zero,

  • so we'd expect a negative weight on the edge.

  • And looking at the images of the weights,

  • we can almost see outlines of the digits drawn

  • in red for each class.

  • We were able to visualize these, because we started

  • with 784 pixels, and we learned 10 weights for each, one

  • for each type of digit.

  • We then reshape the weights into a 2D array.
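A sketch of that reshaping step, assuming the learned weights are available as a 784 by 10 NumPy array. The random `weights` array here is a placeholder, and how the weights are actually read out of the trained classifier is not shown.

```python
import numpy as np

# Placeholder for the learned weights: 784 pixels x 10 digits.
weights = np.random.default_rng(0).normal(size=(784, 10))

# Column k holds the 784 weights feeding output node k.
# Transpose so each digit is a row, then fold each row back into 28x28.
weight_images = weights.T.reshape(10, 28, 28)

# weight_images[k] can now be drawn like any other 28x28 image.
ones_weights = weight_images[1]
```

Each `weight_images[k]` could then be passed to matplotlib's `imshow` with a diverging colormap, so that positive and negative weights appear in contrasting colors.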

  • OK.

  • That's it for now.

  • Of course, there's lots more to learn about this,

  • and I put my favorite links in the description.

  • Coming up next time, we'll experiment with deep learning,

  • and I'll cover in more detail what we introduced here today.

  • Thanks very much for watching, and I'll see you then.

  • [MUSIC PLAYING]
