  • ♪ (music) ♪

  • Hi, everyone, and welcome to episode 2 of TensorFlow Zero to Hero.

  • In the last episode,

  • you learned about machine learning and how it works.

  • You saw a simple example of matching numbers to each other

  • and how, using Python code,

  • a computer could learn through trial and error

  • what the relationship between the numbers was.

  • In this episode, you're going to take it a little further

  • by teaching a computer

  • how to see and recognize different objects.

  • For example, look at these pictures.

  • How many shoes do you see?

  • You might say two, right?

  • But how do you know they are shoes?

  • Imagine if somebody had never seen shoes before.

  • How would you tell them

  • that despite the great difference between the high heel and the sports shoe,

  • they're still both shoes?

  • Maybe they would think if it's red, it's a shoe.

  • Because all they've seen are these two, and they're both red.

  • But, of course, it's not that simple.

  • But how do you know that these two are shoes?

  • Because, in your life, you've seen lots of shoes,

  • and you've learned to understand what makes a shoe a shoe.

  • So it follows logically that if we show a computer lots of shoes,

  • it will be able to recognize what a shoe is.

  • And that's where the dataset called Fashion MNIST is useful.

  • It has 70,000 images in 10 different categories.

  • So there's 7,000 examples of each category, including shoes.

  • Hopefully, seeing 7,000 shoes

  • is enough for a computer to learn what a shoe looks like.

  • The images in Fashion MNIST are only 28x28 pixels.

  • So they're pretty small.

  • And the less data used,

  • the faster it is for a computer to process it.

  • That being said, they still lead to recognizable items of clothing.

  • In this case, you can still see that it's a shoe.

  • In the next few minutes,

  • I'll show you the code that will teach you how to train a computer

  • to recognize items of clothing based on this training data.

  • The type of code you write

  • is almost identical to what you did in the last video.

  • That's part of the power of TensorFlow

  • that allows you to design neural networks for a variety of tasks

  • with a consistent programming API.

  • We'll start by loading the data.

  • The Fashion MNIST dataset is built into TensorFlow,

  • so it's easy to load it with code like this.
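The on-screen code is not part of the transcript. A minimal sketch of the loading step, assuming the `tf.keras.datasets.fashion_mnist` loader and variable names chosen here for illustration, might look like this:

```python
import tensorflow as tf

# Fashion MNIST ships with Keras; the first call downloads and caches the files.
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

print(train_images.shape)  # training images, each 28x28 pixels
print(test_images.shape)   # test images held back for evaluation
print(train_labels[:5])    # integer class labels in the range 0-9
```

The loader returns the training and test splits as two separate tuples, so the held-back images never need to be separated by hand.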

  • The training images are a set of 60,000 images,

  • like our ankle boot here.

  • The other 10,000 are a test set that we can use to check to see

  • how well our neural network performs.

  • We'll see them later.

  • The label is a number indicating the class of that type of clothing.

  • So, in this case, the number 09 indicates an ankle boot.

  • Why do you think it would be a number and not just the text, "ankle boot"?

  • There's two main reasons: first, computers deal better with numbers;

  • but perhaps more importantly, there's the issue with bias.

  • If we label it as "ankle boot,"

  • we're already showing a bias towards the English language.

  • So by using a number,

  • you can point to a text description in any language as shown here.

  • Can you guess all of the languages that we used here?
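To illustrate the point about numeric labels, here is a small sketch that maps a label number to a description. The English class names follow the dataset's published label table; the translations for label 9 are illustrative assumptions and not part of the dataset or the video.

```python
# Class names for the ten Fashion MNIST labels, indexed by label number.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

# Illustrative translations for label 9 only (assumed, not from the dataset).
LABEL_9_TRANSLATIONS = {
    "en": "ankle boot",
    "es": "botín",
    "fr": "bottine",
    "de": "Stiefelette",
}


def describe(label: int) -> str:
    """Look up the English description for a numeric label."""
    return CLASS_NAMES[label]


print(describe(9))
print(LABEL_9_TRANSLATIONS["fr"])
```

Because the model only ever sees the number, swapping in another language is a change to the lookup table, not to the training data.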

  • When looking at a neural network design,

  • it's always good to explore

  • the input values and the output values first.

  • Here we can see that our neural network is a little more complex

  • than the one in the first episode.

  • Our first layer has the input of shape 28x28,

  • which, if you remember, was the size of our image.

  • Our last layer is 10, which, if you remember,

  • is the number of different items of clothing

  • represented in our dataset.

  • So our neural network will kind of act like a filter,

  • which takes in a 28x28 set of pixels and outputs 1 of 10 values.
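The network described here could be sketched with the Keras `Sequential` API as follows. The `Flatten` layer, which turns the 28x28 grid into a flat list of 784 values, is an assumption about how the input is fed to the 128-unit layer; the layer sizes and activation names come from the transcript.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),                   # one 28x28 grayscale image
    tf.keras.layers.Flatten(),                        # 28x28 grid -> 784 values
    tf.keras.layers.Dense(128, activation="relu"),    # the middle layer of 128 units
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per clothing class
])

model.summary()
```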

  • So what about this number, 128? What does that do?

  • Well, think of it like this: we're going to have 128 functions,

  • each one of which has parameters inside of it.

  • Let's call these f0 through f127.

  • What we want is that when the pixels of the shoe

  • get fed into them, one by one,

  • that the combination of all of these functions

  • will output the correct value.

  • In this case, 9.

  • In order to do that, the computer will need to figure out

  • the parameters inside of these functions to get that result.

  • And it will then extend this

  • to all of the other items of clothing in the dataset.

  • The logic is, once it has done this,

  • then it should be able to recognize items of clothing.
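One way to picture the 128 functions is as 128 weighted sums over the 784 pixels, each with its own weights and bias. That reading is the usual view of a dense layer rather than something the video spells out, and the random image and weights below are stand-ins for an untrained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# A stand-in 28x28 grayscale image, flattened to 784 pixel values in [0, 1).
image = rng.random((28, 28))
pixels = image.reshape(-1)

# One column of weights plus one bias per function: f0 ... f127.
weights = rng.normal(scale=0.05, size=(784, 128))
biases = np.zeros(128)


def f(i: int, x: np.ndarray) -> float:
    """Function f_i: a weighted sum of every pixel plus a bias."""
    return float(x @ weights[:, i] + biases[i])


# Evaluating the functions one at a time...
one_by_one = np.array([f(i, pixels) for i in range(128)])

# ...gives the same values as a single matrix product.
all_at_once = pixels @ weights + biases

print(one_by_one.shape)
print(np.allclose(one_by_one, all_at_once))
```

Training amounts to adjusting `weights` and `biases` until the combined outputs point at the right label.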

  • So if you remember from the last video,

  • there's the optimizer function and the loss function.

  • The neural network will be initialized with random values.

  • The loss function will then measure how good or how bad the results were,

  • and then with the optimizer,

  • it will generate new parameters for the functions

  • to see if it can do better.
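The transcript does not name the loss function. As a sketch, assuming sparse categorical cross-entropy (a common choice when labels are integers), the loss is the negative log of the probability the network gave to the correct class, so a confident correct guess scores near zero and an uncertain one scores higher. The two predictions below are made-up values for illustration.

```python
import numpy as np


def sparse_categorical_crossentropy(probs: np.ndarray, label: int) -> float:
    """Negative log of the probability assigned to the correct class."""
    return float(-np.log(probs[label]))


# Two hypothetical 10-way predictions for an image whose true label is 9.
good_guess = np.full(10, 0.01)
good_guess[9] = 0.91          # confident and correct
bad_guess = np.full(10, 0.1)  # every class equally likely

good_loss = sparse_categorical_crossentropy(good_guess, 9)
bad_loss = sparse_categorical_crossentropy(bad_guess, 9)

print(good_loss)
print(bad_loss)
```

The optimizer's job is to nudge the parameters in whichever direction makes this number smaller.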

  • You probably also wondered about these.

  • And they're called activation functions.

  • The first one is on the layer of 128 functions, and it's called relu,

  • or rectified linear unit.

  • What it really does is as simple as returning a value

  • if it's greater than zero.

  • So if that function had zero or less as output,

  • it just gets filtered out.
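In code, relu is a single comparison against zero. A minimal NumPy sketch, with sample values chosen for illustration:

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    """Keep positive values; replace zero or negative values with 0."""
    return np.maximum(x, 0.0)


outputs = np.array([-2.0, -0.5, 0.0, 0.5, 3.0])
activated = relu(outputs)

print(activated)
```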

  • And softmax has the effect of picking the biggest number in a set.

  • The output layer in this neural network has 10 items in it,

  • representing the probability

  • that we're looking at that specific item of clothing.

  • So, in this case, it has a high probability that it's item 09,

  • which is our ankle boot.

  • So instead of searching through to find the largest,

  • what softmax does is it sets it to 1 and the rest is 0,

  • so all we have to do is find the 1.
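Strictly speaking, softmax does not output an exact 1 and nine 0s. It rescales the ten outputs into probabilities that sum to 1, with the largest input receiving the largest share, often close to 1. A minimal NumPy sketch, using made-up scores in which item 9 dominates:

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Turn raw scores into probabilities that sum to 1."""
    shifted = x - np.max(x)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()


# Hypothetical raw outputs of the 10-unit layer for an ankle boot image.
scores = np.array([0.1, 0.2, 0.1, 0.0, 0.3, 0.1, 0.2, 0.4, 0.5, 6.0])
probs = softmax(scores)

print(probs.round(3))
print(probs.sum())
print(int(np.argmax(probs)))  # index of the most likely class
```

Finding the predicted class is then a matter of taking the index of the largest probability, as `np.argmax` does here.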

  • Training is then very simple:

  • we fit the training images to the training labels.

  • This time, we'll try it for just 5 epochs.

  • Remember earlier we had 10,000 images and labels that we didn't train with?

  • These are images that the model hasn't previously seen,

  • so we can use them to test how well our model performs.

  • We can do that test by passing them to the evaluate method, like this.

  • And then, finally, we can get predictions back for new images

  • by calling model.predict like this.
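The training, evaluation, and prediction calls are not captured in the transcript. The sketch below shows how they might fit together. The `adam` optimizer, the `sparse_categorical_crossentropy` loss, the `accuracy` metric, and the helper names `build_model` and `train_and_test` are assumptions made for illustration. Small random arrays stand in for the real images so that the sketch runs without a download.

```python
import numpy as np
import tensorflow as tf


def build_model() -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    # Optimizer, loss, and metric are assumed; the transcript does not name them.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def train_and_test(train_images, train_labels, test_images, test_labels, epochs=5):
    model = build_model()
    # Fit the training images to the training labels.
    model.fit(train_images, train_labels, epochs=epochs, verbose=0)
    # Measure performance on images the model has not seen.
    test_loss, test_accuracy = model.evaluate(test_images, test_labels, verbose=0)
    # One row of 10 probabilities per image.
    predictions = model.predict(test_images, verbose=0)
    return model, test_accuracy, predictions


# Random stand-in data: 64 fake 28x28 images with labels 0-9.
rng = np.random.default_rng(0)
fake_images = rng.random((64, 28, 28)).astype("float32")
fake_labels = rng.integers(0, 10, size=64)

model, accuracy, predictions = train_and_test(
    fake_images[:48], fake_labels[:48],
    fake_images[48:], fake_labels[48:],
    epochs=1,
)

print(predictions.shape)
print(int(np.argmax(predictions[0])))
```

With the real dataset, the four arrays returned by the Fashion MNIST loader would replace the random stand-ins, and `epochs=5` matches the number mentioned in the video. Scaling pixel values to the 0-1 range before training is a common extra step, though the transcript does not mention it.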

  • And that's all it takes to teach a computer

  • how to see and recognize images.

  • You can try this out for yourself

  • in the notebook that I've linked in the description below.

  • Having gone through this, you've probably seen one drawback

  • and that's the fact that the images are always 28x28 grayscale

  • with the item of clothing centered.

  • So what if it's just a normal photograph, and we want to recognize its contents,

  • and you don't have the luxury of it being the only thing in the picture

  • as well as being centered?

  • That's where the process of spotting features becomes useful

  • and the tool of convolutional neural networks is your friend.

  • You'll learn all about that in the next video,

  • so don't forget to hit that subscribe button,

  • and I'll see you there.

  • ♪ (music) ♪
