LAURENCE MORONEY: All right.
Shall we get started?
So thanks, everybody, for coming to this session.
I'm going to be talking about TensorFlow
and particularly TensorFlow from a programmer's perspective--
so machine learning for programmers.
I'd like to show some code samples of using TensorFlow
in some simple scenarios as well as one slightly more
advanced scenario.
But before I do that, I always like to just do
a little bit of a level set.
And if you were at the previous session, sorry,
some of the content's going to be similar to what
you've seen already.
But when I like to think about AI,
and when I come to conferences like this one about AI,
or if I read the news about AI, there's
always stories about what it can do or what it might do,
but there's not a whole lot about what it actually is.
So part of my mandate and part of what I actually
like to educate people around is from a programmer's
perspective, what AI actually is, what it is for you,
what you can begin to learn how to program,
and then how you can apply it to your business scenarios.
But we're also at the cusp of this revolution
in this technology, and lots of people
are calling it like the fourth Industrial Revolution.
And for me, I can only describe it
as like it's the third big shift in my own personal career.
And so for me, the first one came in the early to mid-90s
when the web came about.
And if you remember when the web came about,
we were all desktop programmers.
I personally-- my first job was I
was a Visual Basic programmer programming
Windows applications.
Anybody ever do that?
It was fun, wasn't it?
And so then the web came around, and what happened with the web
then is it changed the audience of your application
from one person at a time to many people at a time.
You had to start thinking differently
about how you built your applications to be
able to scale it to lots of people using it.
And also, the runtime changed.
Instead of you being able to write something that
had complete control over the machine,
you would write something to this sort of virtual machine
that the browser gave you.
And then maybe that browser would have plugins like Java
and stuff like that you could use
to make it more intelligent.
But as a result, what ended up happening
was this paradigm shift gave birth to whole new industries.
And I work for a small company called Google.
Anybody heard of them?
And so things like Google weren't possible.
Anybody remember gophers?
Yeah, so that's really old school, right?
Gophers were almost the opposite of a search engine.
A search engine, like--
you type something into it, and it has already
found the results, and it gives them to you.
A gopher was this little application
that you would send out into the nascent internet,
and it would crawl everywhere, a little bit like a spider,
and then come back with results for you.
So for me, whoever had the great idea
to say, let's flip the axes on that
and come up with this new business paradigm,
ended up building the first search engines.
And as a result, companies like Google and Yahoo were born.
Ditto with things like Facebook--
that wouldn't have been possible with the browser.
Can you imagine trying to--
pre-internet, where there was no standard protocol
for communication, and you'd write desktop applications--
can you imagine being able to build something
like a Facebook or a Twitter?
It just wasn't possible.
So that became-- to me, the web was
this first great tectonic shift in my own personal career.
The second one then came with the advent of the smartphone.
So now users had this device that they
can put in their pocket that's got
lots of computational power.
It's got memory.
It's got storage, and it's loaded
with sensors like cameras, and GPS, et cetera.
Now think about the types of applications
that you could build with that.
Now companies like Uber became possible.
Now, I personally believe, by the way, that all of these applications are built by introverts, because all of these great things that you can do nowadays serve introverts.
I'm highly introverted, and one thing I hate to do
is stand on a street corner and hail a taxi.
So when Uber came along, it was like a dream come true for me
that I could just do something on my phone,
and a car would show up.
And now it's shopping.
It's the same kind of thing, right?
I personally really dislike going to a store
and having somebody say, can I help you?
Can I do something for you?
Can I help you find something?
I'm introverted.
I want to go find it myself, put my eyes down, and take it
to the cash register, and pay for it.
And now online shopping, it's done the same thing.
So I don't know why I went down that rabbit hole, but the point is that the second tectonic shift has been the advent of the mobile application, so that these new businesses, these new companies became possible.
So the third one that I'm seeing now
is the AI and the machine learning revolution.
Now, there's so much hype around this,
so I like to draw a diagram of the hype cycle.
And so if you think about the hype cycle,
every hype cycle starts off with some kind
of technological trigger.
Now, with AI and machine learning,
that technological trigger really
happened a long time ago.
Machine learning has been something,
and AI has been something that's been in universities.
It's been in industry for quite some time--
decades.
But it's only relatively recently that the intersection of compute power and data has made it possible for everybody to jump on board-- not just university researchers.
And with the power of things such as TensorFlow
that I'm going to show later, anybody with a laptop
can start building neural networks
where in the past neural networks
were reserved for the very best of universities.
So that technological trigger that's rebooted,
in many ways, the AI infrastructure,
has only happened in the last few years.
And with any hype cycle, what happens is you end up with this peak of inflated expectations
where everybody is thinking AI's going to be the be-all
and end-all, and will change the world,
and will change everything as we know it,
before it falls into the trough of disillusionment.
And then at some point, we get enlightenment,
and then we head up into the productivity.
So when you think about the web, when
you think about mobile phones and those revolutions
that I spoke about, they all went through this cycle,
and AI went through this cycle.
Now, you can ask 100 people where we are on this hype cycle right now, and you'd probably get 100 different answers.
But I'm going to give my answer that I
think we're right about here.
And when we start looking at the news cycle,
it kind of shows that.
We start looking at news.
We start looking at glossy marketing videos.
AI is going to do this. AI is going to do that.
At the end of the day, AI isn't really doing any of that.
It's smart people building neural networks with a new metaphor for programming who have been able to build out these new scenarios.
So we're still heading up that curve of inflated expectations.
And at some point, we're probably
going to end up in the trough of disillusionment
before things will get real and you'll be able to really build
whatever the Uber or the Google of the AI generation's
going to be.
It may be somebody in this room will do that.
I don't know.
So at Google, we have this diagram that we draw to train our internal engineers and our internal folks around AI and around the hype around AI.
First of all, AI, from a high level,
is the ability to program a computer
to act like an intelligent human.
And how do you do that?
There might be some traditional coding in that,
but there may also be something called machine
learning in that.
And what machine learning is all about is instead of writing
code where it's all about how the human solves a problem,
how they think about a problem, and expressing that
in a language like Java, C#, or C++,
it's a case of you train a computer by getting it
to recognize patterns and then open up whole new scenarios
in that way.
I'm going to talk about that in a little bit more.
And then another part of that is deep learning, where the idea behind deep learning is machines being able to take over some of the role that humans play in the machine learning phase.
And where machine learning is all about--
I'm going to, for example, show a slide next
about activity detection.
But in the case of activity detection,
instead of me explicitly programming a computer
to detect activities, I will train a computer
based on people doing those activities.
So let me think about it.
Let me describe it this way.
First of all, how many people in this room are coders?
Have written code?
Oh, wow, most of you.
OK, cool.
What languages, out of interest?
Just shout them out.
[INTERPOSING VOICES]
LAURENCE MORONEY: C#.
Thank you.
[INTERPOSING VOICES]
LAURENCE MORONEY: Python.
OK.
I've written a bunch of books on C#.
I still love it.
I don't get to use it anymore, but it's nice to hear.
So I heard C#.
I heard Python.
C++?
OK, cool.
Now, what do all of these languages have in common?
Ruby?
Nice.
What do all of these languages have in common?
That you, as a developer, have to figure out
how to express a problem in that language, right?
So think about if you're building an application for activity detection, and say you want to detect an activity of somebody walking--
like I'm wearing a smartwatch right now.
I love it because since I started wearing smartwatches,
I became much more conscious of my own fitness.
And I think about how this smartwatch monitors my activity: when I start running, I want it to know that I'm running, so it logs that I'm running.
When I start walking, I want it to do
the same thing, and count calories,
and all that kind of stuff.
But if you think about it from a coding perspective, how would
you build a smartwatch like this one if you're a coder?
Now, you might, for example, be able to detect the speed
that the person's moving at, and you'd write
a little bit of code like this.
If speed is less than 4, then the person's walking.
That's kind of naive because if you're walking uphill,
you're probably going slower.
If you're walking downhill, you're going faster.
But I'll just keep it simple like that.
So in code, you have a problem, and you
have to express the problem in a way
that the computer understands that you can compile,
and then you build an application out of it.
So now I say, OK, what if I'm running?
If I'm running, well, I can probably go by the speed again.
And I say, hey, if my speed is less than a certain amount,
I'm walking.
Otherwise, I'm running.
I go, OK.
Now I've built an activity detector,
and it detects if I'm walking or if I'm running.
Pretty cool.
Pretty easy to do with code.
So I go, OK.
Now, my next scenario is biking.
And I go, OK.
If I'm going based on the speed, the data of my speed,
I can do a similar thing.
If my speed is less than this much, I'm walking. Otherwise, if it's less than that much, I'm running. Otherwise, I'm biking.
So great.
I've now written an activity detector--
a very naive activity detector-- just by looking at the speed
that the person's moving at.
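In code, the naive detector being described might look something like this sketch. The specific thresholds (4 and 12) are made up for illustration, not values from any real product:

```python
def detect_activity(speed_kmh):
    """Naive rule-based activity detector: classify purely by speed.

    The thresholds here are illustrative guesses. Note there is no
    branch we could add that would ever detect golf from speed alone.
    """
    if speed_kmh < 4:
        return "walking"
    elif speed_kmh < 12:
        return "running"
    else:
        return "biking"


print(detect_activity(3))   # walking
print(detect_activity(8))   # running
print(detect_activity(20))  # biking
```

And this is exactly the problem: each new activity means another hand-written branch, and some activities have no sensible branch at all.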
But now my boss loves to play golf,
and he's like, this is great.
I want you to detect golf, and tell me when I'm playing golf,
and calculate what I'm doing when I'm playing golf.
How do I do that?
I'm in what I call, as a programmer,
the oh crap phase because now I realize that all of this code
that I've written, and all this code that I'm maintaining,
I now have to throw away because it can't
be used in something like this.
This scenario just doesn't become possible with the code
that I've written.
So when I think about going back to the revolutions
that I spoke about--
for example, something like an Uber
wouldn't have been possible before the mobile phone.
Something like a Google wouldn't have been possible
before the web.
And something like my golf detector,
it wouldn't be possible or would be
extremely difficult without machine learning.
So what is machine learning?
So traditional programming I like
to summarize in a diagram like this one.
And traditional programming is a case of you express rules using
a programming language like Ruby, or C#, or whatever.
And you have data that you feed into it,
and you compile that into something
that gives you answers.
So keeping the very simple example
that I have of an activity detector, that's
giving me the answer of you're playing golf.
You're running.
You're walking-- all those kind of things.
The machine learning revolution just flips the axes on this.
So the idea behind the machine learning revolution
is now I feed in answers, I feed in data, and I get out rules.
So instead of me needing to have the intelligence
to define the rules for something,
this revolution is saying that, OK, I'm
going to tell a computer that I'm doing this,
I'm doing that, I'm doing the other,
and it's going to figure out the rules.
It's going to match those patterns
and figure out the rules for me.
So now something like my activity detector for golf,
and walking, and running changes.
So now instead of me writing the code for that, I would say, OK.
I'm going to get lots of people to walk.
I'm going to get lots of people to wear whatever
sensor it is-- like maybe it's a watch or a smartphone--
in their pocket.
And I'm going to gather all that data,
and I'm going to tell a computer,
this is what walking looks like.
I'm going to do the same for running.
I'm going to do the same for biking.
And I may as well do the same for golfing.
So now my scenario becomes expandable,
and I can start detecting things that I previously would not
have been able to detect.
So I've opened up new scenarios that I previously would not be
able to program by using if-then rules or using whatever
language--
C.
Anybody remember the language Prolog?
Anybody use that?
Yeah.
Even Prolog couldn't handle that,
even though they said Prolog was an AI language.
So the idea behind this is it kind of emulates
how the human mind works.
So instead of me telling the computer
by having the intelligence to know what golf looks like, I
train the computer by taking data about what
golf looks like, and the computer recognizes that data,
matches that data.
So in the future, when I give it more data,
it will say, that kind of looks like golf,
so I'm going to say this is golf.
So we talk about learning.
We talk about the human brain.
So I always like to think like, well,
think about how you learn something--
like maybe this game.
Anybody remember this game?
Everybody knows how to play this game, right?
It seems, by the way, this game has
different names in every country,
and it's always hard to remember.
I grew up calling it noughts and crosses.
Ed's nodding.
Most people in this country maybe grew up calling it tic-tac-toe.
I gave a talk similar to this in Japan earlier this year,
and they had this really strange name that I
couldn't remember for it.
But this is a very simple game, right?
Now, if I were to ask you to play that game right now,
and it's your move, where would you go?
How many people would go in the center?
How many people would not go in the center?
We need to talk.
So you've probably learned this as a young child-- and maybe you teach this to children-- but here's the strategy of winning this game. If it's your move first, you will never win this game by not going in the center first, unless you're playing against somebody who doesn't know how to play the game.
Now, remember how you learned that.
OK?
If you have a really tough teacher like me,
I would teach my kids by beating them
every time at the game and that kind of stuff.
So if they would start in the corner, I would beat them.
And they would start somewhere else,
and I would beat them-- at the game.
And keep doing this kind of thing
until they eventually figured out
that they have to go in the center,
or they're going to lose.
So that was a case of this is how the human brain learns.
So how do we teach a computer the same way?
Now think about, for example, if your kid goes, and they've never seen this board before.
So in this society, we read left to right, top to bottom.
So the first thing they'd probably do
is go in the top left-hand corner.
And then you'd go in the center, and then they'd
go somewhere else.
And you go somewhere else, and then they go somewhere else,
and you get three in a row, and you beat them.
They now have what, in machine learning parlance,
is a labeled example.
They see the board.
They remember what they did on the board,
and that's been labeled as they lost.
Then they might play again, and they
have another labeled example of they lost.
And they'll keep doing that until they have labeled
examples of tying and then maybe, eventually,
labeled examples of winning.
So knowing how to learn is a step
towards this kind of intelligence.
And this is what we do when we talk about machine learning: we get our data, and we label it. It's exactly the same as teaching a child how to play tic-tac-toe or noughts and crosses.
So let's take a look at--
so if I go back to this diagram for a moment
before I look at some code, now the idea
is thinking in terms of tic-tac-toe,
you have the answers of experience of playing the game.
You have the labels for that--
that you won, you lost, whatever.
And out of that, as a human, you'd begin to infer the rules.
Did anybody ever teach you the rules: you must go in the center first; if you don't go in the center, you go in a corner; if somebody has two in a row, you block them? You don't learn by those if-then rules.
I know I didn't, and most people I speak to didn't.
So as a result, they ended up playing the game,
and they infer the rules for themselves,
and it's exactly the same thing with machine learning.
So you build something.
The computer learns how to infer the rules
with a neural network.
And then at runtime, you give it data,
and it will give you back classifications,
or predictions, or give you back intelligent answers based
on the data that you've given it
So this is what we call the training phase, and this is what we call the inference phase. But enough theory. So let's look at some code. I like to explain a lot of this in code.
So a very simple Hello, World scenario,
as all programmers have, is this: I'm going to use some numbers.
And I'm going to give you some numbers,
and there's a relationship between these numbers.
And let's see who can figure out what the relationship is.
Are you ready?
OK.
Here are the numbers.
So where x is minus 1, y is minus 3.
Where x is 0, y is minus 1, et cetera, et cetera.
Can you see the relationship between the x and the y?
So if y equals something, what would that something be?
AUDIENCE: 2x minus 1.
LAURENCE MORONEY: 2x minus 1.
Excellent.
So the relationship here is y equals 2x minus 1.
How do you know that?
How did you get that?
AUDIENCE: [INAUDIBLE]
LAURENCE MORONEY: What's that?
AUDIENCE: [INAUDIBLE]
LAURENCE MORONEY: I can't hear you, sorry.
AUDIENCE: It's called a linear fit.
LAURENCE MORONEY: Oh, linear fit.
OK, thanks.
Yeah.
So you've probably done some basic geometry in school,
and you think about usually there's a relationship.
y equals mx plus c, something along those lines. So you start plugging the m and the c in.
And then your mind ultimately finds
something that works, right?
So you go, OK.
Well, if y is minus 3, maybe that's
a couple of x's, which will give me minus 2.
And I'll subtract 1 from that, give me minus 3.
And then I'll try that with 0 and 1.
Yep, that works.
Now I'll try that with 1 and 1.
That works.
So what happened is there were a couple of parameters
around the y that you started guessing
what those parameters were and started trying to fit them in
to get that relationship.
That's exactly what a neural network does,
and that's exactly the process of training a neural network.
When you train a neural network to try and pick
a relationship between numbers like this, all it's doing
is guessing those random parameters, calculating--
look through each of the parameters,
calculate which ones it got right, which ones it got wrong,
calculate how far it got them wrong by,
and then try and come up with new values that would be closer
to getting more of them right.
And that's the process called training.
So whenever you see training and talking
about needing lots of cycles for training, needing lots of GPU
time for training, all the computer is doing is trying,
failing, trying, failing, trying, failing, but each time
getting a little closer to the answer.
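That guess-measure-adjust loop can be sketched in plain Python as gradient descent on the two parameters. This is a hand-rolled illustration of the idea, not what TensorFlow literally does internally, and the learning rate and step count are just reasonable picks:

```python
# Fit y = w*x + b to the six points from the talk by repeatedly
# guessing, measuring the error, and adjusting (gradient descent
# on the mean squared error).
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]  # the relationship is y = 2x - 1

w, b = 0.0, 0.0       # initial (wrong) guesses for the parameters
lr = 0.05             # how big a nudge to apply each time
for _ in range(500):  # 500 rounds of try, fail, adjust
    # measure how wrong the current guesses are on each point
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    # work out which direction reduces the error
    dw = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    db = 2 * sum(errors) / len(xs)
    # nudge the parameters a little closer
    w -= lr * dw
    b -= lr * db

print(w, b)  # converges toward w = 2, b = -1
```

Each pass through the loop is one of those "trying, failing, trying, failing" cycles, and each one ends a little closer to y = 2x - 1.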
So let's look at the code for that.
So using TensorFlow and using Keras--
I don't have the code on my laptop,
so I've got to look back at the screen.
Sorry.
So using TensorFlow and Keras, here's
how I'm going to define a neural network
to do that linear fitting in just a few lines.
So the first thing I'm going to do
is I'm going to create my neural network.
This is the simplest possible neural network.
It's got one layer with one neuron in it.
And this is the code to do that.
So where you see keras.layers.Dense(units=1, input_shape=[1]), all that I'm saying is I've got a single neuron.
I'm going to pass a single number into that,
and you're going to try and figure out what the number I
want to come out of that is.
So very, very simple.
So then my next line of code is remember
I said all a neural network is going to do
is try and guess the parameters that
will make all the numbers fit?
So it will come up with a couple of rough guesses
for these parameters.
And then it has these two functions.
One's called a loss function, and one's called an optimizer.
And all they're doing is--
if you remember that set of six numbers I gave you,
it's saying, OK.
Well if y equals something times x plus something,
I'm going to guess those two somethings.
I'm going to measure how many of my y's I got right.
I'm going to measure how far I'm wrong in all of the ones
that I got wrong, and then I'm going
to try and guess new values for those somethings.
So the loss function is the part where
it's measuring how far it got wrong,
and the optimizer is saying, OK.
Here's what I got the last time.
I'm going to try to guess these new parameters,
and I'll keep going until I get y equals 2x minus 1
or something along those lines.
So that's all you do.
You just compile your model.
You specify the loss function.
You specify the optimizer.
These are both really heavy mathy things.
One of the nice things about Keras,
one of the nice things about TensorFlow,
is they're all done for you.
You're just going to specify them in code.
And so I'm going to say, I'm going
to try the mean squared error as my loss function.
And I'm going to try something called
SGD, which is stochastic gradient
descent as my optimizer.
And every time it loops around, it's
going to just guess new parameters based on those.
OK.
So then the next thing I'm going to do
is I'm going to feed my values into my neural network.
So I'm going to say, my x is going
to be this array-- minus 1, 0, 1, et cetera.
My y is going to be this array.
So here I'm creating the data.
And so I just get them, and I load them
into a couple of arrays.
This is Python code, by the way.
And now all that I'm going to ask my neural network to do
is to try and come up with an answer.
And I do that with the fit method.
So here I just say, hey, try and fit my x's to my y's.
And this epochs=500 means you're going to just try 500 times.
So it's going to loop 500 times like that.
Remember I was saying it's going to get those parameters.
It's going to get it wrong.
It's going to optimize.
It's going to guess again.
It's going to get it wrong.
It's going to optimize.
So in this case in my code, I'm just saying do that 500 times.
And at the end of those 500 times,
it's going to come up with a model
that if I gave it a y-- sorry.
If I give it an x, it's going to give me what
it thinks the y is for that x.
OK.
And you do that using model.predict().
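Pulling all of those pieces together, the whole walkthrough is just a few lines. This sketch uses keras.Input rather than the input_shape argument quoted from the slide, because newer Keras versions prefer that form; the trained value drifts slightly from run to run:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# The simplest possible neural network: one layer, one neuron,
# taking a single number in.
model = keras.Sequential([
    keras.Input(shape=(1,)),
    keras.layers.Dense(units=1),
])

# The loss function measures how wrong the guesses are; the
# optimizer (stochastic gradient descent) makes the next guesses.
model.compile(optimizer='sgd', loss='mean_squared_error')

xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)  # y = 2x - 1

# Guess, measure, adjust -- 500 times.
model.fit(xs, ys, epochs=500, verbose=0)

# Ask for the y it thinks goes with x = 10: close to 19, not exactly 19.
print(model.predict(np.array([10.0]), verbose=0))
```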
So if I pass it model.predict() for the value 10,
what do you think it would give me?
If you remember the numbers from earlier, y is 2x minus 1.
What do you think it would give?
19, right?
Except it doesn't, because it gives me something really close to 19.
It gives me about 18.97, and I'm going
to try to run the code in a moment to show.
But why do you think it would do that?
AUDIENCE: [INAUDIBLE]
LAURENCE MORONEY: What's that?
AUDIENCE: It's predicting.
LAURENCE MORONEY: It's predicting.
And it's just been trained on a very few pieces of data.
With those six pieces of data, it looks like a line,
and it looks like a linear relationship,
but it might not be.
There's room for error there that with the fact that I'm
training on very, very little data,
this could be a small part of a line that, for all we know,
goes like this instead of being linear once you move out
of those points.
And as a result, those kind of things
get factored into the model as the model's training on it.
So you'll see it's going to get a very close answer,
but it's not going to be an exact answer.
Let me see if I can get the code running.
It's a little complex with this laptop.
When I'm presenting, it's hard to move stuff over
to that screen.
Just one second.
This requires some mouse-fu.
All right.
So I have that code.
Let's see.
Yeah.
So you can see that code I have now running up there.
And if you look right at the bottom of the screen over here,
we can see here's where it has actually done the training.
It's done 500 epochs worth of training.
And then when I called the model.predict(),
it gave me this answer, which is 18.976414.
And so that was one that I ran earlier.
I'm just going to try and run it again now, if I can.
But it's really hard to see.
So I'll click that Run arrow.
So this IDE is PyCharm, by the way.
So you see that it ran very quickly because it's
a very simple neural network.
And as a result, I was able to train it
through 500 epochs in whatever that is-- half a second.
What did it give me this time?
Was it 18.9747?
Is that what I see?
So again, very simple neural network, very simple code,
but this just shows some of the basics for how it works.
So next, I want to just get to a slightly more advanced example
once I get my slides back.
Whoops.
OK.
So that was very simple.
That was Hello, World.
We all remember our first Hello, World program which we wrote.
If you wrote it in Java, it was like 10 lines.
If you wrote it in C#, it was five lines.
If you wrote it in Python, it was one line.
If you wrote it in C++, it was like 300 lines.
[LAUGHTER]
Do you remember that--
I remember Petzold's book on programming Windows.
Anybody ever read that?
The whole first chapter was how to do Hello, World in MFC,
and it was like 15 pages long.
I thought it was great.
But that was a pretty easy example.
That, to me, is the Hello, World of machine learning-- just
doing that basic linear fitting.
But let's think about something more complicated.
So here are some items of clothing.
Now, as a human, you are looking at these items of clothing,
and you've instantly classified them.
And you instantly recognize them, or at least
hopefully most of them.
But think about the difficulty for a computer
to classify them.
For example, there are two shoes on this slide.
One is the high heel shoe in the upper right,
and one is the sneaker in the second row.
But they look really different to each other--
other than the fact that they're both red,
and you think they vaguely fit a foot.
The high heel, obviously your foot has to change to fit it.
And the sneaker, the foot is flat.
But as a human brain, we automatically recognize these,
and we see these as shoes.
Or if we look at the two shirts in the image, one of them
doesn't have arms because we automatically
see it as being folded-- the one with the tie.
And then the green one in the lower left--
we already know it's a shirt-- it's a t-shirt--
because we recognize it as such.
But think about how would you program a computer
to recognize these things, given the differences?
It's really hard to tell the difference
between a high heeled shoe and a sneaker, for example.
So the idea behind this is there's actually a data
set called Fashion MNIST.
And what it does is it gets 70,000 items of clothing,
and it's labeled those 70,000 items of clothing
in 10 different classes from shirts, to shoes, to handbags,
and all that kind of thing.
And it's built into Keras.
So one of the really neat things that came out of the research
behind this, by the way, is that the images are only 28
by 28 pixels.
So if you think about, it's faster to train a computer
if you're using less data.
You saw how quickly I trained with my linear example
earlier on.
But if I were to try and train it
with high definition images of handbags
and that kind of stuff, it would still work,
but it would just be slower.
And a lot of the research that's gone into this dataset has actually shown how to train a neural network where all you need is a 28 by 28 pixel image to be able to tell the difference between different items of clothing.
As you are doing probably right now, you can take a look,
and you see which ones are pants,
which ones are shoes, which ones are
handbags, that kind of thing.
So this allows us to build a model that's
very, very quick to train.
And if I take a look, here's an example
of one item of clothing in 28 by 28 pixels.
And you automatically recognize that, right?
It's a boot, or a shoe, or something along those lines.
And so this is the kind of resolution of data--
all you need to be able to build an accurate classifier.
So let's look at the code for that.
So if you remember earlier on, the code that I was building
was I created the neural network.
I compiled a neural network by specifying
the loss function and the optimizer, and then I fit it.
So in this case, a little bit more complex.
Your code's going to look like-- you're going to use TensorFlow.
From TensorFlow, you're going to import the Keras namespace
because the Keras namespace really nicely gives you access
to that Fashion MNIST dataset.
So think about all the code that you'd typically
have to write to download those 70,000 images,
download their labels, correspond
a label with an image, load all of that in--
that kind of stuff.
All that coding is saved and just put
into these two lines of code.
And that's one of the neat things about Python that I find makes it great for machine learning, because that second line of code there, where train_images, train_labels, test_images, test_labels equals fashion_mnist.load_data(), is actually loading data from the dataset, which is stored in the cloud.
It's sorting those 70,000 items of data into four arrays, split between two sets: one for training and one for testing.
And that data is going to contain-- the one on the left
there, the training images, is 60,000 images
and 60,000 labels.
And then the other side is 10,000 images and 10,000 labels
that you're going to use for testing.
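In code, that load and split is exactly the two lines being described (the first call downloads the dataset, so it needs a network connection the first time; after that it's cached locally):

```python
import tensorflow as tf
from tensorflow import keras

# Fashion MNIST ships with Keras: 70,000 labeled 28x28 clothing images,
# already split into 60,000 training and 10,000 test examples.
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

print(train_images.shape)  # (60000, 28, 28)
print(test_images.shape)   # (10000, 28, 28)
```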
Now, anybody guess why would you separate them like this?
Why would you have a different set
for testing than you would have for training?
AUDIENCE: [INAUDIBLE]
LAURENCE MORONEY: The clue's in the name.
So how do you know your neural network
is going to work unless you've got something
to test it against?
Earlier, we could test with our linear thing
by feeding 10 in because I know I'm expecting 2x minus 1
to give me 19.
But now it's a case of, well, it'd
be great for me to be able to test it against something
that's known, against something that's labeled,
so I can measure the accuracy as I go forward.
So that's all I got to do in code.
So now if I come back here, let's look
at how we actually define the neural net-- oh, sorry.
Before I do that, so the training images
are things like the boot that I showed you earlier on.
It's 28 by 28 pixels.
The labels are actually just going
to be numbers rather than like a word like shoe.
Why do you think that would be?
So that you can define your own labels,
and you're not limited to English.
So for example, 09 in English could be an ankle boot.
The second one is in Chinese.
The third one is in Japanese.
And the fourth language, can anybody guess?
Bróg rúitín?
That's actually Irish Gaelic.
Sorry, I'm biased.
I have to put some in.
So now, for example, I could build a classifier
not just to give me items of clothing
but to do it in different languages.
So that's just what my labels are going to look like.
So now let's take a look at the code
for defining my neural network.
So here is-- if you remember the first line of code
where I defined the single layer with the single neuron
for the classification, this is what it's going to look like.
And this is all it takes to build this clothing classifier.
So you see there are three layers here.
The first layer, where it says keras.layers.Flatten(input_shape=(28,
28)), all that is is I'm defining a layer to take
in 28 squared values.
Remember, the image is a square of 28 by 28 pixels,
but you don't feed a neural network with a square.
You feed it with a flat layer of values.
In this case, the values are between 0 and 255.
So I'm just flattening that out, and I'm
saying, that's my first layer.
You're going to take in whatever 28 squared is.
My second layer now is just 128 neurons,
and there's an activation function on them
which I'll explain in a moment.
And then my third layer is going to be 10 neurons.
Why do you think there are 10 in that one?
Can anybody guess?
So 28 squared for the inputs, 10 for the output.
Anybody remember where the number 10 was mentioned?
AUDIENCE: [INAUDIBLE]
LAURENCE MORONEY: Yeah, number of labels.
There were 10 different classes.
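Those three layers, sketched with the tf.keras Sequential API (softmax on the final layer is the usual choice here for turning the 10 outputs into probabilities):

```python
import tensorflow as tf
from tensorflow import keras

# Three layers: flatten the 28x28 grid into 784 flat values, a hidden
# layer of 128 neurons with relu, and 10 output neurons -- one per class.
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax),
])
model.summary()
```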
So what happens when a neural network,
when you train it like this one, it's not just going to pop out
and give you an answer and say, this is number 03,
or this is number 04.
Typically what will happen is that you
want to have 10 outputs for your 10 different labels,
and each output is going to give you a probability that it
is that label.
So for example, the boot that I showed earlier on was labeled
09, so neuron 0 is going to give me a very low number.
Neuron 1 is going to give me a very low number.
Neuron 2 is going to give me a very low number.
Neuron 9 is going to give me a very high number.
And then by looking at the outputs across all
of these neurons, I can now determine
which one the neural network thinks it's classified for.
Remember, we're training this with a bunch of data.
So I'm giving it a whole bunch of data to say this is what
a number 09 looks like.
This is what a number 04 looks like.
This is what a number 03 looks like.
By saying, OK, this is what they are.
I encode the data in the same way.
And as a result, we'll get our output like this.
Now, every neuron has what's called an activation function.
And the idea behind that-- it's a very mathy kind of thing.
But in programmer's terms, the tf.nn.relu that you see there--
if you think about this in terms of code,
if I say if x is greater than zero, return x.
Else, return zero.
OK?
Very simple function, and that's what the relu is.
And all that's going to do is, as the data is being fed in
and then down into those neurons,
all of the stuff that's negative just gets filtered out.
So as a result, it makes it much quicker
for you to train your neural network by getting
rid of things that are negative, getting rid
of things you don't need.
So every time when you specify a layer in a neural network,
there's usually an activation function like that.
Relu is one of the most common ones
that you'll see, particularly for classification
things like this.
But again, relu is a very mathy thing.
A lot of times, you go to the documentation,
you'll wonder what relu is.
You'll go look it up.
You'll see a page full of Greek letters.
I don't understand that stuff.
So for me, something like relu is as simple
as if x is greater than zero, return x.
Else, return zero.
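In code, that if/else really is the whole of relu:

```python
def relu(x):
    # If x is greater than zero, return x; else, return zero.
    if x > 0:
        return x
    return 0

print(relu(5))   # 5
print(relu(-3))  # 0
```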
All right.
So now I've defined my neural network.
And the next thing I'm going to do,
you'll see the same code as we saw
earlier on where what I'm going to do
is compile my neural network.
And in compiling my neural network,
I've got to specify the loss, and I've
got to specify the optimizer.
Now, there's a whole bunch of different types
of loss functions.
There's a whole bunch of different types
of optimizer functions.
When you read academic research papers around AI, a lot of them
specialize on these to say, for this type of problem,
you should use a loss function of sparse categorical cross
entropy because x.
For this type of problem, you should
use an optimizer, which is an Adam-based optimizer,
because x.
A lot of this as a programmer, you just
have to learn through trial and error.
I could specify the same loss function and the same optimizer
that I use for my linear and then try and train
my neural network, see how accurate it is,
how quick it is.
And then I could try these ones, see how accurate it is,
see how quick it is.
There's a lot of trial and error in that way.
And understanding which ones to use right now
is an inexact science.
It's a lot like, for example, as a traditional coder, which
is better-- using a for loop or a do loop?
Which is better-- using a while or using a when?
Those type of things.
And as a result, you see as you're
building your neural networks, there's
a lot of trial and error that you'll do here.
But reading academic papers can certainly
help if you can understand them.
So in this case now, like for the Fashion MNIST,
after a bit of trial and error, we
ended up selecting for the tutorial
to use these two functions. But as you
read through the documentation, you'll
see all the functions that are available.
So in this case, I'm training it with an AdamOptimizer.
And remember, the process of training,
every iteration it will make a guess that says, OK.
This piece of data, I think it's a shoe.
OK, it's not a shoe.
It's a dress.
Why did I get it wrong?
I'll use my loss function to calculate where I got it wrong,
and then I'll use my optimizer to change
my weights on the next loop to try and see
if I can get it better.
This is what the neural network is thinking.
This is how it works as you're actually training it.
So in this case, the AdamOptimizer
is what it's using to do that optimization.
The categorical cross entropy is what it's using for the loss.
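That compile step, sketched using the string shortcuts for the two functions (the slides used the tf.train.AdamOptimizer object; passing 'adam' by name does the same job in tf.keras):

```python
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax),
])

# The loss function measures how wrong the previous guess was;
# the optimizer adjusts the weights for the next loop.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```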
So now if I train it, it's the same thing that we saw earlier
on-- model.fit().
So all I'm going to say is, hey, model.fit().
I'm going to train it with the input images and the input
labels, and in this case, I'm going
to train it for five epochs.
OK.
So that epochs number, it's up to you to tweak it.
What you'll do as you're training your network
and as you're testing your network,
you'll see how accurate it is.
Sometimes you can get the process called converging,
means as it gets more and more accurate,
sometimes you'll find convergence
in only a few epochs.
Sometimes, you'll need hundreds of epochs.
Of course, the bigger and more complex
the dataset, and the more labels that you have,
the longer it takes to actually train and converge.
But the Fashion MNIST dataset, actually
using the neural network that I defined in the previous slide,
five epochs is actually pretty accurate.
It gets there pretty quickly with just five.
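Putting it together, the training call is one line of model.fit() with epochs=5 -- a sketch assuming the data loading and normalization from the demo:

```python
import tensorflow as tf
from tensorflow import keras

# Load and normalize the training data.
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), _ = fashion_mnist.load_data()
train_images = train_images / 255.0

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Five epochs is enough for this architecture to get close to
# 89% training accuracy on Fashion MNIST.
history = model.fit(train_images, train_labels, epochs=5)
```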
OK.
And now if I then just want to test it and the model itself--
again, the important object here is the model object.
So if I call model.evaluate(), and I pass it the test images
and the test labels, it will then iterate through the 10,000
test images and test labels.
It will calculate.
It will say, I think it's going to be this.
It will compare it with the label.
If it gets it right, it improves its score.
If it gets it wrong, it decreases its score,
and it gives you that score back.
So the idea here is-- remember earlier
when we separated the data into 60,000 for training and 10,000
for test?
Instead of you manually writing all that code to do all that,
you can just call the evaluate() function on the model,
pass it the test stuff, and it will give you back the results.
It will do all that looping and checking for you.
All right.
And then, of course, if I want to predict an image,
if I have my own images, and I've formatted them into 28
by 28 grayscale, and I put them into a set,
now I can just say model.predict() my images,
and it will give me back a set of predictions.
Now, what do those predictions look like?
So for every image, because the output layer
of the neural network
had 10 neurons, every image
is going to give you back a set of 10 numbers.
And those 10 numbers, as I mentioned earlier on,
nine of them should be very close to 0, and one of them
should be very close to 1.
And then using the one that's very close to 1,
you could determine your prediction
to be whatever that item of clothing is.
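Evaluate and predict look like this -- a sketch, trained for just one epoch here to keep it quick; np.argmax picks out the one output of the 10 that's closest to 1:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=1)

# evaluate() loops over the 10,000 test items and scores the model.
test_loss, test_acc = model.evaluate(test_images, test_labels)

# predict() returns 10 probabilities per image; argmax gives the label.
predictions = model.predict(test_images)
print('Predicted label for item 27:', np.argmax(predictions[27]))
print('Actual label for item 27:   ', test_labels[27])
```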
So if I demo this and show it in code--
let's see.
Go back here.
It's really hard to see it, so forgive me.
Whoops.
I'm going to select Fashion--
oh.
I really need a mouse.
I'm going to select Fashion.
OK.
And can you see the fashion code,
or is it still showing the linear code?
AUDIENCE: [INAUDIBLE]
LAURENCE MORONEY: Is that fashion right there?
AUDIENCE: [INAUDIBLE]
LAURENCE MORONEY: All right.
OK.
Did I just close it?
I'm sorry.
It's really hard to see.
So let me go back.
Is that one fashion?
AUDIENCE: [INAUDIBLE]
LAURENCE MORONEY: Up one?
All right.
That one?
AUDIENCE: [INAUDIBLE]
LAURENCE MORONEY: OK.
So here's the code that I was showing on the earlier slide.
So this is exactly the same code that was on my slides.
I'm just going to go down.
There's one thing I've done here that I didn't show
on the slides, and that was the images themselves
were grayscale, so every pixel was between 0 and 255.
For training my neural network, it
was just easier for me to normalize that data.
So instead of each being a value from 0 to 255,
it's a value from 0 to 1 that's proportional to it.
And that's what those two lines of code there do.
And that's one of the things that makes Python really
useful for this kind of thing.
Because I can just say that train_images set
is a set of 60,000 28 by 28 images,
and I can just say divide that by 255,
and that normalizes it for me.
So that's one of the things that makes Python
really handy in data science.
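That normalization is a single NumPy-style division -- dividing the whole array by 255.0 scales every pixel at once, which is exactly the handiness being described (a tiny stand-in array here instead of the 60,000 images):

```python
import numpy as np

# A tiny stand-in for the image array: three grayscale pixel values.
images = np.array([[0, 128, 255]], dtype=np.float32)

# One division normalizes every pixel from the 0-255 range to 0-1.
normalized = images / 255.0
print(normalized.min(), normalized.max())  # 0.0 1.0
```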
So we can see it's just the same code.
So I'm going to do a bit of live audience participation.
Hopefully, I can get it to work with us.
So remember I said there are 10,000 testing images?
OK.
So somebody give me a number between 0 and 9,999.
AUDIENCE: [INAUDIBLE]
LAURENCE MORONEY: Don't be shy.
AUDIENCE: [INAUDIBLE]
LAURENCE MORONEY: What's that?
Just 27?
OK.
So hopefully I can see it so I can get it.
That's not 27, is it?
OK.
27.
And here-- 27.
I tested it earlier with value 4560.
So what's going to happen here is
that I'm going to train the neural network to identify
those pieces of clothing.
And so-- or to be able to identify pieces of clothing.
I have no idea what piece of clothing number 27
is in the test set.
But what it's going to do once it's done is by the end,
you'll see it says print the test labels for 27.
So whatever item of clothing 27 is,
there is a pre-assigned label for that.
It will print that out.
And then the next thing it'll do is
it will print out what the predicted label will be.
And hopefully, the two of them are going to be the same.
There's about a 90% chance, if I remember right from this one,
that they will.
So if I run it, it's going to take a little longer
than the previous one.
So now we can see it starting to train the network.
AUDIENCE: [INAUDIBLE]
LAURENCE MORONEY: And because I'm doing in PyCharm,
I can see in my debug window.
So you can see the epochs--
epoch 2, epoch 3, epoch 4.
This accuracy number here is how accurate it is on the training data.
So it's about 89% correct.
And then you see it's actually printed two numbers below,
and they're both 0.
So that means for item of clothing number 27,
that class was 0.
And then the predicted for that class was actually also 0,
so it got it right.
Yay.
Anybody want to try one more just to prove that?
[INTERPOSING VOICES]
LAURENCE MORONEY: Let's see if we can--
what's that?
AUDIENCE: 42.
LAURENCE MORONEY: 42, I love it.
That's the ultimate answer, but what is the question?
OK.
42.
And I'm guessing 42 is probably also item 0, but let's see.
Hopefully, I haven't broken any of the bracketing.
Let me run it again.
So because it's running all of the code,
it's just going to train the network again.
OK.
There's epoch 2, epoch 3.
Hello.
There we go.
So let's remember earlier I said I'm just
training it for five epochs.
It just makes it a little bit quicker.
And I'm also seeing-- if you look at the convergence,
on epoch 1 it was 82% accurate.
Oh, we got it wrong for 42.
It predicted it would be a 6, but it's actually a 3.
But the first epoch you see, this accuracy figure--
82.45%.
That means it calculated it was 82% accurate.
The second epoch, 86% accurate; the third, 87%;
all the way down to the fifth--
89%.
I could probably train it for 500 epochs,
but we don't have the time.
But then it might be more likely to get number 42 correct.
And thanks, Mr. Douglas Adams, that you've actually
given me one that doesn't work, so I can go back and test it.
OK.
So that's Fashion MNIST and how it works.
And so hopefully, this was a good introduction to you
for really the concept from a programmer's perspective
of what machine learning is all about.
And I always like to say at talks
that if you only take one slide away from this talk,
if you've never done machine learning,
or you want to get into programming machine learning,
take this one here.
Because this is really what the core of the revolution
is all about, and hopefully the code that I
showed you demonstrates that--
that machine learning is really all
about taking answers and data and feeding them in
to get rules out.
I didn't write a single line of code there today that says,
this is a t-shirt, or this is a jacket, or this is a handbag.
This has sleeves.
If has sleeves, then is t-shirt.
If has heels, then is shoe.
I didn't have to write any of that kind of code.
I just trained something on the data
using the below thing, the below part of the diagram--
feeding in answers, feeding in data,
building a model that will then infer the rules about it.
So with that, I just want to say thank you very much,
and I hope you enjoy the rest of the conference.
[APPLAUSE]