ml5.js: Pose Classification with PoseNet and ml5.neuralNetwork()


[WHISTLE]
Hello.
And welcome to another video using PoseNet and ml5.js.
But in this video, what I'm going
to do is take the output of the PoseNet pre-trained model
and feed that into an ml5 neural network to train
a pose classifier to recognize when
I'm making certain motions, like a Y, an M, a C, and an A.
Before I begin coding, let me quickly mention
something I added between the last video and now.
I'm mirroring the image so that when I raise my left hand,
what I see on the screen in front of me over there
is mirrored back to me.
This is important for interactivity.
It makes it feel much more intuitive and natural to see
yourself mirrored.
You might recall that ml5 has
a specific function called flipImage() that will do it for you.
But I actually found, because I'm
drawing all this other stuff, that it's easier for me
to just write the code for it myself,
which involves a translate() and a scale().
In other words, typically if I'm drawing an image at (0, 0),
I'm drawing it right here, and the image gets painted across
the canvas.
But if I call scale(-1, 1),
it sets the x-axis going in the other direction.
So positive pixels go this way.
And if I translate over to here, put (0, 0) here, and draw
the image this way, it will appear reversed (inverted,
flipped)
to the viewer.
So that's what's happening in these three steps right here.
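To make the three steps concrete, here is a small plain-JavaScript sketch (no p5) of where a single point ends up, assuming the steps are `translate(width, 0)`, then `scale(-1, 1)`, then drawing the image at (0, 0). `mirrorPoint` is a hypothetical helper, not a p5 function.

```javascript
// Hypothetical helper: where a point drawn at (x, y) lands on screen after
// translate(canvasWidth, 0) followed by scale(-1, 1). The most recently
// applied transform acts on the coordinates first, so the scale happens,
// then the translation.
function mirrorPoint(x, y, canvasWidth) {
  const scaledX = -1 * x;                // scale(-1, 1) flips the x-axis
  const screenX = canvasWidth + scaledX; // translate(canvasWidth, 0) shifts it back onto the canvas
  return { x: screenX, y };              // y is untouched
}

console.log(mirrorPoint(0, 50, 640));   // { x: 640, y: 50 }
console.log(mirrorPoint(100, 50, 640)); // { x: 540, y: 50 }
```

So the left edge of the video is drawn at the right edge of the canvas, which is exactly the mirror effect.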
The two videos that I'm assuming are prerequisites
here are the previous one, where I covered
all of the code for this particular PoseNet example
that you're seeing running right here in the web editor,
as well as the "Train Your Own Neural Network" set
of videos, which covered the basics of how
the ml5.neuralNetwork() function works, training a model
to play musical notes based on where the user clicks
their mouse in a canvas.
To get started, I could really begin with either one
of these sketches.
For example, I could go and get my PoseNet code
and bring it into this particular sketch.
Or I could take the neural network code from this sketch
and bring it into the PoseNet one.
I think I want to continue working from the PoseNet sketch
itself.
And the first thing that I want to do
is create a variable to store the neural network.
So I'm going to call that brain.
And then after I initialize the PoseNet model,
I'll set brain equal to ml5.neuralNetwork().
And you might recall that anytime
you create a neural network, you can
specify a set of options for how you
configure that neural network.
You can find all of the options for configuring
an ml5 neural network on the reference
documentation page.
I'm just starting with these four basic properties: inputs,
outputs, task, and debug.
So let's come over here to the whiteboard.
And let's diagram out what's going on.
Now remember, we're starting with the PoseNet machine
learning model.
We're sending an image into that model as the input.
The PoseNet model then takes that image
and does pose estimation, making a guess
as to where all the keypoints are on the human body
that it sees.
And all of those points come in the form of (x, y) pairs,
coordinates.
Here's my elbow.
Here's my shoulder.
Here's my ear.
It doesn't have an ear, whatever.
Nose. There are 17 of them.
All of this data is what I want to send
in as the input to my ml5 neural network.
The ml5 neural network will take all these (x, y) pairs
and classify them into a given pose that has a label.
It's a dab pose, or a Saturday Night Fever pose.
I don't know what kind of poses I'm going to make.
I'll do YMCA.
Why not?
This now tells me how I want to configure my neural network.
I want to send it 17 pairs of numbers.
That's 34 inputs.
And I want it to classify those 34 numbers
into one of four labels.
That is four outputs.
34 inputs, four outputs, the task is classification.
And I do want to see debugging as I'm training the model.
And I have to pass those options to ml5.neuralNetwork()
itself.
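The numbers from the whiteboard map directly onto the four options. A minimal sketch of that options object, assuming the four labels are the lowercase keys that will be pressed; `NUM_KEYPOINTS` and `LABELS` are illustrative names, not part of ml5.

```javascript
// PoseNet reports 17 keypoints, each an (x, y) pair.
const NUM_KEYPOINTS = 17;
// Illustrative: one label per letter of the pose set.
const LABELS = ['y', 'm', 'c', 'a'];

const options = {
  inputs: NUM_KEYPOINTS * 2, // 34 numbers per pose
  outputs: LABELS.length,    // 4 classes
  task: 'classification',
  debug: true,               // show the training graph while training
};

// In the p5 sketch, this object is what gets passed in:
// brain = ml5.neuralNetwork(options);
console.log(options);
```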
This is where things get kind of complicated because I
need to call brain.addData().
That's the way I add training data to my neural network.
So somewhere I have to have some kind of interaction.
Maybe I press a key.
I'll press the Y key, and then it will wait a little bit,
say five seconds, so I have time to come over here,
and then it will start collecting pose data
for a certain amount of time.
Then it will stop.
And then I'll come back over here, press a button,
and do something else.
So this requires a lot of thoughtfulness
in terms of how I might build the interaction around this.
I'm just going to try to do it in a simple way
that I can get working right here, right now, in this room.
For a much nicer example of interaction and collecting pose
data, you can take a look at Google Creative Lab's Teachable
Machine.
I've made video tutorials about training image models
and sound models with it that can actually be imported into ml5.
At this moment, you cannot import the pose model into ml5.
That's something that we're working on.
And I'm hoping that this video tutorial
will lead the way to that.
But essentially, what I'm building
is a pose Teachable Machine.
My interaction just won't be as thoughtful as the one
in the actual Teachable Machine project.
You can see here in Teachable Machine, for example,
there's a button that I can press.
And it's going to give me a 10-second countdown.
And then when I come over here, after 10 seconds
it's going to start collecting my poses.
So this is a much nicer example.
I encourage you to look at it for inspiration.
Of course, that was terrible training data.
But now I'm going to go back to my code
and try to implement my own version of this.
To keep track of the flow of the sketch,
let me add a variable called state.
And I'll just initialize it to 'waiting'.
And then I will add the keyPressed() function.
And when I press the key, I want to say state equals 'collecting'.
Only, I don't want to start collecting immediately.
I want to wait a little bit because it's
going to take me some time to walk over there and get
into my pose.
So I'll use setTimeout() for a delay.
setTimeout() is a built-in JavaScript function,
not part of p5, that executes a function
after a certain number of milliseconds.
So I can put a little function inside here.
I could use the arrow syntax.
There's a variety of ways I could approach this.
Let's just say 10 seconds later, 10,000 milliseconds.
Right?
So when I press the key, 10 seconds later,
set the state equal to 'collecting'.
I'll also have a variable called targetLabel.
And I'll set targetLabel equal to the key that
was pressed.
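Stripped of p5, the key-press logic so far looks roughly like this. It's a sketch: here `keyPressed` takes the key as a parameter (in p5 it reads the global `key`), and `delayMs` is a hypothetical parameter so the delay can be shortened.

```javascript
// Mirrors the sketch's globals.
let state = 'waiting';
let targetLabel;

// Sketch of the p5 keyPressed() logic. `key` and `delayMs` are parameters
// here (hypothetical) instead of p5's global `key` and a hard-coded delay.
function keyPressed(key, delayMs = 10000) {
  targetLabel = key;
  console.log(targetLabel);
  // setTimeout is built into JavaScript, not p5: it runs the arrow
  // function once, after delayMs milliseconds.
  setTimeout(() => {
    console.log('collecting');
    state = 'collecting';
  }, delayMs);
}
```

Calling `keyPressed('y', 1000)` logs `y` right away and `collecting` about one second later.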
All right, so I have this function going.
When I press the key, whatever key I press
is the target label.
I want to see that in the console.
And then 10 seconds later, I want
to see it say that it's starting to collect.
Let me make it one second later so I
don't have to wait as long.
All right, and I'm going to press the Y key.
Y, collecting-- perfect.
So this is the right idea.
Once that state switches to 'collecting',
I want to call the ml5 neural network's addData() function.
Where I want to call addData()
is right here, when I have a pose.
So when I have a pose, I want to say brain.addData(), and then
the inputs and the targets.
The inputs are all of the (x, y) locations of the pose itself.
There are 34 of them.
I mean, I have kind of an issue where
the camera can't see my legs.
So I probably should ignore some of them.
But I'm just not going to worry about that.
I could also consider using the confidence scores.
Maybe, given the confidence scores, the neural network
could learn to ignore a point
when its confidence score is low.
But I'll leave all that for you to try if you're making
your own version of this.
I'm just going to use these 17 (x, y) pairs.
So I need them to be in a plain old array.
And if you recall, they're not in a plain old array.
They're in pose.keypoints,
where each keypoint has a position object with position.x and position.y.
So I need to flatten the data.
Whatever format the data is in, I
want to just put it into a plain array.
So I'm going to grab this loop.
I'm going to create an empty array called inputs.
And I'll just say inputs.push(x), inputs.push(y).
So this is me going through the entire pose,
getting all the x's and y's, putting them
in an array, which is the input to the neural network.
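Here's that flattening loop as a standalone function, assuming ml5's PoseNet pose shape (`pose.keypoints[i].position.x` and `.y`). `flattenPose` and the two-keypoint `examplePose` are illustrative; a real pose has 17 keypoints, which gives 34 numbers.

```javascript
// Flattens PoseNet keypoints into one plain array: [x0, y0, x1, y1, ...].
function flattenPose(pose) {
  const inputs = [];
  for (const keypoint of pose.keypoints) {
    inputs.push(keypoint.position.x);
    inputs.push(keypoint.position.y);
  }
  return inputs;
}

// Illustrative two-keypoint pose (made-up coordinates).
const examplePose = {
  keypoints: [
    { part: 'nose', score: 0.99, position: { x: 320, y: 120 } },
    { part: 'leftEye', score: 0.97, position: { x: 335, y: 105 } },
  ],
};

console.log(flattenPose(examplePose)); // [ 320, 120, 335, 105 ]
```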
And what's the target?
It also wants an array.
But in this case, it's one thing, just the label.
So I can take the target label and put it in an array.
And that's what I'm giving to the addData() function.
You might recall in my previous neural network examples,
I was making objects that I passed in with named inputs
and outputs.
So this is just showing you that you can do it either way.
If I want to have names for all the inputs and outputs,
I can build an object with properties.
If I just want a big array of numbers,
I can just make it a plain array.
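For comparison, here is the same sample in both shapes. The array form is what this sketch uses; the object form is a hypothetical alternative, where the `_x`/`_y` property names and the `label` key are my own choices, not names ml5 requires.

```javascript
// Illustrative two-keypoint pose (a real PoseNet pose has 17 keypoints).
const examplePose = {
  keypoints: [
    { part: 'nose', position: { x: 320, y: 120 } },
    { part: 'leftEye', position: { x: 335, y: 105 } },
  ],
};

// Array form: plain numbers in, a one-element label array out.
const arrayInputs = [320, 120, 335, 105];
const arrayTarget = ['y'];

// Hypothetical object form: each input is named after its body part.
function namedInputs(pose) {
  const inputs = {};
  for (const keypoint of pose.keypoints) {
    inputs[`${keypoint.part}_x`] = keypoint.position.x;
    inputs[`${keypoint.part}_y`] = keypoint.position.y;
  }
  return inputs;
}
const objectInputs = namedInputs(examplePose);
// { nose_x: 320, nose_y: 120, leftEye_x: 335, leftEye_y: 105 }
const objectTarget = { label: 'y' };

// Either pair would then go to the network, e.g.
// brain.addData(arrayInputs, arrayTarget);
```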
But there's a new problem.
The new problem is once I start collecting the data,
I'm going to strike the pose.
And maybe I'll collect the pose for a little while.
But then I've got to stop collecting.
So let's go back up to where I started collecting the pose.
I'm going to do something awful.
This is so painful.
I don't want to do it.
Let's just do it and then we'll revisit it later.
We will.
[MUSIC PLAYING]
I'm going to call setTimeout() again right inside here,
because a second later, or 10 seconds later,
I want to stop collecting.
This might be some of the worst code I've ever written.
It's really awful to look at.
It's what's informally known as callback hell.
And there's a variety of ways I could approach this differently,
by using promises, and async, and await.
But in this case, really all I want to do
is set the state to 'collecting' in 10 seconds,
then 10 seconds later, set it back to 'waiting'.
And I think this will work for me.
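The promise-based approach mentioned above could look something like this. It's an alternative to the nested `setTimeout()` calls, not the code used in the video; `wait` and `collect` are hypothetical helper names.

```javascript
let state = 'waiting';
let targetLabel;

// Wraps setTimeout in a promise so it can be awaited.
function wait(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical replacement for the nested timeouts: wait, collect for a
// while, then go back to waiting, reading top to bottom.
async function collect(key, delayMs = 10000, durationMs = 10000) {
  targetLabel = key;
  await wait(delayMs);
  console.log('collecting');
  state = 'collecting';
  await wait(durationMs);
  console.log('not collecting');
  state = 'waiting';
}
```

In the sketch, `keyPressed()` would just call `collect(key)`; the sequence no longer needs a callback inside a callback.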
Let's give it a try.
I need to first press Y. One one-thousand, two one-thousand, three one-thousand,
four one-thousand, five one-thousand, collecting.
10 seconds later it should say not collecting.
All right.
OK, that worked.
What I'm doing here, quite poorly I might add,
is implementing a state machine.
So it might be nice for me to talk about a more proper way
of implementing a state machine in a separate video,
if I can ever get around to making it.
But this works.
I set this state variable to 'collecting'.
10 seconds later, I set it back to 'waiting'.
And during that time, I am adding data
to the ml5 neural network.
[DING]
Sorry to interrupt for a second.
I'm coming to you from several weeks,
a month later, a really long time
(look how much my beard has grown)
to issue a correction.
I have made a very significant error in this video
that you're watching.
And I don't correct it any time throughout the course
of this video.
And the error is that I forgot to actually
include an if statement here.
In the gotPoses() event, when I receive a pose,
I should only call addData()
when the state is 'collecting'.
I was just doing it anyway.
Sure, I set the state to 'collecting',
and 'waiting', and then to 'collecting', and 'waiting'.
But I didn't actually include a conditional.
So I had a lot of messy extra noise in the data.
So I just redid it now, in the time
that I'm talking to you right now:
I collected the data again, retrained the model,
and it performed so much better than it actually
does in the video.
And the code that's released has that small correction in it.
Amazingly, it kind of worked anyway, as you'll see,
as we continue watching.
But just note that correction when you go look at the code.
It's an important detail.
Thanks, and enjoy the rest of this video.
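With the correction, the pose callback might look like this: the only change is the `state === 'collecting'` check around `addData()`. The `brain` below is a hypothetical stand-in that just records calls so the logic runs without ml5; in the sketch, `brain` is the real `ml5.neuralNetwork()`.

```javascript
let state = 'waiting';
let targetLabel = 'y';
let pose;

// Hypothetical stand-in for the ml5 neural network: records addData() calls.
const brain = {
  samples: [],
  addData(inputs, target) {
    this.samples.push({ inputs, target });
  },
};

function gotPoses(poses) {
  if (poses.length > 0) {
    pose = poses[0].pose;
    // The correction: only record training data while collecting.
    if (state === 'collecting') {
      const inputs = [];
      for (const keypoint of pose.keypoints) {
        inputs.push(keypoint.position.x, keypoint.position.y);
      }
      brain.addData(inputs, [targetLabel]);
    }
  }
}
```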
Immediately, what I want to do right now
is add a way to save the data,
because I do not want to do this many, many, many times.
So in keyPressed(), I'm going to say if the key is S,
call brain.saveData().
So let me quickly try collecting that one pose again
and make sure it can save to a JSON file
that I can reload later.
Press Y, I've got 10 seconds now.
Collecting, not collecting.
So I can come back over here.
I can press S. And I now have a JSON file that was saved.
Let's take a look at that file.
And this is what that file looks like.
For every single pose, it's got xs; those are the inputs.
There should be 34 of them, 0 through 33.
Then it's got ys, which is one label, Y.
So these are all of my poses saved here
in this big JSON file.
Great, I can now collect the data
for all four of those poses.
Let's see if I can manage to do that.
First, Y. Collecting.
Not collecting, OK.
Now I'm going to do M. Collecting.
Not collecting, OK.
Should I really do all of these?
C-- really noisy data.
One more, A. OK.
Now we save the data.
Save.
OK, stop.
Stop the sketch.
I've got the data.
Two megabytes-- so that was a large file but not ridiculous.
I'm going to rename it to YMCA.
I'm going to now upload it to my sketch.
And then I'm going to comment out all the data collection
stuff.
Because I'm just going to consider myself
done with data collection.
And I'll actually duplicate the sketch.
And let me call this one Data Collection.
I'll duplicate it and call this one now Model Training.
The next step is, when the sketch runs,
to load the existing data.
So now I don't need to collect the data.
I could load the data, collect more data.
There are so many ways you could do this.
But I just want to load what I collected previously
and then, when the data is ready,
when it's loaded,
I can call the train() function.
There are a lot of options I could configure train() with.
But I just want to go for 10 epochs.
That's running through all of the data 10 times.
I might need a lot more.
When it is finished, I want to just console log
that it's been trained, and save the model to my Downloads
folder so that I have it saved.
Let's see if this works.
So, I see the graph pop up that would show me
the loss while it's training.
But it never went down at all.
Let's try giving it 100 epochs.
This is not a good sign.
[DING]
Guess what I forgot to do?
Something very important.
What is the data that I'm collecting?
These (x, y) values are on my p5 canvas,
which has a resolution of 640 by 480.
So they are large values.
They need to be normalized to a standard range between 0 and 1.
So I'm going to let the ml5 library take
care of that for me by just adding the normalizeData()
function.
So right here in dataReady(), before I train the model,
I can call normalizeData().
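To see what normalizing does to these pixel values, here's a sketch of per-column min-max scaling, which maps each input column into the range 0 to 1. I'm assuming this is roughly what `normalizeData()` does; the function below is illustrative, not ml5's code, and mapping a constant column to 0 is my own choice.

```javascript
// Rescales each column of `rows` to [0, 1] using that column's min and max.
// Returns the normalized rows plus the min/max so they could be reused later.
function normalizeColumns(rows) {
  const columns = rows[0].length;
  const mins = new Array(columns).fill(Infinity);
  const maxs = new Array(columns).fill(-Infinity);
  for (const row of rows) {
    row.forEach((value, i) => {
      mins[i] = Math.min(mins[i], value);
      maxs[i] = Math.max(maxs[i], value);
    });
  }
  const normalized = rows.map((row) =>
    row.map((value, i) => {
      const range = maxs[i] - mins[i];
      return range === 0 ? 0 : (value - mins[i]) / range; // avoid divide-by-zero
    })
  );
  return { normalized, mins, maxs };
}

// Made-up (x, y) samples on a 640 by 480 canvas.
const { normalized, mins, maxs } = normalizeColumns([
  [0, 480],
  [320, 240],
  [640, 0],
]);
console.log(normalized); // [ [ 0, 1 ], [ 0.5, 0.5 ], [ 1, 0 ] ]
```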
Let's run it one more time.
Let's just go down to 50 epochs.
And there we go.
Now I see a loss going down the way I had hoped it would.
And the model is trained and presumably saved
to my Downloads folder.
Let's go take a look.
Because I was doing this multiple times,
I have a mess of files down here in the Downloads folder.
But the one that was most recent is number 5.
So I'm going to get rid of everything
and just rename these back to the default names.
And now I can upload the model that I trained back up
to the p5 web editor.
Let's duplicate the sketch one more time.
I'm going to call this PoseNet Deploy.
Let's create a folder, call it model,
and add the model files.
I can see the files over here.
And now, instead of loading the data,
I can load the trained model.
But if you recall from the previous videos,
there are three files for the model.
So I need to create an object to store all three file names.
I can find the format for that object
on the reference page for ml5.neuralNetwork().
Copy this to the clipboard.
Bring it over here and put in my path,
which is just called model.
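The object copied from the reference page lists the three model files. As a sketch, assuming the default file names that ml5's save produced and a folder named `model`; `modelFiles` is a hypothetical helper, and in the sketch the result would be passed to `brain.load(modelInfo, brainLoaded)`.

```javascript
// Hypothetical helper: builds the three-file model description for a folder.
// File names assume ml5's defaults (the files were renamed back to these).
function modelFiles(folder) {
  return {
    model: `${folder}/model.json`,            // network topology
    metadata: `${folder}/model_meta.json`,    // labels, normalization info, etc.
    weights: `${folder}/model.weights.bin`,   // trained weights
  };
}

const modelInfo = modelFiles('model');
console.log(modelInfo);
// In the deploy sketch: brain.load(modelInfo, brainLoaded);
```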
Let's run the sketch and see if I can just get model ready here
in the console.
Oh, oh, I'm just missing a quote.
Thank goodness.
And I'm inconsistently using single and double quotes.
Let's fix that.
Oh, it's not neuralNetwork.
I called it brain.
So close.
PoseNet ready, PoseNet ready.
Huh?
Oh, I have a callback for PoseNet
for when it's ready, called modelLoaded().
So this needs to be...
I'll call this one brainLoaded().
So remember, we're using two machine learning models here:
PoseNet, which is doing the pose estimation,
and then the outputs of that model
go into my own neural network that I've trained, called brain.
Let's try this one more time.
Pose classification ready, PoseNet ready.
Wonderful.
Incidentally, the classification model
was loaded first because PoseNet actually
has to reach out to the cloud and download
the model from Google's servers.
We are so close to being done with this.
Just a couple more steps.
Once my brain is loaded, I can actually ask it to classify.
So I can say brain.classify(inputs),
and when you've got a result, call gotResult().
The question is, what are those inputs?
These are those inputs.
The same thing I did when I was collecting the training data,
grabbing the (x, y) pairs and flattening them into an array,
I need to do for the inputs that I'm
sending in when I'm deploying the model
and asking it to guess, asking it to do that classification.
Here's the loop from the data collection
where I flattened it into a single array.
I can grab that and bring it in here.
But once I get the result, I'm going to want to do this again.
So really, I should take all of this,
write a new function called classifyPose(),
and make sure there is a pose in the first place.
Then, once the brain is loaded, just call classifyPose().
I need the gotResult() callback, which has two arguments,
error and results.
And I'm just going to console log
results[0].label.
Let's console log the whole results array as well.
So in theory, there will be a pose when the brain is loaded,
or there won't be.
So I'm going to put the recursive loop call in here
and call classifyPose() again.
So the idea is, when the brain is loaded, classify a pose.
If there's a pose, call brain.classify().
But what if there's no pose?
Then I'll never get a result. Hmm,
this is really silly, but I'm anticipating an issue
that I'm going to have.
So what I'm going to do is, if it doesn't detect a pose,
let's just say, hey, in a little bit,
why don't you wait like 100 milliseconds
and call classifyPose() again?
That way it will continue to check.
So at some point, eventually, there will be a pose.
It'll call brain.classify(), and when that's done,
it will call classifyPose() again.
If it ever stops detecting a pose,
the loop won't stop.
It will just continue and, every 100 milliseconds,
keep trying again.
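Putting that loop together, here's a sketch of `classifyPose()` and `gotResult()`: classify when there's a pose, otherwise retry in 100 milliseconds, and classify again after every result. One deliberate change: `brain` is passed in as a parameter rather than read from a global, so a stand-in object can be used. The `(error, results)` callback shape and `results[0].label` follow the transcript; upper-casing the label matches the capital letters drawn later.

```javascript
let pose = null;      // latest pose from PoseNet (set by gotPoses in the sketch)
let poseLabel = 'Y';  // label drawn on the canvas

function classifyPose(brain) {
  if (pose) {
    // Same flattening as during data collection.
    const inputs = [];
    for (const keypoint of pose.keypoints) {
      inputs.push(keypoint.position.x, keypoint.position.y);
    }
    brain.classify(inputs, (error, results) => gotResult(brain, error, results));
  } else {
    // No pose yet: check again in 100 ms.
    setTimeout(() => classifyPose(brain), 100);
  }
}

function gotResult(brain, error, results) {
  if (error) {
    console.error(error); // stop the loop on error
    return;
  }
  poseLabel = results[0].label.toUpperCase();
  classifyPose(brain);    // classify the next pose
}
```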
All right, I think--
there's no way this is going to work, right?
Let's give it a try.
All right, pose ready.
Y A, Y--
C, C-- A, Y, M. What?
This actually worked?
Well, let's just throw caution to the wind
and draw the label in big letters on the canvas.
I'm going to set it equal to Y so that I see it
right there at the beginning.
And at the end of draw(), I'm going to say fill(255,
0, 255) and noStroke().
OK, there's the Y. Oh, it's going to be reversed.
It's going to be flipped because of the mirroring,
so I'm also going to put a push() here.
It doesn't matter for the Y, which is symmetrical,
but the C won't work.
And then pop() here, so that the Y is always in the center.
OK, now when I get a result, set the label equal to that result's label.
I kind of want to see the confidence.
So I'll log the confidence to the console,
because I want to see how sure
it is about that particular label.
All right, here we go.
[HUMMING "YMCA"] One, two, three, four, one.
It's fun to stay at the YMCA.
It's fun to stay at YMCA.
All right, we need these to be capital letters.
[HUMMING "YMCA"] One, two, three, four, five.
[HUMMING "YMCA"]
Let's go slower.
YMCA.
So, thanks for watching this.
I'm kind of shocked it works.
There's so much more that could be done with this.
First of all, a question came up: you
forgot to normalize the data during this classification
process.
But one of the nice things ml5 does
is save the normalization minimums and maximums
from the training process and reapply them later.
So I don't have to
call normalizeData() again.
That's happening behind the scenes.
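Reapplying the saved statistics amounts to reusing the training minimums and maximums on every new input. A sketch of the idea, not ml5's internals; the `mins` and `maxs` values below are made up to stand in for what training on a 640 by 480 canvas might have saved.

```javascript
// Rescales a new input with the min/max saved from training, so it lands
// on the same 0-to-1 scale the model was trained on.
function applyNormalization(input, mins, maxs) {
  return input.map((value, i) => {
    const range = maxs[i] - mins[i];
    return range === 0 ? 0 : (value - mins[i]) / range;
  });
}

// Made-up saved statistics for one (x, y) pair.
const mins = [0, 0];
const maxs = [640, 480];
console.log(applyNormalization([160, 120], mins, maxs)); // [ 0.25, 0.25 ]
```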
Otherwise, what other kinds of labels,
what other kinds of outputs could you use? Could
I play different drumbeats based on a pose?
What other kinds of things could you classify?
If I collected a much larger data set,
if I was more thoughtful about how I was collecting the data,
could I get this to be much more accurate?
Give this a try.
Make something.
What I definitely also want to do,
which I'll need to make another video about,
is turn this into a regression.
So could I take that example where I play a frequency--
[HUMMING] --and alter that frequency
by changing my motion, my movements?
And in fact, I could actually have the output
of that regression be color.
That might be something to explore.
I got this idea from a viewer named Darshawn
who submitted a community contribution where,
instead of painting with sound, you're painting
with color with a regression.
And that's really interesting because it
requires you to have three different outputs--
an R, a G, and a B. So take a look at that project.
And maybe I'll think about doing something
with poses and color output in the next video tutorial.
[HUMMING "YMCA"]
I have an idea.
What if I only update the label if the confidence is
above a given threshold?
Let's take 75%.
Maybe this will eliminate some of the noise.
[HUMMING "YMCA"]
It totally helps, right?
It's not flickering as much, because it's
only changing the label if it's really confident about what
the pose is.
And I can maybe make that threshold even higher.
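The threshold idea as a small pure function: a sketch assuming each result has `label` and `confidence` fields, as logged above. `updateLabel` is a hypothetical name; in the sketch this logic would live inside `gotResult()`.

```javascript
// Returns the label to display: the top result's label, upper-cased, if its
// confidence clears the threshold; otherwise keeps the current label.
function updateLabel(currentLabel, results, threshold = 0.75) {
  const top = results[0];
  if (top.confidence > threshold) {
    return top.label.toUpperCase();
  }
  return currentLabel;
}

console.log(updateLabel('Y', [{ label: 'm', confidence: 0.92 }])); // M
console.log(updateLabel('Y', [{ label: 'a', confidence: 0.51 }])); // Y
```

Raising `threshold` trades responsiveness for stability: the label changes less often, but only when the model is more certain.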
[HUMMING "YMCA"]
The M and the A are so similar.
You're all right about that.
It is able to get it though.
M is something slightly different.
I'm a little out of breath, but one more thing I want to say.
Again, unlike my previous examples, where
I trained models that do something similar based on
images and pixels, in this case
the pose data is more generic. So I think
if you go to the URL for this sketch, which
is in this video's description, you can make those poses
and hopefully it'll recognize them.
So give that a try.
And I can't wait to see you in a future coding video.
Goodbye, have a great day.
[WHISTLE]