
  • [DING]

  • Hello, and welcome to another Beginner's Guide to Machine

  • Learning video tutorial.

  • In this video, I am going to cover

  • the pre-trained model, PoseNet.

  • And I'm going to look at what PoseNet is,

  • how to use it with the ml5.js library and the p5.js library,

  • and track your body in the browser in real time.

  • The model, as I mentioned, that I'm looking at,

  • is called PoseNet.

  • [MUSIC PLAYING]

  • With any machine learning model that you

  • use, the first question you probably want to ask is,

  • what are the inputs?

  • [MUSIC PLAYING]

  • And what are the outputs?

  • [MUSIC PLAYING]

  • And in this case, the PoseNet model

  • is expecting an image as input.

  • [MUSIC PLAYING]

  • And then as output, it is going to give you

  • an array of coordinates.

  • [MUSIC PLAYING]

  • In addition to each of these xy coordinates,

  • it's going to give you a confidence score for each one.

  • [MUSIC PLAYING]

  • And what do all these xy coordinates correspond to?

  • They correspond to the keypoints on a PoseNet skeleton.

  • [MUSIC PLAYING]

  • Now, the PoseNet skeleton isn't necessarily

  • an anatomically correct skeleton.

  • It's just a set of 17 points that you can see right over here,

  • from the nose all the way down to the right ankle.

  • The model is trying to estimate where those positions are

  • on the human body, and give you xy coordinates,

  • as well as how confident it is

  • about each of those points.

  • One other important question you should ask yourself and do

  • some research about whenever you find yourself using

  • a pre-trained model out of the box, something

  • that somebody else trained, is who trained that model?

  • Why did they train that model?

  • What data was used to train that model?

  • And how is that data collected?

  • PoseNet is a bit of an odd case, because the model itself,

  • the trained model is open source.

  • You can use it.

  • You can download it.

  • There are examples for it in TensorFlow, TensorFlow.js,

  • and ml5.js.

  • But the actual code for training the model,

  • from what I understand or what I've been able to find,

  • is closed source.

  • So there aren't a lot of details.

  • A data set that's used often in training models

  • around images is COCO, or Common Objects In Context.

  • And it has a lot of labeled images

  • of people striking poses with their keypoints marked.

  • So I don't know for a fact whether COCO

  • was used exclusively for training PoseNet,

  • whether it was used partially or not at all.

  • But your best bet for a starting point

  • for finding out as much as you can about the PoseNet model

  • is to go directly to the source:

  • the GitHub repository for PoseNet.

  • In fact, there's a PoseNet 2.0 coming out.

  • I would also highly suggest you read the blog post "Real-time

  • Human Pose Estimation in the Browser with TensorFlow.js"

  • by Dan Oved, with editing and illustrations

  • by Irene Alvarado and Alexis Gallo.

  • So there's a lot of excellent background information

  • about how the model was trained and other relevant details.

  • If you want to learn more about the COCO image data set,

  • I also would point you towards the Humans of AI project

  • by Philipp Schmitt, which is an artwork, an online exhibition

  • that takes a critical look at the data in that data

  • set itself.

  • If you found your way to this video, most likely,

  • you're here because you're making interactive media

  • projects.

  • And PoseNet is a tool that you could

  • use to do real time body tracking very quickly

  • and easily.

  • It's frankly pretty amazing that you can do

  • this with just a webcam image.

  • So one way to get started, which in my view

  • is one of the easiest ways, is with the p5 Web Editor

  • and the p5.js library.

  • I have a sketch here which connects to the camera

  • and just draws the image on a canvas.
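
A minimal sketch of that starting point, assuming the standard p5.js webcam APIs:

```javascript
// A bare p5.js sketch that connects to the webcam
// and draws the current video frame to the canvas.
let video;

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO); // ask the browser for camera access
  video.size(width, height);
  video.hide(); // hide the default DOM element; we draw the frame ourselves
}

function draw() {
  image(video, 0, 0);
}
```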

  • Also want to make sure you have the ml5.js library imported,

  • and that would be through a script tag in index.html.
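
Those script tags might look something like this; the version numbers here are placeholders, so check the p5.js and ml5.js sites for current releases:

```html
<!-- index.html: load the libraries before your sketch code -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/1.4.0/p5.js"></script>
<script src="https://unpkg.com/ml5@0.6.1/dist/ml5.min.js"></script>
<script src="sketch.js"></script>
```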

  • Once you've got all that set up, we're ready to start coding.

  • So I'm going to create a variable called poseNet.

  • I'm going to say poseNet equals ml5.poseNet.

  • All the ml5 functions are initialized the same way,

  • by referencing the ml5 library dot the name of the function,

  • in this case, poseNet.

  • Now typically, there's some arguments that go here.

  • And we can look up what those arguments are,

  • by going to the documentation page.

  • Here we can see there are a few different ways

  • to call the PoseNet function.

  • I want to do it the simplest way possible.

  • I'm just going to give it the video element and a callback

  • for when the model is loaded, which I don't even

  • know that I need.
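
In code, a minimal sketch of that initialization, based on the ml5.js 0.x PoseNet API, where the second argument is an optional model-loaded callback:

```javascript
let video;
let poseNet;

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.hide();
  // Simplest form: the video element plus an optional
  // callback that fires once the model has loaded.
  poseNet = ml5.poseNet(video, modelLoaded);
}

function modelLoaded() {
  console.log('PoseNet is ready');
}
```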

  • [MUSIC PLAYING]

  • I'll make sure there are no errors and run this again.

  • And we can see PoseNet is ready.

  • So I know I've got my syntax right.

  • I've called the PoseNet function,

  • I've loaded the model.

  • The way PoseNet works is actually

  • a bit different than everything else in the ml5 library.

  • And it works based on event handlers.

  • So I want to set up a pose event by calling this method, on().

  • On 'pose', I want this function to execute.

  • Whenever the PoseNet model detects a pose,

  • then call this function and give me the results of that pose.

  • I can add that right here in setup.

  • poseNet.on('pose').

  • And then I'm going to give it a callback called gotPoses.
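
As a sketch, the event handler wiring looks like this, with gotPoses as the callback name used in the video:

```javascript
// In setup(), after initializing poseNet:
// whenever PoseNet detects poses in a frame, gotPoses() runs.
poseNet.on('pose', gotPoses);

// The callback receives an array of detected poses.
function gotPoses(poses) {
  console.log(poses);
}
```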

  • [MUSIC PLAYING]

  • And now presumably, every single time it detects a pose,

  • it sees me, it sees my skeleton, it

  • will log that to the console right here.

  • Now that it's working, I can see a bunch

  • of objects being logged.

  • Let's take a look at what's inside those objects.

  • The p5 console is very useful for your basic debugging.

  • In this case, I really want to dive deep into this object

  • that I'm logging here, the poses object.

  • So in this case, I'm going to open up the actual developer

  • console of the browser.

  • I could see a lot of stuff being logged here very, very quickly.

  • I'm going to pick any one of these and unfold it.

  • So I can see that I have an array.

  • And the first element of the array is a pose.

  • There can be multiple poses that the model is

  • detecting if there's more than one person.

  • In this case, there's just one.

  • And I can look at this object.

  • It's got two properties, a pose property and a skeleton

  • property.

  • Definitely want to come back to the skeleton property.

  • But let's start with the pose property.

  • I can unfold that, and we could see, oh my goodness,

  • look at all this stuff in here.

  • So first of all, there's a score.

  • I mentioned that with each one of these xy

  • positions of every keypoint, there is a confidence score.

  • There is also a confidence score for the entire pose itself.

  • And because the camera's seeing very little of me,

  • it's quite low, just at 30%.

  • Then I can actually access any one of those keypoints

  • by its name.

  • Nose, left eye, right eye, all these, all the way

  • down once again to right ankle.
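
Roughly, the object structure being unfolded here looks like the following; the field names come from the ml5.js PoseNet output, and the numbers are made up for illustration:

```javascript
// poses is an array with one element per detected person:
// [
//   {
//     pose: {
//       score: 0.3,                  // confidence for the whole pose
//       nose: { x: 310, y: 145 },    // each keypoint accessible by name
//       leftEye: { x: 295, y: 130 },
//       rightEye: { x: 325, y: 130 },
//       // ...all the way down to rightAnkle
//       keypoints: [ /* all 17 keypoints as an array */ ]
//     },
//     skeleton: [ /* pairs of connected keypoints */ ]
//   }
// ]
```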

  • So let's actually draw something based

  • on any of those keypoints.

  • We'll use my nose.

  • I'm going to make the assumption that there's always only going

  • to be a single person.

  • If there were multiple people, I'd

  • want to do this differently.

  • And, let me hit stop for a second,

  • I'm going to make a variable called pose.

  • Then I'm going to say, if it's found a pose,

  • and I can check that by just checking

  • the length of the array.

  • If the length of the array is greater than zero,

  • then pose equals poses index zero.

  • I'm going to take the first pose from the array

  • and store it into the global variable.

  • But actually, if you remember, the object in the array

  • has two properties, pose and skeleton.

  • So it seems there's a lot of redundant lingo here,

  • but I'm going to say, poses[0].pose.
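
Putting that together, a sketch of the callback; pose here is a global variable:

```javascript
let pose; // global: the most recent detected pose

function gotPoses(poses) {
  // Only grab a pose if PoseNet actually found one.
  if (poses.length > 0) {
    pose = poses[0].pose; // the first person's pose property
  }
}
```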

  • [MUSIC PLAYING]

  • This could be a good place to use the confidence score.

  • Like, only if it's like of a high confidence actually

  • use it.

  • But I'm just going to take any pose that it gives me.

  • Then in the draw function, I can draw something

  • based on that pose.

  • So for example, let me give myself a red nose.

  • [MUSIC PLAYING]

  • So now if I run the sketch, ah, so I got an error.

  • So why did I get that error?

  • The reason why I got that error is it

  • hasn't found a pose yet, so there

  • is no nose for it to draw.

  • So I should always check to make sure there is a valid pose

  • first.

  • [MUSIC PLAYING]

  • Then draw that circle.

  • And there we go.

  • I now have a red dot always following my nose.
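
That drawing code, as a sketch, with the validity check in place:

```javascript
function draw() {
  image(video, 0, 0);
  if (pose) { // skip drawing until a pose has been detected
    fill(255, 0, 0);
    noStroke();
    ellipse(pose.nose.x, pose.nose.y, 32, 32); // red dot on the nose
  }
}
```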

  • If you're following along, pause the video

  • and try to add two more points where your hands are.

  • Now there isn't actually a hand keypoint.

  • It's a wrist keypoint.

  • But that'll probably work for our purposes,

  • I'll let you try that.

  • [TICKING]

  • [DING]

  • How did that go?

  • OK, I'm going to add it for you now.
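
One way to do it, inside the same if (pose) block; the closest keypoints to the hands are the wrists:

```javascript
// Green dots on both wrists.
fill(0, 255, 0);
ellipse(pose.leftWrist.x, pose.leftWrist.y, 24, 24);
ellipse(pose.rightWrist.x, pose.rightWrist.y, 24, 24);
```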

  • [MUSIC PLAYING]

  • Let's see if this works.

  • Whoo.

  • This is working terribly.

  • I'm almost kind of getting it right.

  • And there we go.

  • But why is it working so poorly?

  • Well, first of all, I'm only showing it

  • my body from the waist up.

  • And most likely, the model was trained on full body images.

  • [MUSIC PLAYING]

  • Now I turned the camera to point at me over here,

  • and I'm further away.

  • And you can see how much more accurate

  • this is, because it sees so much more of my body.

  • I'm able to control where the wrists are

  • and get pretty good accurate tracking as I'm standing

  • further away from the camera.

  • There are also some other interesting tricks

  • we could try.

  • For example, I could estimate distance from the camera

  • by looking at how far apart are the eyes.

  • [MUSIC PLAYING]

  • So for example here, I'm storing the right eye and left eye

  • location in separate variables, and then

  • calling the p5 distance function to look

  • at how far apart they are.

  • And then, I could just take that distance

  • and assign it to the size of the nose.

  • So as I get closer, the nose gets bigger.
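
A sketch of that trick, using p5's dist() function inside the if (pose) block:

```javascript
// Use the distance between the eyes as a rough proxy
// for how close the face is to the camera.
let eyeR = pose.rightEye;
let eyeL = pose.leftEye;
let d = dist(eyeR.x, eyeR.y, eyeL.x, eyeL.y);
fill(255, 0, 0);
noStroke();
ellipse(pose.nose.x, pose.nose.y, d, d); // nose scales with proximity
```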

  • And you almost can't tell, because it's sizing relative

  • to my face.

  • But it gives it more of a realistic appearance

  • of an actual clown nose that's attached,

  • by changing its size according to the proportions of what

  • it's detecting in the face.

  • You might be asking yourself, well,

  • what if I want to draw all the points,

  • all the points that it's tracking?

  • So for convenience, I was referencing each point by name.

  • Right eye, left eye, nose, right wrist.

  • But there's actually a keypoints array

  • that has all 17 points in it.

  • So I can use that to just loop through everything

  • if that's what I want to do.

  • [MUSIC PLAYING]

  • So I can loop through all of the keypoints

  • and get the xy of each one.

  • [MUSIC PLAYING]

  • And then I can draw a green circle at each location.

  • Oops.

  • So that code didn't work, because I

  • forgot that each element, each keypoint

  • is more than just an xy.

  • It's got the confidence score, it's

  • got the name of the part and a position.

  • So I need each keypoint's position.

  • pose.keypoints[i].position.x

  • and pose.keypoints[i].position.y.

  • Now I believe this'll work.
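
The corrected loop, as a sketch:

```javascript
// Draw a green circle at every one of the 17 keypoints.
for (let i = 0; i < pose.keypoints.length; i++) {
  let x = pose.keypoints[i].position.x;
  let y = pose.keypoints[i].position.y;
  fill(0, 255, 0);
  ellipse(x, y, 16, 16);
}
```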

  • And here we go.

  • Only thing I'm not seeing are my ankles.

  • Oh, it's not.

  • There we go!

  • I got kind of accurate there.

  • Here's my pose.

  • OK, so you can see I'm getting all the points of my body

  • right now, standing about probably six feet away

  • from the camera.

  • There's one other aspect of this that I haven't shown you yet.

  • So if you've seen demos of PoseNet

  • and some of the examples, the points

  • are connected with lines.

  • So on the one hand, you could just memorize, like, always

  • draw a line from the shoulder to the elbow and the elbow

  • to the wrist.

  • But PoseNet, what I presume is based on the confidence scores,

  • will dynamically give you back which parts

  • are connected to which parts.

  • And that's in the skeleton property

  • of the object found in the array that was returned to us.

  • So I could actually add a new global variable

  • called skeleton.

  • This would've been good for Halloween.

  • Skeleton equals, and let me just stop this for a second,

  • poses[0].skeleton.

  • I can loop over the skeleton.

  • [MUSIC PLAYING]

  • And skeleton is actually a two-dimensional array,

  • because in the second dimension, it

  • holds the two locations that are connected.

  • So I can say a equals skeleton[i][0].

  • And b is skeleton[i][1].

  • [MUSIC PLAYING]

  • And then I can just draw a line between the two of them.

  • [MUSIC PLAYING]

  • I look at every skeleton point.

  • I get the two parts.

  • Part A, part B, and just draw a line between the x's

  • and y's of each of those.

  • [MUSIC PLAYING]

  • Make it a kind of thicker line, and give it the color white.

  • And let's see what this looks like.

  • And there we go.
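
That loop, as a sketch; skeleton is assumed to be a global set to poses[0].skeleton in gotPoses:

```javascript
// Each element of skeleton is a pair of connected keypoints.
for (let i = 0; i < skeleton.length; i++) {
  let a = skeleton[i][0];
  let b = skeleton[i][1];
  strokeWeight(2); // a slightly thicker line
  stroke(255);     // white
  line(a.position.x, a.position.y, b.position.x, b.position.y);
}
```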

  • That's pretty much everything you could do

  • with the ml5 PoseNet function.

  • So for you, you might try to do something

  • like make googly eyes.

  • That's something I actually did in a previous video

  • where I looked at an earlier version of PoseNet.

  • And you could also look at some of these other examples that

  • demonstrate other aspects.

  • For example, you can actually find the pose of a JPEG

  • that you load rather than images from a webcam.

  • But what I want to do, which I'm going

  • to get to in a follow-up video to this,

  • is not take the outputs and draw something.

  • But rather, take these outputs and feed them as training data

  • into an ml5 neural network.

  • What if I say, hey, every time I make this pose, label that a Y.

  • And every time I make this pose, label that an M, a C, an A, you

  • see where I'm going.

  • Could I create a pose classifier?

  • I can use all of the xy positions,

  • label them, and train a classifier

  • to make guesses as to my pose.

  • This is very similar to what I did with the teachable machine

  • image classifier.

  • The difference is, with the image classifier,

  • as soon as I move the camera to a different room

  • with different lighting and a different background

  • with a different person, it's not

  • going to be able to recognize the pose anymore, because that

  • was trained on the raw pixels.

  • This is actually just trained on the relative positions.

  • So in theory, if somebody around the same size as me

  • swapped in, it would recognize their pose.

  • And there's actually a way that I could just

  • normalize all the data, so that it would work

  • for anybody's pose potentially.

  • So you can train your own pose classifier

  • that'll work generically in a lot of different environments.

  • So if you make something with ml5 PoseNet

  • or with PoseNet with another environment,

  • please share it with me.

  • I'd love to check it out.

  • You can find the code for everything

  • in this video in the link in this video's description.

  • And I'll see you in the future "Coding Train" ml5 Machine

  • Learning Beginner, whatever, something video

  • [WHISTLE]

  • Goodbye.

  • [MUSIC PLAYING]
