Hello and welcome to another beginner's guide to machine learning with ml5.js video on pose estimation and PoseNet. This is the third and last one that I'll do in this series about PoseNet. First I looked at just what PoseNet is, how it works, and how you can get the key points of a human skeleton. Then I took the output of the PoseNet model, all those key points, and fed them into another neural network to do pose classification, to recognize different poses that I made with my body. And in this grand finale pose video, I will do exactly what I did in the previous video with pose classification, but perform a regression. So instead of the final output being a classification (am I making a Y, M, C, or A pose?), I will make a regression. What do I mean by that exactly?

So to review, the setup I have is as follows. [MUSIC PLAYING] The system starts with an image. It sends that image into the pre-trained PoseNet machine learning model. That model performs pose estimation and gives as its output 17 x,y pairs: wrist, elbow, shoulder, shoulder, elbow, wrist, et cetera. And then I take all of those and feed them into another neural network, an ml5 neural network, which then classifies those key points as Y, M, C, or A. So that's the process that I built in the first two videos.

I want the final output to no longer be categorical. It's not one of four options; the final output is any number. So you could think of it as the final output controlling a slider, and that slider has some sort of range. Previously, in other examples of regression in this full series, if you go back, I used a neural network to output a frequency value to play a musical note. So I certainly could do that here. I could train the machine learning model to play the note [SINGING] for this pose and [SINGING] for this pose, and I could actually have something that outputs like [SINGING]. So I could go and do that, and boy, wouldn't that be fun to watch? But I want to do something different. That I'll leave as an exercise to you: make a gesture- or pose-based musical instrument.

I am going to control color. And this comes from a project that I referenced, inspired by a viewer, Darshawn, who made a project with this kind of output. Specifically, what I want to demonstrate here is that the regression output doesn't have to be a single number. In this case, I want to have three values, and I'm going to think of those values as an R for red, a G for green, and a B for blue. So the training can be: this pose is this particular color, this pose is this particular color, and then this pose is this other particular color. And then as I move, it will interpolate between those colors by trying to guess the values according to the regression.

Now I'm ready to start implementing this in code. I'm not going to write everything again; I'm going to start from the pose classifier. And the first thing that I need to do is adjust the configuration of the neural network. The difference is that instead of four categorical outputs, Y, M, C, or A, I just need three continuous outputs. So I could actually just change this number to three, because it's still a number of outputs, but the task is now regression. The other thing I really need to do is think about how, during the training process, I'm going to create these target values. And this is going to be really tricky, so maybe this color scenario isn't the best one.
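Here is a minimal sketch of that configuration change, following the options object from the classifier sketch in the previous videos. The 34 inputs (17 keypoints, each with an x and a y) and the debug flag come from that sketch; exact option names may differ between ml5 versions.

```javascript
let brain;

function setup() {
  createCanvas(640, 480);
  // Same options object as the classifier, with two changes:
  // outputs drops from four labels to three continuous values (R, G, B),
  // and the task switches from 'classification' to 'regression'.
  let options = {
    inputs: 34,         // 17 keypoints, each with an x and a y
    outputs: 3,         // red, green, blue
    task: 'regression',
    debug: true
  };
  brain = ml5.neuralNetwork(options);
}
```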
I'm only one person here. But I think to demonstrate this idea, the best way would be for me to make these literal sliders. So I'm going to adjust the sliders and make the target outputs based on the positions of the sliders. And then when I actually deploy the model, the model will control the sliders themselves and I'll see the color. I think that's going to work.

So this target label is no more. I don't have a target label; there's no categorical output. Instead I'm going to have sliders. So let's comment this out and say: three sliders for red, green, and blue. They're all going to have a range between 0 and 255 with some default value, and I'll have the sliders start with red at 255 and green and blue at 0. So we can see these are the sliders that I'm now going to control and match their positions with a given pose.

Now if you recall, I had this horribly awkward, for a variety of reasons, interface. As in, no interface at all, with just key presses to set a label. And then I'd have this, like, callback hell with nested setTimeouts. Let me improve this a little bit for this round. One thing that I can do to improve this, and I haven't been using this throughout this video series, I've been staying away from it, is to replace this with something called async and await. These are keywords in JavaScript. They're part of ES8, a newer version of JavaScript, and they allow me to have asynchronous events happen much more sequentially in the code. I've covered this previously in several videos. If you haven't seen those, you'll want to go watch them or read up about promises and async and await somewhere else. But what I'm actually going to do is go get the code from a very specific video where I wrote this delay function, bring that in here, and then change keyPressed to use async and await with that delay function. Let me just do that and then explain what I mean. [MUSIC PLAYING]

Oh, it is so lovely, look at it. Look at this nice sequential code that's, like, set the target label, console log it, wait 10 seconds, then do this. Then wait 10 more seconds, then do this. Isn't this lovely? It is really worth taking some time to read up on and explore async and await so that you can have much more readable code. This is all still happening asynchronously; in JavaScript, everything happens asynchronously. This is just sweet syntactic sugar to make our lives a little bit more joyful today. But, ah, that's not really the content of this video. That's not the topic. The topic is, I don't have a target label anymore. What I have is-- and actually, let's just change this to if key equals-- like, I'm no longer going to be collecting a particular key press. So let's just have the collection moment happen when I press D, for data. And then I'm going to have a target color, and it's going to be an array with the values of all the sliders. [MUSIC PLAYING]

So the idea is that when I press the D key, I'm going to pull the values from the sliders and set that as the target color. I'm going to wait 10 seconds so I can get in position, and then start the collecting process, collect for 10 seconds, and then jump out. Now it would be much better interaction-wise if I could manipulate the sliders while I'm making the pose. And if I could just, like, open the magic door and have a volunteer come in and help me with this, that might make more sense.
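Here is a sketch of that collection flow, under a few assumptions: the delay helper is the promise-based one from that earlier video, the PoseNet setup and the gotPoses callback with its addData call (which come up again below) follow the same pattern as the classifier video with the slider color in place of the label, and the state variable and the 'color-poses' file name are placeholders.

```javascript
let video, poseNet, pose;
let brain;                // ml5 neural network configured for regression (as above)
let rSlider, gSlider, bSlider;
let targetColor;
let state = 'waiting';

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.hide();
  poseNet = ml5.poseNet(video);
  poseNet.on('pose', gotPoses);

  // Three sliders for red, green, and blue, each ranging 0-255,
  // starting with red at 255 and green and blue at 0.
  rSlider = createSlider(0, 255, 255);
  gSlider = createSlider(0, 255, 0);
  bSlider = createSlider(0, 255, 0);

  brain = ml5.neuralNetwork({
    inputs: 34,
    outputs: 3,
    task: 'regression',
    debug: true
  });
}

// Promise-based delay so keyPressed can await it instead of
// nesting setTimeout callbacks.
function delay(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

async function keyPressed() {
  if (key === 'd') {
    // Pull the slider values as the regression target for this session.
    targetColor = [rSlider.value(), gSlider.value(), bSlider.value()];
    console.log(targetColor);
    await delay(10000);          // time to get into position
    console.log('collecting');
    state = 'collecting';
    await delay(10000);          // collect for 10 seconds
    console.log('not collecting');
    state = 'waiting';
  } else if (key === 's') {
    // Save the collected examples to a JSON file.
    brain.saveData('color-poses');
  }
}

// While collecting, each detected pose becomes a training example:
// 34 keypoint coordinates in, the slider color out. This mirrors the
// gotPoses callback from the classifier video, with targetColor
// in place of the label.
function gotPoses(poses) {
  if (poses.length > 0) {
    pose = poses[0].pose;
    if (state === 'collecting') {
      let inputs = [];
      for (let keypoint of pose.keypoints) {
        inputs.push(keypoint.position.x);
        inputs.push(keypoint.position.y);
      }
      brain.addData(inputs, targetColor);
    }
  }
}

function draw() {
  image(video, 0, 0, width, height);
}
```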
But I guess I didn't think of that in advance, so I'll do that another time. I also think that I'm going to be able to get into position a little faster, so let me change this to 3,000. But I haven't done the important part: this target color needs to replace the target label when I collect the data. That's happening right here. Previously I had this target label, a character that I put into an array and then passed into addData. I think I can get rid of this now and just say target color. So this should be good. OK, dare I say that I can collect this data now? Oh, the chat thankfully is pointing out that I missed adjusting these to G and B. That would have really gotten me later, thank you. I think I also just want to collect data for, like, 3 seconds, because I'm going to do things like set the color, set the color, move my arms maybe like this, and then just set a lot of different colors with lots of in-between states. That'll really show, I think, the regression more clearly. Let me also console log what the color is, just so I see it.

I'm going to start with the sliders in their original position and press D. One, two, three. Collecting. OK, I got some data. Now let me adjust the slider a little bit. Let me add some of this color. I really should pick something where I could see what it is. Oh well, next time. And press D. Wait, what happened to my pose? Uh-oh, I have a bug. Bug, bad bug. Bad, bad, bad bug. I re-declared target color. I'm making it a global variable so that I can use it across functions. I mean, there's ultimately a nicer way to organize the code, but I want it to be a global variable. So I set it here, and then when I'm adding the data I get it here. That was the problem, OK. Now, let's collect some data. Collecting, OK. Now, let me move the sliders around. I really should visualize the color, but what are you going to do? I'll just add a little green and take away a little bit of red, I don't know, and press D again. And, where was I? I'll go like this. Really make this pretty arbitrary. Oh, it really would be good for me to see what I'm doing. I'll make this pose. Let's do this. So if you're following along and you're going to try to build the same thing, think about how you might really thoughtfully pair a bunch of colors with a defined set of poses that mean something to you. I'm doing this somewhat arbitrarily just to see if we get some results.

Now I can hit S to save the data, and I have a nice JSON file with the default name that downloaded. Let me change this to color poses. Let's take a look at it in Visual Studio Code just to make sure it makes sense. Looks like it does. It's got a bunch of xs, 34 of them, and it's got some ys. The ys are the outputs, and they're an R, G, and B value. So I could have done the thing where I named the outputs. If I wanted to have names show up in the data I could change this to-- [MUSIC PLAYING] So ml5, the neural network, is just dealing with numbers, but ml5 will allow you to specify names for the outputs so that when you get them back later you can figure out which is which. But I'm just going to remember there are three and they're in order: red is 0, green is 1, blue is 2. That'll be simpler right now.

Now I can go to the model training stage. The truth of the matter is, I could add another key press option: I press T, it trains the model. But the way I made my classifier, I did those in three separate sketches: collect data, train the model, deploy the model. So I'm going to keep going that way.
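A sketch of that model training sketch with the regression configuration, assuming the loadData / normalizeData / train / save flow from the classifier's training sketch; the data file name and the epoch count here are assumptions.

```javascript
let brain;

function setup() {
  // Same shape as the collection sketch: 34 inputs, 3 outputs, regression.
  let options = {
    inputs: 34,
    outputs: 3,
    task: 'regression',
    debug: true
  };
  brain = ml5.neuralNetwork(options);
  // Load the collected examples (file name assumed from the collection step).
  brain.loadData('color-poses.json', dataLoaded);
}

function dataLoaded() {
  brain.normalizeData();
  brain.train({ epochs: 50 }, finishedTraining); // epoch count is a guess
}

function finishedTraining() {
  console.log('model trained');
  // Downloads model.json, model_meta.json, and model.weights.bin.
  brain.save();
}
```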
I'm going to open up the model training sketch from before. I duplicated it and renamed it to regression. The only thing that I need to change here is that the outputs are three and the task is regression. And then I need a new data file, so I'm going to delete the other data files I have and upload my new file. Load that file, and then everything is the same. I'm just running the train function, and then when it's done, save the model. So there's very little that I need to change here: just a different configuration, load a different data file, and train the model.

I hope this works. I really hope this works. If you're watching this right now, you don't know how many times I've tried to get this to work where it hasn't. That's promising. A little bit of wonky stuff going on, but it trained. The loss went down. I think I've got some results here, and it looks like those files have downloaded. We can see those files there in my downloads directory, which means I can go to the last sketch, the one where I load the trained model and deploy it.

So I've opened that sketch, I've duplicated it, and now I just want to delete the model files and upload my new ones. Files are uploaded. Adjust the configuration of the network. I'm going to delete some old code that's no longer being used, and I don't have a label anymore. This shouldn't be called classify pose anymore; let's just call it predict, predict pose or predict color. Call it predict color. And this should be the brain. Because I'm doing regression, I shouldn't call the classify function, I should call the predict function. This changes to predict color. And now the main work here is that I need to change what happens when I get the results. Before, I was looking at a confidence score and getting a label. Now I just want the raw red, green, and blue values. So this should change to predict color. Let me just console log the results so we can see what the results look like in the case of a regression. They'll be formatted differently than they were for the classification; it's no longer a list of labels sorted by confidence score. Let me also make sure to comment out this pose label, which no longer gets drawn. We're just going to look at the console now.

So in theory, for the first pose that it sees, I should get an output here that has red, green, and blue values. Uh-oh, I don't see any values. What happened? Oh, for some reason the path says model2. I've been messing around with this code and that says model2. Weird that it didn't give me an error like it couldn't find it. Is it in the-- oh, yeah. It's saying failed to load here. I don't know whose fault this is, whether the web editor should be showing this or ml5 didn't log it correctly, but that's definitely the problem. The path is model. Let's try this again. No pose, no pose, no pose, no pose, predicting. And I'm seeing, oh, three objects: R, G, and B. Let's take a look inside. An R, G, and a B. An R, G, and a B. So I should be able to use those values now to set the positions of the sliders. Oh, I've got to put the sliders in. And then also I could just draw the color. I kind of want to see the sliders move, though. I think it would be fun. [MUSIC PLAYING]

So I have three sliders there. Now when I get the result, I can assign it to the sliders. I can say r equals results index 0 dot value. Pretty sure if I go back and look at what was in that object, you'll see there was an array of three objects, and the red value was in a property called value. And then there's 1 and 2, so there were three. Then I should be able to set the sliders' positions to these values. And then I also might as well add something to draw the color. So here, before, when I was getting a label, I drew it as text. Instead, let's draw a background overlay on the video with a little bit of alpha: grab the values from the sliders, which were just set, and set that as the background with some alpha.
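Putting that deployment step together, here is a sketch. The model file names and the error-first callback order follow the earlier videos and may differ in other ml5 versions; the video capture and PoseNet setup are carried over from the classifier sketch.

```javascript
let video, poseNet, pose;
let brain;
let rSlider, gSlider, bSlider;

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.hide();
  poseNet = ml5.poseNet(video);
  poseNet.on('pose', gotPoses);

  rSlider = createSlider(0, 255, 0);
  gSlider = createSlider(0, 255, 0);
  bSlider = createSlider(0, 255, 0);

  brain = ml5.neuralNetwork({ inputs: 34, outputs: 3, task: 'regression' });
  // Load the files saved by the training sketch; note the path is 'model',
  // not 'model2' (the bug from above).
  const modelInfo = {
    model: 'model/model.json',
    metadata: 'model/model_meta.json',
    weights: 'model/model.weights.bin'
  };
  brain.load(modelInfo, brainLoaded);
}

function brainLoaded() {
  predictColor();
}

function gotPoses(poses) {
  if (poses.length > 0) {
    pose = poses[0].pose;
  }
}

function predictColor() {
  if (pose) {
    let inputs = [];
    for (let keypoint of pose.keypoints) {
      inputs.push(keypoint.position.x);
      inputs.push(keypoint.position.y);
    }
    brain.predict(inputs, gotResult);
  } else {
    // No pose yet; try again shortly.
    setTimeout(predictColor, 100);
  }
}

function gotResult(error, results) {
  if (results) {
    // results is an array of three objects, each with a value property:
    // red is results[0], green is results[1], blue is results[2].
    rSlider.value(results[0].value);
    gSlider.value(results[1].value);
    bSlider.value(results[2].value);
  }
  predictColor();
}

function draw() {
  // Show the video, then overlay the predicted color with some alpha.
  image(video, 0, 0, width, height);
  background(rSlider.value(), gSlider.value(), bSlider.value(), 100);
}
```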
Did I get this? Let's run it. OK, look at the colors moving. The sliders are sliding based on whatever pose I'm making. If only I could remember what it was that I did. But any pose is going to give me a predicted value. I'm controlling sliders with my body. It's all very arbitrary, but hopefully you can see the idea if you did this in a thoughtful way. Maybe color isn't the output that you want. Maybe three isn't the number of regression outputs that you want. Maybe it's music and frequencies or this or that. You must have a creative idea. But you can see that if you can match the position of your body, as you're moving, with some set of numbers, you could then train a model to learn all of those relationships and then interpolate between them as you move your body around.

So I imagine that there's a very creative, exciting, fun, unique way of doing this, and I hope that you will explore it. If you do, please share it with me. There's a variety of different ways to do that: you can find the page on TheCodingTrain.com for this particular video, or ask your questions in the chat, on social media, all of the above. We have a new Discord, which I'll just happen to mention in this particular video. The Coding Train has a Discord; you can find the link in this video's description. That's another way you can join the community and share what you've done.

So thank you so much for sticking with me. I don't know how easy this was to follow or if this makes sense, because I used so much of the previous code in it. If you didn't watch those previous videos, hopefully they will fill in some gaps for you. But let me know; I can always revisit this in a future video. And thanks for watching. Goodbye! [MUSIC PLAYING]