Strike a Pose: gesture recognition in JavaScript with Machine Learning & Arduino — Charlie Gerard

PARISS: I have been enjoying all of these presentations from backstage and I'm, like, reading everything backwards. But it's cool. All right. So, next we have Strike a Pose: gesture recognition in JavaScript with Charlie Gerard. So, can we give some encouragement and give some clapping and whoo! Charlie! [ Applause ]

CHARLIE: Thank you. Thanks so much for the introduction. And thanks, everybody, for the warm welcome. We'll see if you feel the same way at the end of the talk. But that was nice. So, yes, my name is Charlie. I'm a developer at Atlassian. I live in Sydney, in Australia. You might not know Atlassian, but maybe the product, Jira. But what I'm going to talk about today has nothing to do with Jira. And outside of my job as well, I'm a Google Developer Expert in web technologies and a Mozilla TechSpeaker. There have been a few other speakers from these groups over the past few days. If you have questions, feel free to ask us.

But the title of this talk was Strike a Pose. And I don't know if there are any Madonna fans. I forgot, I should say that first. So, I really like human-computer interaction. With Jira, nothing to do with that. But at home, I like to explore ways to interact with technology and interface with devices in a different way. Maybe not with your computer or your phone, or maybe using the phone in a different way. So, this is why I called this talk Strike a Pose, because it's from the song Vogue. And I don't know if you know it, but vogueing is a type of dance where you don't follow certain steps. You just express yourself in different gestures, and it's who you are and how you move with the dance. But we're not going to dance today. That's not what we're gonna do. But we are going to use our gestures to play a game of Street Fighter. At the end, that's the goal.

Yes. So, to do this, there are a few different ways. But for the way we're going to talk about today, you need this to get started. So, we're going to use Arduino. I'm really glad some of you did workshops yesterday using hardware; you already know a little bit of what I'm going to talk about. This is probably not the board you used. This is an MKR1000, which connects via Wi-Fi. On top of that, we're going to be holding that piece of hardware and using machine learning to find patterns in the data that we're tracking from our movements, and we're going to do absolutely everything in JavaScript. So, for the Arduino, we're going to use Johnny-Five. And for the machine learning, we're going to use TensorFlow.js.

So, step one, gathering data. At first we have nothing. We have an idea: we want to be able to play Street Fighter with our body movements. What I mean by that is, when I want to punch, I'm going to actually punch. There are three movements. I have punch, and a couple of others; let's just call them like that. So, gathering data. We're going to do that with the Arduino. I used a Wi-Fi board because if you use a basic board like the Uno, it has to be tethered to the computer, and I want to be able to move around and play from anywhere. If I wanted to, I could play from the back of the room as well. But the Arduino is only going to run the program; there's no built-in sensor to track motion. So we are going to add an MPU-6050, an accelerometer and gyroscope. This is good for tracking movements. I'm going to become an MPU-6050 right now. How does an accelerometer work? It can track motion on three different axes, X, Y and Z.
For the accelerometer, left to right would be the X axis, then the Y axis, and then the Z axis. But if you use the accelerometer just by itself, well, you're not walking like this through life. So we use the gyroscope to get the changes in rotation. We have to track three more axes, but for rotation: rotation around the X axis, and around Y and Z. In the end, we're going to have six points of data to track over time as we are moving.

But there is one last thing that I did as well, which is — oh, sorry — a button. And I added a button because I want to track the data only when I am doing a gesture. You don't have to do that, but as a first prototype, it was easier to only get the data for my gesture, to train the algorithm to only find patterns in the data when I'm holding the button. When I hold the button, I do the gesture. And when I release it, we stop taking into consideration the data coming from the sensor.

So, this is for the hardware. But in terms of code, we start, as always, by requiring a few modules: Johnny-Five, the file system, and the client to set up our board. As it is a Wi-Fi board, you have to connect to the IP address of the Arduino and a certain port to communicate with your computer. And then once the board is ready, once that is set up, we set the pin that we want for the button, in this case A0, and then we create a file, like a stream, to be able to write data to that file. In this code sample, I hard-coded the name of the file, for example punch-0, because you have to record data a lot of times. So, punch zero, punch one, punch two. And this is how I spend my personal time. I do that at home. After you open the stream to write to a file, you have to instantiate the sensor, the MPU-6050, and give it the pins to which it's connected. And as soon as you have a change of data from the sensor — as soon as you're moving — you check whether you're holding the button down and write that data to the file. Once we release the button, we end that stream, meaning that the rest of the data coming from the sensor, we're not adding it to any file (a rough sketch of this is shown below).

In the end, we expect stuff that looks like this. And I say expect because this is not exactly what I got, but this is what I wanted. So, this is real data, but it did not come from the Arduino, and I will talk about that a little bit later. This is actually not at all what I got, but I really wanted that, because I knew that it kind of worked when I built the very first prototype of this. What I got instead was something like this. So, we still have six points of data from the accelerometer and the gyroscope, but it's not precise enough. Don't worry about reading it, but looking at it later, you see that the points of data look almost exactly the same, and over time there's not that much of a change coming from the sensor. Maybe it's my sensor or my code, I'm not sure. I just know that in the end you can train the algorithm and you get your model, but the accuracy with this data is not good. I'll come back to that later. And this is one thing if you get into machine learning: it's not only about how much data you get, it's also about how good it is. I was getting a lot of data from the Arduino, but the quality of it was not good enough for the model to be accurate.

Okay. So, now we have recorded our data many times in different files. But you can't really feed that to TensorFlow yet, because at the moment it's just lines in files, so it's not really useful. We have to do a bit of data processing to transform the data into something that TensorFlow can work with.
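Before moving on, here is a minimal sketch of what that recording step could look like with Johnny-Five. The IP address, port number, output file name and the use of the etherport-client package are assumptions for illustration, not the exact values from the talk.

```js
// Record accelerometer + gyroscope data to a file while a button is held.
const five = require("johnny-five");
const { EtherPortClient } = require("etherport-client");
const fs = require("fs");

// Wi-Fi board: connect to the Arduino's IP address and a port (placeholders).
const board = new five.Board({
  port: new EtherPortClient({ host: "192.168.1.10", port: 3030 }),
  repl: false,
});

board.on("ready", () => {
  const button = new five.Button("A0");               // button on pin A0
  const imu = new five.IMU({ controller: "MPU6050" }); // accelerometer + gyro

  let stream = null;
  let recording = false;

  button.on("press", () => {
    // Open a new write stream for this sample; the name is hard-coded here.
    stream = fs.createWriteStream("punch-0.txt");
    recording = true;
  });

  button.on("release", () => {
    // Stop writing: data coming from the sensor is ignored from now on.
    recording = false;
    if (stream) stream.end();
  });

  imu.on("change", function () {
    if (!recording || !stream) return;
    // Six values per reading: accelerometer x/y/z and gyroscope x/y/z.
    const { accelerometer, gyro } = this;
    stream.write(
      [accelerometer.x, accelerometer.y, accelerometer.z,
       gyro.x, gyro.y, gyro.z].join(",") + "\n"
    );
  });
});
```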
So, let's imagine that I have the nice file that I wanted. We start with lines in files, and the first step is to transform them into something that we can use: features and a label. I think these are probably some of the main terms in machine learning. The features will be the characteristics of your gesture, so all the data that's in the file. And the label will be the name of the gesture. But you can't really work with strings; it has to be numbers. So we're just converting the name of the gesture to a number. If you have a gestures array where punch is at index zero, you set the label to zero. But that's only for one file, so one sample. What we want is to do that with all our files. So, we're going to end up with a massive array of objects that represent our gestures as features and a label.

But that's not it either, because that's just the first step, to put the data in a way that we understand a bit better. We can work with objects, but TensorFlow doesn't work with objects. TensorFlow works more with stuff that looks like arrays — technically tensors, but let's just talk about arrays for now. At the moment we have objects, and that's not good. We need to split between features and labels. So we're going to end up with something like this. I'll go through it line by line. The labels are zero, one or two, for the three gestures: punch, round and uppercut. And we will have a multidimensional array of features that maps to those labels. So, the first level of our labels array is going to be mapped to the first level of our features array as well. And if we go a bit deeper, it means that the first punch that I recorded is going to be mapped to the first entry in the features array, the second one mapped to the second entry at the first level of the array, and you go through that one by one.

So, we're getting a bit closer. I don't know if you noticed, but I didn't show you any code to transform the data, because it has nothing to do with TensorFlow yet. At the moment it's just array methods and pushing objects and stuff like that. You do that with normal JavaScript functions; we're not using any TensorFlow methods. But we are getting more towards something that TensorFlow can work with, because it's only multidimensional arrays and just numbers. No strings, just numbers. We're getting closer.

And now we need to move on to converting to tensors. Think of tensors as a data type that TensorFlow can work with. This is where I'm going to show a little bit more code that is TensorFlow-specific, because we have to use the built-in methods to be able to transform our multidimensional arrays into tensors. But first we have to shuffle the labels and features. Before that, I had an array of all the zeros, then the ones, then the twos, to map all the gestures. But with this, the algorithm is not going to really try to look at the data and understand a gesture. It's probably going to go: well, the previous ones that you gave me were punches, so I'm going to assume this is a punch. But if you shuffle the data, you can force the algorithm to try to understand what makes a punch, an uppercut and things like that. So, once the data is shuffled, we start using, in this case, tensor2d from TensorFlow. What is going on here is that we're passing our shuffled features, and the second parameter of the function is going to be the shape of the tensor.
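A rough sketch of that processing and shuffling step might look like this, in plain JavaScript plus a shuffle helper. The file names, the gestureClasses array and the use of @tensorflow/tfjs-node are assumptions for illustration; the tensor conversion itself comes in the next sketch.

```js
const fs = require("fs");
const tf = require("@tensorflow/tfjs-node");

const gestureClasses = ["punch", "round", "uppercut"];

// Turn one recorded file into a { features, label } object.
function fileToSample(filePath, gestureName) {
  const lines = fs.readFileSync(filePath, "utf8").trim().split("\n");
  // Flatten every "ax,ay,az,gx,gy,gz" line into one long array of numbers.
  const features = lines.flatMap((line) => line.split(",").map(Number));
  return { features, label: gestureClasses.indexOf(gestureName) };
}

// Build the big array of samples from all the recorded files.
const samples = [
  fileToSample("punch-0.txt", "punch"),
  fileToSample("punch-1.txt", "punch"),
  fileToSample("uppercut-0.txt", "uppercut"),
  // ...one entry per recorded file
];

// Shuffle so the model can't just rely on the order the samples come in.
tf.util.shuffle(samples);

// Split into a labels array and a multidimensional features array.
const labels = samples.map((s) => s.label);      // e.g. [0, 2, 0, ...]
const features = samples.map((s) => s.features); // one array of numbers per sample
```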
That shape means that the tensor is going to expect the data to come in a certain way. The first number is the number of samples per gesture. If I recorded the punch gesture 20 times, I'm going to use the number 20. And the second one is the total amount of data per file, let's say the number of data inputs in a file. You know how I'm tracking six different values for my gestures, X, Y and Z for the accelerometer and the gyroscope? You multiply that by the number of readings, and that will be the second part of the shape of your tensor: basically the amount of data in one gesture that it's going to be looking for. And then the labels tensor. That one is going to be a tensor1d, because each label is just an integer, zero, one or two. So we move from a tensor2d to a tensor1d.

Now we really have the data type that TensorFlow can work with: we have tensors. But the next step is splitting between the training set and the test set. What we want to do is this: we know our data is already labeled, because we recorded it in files and we named them with the right gesture. We're going to give 80% of our dataset to the algorithm to learn patterns from, and we're going to keep 20% to validate that the predictions are right. Once we have the algorithm, if it says it's a punch but you know from the data file it's an uppercut, you can retrain until the accuracy gets a bit better. To do this, we're just calculating what 20% of our sample is, and then we're just slicing tensors. I'm not going too much into that, because I think that's not the part that matters the most; you can just use the slice method. But now we have a training set and a test set of features and labels.

So, now that we have this, our data is ready, and what we have to do next is create the model. This is where it kind of becomes... it's not really a science. It's more like: try things until it works. That's what I do. I create a sequential model and add layers. You can have two or four or six and change parameters. I'm not quite there yet in understanding what everything does. When it works, I don't touch it anymore. Let's just pretend that it's fine. And before fitting the model, before launching the training, you have to pass a few different parameters: the optimizer that you want to pick, what you want to track, and things like that. You could copy and paste this, but you could also play around with different parameters and see what makes your accuracy better. So, once we have that, we fit the training features and training labels to our model, and we just pass it the number of steps for the training, and the validation data, which is the test set that we're going to validate against. Once the training is done, we save our model. The actual model is going to be saved as a file in your application, so you can just use it later.

So, now we've created our model. It's there, with all our data. But we want to use it, right? So, the next part is predicting. To do this, we require TensorFlow — we probably don't need to look at it — and then we have an array of classes, or gestures. We load our model to be able to use it. And we have the same code as before for recording the data, but this time we use it to predict: when the data changes from the sensor on the Arduino and the button is held, we keep all of that data in a variable.
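Before getting to the prediction part, here is a rough sketch pulling together the tensor conversion, the train/test split and the training described above, continuing from the features and labels arrays in the previous sketch. The numbers (20 samples per gesture, 50 readings of 6 values per file) and the layer sizes, loss and optimizer are assumptions, not the exact values from the talk.

```js
const tf = require("@tensorflow/tfjs-node");

const numSamplesPerGesture = 20;
const numClasses = 3;                    // punch, round, uppercut
const totalNumSamples = numSamplesPerGesture * numClasses;
const numDataPerSample = 50 * 6;         // 50 readings x 6 axes = 300

// Shape: [number of samples, amount of data in one gesture].
const featuresTensor = tf.tensor2d(features, [totalNumSamples, numDataPerSample]);
// Labels start as a 1D tensor of integers (0, 1 or 2)...
const labelsTensor = tf.tensor1d(labels, "int32");
// ...one-hot encoded here so they can be fed to categoricalCrossentropy.
const oneHotLabels = tf.oneHot(labelsTensor, numClasses).toFloat();

// Keep 80% of the samples for training and 20% for testing.
const numTest = Math.floor(totalNumSamples * 0.2);
const numTrain = totalNumSamples - numTest;

const trainFeatures = featuresTensor.slice([0, 0], [numTrain, numDataPerSample]);
const testFeatures = featuresTensor.slice([numTrain, 0], [numTest, numDataPerSample]);
const trainLabels = oneHotLabels.slice([0, 0], [numTrain, numClasses]);
const testLabels = oneHotLabels.slice([numTrain, 0], [numTest, numClasses]);

// A small sequential model: "try things until it works".
const model = tf.sequential();
model.add(tf.layers.dense({ inputShape: [numDataPerSample], units: 32, activation: "relu" }));
model.add(tf.layers.dense({ units: numClasses, activation: "softmax" }));

model.compile({
  optimizer: tf.train.adam(),
  loss: "categoricalCrossentropy",
  metrics: ["accuracy"],
});

async function train() {
  await model.fit(trainFeatures, trainLabels, {
    epochs: 20,                                 // number of training steps
    validationData: [testFeatures, testLabels], // the 20% kept aside
  });
  // Save the trained model to disk so it can be loaded later for predictions.
  await model.save("file://./gesture-model");
}

train();
```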
So, back to predicting: while the button is held, we're pushing the live data into an array. Once we release the button, I want to use this new live data, which the model has never seen before, and match it against one of the gestures it's been trained on. So once we release, we keep the data that we just recorded, but we also need to transform it into a tensor, because at the moment it's just an array, and TensorFlow can't work with plain arrays. So we create a tensor2d as well with our new sample data, but this time the shape is going to be different, because we're giving it only one input. The first number is going to be one. And 300 — that's kind of arbitrary, you can change it — is basically because I used 50 lines in each file, and 6 times 50 is 300. So, the model is expecting 300 points of data to work. You can change that, it's fine. And once our live data has been transformed into a tensor, we can just call the predict method on the model, and it will give us back a label, so a number: zero, one or two. And then you can look that up in your gesture classes array to get the actual label of the live prediction.

This is what is really cool about it: every time I do a punch, there is no way I could do the exact same punch I did before. Because if we have 300 or more points of data, there's no way the values from my accelerometer would be exactly the same, even if I tried, like, I don't know, a hundred times. So, it means that you are more free to do whatever gesture, as long as it looks like a punch to the algorithm. Because of the way it learned the patterns from the data, it will look at the values coming from the accelerometer and match them to one of the gestures that you tracked.

Okay. I feel like that was heavy. Okay. So, now that I talked about how it is supposed to work, I'm gonna try to show, hopefully, that it works. I mean, I know it works. But I don't know if it will work here. [ Laughter ] Yes. Okay. So, you know how at the beginning of the talk I told you about the data that wasn't really good? Because it was — well, I don't know yet why — but it wasn't really good. So, this is the sketch. This is how I put the sensor, the button and the Arduino together. But the thing is, I knew I wanted to do a demo, so I actually switched sensors, because I didn't want to come on stage knowing it wouldn't work. Maybe it will work because I switched. It works the same way, which is really cool, because I was able to not even change the code that much — just change where the data is coming from. I used a Daydream controller, the one you usually use with the VR headset from Google. And I used that one because it has a built-in accelerometer and gyroscope, just like the Arduino, and it gives me more precise data and makes it quite accurate. It was like, awesome.

So, what is supposed to happen now is something like this. I have a game. I put a GIF here because if it doesn't work live, at least I can show that it works. I go for the punch, try the three gestures, and it's supposed to be doing this. Okay. My god. I'm going to try to demo it a little bit; last time it did not quite work as planned. Okay. If anybody has Bluetooth on right now, feel free to disconnect it. I've had issues in the past. Okay. Let me just have a sip and — okay. I run the predict file. There's an error. Don't worry. It still works anyway, so I didn't fix it. [ Laughter ] Okay. I need — come on. Don't let me down. It's not on. Oh, I knew — don't — okay. I'm waiting for the message that tells me that this — okay. It's on, it's on, it's on. I want to be quick.
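Stepping back from the demo for a second, a rough sketch of the prediction step described above could look like this. It assumes the model saved in the previous sketch and a hypothetical liveData array of 300 numbers collected while the button (or screen) was held.

```js
const tf = require("@tensorflow/tfjs-node");

const gestureClasses = ["punch", "round", "uppercut"];

async function predictGesture(liveData) {
  // Load the model that was saved to disk after training.
  const model = await tf.loadLayersModel("file://./gesture-model/model.json");

  // One single sample of 300 values, so the shape is [1, 300].
  const input = tf.tensor2d(liveData, [1, 300]);

  // predict() returns a probability per class; argMax picks the most likely one.
  const prediction = model.predict(input);
  const classIndex = prediction.argMax(-1).dataSync()[0];

  console.log("Detected gesture:", gestureClasses[classIndex]);
  return gestureClasses[classIndex];
}
```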
I haven't done anything yet. So, I'm going to go like this. Okay, don't let me down. You fucking let me down. Okay. No! No, no, no, no... never try Bluetooth ever. Okay. It's okay. Okay. You're on, you're on. Refresh. Oh, why not? It's not even going to refresh? I have other demos. So, I have two — like, is there a — okay, okay, we're in. Come on. Give me the notification. Okay. I'll try one more time and then I have to move on to the next one. Okay. So, we'll try that. Yay! [ Applause ] Whoo! [ Applause ] I was supposed to have — I was supposed to have sound. I don't know if you need to turn it on, or if it's my thing. [ Attack sound ] Try it again? That's cool. I'm done with this one.

Okay. So, but the thing is, then I started thinking: okay, I have a gesture recognition system in JavaScript. What else can I do with it? So, this is where it's gonna get even more lame. It's supposed to — is there sound? [ Harry Potter music ] What you can do as well is wand movements. So, if there are any Harry Potter fans — I know there are a few, because we talked about it. If you don't know Harry Potter, you're supposed to say the spell, Expelliarmus and stuff like that. As you grow, you don't have to say the spell, just do the movement. And there's a third level where you can just think the spell. We're not going to do that today. I'm going with the one where you move. I just trained two gestures, and it's supposed to show it on the screen. I hope it's going to work. It's going to be very embarrassing if it doesn't. I trained Lumos and Expelliarmus. I really like that demo. So, it's got to work. Oh, nope. Hold on. Wait. Give me the second one — okay. I just need it to work once. Just work once. Oh, why are you crashing? I don't know why this is crashing. It's fine in my room and never fine on stage, ever. Come on, I want the Harry Potter one to work. This is really weird. Hello? Well, I haven't even done anything. Okay. Maybe the third time is the right time again? Oh no! You were listening. Dude. Come on. I really like that one. It didn't do anything. What? All right. Oh, okay. So, okay. I'm gonna talk while I try to do it again. The error here is that the model expects the shape of the data to come in a certain way, and if there's not enough data, it will crash. I just haven't handled that yet. Come on, I need to do the Expelliarmus. Otherwise I'll just move on. This one doesn't even have anything to do with Bluetooth. Come on, come on, come on, come on... no. No. Okay. Well, I'm just gonna go back to my room and cry. That's fine. I need to move on. That's cool. Another time.

So, but what else? As I got into that rabbit hole of trying stuff, I was thinking: well, okay, I'm having fun, but I want more people to have fun. It's boring to play by yourself. So, I was like, okay, it works with an Arduino and it works with a Daydream controller. But, you know, what else has a built-in accelerometer and gyroscope? Your phone. It has built-in sensors to track pose, you know, direction and orientation as well. And you can access that stuff in JavaScript with the Generic Sensor API. What we can do is replace the code that gets the data from the piece of hardware. In this case it's obviously just pseudo code, but you would create the gyroscope, listen for the reading event, and then you can do whatever you want with WebSockets or hook it up to hardware. And then I thought: why say you can do it and not actually do it? So I'm going to try to do it.
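That pseudo code could look something like the sketch below: a page running on the phone reads the sensors with the Generic Sensor API and forwards the values over a WebSocket. The server URL, the message format and the touch-to-record behaviour are assumptions, not code from the talk.

```js
// Runs in the browser on the phone (Generic Sensor API requires HTTPS).
const socket = new WebSocket("wss://example.ngrok.io"); // placeholder URL

const gyroscope = new Gyroscope({ frequency: 60 });
const accelerometer = new Accelerometer({ frequency: 60 });

let recording = false;

// Send a reading over the socket every time the sensor updates.
gyroscope.addEventListener("reading", () => {
  if (!recording) return;
  socket.send(JSON.stringify({
    type: "gyro",
    x: gyroscope.x, y: gyroscope.y, z: gyroscope.z,
  }));
});

accelerometer.addEventListener("reading", () => {
  if (!recording) return;
  socket.send(JSON.stringify({
    type: "accel",
    x: accelerometer.x, y: accelerometer.y, z: accelerometer.z,
  }));
});

gyroscope.start();
accelerometer.start();

// Record only while a finger is on the screen, mirroring the button on the Arduino.
window.addEventListener("touchstart", () => { recording = true; });
window.addEventListener("touchend", () => { recording = false; });
```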
Actually, I could do the Harry Potter one with my phone, because that would work. But I'm going to do the game one with my phone. This is just running locally; I'm using ngrok to be able to communicate with my phone on the same port. So, I'm gonna do the game one just to show that it's working. It should be working — that should be easier, because it's not Bluetooth. I have the game here, and I have the page on my phone. Connected — oh, yeah, okay, that's fine, that's fine, that's fine. Oh, the Wi-Fi on my phone. No! Are you for real? Oh, I'm back, I'm back, I'm back. Okay. I'm on. All right. Okay. So, what's gonna happen is I'm gonna press on my screen, it records data, and when I release my thumb from the screen, it should do the thing. Well, that wasn't the one I tried. But that's fine. Let me try — if I do that? That wasn't the one I tried either. But it's doing something, so, yeah. [ Applause ] It really didn't go as planned from beginning to end. I probably won't have the time to do the Harry Potter one. I can try it later with you if you want.

So, yeah. Just before I finish, a little recap, if you want to know how to build that kind of stuff. You have to start by getting the data. I used an accelerometer and gyroscope, but you can use whatever sensor gives you data. You need to transform the data so it works with TensorFlow and split it between the training set and the test set. You train your algorithm, and finally you can run the predictions. Just before I finish, I'm going to say something I've said before: useless is not worthless. You're not going back to work saying, fuck everything, let's just do gesture recognition. But I learned a lot. I learned a lot about Bluetooth — you shouldn't use it. And building that kind of thing is something that gets me super excited, because I think there are so many different ways we could interact with technology, in ways that we want. Because the thing is, I trained it with my gestures, but I could let anybody actually use the sensor, and anything that looks like a punch — your gesture — is going to be mapped to something it's been trained with. Sorry for the bad demos. That's all I had. I'm going to share the slides and the resources probably tomorrow; I need to clean up some stuff first. But thank you very much for your time. [ Applause ]