Hello and welcome to a coding challenge. Oh, my ukulele is very out of tune. What luck that this happens to be the make-a-ukulele-tuner coding challenge. What a coincidence. Well, I'm here in my new studio-- it's not my studio, but it is a studio, in Brooklyn at New York University-- and I'm doing my first coding challenge from here. I don't know if this is the moment for some particular kind of coding challenge. I'm just going to tune the ukulele, because the battery in the tuner I have is broken and my ears are not so good. So we'll see if this works. I'm going to use the ml5.js library. This is a JavaScript library whose development I'm lucky enough to participate in, and I try to make tutorials with it, so this works well. It has a pretrained model for pitch detection, which I'll talk about in a moment. And I'm also going to use the p5 Web Editor. It'll be a little tricky to do this in the p5 Web Editor, because it's going to involve uploading a bunch of stuff, but the good news is I already have the ml5 library right here. If you're in the p5 Web Editor at editor.p5js.org: go to the ml5 website, and under Getting Started, click the link that opens a sketch in the p5 Web Editor with the ml5 library already imported. OK, so we're there. I can make sure this is working by saying something like console.log(ml5.version)-- see the starter sketch after this paragraph. I hit Run, and I see the version down here, so I've got ml5 going. Now, what do I need to do next? If I go to the ml5 website, I go to Reference, where I can see that the various functionality of ml5 is divided into categories based on media. What I definitely want to look at is sound, because I want to find a sound model. And lo and behold, there is a model called pitchDetection. Now, I should mention that there are ways of analyzing a sound for pitch that don't need machine learning. You could do FFT analysis, look at the amplitudes of the various frequencies, and pick the strongest-- there's a variety of ways, and people much smarter, who know much more about sound, could tell you how to do that. I'm sure you could find other tutorials, but I'm here, and I want to try the ml5 library. This raises the question: how is this working? You said something about a pretrained model. So ml5 comes with a model known as CREPE-- I don't know if that's how you pronounce it; I like to say crepe-- which stands for Convolutional Representation for Pitch Estimation. I like crepes-- a little banana and maybe a little Nutella is kind of good. I don't know, the Nutella is too much for me. This video is about pitch detection, though. You can read a lot more about this particular model and what data was used to train it, which is always a question you should ask when you're using someone's pretrained model, because there's a lot that can go wrong, or right, or be potentially problematic about a model based on the data it was trained on. The paper describes that in more detail. You could click on this link over here to see a demo of it in the browser, but we're going to do this in ml5. And a big thank you to Hannah Davis, who did the porting of this model into ml5. I'll include some links to her work in this video's description.
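That version check, in context, is just one line in an otherwise empty sketch. Here's a minimal version of it, assuming the index.html from the ml5 "Getting Started" template, which already includes the p5.js, p5.sound, and ml5 script tags:

```javascript
// Minimal sanity check that the ml5 library is loaded. Assumes index.html
// already pulls in p5.js, p5.sound, and ml5 (as in the template sketch).
function setup() {
  createCanvas(400, 400);
  console.log(ml5.version); // prints a version string if ml5 loaded correctly
}

function draw() {
  background(0);
}
```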
OK, so here I am on the ml5 documentation page. And it looks a little bit like, what's going on here? So I need to create a pitch object, and there's some sort of string, './model/'. What's that? The first thing I want to tackle is what this './model/' is. A lot of the time when using ml5, it loads the model files from a URL. You might actually put the URL into your code, or ml5 might just know the URL automatically-- it's saved on a Google server or some other server, or it's saved on GitHub and pointed to by ml5. This is a case where I actually need to have the model files with my code. This will probably change; just by making this video, I've realized we should probably host a version of the CREPE model that you can access through ml5 more easily. But luckily, if I go to the ml5-data-and-models GitHub repo-- I'm right here, at github.com/ml5js-- I can navigate to models/pitch-detection/crepe, and these are all the model files. This is very typical of any pretrained machine learning model. There's a JSON file, which is essentially a file that describes the model, and then all these other files, which are the actual weights-- the numbers, the secret sauce of the model after it's been trained. All of the little parameters of the neural network are stored in these files. Now, I've actually downloaded all of these already, and you can see them right here in a folder on the desktop. So what I want to do right now is add all of them to the p5 Web Editor. Let's see how that goes. I've really only worked with uploading little media files and sound files, but I think this is going to work. So I'm going to create a new folder-- it's hard for you to see this-- and I'm going to call it crepe. Then I'm going to do Add File, hoping that I can just select all of these and drag them here. So I forgot that the Web Editor currently only supports certain file types, like JSON, CSV, JPEG images, and sound files. These model files with all the weights can't be uploaded to the Web Editor. That's something that might change in the future. But luckily, I can just point to the model files on GitHub itself. For this particular URL where all the model files are stored, there's actually a way to turn any file sitting on GitHub into a URL you can load from a content delivery network. There's a nice blog post I found on gomakethings.com that shows the base URL: cdn.jsdelivr.net/gh for GitHub, then the user name, the repo, and the path to the files. So I've done that right here. I hit Refresh, and look-- this is that model.json file. I can see all of the configuration information about this particular model. So I can grab this URL and put it into my code. Let me close this and go to the top. I'm going to say const modelURL equals, and paste that in there. Now, I actually want just the path, because I want the model to load all of the files, so I'm going to remove model.json from the end. And this is the path to the CREPE model. I can go ahead and delete the crepe folder from the sketch-- and I guess I need to also delete the files one at a time. Let me do that. And now, I am ready to start putting in some code.
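Here's a sketch of that constant, assuming the files still live at this path in the ml5-data-and-models repo (the jsDelivr pattern is https://cdn.jsdelivr.net/gh/{user}/{repo}/{path}):

```javascript
// Load the CREPE model files over jsDelivr's CDN for GitHub.
// modelURL points at the folder, not at model.json itself, so ml5
// can fetch model.json plus all of the weight files next to it.
const modelURL =
  'https://cdn.jsdelivr.net/gh/ml5js/ml5-data-and-models/models/pitch-detection/crepe/';
```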
So I'm going to make a variable for the pitch detector; let's just call it pitch. I'm going to say pitch equals ml5.pitchDetection. Let me give myself some more space. I'm going to go back to the reference page, and these are the parameters that I need to pass in, so let me copy-paste. I'm sitting here wondering why I have an error-- and of course, if I declare a variable as a const, I can't just have it not equal anything and assign it later. So this is going to have to be let. And I'm going to make a lot of people angry right now by making everything let, just to simplify things. And now I've created a pitch detection object. What I want is to give it a bunch of arguments to create itself, one of which is the model itself. So this is no longer a local directory of files; I'm going to say modelURL. I also need this audio context and mic stream-- let me come back to those. And I need a modelLoaded function, so that I know the model has been loaded. All right. So audio context and mic stream, what are those things? Well, let's hope the documentation tells us. Audio context: the browser audio context to use. Stream: the media stream to use. I'm a little bit lost, to be honest. Can I get some more information? You know what, let's just look at the example. And the example code shows-- ah, perfect-- that I can get an audio context just by saying getAudioContext(). Perfect. This happens to be something that's built into p5.sound; it's part of the Web Audio API, I would assume, and I'm sure somebody in the chat or the comments will explain what it is a little more. But I'm just going to go ahead and put it in here: audioContext equals getAudioContext(). And then for the mic stream, this is actually me connecting to the built-in microphone-- or potentially, I could specify a different microphone. And I'm going to connect to it using the p5 sound library. So I can make a variable called mic, and then I can say mic equals p5.AudioIn, I believe is the function. This is part of p5.sound.js, which incidentally is a library I'm already loading in index.html. And look at this-- I am on such an old version of p5. Let's update this stuff; I think the current version is 0.9.0. So while I'm here, I'm going to update that and go back. And then I think I can get this mic stream from the p5 mic object. I'll just look it up here: mic.stream. Perfect. Oh, and I need to say mic.start(). Interesting-- I wasn't paying close attention. I didn't really think about what I'm doing, because I've got to do two things here: I need to access the microphone, and I need to load the model. When I load the model, I want to connect it to the microphone. And maybe I need to think about the sequence going on here, because as you know, if you've done programming or watched a Coding Train video before, things in JavaScript happen asynchronously. So maybe I don't need to do it the way the example does, but I probably do. Let's try not doing it the example's way and see if it works. It's probably going to break without being thoughtful about the order. So I'm just going to say mic.stream right now. Then I'm going to say mic.start(), and I'm going to have a callback called listening.
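Here are the pieces assembled so far, as a sketch (with the `new` keyword that, as it turns out below, p5.AudioIn requires):

```javascript
// The pieces so far: let (not const) so they can be assigned in setup(),
// the audio context from p5.sound, and the built-in microphone input.
let pitch;        // the ml5 pitch detector, created once everything is ready
let audioContext; // the browser's Web Audio context, via p5.sound
let mic;          // the built-in microphone, via p5.AudioIn

function setup() {
  createCanvas(400, 400);
  audioContext = getAudioContext(); // p5.sound exposes the Web Audio context
  mic = new p5.AudioIn();           // note the `new`: this is a constructor
  mic.start(listening);             // callback fires once the mic is live;
                                    // listening() is defined next
}
```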
And then I'll write a function called listening, and I'll say console.log('listening'). I'm going to take off this auto-refresh, because it's doing crazy stuff. So let me see: Uncaught SyntaxError, invalid or unexpected token, line 20. What? I don't see any invalid token. Line 10? Listening, cannot read property 'start' of undefined. Let's go back to the example. New, new-- I forgot to say new. The new keyword is very important: when you are calling a constructor to create an object, you are required to say new. There's something really interesting going on here, which is that I need the new here-- new p5.AudioIn()-- but how come I'm not saying new ml5.pitchDetection? Oh, do you know why? It's because people like to do things in different ways. This is actually a little bit more standard, from what I understand, in the world of JavaScript: it's not actually calling an object constructor. The little clue is the lowercase p right there. It's calling a function that's part of the ml5 library; the function itself calls an object constructor, and the new happens in there. But our interface to it, as users of the library, is just calling the pitchDetection function. That's why sometimes I say these are the lists of functions in ml5, instead of lists of objects or classes. But this one actually is calling the p5.AudioIn constructor, all right? Let's see if that fixed things. All right: listening, model loaded. That's promising. It seems to be happy with the order I'm doing things in now, right? The order of the example is to make sure the mic is started, and then load the model, but this doesn't seem so upset. Maybe it's going to work. So what's the next step? Well, just to get this working, the pitch is going to come in as a number, a frequency value, and I just want to draw that frequency value on the canvas. So how do I get it? Presumably, there's some callback. There's a callback for modelLoaded, but I need to actually tell it to listen. I could look at the example, but let's look at the documentation: pitch.getPitch(), and here's a callback. So this is the function I call to ask for a pitch, and when it hears something, it console-logs it. I should be able to do this in the modelLoaded function. I'm going to say pitch.getPitch(), and I'm going to write a function called gotPitch, because I'm trying to do this in a very long-winded, highly descriptive way, where I now have a separate function called gotPitch. And it receives an argument like frequency, and I'm going to actually draw that-- well, no, let's just console.log the frequency. OK, let's run this. Model loaded. (WHISTLING) (SINGING) La. It is not working at all. This is a failure. All right, this is not working because it "failed to execute createMediaStreamSource on the AudioContext: parameter 1 is not of type MediaStream." I have a feeling the order is the problem, right? I did things out of order. I need to make sure the microphone is ready before I start trying to load the model. So let's redo this order: mic = new p5.AudioIn(), then mic.start(listening). Once I am listening, then I load the model. Then once the model is loaded, I call getPitch with gotPitch. If I look at the documentation, getPitch-- oh, oh, oh, womp, womp, womp.
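Continuing the sketch, here's the reordered flow: wait for the mic, then load the model, then start asking for pitches (gotPitch is defined in the next sketch):

```javascript
// Reordered: only create the pitch detector once the mic is actually live,
// so mic.stream is a valid MediaStream to hand to ml5.
function listening() {
  console.log('listening');
  pitch = ml5.pitchDetection(modelURL, getAudioContext(), mic.stream, modelLoaded);
}

function modelLoaded() {
  console.log('model loaded');
  pitch.getPitch(gotPitch); // ask for the first pitch reading
}
```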
I forgot something else really important. I can never remember this; I don't know why. ml5 callbacks are all written in the style known as error-first callbacks. So you must include two arguments, and the first one is the error. This forces you to think about error handling, which is a thing I don't really think about a lot, but at least ml5 is trying to get me to do it, and I should be an error-checking kind of person. So this should be (error, frequency). And I can do a little error handling: if error, console.error(error); otherwise, console.log the frequency. And I'm now actually realizing that there's a slight inconsistency in the way the pitch detection model works in ml5. All of the other ml5 features return an object-- something that gives you the value you're looking for, a confidence score, and so on. So probably the raw frequency shouldn't come back by itself; it should be an object. Maybe it actually already is and I'm wrong about that. Let's try running this one more time. Oh, I like that-- I'm seeing something. So this is promising, in that something came out. Did the error come out? No. If it were an error that came out, it would have been red, because console.error prints to the console in red. So frequency came out, but it came out as null. That's fine: it detected no frequency, because I wasn't making a sound. And it's not checking anymore. The reason it's not checking anymore is that it doesn't know to keep checking; I have to explicitly ask it to. So I say, give me the pitch; once it's got the pitch, it logs it; and then I say, give me that pitch again. This is a little bit like calling it recursively-- it's kind of recursion, but really it's a loop. So let me run this. There we go: when it detects a frequency, it console-logs it. And let's see: higher frequency, lower frequency, higher frequency. Excellent. So now, I don't want to just see this in the console anymore. I want to create a variable-- I'm going to call it freq-- and set it to 0, just so it has some value to start with. And then here, whenever I get a frequency-- I don't want to assign it null, so I'm going to say if (frequency), then freq = frequency. I might want to account for null in a better way later. And then in the draw function, which is quite unnecessary, but I'm going to do this anyway, I'm going to say textAlign(CENTER, CENTER), so it centers both horizontally and vertically, fill(255), textSize(64)-- I like 64-- and then text(freq.toFixed(2), width / 2, height / 2), so I see two decimal places. And now, let's run this. OK. Hey, so the ukulele notes are A, E, C, G. Is that right? A, E, C, G? So this should be an A. This should be 440 if I'm right. There's math you could do for this, but I can also just look it up here. Yeah, right there-- look at that, it's even highlighted. I guess it's highlighted because it's A4. So this is the frequency I want. So let's just say I'm only going to tune the A string, and then maybe I'll do the rest, but speed up the video or something for you. People in the chat are pointing out that I could also just use a tone generator. That's probably a smart idea.
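Here's a sketch of the error-first callback, the "ask again" loop, and the draw function, continuing from the code above:

```javascript
let freq = 0; // most recent detected frequency, for drawing

// ml5 uses error-first callbacks: (error, result).
function gotPitch(error, frequency) {
  if (error) {
    console.error(error); // console.error prints in red
  } else {
    if (frequency) {
      freq = frequency;   // frequency is null when nothing is detected
    }
    pitch.getPitch(gotPitch); // ask again: effectively a loop, not true recursion
  }
}

function draw() {
  background(0);
  fill(255);
  textAlign(CENTER, CENTER); // center both horizontally and vertically
  textSize(64);
  text(freq.toFixed(2), width / 2, height / 2); // two decimal places
}
```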
Let me do that right now. All right, so thank you to Alca in the chat, who suggested this online tone generator. I'll include a link to that also in the video's description. I'm just going to play it, and as I move the slider, we can see the pitch detection is pretty much matching-- I mean, I'm talking, which is messing it up-- it's matching the tone being generated. Now, you'll notice it's not perfect, right? This is a machine learning model that's been trained on some dataset of sounds, and it's making a guess, a prediction, of what it thinks the pitch is. This is not the 100% accurate analysis that you could probably do mathematically, especially with a pure tone, but it's an approximation that would hopefully work with a variety of different kinds of sounds that might be harder to analyze for that exact pitch. Also, I have no idea what I'm talking about when it comes to sound; I'm just trying to get this to work. All right. So now, I want to make some kind of visual indicator. I think what would be useful here is to draw some type of rectangle that is big when I'm way off and smaller as I get close-- or maybe when I'm above the pitch, I draw it to the right, and when I'm below, I draw it to the left. Some type of-- and I'm sure you, hopefully, will make a version of this with a much nicer and more thoughtful interface. But to do this, I think all I need to do is say let difference = freq - 440. So right now, I'm just tuning for A, just the A string. Then I want to draw a rect-- I'm going to make it white for right now. What's the size of the canvas? 400 by 400. So I'm going to say it's at 200 comma 50, move the text way down, and then make the width of the rectangle the difference-- I'm just going to multiply it by 10, just to scale it-- and have the height be 50. So let me just try this for a second and play the tone. Whoops. So this actually works. The nice thing is that if I give it a negative width, it draws the rectangle in the opposite direction; I don't actually have to flip it. Now, I think times 10 is actually quite a bit, so let's not multiply by anything, and let's think about what these differences are like: if it's off by 60, then 60 pixels is pretty reasonable. Let me play it. So I'm not seeing any rectangle-- there's a little rectangle. Talking messes it up, because it picks up the pitch of my voice. But I could be a little bit more thoughtful about this: I could say let amt = map(...), mapping the difference to a value between 0 and 1. And the reason I'm doing this is that I can use the function lerpColor. So let me say I have the color red, which is color(255, 0, 0), and the color green, which is color(0, 255, 0). lerpColor gives me a linear interpolation between two colors, like red and green. Oh, but this is actually not what I want-- I want it to be green when it's in the center. So actually, I want to map the absolute value of the difference-- and this is much easier now-- from the range 0 to 100.
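Here's a sketch of that indicator for a single target (A4 = 440 Hz), replacing the draw() above; the 0-100 Hz range for the color map is an assumption to keep the blend visible:

```javascript
function draw() {
  background(0);

  const difference = freq - 440; // positive = sharp, negative = flat

  // Map how far off we are onto 0..1 for the color blend,
  // clamped so wild values don't overshoot.
  const amt = map(abs(difference), 0, 100, 0, 1, true);
  const green = color(0, 255, 0);
  const red = color(255, 0, 0);
  fill(lerpColor(green, red, amt)); // green when in tune, red when far off

  // A negative width makes rect() draw to the left, so flat notes
  // point left and sharp notes point right with no extra logic.
  rect(width / 2, 50, difference, 50);

  fill(255);
  textAlign(CENTER, CENTER);
  textSize(64);
  text(freq.toFixed(2), width / 2, height / 2);
}
```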
And then when it's zero, it's perfectly green; when it's 100, it's red. And I don't know if this is going to actually look right, but let's try it. (SINGING) So we can see here now, if I try to tune the ukulele. (TUNING) Let's make this an A4 again. (TUNING) Yeah, let's not use lerpColor. I think the lerpColor was an interesting idea, but really, there are just so many nicer ways of doing this-- like dials that fill in. There we go, OK. Here's my interface; I worked very hard on this. I am now going to play a tone, and I'm going to try to tune to it. And when you get there-- I'm going to let myself be within a few frequency values-- I'm going to say fill(0, 255, 0). Auto-refresh was a terrible idea. OK, ready? Let's tune this ukulele. (TUNING) Five is way too much; let's have the threshold be one. (TUNING) There's my ukulele tuner. You can see that is not the right note, so many things still need to be thought through. First of all, that does not sound right; this tuner is going to be a much better way of tuning it. (TUNING) Oh, it's actually not so bad-- it's better than I thought. Now, really quickly, what I want to do is allow myself, in this one sketch, to tune all four of the strings. And I really should stop this video right now and not go any further, but I would like to do this. There's so much that needs to be [MUSIC PLAYING] And you will do that. You will also make a version of this with an interface that looks like an actual, thoughtful tuner. Let's make an array of the notes. I'm going to create a bunch of objects, and each object is going to have the note and the frequency-- the note, for example, is 'A' and the frequency is 440. I don't know why I did all that work when I just want to do this. So there are four strings on the ukulele: A, E, C, and G. Let's look up those frequencies. Now, the thing I want to do is find out what I'm tuning against. First of all, this is horrific-- I cannot bear this code that I have written-- so let's at least make it a little bit better. Let's make a variable called threshold. Again, I should do a mapping or whatever, but let's make that threshold one, and I'll at least put it there so we know. And again, it's ridiculous that I have these if statements in two places, but that'll be fine. The first thing I need to do is actually figure out which note I'm trying to compare against, and I want to do that automatically. I don't know which string I'm playing-- unless I did some kind of crazy computer vision thing-- so I'm just going to find the note the frequency is closest to and tune against that. So I'm going to loop through all of the notes. The closest note is an index, which I'll start at -1, and the record difference starts at Infinity. Then the difference is the actual frequency minus notes[i].frequency. I'm sure I could do some kind of fancy higher-order array function, but let me just do it this way: if the difference is less than the record difference, then the closest note is i. And once I've done that, I have the closest note. And let me just keep that difference-- oh, I have it in the record difference, so difference = recordDiff. And it's not less than-- it's the absolute value.
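Here's a sketch of the notes array and the closest-note search, with the absolute-value fixes from the debugging below already applied; the frequencies are the standard values for these notes, looked up from a pitch table:

```javascript
// The four ukulele strings, each as a { note, frequency } object.
const notes = [
  { note: 'A', frequency: 440.0 },
  { note: 'E', frequency: 329.63 },
  { note: 'C', frequency: 261.63 },
  { note: 'G', frequency: 392.0 },
];

// Find the note closest to freq; return it with the *signed* difference,
// so drawing can still tell sharp from flat.
function closestNote(freq) {
  let recordDiff = Infinity;
  let closest = null;
  for (let i = 0; i < notes.length; i++) {
    const diff = freq - notes[i].frequency;
    if (abs(diff) < abs(recordDiff)) { // compare absolute values
      closest = notes[i]; // save the object rather than the index
      recordDiff = diff;  // the forgotten assignment hunted down below
    }
  }
  return { note: closest, difference: recordDiff };
}
```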
Sorry, got to have that absolute value in there-- always got to have that absolute value in there. And then the actual note itself-- can I use this variable name?-- is notes[closestNote]. I mean, I don't have to save the index; I could have saved the object, notes[i], instead. So let me save the object; I don't need the index. And that's the closest note. And then what I'm doing is showing the value, and I also want to put the note on the screen-- where do I actually draw that difference? Oh, the text is up here. So I'm going to make this much smaller, and then right here I'm going to say textSize(64) and draw closestNote.note, placing the note 150 pixels up and the frequency 50 pixels up. Let's see what that does. Great. So now, I should be able to tune all of the strings, and it should detect the string here. (TUNING) It doesn't work. So why is it not giving me any information? We're going to have to debug this. Oh, you know what I didn't do? I'm missing a super obvious thing: I forgot to set the record difference equal to that difference. Well, duh. OK, here we go. So my A is pretty tuned. (TUNING) Ah, I'm the worst. Let's put the absolute value here-- and then I won't make this mistake-- this has to be the absolute value as well. That's definitely a problem. All right, here we go, everybody. I think this is good now. (TUNING) That's A. I need the negative difference-- oh, I need it here. OK, everybody, everything's going to be fine. So when I'm getting the record, I want the smallest one, but when I use it down here, whether it's negative or not is kind of important, right? No, I'm using the absolute value everywhere-- oh, except for drawing where it is. How about this? How about that? Huh? How about that? If the absolute value of the difference is less than the absolute value of the record difference, then update the record difference-- but keep the signed value for use down here (the corrected loop is sketched after this section). (TUNING) I would like to be able to see more movement. So where is this divide by 10? Give me a break. Divide yourself by two, people. (TUNING) Whoops, wrong string. (TUNING) This has been my first coding challenge in the new Coding Train studio, here at New York University in Brooklyn. I made a ukulele tuner, and it has the worst-ever interface for a ukulele tuner, but I do think there are some nuggets in here. It's nice to see how that pitch detection model works. I would love it if you made your own version of this. You can go to thecodingtrain.com, find the page for this coding challenge, and look at the instructions to submit your own-- I actually have a video tutorial for how to do that. If you make your own ukulele tuner and you put it on the web, I will tune this ukulele with it in my next live stream. So thank you very much for watching, and I will see you in a future Coding Challenge. Goodbye. [MUSIC PLAYING]
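For completeness, here's a sketch of roughly where the draw loop ends up after all of those fixes, assuming the notes array, closestNote(), and freq from the earlier sketches:

```javascript
const threshold = 1; // within 1 Hz counts as in tune

function draw() {
  background(0);
  const result = closestNote(freq);

  // Green when close enough to call it in tune, white otherwise.
  if (abs(result.difference) < threshold) {
    fill(0, 255, 0);
  } else {
    fill(255);
  }
  // Divide by 2 so the bar's movement is visible but not wild;
  // the signed difference still points left (flat) or right (sharp).
  rect(width / 2, 50, result.difference / 2, 50);

  fill(255);
  textAlign(CENTER, CENTER);
  textSize(64);
  text(result.note.note, width / 2, height / 2 - 150); // target note name
  textSize(32);
  text(freq.toFixed(2), width / 2, height / 2 - 50);   // detected pitch
}
```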