  • (bell dinging)

  • - Hello, and welcome to another Beginner's Guide

  • to Machine Learning with ml5.js in JavaScript.

  • So I'm here.

  • It's been a while since I added a video to this playlist,

  • and a bunch of things

  • about the ml5 library itself have changed.

  • There's a new release, 0.3.1.

  • There is a brand new website,

  • which you can find right here at ml5js.org.

  • So to some extent, this video is really an update

  • about the library, but I'm also going to look

  • at one particular feature,

  • a new feature of the library, sound classification.

  • The machine learning model that I'm gonna use

  • in this video is the Speech Command Recognizer,

  • and this is a model available from Google

  • as part of TensorFlow.js models.

  • Now, so this is a really important distinction.

  • I am not here to train a sound classifier.

  • I might do that in a future video

  • and show you about how to apply transfer learning,

  • which is something I did with images, also to sounds.

  • I'm just gonna make use of a freely available,

  • pre-trained machine learning model.

  • Anytime you use one of those things,

  • even in just a playful and experimental way,

  • which is what I'm doing,

  • it's good to do a little bit of research

  • and take a look at like well, how was this trained,

  • what was the data, what are the considerations

  • around how the data was collected?

  • And so I encourage you to read through the README

  • here on GitHub and in particular,

  • to click over and read the original paper

  • about this speech commands model,

  • and there you'll see, if you look,

  • it talks about some of the datasets

  • like Mozilla's Common Voice dataset,

  • 500 hours from 20,000 different people,

  • this LibriSpeech, 1,000 hours of read English speech.

  • I don't know how to say this, TIDY DIGITS,

  • TIDIGITS, T DIGITS, 25,000 digit sequences,

  • which apparently was probably neat, right?

  • It's just like hours and hours of me reading

  • this random number book over and over again.

  • But so I encourage you to check out this paper,

  • and you can also find code for how to use this model

  • for TensorFlow.js in the tfjs-models GitHub repo itself.

  • I also want to interrupt this video for a second

  • to talk about how the sound classifier actually works.

  • This is kind of a surprising little tidbit,

  • and I'll come back to this more

  • if at some point I create a video

  • about training your own sound classifier.

  • Now, there are different ways you could do this.

  • This isn't the way to make a sound classifier,

  • but this is the way that this particular model works.

  • It's actually shockingly,

  • amazingly doing image classification.

  • So if you imagine we have this thing

  • that's called a convolutional neural network.

  • This is the underlying architecture,

  • the structure of that machine learning model

  • that does the classification.

  • Typically this kind of model is something

  • that we would put images in.

  • Like we might have images of cats.

  • We might have an image of a turtle.

  • That's not really turtle, but whatever.

  • So the idea is that we're sending these images in

  • and getting back a label

  • and maybe a confidence score.

  • So it's the same idea.

  • The only thing is now we wanna send in audio

  • and get back a label like up

  • or one and a confidence score.

  • So how would we convert sound into an image?

  • Now, again, there are other neural network architectures

  • which could receive sound data

  • in maybe a more direct fashion,

  • but if you have ever looked at a graphic equalizer

  • or some type of sound visualization system,

  • I've made examples like this in p5,

  • you can draw something that's often referred

  • to as the spectrogram,

  • which is basically a graph of all the various amplitudes

  • of frequencies, the wave patterns of the sound itself.

  • So if we took a one second spectrogram

  • and made that into an image,

  • we could then send that image

  • into a convolutional neural network

  • saying that's the image that is produced

  • from the spectrogram of somebody saying the word, up.

  • So underneath the hood, this machine learning system,

  • even though it's designed to work with audio data,

  • it first takes that audio data,

  • converts it into an image

  • and then sends it through a very similar type

  • of neural network architecture

  • to standard image classification models.

  • And you can read more about that in that paper itself.
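
To make the "sound becomes an image" idea concrete, here is a minimal sketch of a spectrogram in plain JavaScript. It is not the preprocessing the Speech Commands model actually uses. The function names, the frame size, and the naive DFT (no windowing, no FFT) are assumptions chosen for readability.

```javascript
// Magnitude of each frequency bin for one frame, using a naive DFT.
// (Real systems use an FFT and a window function; this is for illustration.)
function frameMagnitudes(frame) {
  const n = frame.length;
  const bins = Math.floor(n / 2);
  const mags = new Array(bins);
  for (let k = 0; k < bins; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (2 * Math.PI * k * t) / n;
      re += frame[t] * Math.cos(angle);
      im -= frame[t] * Math.sin(angle);
    }
    mags[k] = Math.sqrt(re * re + im * im);
  }
  return mags;
}

// Slice the signal into frames; each frame becomes one column of the "image".
function spectrogram(samples, frameSize, hop) {
  const columns = [];
  for (let start = 0; start + frameSize <= samples.length; start += hop) {
    columns.push(frameMagnitudes(samples.slice(start, start + frameSize)));
  }
  return columns;
}

// Example input: a pure tone that completes 4 cycles per 32-sample frame.
const tone = Array.from({ length: 128 }, (_, t) =>
  Math.sin((2 * Math.PI * 4 * t) / 32)
);
const image = spectrogram(tone, 32, 32);
```

Each entry of `image` is one column of pixel intensities (time runs across columns, frequency runs down each column), which is the kind of 2D grid a convolutional network can take as input.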

  • However, I'm gonna show you how to access this model

  • in a quick way with the ml5 library.

  • And this is new as of today, which is, I dunno.

  • What's today's date?

  • June 13th, 2019 (laughing).

  • I'm gonna show you how to use this with the ml5 library

  • as it stands today.

  • So I'm gonna click here under reference.

  • One thing you should see: a lot of new features

  • have been added to the ml5 library.

  • I'm gonna come back and do videos about more of those,

  • but the one I wanna highlight is sound classifier.

  • So I'm gonna click on this,

  • and for all of the different functions available in ml5,

  • you'll find a documentation page

  • with some narrative documentation,

  • a little bit of a code snippet

  • and then some written documentation

  • about what the function names are

  • and the various parameters and things like that.

  • And by the way, I'm noticing now (laughing).

  • This will hopefully not read.

  • This is like a mistake (laughing).

  • This is documentation that's actually

  • for either Body-Pix or maybe the U-Net model,

  • which does something called image segmentation.

  • So we gotta get that fixed.

  • I'm sure many GitHub issues and fixes

  • will be out and done by the time you see this.

  • So in case you've forgotten how to use the ml5 library,

  • I'm just gonna show you as it's documented

  • on the ml5 webpage.

  • So first of all, you can go here to this Quickstart.

  • You can actually just click on this

  • open p5 web editor sketch with ml5js added.

  • You know what, I'm gonna do that.

  • That's the way I'm gonna do it.

  • But you also could just put a script tag in your HTML page

  • referencing the current version of the library,

  • which, as I said, is 0.3.1 as of today,

  • but probably while you're watching it,

  • it will be a higher number.

  • So lemme go and just open up this link here,

  • and now I'm in the p5 web editor.

  • You could see the name of the sketch is ml5js boilerplate.

  • Thank you, Joey Lee who's a contributor to ml5.

  • He's done a ton of work on the website

  • and all of the different features.

  • And oh, this should actually be 0.3.1.

  • I'm gonna fix that, uh-huh.

  • I'm gonna hit save, and then I'm gonna rename it

  • to sound classifier.

  • And I am going to then go over here

  • and go to sketch.js,

  • and then I'm gonna run this,

  • and we should see.

  • There we go.

  • So now we know it's working

  • because there's a little console log

  • to log ml5.version.

  • If I hadn't imported the ml5 library,

  • I wouldn't see that, and we see that here.

  • So, what are we gonna do?

  • Let's load the sound classifier.

  • Now, most of the models, I haven't been using this

  • in my previous videos,

  • most of the models in ml5 are now actually available to you

  • in preload, meaning you don't need a callback function.

  • You can just load the model in preload,

  • and it'll be ready by the time you get to setup.

  • So I'm gonna make a variable called soundClassifier.

  • In preload, I'm gonna say soundClassifier

  • equals ml5.soundClassifier.

  • Now, I need to tell it

  • what model I want to load.

  • So I need to, in here, put the name

  • of the model I wanna load,

  • and in theory, in the future,

  • there might be a bunch of different options,

  • different kinds of sound classifiers

  • or maybe a sound classifier you've trained yourself

  • that you wanna put in there,

  • and I'll come back eventually,

  • show you videos about how to do that.

  • But for right now, I'm just gonna say

  • SpeechCommands,

  • and then I already forgot what it was called.

  • So I'm gonna go back to the ml5 website, which is here.

  • I'm gonna go to reference.

  • I'm gonna go to soundClassifier,

  • and I'm looking for it here.

  • So it's SpeechCommands18w.

  • This is a particular model

  • that's been trained on 18 specific words,

  • and you can see what those are.

  • The 10 digits from zero to nine,

  • up, down, left, right, go, stop, yes, no, that's 18.

  • 10 digits, eight different words.
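
Loading the model in preload, as described above, might look like the sketch below. It assumes a p5.js sketch with the ml5 library already included on the page. The variable name `soundClassifier` follows the video.

```javascript
// Holds the classifier so setup() and any callbacks can reach it.
let soundClassifier;

// p5.js runs preload() before setup(), so the model is ready by then
// and no "model loaded" callback is needed.
function preload() {
  // 'SpeechCommands18w' is the model name listed in the ml5 reference.
  soundClassifier = ml5.soundClassifier('SpeechCommands18w');
}
```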

  • All right, so now I'm gonna go,

  • so it was 18w,

  • and then, once that model is loaded,

  • I need a callback.

  • So I could just say soundClassifier.classify,

  • and I might just call it gotResults.

  • So in other words, I'm.

  • Oh, it's not defined, right?

  • So I'm telling the sound classifier to classify.

  • Now, by default, it's just going to listen

  • to the microphone's audio.

  • Maybe in the future, part of ml5 will be able to offer

  • hooks to connect it

  • to a different audio source,

  • but it's basically just gonna work

  • with the microphone's audio.

  • Then I can write a function called gotResults,

  • and I'm gonna get rid of the draw loop

  • 'cause I don't need that right now.

  • Lemme just turn off auto refresh

  • so that it doesn't keep refreshing.

  • And then now, if you remember,

  • ml5 employs error first callbacks,

  • meaning the callback function requires two arguments,

  • an error argument in case something went wrong,

  • and a data or results or some other argument

  • where the actual stuff is.

  • So I'm gonna say error,

  • and then I'm gonna say results.

  • And then I could do a little like basic error handling.

  • I'm just gonna say console.log

  • something went wrong,

  • and then I can also actually log the error, all right.

  • And then, so now,

  • and then I'm gonna say console.log(results).
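
Put together, the classify call and the error-first callback might look like this sketch. It assumes p5.js and ml5 are loaded in the page. The early `return` after logging the error is an addition for safety and is not something said in the video.

```javascript
let soundClassifier;

function preload() {
  soundClassifier = ml5.soundClassifier('SpeechCommands18w');
}

function setup() {
  // Start listening to the microphone; gotResults is called
  // whenever the classifier reports something.
  soundClassifier.classify(gotResults);
}

// Error-first callback: the error comes first, the data second.
function gotResults(error, results) {
  if (error) {
    console.log('something went wrong');
    console.error(error);
    return; // added here so we never touch results after a failure
  }
  console.log(results);
}
```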

  • So let's see if we get anything.

  • Oh, I have to run it again.

  • And you could ignore this error.

  • Oh, (gasping) something came in!

  • Ready?

  • Up.

  • I just wanna stop and mention

  • that if you're following this along,

  • hopefully your browser is asking for permission

  • to use the microphone.

  • The reason why that didn't happen here in this video

  • is because I've already set my browser

  • to allow use of the microphone on the p5 Web Editor pages,

  • but for security, you can't just access anybody's microphone

  • from a webpage without the user giving permission.

  • So hopefully you saw that happen,

  • and if you didn't,

  • that might be why you run into an error

  • if you haven't given that permission.

  • This is getting a little hard to debug

  • just because so much stuff is happening here

  • on the console and these huge arrays,

  • but there's actually something that I missed

  • that I could add here, which is an options variable.

  • So one of the things I could tell,

  • there's a lot of things I can set as properties

  • or parameters for how the sound classifier should work,

  • but there's a very simple one,

  • which I'm just gonna look up in the documentation

  • 'cause I don't remember.

  • It's called the probabilityThreshold.

  • I'm actually just gonna copy-paste this here.

  • What this means is basically

  • the sound classifier is going to trigger an event.

  • Right now I'm console logging all of this information

  • about what it thinks it heard

  • based on a confidence level for how sure it is

  • it heard one of those keywords.

  • And right now, a lot of those events are triggering

  • because I don't know

  • what the default probability threshold is.

  • Maybe it was .7.

  • Maybe it's .5, but I'm gonna make that really high.

  • I'm gonna say .95.

  • So it has to have a 95%,

  • the machine learning model has to calculate a 95%

  • confidence score before it

  • gives the event back to me in ml5.

  • Once I've created that options variable with .95,

  • I need to pass it into the constructor

  • as the second argument.

  • So now we pass it in there.
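
As a sketch, the options object and the second argument might look like this. It assumes p5.js and ml5 are loaded in the page. The 0.95 value is the one chosen in the video, not a recommended default.

```javascript
let soundClassifier;

// Only report a word when the model's confidence score is at least 95%.
const options = { probabilityThreshold: 0.95 };

function preload() {
  // The options object goes in as the second argument.
  soundClassifier = ml5.soundClassifier('SpeechCommands18w', options);
}
```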

  • I'm gonna run the sketch.

  • I'm gonna say the keyword up,

  • and then I'm gonna try to look into the console

  • to see if that's what came in.

  • Up.

  • And there we go.

  • Look at that!

  • Now other stuff is coming in, but you saw it there!

  • So rather than kind of debug with the console,

  • let me actually put what I said

  • onto the webpage itself.

  • Also, to make this easier to see,

  • let me actually console.log results[0].label

  • and results[0].confidence;

  • I believe it's called confidence.

  • So rather than have this big array logging in the console,

  • let me do this.
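
Logging only the first result could be sketched like this. It assumes, as the video does, that `results` is an array whose first entry has `label` and `confidence` properties.

```javascript
function gotResults(error, results) {
  if (error) {
    console.error(error);
    return;
  }
  // Log just the first result instead of the whole array.
  console.log(results[0].label, results[0].confidence);
}
```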

  • All right, we need to have a 95% confidence,

  • and I'm gonna run this.

  • Up.

  • Three, four, five,

  • six, seven, eight.

  • I'm quickly adding background color white

  • to the HTML body

  • because what I wanna then do, just quickly,

  • before I finish this off, but to finish this off,

  • let me just add a DOM element using the p5 DOM library.

  • I'm gonna just say resultP for results paragraph.

  • I'm gonna say resultP equals createP

  • waiting, and then, right now, I'm gonna say

  • resultP.html.

  • Then I could turn these results into a string

  • by using a string literal.

  • So back tick and then put curly brackets.

  • Put a colon here and curly brackets

  • and a closed back tick, okay.

  • And let me also say resultP.style,

  • is it font size, font-size,

  • just 32 point so we'll be able to see it.
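
The whole sketch at this point might look roughly like the following. It assumes p5.js (with its DOM functions such as `createP`) and ml5 are loaded in the page. The exact placeholder text and the `32pt` unit string are guesses at what was typed on screen.

```javascript
let soundClassifier;
let resultP;

function preload() {
  const options = { probabilityThreshold: 0.95 };
  soundClassifier = ml5.soundClassifier('SpeechCommands18w', options);
}

function setup() {
  // A paragraph element on the page to show the latest result.
  resultP = createP('waiting...');
  resultP.style('font-size', '32pt');
  soundClassifier.classify(gotResults);
}

function gotResults(error, results) {
  if (error) {
    console.log('something went wrong');
    console.error(error);
    return;
  }
  // Template literal: back ticks with ${...} placeholders.
  resultP.html(`${results[0].label} : ${results[0].confidence}`);
}
```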

  • All right, here we go.

  • Ready for this?

  • One, two, five,

  • up, down, left, right.

  • Okay, so (clapping),

  • you could imagine now what you could do with this.

  • For example, you could control a game with your voice.

  • And in fact, I'm gonna do that

  • in one of my coding challenge videos.

  • So take a look in this video's description.

  • I'm gonna do a coding challenge where I program

  • the Google Dinosaur game,

  • and then I'm gonna add this sound classifier

  • to have the dinosaur jump

  • except it won't be a dinosaur.

  • It'll be a unicorn,

  • to have the unicorn jump when I say the keyword, up.
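
As a rough sketch of the voice-control idea, the callback can check the label and trigger an action. The `unicorn` object and its `jump` method below are hypothetical stand-ins, not code from the coding challenge.

```javascript
// Hypothetical stand-in for a game character; not from the video.
const unicorn = {
  y: 0,
  velocity: 0,
  jump() {
    // Only jump when standing on the ground.
    if (this.y === 0) {
      this.velocity = 10;
    }
  }
};

// Same error-first callback shape as before, but acting on the label.
function gotCommand(error, results) {
  if (error) {
    console.error(error);
    return;
  }
  if (results[0].label === 'up') {
    unicorn.jump();
  }
}
```

You would pass `gotCommand` to `soundClassifier.classify` in place of a callback that only logs.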

  • All right, thanks for watching

  • this additional ml5 tutorial video

  • about sound classification in the browser.

  • (bell dinging)

  • (energetic dance music)

  • (bell dinging)
