
  • [TRAIN WHISTLE]

  • Hello, and welcome to another Teachable Machine

  • video tutorial.

  • This time, I'm going to look at sound classification.

  • So I have to say goodbye to my friend the unicorn

  • but hello, again, to the ukulele, hello to the bell,

  • and the train whistle.

  • I am going to look at how I can use

  • Teachable Machine to train a model

  • to recognize different sounds.

  • In the previous video, I looked at image classification.

  • So you can see here, this example

  • is recognizing the guitar.

  • And now I want to make this exact same example but have

  • it show the guitar when I play the ukulele,

  • and have it show the train whistle when I play the train whistle.

  • To do this, instead of starting with an image project,

  • I'm going to start with an audio project.

  • Now, you might remember, in my previous video

  • where I made an image classifier, I talked

  • about the process of transfer learning,

  • how Teachable Machine starts with a base model,

  • in this case, MobileNet, which knows how to classify images

  • into 1,000 categories, removes all those categories,

  • replaces them with those of our own--

  • unicorns, train whistles, rainbows, and a ukulele--

  • and then retrains the model with new images.

  • The sound classifier is going to do exactly the same thing.

  • This time, however, the base model

  • is something called Speech Commands 18w.

  • The Speech Commands model is pre-trained to recognize 18

  • words in English that a person might say: the digits 0

  • through 9 (that's 10); up, down, left, and right (up to 14);

  • and stop, go, yes, and no (that's 18).

  • It also has categories for unknown words and background

  • noise, so there are really 20 categories, but it's called 18w for those 18 words.
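The arithmetic behind the name can be written out as a short JavaScript listing. The label strings are illustrative assumptions, including the names of the two non-word categories. They are not necessarily the exact strings the real model uses.

```javascript
// Illustrative reconstruction of the Speech Commands "18w" vocabulary.
// The exact label strings are assumptions, not taken from the real model.
const DIGITS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]; // 10 words
const DIRECTIONS = ["up", "down", "left", "right"];        // 14 so far
const COMMANDS = ["stop", "go", "yes", "no"];              // 18 words
const NON_WORDS = ["_unknown_", "_background_noise_"];     // assumed names

const WORDS = [...DIGITS, ...DIRECTIONS, ...COMMANDS];
const ALL_OUTPUTS = [...WORDS, ...NON_WORDS];

console.log(WORDS.length);       // 18, hence "18w"
console.log(ALL_OUTPUTS.length); // 20 output categories in total
```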

  • So when I make an audio project, I'm going to use that model,

  • remove all of those labels, and put in my own,

  • trained on my own sounds instead of images this time.

  • And my labels will be train whistle, bell, and ukulele.
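Below is a toy JavaScript sketch of the transfer-learning idea. It is not Teachable Machine's code, and it rests on three assumptions:

- `embed` is a hypothetical stand-in for the frozen base model with its original output layer removed.
- The training examples are made-up 2D feature vectors.
- A simple nearest-centroid classifier takes the place of the small neural-network layer that a real tool would typically retrain.

```javascript
// Hypothetical stand-in for the frozen base model (its old output layer removed).
// It is never retrained; here examples are assumed to already be feature vectors.
function embed(example) {
  return example;
}

// The new "head": one centroid (average feature vector) per custom label.
// Nearest-centroid is a swapped-in stand-in for a trainable dense layer.
// Assumes every label has at least one example.
function trainHead(examplesByLabel) {
  const head = {};
  for (const [label, examples] of Object.entries(examplesByLabel)) {
    const feats = examples.map(embed);
    const centroid = new Array(feats[0].length).fill(0);
    for (const f of feats) {
      for (let i = 0; i < f.length; i++) centroid[i] += f[i] / feats.length;
    }
    head[label] = centroid;
  }
  return head;
}

function squaredDistance(a, b) {
  return a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0);
}

// Predict the label whose centroid is closest to the new example's features.
function classify(head, example) {
  const f = embed(example);
  let best = null;
  let bestDist = Infinity;
  for (const [label, centroid] of Object.entries(head)) {
    const d = squaredDistance(f, centroid);
    if (d < bestDist) {
      bestDist = d;
      best = label;
    }
  }
  return best;
}

// Our own labels replace the base model's original categories.
const head = trainHead({
  "Background Noise": [[0, 0], [0.1, 0]],
  "train whistle": [[5, 1], [6, 1]],
  "bell": [[1, 5], [1, 6]],
  "ukulele": [[5, 5], [6, 6]],
});
console.log(classify(head, [5.5, 0.8])); // train whistle
```

Only the head is fit to the new labels. The base model's knowledge is reused unchanged, which is why a handful of examples per class can be enough.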

  • So let's get started and make an audio project.

  • The very first thing you need to do

  • when making an audio classifier in Teachable Machine

  • is record some samples or audio examples of background noise.

  • The model will need this during the training

  • process to have something to compare the other classes to.

  • So let's do that.

  • I'm going to attempt to be very quiet, which

  • is quite hard for me, and record background samples.

  • Add Samples, Use Mic, and then I'm

  • going to record a 20-second sample

  • and attempt to meditate during that time.

  • [MUSIC PLAYING]

  • All right.

  • I've got the 20 seconds of background noise.

  • Now, the model doesn't take

  • 20 seconds of audio as its input.

  • Any given example presented to the model

  • during the training process has to be 1 second of audio.

  • But I just recorded 20 seconds, so Teachable Machine

  • requires one additional step to prepare that training data.

  • And that is through this button, Extract Sample.

  • So if I click the Extract Sample button,

  • it's going to take that 20 seconds of audio

  • and convert it into 20 samples.
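Here is a sketch of what an extraction step like this does. It assumes the recording is simply cut into back-to-back, non-overlapping 1-second clips. The 16,000 Hz sample rate is also an assumption, chosen for round numbers; the video doesn't state the rate Teachable Machine records at.

```javascript
// Cut one long recording into non-overlapping 1-second clips.
// Only whole seconds become clips; a trailing partial second is dropped.
function extractOneSecondClips(samples, sampleRate) {
  const clips = [];
  for (let start = 0; start + sampleRate <= samples.length; start += sampleRate) {
    clips.push(samples.slice(start, start + sampleRate));
  }
  return clips;
}

const SAMPLE_RATE = 16000;                            // assumed, for illustration
const recording = new Float32Array(20 * SAMPLE_RATE); // 20 seconds of (silent) audio
const clips = extractOneSecondClips(recording, SAMPLE_RATE);
console.log(clips.length);    // 20
console.log(clips[0].length); // 16000
```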

  • Let's just record 20 more seconds of background audio

  • just so I have 40 samples because I

  • want to have a little bit more than the minimum required.

  • [MUSIC PLAYING]

  • Now I can start adding some of my own audio.

  • Let's begin with the train whistle.

  • So I'm going to change class 2 to train.

  • I'm going to click Use Mic.

  • Now you might notice that my browser is not

  • asking me for any permission to use the mic.

  • Most likely, you will see a permission prompt.

  • I just have mine already set to allow it.

  • So Use Mic and record two seconds of samples.

  • [TRAIN WHISTLE]

  • Once again, I need to extract those, and I have two samples.

  • Now I need a minimum of eight,

  • so let me add a whole bunch.

  • And I'll speed through this for you.

  • [TRAIN WHISTLE]

  • And I've got 16 audio samples of train whistle sounds.
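A hypothetical pre-training check can capture this rule of thumb. The minimum of eight samples per class comes from the video. The background-noise minimum of 20 and the class name `Background Noise` are assumptions, not confirmed Teachable Machine settings.

```javascript
// Hypothetical check run before training.
const MIN_SAMPLES = 8;               // per class, as stated in the video
const MIN_BACKGROUND_SAMPLES = 20;   // assumption
const BACKGROUND_LABEL = "Background Noise"; // assumed class name

// Returns a message for every class that doesn't have enough samples yet.
function classesNeedingSamples(countsByLabel) {
  const problems = [];
  for (const [label, count] of Object.entries(countsByLabel)) {
    const required = label === BACKGROUND_LABEL ? MIN_BACKGROUND_SAMPLES : MIN_SAMPLES;
    if (count < required) {
      problems.push(`${label}: has ${count}, needs ${required}`);
    }
  }
  return problems;
}

console.log(classesNeedingSamples({ "Background Noise": 40, train: 2 }));  // [ 'train: has 2, needs 8' ]
console.log(classesNeedingSamples({ "Background Noise": 40, train: 16 })); // []
```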

  • Now we move on and try the bell.

  • Add a class, bell, and Add Samples.

  • [BELL RINGING]

  • Two samples.

  • Here's the thing.

  • I know that I'm going to need--

  • maybe I want to have 16 bell samples.

  • You know what?

  • I'm going to just record for 16 seconds.

  • [BELL RINGING]

  • In some cases, I might want to consider when I'm actually

  • hitting the bell in relationship to how it's chopping it

  • up into 1-second increments.

  • But let's just see if it works even

  • without me being thoughtful about this.

  • I'm just ringing this bell and recording 16 seconds of it

  • and letting Teachable Machine do its extraction however

  • it's going to do it.
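To be more thoughtful about the chopping, one option is to check where each bell strike lands relative to the 1-second boundaries. This hypothetical helper assumes clips start at 0 s, 1 s, 2 s, and so on. It flags strikes in the last 0.2 s of a clip (an arbitrary margin), since most of their ringing would spill into the next clip.

```javascript
// For each strike time (seconds), report which 1-second clip it lands in and
// whether it falls in the last `margin` seconds of that clip.
// Assumes clips start at whole seconds; the default margin is arbitrary.
function strikePlacement(strikeTimes, margin = 0.2) {
  return strikeTimes.map((time) => {
    const clipIndex = Math.floor(time);
    const offset = time - clipIndex; // position within the clip, 0 <= offset < 1
    return { time, clipIndex, spillsIntoNextClip: offset > 1 - margin };
  });
}

console.log(strikePlacement([0.5, 1.95, 3.1]));
```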

  • And there we go, 18 samples.

  • I'm well above the minimum.

  • I can move on to the ukulele.

  • Another feature under Settings is a delay.

  • And I want to give myself two seconds from when

  • I hit that Record button to when it starts recording,

  • so I have a moment to get set up with the ukulele.

  • Save Settings, record 16 seconds, 2, 1.

  • [UKULELE PLAYING]

  • Now I can extract those, and I've

  • got background noise, train whistle, the bell,

  • and the ukulele.

  • And I'm ready to train the model.

  • Before I start the training process,

  • let me address something.

  • What are these images?

  • And wait a second.

  • Is this actually an image classifier?

  • Because this kind of looks like what we had before.

  • Only the images aren't things from the camera.

  • They are these pictures that somehow

  • appear based on the sound.

  • And in a way, this is kind of true,

  • because these are visualizations

  • of the audio signal, specifically the spectrogram:

  • what are the amplitudes of the different frequencies

  • in the sound?

  • Is it a very high-pitched sound?

  • Is it a very low-pitched sound?

  • So that's the actual data.

  • That spectrogram of 1 second of audio

  • is what is being sent into the machine learning model itself.
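To see what a spectrogram computes, here is a minimal version built from a naive discrete Fourier transform. It splits the signal into short frames and measures how strongly each frequency is present in each frame. The frame and hop sizes are illustrative. A real implementation would use an FFT, a window function, and usually log scaling, so this is not necessarily the exact preprocessing the speech commands model uses.

```javascript
// Magnitude of each non-negative frequency bin (0..n/2) of one frame,
// via a naive O(n^2) discrete Fourier transform.
function magnitudeSpectrum(frame) {
  const n = frame.length;
  const mags = [];
  for (let k = 0; k <= n / 2; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re += frame[t] * Math.cos(angle);
      im += frame[t] * Math.sin(angle);
    }
    mags.push(Math.hypot(re, im));
  }
  return mags;
}

// Slide a frame along the signal; result is indexed as [time][frequencyBin].
function spectrogram(signal, frameSize, hopSize) {
  const frames = [];
  for (let start = 0; start + frameSize <= signal.length; start += hopSize) {
    frames.push(magnitudeSpectrum(signal.slice(start, start + frameSize)));
  }
  return frames;
}

// A pure tone with 8 cycles per 64-sample frame lands exactly in bin 8.
const frameSize = 64;
const tone = Array.from({ length: 256 }, (_, i) => Math.sin((2 * Math.PI * 8 * i) / frameSize));
const spec = spectrogram(tone, frameSize, 32);
const loudestBin = spec[0].indexOf(Math.max(...spec[0]));
console.log(spec.length, loudestBin); // 7 8
```

A high-pitched sound puts its energy in high bins, and a low-pitched one in low bins. Stacking the frames side by side gives the image-like picture shown in the interface.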

  • Let's train the model.

  • (SINGING) Don't switch the tabs.

  • Don't switch the tabs.

  • [INAUDIBLE]

  • [BELL RINGS]

  • Bell.

  • [TRAIN WHISTLE]

  • Train whistle.

  • It works.

  • And now we can take this model and follow the same steps we

  • did with the image classifier.

  • Step 1, export the model.

  • I want to upload it.

  • I can copy that URL.

  • Click.

  • Switch over to my p5.js sketch.

  • I have my code example from the previous video,

  • which was trained to recognize a train whistle or a rainbow

  • image (train, rainbow, train, rainbow),

  • and I can switch it from an image

  • classifier to a sound classifier.

  • I can change the model URL to my new model URL.

  • I don't need the video anymore, so I can delete that.

  • I'm going to change this to Classify Audio.

  • Unlike with the image classifier,

  • the audio classifier doesn't need

  • you to specifically say which audio source you want to link it to.

  • It's going to default to the microphone.

  • So I can remove this video here.

  • Keep this as gotResults.

  • There's no video to draw.

  • The categories, instead of train, rainbow, unicorn,

  • and ukulele here are, well, train, then I've got bell,

  • no unicorn-- it's so sad--

  • and ukulele.

  • And in the audio case as well, something that's different

  • is, instead of having to explicitly say now go ahead

  • and classify the video again, the audio engine is going

  • to just continue listening.

  • So I can get rid of this classifyVideo function.

  • And I can run this sketch.

  • A train already?

  • Wait, wait, wait, wait, wait.

  • I don't want to make the same mistake I

  • made in the first video.

  • Let's consider what it should display

  • if it doesn't hear anything.

  • I'll just use headphones, so let me put some headphones in.

  • Then, I'm going to actually say if the label is train,

  • put in the train emoji.
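The display logic can be pulled out into a plain function. This sketch makes three assumptions:

- The classifier reports results as an array of `{ label, confidence }` objects sorted from most to least confident.
- The class names are exactly `train`, `bell`, and `ukulele`.
- `emojiFor` and `EMOJI_BY_LABEL` are hypothetical names, not ml5 APIs.

```javascript
// Hypothetical label-to-emoji table; keys must match the class names exactly.
const EMOJI_BY_LABEL = {
  train: "🚂",
  bell: "🔔",
  ukulele: "🎸", // the guitar stands in for the ukulele, as in the video
};

// Assumes results are [{ label, confidence }, ...] sorted by confidence, highest first.
function emojiFor(results) {
  if (results.length === 0) return "";
  const topLabel = results[0].label;
  // Labels with no explicit emoji (e.g. background noise) show nothing.
  return EMOJI_BY_LABEL[topLabel] ?? "";
}

console.log(emojiFor([{ label: "bell", confidence: 0.91 }, { label: "train", confidence: 0.05 }])); // 🔔
console.log(JSON.stringify(emojiFor([{ label: "Background Noise", confidence: 0.97 }])));        // ""
```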

  • Now I'm going to start the sketch,

  • and I'm going to attempt to be very quiet while I do so.

  • [BELL RINGS]

  • [UKULELE PLAYING]

  • Ukulele.

  • [TRAIN WHISTLE]

  • [BELL RINGS]

  • [UKULELE PLAYING]

  • That works.

  • Oh, that's so exciting.

  • Interestingly, I wonder what it thinks it is if I'm talking.

  • [BELL RINGS]

  • Hello.

  • This is me talking.

  • I was probably saying things while I was recording

  • those ukulele sounds, or ukulele is just the closest match,

  • because me talking is not background noise.

  • So I could have put another category of just me talking

  • or specific words, oh, so many possibilities.

  • Would it even work to train the model on different chords

  • of the ukulele?

  • My suspicion is that's not going to work particularly

  • well, because the sound of those chords,

  • particularly if they share some of the notes

  • (some of the same frequencies), is going to be quite similar.

  • And the base model was trained on human speech.
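The concern about shared notes is easy to check for the four chords used in this video: C, G7, F, and A. This sketch writes each chord with its standard spelling as pitch classes (C = 0 through B = 11), then lists the notes each pair has in common. It only compares fundamentals; real chord spectra also overlap through overtones.

```javascript
// Chord tones as pitch classes (C = 0, C# = 1, ..., B = 11), standard spellings.
const CHORDS = {
  C: [0, 4, 7],      // C E G
  G7: [7, 11, 2, 5], // G B D F
  F: [5, 9, 0],      // F A C
  A: [9, 1, 4],      // A C# E
};

// Pitch classes that two chords have in common.
function sharedNotes(a, b) {
  return CHORDS[a].filter((pc) => CHORDS[b].includes(pc));
}

const names = Object.keys(CHORDS);
for (let i = 0; i < names.length; i++) {
  for (let j = i + 1; j < names.length; j++) {
    console.log(`${names[i]} & ${names[j]}:`, sharedNotes(names[i], names[j]));
  }
}
```

Five of the six pairs share at least one note; only G7 and A have none.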

  • But let's give it a try, and maybe we

  • can control the snake game with different ukulele chords.

  • So I actually just went and trained a model

  • with this idea, with four chords:

  • C, G7, F, and A. And you can see it mostly works,

  • or it kind of works.

  • [UKULELE PLAYING]

  • Give me a C. But it's really not giving me

  • the clearly distinct, high-confidence

  • scores that I want.

  • Maybe if I try individual notes, it'll work better.

  • The first note on the ukulele is A, the A string.

  • [UKULELE PLAYING]

  • E.

  • [UKULELE PLAYING]

  • C.

  • [UKULELE PLAYING]

  • And finally G.

  • [UKULELE PLAYING]

  • The images of the spectrogram look

  • kind of distinct and different to me,

  • so I view that as a good sign.
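The open strings are also well separated in pitch. This sketch assumes standard re-entrant ukulele tuning, since the video doesn't name the tuning; in that tuning the strings are G4, C4, E4, and A4. The equal-temperament formula f = 440 * 2^((m - 69) / 12), where m is the MIDI note number, gives their fundamental frequencies.

```javascript
// Equal-temperament frequency of MIDI note m, tuned to A4 = MIDI 69 = 440 Hz.
function frequency(midi) {
  return 440 * 2 ** ((midi - 69) / 12);
}

// Assumed standard re-entrant tuning, listed in the order played in the video.
const OPEN_STRINGS = { A4: 69, E4: 64, C4: 60, G4: 67 };

for (const [name, midi] of Object.entries(OPEN_STRINGS)) {
  console.log(name, frequency(midi).toFixed(2), "Hz");
}
// Prints A4 440.00 Hz, E4 329.63 Hz, C4 261.63 Hz, G4 392.00 Hz (one per line)
```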

  • Let's try training the model.

  • And let's see how it performs.

  • A. Yeah.

  • We got a big bump there in the A confidence score.

  • That's good.

  • Let's try G. Big bump there in the G confidence score.

  • Let's try C, E. So you can see this isn't perfect.

  • These sounds are maybe not as distinct to this model,

  • given the kinds of audio the pre-trained model

  • was trained on, but I'm getting something there.

  • Let's see how well I can control that snake game

  • with just these four notes.

  • If you remember from before--

  • left, right, up, down-- still working.

  • Go this way, left, left.

  • Get that food.

  • Let's try changing it to use the audio classifier.

  • I forgot to export the model.

  • Export, Upload, Copy.

  • [MUSIC PLAYING]

  • Now you need to decide which notes go with which movement.

  • A is left.

  • E is right.

  • C is down.

  • G is up.
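The mapping itself is just a lookup table. This minimal sketch makes three assumptions:

- The classes are named `A`, `E`, `C`, and `G`.
- The snake stores its direction as a grid step `{ dx, dy }`, with y growing downward as in p5.js.
- `DIRECTION_BY_LABEL` and `steer` are hypothetical names, not the game's actual code.

```javascript
// Hypothetical label-to-direction table (A = left, E = right, C = down, G = up).
const DIRECTION_BY_LABEL = {
  A: { dx: -1, dy: 0 }, // left
  E: { dx: 1, dy: 0 },  // right
  C: { dx: 0, dy: 1 },  // down
  G: { dx: 0, dy: -1 }, // up
};

// New direction for the snake; a label not in the table (such as background
// noise) leaves the current direction unchanged.
function steer(currentDirection, label) {
  return DIRECTION_BY_LABEL[label] ?? currentDirection;
}

let direction = { dx: 1, dy: 0 };
direction = steer(direction, "G");                // now up
direction = steer(direction, "Background Noise"); // still up
console.log(direction); // { dx: 0, dy: -1 }
```

Remapping which note does what only means editing the table.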

  • I made all the same exact changes just

  • to convert this from classifying a video to classifying

  • audio from the mic.

  • Let's see what happens.

  • For whatever reason, I typed audioClassifier

  • when it's actually soundClassifier.

  • And you can actually look at the ml5 website

  • to find all the documentation for the soundClassifier.

  • [UKULELE PLAYING]

  • I was so close.

  • I think I have a way of making this work better for my brain.

  • Up is A. Down is G.

  • So those are the outer strings.

  • Then right and left are the inner strings.

  • Up, up, right, left, down.

  • Here we go.

  • [UKULELE PLAYING]

  • Yay.

  • I got the food.

  • So that perhaps wasn't the best solution.

  • Maybe something I could have done

  • was work with those confidence scores in a more intentional

  • way, to make sure that low-confidence classifications

  • of the audio didn't disrupt the correct direction of the snake

  • that I had gotten in the first place.
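One way to use the confidence scores more intentionally is to ignore low-confidence results. It also helps to require the same label to win several updates in a row before acting on it. This is a hypothetical sketch: the 0.8 threshold and the streak length are arbitrary values to tune. The results are assumed to be `{ label, confidence }` objects sorted by confidence, highest first.

```javascript
// A label is "accepted" only after it has been the top result with at least
// `threshold` confidence for `framesRequired` consecutive updates.
// Low-confidence results reset the streak but keep the last accepted label,
// so the snake keeps its current direction.
function makeStableClassifier(threshold = 0.8, framesRequired = 3) {
  let candidate = null; // label currently building a streak
  let streak = 0;       // consecutive confident wins for `candidate`
  let accepted = null;  // last label acted on
  return function update(results) {
    const top = results[0];
    if (!top || top.confidence < threshold) {
      candidate = null;
      streak = 0;
      return accepted;
    }
    if (top.label === candidate) {
      streak += 1;
    } else {
      candidate = top.label;
      streak = 1;
    }
    if (streak >= framesRequired) accepted = candidate;
    return accepted;
  };
}

const update = makeStableClassifier(0.8, 2);
console.log(update([{ label: "up", confidence: 0.9 }]));   // null (only one win so far)
console.log(update([{ label: "up", confidence: 0.92 }]));  // up
console.log(update([{ label: "down", confidence: 0.5 }])); // up (too unsure to switch)
```

Raising the threshold or the streak length makes the snake steadier but slower to respond, so the two trade off against each other.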

  • I'm trying this one more time with my own speech

  • because I want to see if I can really get

  • finer control over that snake.

  • So right now, I've collected some data of me saying up,

  • down, and a meow sound.

  • And what I've been doing is

  • trying to time my saying of up, down,

  • and meow to line up with the 1-second intervals.

  • Let me show you what that looks like.

  • Up, up, down, down, meow, meow, meow, meow, meow.

  • Now I'm going to go and add whistling.

  • [WHISTLING]

  • Time to train this model.

  • Up, down, meow.

  • [WHISTLING]

  • So we can see how this is much more accurate than my attempts

  • with the ukulele chords and notes.

  • Export, Upload, Copy, Paste, change the labels:

  • up and down, then right will be meow, and left will be whistle.

  • Meow.

  • [WHISTLING]

  • Down.

  • Meow.

  • [WHISTLING]

  • Down.

  • Meow.

  • Oh, yes.

  • I got the food, and then I died.

  • But I don't care.

  • It worked.

  • But I think you can see from this example

  • that it works quite well.

  • And hopefully, this opens up a lot of possibilities for you.

  • If you make something, I've got a link

  • in this video's description to a page at thecodingtrain.com

  • where you can submit a URL to a project you've made.

  • And I would love to check it out

  • and share it on a future livestream.

  • I can't wait to see what kind of stuff people

  • make from my strange projects.

  • And I look forward to seeing you in a future Coding Train video.

  • Goodbye.

  • [TRAIN WHISTLE]

  • That was the first time that something

  • happened when I blew the train whistle

  • at the end of the video.

  • That is great.

  • [BELL RINGS]

  • [MUSIC PLAYING]

