
  • Hey, John-Green-bot.

  • I've been thinking really hard about a HUGE life decision.

  • I want to adopt a pet, and I've narrowed it down to either a cat or a dog.

  • But there are so many great cats and dogs on adoption websites.

  • John-Green-bot: The Grey Parrot (Psittacus erithacus) has an average lifespan in captivity

  • of 40 to 60 years.

  • Jabril: Yeah, birds are great and all but I was thinking maybe a cat or a dog.

  • John-Green-bot: Turtles will need a tank approximately 7.5 to 15 times their shell length in centimeters.

  • Jabril: Yeah, you're no help.

  • Come on Spot and Mr. Cuddles.

  • It looks like I'm going to have to figure this out myself, and by myself I mean

  • make an AI figure it out.

  • Today we're going to train an AI to go through the list of pets and make the best decision

  • for me based on data!

  • That'll make things less stressful. Surely, nothing will go wrong with this... right?

  • INTRO

  • Hey, I'm Jabril and welcome to Crash Course AI.

  • Today we're going to build a fairly simple AI program to find out if adopting a cat or

  • a dog will make me happier.

  • This is a pretty subjective question, and if I use data from the internet, I'll have

  • a lot of strong opinions.

  • So, I'll conduct my own survey where I collect data about people's cats and dogs and their

  • happiness.

  • I don't care what pet I get, as long as it makes me happy, so I won't even include cat

  • and dog labels in the model.

  • Like in previous labs, I'll be writing all of my code using a language called Python

  • in a tool called Google Colaboratory.

  • And as you watch this video, you can follow along with the code in your browser from the

  • link we put in the description.

  • In these Colaboratory files, there's some regular text explaining what I'm trying

  • to do, and pieces of code that you can run by pushing the play button.

  • These pieces of code build on each other, so keep in mind that you have to run them

  • in order from top to bottom, otherwise you might get an error.

  • To actually run the code or make changes to it, you'll have to either click “open

  • in playground” at the top of the page or open the File menu and click “Save a Copy

  • to Drive”.

  • And one last time, I'll give you this fyi: you'll need a Google account for this.

  • Creating this AI to help me decide between a cat and a dog should be pretty simple, so

  • there are only a couple of steps: First, I have to gather the data.

  • I have to decide on a few features that could predict if a cat or dog makes people happy.

  • Then, I'll make a survey that asks about these features, and go out in the world and

  • ask people if their pet fits these features and makes them happy.

  • It might be a little biased or imperfect, but I think it'll be juuust finnne to help

  • me make my decision.

  • Second, I have to build an AI model to predict if a specific pet makes people happy.

  • Because I'm not collecting a massive amount of data, it's helpful to use a small model

  • to prevent overfitting.

  • So I'll plan on using a neural network with just one hidden layer.

  • And for our final step, I can go through an adoption website of adorable cats and dogs,

  • put in their features, and let the AI decide which pet will make me happy.

  • No more stressing about this tough decision, the machines have my back!

  • Step 1.

  • Instead of importing a dataset this time, we've got to create our own!

  • So, browsing through some adoption websites, the most common features I saw represented

  • that are important to me are cuddly, soft, quiet (especially when I'm trying to sleep),

  • and energetic (because playing with an energetic pet might remind me to get up from my computer

  • a little more).

  • In the AI I'm programming, I'll use these four values to predict their answer to “does

  • your pet make you happy most of the time: yes or no?”

  • For the data collection part of this process, I gave this five-question survey of yes/no

  • questions to 30 people who own one cat or one dog.

  • I want to avoid bias based on the kind of pet, so I put everyone's answers into one

  • big list.

  • Every row is one person's response, and yes's are represented as 1 and no's as 0.

  • By representing the answers as numbers, I can use them directly as features in my model.

  • The first four questions are my input features and the last question about happiness is my

  • label.

  • And I'm not using cat or dog labels anywhere in my model.

  • I also have to split this dataset into the training set and the testing set.

  • The training set is used to train the neural network, and the testing set is kept hidden

  • from the neural network during training, so I can use it to check the network's accuracy later.
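
For anyone following along outside the notebook, here's a minimal sketch of how survey data like this could be laid out and split. The rows and the exact split are made up for illustration; the real responses live in the linked Colab notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Each row is one survey response:
# [cuddly, soft, quiet, energetic, happy] with 1 = yes and 0 = no.
responses = np.array([
    [1, 1, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
])  # hypothetical rows -- the real dataset has 30 of them

X = responses[:, :4]  # the first four questions are the input features
y = responses[:, 4]   # the last question (happiness) is the label

# Hold back some responses as a testing set the network never sees during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
```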

  • Step 2.

  • Now that I have a dataset, I need to build a neural network to help make predictions.

  • And if you did episode 5's Neural Network Lab (when I digitized John-Green-bot's handwriting),

  • this step will sound familiar because I'm using the same tools.

  • I'm going to use a multi-layer perceptron neural network or MLP.

  • As a refresher, this neural network has an input layer for features, some number of hidden

  • layers to learn representations, and a final output layer to make a prediction.

  • The hidden layers find relationships between the features that help it make accurate predictions.

  • Like in the Neural Networks Lab, we're going to import a library called SKLearn (which

  • is short for scikit-learn).

  • SKLearn includes a bunch of different machine learning algorithms, but I'll just be using

  • its Multi-Layer Perceptron algorithm.

  • You can easily change the number of hidden layers and other parts of the model, but I'll

  • start with something simple: four input features, one hidden layer, and two outputs.

  • We'll set our hidden layer to four neurons, the same size as our input.

  • SKLearn will actually take care of counting the size of my input and output automatically,

  • so I only have to specify the size of the hidden layer.
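
As a rough sketch of that setup (not necessarily the exact parameters the notebook uses), sklearn's MLPClassifier only needs the hidden layer size spelled out:

```python
from sklearn.neural_network import MLPClassifier

# One hidden layer with four neurons, matching the number of input features.
# sklearn infers the input and output sizes from the training data.
mlp = MLPClassifier(hidden_layer_sizes=(4,))
```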

  • Over the span of one epoch of training this neural network, the hidden layer will pick

  • up on patterns in the input features, and pass a prediction to one of two output neurons:

  • yes, happiness OR no, unhappiness.

  • The code in our Colab notebook calls this an “iteration” because an iteration and

  • an epoch are the same thing in the algorithm we're using.

  • As the model loops through the data, it predicts happiness based on the features, compares

  • its guess to the actual survey results, and updates its weights and biases to give a better

  • prediction in the future.

  • And over multiple epochs of the same training dataset, the neural network's predictions

  • should keep getting better!

  • We'll just go with 1000 epochs for now.
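
In sklearn's MLPClassifier, the number of passes over the training data for its stochastic solvers is controlled by max_iter, which is why the notebook can treat “iteration” and “epoch” as the same thing. Continuing the sketch above:

```python
# max_iter caps the number of training epochs for sklearn's stochastic solvers.
mlp.set_params(max_iter=1000)
mlp.fit(X_train, y_train)
```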

  • Now, I can test my AI on my original training data to see how well it captured that information,

  • and on the testing data I set aside.

  • The output here lets us know how good our neural network is at guessing if these pet

  • features predict owner happiness.

  • And it looks like our model got 100% correct on the testing data and 85% correct on the

  • training data!
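
Checking accuracy on both sets is one call each with the estimator's score method, which returns the fraction of correct predictions (the actual numbers come from the notebook, not this sketch):

```python
print("training accuracy:", mlp.score(X_train, y_train))
print("testing accuracy: ", mlp.score(X_test, y_test))
```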

  • Well guys, thanks for tuning in, but I think this project is almost over!

  • Everything was easy to do, performance looks great.

  • I'll just put in some pet features and let it help me with this big life decision!

  • Man, AI really is awesome.

  • Step 3.

  • Let's see... here's a pet I could adopt.

  • The description says it's cuddly, soft, quiet at night, and isn't that energetic.

  • Let's put in those features and see what the model says.
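
Feeding one pet's features into the trained model looks something like this sketch, with the answers in the same order as the survey questions:

```python
# [cuddly, soft, quiet, energetic] -> the model predicts 1 (happy) or 0 (not happy).
candidate_pet = [[1, 1, 1, 0]]  # cuddly, soft, quiet at night, not energetic
print(mlp.predict(candidate_pet))
```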

  • What?

  • Why not?

  • It seemed nice...

  • But I guess that's why I programmed an AI, so I wouldn't be swayed by my FLAWED human

  • judgment!

  • Let's move on to the next one.

  • Let's see, this pet isn't cuddly, isn't soft, isn't quiet, and is really energetic

  • but let's see what my AI says.

  • Yes?!

  • I'm not so sure that pet would've made me happy, but my AI model had 100% accuracy

  • on the testing set!

  • I think I'm gonna test a few more...

  • Ok, so I've tested a bunch of animals and something weird is happening.

  • The AI rarely told me that adopting a cat would make me happy, but it almost always

  • said a dog would make me happy.

  • Maybe everyone I surveyed hates their cats?

  • But, that seems unlikely.

  • Besides, I never even told my AI what a cat is!

  • I combined all the surveys into one big dataset without “cat” or “dog” labels!

  • And I only taught the model about if a pet is soft, cuddly, quiet, or energetic.

  • Both cats and dogs can have all of those traits, right?

  • Is there a war between cats and AIs that I don't know about, and THAT'S why it's biased?

  • Hey John-Green-bot….

  • Do you guys hate cats?!

  • John-Green-bot: No, Jabril.

  • We love hairy babies...

  • Jabril: Ugh, I don't understand!!!!

  • So, obviously, AI doesn't have a grudge against cats.

  • I collected the survey data and I built the AI, so if something went wrong and introduced

  • an anti-cat bias, it's on me, and I can figure out what it is.

  • So I should go back to analyze the data and my model design.

  • First, I'll look for patterns and correlations in my data by hand and make sure there's

  • nothing fishy going on.

  • This means a new step!

  • Step 4.

  • What's weird is that the model's predictions don't seem to make sense to me despite the

  • high performance.

  • Specifically, I'm noticing a bias towards dogs.

  • So there might be something strange about the data.

  • Earlier, I decided to just pool all the survey results together, but now I'll split them apart.

  • Now I can create plots that compare the percentage of dog owners I surveyed who are happy, the

  • percentage of cat owners who are happy, and the percentage of all the people who are happy

  • with their pet (no matter what kind).

  • To do this, I just need to compute the number of happy dog owners divided by the total number

  • of dog owners, the same for cat owners, and the same for everyone I surveyed.
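
Here's one way to compute those three percentages, assuming we kept a separate (hypothetical) record of which kind of pet each response describes. The values below are made up for illustration.

```python
import numpy as np

# Hypothetical arrays: `happy` is the happiness answer for all 30 responses, and
# `pet_type` is a parallel record of each owner's pet, kept outside the model.
happy = np.array([1, 1, 0, 1] * 7 + [1, 0])      # made-up labels for illustration
pet_type = np.array(["dog"] * 24 + ["cat"] * 6)  # made-up counts for illustration

def percent_happy(mask):
    """Percentage of the selected owners who said their pet makes them happy."""
    return 100 * happy[mask].mean()

print("happy dog owners:", percent_happy(pet_type == "dog"))
print("happy cat owners:", percent_happy(pet_type == "cat"))
print("happy overall:   ", percent_happy(np.ones(len(happy), dtype=bool)))
```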

  • Interesting.

  • According to my survey results, cats make people really happy.

  • But when I put in the features for a cat, my AI usually says it won't make the owner

  • happy.

  • How can I have such good accuracy at predicting happiness and always be wrong about cats?!

  • I still don't have answers about why the data is skewed towards dogs... so I guess

  • I should look at who even filled out my survey?

  • Let's make a plot that compares the total number of dog owners and the total number

  • of cat owners in my dataset.
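
Counting and plotting the two groups, continuing the hypothetical `pet_type` array from the sketch above:

```python
import matplotlib.pyplot as plt

counts = [(pet_type == "dog").sum(), (pet_type == "cat").sum()]
plt.bar(["dog owners", "cat owners"], counts)
plt.ylabel("number of survey responses")
plt.show()
```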

  • Yikes!

  • Why are there so few cat responses in here?!

  • I guess when I surveyed random people to make my dataset bigger, I was at a park, and

  • that's where I might have accidentally biased my data collection.

  • A lot of people who responded to my survey in the park must have been dog owners.

  • So the first mistake I made is that my data doesn't actually have the same distributions

  • as the real world.

  • Instead of collecting the true frequencies of each feature from a large random group

  • of pet owners, I sampled from a dog-biased set.

  • That's definitely something that should be fixed, but it still doesn't answer

  • why the model seems so biased against cats.

  • Both cats and dogs can be energetic, cuddly, quiet, and soft, or not.

  • That's why I chose those features: they seemed like they'd be common for both pets.

  • But we can test this.

  • I'll make a plot where I divide the number of times each feature is true for each animal

  • by the total number of survey responses I have for each animal.
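
A sketch of that per-animal feature-rate plot, assuming `features` stands in for the 30 x 4 matrix of yes/no answers (random placeholder values here) and reusing the hypothetical `pet_type` array from above:

```python
import numpy as np
import matplotlib.pyplot as plt

feature_names = ["cuddly", "soft", "quiet", "energetic"]

# Placeholder for the real 30 x 4 matrix of survey answers.
features = np.random.default_rng(0).integers(0, 2, size=(30, 4))

# Fraction of each owner group whose pet has each feature.
dog_rates = features[pet_type == "dog"].mean(axis=0)
cat_rates = features[pet_type == "cat"].mean(axis=0)

x = np.arange(len(feature_names))
plt.bar(x - 0.2, dog_rates, width=0.4, label="dogs")
plt.bar(x + 0.2, cat_rates, width=0.4, label="cats")
plt.xticks(x, feature_names)
plt.ylabel("fraction of pets with this feature")
plt.legend()
plt.show()
```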

  • It looks like there are lots of different types of dogs in my dataset.

  • Some are energetic and some are cuddly, but none of the cats are energetic.

  • So this is a correlated feature, which is a feature that is (unintentionally) correlated

  • with a specific prediction or hidden category.

  • In this case, knowing if something is energetic is a cheat for knowing it's a dog even though

  • I didn't tell the model about dogs.

  • My model might have then learned that if a pet is energetic, it makes owners happy, just

  • because there was no data to tell it otherwise.

  • We can see this correlation if we plot pet energy vs owner happiness.
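
A quick way to check that correlation with the same hypothetical arrays:

```python
energetic = features[:, 3] == 1  # the "energetic" answer for every response

print("happy when energetic:    ", 100 * happy[energetic].mean())
print("happy when not energetic:", 100 * happy[~energetic].mean())
```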

  • In my data, if a pet is energetic, a person is likely to be happy with it... no matter

  • what other features are true.

  • But if the pet isn't energetic, it's a mixed bag of happiness.

  • This is my second mistake: the data had a correlated feature, so my AI found patterns

  • that I didn't want.

  • To fix the first mistake, I need to collect new data and make sure I balance the number

  • of cat owners and dog owners.

  • So I'll go to the park, the pet store, the grocery store... you get the idea.

  • And I'll keep track of whether I end up with too many of one pet or the other.

  • To fix the second mistake, I should make sure the features are actually the most important

  • things I care about when it comes to happiness.

  • Honestly, I don't NEED my pet to be energetic.

  • So I could just cut it out of my dataset, and not worry about it becoming a correlated

  • feature as I train my AI.

  • Although, I will be more careful and make sure the other three features don't get

  • biased either.
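
One possible fix for the correlated feature is simply to drop that column and retrain, sketched here with the same hypothetical arrays:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Remove the "energetic" column (index 3), keeping cuddly, soft, and quiet.
features_trimmed = np.delete(features, 3, axis=1)

mlp_retrained = MLPClassifier(hidden_layer_sizes=(4,), max_iter=1000)
mlp_retrained.fit(features_trimmed, happy)
```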

  • It's important to note that not every problem is this easy.

  • For some AI, we can't just remove features that don't have a clear meaning, or we might

  • need to keep features because they're the only measurable values.

  • In either case it's usually EXTRA important to have a human checking the results and asking

  • a few important questions to avoid bias:

  • Does the data match my goals?

  • Does the AI have the right features?

  • And am I really optimizing the right thing?

  • And these questions aren't that easy to answer...

  • So far in our labs we've demonstrated the amazing abilities that AI can grant you, but

  • as you can see, it's important to be cautious.

  • As far as my dog-or-cat decision goes...

  • I'm going to have to do more work on this algorithm.

  • And collect a lot more survey data.

  • So I guess the main takeaway for this episode (and the last of our labs) is that when building

  • AI systems, there aren't always straightforward and foolproof solutions.

  • You have to iterate on your designs and account for biases whenever possible.

  • So, our next and final episode for Crash Course AI is all about the future and our role in

  • shaping where AI is headed.

  • I'll see ya then.

  • Crash Course AI is produced in association with PBS Digital Studios!

  • If you want to help keep all Crash Course free for everybody, forever, you can join

  • our community on Patreon.

  • And if you want to learn more about research methods to build good surveys and datasets,

  • check out this episode of Crash Course Sociology.
