Placeholder Image

Subtitles section Play video

  • [MUSIC PLAYING]

  • DAVID MALAN: All right.

  • This is CS50, and this is lecture three.

  • And today, the goal is to be a lot more algorithmic than code-oriented,

  • and to really kind of come back to where we started in lecture zero,

  • when we talked about computational thinking, and algorithms.

  • Because now you have a few weeks under your belt,

  • and you have a few new tools, and hopefully skills, under your belt,

  • even though odds are, you're still getting

  • comfortable with some of those skills.

  • And so let's look, then, at this piece of hardware

  • that we talked about a couple times.

  • And this is an example of what?

  • AUDIENCE: Memory.

  • DAVID MALAN: Yeah, memory, or RAM, random access memory.

  • And this happens to be pretty big on the screen,

  • but it's actually pretty small, if you were

  • to find it in your Mac, or PC, in your laptop, or some other device.

  • And this is just memory.

  • And so, memory is places you can store information-- numbers and characters

  • and strings, and bigger things still.

  • And recall that if you want to use this memory, you need to address it.

  • You need to put something in particular locations.

  • And we're going to start doing this all the more So in particular, there's

  • at least, like, five--

  • four black chips on here, which is actually where the zeros and ones are

  • stored, in the green circuit board.

  • And little gold connectors you see on the screen,

  • too-- that just allows all of those black chips

  • to intercommunicate, ultimately, and for the zeros and ones to move around.

  • But if we zoom in on just one of these chips of memory,

  • arbitrarily, and then zoom in here, you can think of this, wonderfully enough,

  • as a rectangle.

  • Could be any shape, certainly, but if we think about it as a rectangle,

  • we can divide this chip of memory up into a grid, like this.

  • Completely arbitrary how I've drawn this, but the fact, now,

  • that I have a whole bunch of small boxes--

  • you can start to think of this as being one byte, and then

  • another byte, and then another byte.

  • And I coincidentally wrote eight bytes across, but it could be any number.

  • It doesn't really matter.

  • And so, you can start to think about these bytes

  • as having specific locations, or numbers, or addresses,

  • as we're going to start calling them.

  • So if I zoom in on the top of this, you could

  • start to think of the top left as being byte number 0,

  • because computer scientists generally start

  • counting at zero, and then byte 1, and 2, and 3, and 4, and dot dot dot.

  • And then, just as a refresher, in a typical Mac or PC these days,

  • how much RAM, or memory, would you typically have?

  • AUDIENCE: Four?

  • DAVID MALAN: Four what?

  • AUDIENCE: Four gigabytes.

  • DAVID MALAN: Four gigabytes.

  • And giga means billion, so that's four billion bytes.

  • And maybe you have a little less, maybe you have a little more,

  • but on the order of billions of bytes.

  • So we're only scratching the surface of just how much storage there is,

  • and how many numbers and characters you can store.

  • But the point is that we can address them, in some way-- zero, one, two,

  • and so forth.

  • So this is how we stored Stelios' name in memory.

  • Recall, this was an example last time, where

  • we needed to put the characters of his name somewhere.

  • And so we actually put the S, and then the T,

  • and then so forth, starting at the leftmost location on forward.

  • But we also put something else.

  • It's not sufficient just to put the letters of his name in memory.

  • What else did we have to do?

  • AUDIENCE: Put a zero in there?

  • DAVID MALAN: Yeah, the zero command, or more specifically, the zero byte,

  • or the so-called null byte--

  • N-U-L. And it's represented as backslash 0.

  • Because if you were just to type a normal zero,

  • that would technically be a character from your keyboard.

  • It looks like a number, but it's technically a character, a char.

  • And so backslash zero literally means 8-0 bits in that byte's location.

  • OK.

  • So now that we have this ability to represent Stelios

  • with a sequence of characters, null-terminated at the end,

  • we don't really need to worry about the hardware anymore.

  • Right?

  • We can abstract away from that just as we did in week zero,

  • and not worry about how those zeros and ones are stored.

  • We can just trust that they are, somehow.

  • And so we can abstract away, and just start

  • thinking about it as a sequence, or a grid, of letters like this.

  • But that backslash zero is important, because it

  • is the only way that this language we're currently using,

  • C, knows where strings end.

  • If you didn't have that, it would-- printf, for instance,

  • might just keep printing all of the contents, all four gigabytes, of memory

  • that your computer has, no matter what those characters actually are.

  • And then, of course, you couldn't distinguish Stelios' name from Maria,

  • or someone else altogether in memory.

  • So, let me go ahead and open up the IDE, just to demonstrate now what

  • you can do with this kind of knowledge.

  • Suppose I wanted to write a program that allows

  • me to let a user type in his or her name,

  • and then I just want to print out his or her initials.

  • And for simplicity, I'm going to assume that the person's initials,

  • or whatever the capitalized letters are, in the input they type.

  • So, I have to be a little nit-picky, and type in my name properly.

  • But with that said, let me go ahead and whip this up.

  • So, I'm going to save this as initials.c.

  • Just out of habit, I'm going to start by including the CS50 library,

  • because I'm going to want to get a string from the user.

  • I'm going to go ahead and include standard io.h, so that I

  • can print something to the screen.

  • And we'll decide later if we need anything more.

  • I don't need any command line arguments for the purpose of this program,

  • so I'm going to go back to our old version of main, where you just

  • specify void-- no argc, no argv.

  • And then here, let me go ahead and declare a string called s,

  • and get a name from the user, as with, "name," and get string.

  • And then, what do I want to do after this?

  • I want to go ahead and iterate over the characters in the user's name,

  • and print them out only if they're uppercase.

  • But you know what, rather than just print them out,

  • you know what I want to do?

  • I actually want to create a new string, myself, containing the user's initials.

  • So, I don't want to go ahead and just print out, with percent

  • c, one character after the other-- rather, I

  • want to go ahead and store the user's actual initials in a new string,

  • so that I've got one string for their name,

  • and one string for their initials.

  • And, ladies and gentlemen, Ian.

  • SPEAKER 2: Sorry for the video glitches.

  • DAVID MALAN: Thanks.

  • All right.

  • So, to be ever more clear, let me go ahead and rename this

  • to name, literally, and then I want to have a string for their initials.

  • But we know what a string is, as of last time.

  • It's really just a sequence of characters.

  • And a sequence really has another name in programming.

  • What is another synonym we've used for a sequence of something?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: An array.

  • An array is that data structure we had when we

  • started using square bracket notation.

  • So, if I actually kind of roll this back and break the abstraction,

  • if you will-- don't think about it as a string.

  • What is a string?

  • It's a sequence of characters.

  • Technically, I could say char, and then I could say initials,

  • and then I can specify however many letters I

  • want to support in a human's initials.

  • And by-- assuming we have a first name, maybe a middle name, and a last name,

  • three characters should do it.

  • Three characters.

  • So, David J. Malan, DJM, three characters.

  • Is that enough chars?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: I'm hesitating, because it doesn't--

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: It's not, but why?

  • AUDIENCE: You need for the null character.

  • DAVID MALAN: Yeah.

  • So if we want to terminate even my initials-- which isn't technically

  • a word, but it's certainly a string, it's

  • a sequence of characters-- we need a fourth character,

  • or we need to anticipate a fourth character,

  • so that whatever we put in the computer's memory is technically,

  • in my case-- like, D-J-M backslash 0.

  • You need that fourth byte.

  • Otherwise you do not have room to actually terminate the string.

  • So, now, even though this doesn't look like a string,

  • insofar as I'm not saying the word string, it really is.

  • It's a sequence of characters of size four that I can put characters in.

  • Now, what are the characters in this array, by default, have we said?

  • When you just declare a variable of some size, what values go in it?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: Sometimes zeros, but generally,

  • what's the better rule of thumb?

  • AUDIENCE: You don't know.

  • DAVID MALAN: Yeah, you don't know.

  • It's so-called garbage values.

  • Nothing-- you should not trust the value of a variable, generally

  • speaking, unless you yourself have put the value there, as by storing it,

  • with the assignment operator, or by manually typing it in yourself.

  • So, just to be clear, if I wanted this program

  • to be kind of useless for anyone except myself, I could actually do this--

  • I could go ahead and do--

  • initials, bracket 0, get "d", initials, bracket 1, get "j",

  • and then finally initials, bracket 2, get "m".

  • And then lastly, and this is the thing you might forget sometimes,

  • you actually need to do the backslash zero there.

  • But of course, this is not at all dynamic.

  • But I have, in this these lines of code now,

  • created a new string called initials.

  • It's of length-- it's of human length three, DJM,

  • but the computer is actually using 4 bytes to store it.

  • But this is not the point of the exercise,

  • because we already asked the user for his or her name.

  • I need to now figure what that is.

  • So just logically, or algorithmically, if you will,

  • what process could we use, given a name like David J. Malan, or Brian Yu,

  • or anyone else's name--

  • how could we look at that input and figure out

  • what the user's initials are?

  • What's the thought process?

  • Let me go a little farther back.

  • So, David J. Malan, or any other name.

  • What's the process?

  • What do you think?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: OK, good!

  • So, iterate with a for loop over the letters in the name-- and you've

  • done that before, or in the process of doing that,

  • for something like caesar or vigenere, most likely.

  • And then you can use something like is upper,

  • or you can even do asciimath, like we did a while ago, to actually determine,

  • is it in the range of A to Z capitals on both ends?

  • So we have a couple of options.

  • So, let me try to convert that to code.

  • Let me get rid of the hard-coded name, which

  • is just kind of nonsensical, but demonstrative of how

  • we could store arbitrary characters.

  • And now let me do this.

  • For int i get 0, i is less than the string length of name, i plus plus--

  • and then I'm going to do something like this.

  • If the i-th character character in name is an uppercase letter--

  • and I'm going to use a function that you might not have used yourself,

  • but recall that it does exist--

  • is upper will return, essentially, true or false, this is an uppercase letter--

  • so, if this is uppercase, I'm going to go ahead and do what?

  • Well, the story is not quite complete.

  • It's not enough to just iterate over the names and--

  • the letters in the name--

  • we now need to decide where to put the first capitalized letter that we find.

  • It's obviously going to go in the initials array,

  • but what more do I need to add to my code to know where,

  • in this block of four characters, to put the first character, D,

  • in the case of my name?

  • Yeah.

  • AUDIENCE: [INAUDIBLE] initials i?

  • DAVID MALAN: Initials i.

  • OK, so if I do that--

  • good thought.

  • So let's do, initials, bracket i, gets name bracket

  • i-- that would seem to put the i-th character of name

  • at the -th location in initials, which at the moment is perfectly correct,

  • because i is 0.

  • And if I typed in David J. Malan, D is at the zeroth location.

  • So, we're good.

  • But there's going to be a bug here.

  • AUDIENCE: [INAUDIBLE] continue to the name, then you'll have less slots.

  • DAVID MALAN: Exactly.

  • I is going to continue marching forward, one character at a time,

  • through the entire name of the user, but you only want to index--

  • you don't want to keep doing that same amount in the initials array,

  • because again, the initials is much smaller.

  • So even as i moves through the user's name,

  • you want to take baby steps, so to speak, through the initials.

  • So, how can we solve this?

  • I can't use i, it would seem.

  • AUDIENCE: You could use a variable that's like a [INAUDIBLE]

  • DAVID MALAN: OK, great.

  • Let's do that.

  • We need another variable.

  • So, I could put this in a few different places,

  • but I'm going to go ahead and put it here for now.

  • So, int counter gets zero--

  • I'm just initializing my counter to zero--

  • and then as soon as I won't find an uppercase letter,

  • I think I want to do this?

  • Put it at whatever the value of counter is?

  • And then there's one more step.

  • What do I need to do once I, yeah, put it at the counter location?

  • AUDIENCE: Increment counter by one.

  • DAVID MALAN: Exactly.

  • Increment counter by one.

  • So, I can do this in a few ways.

  • I can do it very literally-- counter gets counter plus one, semi-colon.

  • It's a little annoying to type.

  • You could do plus equals one, which is slightly more succinct.

  • Or, the ever-prettier, plus plus, which achieves the same result.

  • Fewer characters, same exact result, in this case.

  • OK, so now, I think we have a for loop that iterates

  • over all the letters in the name.

  • If it's an uppercase letter, it stores that letter, and only that letter,

  • in the initials array, and then increments

  • counter so that the next letter is going to go

  • at the next location in the initials array.

  • So, if all that seems to be correct--

  • ultimately, I want to do this--

  • I want to go ahead and print out percent s, backslash n, initials.

  • I want to just print the user's initials.

  • But there's one key step in this blank line that I should not forget.

  • What's the very last thing I need to do?

  • Yeah.

  • AUDIENCE: You need to print a null character [INAUDIBLE]

  • put a null character at the end of the [INAUDIBLE]..

  • DAVID MALAN: Exactly.

  • I need to put a null character at the end of the array.

  • So, how do I do that?

  • Well, I have the syntax, and I think--

  • you know, I want to say, end of array--

  • but how can I do that?

  • What values should I put here?

  • Yeah.

  • DAVID MALAN: The string length name?

  • DAVID MALAN: Yeah, I could do strlen of name, well, not of name--

  • AUDIENCE: Of the initials.

  • DAVID MALAN: The initials, but now, you kind of got me in a circular argument,

  • because I'm trying to--

  • it's kind of a catch-22 now.

  • I am trying to store a backslash n at the end of the string.

  • But recall from last time, the way strlen

  • knows where the end of the string is, is by putting the backslash 0.

  • So, it's not there yet.

  • So we can't use strlen.

  • But, we already have, I think, the solution--

  • AUDIENCE: Can't you just put initials four?

  • [INAUDIBLE]

  • DAVID MALAN: OK.

  • So, we could absolutely do that, or almost that.

  • It's not quite four.

  • One tweak here, yeah?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: In back, yeah?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: OK, good.

  • So, actually-- so, counter--

  • is, yeah.

  • Spoiler, counter is going to be the answer.

  • But let me just fix this one bug here.

  • Four was the right intuition, but remember,

  • if you have four characters possible, it's 0, 1, 2, 3, is the very last one.

  • So, we needed to fix that to 3.

  • But even more general would be to put counter, because the value of counter

  • is literally the length of the string we just read into the initials.

  • And so if we want to terminate that string,

  • we already know how many characters we've counted up.

  • And in fact, it would technically be wrong to just blindly put backslash

  • zero only at the very end of these four characters.

  • Why?

  • In what situation would that be-- logic be incorrect?

  • Yeah?

  • AUDIENCE: If someone has more than three initials.

  • DAVID MALAN: If they have more than three initials,

  • we really have a problem, because we don't have space for anything

  • beyond three actual letters and a null terminator.

  • And there's another problem, potentially.

  • Yeah?

  • AUDIENCE: If they don't have a middle name?

  • DAVID MALAN: Yeah, if they don't have a middle name,

  • there's going to be, only maybe two letters, first name and last name.

  • And so, you're putting the backslash zero

  • at the end of your array, which is good.

  • But what's going to be that third value, that second to last value,

  • in the array?

  • AUDIENCE: A garbage value.

  • It's just garbage, so to speak.

  • And so it could be some funky character, you

  • might get weird symbols on the screen-- you don't know,

  • because it's just incorrect.

  • The backslash zero has to go at the very end of the string.

  • So, let me go ahead, and if I've made no syntax errors here--

  • let me go ahead now and save this, and go ahead and do make initials--

  • OK, so, hmm.

  • Implicitly declaring library function strlen with type unsigned long.

  • So, there's a lot of words, only some of which I kind of recognize immediately.

  • I could run this through help50, which should be your first instinct,

  • on your own or in office hours.

  • But let's see if we can't tease apart what the error actually is.

  • What did I forget to do?

  • Yeah.

  • AUDIENCE: Uh, [INAUDIBLE]

  • DAVID MALAN: Yeah, I needed the string library, which now I'll

  • start to get in the habit of more.

  • Anytime I'm using strlen, just try to remember to use string,

  • otherwise you'll see that error message again and again.

  • Maybe there's more errors, but I'm not sure,

  • so I'm going to go ahead and just recompile, because again, you

  • might have more errors on the screen than you actually have in reality.

  • But there is indeed one more.

  • So, similar error, but different function.

  • Implicit declaration of function is upper.

  • So that's, again, same mistake.

  • I've forgotten something.

  • Anyone recall?

  • This one's a little more--

  • less common, I would say.

  • Yeah.

  • AUDIENCE: You need the character type library.

  • DAVID MALAN: Yeah, the character type, or C type, library.

  • So again, you'd only know this by checking the [? man ?]

  • page, so to speak, or reference.cs50.net,

  • or checking your notes, or whatnot, or Google.

  • And so that's fine.

  • It happens to be in Ctype.h.

  • Now, let me go ahead and recompile.

  • Seems to work, so, initials-- let me go ahead now and type David J. Malan,

  • enter-- seems to work.

  • Let me try a corner case, like Rob Bowden, without his middle name.

  • RB.

  • And so, it seems to work.

  • This is not a rigorous testing process, but let's trust that we did

  • at least get the fundamentals right.

  • But the key here is that we broke this abstraction, so to speak,

  • of what a string is.

  • Because if you understand that, well, a string is just

  • a sequence of characters, and hey, a string must end with a backslash zero,

  • you can do that yourself.

  • And this is what you can do in C. You can put characters anywhere

  • you want in memory, you can put numbers anywhere you want in memory.

  • And this ultimately gives you a lot of power and flexibility,

  • but also, as you'll soon see, a lot of opportunities for bugs,

  • with this power.

  • All right, any questions on that particular example?

  • Yeah.

  • AUDIENCE: Can you explain the initials counter?

  • DAVID MALAN: Sure.

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: Sure.

  • Let's explain the initials counter, which is used here,

  • and is declared up here.

  • So, we essentially want to keep track of two locations.

  • If my name is sort of of this length, I'm going to start at the first

  • character, D, and with i, iterate from D-A-V-I-D, space, and so forth.

  • So, that one just kind of marches on, one step at a time.

  • The initials array, meanwhile, is another chunk of memory.

  • It's of size four, somewhere else in that green silicon chip,

  • with all those black chips on it.

  • And we want to move through it more slowly, because we only

  • want to move to the next location in the initials array

  • once we've put a capital letter in it.

  • And that's going to be less frequent, presumably,

  • unless the user typed in all caps.

  • So, in order to achieve that, we need a second variable.

  • And you proposed that we use a variable called counter.

  • But we could have called it j, or something else.

  • And so, the counter is initialized to zero,

  • and it is incremented here any time we encounter a capital letter.

  • So, it has the effect of literally counting the capital letters

  • in someone's name: D, J, M. And so it should be 3, at the end of those loops.

  • And that's perfect, because as we realized earlier,

  • 3 happens to be the correct location for where you put the backslash 0, even

  • though it wants to go in the fourth location, the address of it,

  • or the location is technically 3.

  • So, we sort of solve multiple problems at once, in this way.

  • Good question.

  • Other questions?

  • Other questions?

  • All right.

  • So.

  • With that said, let's not worry as much about that low-level kind

  • of implementation, and consider what more we can do with these things

  • called arrays.

  • If we start to get rid of the addresses, and just

  • know that we have sequence of characters, or anything else in memory,

  • and really, at the end of the day, we have the ability

  • to lay things out contiguously in memory.

  • Back to back to back to back, like this.

  • So, here are, then, eight boxes, inside of which we can put anything.

  • But we've kind of been cheating as humans for some time.

  • When we had Stelios' name on the screen, all of us in this room

  • just kind of glance up, and you kind of absorb his name all in one fell swoop.

  • But it turns out that computers are not quite as intuitive, or as all-seeing

  • as we are, where we can sort of take in the whole room, visually.

  • A computer can only look at one thing at a time.

  • And so, a better metaphor than a box like this

  • for the locations that you have in your computer's memory

  • would really be like eight lockers in a school,

  • where a computer, in order to look at the value in any of those boxes,

  • actually has to do a bit of work and open the door to see what's inside.

  • And you cannot, therefore, see all eight locations simultaneously.

  • So, this is going to have very real world implications, because now,

  • if we want to start checking the length of a string,

  • or counting the number of things in an array, or moving things around,

  • we're going to have to do one thing at a time.

  • Whereas we humans might just kind of look at it

  • and say, oh, sort these numbers in some intuitive way.

  • And that actually is a good segue to a very different problem,

  • which is that of sorting values.

  • So, for this, we have, say, the equivalent of seven lockers,

  • up here now.

  • And behind these doors, so to speak, is a whole bunch of numbers.

  • So we'll transition away from characters and strings,

  • and just generalize it to numbers, because they're

  • convenient to work with, and we have so many of them at our disposal.

  • But I'd like to find one number in particular.

  • So, suppose that a computer were storing an array of numbers,

  • a whole bunch of integers, back to back to back to back.

  • Here's what they might look like.

  • The doors are closed, though, so we need an algorithm for finding a number,

  • because a computer can't just look at it and say, there's your number there.

  • The computer has to be more methodical, probably

  • going from left to right, from right to left, starting in the middle,

  • randomly opening them.

  • We need an algorithm.

  • So, for that-- could we get one brave volunteer?

  • OK, come on down.

  • What's your name?

  • CHRISSY: Chrissy.

  • DAVID MALAN: Kristen?

  • CHRISSY: Chrissy.

  • DAVID MALAN: Chrissy.

  • Come on down, over this way.

  • All right, so Chrissy, all you know is that there are seven doors--

  • nice to meet you--

  • here on the screen.

  • And using your finger, you should be able to just touch a door to open,

  • or unlock it, and we'll see what it is.

  • And I would like you to find the number 50.

  • Dammit.

  • [LAUGHTER]

  • Very good!

  • [APPLAUSE]

  • I feel-- we need a better prize than a stress ball for that.

  • But very nicely done.

  • And let me ask you, if we can-- if you don't mind me putting the mic

  • in your hand--

  • here you go.

  • So, what was your amazing algorithm for finding 50?

  • CHRISSY: I just clicked on it.

  • DAVID MALAN: OK, that's great.

  • CHRISSY: It looks nice.

  • DAVID MALAN: OK.

  • So, you-- OK.

  • So, good.

  • So, wonderfully effective.

  • Let me go ahead and reveal-- actually, let's do this.

  • So, here's where you could have gone wrong any number of places.

  • And let me ask the audience before we try one other example.

  • What strikes you about these numbers?

  • Anything in particular?

  • AUDIENCE: They're unordered.

  • DAVID MALAN: They're all in order?

  • AUDIENCE: Unordered. DAVID MALAN: Unordered.

  • Kind of random.

  • Although, the very astute might notice something.

  • Or, someone who watches too much TV.

  • Yeah.

  • AUDIENCE: They're the numbers from Lost.

  • DAVID MALAN: Yes, thank you.

  • The two of us watch a lot of Lost.

  • So-- plus we had a seventh number, so we added ours.

  • So, they are actually in random order.

  • I just kind of shuffled them arbitrarily.

  • They're not sorted from smallest to biggest,

  • they're not sorted from biggest to smallest.

  • There's really no pattern, because I really just did shuffle them up.

  • And that's problematic, because even though Chrissy got lucky, finding 50--

  • was just so good at finding 50--

  • it might be hard to replicate that algorithm and find it

  • every time, unless you know a little something

  • about the numbers behind the doors.

  • And so, we do know that in this next example.

  • So, in this next example, there are still seven doors, and still

  • the same seven numbers, but now they're sorted.

  • And knowing that, does that change your approach?

  • CHRISSY: Uh, well, I guess if they're sorted, like, lowest to highest,

  • then it would, because I would know it's closer to the end.

  • But if I don't know how it's sorted, then--

  • I guess I wouldn't really know.

  • DAVID MALAN: OK, good.

  • So let me stipulate they're sorted from smallest to biggest,

  • and let me propose that you, again, find us the number 50.

  • Yeah, this is not working out very well.

  • That is very well done.

  • OK, so, congratulations.

  • [APPLAUSE]

  • So, you'll see-- thank you.

  • So, you'll see just how much more efficient

  • her second algorithm was, when she leveraged that information.

  • But in all seriousness, you could do better, especially

  • for very large datasets, if you know something about the data.

  • If you know that these numbers are sorted,

  • you could, as Chrissy did very intuitively and very correctly,

  • go to the very end, knowing that that's the biggest number,

  • it's probably all the way on the right.

  • If she didn't know that, and just knew that the numbers were sorted,

  • and did not know if 50 was a small number, a medium number,

  • the largest number-- just it was a number behind doors.

  • What would a smart strategy be, given that lesser information?

  • AUDIENCE: [INAUDIBLE] halfway, and then if it's

  • greater, you move it to the right, if it's less, move it left.

  • DAVID MALAN: Yeah, we can try that same divide

  • and conquer approach we did in the very first class, with the phone book,

  • right?

  • Where we looked roughly in the middle, because we

  • know that the Ms, give or take, would be in the middle.

  • And then if the--

  • we're looking for Mike Smith, whose name starts with an S,

  • we know that he would be to the right, and so we would go to the right,

  • and then to divide the problem--

  • [LAUGHTER] Today is not going very well.

  • So, we would divide and conquer the problem again and again.

  • And we can do that here.

  • Even though it's not quite as visually engaging as a phone book,

  • you can kind of go to the middle.

  • And if Chrissy had opened that middle door, and seen 16,

  • you would know-- what?

  • And actually, I can recreate this.

  • I'm just going to refresh the screen.

  • So in this case, we have the same doors.

  • I know 50's somewhere, and I don't know how-- where it is, but 16.

  • Now I know it's to the right, as you propose.

  • So, now I can essentially eliminate all these doors, not even worry about them.

  • And indeed, if I open them, we can confirm

  • that I don't need to waste any time actually searching them.

  • Now I've got three doors.

  • What would you propose we do next?

  • AUDIENCE: Go in the middle again?

  • DAVID MALAN: Yeah, go in the middle again.

  • So here, 42 is a great answer, but it's not the one we're looking for.

  • And indeed, we can throw this half of the problem

  • away, and finally search the right half, which now has been whittled down

  • to one, and then we would find 50.

  • And if I can reconstruct what would have been

  • a great history, in the first example, how well might Chrissy

  • have done theoretically, or if we did this exercise again and again

  • and again, with the first example.

  • If you don't know anything about the numbers,

  • you can get lucky, as one might by just choosing a door and, wow,

  • that happens to be the number.

  • But that's not going to happen all the time, most likely.

  • And so you might have to just start, you know, maybe at the beginning-- no--

  • no-- no.

  • Maybe you can get clever and skip ahead-- no--

  • no-- OK, eventually, you will find it, if it's actually there.

  • But if you don't know anything about the numbers, the best you can do

  • is what's called brute force.

  • Just brute force your way through the possibilities

  • until you find the answer.

  • But in the worst case, how many doors might I

  • have to open to find the number 50, if I knew nothing about them?

  • AUDIENCE: All seven.

  • DAVID MALAN: Yes, all seven.

  • Right?

  • If n is the number of doors, it might take me as many as n steps.

  • In this case, it was like n minus one, or six steps.

  • But in Chrissy's case, clearly, there's a really exciting lower bound,

  • because if you do get lucky, it might only take you one step.

  • So, that's an interesting range to consider.

  • The solution to your problem might take one step, or n steps,

  • or any number in between.

  • But the binary search that you proposed in the second approach, where

  • you divide and conquer--

  • recall that when we did that in the past,

  • we had a very nice shape to the curve that we drew that described

  • the efficiency of that algorithm.

  • And we'll come back to that before long.

  • But let's just, for clarity, formalize just what these two algorithms are.

  • If I start from the left and go right, or start from the right

  • and go left, following a line, we would call that linear search.

  • And it can be written in any number of ways.

  • I came up with this pseudo code, but again, any reasonable person

  • could come up with an alternative one and say it, too, is linear search.

  • These are not official definitions.

  • And I wrote it as follows.

  • For each element in array, if the element you're looking for,

  • return true.

  • So, this is a very concise way of saying, for each element in the array,

  • just look at it from left to right, or right to left.

  • If it's the one you're looking for, return true.

  • I found the number 50, or whatever it actually is.

  • Otherwise, return false at the very end.

  • And notice the indentation.

  • I un-indented it because only as the very, very last step do I return false,

  • if none of my iterations through the loop actually return true.

  • So, that would be linear search.

  • Binary search, you have to be a little more verbose to explain it, perhaps.

  • And there's many different ways to write this out,

  • but this is very similar in spirit to something

  • we saw before, with Mike Smith and the phone book.

  • So if we go ahead and look at the middle of the sorted array,

  • just as you proposed, and if the element you're looking for is right there,

  • go ahead and return true.

  • I found 50.

  • I got lucky, it was dead center in the middle of my array,

  • in one particular running of this program.

  • Else, if the element is to the left, search the left half of the array.

  • Else, if it's to the right, search the right half of the array.

  • Otherwise, return false, because it's presumably not there.

  • So, this would be binary search.

  • And even though it's more lines of code, or pseudo code,

  • it arguably should be a little faster, right?

  • Because of that dividing and conquering, and throwing half, half, half, half,

  • half, half of the problem away, the problem

  • gets smaller much, much more quickly.

  • All right.

  • So with that said, it seems to be a very good thing

  • that having things sorted for you is a very powerful ingredient to a problem.

  • Right?

  • It takes more work-- should have taken Chrissy more work;

  • would take all of us, in general, more work

  • to find a number using linear search than by using binary search.

  • Much like it would have taken me forever to find Mike Smith flipping one phone

  • page at a time, versus using divide and conquer,

  • where I found him much more quickly by dividing the problem in half, and half,

  • and half again.

  • So, that invites the question, how do you get something sorted?

  • Well, let me go ahead and pull up these things, which

  • we happen not to use in this class, but elsewhere on campus,

  • you might have these blue books.

  • And at exam time, you might write your name, and the class,

  • and all that on them.

  • And we've kind of simplified, so, this is a person whose last name starts

  • with A, last name--

  • a person's name starts with B, C, and all the way to Z.

  • But suppose that people finish at different times

  • during the exam, and of course, when you're in the science center

  • or wherever, everyone just starts handing them in,

  • and they kind of come in like this, and then

  • the TFs at the front of the room, or professor,

  • has to actually sort through these values.

  • Well, what's this algorithm going to be that we actually use?

  • If you've got a pile of exam books like this, all of them

  • have a letter associated with them, how would we go about sorting these?

  • What do I do?

  • AUDIENCE: Compare two at a time.

  • DAVID MALAN: Compare two at a time?

  • OK, so let me go ahead and pick up a couple here.

  • And since I'm the only one that can see them,

  • you should be able to see them on the overhead, thanks to Ian and Scully.

  • So, P and H, so, H goes before P, alphabetically.

  • All right, now what do I do?

  • Pick up another two?

  • AUDIENCE: Pick up one.

  • DAVID MALAN: Yeah, so maybe just pick up one,

  • and I happened to get N, so, that goes in between H and P.

  • So I can kind of slide it in.

  • Now I picked up G. That goes before H. Now I

  • picked up another one, E. That goes before G. So,

  • it's actually getting kind of easy.

  • Uh-oh, U-- this goes at the end, after P.

  • And so, I can keep grabbing one at a time, F in this case,

  • and then kind of insert it into its appropriate location.

  • And we won't do this the whole way, because it's going to get tedious,

  • and eventually I'm going to embarrass myself by getting one of the letters

  • wrong.

  • But that's an algorithm, right?

  • For each book in the pile, pick it up, compare it

  • to all of the elements you're already holding,

  • and insert it into the appropriate location.

  • And we can actually call that something, and that's

  • something called insertion sort, insofar as the emphasis of the algorithm is--

  • thank you-- on inserting letters, in this case,

  • into their appropriate location.

  • So, with that said, are there other ways than that?

  • Let's go ahead, here-- and we have enough stress balls to do it with just

  • one more human demo here, beyond these books--

  • these numbers here.

  • Suppose we wanted to sort these.

  • I have eight stress balls, eight pieces of paper--

  • could we get eight other volunteers?

  • So, yes, right and back, two three-- let's go farther back-- four.

  • Can I get farther back?

  • Five, six, seven, and eight.

  • Come on up.

  • Ah, next time.

  • Next time.

  • All right, come on down.

  • OK.

  • What's your name?

  • JERRY: Jerry.

  • DAVID MALAN: Jerry.

  • OK, If you want to go ahead and step in front of a board there.

  • ARMAH: [? Armah. ?]

  • DAVID MALAN: [? Armah, ?] David.

  • CHRIS: Chris.

  • DAVID MALAN: Chris, David.

  • Thank you.

  • KAYLIND: Kaylind.

  • DAVID MALAN: Kaylind.

  • NOLAN: Nolan.

  • DAVID MALAN: Nolan, David.

  • JAY: Jay.

  • DAVID MALAN: David. MATTHEW: Matthew.

  • DAVID MALAN: Matthew, David.

  • OK.

  • So, this is statistically anomalous, insofar

  • as we're actually very excited to say, in CS50, this semester,

  • for the first time ever-- we watch these numbers annually--

  • and we actually have 44% women in CS50 this year.

  • So, as you can see, none of my demonstrations

  • are going correctly today.

  • But trust in that data.

  • So if each of you could stand behind one of the music

  • stands here-- hopefully I have counted out exactly eight people.

  • We have, in front of you guys, numbers.

  • So go ahead and turn around the pieces of paper,

  • which represent the numbers that we have on the board here,

  • if I got the same order right.

  • So, there's going to be a bunch of different ways

  • we can sort these eight values.

  • So, how do we go about doing this?

  • Well, much like you proposed earlier-- just pick up a pair of blue books

  • and compare them-- why don't I try that same intuition?

  • So, four and two-- these numbers are obviously out of order.

  • So, what do I want to go ahead and do?

  • Yeah, so I can go ahead and swap these.

  • And now, problem is solved, which is great.

  • I've taken a bite out of the problem.

  • And then we move on.

  • Four and seven?

  • Those look OK.

  • Seven and five?

  • Not OK.

  • So, what do I want to do here?

  • AUDIENCE: Switch them.

  • DAVID MALAN: Yeah, so I can swap those, thank you.

  • So, seven and six?

  • Also out of order.

  • Let's swap those.

  • Seven and eight?

  • That's good.

  • Eight and three?

  • Not correct, so if you want to go ahead and swap those.

  • And, eight and one, if you want to go ahead and swap those.

  • So, what was the net effect?

  • Have I sorted this list of numbers?

  • So, obviously not.

  • But I did improve it.

  • And what's the most glaring example, perhaps, of the improvement, in front

  • of these?

  • AUDIENCE: Eight's at the end.

  • DAVID MALAN: Eight is now at the very end.

  • So the biggest number, if you will, bubbled up to the end,

  • as though it was bigger and sort of bubbled up.

  • So that's good, but there's still work to be done here.

  • So, I can, again, just try to fix these problems locally.

  • Just pick up a couple of problems and try to solve it.

  • So, two and four?

  • We're good.

  • Four and five?

  • Five and six?

  • Six and seven?

  • Ooh, seven and three?

  • If you guys want to swap those?

  • Wonderful.

  • And then, seven and one, we want to do it again.

  • And now, do I need to bother comparing seven and eight?

  • Technically no, right, because if we know eight made its way-- now

  • we can start cutting some corners, but correctly,

  • just to shave some time off of the algorithm,

  • make it a little more efficient.

  • Good.

  • So, sorted now?

  • No, so we have to keep doing this.

  • And let me let you guys now execute this, pair-wise at a time.

  • So, here we go.

  • Two, four.

  • Four, five.

  • Five, six.

  • Six, three.

  • Six, one.

  • And we can stop there, because we know seven is in order.

  • Now we do it again.

  • Two and four.

  • Four and five.

  • Five and three?

  • Five and one?

  • Improved, good.

  • And now, next, two and four, four and three, four and one,

  • and then two and three, three and one--

  • and then two and one--

  • OK, now it's sorted.

  • Yes, very well done.

  • Very well done.

  • So, it's kind of tedious, frankly, and I didn't

  • want to keep walking back and forth, because thankfully we have, like,

  • all of this-- this manpower, these multiple CPUs, I guess, literally

  • today.

  • So, we have all of these CPUs, or computers,

  • that are able to help me out.

  • But it was still very slow.

  • I mean, it's a long story.

  • So, let's rewind just once and do one more.

  • If you guys could rearrange your pieces of paper

  • so that it matches the screen again, just to reset to our original location.

  • Let's go back there.

  • And let's try one other approach.

  • I've tried the insertion approach, whereby I just take a problem,

  • like the blue book, and insert it into its correct location.

  • But honestly, that was getting a little tedious, and that's why I aborted,

  • because it's going to take me longer, and longer,

  • and longer to find the right location among all of those blue books.

  • So, let me try just a more intuitive one.

  • If I want to sort these numbers, let me just go and select

  • the smallest number I see.

  • OK, four, at the moment, is the smallest number I see.

  • So I'm going to grab it for just now.

  • Now, again, these are lockers.

  • Even though we humans can see them all, the computer

  • can only see one location at a time.

  • And this is, indeed, the smallest number I have seen thus far.

  • So I'm going to grab it.

  • But very quickly, I can abort that, because now I've

  • found an even smaller number.

  • So I'm going to hang onto that instead.

  • Seven, I'm not going to worry about that; five, six, eight, three, one--

  • I've found a smaller number.

  • Now, I need to kind of do something with this.

  • I want to grab the one, so I could just put the two here.

  • And what do I want to do now with the number one?

  • Yeah, I kind of just want to put it here.

  • And so, I can do this in a few different ways,

  • but you know what, I'm just going to evict

  • whoever's here, because it's a pretty low number, but it's also random.

  • It could have been a big number.

  • So let me just make room for it and do that.

  • I have selected the smallest element.

  • Is the list sorted?

  • I mean, obviously not.

  • But is it better?

  • It is, right?

  • Because the one is now, at least, in the right location.

  • So I've solved one eighth of the problem so far.

  • So that's not bad.

  • What could I do next?

  • Let me apply the same algorithm.

  • Let me select the smallest, which is currently this one, still

  • this one, still this one--

  • nope, three is smaller-- oh, two is even smaller, so let me ultimately

  • grab this.

  • And you know what, four, you really don't need to be there;

  • I'm just going to evict you again, and put two where it belongs,

  • and move this one over here.

  • So now, the list is even more sorted.

  • And if I proceed to do this again and again, if you want to just keep

  • handing me the smallest number-- three?

  • OK, so I'm going to go ahead and just evict seven, because it's

  • kind of a random number anyway.

  • And now, thank you, four--

  • I'm going to go ahead and evict five, even though, kind of feels

  • like I'm making a little bit of work for myself, on average,

  • it's not going to matter.

  • Sometimes it will be good, sometimes it'll be bad.

  • So let me go ahead and put five over here.

  • Now I need to select the next smallest element, which happens to be five.

  • So we recovered pretty quickly.

  • So I'm going to evict six over here.

  • Now I'm going to look for the smallest element.

  • Now it's indeed six.

  • I'm going to evict eight, put this over here--

  • and now seven is good, eight is good--

  • done.

  • But it's still kind of a long story, right?

  • Like, I'm going back and forth, looking for these numbers.

  • But this would be what we'd call selection sort.

  • So thank you all very much-- if you'd like to keep your pieces of paper,

  • you're welcome to.

  • And let me give you guys a stress ball as well.

  • And a round of applause, if we could, for you guys.

  • If you want to hand those out.

  • So, as before, let's see if we can apply--

  • let's see if we can apply some pseudo code to this algorithm,

  • because it's one thing to talk about it, and it's one thing to sort of reason

  • through it intuitively, but at the end of the day,

  • if you want to program this, we've got to be more precise,

  • and we've got to consider the lower level operations that the computer is

  • going to execute.

  • So, here's how we might implement bubble sort.

  • Repeat until no swaps.

  • For i, from 0 to n minus 2-- and n is just

  • the size of the problem, the number of doors, the number of humans,

  • the number of numbers, whatever the input to the problem actually is.

  • So, for i, from 0 to n minus 2, if the i-th and the i-th plus 1 elements

  • are out of order, swap them.

  • So, this is kind of a mouthful.

  • But if you think about it, I'm just kind of applying some of the vocabulary

  • that we have from C, and kind of sort of from scratch,

  • to an otherwise very organic human experience,

  • but using more methodical language than I was just kind of doing off the cuff,

  • when we were doing it with humans.

  • Because what does this mean?

  • For i from 0 to n minus 2.

  • That means start at, like, the 0 location,

  • and if there's n elements here-- this is 0--

  • and this, at the very end, is location--

  • it's going to be n minus 1.

  • Right?

  • If you start counting at 0, you have to readjust your whole life

  • to subtract 1 from the tail end of that range of numbers.

  • Right?

  • 0-- if that was 1, this would be n.

  • But if that were 0, this is now n minus one.

  • So I'm saying, though, for 0--

  • for i from 0 to n minus 2, which is technically this.

  • So I'm using sort of for loop-like language to start iterating here,

  • and do something like this, up until the second

  • to last element, which we've not done before.

  • Seems almost buggy.

  • But if you read ahead in the pseudo code, why did I do that?

  • And only iterate until the second to last element with i?

  • What jumps out at you?

  • Yeah.

  • AUDIENCE: Because then you're gonna swap the i in the i-plus-one-th elements?

  • DAVID MALAN: Good.

  • AUDIENCE: When you get to n minus 2, you'll swap it with n minus 1.

  • DAVID MALAN: Exactly.

  • Recall that bubble sort was all about swapping, or potentially swapping

  • pair-wise elements, neighbors, if you will.

  • So, you have to make sure that if you're iterating through all of these numbers,

  • you have to stop short of the end of the array, so that i plus 1

  • actually refers to an element that's actually in your list,

  • and not, for instance, way over here, which

  • would be some garbage value that you shouldn't actually touch.

  • So, if those elements are out of order, we swap them,

  • and I have a big outer loop there that just says,

  • keep doing this, again and again and again, until you don't swap anything.

  • At which point you can infer that you're done.

  • Because every time I walked back and forth on the list,

  • and the guys helped out by swapping their numbers as appropriate,

  • I kept doing it again if there was still room for improvement.

  • And intuitively, why is it absolutely, logically safe

  • to stop that whole process once you have not made any swaps on a pass

  • through the list?

  • Why is that a safe conclusion?

  • So, if I walk through the list--

  • no, these are good, these are good, these are good--

  • OK, you didn't want your number--

  • these are good, these are good--

  • how do I know that I don't need to do that again?

  • AUDIENCE: It's sorted already.

  • DAVID MALAN: It's sorted already, right?

  • And it would be kind of irrational--

  • if you've walked through the list, looking at everything pair-wise,

  • found nothing to swap, to even bother doing that again--

  • why would you expect different results, if the numbers themselves

  • are not moving and you didn't move them yourself?

  • So you can just stop.

  • But this still is going to invite the question, well, how expensive was that?

  • How many swaps, or comparisons, did we make?

  • And we'll come back to that before long.

  • Selection sort, though, can be expressed maybe even a little more succinctly.

  • And that was the second algorithm we did with our eight volunteers

  • here, for i from zero to n minus 1.

  • So this time, all the way through the end of the list,

  • find the smallest element between i-th and n minus 1-th.

  • So between those two ranges, the beginning

  • of your list and the end, and then swap the smallest with the i-th element.

  • So what does this mean?

  • So again, for i from 0 to n minus 1.

  • This is just pseudo code for saying, start a variable i at location 0.

  • And do this until i equals n minus 1.

  • So, do this until you've gone all the way through the list.

  • What is it telling me to do?

  • Find the smallest element between the i-th element and the end of the list.

  • N minus one never changes.

  • It always refers to the end of the list, so that's

  • why I walked through the list looking for, ultimately, the number 1.

  • And what did I do with the number 1?

  • Swap the smallest with the i-th element.

  • And I might have gotten one of the steps wrong when I did a little switcheroo,

  • but we fixed it thereafter.

  • Ultimately, I kept evicting whoever was in the i-th location

  • to make room for the element that I knew belonged there

  • And I could have shuffled them to make room for those elements,

  • but it turns out, mathematically, it's just as fine

  • to just evict it and move it all the way to the end, as we did.

  • And once I've gone all the way through the list,

  • there is no more smallest element to find.

  • And as we saw, the list is sorted.

  • So, maybe this is faster, maybe this is slower--

  • it's not immediately obvious.

  • And insertion sort, which we actually came up with by way of the blue books

  • on the floor, might be described as this.

  • For i, from 1 to n minus 1, call the 0th through the i minus i-th element

  • the sorted side--

  • that's a mouthful-- so, consider the left of your list

  • the sorted side of the list.

  • And initially, there's nothing there.

  • You have zero elements sorted to your left,

  • and eight unsorted elements to your right.

  • So that sort of describes this story, when

  • we had volunteers and numbers here.

  • There are no elements sorted; everything to my right was unsorted.

  • That's all that's saying.

  • Remove the i-th element.

  • That was like picking up this blue book, if we

  • were using blue books in this example.

  • Then what do I want to do?

  • Insert it into the sorted side, in order.

  • So, if this is the sorted side, this is the unsorted side,

  • this is the equivalent of saying, take that blue book

  • and just put it in the first location.

  • And you can kind of make a visual gap for it.

  • Now, this is the sorted side, this is the unsorted side.

  • Or, equivalently, when I was down here with the blue books,

  • the books in my hands were the sorted side, and everything still on the stage

  • was the unsorted side.

  • Same idea.

  • So, what happens next?

  • I then iterate one location next, and I remove the next element,

  • and whatever number that is, I figure out, does it go to the left

  • or does it go to the right?

  • Which was the same thing, again, on stage,

  • me sort of picking up a third blue book and deciding,

  • does it go in between these books?

  • Does it go below, does it go above?

  • I inserted it into its appropriate location.

  • So in this insertion sort algorithm, you sort of take each number

  • as you encounter it, and deal with it then and there.

  • You take the number and deal with it, so,

  • you know what, this one's got to go here, if we just kind of pretend what

  • the numbers look like for a moment.

  • So that would be inserting it into the right location,

  • like I did with the blue books.

  • Maybe this one-- oh, maybe this one's a really small number,

  • and so I insert it over here.

  • So I kind of literally deal with each problem as I encounter it,

  • but it just gets expensive, or very annoying,

  • to have to move all of this stuff out of the way

  • to make room for those elements.

  • And that's why I got bored with the blue book example,

  • because it was getting very tedious looking

  • through all of the blue books for the correct location.

  • So in short, all three of these algorithms,

  • while very differently expressed, and while all of them

  • are kind of intuitive until you try to describe what your human intuition has

  • actually been doing for the past some number of years

  • that you've been sorting things just in the real world--

  • they can all be described, at least, in terms of these algorithms.

  • So these algorithms-- and we started this conversation in the first

  • lecture--

  • all have, ultimately, some kind of running

  • time associated with them, like how long does it take to do something.

  • And we talked about the process of finding Mike Smith in terms

  • of this pretty generic graph.

  • It wasn't very mathematical, it wasn't very sophisticated-- we just

  • wanted to get a sense of the relationships, or tradeoffs, of space

  • and time, so to speak.

  • And so, on the x-axis, or horizontal, we have the size of the problem--

  • so, like, a number of pages in the phone book,

  • or number of people in the phone book--

  • and on the y-axis, or vertical, we had the amount of time

  • to solve the problem.

  • How many seconds, how many page turns-- you

  • could count using any unit of measure you like.

  • And the first algorithm for Mike Smith, when I started with the very first page

  • and turned, and turned, and turned, was a straight line, a linear relationship.

  • One more page, one more step.

  • So, it's straight line.

  • The next algorithm was searching by twos, recall, in the first lecture.

  • Two, four, six, eight.

  • And that's still a straight line, because there's

  • a predictable relationship between number of pages and number of seconds,

  • or page turns.

  • It's two to one instead of one to one, so it's still a straight line,

  • but it's lower on the graph.

  • It's better.

  • But the best one, by far, was the divide and conquer approach, we said.

  • Right?

  • And it certainly felt faster; it's great, because it was intuitive.

  • It wasn't quite as easy to express in pseudo code--

  • that was among the longer ones today--

  • but it at least got us to the answer faster.

  • And this is logarithmic, in the sense that the logarithm--

  • technically base 2, if you recall some of your math--

  • it's because you're dividing the problem in half, and half, and half.

  • And it's fine if you're uncomfortable with it,

  • don't even remember what a logarithm is.

  • For now, just assume that logarithmic time has this different shape to it.

  • It grows much more slowly.

  • Any time you can choose log of n over n, when picking between two algorithms,

  • go with the log n, or something like that,

  • because it is going to be smaller, as we can see visually here.

  • So, let's just consider something like bubble sort.

  • There's a couple of ways we can look at this.

  • And again, the goal here is not to be very mathematical--

  • we're not going to start doing proofs, but at least, by taking

  • a glance at some of the steps in this algorithm,

  • you can get a general sense of how slow or how fast an algorithm is.

  • And indeed, that's the goal.

  • There's this fancy notation we're about to see called asymptotic notation,

  • with special Greek characters.

  • But at the end of the day, we're really just trying

  • to get an intuitive sense of how good or bad an algorithm is, much like we

  • were trying to do with this picture.

  • But now we'll do it a little more conventionally,

  • as a computer scientist might.

  • So in bubble sort, recall that we compared every pair of humans

  • and made a swap if they were out of order.

  • And then we repeated.

  • And we repeated, and repeated.

  • And we kept going through the list.

  • So, that can be quantized-- like, you can kind of break that down

  • into some total number of steps.

  • So, if you've got n humans in front of the room,

  • and you want to compare them from left to right in pairs,

  • how many possible pairs are there as I walk

  • through the list for that first time?

  • If there's n elements, and I can put the stands back where they were.

  • How many pairs were there, as I walked from left to right?

  • I compared these two, these two, these two.

  • Yeah, there's n minus one.

  • Specifically seven.

  • And even if you're not quite sure where we're going with this,

  • if there's eight stands-- like, this is one, two, three, four, five, six,

  • seven--

  • and there's eight total.

  • So, that's indeed n minus 1.

  • So, that's how many comparisons we might have made

  • the first time through bubble sort.

  • But the very first time I went through the list in bubble sort

  • did the list get fully sorted?

  • No, we had to do it again.

  • We knew that 8 had bubbled up to the very end, so that was good.

  • 8 was in the right place.

  • But I had to do it again, and fix more problems along the way.

  • But the second time, I didn't need to go all the way through the list.

  • To be clear, who did I not need to look at?

  • The last location, or 8, in our very specific case.

  • So the second time through the list of humans,

  • I only have to make n minus 2 comparisons.

  • Right?

  • Because I can just ignore number 8, the final human, and just deal

  • with the seven other humans that are somehow misordered.

  • So, if I wanted to really be nitpicky and write this down, and count up

  • how many steps, or how many comparisons, I made,

  • we could generalize it like this.

  • All right?

  • It's going to be n minus 1, plus n minus 2, plus n minus 3, plus dot-dot-dot.

  • Or, more specifically, 7 plus 6 plus 5 plus 4-- this

  • is just the fancier formulaic way of saying the same thing.

  • Now, I don't remember my back-of-math-book formulas all that

  • well, but I remember this one.

  • You know, in your physics books, your math books, often in the hardcovers,

  • you'll see little cheat sheets for what series of numbers

  • actually sum to or multiply out to.

  • And so it turns out that that summation can actually

  • be expressed more succinctly as n times n minus 1, all divided by 2.

  • That is the same thing, mathematically, as the long series

  • that I had a dot-dot-dot there for.

  • So if I multiply this out, just using some algebra,

  • that's like n squared minus n, divided by 2.

  • And then if I kind of multiply that out, that's n squared, divided by 2,

  • minus n over 2.

  • So if I wanted to be really precise, this

  • is the running time of bubble sort.

  • It takes this many steps to sort n people.

  • Why?

  • Because I literally just counted the number of comparisons I made.

  • That's how many comparisons it takes to do bubble sort.

  • But honestly, this is really getting tedious,

  • and my eyes are already starting to glaze over.

  • I don't want to remember these algebraic formulas here.

  • So let's actually try an example, just to get a sense of how slow

  • or how fast this is.

  • Suppose that n were a million.

  • So not eight, but a million people, or a million numbers.

  • How slow, or fast, is bubble sort going to be?

  • Well, if we plug in a million, that's like saying n is a million.

  • So that's a million squared, divided by 2, minus a million divided by 2.

  • Because that's what it all summed up to be.

  • So if I do this out, it's a really big number.

  • 500 billion minus 500,000.

  • And in any other context, 500,000 is a pretty darn big number.

  • But not so much when you subtract it from 500 billion,

  • because you still get 499,999,500,000 after subtracting those off.

  • Which is to say, of those two terms, the one on the left versus the one

  • on the right, which is the bigger one?

  • The more dominating factor in the mathematical expression?

  • It's the one on the left, right?

  • That's the one that's massively bigger.

  • And so more generally, n squared divided by 2

  • feels like a bigger result than n divided by 2 alone.

  • And we've seen it by proof-- by example, which is not a proof,

  • but it at least gives us a feel for the size of the program,

  • or the number of comparisons we're actually making.

  • So you know what, ugh-- if that is the case, if the dominant factor--

  • the one on the left, in our case; the one with the square,

  • specifically-- is just so much more influential

  • on the number of comparisons we're going to make, then

  • let's just wave our hands at the lower-ordered terms,

  • and divide it by 2, and anything like that,

  • and just say, ugh, this algorithm feels like n squared.

  • I'm going to throw away the denominator, I'm

  • going to throw away the thing I'm subtracting off,

  • I'm going to throw away anything that is not the dominating factor,

  • which is the term that contributes the most to the total number of steps

  • incurred.

  • And indeed, this is what a computer scientist would describe, generally,

  • as the running time of this algorithm.

  • It is on the order of n squared.

  • It's not n squared, but it's on the order of n squared, as we've seen.

  • It's pretty darn close, and it's good enough for a conversation

  • with another reasonable person who wants to debate whether his or her algorithm

  • is maybe better or worse than yours.

  • So, this would be called big O notation.

  • Big O is used to refer to an upper bound on an algorithm's running time.

  • Upper bound meaning, we'll consider, for our purposes, in the worst case,

  • how much time might this algorithm take?

  • Well, it's going to take on the order of n squared steps.

  • Because if the list of numbers is unsorted initially,

  • we've got to do a lot of work to actually sort it.

  • There's other terms that we could put in those parentheses.

  • Some algorithms are not on the order of n squared.

  • Some algorithms are actually order of n log n;

  • some algorithms are on the order of n itself, or log n, or even 1,

  • where 1 refers to constant time.

  • And in fact, the ones I've highlighted here,

  • we've actually seen examples along the way of all of these so far.

  • For instance, what algorithm have we seen

  • that has a running time on the order of n?

  • n steps?

  • AUDIENCE: Linear search.

  • DAVID MALAN: Linear search.

  • If we were to think back, even to today, to linear search--

  • or from the first lecture, when I was just looking for Mike, really slowly,

  • one phone book page at a time, that's a linear algorithm.

  • If there's n pages, or n humans, it might take me on the order of n steps,

  • because Mike Smith--

  • S is toward the end of the alphabet, so he might be way over there,

  • or way toward the end of the phone book, or, God forbid,

  • his name starts with a Z, then I'm really

  • going to have to go all the way into the phone book.

  • And so that's on the order of n steps.

  • So, n here would be linear.

  • We've also seen another algorithm, here in yellow--

  • big O of log n.

  • Saw it just a moment ago.

  • Which of our algorithms was on the order of log n running time?

  • Yeah, so binary search.

  • Divide and conquer.

  • We didn't call it--

  • we didn't describe it this formulaically in the first lecture,

  • but that's how you would describe the running time.

  • Not just with a pretty picture, but just with an expression

  • like this, that all humans--

  • at least computer scientists-- can agree upon.

  • And constant time.

  • The funny thing here is, because we're throwing away

  • terms that don't really matter, O of 1 does not

  • mean an algorithm that takes one step only.

  • That would be a very limited number of options for your algorithms.

  • But it does mean, symbolically, a constant number of steps.

  • A constant number of steps.

  • So, what's something you might do that takes a constant number of steps,

  • in an algorithm?

  • Maybe in, like, the first lecture, we had the silly peanut butter

  • and jelly example.

  • Which of the steps that day might have taken big O

  • of 1 steps, a constant number of steps?

  • I remember one, like, insert knife into jar?

  • That's kind of one step.

  • Maybe it's two, because I might have to, like, pick up the knife,

  • and insert it into the jar then.

  • One step, two steps-- but it's a constant number.

  • The number of steps is not at all informed by the algorithm itself.

  • It just happens.

  • You do that in one step.

  • So, if there's any number of other algorithms here--

  • and we'll leave in white one that we'll come back to-- but let's

  • just consider the opposite of this, if you will.

  • If big O is our upper bound on running time, it turns out,

  • there's a vocabulary for discussing the lower bound on an algorithm's running

  • time.

  • Which, for our purposes, we'll generally consider in the context of best case.

  • So in the best case scenario, how little time might an algorithm take?

  • Well, this is a capital omega, and it's used in exactly the same way.

  • You just say, omega of n squared, or omega of n, or omega of log n.

  • So it's the same symbology, it just refers to a lower bound.

  • So, it takes this few steps, or this many steps.

  • Big O, big omega.

  • So, what's an algorithm, therefore, that is in, so to speak, omega of 1?

  • Like, what algorithm, in the best case, might actually take just one step?

  • And who is best to answer this question today in the room, in fact.

  • What algorithm could take one step?

  • Yeah.

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: Yeah, linear search could take omega of one steps.

  • Because in the best case, it is right there.

  • Or in Chrissy's case, even if our algorithm

  • is to sort of choose randomly, in the best case,

  • it is right there, the number 50.

  • So even her algorithm, and even our linear search algorithm--

  • and for that matter, even our binary search algorithm--

  • are in omega of 1, at least in the best case.

  • Because if you get lucky, you're done.

  • One step.

  • By contrast, what is an algorithm that takes at least n steps?

  • So, omega of n?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: That's a good one.

  • So you say bubble sort, I heard.

  • AUDIENCE: Yes.

  • DAVID MALAN: Why?

  • AUDIENCE: Because if they're all already in order,

  • you just go through each comparison, and then make no swaps.

  • DAVID MALAN: Good.

  • So in the case of bubble sort, where we generally

  • had a lot more work to do than just finding something

  • with a searching algorithm, bubble sort is, minimally, an omega event

  • you need at least on the order of n steps-- maybe it's n minus 1,

  • or n minus 2-- but it's on the order of n.

  • Why?

  • Because only once you go through the list at least once do

  • you know-- what, to be clear?

  • AUDIENCE: That they're all in order.

  • DAVID MALAN: That they're in order.

  • And you know that as a side effect of having not made any swaps.

  • So, you can only determine that a list is sorted in the first place

  • by spending at least n steps on that process.

  • Excellent.

  • So, there's yet another one, and this is the last,

  • whereby if you happen to have an algorithm, or a scenario

  • where the upper bound and the lower bound are the same--

  • turns out there's a symbol for that too; you can just describe the algorithm

  • in terms of theta notation.

  • That just means theta of n, theta of log n-- whatever it is,

  • that just means upper bound and lower bound are one and the same.

  • And there's more formalities to this, and you can actually dive in deeper

  • to this in a theory class.

  • But for our purposes, big O and omega will be a generally useful way

  • of describing, generally speaking, just what the running time of an algorithm

  • actually is.

  • So, big O of n squared is the fastest we've seen thus far.

  • Unfortunately, it does actually tend to run pretty slowly.

  • We saw it with an example of, like, 500 billion steps just

  • to sort a million elements.

  • Turns out we can do way better than that.

  • Much like in the first lecture, when I crazily proposed,

  • I think, suppose your phone book had, like, four billion pages--

  • well, you only need 32 steps using binary search,

  • instead of four billion steps using linear search.

  • So, it would be nice if, after all of this discussion of algorithms

  • and introduction of these formalities, if we can actually do better.

  • And it turns out that we can do better, but this

  • has been a lot to do already, so let's go ahead in here

  • and take a five-minute. break.

  • And when we come back, we'll blow out of the water the performance of all three

  • of those algorithms we just saw.

  • All right.

  • So, let's take a quick look at what these algorithms look like,

  • so we can actually compare them against something

  • that I claim is actually going to end up being better.

  • OK.

  • So, here we have an array of numbers represented as vertical bars.

  • So, small bar is small number; tall bar is big number.

  • And so it's a nice way to visualize what otherwise is pretty low level

  • numbers alone.

  • I'm going to go ahead here and make the animation pretty fast,

  • and I'm going to go ahead here and choose, for instance, bubble sort.

  • And, actually, let me slow it down a little bit,

  • just for the sake of discussion.

  • So, you'll see, in this algorithm, bubble sort.

  • It's making multiple passes through the list, just as I did,

  • highlighting in pink two neighbors at a time,

  • and deciding whether or not to swap them,

  • just as we were, with our eight volunteers, doing the exact same thing.

  • Of course, with this visualization, we can do it more quickly,

  • and we can speed this up to the point where you can kind of now

  • start to feel the bubbling effect, if you will, whereby the bigger

  • numbers are bubbling up to the top, to the right, just as the number 8 did,

  • when we did it on paper.

  • So, this is bubble sort.

  • And we could watch this for quite some time, and in some sense,

  • it's kind of mesmerizing.

  • But in another sense, it's pretty underwhelming,

  • because at the end of the day, all you're going to get

  • is a bunch of bars, sorted, from short bars to big bars.

  • But perhaps the takeaway is that I'd kind of

  • have to stall here for a decent amount of time,

  • even though we're running this at the fastest speed, because it's only

  • fixing, at best, one number at a time.

  • And maybe some others are improving, but we're only moving all the way

  • to the end one number at a time.

  • And we have to then go back, and go back, and go back, and do more work.

  • It's going to be very ungratifying to abort it, but let's go back to random.

  • And now, if we choose, for instance, selection sort,

  • you'll see that the algorithm works a little differently.

  • Let me slow it down.

  • And what it's doing now, which is a little less obvious,

  • is it's looking through the list for the next smallest element,

  • and it's going to put it at the beginning of the list.

  • All the way at the left.

  • So, it's looking, and looking, and looking,

  • and it leaves highlighted in red the most recently discovered

  • smallest element.

  • And then as soon as it gets to the end of the list,

  • it's going to move that smallest element all the way to the left.

  • So that we now, kind of like the opposite of bubble sort,

  • have all of the smallest elements to the left.

  • Though, this is arbitrary.

  • We could bubble up the small elements by just reversing

  • the order of our operations; we could sort from biggest

  • to smallest-- that is irrelevant.

  • It's just by human convention we tend to sort from smallest to biggest, at least

  • in examples like this.

  • And we can speed this up, but it doesn't quite have quite the same comparison

  • effect, because all you're doing is a swoop through the list looking

  • for the smallest, looking for the smallest, looking for the smallest.

  • And so, this way, it's going to build up from short to tall.

  • Let me go ahead and do it one more time, this time with insertion sort,

  • and slow it down.

  • And so, what we're doing here is the following.

  • We identify the next element, and then we go and insert it into the place

  • it belongs in the "sorted half" of the list.

  • So, recall that I generally describe stuff on the left as being sorted,

  • stuff on the right as being unsorted, and the implication

  • of that is that even though these numbers here on the left

  • are indeed sorted, when I encounter a new number, out in the unsorted area,

  • I might have to move some things around and shuffle things around.

  • And unlike the cheat I was doing here in person--

  • when I grabbed that music stand before and just kind of moved it over here--

  • that's not really legitimate.

  • Right?

  • This is garbage value land.

  • Like, I should not have had access to this memory.

  • And so what we did with our actual eight humans was more legitimate.

  • The fact that our volunteers did the physical labor

  • of moving those numbers around?

  • That was the low-level work that the computer has to do, too.

  • And you see it here all the more, either at this slow speed or the faster speed.

  • It's being inserted into the appropriate location.

  • So, case in point, this tiny little element?

  • We have to do a huge amount of work to find its location,

  • until finally, we've found it, and now we do the same thing.

  • So, all of these have some pluses and some minuses.

  • But it turns out, with merge sort, we can do even better.

  • An algorithm that goes by the name of merge sort.

  • But to do better, we need to have a new ingredient, or at least more formally

  • defined, that we've kind of sort of leverage before, but not by name.

  • And to do this, I'm going actually take out a little bit of code, in CS50 IDE,

  • a program called sigma-0.c.

  • And we'll see the interconnection in just a moment.

  • So in this program, notice the following.

  • We have a main function, whose purpose in life

  • is to get a positive integer from the user,

  • and to pester him or her, again and again

  • and again, until they cooperate and provide a positive integer.

  • That's what the do-while loop is often useful for.

  • And then, what do we do with it?

  • We simply pass that value, n, that the human typed in,

  • via get int, to a function called sigma.

  • And sigma is like the Greek character, or the capital E-looking character,

  • that generally means, in math, like, sum a bunch of numbers.

  • Add a bunch of numbers together.

  • So, this is essentially a function called sigma, whose purpose in life

  • is to sum all of the numbers from 0 to n.

  • Or, equivalently, from 1 to n.

  • So, 1 plus 2 plus 3 plus 4 plus, dot-dot-dot, n, whatever n

  • happens to be.

  • And then, via printf, we're just printing it out.

  • So, let me just run this program to make super clear what's going on.

  • And I can do this by doing, of course, in my source three directory for today,

  • make sigma 0, enter, dot slash sigma 0, positive integer, I will do 2.

  • So by that definition, it should sum 0 plus 1 plus 2, so 1

  • plus 2-- that should be 3.

  • So I should see 3.

  • And indeed, I see 3.

  • Let's do one more, if I add in three numbers.

  • So, this should be 1 plus 2 plus 3-- so, that's 1-- that's 6, in total.

  • And so forth.

  • And they get pretty big pretty quickly.

  • If I do 50, then we start to get into the thousands.

  • So, that's all it's doing.

  • And how is it doing this?

  • Well, we could implement this in a whole bunch of ways,

  • but if we leverage some of our sort of techniques thus far,

  • we might do it using a for loop.

  • That's kind of been one of the most common tools in our toolkit.

  • And how am I using it here?

  • I'm first declaring a variable called sum, initializing it to 0,

  • because I've done no work yet.

  • Then I have a for loop, for i equals 1 all the way up through m.

  • Why m?

  • Well, just because.

  • Recall that when you make your own function, whether in Scratch or in C,

  • you get to decide what to call the inputs to that function.

  • The arguments, or parameters, as they're called.

  • And just for clarity, I called it m, even

  • though we've typically been using n.

  • I could have called it anything I want.

  • I just wanted to make super clear it's a different variable.

  • But more on that in a week or so.

  • And so I'm just counting from one to m, and I'm adding to sum whatever i is.

  • Now, just as a quick check, why am I not doing

  • sum plus plus, as I usually do in these kinds of cases?

  • AUDIENCE: Because you're not incrementing by [INAUDIBLE]..

  • DAVID MALAN: Exactly.

  • I'm not incrementing by 1, I'm incrementing by 1, and then by 2,

  • and then by 3, and then by 4, and so forth.

  • So I need this loop to be counting up, and I

  • need to be adding i to the sum, not just a plus plus, in this case.

  • Then I return the sum.

  • And so, this is an example of an abstraction.

  • Like, I now have a function called sigma--

  • just like in math, you might have the big capital sigma

  • symbol that just says, add all these numbers together,

  • I have a C function that does that.

  • And so now, higher up in my code, I can call that function

  • and then print out the answer.

  • But it turns out that this simple function lends itself

  • to a nice example of another programming technique, something called recursion.

  • And we won't have terribly many opportunities in CS50

  • to apply this technique, but we will toward semester's end.

  • If you continue on to a class like CS51, you'll use it all the time.

  • If you use another type of programming language,

  • you'll very often use this technique.

  • And it's called recursion, and it looks like this.

  • Let me go ahead and open up another file that's

  • available on the course's website called sigma 1.

  • Notice that main is identical.

  • So, main is identical.

  • And indeed, it's still calling a function called sigma,

  • and then using printf to print out the answer.

  • So there's no difference there.

  • But what is different, in this next version, is the code for sigma.

  • So, what's going on here?

  • It still takes, as input, an integer called m.

  • So that's good, because I need to know what to sum up to.

  • It returns an integer, as before.

  • And it amazingly has, like--

  • what, four real lines of code, plus some curly braces?

  • And even those lines of code are super short.

  • And there's no additional variables, and there's this weird, crazy logic here.

  • But let's see what it's doing, first and foremost.

  • On line 23, I'm saying if m is less than or equal to 0, return 0.

  • Now, why does this make sense?

  • Well, I only want to support positive numbers, or non-negative numbers,

  • from 0 to m.

  • And so I just kind of need an error check there, right?

  • If the human somehow passes into this function negative 50 or something else,

  • I don't want the function to freak out and give unpredictable behavior,

  • I just want it to return 0, in cases of error

  • or when the number gets that small as to hit 0 or even lower.

  • So this, I'm going to call base case.

  • It's just, like, this sanity check, like,

  • don't let the math go beyond this point of 0 or less.

  • So, amazingly, if you really zoom in on this code, the entirety of this program

  • really boils down to one line.

  • And what's going on here?

  • So, I am returning, from sigma, an answer.

  • But, curiously, my answer is kind of defined in terms of itself,

  • which generally is a bad idea.

  • Right?

  • It's like in English, if you try to define

  • a word by using the word in the definition,

  • usually someone calls you on that, because it's not

  • all that helpful to use a word in the definition of the word.

  • And that's the same idea, at first glance, of recursion.

  • You are using the same function to solve a problem that

  • was supposed to be solved by that function in the first place.

  • So, what do I mean by that?

  • Main, of course, is calling sigma, and that means this code

  • down here that we've been looking at gets executed.

  • So, suppose that we hit this line of code.

  • What recursion allows us to do, in this case,

  • is take a bite out of the problem, and then defer to someone else

  • to figure out the rest of the problem.

  • So, what do we mean by that?

  • Well, sigma, again, is just this process of adding up all the numbers between 0

  • and some number, m.

  • So, 1 plus 2 plus 3 plus dot-dot-dot.

  • So, you know what?

  • I don't want to do all that work, as I did in version 0 with my for loop.

  • Let me just do a piece of that work.

  • And how do I do that?

  • Well, you know what, when you ask me, what is the sum from 0 to m?

  • I'm going to be kind of circular about it, and be like, well,

  • it's the answer of--

  • the answer is m, the biggest number you handed me, plus the sum of everything

  • below it.

  • Right?

  • So, if you passed in the number 10, it's like saying, well, sigma of 10

  • is 10 plus sigma of nine, and, like, leave me alone.

  • I don't want to do the rest of the math.

  • But, because you're calling the same function again, that's actually OK.

  • A function can call itself, because if you

  • think about where the story is going, now sigma gets called, in this story,

  • with sigma of 9.

  • What does the code do?

  • Well, sigma of nine returns 9 plus whatever sigma of 8 is.

  • So we're not solving the whole problem.

  • We're handing back a 10, plus a 9--

  • and if we keep going, plus an 8, plus a 7, plus a 6.

  • But we're not going to do this forever.

  • Even though I'm using sigma in my implementation of my answer, under what

  • circumstances am I not calling sigma?

  • AUDIENCE: If m equals 0.

  • DAVID MALAN: If m equals 0, or is even less than 0-- which shouldn't happen,

  • but just to be sure, I made sure it can't, with the less than or equal to.

  • So eventually, you're going to ask me, what is sigma of 0?

  • And I'm not going to be difficult about it, I'm just going to say 0.

  • And no longer do I keep passing the buck to that same function.

  • And so even though it takes a while to get to that point in the story--

  • because we say 10 plus sigma of 9, sigma of 9

  • is 9 plus sigma of 8, which is sigma of 8 plus sigma of 7--

  • like, it keeps going and going and going.

  • But if you kind of mentally buffer, so to speak,

  • much like a video in your browser, all of those numbers

  • that you're being handed back, one at a time--

  • which are, technically, being added together

  • for you by your program with the plus operator-- the last number you're

  • going to be handed back is zero, and at that point, all of the plus signs

  • can just kind of kick in and give you back

  • whatever number you're actually looking for.

  • So, recursion is the act of a function calling itself.

  • Which is very, very, very bad, unless you have a base case that ensures that

  • eventually, as you take bites out of the problem,

  • you will handle, with a special case, so to speak--

  • a base case-- a small piece of the puzzle, and just hand

  • back a hard-coded answer, to make sure that this

  • doesn't happen infinitely many times.

  • So, any questions on this principle, of a function being able to call itself?

  • Yeah.

  • AUDIENCE: So, the base case here was when m equals 0?

  • DAVID MALAN: When m equals 0 or is even less than zero, just to be sure.

  • But yes, when m equals zero.

  • Indeed.

  • So, let's see.

  • If you're comfortable, at least, with the fact-- oh, and actually,

  • there's a good little geek humor now--

  • if you go to Google.com, and suppose you wonder,

  • you're wondering what recursion is, especially a few hours from now.

  • Well, you can Google it, and then the computer scientists at Google--

  • there you go.

  • OK, so if you're laughing, you get it, which is great.

  • So that, then, is recursion.

  • Something giving you back an answer in terms of itself.

  • So, why is this useful?

  • Well, it turns out we can leverage this now

  • to solve a problem if we know that we can actually convert it to code.

  • We'll focus less on the actual implementation and more on the idea,

  • but let's see if we can't wrap our minds around the problem

  • to be solved with this code.

  • This is merge sort, in pseudo code.

  • And again, like all the pseudo code we've ever written,

  • you could write this in bunches of different ways.

  • Here's one such way.

  • Notice, the first thing, on input of n elements-- so, n numbers, n blue books,

  • n whatever-- go ahead and do the following.

  • If n is less than 2, return.

  • So it's a little different from the condition I had a moment ago,

  • but the context here is sorting, it's not summing.

  • So, why is it logically OK to say, if n is less than 2, just return?

  • Yeah, that's just itself.

  • If it's less than 2, that means there's only one blue book, or maybe even

  • 0, so in either case, there's no work to be done.

  • Just return.

  • The list is sorted, however short it is.

  • But if it's longer than that, you might have to do some work,

  • and actually do some actual sorting.

  • So, what happens then?

  • So, else-- you know what?

  • Sort the left half of the elements, and then sort

  • the right half of the elements, and then, OK, merge them together.

  • So it's the same kind of, like, blase attitude, where,

  • like, ah-- if you ask me to sort something, I'm just going to tell you,

  • well, you go sort the left, then you go sort the right,

  • and then we'll just merge the results back together.

  • And this is cyclical in the sense that, how

  • do you sort the left half of anything?

  • You need a sorting algorithm.

  • But this is the sorting algorithm.

  • So this is like saying, use merge sort to sort the left half,

  • use merge sort to sort the right half, and then merge them together.

  • Merging doesn't really need to be a fancy algorithm;

  • merging is like, if you've got one pile of numbers here

  • that are sorted, one pile of numbers here that's sorted,

  • you can just kind of eyeball them and grab the appropriate one to kind

  • of interleave them in the right order.

  • That's what we mean by merging.

  • So, how in the world is this even correct?

  • Because we haven't actually done any apparent work, in this way.

  • There's no loops, there's no comparisons, it seems.

  • It's just completely recursively defined, so to speak.

  • Well, let's see what actually this means.

  • And this is a sequence of visualizations that

  • can potentially fall off the story of.

  • So I'll try to go slowly, but not so slowly that the example itself

  • is boring.

  • We'll just go through this once, and then

  • again, the slides are online, if you kind of

  • want to step through the visualization.

  • So, here is a list of 8 numbers, the same 8 numbers,

  • that we looked at before.

  • I've drawn them contiguously, as though they are in an array.

  • This list is currently of size 8.

  • So an input of 8 elements is the beginning of this story.

  • What was the first step in our algorithm?

  • Well, we were going to check, if n is less than 2, return.

  • That is irrelevant, because n is 8, not less than 2.

  • So that's a moot point.

  • So, the first three things for me to do to sort this list

  • is to sort the left half, then sort the right half,

  • then to merge the sorted halves.

  • OK, so let's see how we get there.

  • So here's the list, here is the left half,

  • and I need to sort the left half, apparently.

  • How do I do that?

  • Well, how do you sort a list of four elements?

  • AUDIENCE: Break it up again?

  • DAVID MALAN: Break it up again.

  • Sort the left half, then its right half, then merge those two halves together.

  • So let me do that.

  • I'm going to draw a box around only the elements we're

  • thinking about at the moment.

  • So, let me look at the left half.

  • OK, now I need to sort this list.

  • How do I sort a list of size 2?

  • It's actually 2, it's not less than 2.

  • So I have to do some work.

  • So, how do you sort a list of size 2?

  • It's a little strange to say it, but--

  • sort the left half, then sort the right half, then merge the two.

  • And at this point in the story, you may very well

  • be lost, because we literally just keep saying the same thing,

  • and not actually doing any work.

  • But think of it like you're buffering these instructions.

  • Like, I've said to sort the left half, then the right half,

  • but you focused on just the left half for now.

  • But unfortunately, you got a little distracted,

  • because now to sort the left half, you have to sort the left half,

  • so you have to do a little more work.

  • So if you just kind of let this mental baggage build up,

  • we have to remember to go back through it.

  • But we've not actually done the real work yet.

  • We're about to now.

  • Because now that you've told me, given a list of size 2, sort the left half,

  • here's where we bottom out with that base case

  • and actually start to make some progress.

  • So here's 4 and 2, a list of size 2.

  • Let's sort the left half.

  • How do you sort a list of size 1?

  • You don't, right?

  • Because n is 1; 1, of course, is less than 2,

  • and what was the one instruction that we had at the top of this function

  • merge sort?

  • Just return.

  • Like, do nothing.

  • So, OK, everyone.

  • I have now sorted the number 4 for you.

  • Like, it's true, it's kind of a foolish statement,

  • but the magic must therefore come when we combine the results.

  • So, let's see where the story goes.

  • I've sorted the left half-- done.

  • Return.

  • Now, what was I supposed to do next?

  • Now I have to sort the right half of that list of size 2.

  • OK, done.

  • What's the third step at this point in the story?

  • Merge them.

  • So I'm now looking at a list of size 2 again, each of whose halves is sorted--

  • according to the crazy logic we're using here--

  • but now, something interesting happens.

  • I have on the left the number 4, I have on the right

  • the number 2, and each of these lists is of size 1.

  • And if you vary, in your mind's eye, or just visually, with my fingers,

  • consider, like, your left hand pointing at the first list,

  • your right hand pointing at the second list, the process of merging numbers

  • is just comparing what your fingers are pointing at and deciding

  • which one comes first.

  • Obviously 2 is going to come first, so in a moment,

  • we'll see that 2 should move over here, and then

  • there's nothing left for my right hand.

  • It's done.

  • So, 4 is obviously going to go here.

  • And that process of merging 2 followed by 4 is what we mean by merging.

  • It's pretty much what I was doing with insertion sort,

  • but here we're just doing it with individual elements at a time, kind

  • of weaving things together, or zipping things together,

  • like with a zipper, if you think of it that.

  • Way.

  • So, now, let me grab 2 and put it here.

  • Let me grab 4 and put it here.

  • OK.

  • So I sorted left half, I sorted right half, I merged them--

  • how do we unwind the story?

  • Where did we leave off?

  • AUDIENCE: Sort the right half.

  • DAVID MALAN: Now we have to sort the right half that was, like,

  • a minute ago in the story-- which, just to highlight it now, is the 7 and 5.

  • So now I have to do the same thing.

  • I'm sorting a list, of size 2, that happens to be on the right of the left.

  • So now, I sort the left half, done.

  • Sort the right half, done.

  • I now have to merge the two together.

  • So now my hands have to do some work, but I'll just do it from over here.

  • 5 goes down, then 7 goes down.

  • And at this point in the story, we have sorted the left half of the left half,

  • and the right half of the left half.

  • So, what point in the story are we at now?

  • Right.

  • We're-- now we have-- well, we did the right half just now.

  • We now have to merge the two halves together.

  • And, frankly, if you do this at home, if you want to kind of retrace your steps,

  • literally just write down a to-do list, like, from top to bottom

  • on the sheet of paper.

  • And then as you do something, cross it off, and go back

  • to the previous thing in the list, you would actually see, or feel,

  • even more what it was, that mental baggage

  • you were accumulating that you need to attend to.

  • But now I have two lists of size 2, so let's do the finger thing again here.

  • So, I start pointing at the left list, start pointing at the right list.

  • The first number to merge in is, presumably, going to be 2.

  • Then what comes after that?

  • I'm going to update my left finger, so now 1--

  • my left hand's pointing at the 4, at this point; my right hand, still

  • pointing at the 5, so which comes next?

  • 4.

  • There's no more work for my left hand, so it's probably

  • going to be pretty trivial--

  • 5 and 7.

  • But I do need to do the merging.

  • It looks merged already, but we have to do it.

  • And I'm going to do it in some new space, just as before.

  • So, 2 and 4 and 5 and 7.

  • And now you can really see it for the first time.

  • The left half of the original list is finally sorted.

  • Unfortunately, like three minutes ago is when we started the story.

  • And now we need to unwind, in our mind, to go back to the original right half.

  • So if you think about it now, even though I've said a lot of words,

  • this is technically the second step in our algorithm.

  • Or at least the first invocation thereof.

  • All right, so we'll do it a little faster, but same idea.

  • Sort the left half.

  • How do I do that?

  • Sort the left half, then the right half, which are stupidly easy and dumb,

  • but now I have to merge 6 and 8.

  • So, merging in this case didn't have much effect,

  • but it needed to be done to be sure.

  • Next, let's sort the right half of the right half.

  • Now I'm going to sort the left, sort the right.

  • Now the key step is merging.

  • And now we're doing some actual work.

  • And now we really have some work to be done--

  • now we have to sort the left half and the right half of the original right

  • half.

  • So it's 1, then 3, then 6, then 8.

  • Now we're finally, almost at the end.

  • Now what do we do with these?

  • Now we have two halves, the original left and the original right,

  • and you can think of the fingers as doing the work again.

  • 1 is going to go here, 2 is going to go here, 3 is going to go here, then 4--

  • and I constantly compare where my fingers are pointing,

  • but my fingers are constantly moving from left to right.

  • As soon as I deal with a number, it advances to the next number

  • in the list.

  • So it's obviously going to be, now, 1, 2, 3, 4, 5, 6.

  • But notice, if you imagine my fingers doing this work,

  • they're constantly moving toward the right, to the end of the list.

  • So, as soon as my fingers hit the ends of those lists, I must be done merging.

  • And voila.

  • We've now sorted the elements.

  • It's a huge number of words, and it would be a nightmare

  • to kind of do it with humans, because there's just so much going on,

  • and you have to remember, or buffer, so many of those steps.

  • But in the end, we've done something that is kind of captured even

  • by this picture.

  • So it turns out that merge sort, even though it sounds like a long story,

  • is fundamentally faster, and it's fundamentally faster because we're

  • dividing the problem in half, as we have been doing with binary search,

  • in the phone book example even days ago.

  • So if we look on the screen, you can kind of see the remnants of work

  • that we've done.

  • Like, how many times did we move the elements, from one row to another?

  • They started up here, then they eventually made their way here,

  • and then here, and then here.

  • So that's one, two, three movements of the letters, or of the numbers,

  • in memory, if you will.

  • So if you imagine each of these rows as just a different chunk

  • of memory and RAM, I'm just moving things around in memory.

  • So, three is just a number.

  • But it turns out, and if we did more general cases of this,

  • turns out that log base 2 of n, where n is 8--

  • 8 is the number of elements we started with-- log base 2 of 8 is 3.

  • And so indeed-- and if you'll take on faith for now,

  • so that we don't have to go through an even bigger example,

  • to show it even more--

  • the number of times we move the numbers is going to equal,

  • turns out, log base 2 of n.

  • Which, in this case, happens to be 3.

  • And so that, then, invites the question-- on each

  • of the rows, every time you move these numbers into a new location in memory,

  • how many times are you touching that number while it's in that position?

  • Or, how many times, equivalently, are you looking

  • at it, to do something about it?

  • What do I mean by this?

  • Well, the movement from top to bottom was happening anytime

  • we did the merging.

  • Right?

  • We would move the numbers from here to here.

  • But as soon as we did that, we had to do some work, with the left pointer

  • and right pointer.

  • I needed to then merge those together.

  • And I emphasized earlier that anytime I'm comparing numbers,

  • my left hand and right hand are constantly

  • advancing from left and right.

  • I never double back.

  • Much like I constantly was doubling back with bubble sort, insertion sort,

  • selection sort-- there was so much damn comparison going on,

  • it felt like a lot of work, and it physically was.

  • But here, you know, merging, I'm moving things around,

  • but my hands are constantly moving forward, looking at, on each row,

  • n numbers total.

  • My left hand or right hand pointed at each of the numbers once.

  • Never doubled back.

  • So, it was never n plus 1, or 2 n, it was just n.

  • So, we have log n movements of the numbers, in memory.

  • And every time we do that, we merge them from left to right, effectively

  • touching each number once.

  • So we're doing n things log n times.

  • And so, that would be mathematically equal to n log n.

  • So, again, even if you're not super comfy with logarithms,

  • you do know, from our picture, with the straight lines and the curved line,

  • that which is smaller?

  • Log of n, or n, generally speaking?

  • AUDIENCE: Log of n.

  • DAVID MALAN: Like, log of n is smaller, right?

  • That's why the green line was lower, and it was also curved.

  • It was below the linear line n.

  • So, generally speaking, the bigger n gets, the more slowly log n grows.

  • And again, if you just take on faith that this

  • is a mathematical expression that communicates the time required

  • to do something, it's less.

  • So, which, therefore, is smaller?

  • N squared, which of course is n times n?

  • Or n log n?

  • AUDIENCE: N log n.

  • DAVID MALAN: N log n.

  • So, we've now found an algorithm that's unlike all of the others we've seen.

  • And even though it took a while to explain, and even though, frankly,

  • you might have to kind of sift through it again to really wrap your mind

  • around it-- it took me a while, too--

  • it is fundamentally faster as well.

  • So, just to take one other stab at this, let me show one other perspective.

  • At least if you're more mathematically comfortable it after today,

  • if you're worried that this is way more math than you were expecting,

  • realize we very quickly abstract away from these details,

  • and we start to wave our hands using big 0 and big omega.

  • Let's consider how we could look at this a different way.

  • If the picture wasn't really working for you,

  • let's see if we can just, like, jot down how many steps each

  • of these lines of code is.

  • And there's not many lines of code here, so it

  • shouldn't be a very long expression.

  • So, how long does it take to decide if n is less than 2?

  • And, if so, return?

  • Well, you're past a bunch of numbers, so, you know,

  • I'm going to call it constant time.

  • Like, you know how many numbers you've been handed--

  • nope, it's not less than 2, or yes, it is.

  • You're just answering yes or no.

  • Constant time.

  • Big O of one.

  • All right?

  • So I'm going to describe that as this.

  • This is the formal way of saying it.

  • T of n, which is just how much time does it take, given a problem of size n--

  • just a fancy way of saying that.

  • It's on the order of one step.

  • Maybe it's two, maybe it's three, because you kind of

  • got to look at something.

  • But it's a fixed number of steps to decide,

  • there are fewer than n elements in front of me.

  • It's not going to take you very long.

  • So, that piece of the puzzle takes big O of one step.

  • So now, we have three other questions to ask.

  • That's, like, kind of a throwaway.

  • That's really quick, if it's just one step, or two steps, or three steps.

  • So, are these the expensive things?

  • Well, let's see.

  • Sort the left half of elements.

  • Well, here, too, I can be kind of clever and propose the following.

  • You know what?

  • The amount of time required to sort n elements

  • is technically equal to the amount of time

  • it takes to sort half of those elements, plus the amount of time required

  • to sort the other half of those elements, plus, to be fair,

  • some merging time.

  • And it's essentially n, but I'm going to generalize it as big O of n,

  • because I did have to move my hands around.

  • But again, the key thing was, my hands were constantly moving to the right.

  • There was no looping back and again, and again, like with the other algorithms.

  • So it's, like, n steps to do the merging.

  • If I've got 4 numbers here, 4 numbers here,

  • I have to touch a total of 8 elements.

  • 8 is n, so it feels like, yes, on the order of n steps to do the merging.

  • Unfortunately, this is like a recursive answer

  • to the question of how efficient is merge sort.

  • But that's kind of consistent with the fact

  • that merge sort is being implemented recursively in this code.

  • And it turns out here, too, if you've got

  • one of those old-school textbooks that's got a cheat sheet in the front

  • or the back of your physics or your math book,

  • this is a series that you can actually-- that mathematicians know actually

  • sum up to something known, which is n times log n.

  • And we won't go into the weeds of why that is, mathematically,

  • but if you take a problem of size n, and add the running

  • time for first half, second half, and then add an n,

  • this is what, mathematically, it ends up being.

  • And so, if you're more comfortable with that,

  • realize that this derives from just counting up of those several steps.

  • And ultimately, this is much better than that.

  • And in fact, we can kind of feel this here.

  • You'll be able to feel it even better with other data sets,

  • but let me go ahead and reload here, and go ahead, at the same speed

  • as we were before, choosing merge sort.

  • Let me fit it onto the screen at once.

  • Actually, we should speed it up to the original.

  • So, what is it doing?

  • It's using a bit of extra memory, just as we were on the screen,

  • using some additional space.

  • But notice, as it does that work, it's kind of moving things back and forth.

  • And it's actually saving space.

  • Even though I used log n amount of memory by keep moving it,

  • this was stupid.

  • Like, I didn't need to keep using more memory, more memory, more

  • memory, because I wasn't using the stuff anymore above.

  • So with merge sort, you really just need twice as much memory

  • as those other algorithms, because the first time you need to move them,

  • move them here.

  • And then, even though I did it visually, deliberately

  • to move it to yet another location, just keep moving things back

  • and forth as needed.

  • And that's what's happening with that algorithm there.

  • It's not quite as clear to see with this visualization, so let

  • me open up this other one here.

  • Now, go.

  • And you'll see merge sort all the way on the right--

  • done.

  • All right, so, insertion sort got a little lucky here,

  • just because of the order of the elements, and the size of the dataset

  • isn't that big, which is why I wanted to show the other one.

  • But if we do it once more, you'll see, again, that merge sort

  • is pretty darn quick.

  • And you can see it doing things in halves.

  • And selection sort, and bubble sort, are still doing their thing.

  • And if we did this using not, like, what is that--

  • 10, 20, bars total, but 100 bars?

  • Or a million bars?

  • You would really, really feel the difference,

  • just as we did with the phone book example as well.

  • Any questions there on that?

  • And we won't walk through the code, but if you're

  • curious to actually see how some of these ideas map to C code,

  • you will find, in the CS50 appliance, in the source code from today,

  • a couple of files-- binary zero and binary one in linear.c, all of which

  • implement binary search and linear search in a couple of different ways,

  • if you actually want to see how those map.

  • But what we thought we would do, in our remaining time today,

  • is tee up one of the next algorithmic challenges.

  • It turns out that there are wonderful opportunities in computer science

  • to intersect with other fields-- among them

  • the arts, and very specifically, music.

  • And it turns out that music, whether you're an audiophile or even

  • a musical theoretician, there are relationships

  • among the sounds that we hear and the rate at which we hear those notes.

  • Which is to say, they follow patterns.

  • And these patterns can be produced by computers,

  • they can be generated by computers, and what we'll do,

  • ultimately, in problem set three, in fact,

  • is introduce you to a bit of the musical world,

  • whether you have some prior experience or not.

  • And Brian, if you wouldn't mind coming up just to assist with this teaser.

  • Here are just a few keys from a keyboard,

  • and here are 88 keys from an actual keyboard.

  • And Brian will help us, in just a moment, with some of these definitions.

  • But you'll see here that there are eight keys, or one, two, three, four, five,

  • six, seven white keys and five black keys on the board.

  • And it turns out that mankind--

  • at least, in Western music, years ago-- decided

  • to standardize on how we describe these keys.

  • And we assigned letters to them.

  • And you might have heard of middle C, even if you've never

  • played a piano before, and you might think of that as being the leftmost key

  • there.

  • And then it's D, E, F, G, and then A, B. And of course, on a real piano,

  • there's keys to the left, and there's keys to the right.

  • Do you want to play what C might sound like here?

  • [PIANO PLAYS]

  • So, that's C. And then, if you want to-- very well done.

  • [APPLAUSE]

  • Do you want to go all the way up through the scale to B?

  • [PIANO PLAYS]

  • That's kind of unresolved, too, because what should have come next--

  • [PIANO PLAYS]

  • That would be another C. And so what Brian's played for us is

  • a full octave, now, referring to eight.

  • So, C to C inclusive, in this case.

  • And those of us who are kind of are familiar with music,

  • or like listening to certain music, you'll

  • notice that certain things sound good.

  • And there's actually mathematical and formulaic, or algorithmic, reasons

  • that some of these sounds sound actually good.

  • But what about these black keys?

  • They actually can be defined in a couple of different ways.

  • And if you've ever heard of flats, or sharps--

  • Brian, do you want to explain what the relationship now

  • is among the white keys and the black keys, and how they sound different?

  • BRIAN: Yeah, sure.

  • So, a bit of terminology first.

  • A semi-tone is just the distance from one note to the note

  • immediately after that, both white and black notes included.

  • And all it means for something to be sharp,

  • represented by the hashtag or pound sign up there, is take a note

  • and move it up by one semi-tone.

  • So, if we start with C, and make that note sharp, to C sharp,

  • we move one semi-tone to the note immediately

  • after it, which is that black note in between C and D. So, that's C sharp.

  • And likewise, if we add E sharp, that is one semi-tone, or the note immediately

  • after, E, which in this case, is the same thing as F. So,

  • F and E sharp are the same note.

  • And in the meantime, flat is just the opposite of that.

  • If sharp means move up one semi-tone, flat means move down one semi-tone.

  • So if I have E, E flat is one semi-tone moving left on the piano keyboard.

  • DAVID MALAN: And so even though a typical piano keyboard wouldn't

  • be labeled as such, it does follow a pattern,

  • and it does repeat, to the left and to the right as well.

  • And so as you learn to play piano, you learn what these notes sound like,

  • you learn where these keys are, and you also

  • learn, ultimately, how to read music, which might look like this.

  • This is a familiar song, now officially in the public domain.

  • And you'll see here that there are these little shapes called notes,

  • or little circles, that happen to be on specific lines.

  • And it turns out that if a note is on one line,

  • it might represent the note A; if it's on a different line,

  • higher above or down below, it might represent B or C or D or E or F or G.

  • And if there is a sharp symbol, or a flat symbol, in front of it,

  • that might shift it ever so slightly, so that you're actually

  • touching, in many cases, one of the black keys as well.

  • Which is to say that once you have the vocabulary,

  • and you know what the alphabet is to which you have access,

  • can you start to write it out, much like we write computer programs.

  • But this is what a musician would actually see.

  • And just to give us maybe a teaser of what you can actually

  • do when you take into account the different sounds of notes,

  • and the different pace at which you play notes,

  • can you give us a little something more than just a scale?

  • BRIAN: Sure.

  • [PIANO PLAYS]

  • [APPLAUSE]

  • DAVID MALAN: So, if you're a little worried what

  • we're getting into, not only computer science and programming,

  • but now music--

  • I am absolutely among the least comfortable with this,

  • and this is why Brian has kindly joined us here today.

  • But it'll be fun, we hope, ultimately, to explore

  • these relationships, and also the intersection of one field with another.

  • And to now tie these topics together, we thought

  • we'd end by looking at a short visualization

  • here, about a minute's worth of computer-generated sounds, that

  • give you not just a visual feel of some of the algorithms

  • and others that we've looked at today, but also associate sounds

  • with the operations of moving things, swapping things,

  • and ultimately touching bigger and smaller numbers digitally.

  • So, here we have, up first, insertion sort.

  • [COMPUTER SOUNDS]

  • Again, it's inserting into the right place the number.

  • This, now, is bubble sort.

  • And again, you can both see, and now kind of feel, all of the swaps

  • that it's doing.

  • We will get the satisfaction this time.

  • This, now, is selection sort, whereby you

  • go through the list selecting the next smallest element,

  • and plop it in its place.

  • And the bars are getting higher and higher, just like the notes,

  • or the frequencies.

  • This, now, is merge sort.

  • And notice the halves that are developing.

  • And this is by far the most gratifying sound, at the end of this one.

  • This is gnome sort, which we didn't look at,

  • but very distinctly has a different shape, too.

  • It's not quite as ordered as the others.

  • And that, then, are algorithms.

  • And Brian, play us out for today.

  • Otherwise, we will see you next week.

[MUSIC PLAYING]

Subtitles and vocabulary

Click the word to look it up Click the word to find further inforamtion about it