Subtitles section Play video Print subtitles [MUSIC PLAYING] DAVID MALAN: All right. This is CS50, and this is lecture three. And today, the goal is to be a lot more algorithmic than code-oriented, and to really kind of come back to where we started in lecture zero, when we talked about computational thinking, and algorithms. Because now you have a few weeks under your belt, and you have a few new tools, and hopefully skills, under your belt, even though odds are, you're still getting comfortable with some of those skills. And so let's look, then, at this piece of hardware that we talked about a couple times. And this is an example of what? AUDIENCE: Memory. DAVID MALAN: Yeah, memory, or RAM, random access memory. And this happens to be pretty big on the screen, but it's actually pretty small, if you were to find it in your Mac, or PC, in your laptop, or some other device. And this is just memory. And so, memory is places you can store information-- numbers and characters and strings, and bigger things still. And recall that if you want to use this memory, you need to address it. You need to put something in particular locations. And we're going to start doing this all the more So in particular, there's at least, like, five-- four black chips on here, which is actually where the zeros and ones are stored, in the green circuit board. And little gold connectors you see on the screen, too-- that just allows all of those black chips to intercommunicate, ultimately, and for the zeros and ones to move around. But if we zoom in on just one of these chips of memory, arbitrarily, and then zoom in here, you can think of this, wonderfully enough, as a rectangle. Could be any shape, certainly, but if we think about it as a rectangle, we can divide this chip of memory up into a grid, like this. Completely arbitrary how I've drawn this, but the fact, now, that I have a whole bunch of small boxes-- you can start to think of this as being one byte, and then another byte, and then another byte. And I coincidentally wrote eight bytes across, but it could be any number. It doesn't really matter. And so, you can start to think about these bytes as having specific locations, or numbers, or addresses, as we're going to start calling them. So if I zoom in on the top of this, you could start to think of the top left as being byte number 0, because computer scientists generally start counting at zero, and then byte 1, and 2, and 3, and 4, and dot dot dot. And then, just as a refresher, in a typical Mac or PC these days, how much RAM, or memory, would you typically have? AUDIENCE: Four? DAVID MALAN: Four what? AUDIENCE: Four gigabytes. DAVID MALAN: Four gigabytes. And giga means billion, so that's four billion bytes. And maybe you have a little less, maybe you have a little more, but on the order of billions of bytes. So we're only scratching the surface of just how much storage there is, and how many numbers and characters you can store. But the point is that we can address them, in some way-- zero, one, two, and so forth. So this is how we stored Stelios' name in memory. Recall, this was an example last time, where we needed to put the characters of his name somewhere. And so we actually put the S, and then the T, and then so forth, starting at the leftmost location on forward. But we also put something else. It's not sufficient just to put the letters of his name in memory. What else did we have to do? AUDIENCE: Put a zero in there? DAVID MALAN: Yeah, the zero command, or more specifically, the zero byte, or the so-called null byte-- N-U-L. And it's represented as backslash 0. Because if you were just to type a normal zero, that would technically be a character from your keyboard. It looks like a number, but it's technically a character, a char. And so backslash zero literally means 8-0 bits in that byte's location. OK. So now that we have this ability to represent Stelios with a sequence of characters, null-terminated at the end, we don't really need to worry about the hardware anymore. Right? We can abstract away from that just as we did in week zero, and not worry about how those zeros and ones are stored. We can just trust that they are, somehow. And so we can abstract away, and just start thinking about it as a sequence, or a grid, of letters like this. But that backslash zero is important, because it is the only way that this language we're currently using, C, knows where strings end. If you didn't have that, it would-- printf, for instance, might just keep printing all of the contents, all four gigabytes, of memory that your computer has, no matter what those characters actually are. And then, of course, you couldn't distinguish Stelios' name from Maria, or someone else altogether in memory. So, let me go ahead and open up the IDE, just to demonstrate now what you can do with this kind of knowledge. Suppose I wanted to write a program that allows me to let a user type in his or her name, and then I just want to print out his or her initials. And for simplicity, I'm going to assume that the person's initials, or whatever the capitalized letters are, in the input they type. So, I have to be a little nit-picky, and type in my name properly. But with that said, let me go ahead and whip this up. So, I'm going to save this as initials.c. Just out of habit, I'm going to start by including the CS50 library, because I'm going to want to get a string from the user. I'm going to go ahead and include standard io.h, so that I can print something to the screen. And we'll decide later if we need anything more. I don't need any command line arguments for the purpose of this program, so I'm going to go back to our old version of main, where you just specify void-- no argc, no argv. And then here, let me go ahead and declare a string called s, and get a name from the user, as with, "name," and get string. And then, what do I want to do after this? I want to go ahead and iterate over the characters in the user's name, and print them out only if they're uppercase. But you know what, rather than just print them out, you know what I want to do? I actually want to create a new string, myself, containing the user's initials. So, I don't want to go ahead and just print out, with percent c, one character after the other-- rather, I want to go ahead and store the user's actual initials in a new string, so that I've got one string for their name, and one string for their initials. And, ladies and gentlemen, Ian. SPEAKER 2: Sorry for the video glitches. DAVID MALAN: Thanks. All right. So, to be ever more clear, let me go ahead and rename this to name, literally, and then I want to have a string for their initials. But we know what a string is, as of last time. It's really just a sequence of characters. And a sequence really has another name in programming. What is another synonym we've used for a sequence of something? AUDIENCE: [INAUDIBLE] DAVID MALAN: An array. An array is that data structure we had when we started using square bracket notation. So, if I actually kind of roll this back and break the abstraction, if you will-- don't think about it as a string. What is a string? It's a sequence of characters. Technically, I could say char, and then I could say initials, and then I can specify however many letters I want to support in a human's initials. And by-- assuming we have a first name, maybe a middle name, and a last name, three characters should do it. Three characters. So, David J. Malan, DJM, three characters. Is that enough chars? AUDIENCE: [INAUDIBLE] DAVID MALAN: I'm hesitating, because it doesn't-- AUDIENCE: [INAUDIBLE] DAVID MALAN: It's not, but why? AUDIENCE: You need for the null character. DAVID MALAN: Yeah. So if we want to terminate even my initials-- which isn't technically a word, but it's certainly a string, it's a sequence of characters-- we need a fourth character, or we need to anticipate a fourth character, so that whatever we put in the computer's memory is technically, in my case-- like, D-J-M backslash 0. You need that fourth byte. Otherwise you do not have room to actually terminate the string. So, now, even though this doesn't look like a string, insofar as I'm not saying the word string, it really is. It's a sequence of characters of size four that I can put characters in. Now, what are the characters in this array, by default, have we said? When you just declare a variable of some size, what values go in it? AUDIENCE: [INAUDIBLE] DAVID MALAN: Sometimes zeros, but generally, what's the better rule of thumb? AUDIENCE: You don't know. DAVID MALAN: Yeah, you don't know. It's so-called garbage values. Nothing-- you should not trust the value of a variable, generally speaking, unless you yourself have put the value there, as by storing it, with the assignment operator, or by manually typing it in yourself. So, just to be clear, if I wanted this program to be kind of useless for anyone except myself, I could actually do this-- I could go ahead and do-- initials, bracket 0, get "d", initials, bracket 1, get "j", and then finally initials, bracket 2, get "m". And then lastly, and this is the thing you might forget sometimes, you actually need to do the backslash zero there. But of course, this is not at all dynamic. But I have, in this these lines of code now, created a new string called initials. It's of length-- it's of human length three, DJM, but the computer is actually using 4 bytes to store it. But this is not the point of the exercise, because we already asked the user for his or her name. I need to now figure what that is. So just logically, or algorithmically, if you will, what process could we use, given a name like David J. Malan, or Brian Yu, or anyone else's name-- how could we look at that input and figure out what the user's initials are? What's the thought process? Let me go a little farther back. So, David J. Malan, or any other name. What's the process? What do you think? AUDIENCE: [INAUDIBLE] DAVID MALAN: OK, good! So, iterate with a for loop over the letters in the name-- and you've done that before, or in the process of doing that, for something like caesar or vigenere, most likely. And then you can use something like is upper, or you can even do asciimath, like we did a while ago, to actually determine, is it in the range of A to Z capitals on both ends? So we have a couple of options. So, let me try to convert that to code. Let me get rid of the hard-coded name, which is just kind of nonsensical, but demonstrative of how we could store arbitrary characters. And now let me do this. For int i get 0, i is less than the string length of name, i plus plus-- and then I'm going to do something like this. If the i-th character character in name is an uppercase letter-- and I'm going to use a function that you might not have used yourself, but recall that it does exist-- is upper will return, essentially, true or false, this is an uppercase letter-- so, if this is uppercase, I'm going to go ahead and do what? Well, the story is not quite complete. It's not enough to just iterate over the names and-- the letters in the name-- we now need to decide where to put the first capitalized letter that we find. It's obviously going to go in the initials array, but what more do I need to add to my code to know where, in this block of four characters, to put the first character, D, in the case of my name? Yeah. AUDIENCE: [INAUDIBLE] initials i? DAVID MALAN: Initials i. OK, so if I do that-- good thought. So let's do, initials, bracket i, gets name bracket i-- that would seem to put the i-th character of name at the -th location in initials, which at the moment is perfectly correct, because i is 0. And if I typed in David J. Malan, D is at the zeroth location. So, we're good. But there's going to be a bug here. AUDIENCE: [INAUDIBLE] continue to the name, then you'll have less slots. DAVID MALAN: Exactly. I is going to continue marching forward, one character at a time, through the entire name of the user, but you only want to index-- you don't want to keep doing that same amount in the initials array, because again, the initials is much smaller. So even as i moves through the user's name, you want to take baby steps, so to speak, through the initials. So, how can we solve this? I can't use i, it would seem. AUDIENCE: You could use a variable that's like a [INAUDIBLE] DAVID MALAN: OK, great. Let's do that. We need another variable. So, I could put this in a few different places, but I'm going to go ahead and put it here for now. So, int counter gets zero-- I'm just initializing my counter to zero-- and then as soon as I won't find an uppercase letter, I think I want to do this? Put it at whatever the value of counter is? And then there's one more step. What do I need to do once I, yeah, put it at the counter location? AUDIENCE: Increment counter by one. DAVID MALAN: Exactly. Increment counter by one. So, I can do this in a few ways. I can do it very literally-- counter gets counter plus one, semi-colon. It's a little annoying to type. You could do plus equals one, which is slightly more succinct. Or, the ever-prettier, plus plus, which achieves the same result. Fewer characters, same exact result, in this case. OK, so now, I think we have a for loop that iterates over all the letters in the name. If it's an uppercase letter, it stores that letter, and only that letter, in the initials array, and then increments counter so that the next letter is going to go at the next location in the initials array. So, if all that seems to be correct-- ultimately, I want to do this-- I want to go ahead and print out percent s, backslash n, initials. I want to just print the user's initials. But there's one key step in this blank line that I should not forget. What's the very last thing I need to do? Yeah. AUDIENCE: You need to print a null character [INAUDIBLE] put a null character at the end of the [INAUDIBLE].. DAVID MALAN: Exactly. I need to put a null character at the end of the array. So, how do I do that? Well, I have the syntax, and I think-- you know, I want to say, end of array-- but how can I do that? What values should I put here? Yeah. DAVID MALAN: The string length name? DAVID MALAN: Yeah, I could do strlen of name, well, not of name-- AUDIENCE: Of the initials. DAVID MALAN: The initials, but now, you kind of got me in a circular argument, because I'm trying to-- it's kind of a catch-22 now. I am trying to store a backslash n at the end of the string. But recall from last time, the way strlen knows where the end of the string is, is by putting the backslash 0. So, it's not there yet. So we can't use strlen. But, we already have, I think, the solution-- AUDIENCE: Can't you just put initials four? [INAUDIBLE] DAVID MALAN: OK. So, we could absolutely do that, or almost that. It's not quite four. One tweak here, yeah? AUDIENCE: [INAUDIBLE] DAVID MALAN: In back, yeah? AUDIENCE: [INAUDIBLE] DAVID MALAN: OK, good. So, actually-- so, counter-- is, yeah. Spoiler, counter is going to be the answer. But let me just fix this one bug here. Four was the right intuition, but remember, if you have four characters possible, it's 0, 1, 2, 3, is the very last one. So, we needed to fix that to 3. But even more general would be to put counter, because the value of counter is literally the length of the string we just read into the initials. And so if we want to terminate that string, we already know how many characters we've counted up. And in fact, it would technically be wrong to just blindly put backslash zero only at the very end of these four characters. Why? In what situation would that be-- logic be incorrect? Yeah? AUDIENCE: If someone has more than three initials. DAVID MALAN: If they have more than three initials, we really have a problem, because we don't have space for anything beyond three actual letters and a null terminator. And there's another problem, potentially. Yeah? AUDIENCE: If they don't have a middle name? DAVID MALAN: Yeah, if they don't have a middle name, there's going to be, only maybe two letters, first name and last name. And so, you're putting the backslash zero at the end of your array, which is good. But what's going to be that third value, that second to last value, in the array? AUDIENCE: A garbage value. It's just garbage, so to speak. And so it could be some funky character, you might get weird symbols on the screen-- you don't know, because it's just incorrect. The backslash zero has to go at the very end of the string. So, let me go ahead, and if I've made no syntax errors here-- let me go ahead now and save this, and go ahead and do make initials-- OK, so, hmm. Implicitly declaring library function strlen with type unsigned long. So, there's a lot of words, only some of which I kind of recognize immediately. I could run this through help50, which should be your first instinct, on your own or in office hours. But let's see if we can't tease apart what the error actually is. What did I forget to do? Yeah. AUDIENCE: Uh, [INAUDIBLE] DAVID MALAN: Yeah, I needed the string library, which now I'll start to get in the habit of more. Anytime I'm using strlen, just try to remember to use string, otherwise you'll see that error message again and again. Maybe there's more errors, but I'm not sure, so I'm going to go ahead and just recompile, because again, you might have more errors on the screen than you actually have in reality. But there is indeed one more. So, similar error, but different function. Implicit declaration of function is upper. So that's, again, same mistake. I've forgotten something. Anyone recall? This one's a little more-- less common, I would say. Yeah. AUDIENCE: You need the character type library. DAVID MALAN: Yeah, the character type, or C type, library. So again, you'd only know this by checking the [? man ?] page, so to speak, or reference.cs50.net, or checking your notes, or whatnot, or Google. And so that's fine. It happens to be in Ctype.h. Now, let me go ahead and recompile. Seems to work, so, initials-- let me go ahead now and type David J. Malan, enter-- seems to work. Let me try a corner case, like Rob Bowden, without his middle name. RB. And so, it seems to work. This is not a rigorous testing process, but let's trust that we did at least get the fundamentals right. But the key here is that we broke this abstraction, so to speak, of what a string is. Because if you understand that, well, a string is just a sequence of characters, and hey, a string must end with a backslash zero, you can do that yourself. And this is what you can do in C. You can put characters anywhere you want in memory, you can put numbers anywhere you want in memory. And this ultimately gives you a lot of power and flexibility, but also, as you'll soon see, a lot of opportunities for bugs, with this power. All right, any questions on that particular example? Yeah. AUDIENCE: Can you explain the initials counter? DAVID MALAN: Sure. AUDIENCE: [INAUDIBLE] DAVID MALAN: Sure. Let's explain the initials counter, which is used here, and is declared up here. So, we essentially want to keep track of two locations. If my name is sort of of this length, I'm going to start at the first character, D, and with i, iterate from D-A-V-I-D, space, and so forth. So, that one just kind of marches on, one step at a time. The initials array, meanwhile, is another chunk of memory. It's of size four, somewhere else in that green silicon chip, with all those black chips on it. And we want to move through it more slowly, because we only want to move to the next location in the initials array once we've put a capital letter in it. And that's going to be less frequent, presumably, unless the user typed in all caps. So, in order to achieve that, we need a second variable. And you proposed that we use a variable called counter. But we could have called it j, or something else. And so, the counter is initialized to zero, and it is incremented here any time we encounter a capital letter. So, it has the effect of literally counting the capital letters in someone's name: D, J, M. And so it should be 3, at the end of those loops. And that's perfect, because as we realized earlier, 3 happens to be the correct location for where you put the backslash 0, even though it wants to go in the fourth location, the address of it, or the location is technically 3. So, we sort of solve multiple problems at once, in this way. Good question. Other questions? Other questions? All right. So. With that said, let's not worry as much about that low-level kind of implementation, and consider what more we can do with these things called arrays. If we start to get rid of the addresses, and just know that we have sequence of characters, or anything else in memory, and really, at the end of the day, we have the ability to lay things out contiguously in memory. Back to back to back to back, like this. So, here are, then, eight boxes, inside of which we can put anything. But we've kind of been cheating as humans for some time. When we had Stelios' name on the screen, all of us in this room just kind of glance up, and you kind of absorb his name all in one fell swoop. But it turns out that computers are not quite as intuitive, or as all-seeing as we are, where we can sort of take in the whole room, visually. A computer can only look at one thing at a time. And so, a better metaphor than a box like this for the locations that you have in your computer's memory would really be like eight lockers in a school, where a computer, in order to look at the value in any of those boxes, actually has to do a bit of work and open the door to see what's inside. And you cannot, therefore, see all eight locations simultaneously. So, this is going to have very real world implications, because now, if we want to start checking the length of a string, or counting the number of things in an array, or moving things around, we're going to have to do one thing at a time. Whereas we humans might just kind of look at it and say, oh, sort these numbers in some intuitive way. And that actually is a good segue to a very different problem, which is that of sorting values. So, for this, we have, say, the equivalent of seven lockers, up here now. And behind these doors, so to speak, is a whole bunch of numbers. So we'll transition away from characters and strings, and just generalize it to numbers, because they're convenient to work with, and we have so many of them at our disposal. But I'd like to find one number in particular. So, suppose that a computer were storing an array of numbers, a whole bunch of integers, back to back to back to back. Here's what they might look like. The doors are closed, though, so we need an algorithm for finding a number, because a computer can't just look at it and say, there's your number there. The computer has to be more methodical, probably going from left to right, from right to left, starting in the middle, randomly opening them. We need an algorithm. So, for that-- could we get one brave volunteer? OK, come on down. What's your name? CHRISSY: Chrissy. DAVID MALAN: Kristen? CHRISSY: Chrissy. DAVID MALAN: Chrissy. Come on down, over this way. All right, so Chrissy, all you know is that there are seven doors-- nice to meet you-- here on the screen. And using your finger, you should be able to just touch a door to open, or unlock it, and we'll see what it is. And I would like you to find the number 50. Dammit. [LAUGHTER] Very good! [APPLAUSE] I feel-- we need a better prize than a stress ball for that. But very nicely done. And let me ask you, if we can-- if you don't mind me putting the mic in your hand-- here you go. So, what was your amazing algorithm for finding 50? CHRISSY: I just clicked on it. DAVID MALAN: OK, that's great. CHRISSY: It looks nice. DAVID MALAN: OK. So, you-- OK. So, good. So, wonderfully effective. Let me go ahead and reveal-- actually, let's do this. So, here's where you could have gone wrong any number of places. And let me ask the audience before we try one other example. What strikes you about these numbers? Anything in particular? AUDIENCE: They're unordered. DAVID MALAN: They're all in order? AUDIENCE: Unordered. DAVID MALAN: Unordered. Kind of random. Although, the very astute might notice something. Or, someone who watches too much TV. Yeah. AUDIENCE: They're the numbers from Lost. DAVID MALAN: Yes, thank you. The two of us watch a lot of Lost. So-- plus we had a seventh number, so we added ours. So, they are actually in random order. I just kind of shuffled them arbitrarily. They're not sorted from smallest to biggest, they're not sorted from biggest to smallest. There's really no pattern, because I really just did shuffle them up. And that's problematic, because even though Chrissy got lucky, finding 50-- was just so good at finding 50-- it might be hard to replicate that algorithm and find it every time, unless you know a little something about the numbers behind the doors. And so, we do know that in this next example. So, in this next example, there are still seven doors, and still the same seven numbers, but now they're sorted. And knowing that, does that change your approach? CHRISSY: Uh, well, I guess if they're sorted, like, lowest to highest, then it would, because I would know it's closer to the end. But if I don't know how it's sorted, then-- I guess I wouldn't really know. DAVID MALAN: OK, good. So let me stipulate they're sorted from smallest to biggest, and let me propose that you, again, find us the number 50. Yeah, this is not working out very well. That is very well done. OK, so, congratulations. [APPLAUSE] So, you'll see-- thank you. So, you'll see just how much more efficient her second algorithm was, when she leveraged that information. But in all seriousness, you could do better, especially for very large datasets, if you know something about the data. If you know that these numbers are sorted, you could, as Chrissy did very intuitively and very correctly, go to the very end, knowing that that's the biggest number, it's probably all the way on the right. If she didn't know that, and just knew that the numbers were sorted, and did not know if 50 was a small number, a medium number, the largest number-- just it was a number behind doors. What would a smart strategy be, given that lesser information? AUDIENCE: [INAUDIBLE] halfway, and then if it's greater, you move it to the right, if it's less, move it left. DAVID MALAN: Yeah, we can try that same divide and conquer approach we did in the very first class, with the phone book, right? Where we looked roughly in the middle, because we know that the Ms, give or take, would be in the middle. And then if the-- we're looking for Mike Smith, whose name starts with an S, we know that he would be to the right, and so we would go to the right, and then to divide the problem-- [LAUGHTER] Today is not going very well. So, we would divide and conquer the problem again and again. And we can do that here. Even though it's not quite as visually engaging as a phone book, you can kind of go to the middle. And if Chrissy had opened that middle door, and seen 16, you would know-- what? And actually, I can recreate this. I'm just going to refresh the screen. So in this case, we have the same doors. I know 50's somewhere, and I don't know how-- where it is, but 16. Now I know it's to the right, as you propose. So, now I can essentially eliminate all these doors, not even worry about them. And indeed, if I open them, we can confirm that I don't need to waste any time actually searching them. Now I've got three doors. What would you propose we do next? AUDIENCE: Go in the middle again? DAVID MALAN: Yeah, go in the middle again. So here, 42 is a great answer, but it's not the one we're looking for. And indeed, we can throw this half of the problem away, and finally search the right half, which now has been whittled down to one, and then we would find 50. And if I can reconstruct what would have been a great history, in the first example, how well might Chrissy have done theoretically, or if we did this exercise again and again and again, with the first example. If you don't know anything about the numbers, you can get lucky, as one might by just choosing a door and, wow, that happens to be the number. But that's not going to happen all the time, most likely. And so you might have to just start, you know, maybe at the beginning-- no-- no-- no. Maybe you can get clever and skip ahead-- no-- no-- OK, eventually, you will find it, if it's actually there. But if you don't know anything about the numbers, the best you can do is what's called brute force. Just brute force your way through the possibilities until you find the answer. But in the worst case, how many doors might I have to open to find the number 50, if I knew nothing about them? AUDIENCE: All seven. DAVID MALAN: Yes, all seven. Right? If n is the number of doors, it might take me as many as n steps. In this case, it was like n minus one, or six steps. But in Chrissy's case, clearly, there's a really exciting lower bound, because if you do get lucky, it might only take you one step. So, that's an interesting range to consider. The solution to your problem might take one step, or n steps, or any number in between. But the binary search that you proposed in the second approach, where you divide and conquer-- recall that when we did that in the past, we had a very nice shape to the curve that we drew that described the efficiency of that algorithm. And we'll come back to that before long. But let's just, for clarity, formalize just what these two algorithms are. If I start from the left and go right, or start from the right and go left, following a line, we would call that linear search. And it can be written in any number of ways. I came up with this pseudo code, but again, any reasonable person could come up with an alternative one and say it, too, is linear search. These are not official definitions. And I wrote it as follows. For each element in array, if the element you're looking for, return true. So, this is a very concise way of saying, for each element in the array, just look at it from left to right, or right to left. If it's the one you're looking for, return true. I found the number 50, or whatever it actually is. Otherwise, return false at the very end. And notice the indentation. I un-indented it because only as the very, very last step do I return false, if none of my iterations through the loop actually return true. So, that would be linear search. Binary search, you have to be a little more verbose to explain it, perhaps. And there's many different ways to write this out, but this is very similar in spirit to something we saw before, with Mike Smith and the phone book. So if we go ahead and look at the middle of the sorted array, just as you proposed, and if the element you're looking for is right there, go ahead and return true. I found 50. I got lucky, it was dead center in the middle of my array, in one particular running of this program. Else, if the element is to the left, search the left half of the array. Else, if it's to the right, search the right half of the array. Otherwise, return false, because it's presumably not there. So, this would be binary search. And even though it's more lines of code, or pseudo code, it arguably should be a little faster, right? Because of that dividing and conquering, and throwing half, half, half, half, half, half of the problem away, the problem gets smaller much, much more quickly. All right. So with that said, it seems to be a very good thing that having things sorted for you is a very powerful ingredient to a problem. Right? It takes more work-- should have taken Chrissy more work; would take all of us, in general, more work to find a number using linear search than by using binary search. Much like it would have taken me forever to find Mike Smith flipping one phone page at a time, versus using divide and conquer, where I found him much more quickly by dividing the problem in half, and half, and half again. So, that invites the question, how do you get something sorted? Well, let me go ahead and pull up these things, which we happen not to use in this class, but elsewhere on campus, you might have these blue books. And at exam time, you might write your name, and the class, and all that on them. And we've kind of simplified, so, this is a person whose last name starts with A, last name-- a person's name starts with B, C, and all the way to Z. But suppose that people finish at different times during the exam, and of course, when you're in the science center or wherever, everyone just starts handing them in, and they kind of come in like this, and then the TFs at the front of the room, or professor, has to actually sort through these values. Well, what's this algorithm going to be that we actually use? If you've got a pile of exam books like this, all of them have a letter associated with them, how would we go about sorting these? What do I do? AUDIENCE: Compare two at a time. DAVID MALAN: Compare two at a time? OK, so let me go ahead and pick up a couple here. And since I'm the only one that can see them, you should be able to see them on the overhead, thanks to Ian and Scully. So, P and H, so, H goes before P, alphabetically. All right, now what do I do? Pick up another two? AUDIENCE: Pick up one. DAVID MALAN: Yeah, so maybe just pick up one, and I happened to get N, so, that goes in between H and P. So I can kind of slide it in. Now I picked up G. That goes before H. Now I picked up another one, E. That goes before G. So, it's actually getting kind of easy. Uh-oh, U-- this goes at the end, after P. And so, I can keep grabbing one at a time, F in this case, and then kind of insert it into its appropriate location. And we won't do this the whole way, because it's going to get tedious, and eventually I'm going to embarrass myself by getting one of the letters wrong. But that's an algorithm, right? For each book in the pile, pick it up, compare it to all of the elements you're already holding, and insert it into the appropriate location. And we can actually call that something, and that's something called insertion sort, insofar as the emphasis of the algorithm is-- thank you-- on inserting letters, in this case, into their appropriate location. So, with that said, are there other ways than that? Let's go ahead, here-- and we have enough stress balls to do it with just one more human demo here, beyond these books-- these numbers here. Suppose we wanted to sort these. I have eight stress balls, eight pieces of paper-- could we get eight other volunteers? So, yes, right and back, two three-- let's go farther back-- four. Can I get farther back? Five, six, seven, and eight. Come on up. Ah, next time. Next time. All right, come on down. OK. What's your name? JERRY: Jerry. DAVID MALAN: Jerry. OK, If you want to go ahead and step in front of a board there. ARMAH: [? Armah. ?] DAVID MALAN: [? Armah, ?] David. CHRIS: Chris. DAVID MALAN: Chris, David. Thank you. KAYLIND: Kaylind. DAVID MALAN: Kaylind. NOLAN: Nolan. DAVID MALAN: Nolan, David. JAY: Jay. DAVID MALAN: David. MATTHEW: Matthew. DAVID MALAN: Matthew, David. OK. So, this is statistically anomalous, insofar as we're actually very excited to say, in CS50, this semester, for the first time ever-- we watch these numbers annually-- and we actually have 44% women in CS50 this year. So, as you can see, none of my demonstrations are going correctly today. But trust in that data. So if each of you could stand behind one of the music stands here-- hopefully I have counted out exactly eight people. We have, in front of you guys, numbers. So go ahead and turn around the pieces of paper, which represent the numbers that we have on the board here, if I got the same order right. So, there's going to be a bunch of different ways we can sort these eight values. So, how do we go about doing this? Well, much like you proposed earlier-- just pick up a pair of blue books and compare them-- why don't I try that same intuition? So, four and two-- these numbers are obviously out of order. So, what do I want to go ahead and do? Yeah, so I can go ahead and swap these. And now, problem is solved, which is great. I've taken a bite out of the problem. And then we move on. Four and seven? Those look OK. Seven and five? Not OK. So, what do I want to do here? AUDIENCE: Switch them. DAVID MALAN: Yeah, so I can swap those, thank you. So, seven and six? Also out of order. Let's swap those. Seven and eight? That's good. Eight and three? Not correct, so if you want to go ahead and swap those. And, eight and one, if you want to go ahead and swap those. So, what was the net effect? Have I sorted this list of numbers? So, obviously not. But I did improve it. And what's the most glaring example, perhaps, of the improvement, in front of these? AUDIENCE: Eight's at the end. DAVID MALAN: Eight is now at the very end. So the biggest number, if you will, bubbled up to the end, as though it was bigger and sort of bubbled up. So that's good, but there's still work to be done here. So, I can, again, just try to fix these problems locally. Just pick up a couple of problems and try to solve it. So, two and four? We're good. Four and five? Five and six? Six and seven? Ooh, seven and three? If you guys want to swap those? Wonderful. And then, seven and one, we want to do it again. And now, do I need to bother comparing seven and eight? Technically no, right, because if we know eight made its way-- now we can start cutting some corners, but correctly, just to shave some time off of the algorithm, make it a little more efficient. Good. So, sorted now? No, so we have to keep doing this. And let me let you guys now execute this, pair-wise at a time. So, here we go. Two, four. Four, five. Five, six. Six, three. Six, one. And we can stop there, because we know seven is in order. Now we do it again. Two and four. Four and five. Five and three? Five and one? Improved, good. And now, next, two and four, four and three, four and one, and then two and three, three and one-- and then two and one-- OK, now it's sorted. Yes, very well done. Very well done. So, it's kind of tedious, frankly, and I didn't want to keep walking back and forth, because thankfully we have, like, all of this-- this manpower, these multiple CPUs, I guess, literally today. So, we have all of these CPUs, or computers, that are able to help me out. But it was still very slow. I mean, it's a long story. So, let's rewind just once and do one more. If you guys could rearrange your pieces of paper so that it matches the screen again, just to reset to our original location. Let's go back there. And let's try one other approach. I've tried the insertion approach, whereby I just take a problem, like the blue book, and insert it into its correct location. But honestly, that was getting a little tedious, and that's why I aborted, because it's going to take me longer, and longer, and longer to find the right location among all of those blue books. So, let me try just a more intuitive one. If I want to sort these numbers, let me just go and select the smallest number I see. OK, four, at the moment, is the smallest number I see. So I'm going to grab it for just now. Now, again, these are lockers. Even though we humans can see them all, the computer can only see one location at a time. And this is, indeed, the smallest number I have seen thus far. So I'm going to grab it. But very quickly, I can abort that, because now I've found an even smaller number. So I'm going to hang onto that instead. Seven, I'm not going to worry about that; five, six, eight, three, one-- I've found a smaller number. Now, I need to kind of do something with this. I want to grab the one, so I could just put the two here. And what do I want to do now with the number one? Yeah, I kind of just want to put it here. And so, I can do this in a few different ways, but you know what, I'm just going to evict whoever's here, because it's a pretty low number, but it's also random. It could have been a big number. So let me just make room for it and do that. I have selected the smallest element. Is the list sorted? I mean, obviously not. But is it better? It is, right? Because the one is now, at least, in the right location. So I've solved one eighth of the problem so far. So that's not bad. What could I do next? Let me apply the same algorithm. Let me select the smallest, which is currently this one, still this one, still this one-- nope, three is smaller-- oh, two is even smaller, so let me ultimately grab this. And you know what, four, you really don't need to be there; I'm just going to evict you again, and put two where it belongs, and move this one over here. So now, the list is even more sorted. And if I proceed to do this again and again, if you want to just keep handing me the smallest number-- three? OK, so I'm going to go ahead and just evict seven, because it's kind of a random number anyway. And now, thank you, four-- I'm going to go ahead and evict five, even though, kind of feels like I'm making a little bit of work for myself, on average, it's not going to matter. Sometimes it will be good, sometimes it'll be bad. So let me go ahead and put five over here. Now I need to select the next smallest element, which happens to be five. So we recovered pretty quickly. So I'm going to evict six over here. Now I'm going to look for the smallest element. Now it's indeed six. I'm going to evict eight, put this over here-- and now seven is good, eight is good-- done. But it's still kind of a long story, right? Like, I'm going back and forth, looking for these numbers. But this would be what we'd call selection sort. So thank you all very much-- if you'd like to keep your pieces of paper, you're welcome to. And let me give you guys a stress ball as well. And a round of applause, if we could, for you guys. If you want to hand those out. So, as before, let's see if we can apply-- let's see if we can apply some pseudo code to this algorithm, because it's one thing to talk about it, and it's one thing to sort of reason through it intuitively, but at the end of the day, if you want to program this, we've got to be more precise, and we've got to consider the lower level operations that the computer is going to execute. So, here's how we might implement bubble sort. Repeat until no swaps. For i, from 0 to n minus 2-- and n is just the size of the problem, the number of doors, the number of humans, the number of numbers, whatever the input to the problem actually is. So, for i, from 0 to n minus 2, if the i-th and the i-th plus 1 elements are out of order, swap them. So, this is kind of a mouthful. But if you think about it, I'm just kind of applying some of the vocabulary that we have from C, and kind of sort of from scratch, to an otherwise very organic human experience, but using more methodical language than I was just kind of doing off the cuff, when we were doing it with humans. Because what does this mean? For i from 0 to n minus 2. That means start at, like, the 0 location, and if there's n elements here-- this is 0-- and this, at the very end, is location-- it's going to be n minus 1. Right? If you start counting at 0, you have to readjust your whole life to subtract 1 from the tail end of that range of numbers. Right? 0-- if that was 1, this would be n. But if that were 0, this is now n minus one. So I'm saying, though, for 0-- for i from 0 to n minus 2, which is technically this. So I'm using sort of for loop-like language to start iterating here, and do something like this, up until the second to last element, which we've not done before. Seems almost buggy. But if you read ahead in the pseudo code, why did I do that? And only iterate until the second to last element with i? What jumps out at you? Yeah. AUDIENCE: Because then you're gonna swap the i in the i-plus-one-th elements? DAVID MALAN: Good. AUDIENCE: When you get to n minus 2, you'll swap it with n minus 1. DAVID MALAN: Exactly. Recall that bubble sort was all about swapping, or potentially swapping pair-wise elements, neighbors, if you will. So, you have to make sure that if you're iterating through all of these numbers, you have to stop short of the end of the array, so that i plus 1 actually refers to an element that's actually in your list, and not, for instance, way over here, which would be some garbage value that you shouldn't actually touch. So, if those elements are out of order, we swap them, and I have a big outer loop there that just says, keep doing this, again and again and again, until you don't swap anything. At which point you can infer that you're done. Because every time I walked back and forth on the list, and the guys helped out by swapping their numbers as appropriate, I kept doing it again if there was still room for improvement. And intuitively, why is it absolutely, logically safe to stop that whole process once you have not made any swaps on a pass through the list? Why is that a safe conclusion? So, if I walk through the list-- no, these are good, these are good, these are good-- OK, you didn't want your number-- these are good, these are good-- how do I know that I don't need to do that again? AUDIENCE: It's sorted already. DAVID MALAN: It's sorted already, right? And it would be kind of irrational-- if you've walked through the list, looking at everything pair-wise, found nothing to swap, to even bother doing that again-- why would you expect different results, if the numbers themselves are not moving and you didn't move them yourself? So you can just stop. But this still is going to invite the question, well, how expensive was that? How many swaps, or comparisons, did we make? And we'll come back to that before long. Selection sort, though, can be expressed maybe even a little more succinctly. And that was the second algorithm we did with our eight volunteers here, for i from zero to n minus 1. So this time, all the way through the end of the list, find the smallest element between i-th and n minus 1-th. So between those two ranges, the beginning of your list and the end, and then swap the smallest with the i-th element. So what does this mean? So again, for i from 0 to n minus 1. This is just pseudo code for saying, start a variable i at location 0. And do this until i equals n minus 1. So, do this until you've gone all the way through the list. What is it telling me to do? Find the smallest element between the i-th element and the end of the list. N minus one never changes. It always refers to the end of the list, so that's why I walked through the list looking for, ultimately, the number 1. And what did I do with the number 1? Swap the smallest with the i-th element. And I might have gotten one of the steps wrong when I did a little switcheroo, but we fixed it thereafter. Ultimately, I kept evicting whoever was in the i-th location to make room for the element that I knew belonged there And I could have shuffled them to make room for those elements, but it turns out, mathematically, it's just as fine to just evict it and move it all the way to the end, as we did. And once I've gone all the way through the list, there is no more smallest element to find. And as we saw, the list is sorted. So, maybe this is faster, maybe this is slower-- it's not immediately obvious. And insertion sort, which we actually came up with by way of the blue books on the floor, might be described as this. For i, from 1 to n minus 1, call the 0th through the i minus i-th element the sorted side-- that's a mouthful-- so, consider the left of your list the sorted side of the list. And initially, there's nothing there. You have zero elements sorted to your left, and eight unsorted elements to your right. So that sort of describes this story, when we had volunteers and numbers here. There are no elements sorted; everything to my right was unsorted. That's all that's saying. Remove the i-th element. That was like picking up this blue book, if we were using blue books in this example. Then what do I want to do? Insert it into the sorted side, in order. So, if this is the sorted side, this is the unsorted side, this is the equivalent of saying, take that blue book and just put it in the first location. And you can kind of make a visual gap for it. Now, this is the sorted side, this is the unsorted side. Or, equivalently, when I was down here with the blue books, the books in my hands were the sorted side, and everything still on the stage was the unsorted side. Same idea. So, what happens next? I then iterate one location next, and I remove the next element, and whatever number that is, I figure out, does it go to the left or does it go to the right? Which was the same thing, again, on stage, me sort of picking up a third blue book and deciding, does it go in between these books? Does it go below, does it go above? I inserted it into its appropriate location. So in this insertion sort algorithm, you sort of take each number as you encounter it, and deal with it then and there. You take the number and deal with it, so, you know what, this one's got to go here, if we just kind of pretend what the numbers look like for a moment. So that would be inserting it into the right location, like I did with the blue books. Maybe this one-- oh, maybe this one's a really small number, and so I insert it over here. So I kind of literally deal with each problem as I encounter it, but it just gets expensive, or very annoying, to have to move all of this stuff out of the way to make room for those elements. And that's why I got bored with the blue book example, because it was getting very tedious looking through all of the blue books for the correct location. So in short, all three of these algorithms, while very differently expressed, and while all of them are kind of intuitive until you try to describe what your human intuition has actually been doing for the past some number of years that you've been sorting things just in the real world-- they can all be described, at least, in terms of these algorithms. So these algorithms-- and we started this conversation in the first lecture-- all have, ultimately, some kind of running time associated with them, like how long does it take to do something. And we talked about the process of finding Mike Smith in terms of this pretty generic graph. It wasn't very mathematical, it wasn't very sophisticated-- we just wanted to get a sense of the relationships, or tradeoffs, of space and time, so to speak. And so, on the x-axis, or horizontal, we have the size of the problem-- so, like, a number of pages in the phone book, or number of people in the phone book-- and on the y-axis, or vertical, we had the amount of time to solve the problem. How many seconds, how many page turns-- you could count using any unit of measure you like. And the first algorithm for Mike Smith, when I started with the very first page and turned, and turned, and turned, was a straight line, a linear relationship. One more page, one more step. So, it's straight line. The next algorithm was searching by twos, recall, in the first lecture. Two, four, six, eight. And that's still a straight line, because there's a predictable relationship between number of pages and number of seconds, or page turns. It's two to one instead of one to one, so it's still a straight line, but it's lower on the graph. It's better. But the best one, by far, was the divide and conquer approach, we said. Right? And it certainly felt faster; it's great, because it was intuitive. It wasn't quite as easy to express in pseudo code-- that was among the longer ones today-- but it at least got us to the answer faster. And this is logarithmic, in the sense that the logarithm-- technically base 2, if you recall some of your math-- it's because you're dividing the problem in half, and half, and half. And it's fine if you're uncomfortable with it, don't even remember what a logarithm is. For now, just assume that logarithmic time has this different shape to it. It grows much more slowly. Any time you can choose log of n over n, when picking between two algorithms, go with the log n, or something like that, because it is going to be smaller, as we can see visually here. So, let's just consider something like bubble sort. There's a couple of ways we can look at this. And again, the goal here is not to be very mathematical-- we're not going to start doing proofs, but at least, by taking a glance at some of the steps in this algorithm, you can get a general sense of how slow or how fast an algorithm is. And indeed, that's the goal. There's this fancy notation we're about to see called asymptotic notation, with special Greek characters. But at the end of the day, we're really just trying to get an intuitive sense of how good or bad an algorithm is, much like we were trying to do with this picture. But now we'll do it a little more conventionally, as a computer scientist might. So in bubble sort, recall that we compared every pair of humans and made a swap if they were out of order. And then we repeated. And we repeated, and repeated. And we kept going through the list. So, that can be quantized-- like, you can kind of break that down into some total number of steps. So, if you've got n humans in front of the room, and you want to compare them from left to right in pairs, how many possible pairs are there as I walk through the list for that first time? If there's n elements, and I can put the stands back where they were. How many pairs were there, as I walked from left to right? I compared these two, these two, these two. Yeah, there's n minus one. Specifically seven. And even if you're not quite sure where we're going with this, if there's eight stands-- like, this is one, two, three, four, five, six, seven-- and there's eight total. So, that's indeed n minus 1. So, that's how many comparisons we might have made the first time through bubble sort. But the very first time I went through the list in bubble sort did the list get fully sorted? No, we had to do it again. We knew that 8 had bubbled up to the very end, so that was good. 8 was in the right place. But I had to do it again, and fix more problems along the way. But the second time, I didn't need to go all the way through the list. To be clear, who did I not need to look at? The last location, or 8, in our very specific case. So the second time through the list of humans, I only have to make n minus 2 comparisons. Right? Because I can just ignore number 8, the final human, and just deal with the seven other humans that are somehow misordered. So, if I wanted to really be nitpicky and write this down, and count up how many steps, or how many comparisons, I made, we could generalize it like this. All right? It's going to be n minus 1, plus n minus 2, plus n minus 3, plus dot-dot-dot. Or, more specifically, 7 plus 6 plus 5 plus 4-- this is just the fancier formulaic way of saying the same thing. Now, I don't remember my back-of-math-book formulas all that well, but I remember this one. You know, in your physics books, your math books, often in the hardcovers, you'll see little cheat sheets for what series of numbers actually sum to or multiply out to. And so it turns out that that summation can actually be expressed more succinctly as n times n minus 1, all divided by 2. That is the same thing, mathematically, as the long series that I had a dot-dot-dot there for. So if I multiply this out, just using some algebra, that's like n squared minus n, divided by 2. And then if I kind of multiply that out, that's n squared, divided by 2, minus n over 2. So if I wanted to be really precise, this is the running time of bubble sort. It takes this many steps to sort n people. Why? Because I literally just counted the number of comparisons I made. That's how many comparisons it takes to do bubble sort. But honestly, this is really getting tedious, and my eyes are already starting to glaze over. I don't want to remember these algebraic formulas here. So let's actually try an example, just to get a sense of how slow or how fast this is. Suppose that n were a million. So not eight, but a million people, or a million numbers. How slow, or fast, is bubble sort going to be? Well, if we plug in a million, that's like saying n is a million. So that's a million squared, divided by 2, minus a million divided by 2. Because that's what it all summed up to be. So if I do this out, it's a really big number. 500 billion minus 500,000. And in any other context, 500,000 is a pretty darn big number. But not so much when you subtract it from 500 billion, because you still get 499,999,500,000 after subtracting those off. Which is to say, of those two terms, the one on the left versus the one on the right, which is the bigger one? The more dominating factor in the mathematical expression? It's the one on the left, right? That's the one that's massively bigger. And so more generally, n squared divided by 2 feels like a bigger result than n divided by 2 alone. And we've seen it by proof-- by example, which is not a proof, but it at least gives us a feel for the size of the program, or the number of comparisons we're actually making. So you know what, ugh-- if that is the case, if the dominant factor-- the one on the left, in our case; the one with the square, specifically-- is just so much more influential on the number of comparisons we're going to make, then let's just wave our hands at the lower-ordered terms, and divide it by 2, and anything like that, and just say, ugh, this algorithm feels like n squared. I'm going to throw away the denominator, I'm going to throw away the thing I'm subtracting off, I'm going to throw away anything that is not the dominating factor, which is the term that contributes the most to the total number of steps incurred. And indeed, this is what a computer scientist would describe, generally, as the running time of this algorithm. It is on the order of n squared. It's not n squared, but it's on the order of n squared, as we've seen. It's pretty darn close, and it's good enough for a conversation with another reasonable person who wants to debate whether his or her algorithm is maybe better or worse than yours. So, this would be called big O notation. Big O is used to refer to an upper bound on an algorithm's running time. Upper bound meaning, we'll consider, for our purposes, in the worst case, how much time might this algorithm take? Well, it's going to take on the order of n squared steps. Because if the list of numbers is unsorted initially, we've got to do a lot of work to actually sort it. There's other terms that we could put in those parentheses. Some algorithms are not on the order of n squared. Some algorithms are actually order of n log n; some algorithms are on the order of n itself, or log n, or even 1, where 1 refers to constant time. And in fact, the ones I've highlighted here, we've actually seen examples along the way of all of these so far. For instance, what algorithm have we seen that has a running time on the order of n? n steps? AUDIENCE: Linear search. DAVID MALAN: Linear search. If we were to think back, even to today, to linear search-- or from the first lecture, when I was just looking for Mike, really slowly, one phone book page at a time, that's a linear algorithm. If there's n pages, or n humans, it might take me on the order of n steps, because Mike Smith-- S is toward the end of the alphabet, so he might be way over there, or way toward the end of the phone book, or, God forbid, his name starts with a Z, then I'm really going to have to go all the way into the phone book. And so that's on the order of n steps. So, n here would be linear. We've also seen another algorithm, here in yellow-- big O of log n. Saw it just a moment ago. Which of our algorithms was on the order of log n running time? Yeah, so binary search. Divide and conquer. We didn't call it-- we didn't describe it this formulaically in the first lecture, but that's how you would describe the running time. Not just with a pretty picture, but just with an expression like this, that all humans-- at least computer scientists-- can agree upon. And constant time. The funny thing here is, because we're throwing away terms that don't really matter, O of 1 does not mean an algorithm that takes one step only. That would be a very limited number of options for your algorithms. But it does mean, symbolically, a constant number of steps. A constant number of steps. So, what's something you might do that takes a constant number of steps, in an algorithm? Maybe in, like, the first lecture, we had the silly peanut butter and jelly example. Which of the steps that day might have taken big O of 1 steps, a constant number of steps? I remember one, like, insert knife into jar? That's kind of one step. Maybe it's two, because I might have to, like, pick up the knife, and insert it into the jar then. One step, two steps-- but it's a constant number. The number of steps is not at all informed by the algorithm itself. It just happens. You do that in one step. So, if there's any number of other algorithms here-- and we'll leave in white one that we'll come back to-- but let's just consider the opposite of this, if you will. If big O is our upper bound on running time, it turns out, there's a vocabulary for discussing the lower bound on an algorithm's running time. Which, for our purposes, we'll generally consider in the context of best case. So in the best case scenario, how little time might an algorithm take? Well, this is a capital omega, and it's used in exactly the same way. You just say, omega of n squared, or omega of n, or omega of log n. So it's the same symbology, it just refers to a lower bound. So, it takes this few steps, or this many steps. Big O, big omega. So, what's an algorithm, therefore, that is in, so to speak, omega of 1? Like, what algorithm, in the best case, might actually take just one step? And who is best to answer this question today in the room, in fact. What algorithm could take one step? Yeah. AUDIENCE: [INAUDIBLE] DAVID MALAN: Yeah, linear search could take omega of one steps. Because in the best case, it is right there. Or in Chrissy's case, even if our algorithm is to sort of choose randomly, in the best case, it is right there, the number 50. So even her algorithm, and even our linear search algorithm-- and for that matter, even our binary search algorithm-- are in omega of 1, at least in the best case. Because if you get lucky, you're done. One step. By contrast, what is an algorithm that takes at least n steps? So, omega of n? AUDIENCE: [INAUDIBLE] DAVID MALAN: That's a good one. So you say bubble sort, I heard. AUDIENCE: Yes. DAVID MALAN: Why? AUDIENCE: Because if they're all already in order, you just go through each comparison, and then make no swaps. DAVID MALAN: Good. So in the case of bubble sort, where we generally had a lot more work to do than just finding something with a searching algorithm, bubble sort is, minimally, an omega event you need at least on the order of n steps-- maybe it's n minus 1, or n minus 2-- but it's on the order of n. Why? Because only once you go through the list at least once do you know-- what, to be clear? AUDIENCE: That they're all in order. DAVID MALAN: That they're in order. And you know that as a side effect of having not made any swaps. So, you can only determine that a list is sorted in the first place by spending at least n steps on that process. Excellent. So, there's yet another one, and this is the last, whereby if you happen to have an algorithm, or a scenario where the upper bound and the lower bound are the same-- turns out there's a symbol for that too; you can just describe the algorithm in terms of theta notation. That just means theta of n, theta of log n-- whatever it is, that just means upper bound and lower bound are one and the same. And there's more formalities to this, and you can actually dive in deeper to this in a theory class. But for our purposes, big O and omega will be a generally useful way of describing, generally speaking, just what the running time of an algorithm actually is. So, big O of n squared is the fastest we've seen thus far. Unfortunately, it does actually tend to run pretty slowly. We saw it with an example of, like, 500 billion steps just to sort a million elements. Turns out we can do way better than that. Much like in the first lecture, when I crazily proposed, I think, suppose your phone book had, like, four billion pages-- well, you only need 32 steps using binary search, instead of four billion steps using linear search. So, it would be nice if, after all of this discussion of algorithms and introduction of these formalities, if we can actually do better. And it turns out that we can do better, but this has been a lot to do already, so let's go ahead in here and take a five-minute. break. And when we come back, we'll blow out of the water the performance of all three of those algorithms we just saw. All right. So, let's take a quick look at what these algorithms look like, so we can actually compare them against something that I claim is actually going to end up being better. OK. So, here we have an array of numbers represented as vertical bars. So, small bar is small number; tall bar is big number. And so it's a nice way to visualize what otherwise is pretty low level numbers alone. I'm going to go ahead here and make the animation pretty fast, and I'm going to go ahead here and choose, for instance, bubble sort. And, actually, let me slow it down a little bit, just for the sake of discussion. So, you'll see, in this algorithm, bubble sort. It's making multiple passes through the list, just as I did, highlighting in pink two neighbors at a time, and deciding whether or not to swap them, just as we were, with our eight volunteers, doing the exact same thing. Of course, with this visualization, we can do it more quickly, and we can speed this up to the point where you can kind of now start to feel the bubbling effect, if you will, whereby the bigger numbers are bubbling up to the top, to the right, just as the number 8 did, when we did it on paper. So, this is bubble sort. And we could watch this for quite some time, and in some sense, it's kind of mesmerizing. But in another sense, it's pretty underwhelming, because at the end of the day, all you're going to get is a bunch of bars, sorted, from short bars to big bars. But perhaps the takeaway is that I'd kind of have to stall here for a decent amount of time, even though we're running this at the fastest speed, because it's only fixing, at best, one number at a time. And maybe some others are improving, but we're only moving all the way to the end one number at a time. And we have to then go back, and go back, and go back, and do more work. It's going to be very ungratifying to abort it, but let's go back to random. And now, if we choose, for instance, selection sort, you'll see that the algorithm works a little differently. Let me slow it down. And what it's doing now, which is a little less obvious, is it's looking through the list for the next smallest element, and it's going to put it at the beginning of the list. All the way at the left. So, it's looking, and looking, and looking, and it leaves highlighted in red the most recently discovered smallest element. And then as soon as it gets to the end of the list, it's going to move that smallest element all the way to the left. So that we now, kind of like the opposite of bubble sort, have all of the smallest elements to the left. Though, this is arbitrary. We could bubble up the small elements by just reversing the order of our operations; we could sort from biggest to smallest-- that is irrelevant. It's just by human convention we tend to sort from smallest to biggest, at least in examples like this. And we can speed this up, but it doesn't quite have quite the same comparison effect, because all you're doing is a swoop through the list looking for the smallest, looking for the smallest, looking for the smallest. And so, this way, it's going to build up from short to tall. Let me go ahead and do it one more time, this time with insertion sort, and slow it down. And so, what we're doing here is the following. We identify the next element, and then we go and insert it into the place it belongs in the "sorted half" of the list. So, recall that I generally describe stuff on the left as being sorted, stuff on the right as being unsorted, and the implication of that is that even though these numbers here on the left are indeed sorted, when I encounter a new number, out in the unsorted area, I might have to move some things around and shuffle things around. And unlike the cheat I was doing here in person-- when I grabbed that music stand before and just kind of moved it over here-- that's not really legitimate. Right? This is garbage value land. Like, I should not have had access to this memory. And so what we did with our actual eight humans was more legitimate. The fact that our volunteers did the physical labor of moving those numbers around? That was the low-level work that the computer has to do, too. And you see it here all the more, either at this slow speed or the faster speed. It's being inserted into the appropriate location. So, case in point, this tiny little element? We have to do a huge amount of work to find its location, until finally, we've found it, and now we do the same thing. So, all of these have some pluses and some minuses. But it turns out, with merge sort, we can do even better. An algorithm that goes by the name of merge sort. But to do better, we need to have a new ingredient, or at least more formally defined, that we've kind of sort of leverage before, but not by name. And to do this, I'm going actually take out a little bit of code, in CS50 IDE, a program called sigma-0.c. And we'll see the interconnection in just a moment. So in this program, notice the following. We have a main function, whose purpose in life is to get a positive integer from the user, and to pester him or her, again and again and again, until they cooperate and provide a positive integer. That's what the do-while loop is often useful for. And then, what do we do with it? We simply pass that value, n, that the human typed in, via get int, to a function called sigma. And sigma is like the Greek character, or the capital E-looking character, that generally means, in math, like, sum a bunch of numbers. Add a bunch of numbers together. So, this is essentially a function called sigma, whose purpose in life is to sum all of the numbers from 0 to n. Or, equivalently, from 1 to n. So, 1 plus 2 plus 3 plus 4 plus, dot-dot-dot, n, whatever n happens to be. And then, via printf, we're just printing it out. So, let me just run this program to make super clear what's going on. And I can do this by doing, of course, in my source three directory for today, make sigma 0, enter, dot slash sigma 0, positive integer, I will do 2. So by that definition, it should sum 0 plus 1 plus 2, so 1 plus 2-- that should be 3. So I should see 3. And indeed, I see 3. Let's do one more, if I add in three numbers. So, this should be 1 plus 2 plus 3-- so, that's 1-- that's 6, in total. And so forth. And they get pretty big pretty quickly. If I do 50, then we start to get into the thousands. So, that's all it's doing. And how is it doing this? Well, we could implement this in a whole bunch of ways, but if we leverage some of our sort of techniques thus far, we might do it using a for loop. That's kind of been one of the most common tools in our toolkit. And how am I using it here? I'm first declaring a variable called sum, initializing it to 0, because I've done no work yet. Then I have a for loop, for i equals 1 all the way up through m. Why m? Well, just because. Recall that when you make your own function, whether in Scratch or in C, you get to decide what to call the inputs to that function. The arguments, or parameters, as they're called. And just for clarity, I called it m, even though we've typically been using n. I could have called it anything I want. I just wanted to make super clear it's a different variable. But more on that in a week or so. And so I'm just counting from one to m, and I'm adding to sum whatever i is. Now, just as a quick check, why am I not doing sum plus plus, as I usually do in these kinds of cases? AUDIENCE: Because you're not incrementing by [INAUDIBLE].. DAVID MALAN: Exactly. I'm not incrementing by 1, I'm incrementing by 1, and then by 2, and then by 3, and then by 4, and so forth. So I need this loop to be counting up, and I need to be adding i to the sum, not just a plus plus, in this case. Then I return the sum. And so, this is an example of an abstraction. Like, I now have a function called sigma-- just like in math, you might have the big capital sigma symbol that just says, add all these numbers together, I have a C function that does that. And so now, higher up in my code, I can call that function and then print out the answer. But it turns out that this simple function lends itself to a nice example of another programming technique, something called recursion. And we won't have terribly many opportunities in CS50 to apply this technique, but we will toward semester's end. If you continue on to a class like CS51, you'll use it all the time. If you use another type of programming language, you'll very often use this technique. And it's called recursion, and it looks like this. Let me go ahead and open up another file that's available on the course's website called sigma 1. Notice that main is identical. So, main is identical. And indeed, it's still calling a function called sigma, and then using printf to print out the answer. So there's no difference there. But what is different, in this next version, is the code for sigma. So, what's going on here? It still takes, as input, an integer called m. So that's good, because I need to know what to sum up to. It returns an integer, as before. And it amazingly has, like-- what, four real lines of code, plus some curly braces? And even those lines of code are super short. And there's no additional variables, and there's this weird, crazy logic here. But let's see what it's doing, first and foremost. On line 23, I'm saying if m is less than or equal to 0, return 0. Now, why does this make sense? Well, I only want to support positive numbers, or non-negative numbers, from 0 to m. And so I just kind of need an error check there, right? If the human somehow passes into this function negative 50 or something else, I don't want the function to freak out and give unpredictable behavior, I just want it to return 0, in cases of error or when the number gets that small as to hit 0 or even lower. So this, I'm going to call base case. It's just, like, this sanity check, like, don't let the math go beyond this point of 0 or less. So, amazingly, if you really zoom in on this code, the entirety of this program really boils down to one line. And what's going on here? So, I am returning, from sigma, an answer. But, curiously, my answer is kind of defined in terms of itself, which generally is a bad idea. Right? It's like in English, if you try to define a word by using the word in the definition, usually someone calls you on that, because it's not all that helpful to use a word in the definition of the word. And that's the same idea, at first glance, of recursion. You are using the same function to solve a problem that was supposed to be solved by that function in the first place. So, what do I mean by that? Main, of course, is calling sigma, and that means this code down here that we've been looking at gets executed. So, suppose that we hit this line of code. What recursion allows us to do, in this case, is take a bite out of the problem, and then defer to someone else to figure out the rest of the problem. So, what do we mean by that? Well, sigma, again, is just this process of adding up all the numbers between 0 and some number, m. So, 1 plus 2 plus 3 plus dot-dot-dot. So, you know what? I don't want to do all that work, as I did in version 0 with my for loop. Let me just do a piece of that work. And how do I do that? Well, you know what, when you ask me, what is the sum from 0 to m? I'm going to be kind of circular about it, and be like, well, it's the answer of-- the answer is m, the biggest number you handed me, plus the sum of everything below it. Right? So, if you passed in the number 10, it's like saying, well, sigma of 10 is 10 plus sigma of nine, and, like, leave me alone. I don't want to do the rest of the math. But, because you're calling the same function again, that's actually OK. A function can call itself, because if you think about where the story is going, now sigma gets called, in this story, with sigma of 9. What does the code do? Well, sigma of nine returns 9 plus whatever sigma of 8 is. So we're not solving the whole problem. We're handing back a 10, plus a 9-- and if we keep going, plus an 8, plus a 7, plus a 6. But we're not going to do this forever. Even though I'm using sigma in my implementation of my answer, under what circumstances am I not calling sigma? AUDIENCE: If m equals 0. DAVID MALAN: If m equals 0, or is even less than 0-- which shouldn't happen, but just to be sure, I made sure it can't, with the less than or equal to. So eventually, you're going to ask me, what is sigma of 0? And I'm not going to be difficult about it, I'm just going to say 0. And no longer do I keep passing the buck to that same function. And so even though it takes a while to get to that point in the story-- because we say 10 plus sigma of 9, sigma of 9 is 9 plus sigma of 8, which is sigma of 8 plus sigma of 7-- like, it keeps going and going and going. But if you kind of mentally buffer, so to speak, much like a video in your browser, all of those numbers that you're being handed back, one at a time-- which are, technically, being added together for you by your program with the plus operator-- the last number you're going to be handed back is zero, and at that point, all of the plus signs can just kind of kick in and give you back whatever number you're actually looking for. So, recursion is the act of a function calling itself. Which is very, very, very bad, unless you have a base case that ensures that eventually, as you take bites out of the problem, you will handle, with a special case, so to speak-- a base case-- a small piece of the puzzle, and just hand back a hard-coded answer, to make sure that this doesn't happen infinitely many times. So, any questions on this principle, of a function being able to call itself? Yeah. AUDIENCE: So, the base case here was when m equals 0? DAVID MALAN: When m equals 0 or is even less than zero, just to be sure. But yes, when m equals zero. Indeed. So, let's see. If you're comfortable, at least, with the fact-- oh, and actually, there's a good little geek humor now-- if you go to Google.com, and suppose you wonder, you're wondering what recursion is, especially a few hours from now. Well, you can Google it, and then the computer scientists at Google-- there you go. OK, so if you're laughing, you get it, which is great. So that, then, is recursion. Something giving you back an answer in terms of itself. So, why is this useful? Well, it turns out we can leverage this now to solve a problem if we know that we can actually convert it to code. We'll focus less on the actual implementation and more on the idea, but let's see if we can't wrap our minds around the problem to be solved with this code. This is merge sort, in pseudo code. And again, like all the pseudo code we've ever written, you could write this in bunches of different ways. Here's one such way. Notice, the first thing, on input of n elements-- so, n numbers, n blue books, n whatever-- go ahead and do the following. If n is less than 2, return. So it's a little different from the condition I had a moment ago, but the context here is sorting, it's not summing. So, why is it logically OK to say, if n is less than 2, just return? Yeah, that's just itself. If it's less than 2, that means there's only one blue book, or maybe even 0, so in either case, there's no work to be done. Just return. The list is sorted, however short it is. But if it's longer than that, you might have to do some work, and actually do some actual sorting. So, what happens then? So, else-- you know what? Sort the left half of the elements, and then sort the right half of the elements, and then, OK, merge them together. So it's the same kind of, like, blase attitude, where, like, ah-- if you ask me to sort something, I'm just going to tell you, well, you go sort the left, then you go sort the right, and then we'll just merge the results back together. And this is cyclical in the sense that, how do you sort the left half of anything? You need a sorting algorithm. But this is the sorting algorithm. So this is like saying, use merge sort to sort the left half, use merge sort to sort the right half, and then merge them together. Merging doesn't really need to be a fancy algorithm; merging is like, if you've got one pile of numbers here that are sorted, one pile of numbers here that's sorted, you can just kind of eyeball them and grab the appropriate one to kind of interleave them in the right order. That's what we mean by merging. So, how in the world is this even correct? Because we haven't actually done any apparent work, in this way. There's no loops, there's no comparisons, it seems. It's just completely recursively defined, so to speak. Well, let's see what actually this means. And this is a sequence of visualizations that can potentially fall off the story of. So I'll try to go slowly, but not so slowly that the example itself is boring. We'll just go through this once, and then again, the slides are online, if you kind of want to step through the visualization. So, here is a list of 8 numbers, the same 8 numbers, that we looked at before. I've drawn them contiguously, as though they are in an array. This list is currently of size 8. So an input of 8 elements is the beginning of this story. What was the first step in our algorithm? Well, we were going to check, if n is less than 2, return. That is irrelevant, because n is 8, not less than 2. So that's a moot point. So, the first three things for me to do to sort this list is to sort the left half, then sort the right half, then to merge the sorted halves. OK, so let's see how we get there. So here's the list, here is the left half, and I need to sort the left half, apparently. How do I do that? Well, how do you sort a list of four elements? AUDIENCE: Break it up again? DAVID MALAN: Break it up again. Sort the left half, then its right half, then merge those two halves together. So let me do that. I'm going to draw a box around only the elements we're thinking about at the moment. So, let me look at the left half. OK, now I need to sort this list. How do I sort a list of size 2? It's actually 2, it's not less than 2. So I have to do some work. So, how do you sort a list of size 2? It's a little strange to say it, but-- sort the left half, then sort the right half, then merge the two. And at this point in the story, you may very well be lost, because we literally just keep saying the same thing, and not actually doing any work. But think of it like you're buffering these instructions. Like, I've said to sort the left half, then the right half, but you focused on just the left half for now. But unfortunately, you got a little distracted, because now to sort the left half, you have to sort the left half, so you have to do a little more work. So if you just kind of let this mental baggage build up, we have to remember to go back through it. But we've not actually done the real work yet. We're about to now. Because now that you've told me, given a list of size 2, sort the left half, here's where we bottom out with that base case and actually start to make some progress. So here's 4 and 2, a list of size 2. Let's sort the left half. How do you sort a list of size 1? You don't, right? Because n is 1; 1, of course, is less than 2, and what was the one instruction that we had at the top of this function merge sort? Just return. Like, do nothing. So, OK, everyone. I have now sorted the number 4 for you. Like, it's true, it's kind of a foolish statement, but the magic must therefore come when we combine the results. So, let's see where the story goes. I've sorted the left half-- done. Return. Now, what was I supposed to do next? Now I have to sort the right half of that list of size 2. OK, done. What's the third step at this point in the story? Merge them. So I'm now looking at a list of size 2 again, each of whose halves is sorted-- according to the crazy logic we're using here-- but now, something interesting happens. I have on the left the number 4, I have on the right the number 2, and each of these lists is of size 1. And if you vary, in your mind's eye, or just visually, with my fingers, consider, like, your left hand pointing at the first list, your right hand pointing at the second list, the process of merging numbers is just comparing what your fingers are pointing at and deciding which one comes first. Obviously 2 is going to come first, so in a moment, we'll see that 2 should move over here, and then there's nothing left for my right hand. It's done. So, 4 is obviously going to go here. And that process of merging 2 followed by 4 is what we mean by merging. It's pretty much what I was doing with insertion sort, but here we're just doing it with individual elements at a time, kind of weaving things together, or zipping things together, like with a zipper, if you think of it that. Way. So, now, let me grab 2 and put it here. Let me grab 4 and put it here. OK. So I sorted left half, I sorted right half, I merged them-- how do we unwind the story? Where did we leave off? AUDIENCE: Sort the right half. DAVID MALAN: Now we have to sort the right half that was, like, a minute ago in the story-- which, just to highlight it now, is the 7 and 5. So now I have to do the same thing. I'm sorting a list, of size 2, that happens to be on the right of the left. So now, I sort the left half, done. Sort the right half, done. I now have to merge the two together. So now my hands have to do some work, but I'll just do it from over here. 5 goes down, then 7 goes down. And at this point in the story, we have sorted the left half of the left half, and the right half of the left half. So, what point in the story are we at now? Right. We're-- now we have-- well, we did the right half just now. We now have to merge the two halves together. And, frankly, if you do this at home, if you want to kind of retrace your steps, literally just write down a to-do list, like, from top to bottom on the sheet of paper. And then as you do something, cross it off, and go back to the previous thing in the list, you would actually see, or feel, even more what it was, that mental baggage you were accumulating that you need to attend to. But now I have two lists of size 2, so let's do the finger thing again here. So, I start pointing at the left list, start pointing at the right list. The first number to merge in is, presumably, going to be 2. Then what comes after that? I'm going to update my left finger, so now 1-- my left hand's pointing at the 4, at this point; my right hand, still pointing at the 5, so which comes next? 4. There's no more work for my left hand, so it's probably going to be pretty trivial-- 5 and 7. But I do need to do the merging. It looks merged already, but we have to do it. And I'm going to do it in some new space, just as before. So, 2 and 4 and 5 and 7. And now you can really see it for the first time. The left half of the original list is finally sorted. Unfortunately, like three minutes ago is when we started the story. And now we need to unwind, in our mind, to go back to the original right half. So if you think about it now, even though I've said a lot of words, this is technically the second step in our algorithm. Or at least the first invocation thereof. All right, so we'll do it a little faster, but same idea. Sort the left half. How do I do that? Sort the left half, then the right half, which are stupidly easy and dumb, but now I have to merge 6 and 8. So, merging in this case didn't have much effect, but it needed to be done to be sure. Next, let's sort the right half of the right half. Now I'm going to sort the left, sort the right. Now the key step is merging. And now we're doing some actual work. And now we really have some work to be done-- now we have to sort the left half and the right half of the original right half. So it's 1, then 3, then 6, then 8. Now we're finally, almost at the end. Now what do we do with these? Now we have two halves, the original left and the original right, and you can think of the fingers as doing the work again. 1 is going to go here, 2 is going to go here, 3 is going to go here, then 4-- and I constantly compare where my fingers are pointing, but my fingers are constantly moving from left to right. As soon as I deal with a number, it advances to the next number in the list. So it's obviously going to be, now, 1, 2, 3, 4, 5, 6. But notice, if you imagine my fingers doing this work, they're constantly moving toward the right, to the end of the list. So, as soon as my fingers hit the ends of those lists, I must be done merging. And voila. We've now sorted the elements. It's a huge number of words, and it would be a nightmare to kind of do it with humans, because there's just so much going on, and you have to remember, or buffer, so many of those steps. But in the end, we've done something that is kind of captured even by this picture. So it turns out that merge sort, even though it sounds like a long story, is fundamentally faster, and it's fundamentally faster because we're dividing the problem in half, as we have been doing with binary search, in the phone book example even days ago. So if we look on the screen, you can kind of see the remnants of work that we've done. Like, how many times did we move the elements, from one row to another? They started up here, then they eventually made their way here, and then here, and then here. So that's one, two, three movements of the letters, or of the numbers, in memory, if you will. So if you imagine each of these rows as just a different chunk of memory and RAM, I'm just moving things around in memory. So, three is just a number. But it turns out, and if we did more general cases of this, turns out that log base 2 of n, where n is 8-- 8 is the number of elements we started with-- log base 2 of 8 is 3. And so indeed-- and if you'll take on faith for now, so that we don't have to go through an even bigger example, to show it even more-- the number of times we move the numbers is going to equal, turns out, log base 2 of n. Which, in this case, happens to be 3. And so that, then, invites the question-- on each of the rows, every time you move these numbers into a new location in memory, how many times are you touching that number while it's in that position? Or, how many times, equivalently, are you looking at it, to do something about it? What do I mean by this? Well, the movement from top to bottom was happening anytime we did the merging. Right? We would move the numbers from here to here. But as soon as we did that, we had to do some work, with the left pointer and right pointer. I needed to then merge those together. And I emphasized earlier that anytime I'm comparing numbers, my left hand and right hand are constantly advancing from left and right. I never double back. Much like I constantly was doubling back with bubble sort, insertion sort, selection sort-- there was so much damn comparison going on, it felt like a lot of work, and it physically was. But here, you know, merging, I'm moving things around, but my hands are constantly moving forward, looking at, on each row, n numbers total. My left hand or right hand pointed at each of the numbers once. Never doubled back. So, it was never n plus 1, or 2 n, it was just n. So, we have log n movements of the numbers, in memory. And every time we do that, we merge them from left to right, effectively touching each number once. So we're doing n things log n times. And so, that would be mathematically equal to n log n. So, again, even if you're not super comfy with logarithms, you do know, from our picture, with the straight lines and the curved line, that which is smaller? Log of n, or n, generally speaking? AUDIENCE: Log of n. DAVID MALAN: Like, log of n is smaller, right? That's why the green line was lower, and it was also curved. It was below the linear line n. So, generally speaking, the bigger n gets, the more slowly log n grows. And again, if you just take on faith that this is a mathematical expression that communicates the time required to do something, it's less. So, which, therefore, is smaller? N squared, which of course is n times n? Or n log n? AUDIENCE: N log n. DAVID MALAN: N log n. So, we've now found an algorithm that's unlike all of the others we've seen. And even though it took a while to explain, and even though, frankly, you might have to kind of sift through it again to really wrap your mind around it-- it took me a while, too-- it is fundamentally faster as well. So, just to take one other stab at this, let me show one other perspective. At least if you're more mathematically comfortable it after today, if you're worried that this is way more math than you were expecting, realize we very quickly abstract away from these details, and we start to wave our hands using big 0 and big omega. Let's consider how we could look at this a different way. If the picture wasn't really working for you, let's see if we can just, like, jot down how many steps each of these lines of code is. And there's not many lines of code here, so it shouldn't be a very long expression. So, how long does it take to decide if n is less than 2? And, if so, return? Well, you're past a bunch of numbers, so, you know, I'm going to call it constant time. Like, you know how many numbers you've been handed-- nope, it's not less than 2, or yes, it is. You're just answering yes or no. Constant time. Big O of one. All right? So I'm going to describe that as this. This is the formal way of saying it. T of n, which is just how much time does it take, given a problem of size n-- just a fancy way of saying that. It's on the order of one step. Maybe it's two, maybe it's three, because you kind of got to look at something. But it's a fixed number of steps to decide, there are fewer than n elements in front of me. It's not going to take you very long. So, that piece of the puzzle takes big O of one step. So now, we have three other questions to ask. That's, like, kind of a throwaway. That's really quick, if it's just one step, or two steps, or three steps. So, are these the expensive things? Well, let's see. Sort the left half of elements. Well, here, too, I can be kind of clever and propose the following. You know what? The amount of time required to sort n elements is technically equal to the amount of time it takes to sort half of those elements, plus the amount of time required to sort the other half of those elements, plus, to be fair, some merging time. And it's essentially n, but I'm going to generalize it as big O of n, because I did have to move my hands around. But again, the key thing was, my hands were constantly moving to the right. There was no looping back and again, and again, like with the other algorithms. So it's, like, n steps to do the merging. If I've got 4 numbers here, 4 numbers here, I have to touch a total of 8 elements. 8 is n, so it feels like, yes, on the order of n steps to do the merging. Unfortunately, this is like a recursive answer to the question of how efficient is merge sort. But that's kind of consistent with the fact that merge sort is being implemented recursively in this code. And it turns out here, too, if you've got one of those old-school textbooks that's got a cheat sheet in the front or the back of your physics or your math book, this is a series that you can actually-- that mathematicians know actually sum up to something known, which is n times log n. And we won't go into the weeds of why that is, mathematically, but if you take a problem of size n, and add the running time for first half, second half, and then add an n, this is what, mathematically, it ends up being. And so, if you're more comfortable with that, realize that this derives from just counting up of those several steps. And ultimately, this is much better than that. And in fact, we can kind of feel this here. You'll be able to feel it even better with other data sets, but let me go ahead and reload here, and go ahead, at the same speed as we were before, choosing merge sort. Let me fit it onto the screen at once. Actually, we should speed it up to the original. So, what is it doing? It's using a bit of extra memory, just as we were on the screen, using some additional space. But notice, as it does that work, it's kind of moving things back and forth. And it's actually saving space. Even though I used log n amount of memory by keep moving it, this was stupid. Like, I didn't need to keep using more memory, more memory, more memory, because I wasn't using the stuff anymore above. So with merge sort, you really just need twice as much memory as those other algorithms, because the first time you need to move them, move them here. And then, even though I did it visually, deliberately to move it to yet another location, just keep moving things back and forth as needed. And that's what's happening with that algorithm there. It's not quite as clear to see with this visualization, so let me open up this other one here. Now, go. And you'll see merge sort all the way on the right-- done. All right, so, insertion sort got a little lucky here, just because of the order of the elements, and the size of the dataset isn't that big, which is why I wanted to show the other one. But if we do it once more, you'll see, again, that merge sort is pretty darn quick. And you can see it doing things in halves. And selection sort, and bubble sort, are still doing their thing. And if we did this using not, like, what is that-- 10, 20, bars total, but 100 bars? Or a million bars? You would really, really feel the difference, just as we did with the phone book example as well. Any questions there on that? And we won't walk through the code, but if you're curious to actually see how some of these ideas map to C code, you will find, in the CS50 appliance, in the source code from today, a couple of files-- binary zero and binary one in linear.c, all of which implement binary search and linear search in a couple of different ways, if you actually want to see how those map. But what we thought we would do, in our remaining time today, is tee up one of the next algorithmic challenges. It turns out that there are wonderful opportunities in computer science to intersect with other fields-- among them the arts, and very specifically, music. And it turns out that music, whether you're an audiophile or even a musical theoretician, there are relationships among the sounds that we hear and the rate at which we hear those notes. Which is to say, they follow patterns. And these patterns can be produced by computers, they can be generated by computers, and what we'll do, ultimately, in problem set three, in fact, is introduce you to a bit of the musical world, whether you have some prior experience or not. And Brian, if you wouldn't mind coming up just to assist with this teaser. Here are just a few keys from a keyboard, and here are 88 keys from an actual keyboard. And Brian will help us, in just a moment, with some of these definitions. But you'll see here that there are eight keys, or one, two, three, four, five, six, seven white keys and five black keys on the board. And it turns out that mankind-- at least, in Western music, years ago-- decided to standardize on how we describe these keys. And we assigned letters to them. And you might have heard of middle C, even if you've never played a piano before, and you might think of that as being the leftmost key there. And then it's D, E, F, G, and then A, B. And of course, on a real piano, there's keys to the left, and there's keys to the right. Do you want to play what C might sound like here? [PIANO PLAYS] So, that's C. And then, if you want to-- very well done. [APPLAUSE] Do you want to go all the way up through the scale to B? [PIANO PLAYS] That's kind of unresolved, too, because what should have come next-- [PIANO PLAYS] That would be another C. And so what Brian's played for us is a full octave, now, referring to eight. So, C to C inclusive, in this case. And those of us who are kind of are familiar with music, or like listening to certain music, you'll notice that certain things sound good. And there's actually mathematical and formulaic, or algorithmic, reasons that some of these sounds sound actually good. But what about these black keys? They actually can be defined in a couple of different ways. And if you've ever heard of flats, or sharps-- Brian, do you want to explain what the relationship now is among the white keys and the black keys, and how they sound different? BRIAN: Yeah, sure. So, a bit of terminology first. A semi-tone is just the distance from one note to the note immediately after that, both white and black notes included. And all it means for something to be sharp, represented by the hashtag or pound sign up there, is take a note and move it up by one semi-tone. So, if we start with C, and make that note sharp, to C sharp, we move one semi-tone to the note immediately after it, which is that black note in between C and D. So, that's C sharp. And likewise, if we add E sharp, that is one semi-tone, or the note immediately after, E, which in this case, is the same thing as F. So, F and E sharp are the same note. And in the meantime, flat is just the opposite of that. If sharp means move up one semi-tone, flat means move down one semi-tone. So if I have E, E flat is one semi-tone moving left on the piano keyboard. DAVID MALAN: And so even though a typical piano keyboard wouldn't be labeled as such, it does follow a pattern, and it does repeat, to the left and to the right as well. And so as you learn to play piano, you learn what these notes sound like, you learn where these keys are, and you also learn, ultimately, how to read music, which might look like this. This is a familiar song, now officially in the public domain. And you'll see here that there are these little shapes called notes, or little circles, that happen to be on specific lines. And it turns out that if a note is on one line, it might represent the note A; if it's on a different line, higher above or down below, it might represent B or C or D or E or F or G. And if there is a sharp symbol, or a flat symbol, in front of it, that might shift it ever so slightly, so that you're actually touching, in many cases, one of the black keys as well. Which is to say that once you have the vocabulary, and you know what the alphabet is to which you have access, can you start to write it out, much like we write computer programs. But this is what a musician would actually see. And just to give us maybe a teaser of what you can actually do when you take into account the different sounds of notes, and the different pace at which you play notes, can you give us a little something more than just a scale? BRIAN: Sure. [PIANO PLAYS] [APPLAUSE] DAVID MALAN: So, if you're a little worried what we're getting into, not only computer science and programming, but now music-- I am absolutely among the least comfortable with this, and this is why Brian has kindly joined us here today. But it'll be fun, we hope, ultimately, to explore these relationships, and also the intersection of one field with another. And to now tie these topics together, we thought we'd end by looking at a short visualization here, about a minute's worth of computer-generated sounds, that give you not just a visual feel of some of the algorithms and others that we've looked at today, but also associate sounds with the operations of moving things, swapping things, and ultimately touching bigger and smaller numbers digitally. So, here we have, up first, insertion sort. [COMPUTER SOUNDS] Again, it's inserting into the right place the number. This, now, is bubble sort. And again, you can both see, and now kind of feel, all of the swaps that it's doing. We will get the satisfaction this time. This, now, is selection sort, whereby you go through the list selecting the next smallest element, and plop it in its place. And the bars are getting higher and higher, just like the notes, or the frequencies. This, now, is merge sort. And notice the halves that are developing. And this is by far the most gratifying sound, at the end of this one. This is gnome sort, which we didn't look at, but very distinctly has a different shape, too. It's not quite as ordered as the others. And that, then, are algorithms. And Brian, play us out for today. Otherwise, we will see you next week.
B1 US david malan malan sort david algorithm sorted CS50 2017 - Lecture 3 - Algorithms 53 4 小克 posted on 2018/02/11 More Share Save Report Video vocabulary