
  • [MUSIC PLAYING]

  • DAVID J. MALAN: All right, this is CS50, and this is lecture four.

  • So we're here in beautiful Lowell Lecture Hall

  • and Sanders is in use today.

  • And we're joined by some friends that will soon

  • be clear and present in just a moment.

  • But before then, recall that last time we took a look at CS50 IDE.

  • This was a new web-based programming environment similar in spirit

  • to CS50 Sandbox and CS50 Lab, but added a few features.

  • For instance, what features did it add to you--

  • to your capabilities?

  • Yeah?

  • AUDIENCE: Debugger.

  • DAVID J. MALAN: What's that?

  • AUDIENCE: The debugger.

  • DAVID J. MALAN: The debugger.

  • So debug50, which opens that side panel that

  • allows you to step through your code, step by step, and see variables.

  • Yeah?

  • AUDIENCE: Check50.

  • DAVID J. MALAN: Sorry, say again?

  • AUDIENCE: Check50.

  • DAVID J. MALAN: Check50 as well, which is a CS50 specific tool that

  • allows you to check the correctness of your code

  • much like the teaching fellows would when providing feedback on it.

  • Running a series of tests that pretty much are

  • the same tests that a lot of the homeworks

  • will encourage you yourself to run manually,

  • but it just automates the process.

  • And anything else?

  • AUDIENCE: [INAUDIBLE]

  • DAVID J. MALAN: So that is true too.

  • There's a little hidden Easter egg that we don't use this semester,

  • but yes indeed.

  • If you look for a small puzzle piece, you

  • can actually convert your C code back to Scratch-like puzzle pieces

  • and back and forth, and back and forth, thanks to Kareem and some of the team.

  • So that is there, but by now, it's probably better

  • to get comfortable with text as well.

  • So there's a couple of our other tools that we've

  • used over time of course besides check50 and debug50.

  • We've of course used printf and when is printf useful?

  • Like when might you want to use it beyond needing to just print something

  • because the problem set tells you to.

  • Yeah?

  • AUDIENCE: To find where your bug is.

  • DAVID J. MALAN: Yeah, so to find where your bug is.

  • If you just, kind of, want to print out a variable's value or some kind of text

  • so you know what's going on and you don't necessarily

  • want to deploy debug50, you can do that.

  • When else?

  • AUDIENCE: If you have a long formula for something [INAUDIBLE]

  • and you want to see [INAUDIBLE].

  • DAVID J. MALAN: Good.

  • Yeah.

  • AUDIENCE: How running-- like going through debug50 50 times.

  • DAVID J. MALAN: Indeed.

  • Well, in real life-- so you might want to use printf

  • when you have maybe a nested loop, and you want to put a printf inside the loop

  • so as to see when it kicks in.

  • Of course, you could use debug50, but you

  • might end up running debug50 or clicking next, next, next, next, next, next,

  • next so many times that gets a little tedious.

  • But do keep in mind, you can just put a breakpoint deeper into your code

  • as well and perhaps remove an earlier breakpoint as well.

  • And honestly, all the time, whether it's in C or other languages,

  • do I find myself occasionally using printf just to type out printf in here

  • just so that I can literally see if my code got to a certain point in here

  • to see if something's printed.

  • But the debugger, you're going to find, now

  • and henceforth, is so much more powerful, so much more versatile.

  • So if you haven't already gotten into the habit of using debug50, by all

  • means start and use those breakpoints to actually walk through your code

  • where you care to see what's going on.

  • So style50, of course, checks the style of your code much like the teaching

  • fellows might, and it shows you in red or green

  • what spaces you might want to delete, what spaces you might

  • want to add just to pretty things up.

  • So it's more readable for you and others.

  • And then what about help50?

  • When should you instinctively reach for help50?

  • AUDIENCE: When you don't understand an error message.

  • DAVID J. MALAN: Exactly.

  • Yeah, when you don't understand an error message.

  • So you're compiling something.

  • You're running a command.

  • It doesn't really quite work and you're seeing a cryptic error message.

  • Eventually, you'll get the muscle memory and the sort of exposure

  • to just know, oh, I remember what that means.

  • But until then, run help50 at the beginning of that same command,

  • and it's going to try to detect what your error is

  • and provide TF-like feedback on how to actually work around that.

  • You'll also see on the course's website a wonderful handout made

  • by Emily Hong, one of our own teaching fellows,

  • that introduces all of these tools, and a few more,

  • and gets you into the habit of thinking about things.

  • It's kind of a flow chart.

  • If I have this problem, then do this or else

  • if I have this problem do this other thing.

  • So do check that out as well.

  • But today, let's introduce really the last, certainly for C,

  • of our command line tools that's going to help

  • you chase down problems in your code.

  • Last week, recall that we had talked about memory a lot.

  • We talked about malloc, allocating memory,

  • and we talked about freeing memory and the like.

  • But it turns out, you can do a lot of damage

  • when you start playing with memory.

  • In fact, probably by now, almost everyone-- segmentation fault?

  • [LAUGHTER]

  • Yeah, so that's just one of the errors that you might run into,

  • and frankly, you might have bugs in your code now

  • and henceforth and not even realize it

  • because you're just getting lucky.

  • And the program is just not crashing or it's not freezing,

  • but this can still happen.

  • And so Valgrind is a command line program that probably

  • looks the scariest of the tools we've used,

  • but that you can also use with help50, and that

  • tries to find memory errors, including what are called memory leaks, in your program.

  • Recall that last week we introduced malloc,

  • and malloc lets you allocate memory.

  • But if you don't free that memory, by literally calling the free function,

  • you're going to constantly ask your operating system, macOS, Linux,

  • Windows, whatever, can I have more memory?

  • Can I have more memory?

  • Can I have more memory?

  • And if you never, literally, hand it back by calling free, your computer

  • may very well slow down or freeze or crash.

  • And frankly, if you've ever had that happen on your Mac or PC, quite possibly

  • that's what some human accidentally did.

  • He or she just allocated more and more memory

  • but never really got around to freeing that memory.

  • So Valgrind can help you find those mistakes before you or your users do.

  • So let's do a quick example. Let me go to CS50 IDE, and let me go ahead

  • and make one new program here.

  • We'll call it memory.c because we'll see later today how

  • I might chase down those memory leaks.

  • But for now, let's start with something even simpler, which all of you

  • may have done by now, which is to accidentally touch memory

  • that you shouldn't, changing it or reading it, and let's see what this might mean.

  • So let me do the familiar at the top here.

  • Include standard IO.

  • Well, let's not even do that yet.

  • Let's just do this first.

  • Let's do int main(void), just to start a simple program

  • and in here let me go ahead and just call a function called f.

  • I don't really care what its name is for today.

  • I just want to call a function f, and then that's it.

  • Now this function f, let me go ahead and define it as follows, void f(void).

  • It's not going to do much of anything at all.

  • But let's suppose, just for the sake of discussion, that f's purpose in life

  • is just to allocate memory for whatever useful purpose,

  • but for now it's just for demonstration's sake.

  • So what's the function with which you can allocate memory?

  • AUDIENCE: Malloc.

  • DAVID J. MALAN: Malloc.

  • So suppose I want malloc space for, I don't know,

  • something simple like just one integer.

  • We're just doing this for demonstration purposes,

  • or actually let's do more, 10 integers, 10 integers.

  • I could, of course, do-- well, give me 10, but how many bytes do I want?

  • How many bytes do I need for 10 integers?

  • AUDIENCE: sizeof(int).

  • DAVID J. MALAN: Yeah, so I can do literally sizeof(int)

  • and most likely the size of an int is going to be?

  • AUDIENCE: Four.

  • DAVID J. MALAN: Four, probably.

  • On many systems today, it's just 4 bytes or 32 bits,

  • but you don't want to hard code that lest someone else's computer not use

  • those same values.

  • So the size of an int.

  • So 10 times the size of an int.

  • Malloc returns what type of data?

  • What does that hand me back?

  • AUDIENCE: [INAUDIBLE]

  • DAVID J. MALAN: Yeah, returns an address or a pointer.

  • Specifically, the address, 100, 900, whatever, of the chunk of memory

  • it just allocated for you.

  • So if I want to keep that around, I need to declare a pointer.

  • Let's just call it x for today that stores that address.

  • Could call it x, y, z, whatever, but it's not an int that it's returning.

  • It's the address of an int.

  • And remember, that's what the star means when declaring a variable:

  • the address of some data type.

  • It's just a number.

  • All right, so now if I were to--

  • first, let's clean this up.

  • Turns out that to use malloc, I need to include stdlib.h.

  • We saw that last week, albeit briefly, and then of course

  • if I'm going to call f, what do I have to do to fix this code?

  • AUDIENCE: You need to declare.

  • DAVID J. MALAN: Yeah, I need to declare it up here,

  • or I could just move f's implementation up top.

  • So I think this works, even though this program at the moment

  • is completely stupid.

  • It doesn't do anything useful, but it will allocate memory.

  • And I'll do something with it as follows.

  • If I want to change the first value in this chunk of memory,

  • well how might I do that?

  • Well, I've asked the computer for 10 integers or rather space

  • for 10 integers.

  • What's interesting about malloc is that when

  • it returns a chunk of memory for you, it's contiguous, back-to-back.

  • And when you hear contiguous or back-to-back,

  • what kind of data structure does that recall to mind?

  • AUDIENCE: An array.

  • DAVID J. MALAN: An array.

  • So it turns out we can treat this otherwise arbitrary chunk of memory

  • like it's an array.

  • So if we want to go to the first location in that array of memory,

  • I can just do this and put in the number say 50.

  • Or if I want to go to the next location, I can do this.

  • Or if I want to do the next location, I can do this.

  • Or if I want to go to the last location, I might do this,

  • but is that good or bad?

  • AUDIENCE: Bad.

  • DAVID J. MALAN: Why bad?

  • AUDIENCE: It's-- it's out of bounds

  • DAVID J. MALAN: Yeah, so it's out of bounds.

  • Right?

  • This is sort of a week one-style mistake when it came to loops.

  • Recall, with for loops or while loops, you might go a little too far,

  • and that's fine.

  • But now we actually will see we have a tool that

  • can help us notice these things.

  • So hopefully, just visually, it's apparent that what I have going on here

  • is just-- on line 12, I have a variable x

  • that's storing the address of that chunk of memory.

  • And then on line 13, I'm just trying to access location 10

  • and set the value 50 there.

  • But as you note, there is no location 10.

  • There's location 0, 1, 2, 3, all the way through 9, of course.

  • So how might we detect this with a program?

  • Well, let me go ahead and increase my terminal window just a bit

  • here, save my file, and let me go ahead and compile it with make memory.

  • OK, all is well.

  • It compiled without any error messages, and now

  • let me go ahead and run memory, enter.

  • All right, so that worked pretty well.

  • Let's actually be a little more explicit here just for good measure.

  • Let me go ahead and print something out.

  • So printf, %i for an integer, and let's make it just more explicit.

  • You inputted %i and then comma x bracket 10.

  • And what do I have to include to use printf?

  • AUDIENCE: stdio.h.

  • DAVID J. MALAN: Yeah, so stdio.

  • So let's just quickly add that, stdio.h, save.

  • All right, let me recompile this, make memory, enter.

  • And now let me go ahead and do ./memory.

  • Huh?

  • Feels like it's a correct program.

  • And yet, for a couple of weeks now we've been claiming that mm-hmm,

  • don't do that.

  • Don't go beyond the boundaries of your array.

  • So how do we reconcile this?

  • Feels like buggy code or at least we've told you it's buggy code,

  • and yet it's working.

  • Yeah?

  • AUDIENCE: [INAUDIBLE]

  • DAVID J. MALAN: That's a good way of putting it.

  • AUDIENCE: It's still very similar.

  • We want that.

  • DAVID J. MALAN: OK.

  • AUDIENCE: So we can theoretically--

  • it just created a program.

  • DAVID J. MALAN: Yeah, and I think if I heard you correctly,

  • you said C doesn't scream if you go too far?

  • AUDIENCE: Yeah.

  • DAVID J. MALAN: Yeah, OK.

  • So that's a good way of putting it.

  • Like, you can get lucky in C. And you can

  • do something that is objectively, pedagogically, like technically wrong,

  • but the computer's not going to crash.

  • It's not going to freeze because you just get lucky.

  • Because often, for performance reasons, when

  • you allocate space for 10 integers, you're

  • actually going to get a chunk of memory back

  • that's a little bigger than you need.

  • It's just not safe to assume that it's bigger than you need,

  • but you might just get lucky.

  • And you might end up having more memory that you can technically get away

  • with touching or accessing or changing, and the computer's not going to notice.

  • But that's not safe because on someone else's Mac or PC,

  • their computer might just be operating a little bit differently than yours,

  • and bam, that bug is going to bite them and not you.

  • And those are the hardest, most annoying bugs to chase down as some of you

  • might have experienced.

  • Right?

  • It works on your computer but not a friend's, or vice versa.

  • These are the kinds of explanations for that.

  • So Valgrind can help us track down even these most subtle errors.

  • The program seems to be working.

  • Check50 or tools like it might even assume

  • that it's working because it is printing the right thing,

  • but let's take a look at what this program, Valgrind, thinks.

  • Let me increase the size of the terminal window here,

  • and go ahead and type in valgrind ./memory.

  • So same program name ./memory but I'm prefixing it with the name Valgrind.

  • All right?

  • Unfortunately, Valgrind is really quite ugly,

  • and it prints out a whole bunch of stuff here.

  • So let's take a look.

  • At the very top, you'll see all these numbers on the left,

  • and that's just an unfortunate aesthetic.

  • But we do see some useful information.

  • Invalid read of size 4 and then it has these cryptic

  • looking letters and numbers.

  • What are those?

  • They're just addresses in hexadecimal.

  • It doesn't really matter what they are, but Valgrind

  • can tell us where the memory is that's acting up suspiciously.

  • You can then see next to that, that Valgrind is pointing

  • to function f in memory.c, line 15.

  • So that's perhaps helpful, and then main on line 8

  • because that's the function that was called.

  • So Valgrind is actually kind of nice in that it's showing us all the functions

  • that you called from bottom up, much like the stack from last week.

  • And so something's going wrong line 15, and if we go back to that,

  • let's see line 15 was--

  • well, sure enough.

  • I'm actually trying to access that memory location

  • and frankly I did it on line 14 as well.

  • So hopefully fixing one or both of those will address this issue.

  • And notice here, this frankly just gets overwhelming pretty quickly.

  • And then, oh, 40 bytes in one block are definitely lost in loss record 1 of 1.

  • I mean, this is the problem with Valgrind, honestly.

  • It was written some years ago and is not particularly user friendly,

  • but that's fine; we have a tool to address this.

  • Let me go ahead and rerun Valgrind with help50,

  • enter, and see if we can't just assist with this.

  • All right, so still the same amount of black and white output, but down here now

  • help50 is noticing, oh, I can help you with an invalid write of size 4.

  • So it's still at the same location, but this time--

  • or rather same file, memory.c but line 14.

  • And we propose, looks like you're trying to modify 4 bytes of memory that

  • isn't yours, question mark.

  • Did you try to store something beyond the bounds of an array?

  • Take a closer look at line 14 of memory.c.

  • So hopefully, even though Valgrind's output is crazy esoteric,

  • at least that yellow output will point you toward, ah, line 14.

  • I'm indeed touching 4 bytes, an integer, that shouldn't be.

  • And so let's go ahead and fix this.

  • If I go into my program, and I don't do this.

  • Let's change it to location 9, and location 9 here and save.

  • Then let me go ahead and rerun Valgrind without help50.

  • All right, progress except--

  • oops.

  • Nope, no progress.

  • I skipped the step.

  • Yeah, I didn't recompile it.

  • A little puzzled why I saw the same thing.

  • So now let's rerun Valgrind and here it seems to be better.

  • So I don't see that same error message up

  • at the very top like we did before, but notice here, 40 bytes in one blocks.

  • OK, that was bad grammar in the program, but are definitely

  • lost in loss record 1 of 1.

  • So I still don't quite understand that.

  • No big deal.

  • Let's go ahead and run help50 and see what the second of two errors

  • apparently is here.

  • So here it's highlighting those lines.

  • 40 bytes in 1 blocks are definitely lost, and looks like your program

  • leaked 40 bytes of memory.

  • Did you forget to free memory that you allocated with malloc?

  • Take a closer look at line 13 of memory.c.

  • So in this case line 13 indeed has a call to malloc.

  • So what's the fix for this problem?

  • AUDIENCE: Free.

  • DAVID J. MALAN: Per help50 or your own intuition?

  • What do I have to add to this program?

  • AUDIENCE: Free.

  • AUDIENCE: Free.

  • DAVID J. MALAN: Yeah, free, and where does that go?

  • Right here.

  • So we can free the memory.

  • Why would this be bad?

  • AUDIENCE: [INAUDIBLE]

  • DAVID J. MALAN: Exactly.

  • We're freeing the memory, which is like saying to the operating system,

  • I don't need this anymore.

  • And yet, two lines later we're using it again and again.

  • So bad.

  • We didn't make that mistake last week, but you should only

  • be freeing memory when, literally, you're

  • ready to free it up and give it back, which should be after

  • your last use of it, here at the end of f.

  • So let me go ahead and re-save this, open up my terminal window,

  • recompile it this time, and now, let me run Valgrind one last time

  • without help50.

  • And still a little verbose, but zero errors from zero contexts.

  • That sounds pretty good.

  • And moreover, it also explicitly says, all heap blocks were freed.

  • And recall that the heap is that chunk of memory

  • that we drew visually up here, which is where malloc takes memory from.

  • So, done.

  • So this is kind of the mentality

  • to have when approaching the correctness of your code.

  • Like, it's one thing to run sample inputs, or run the program like I did.

  • All looked well.

  • It's one thing to run tools like check50, which we humans wrote.

  • But we too are fallible, certainly, and we might not think of everything.

  • And thankfully, smart humans have made tools, that at first glance,

  • might be a little hard to use.

  • Like debug50, and now Valgrind.

  • But they ultimately help you get your code correct

  • without you having to struggle visually over just staring at the screen.

  • And we see this a lot in office hours, honestly.

  • A lot of students, to their credit, are sort of reasoning through, staring

  • at the screen, just trying to understand what's going wrong,

  • but they're not taking any additional input other than the characters

  • on the screen.

  • You have so many tools that can feed you more and more hints along the way.

  • So do acquire those instincts.

  • Any questions on this?

  • Yeah?

  • AUDIENCE: Sir, if you had a main function that took arguments.

  • Would you run Valgrind with those arguments as well?

  • DAVID J. MALAN: Yes, indeed.

  • So Valgrind works just like debug50, just like help50.

  • If you have command line arguments, just run them as usual,

  • but prefix your command with Valgrind, or maybe even help50 Valgrind,

  • to help one with the other.

  • Good question.

  • Other thoughts?

  • Yeah?

  • AUDIENCE: Where does the data go [INAUDIBLE]?

  • DAVID J. MALAN: Good question.

  • So at the end of the day, think about what's

  • inside the computer, which is just something like this.

  • So physically, it's obviously still there.

  • It's just being treated by the operating system--

  • macOS, Windows, Linux, whatever, as like a pool of memory.

  • We keep drawing it as a grid that looks a little something like this.

  • So the operating system's job is to just keep track of which of those squares

  • is in use, thanks to malloc.

  • And which has been freed.

  • And so you can think of it as having little check

  • marks next to them saying, this is in use, this is in use,

  • these others are not in use.

  • So they just go back on the so-called free list into that pool of memory.

  • Good question.

  • If you take a higher level course on operating systems in fact,

  • or CS61 or 161 at Harvard, you'll actually build these kinds of things

  • yourself.

  • And implement tools like malloc yourself.

  • Yeah?

  • AUDIENCE: So why did we have to allocate memory in this case, and what happens

  • [INAUDIBLE]?

  • DAVID J. MALAN: Good question.

  • Why did we have to allocate memory in this case?

  • We did not.

  • This was purely, as mentioned, for demonstration purposes.

  • If we had some program in which we wanted

  • to allocate some amount of memory, then this is how we might do it.

  • However, a cleaner way to do all of this,

  • would have been to say, hey, computer, give me 10 integers like this,

  • and not have to worry about memory management.

  • And that's where we began in week one, just using arrays on the stack,

  • so to speak.

  • Not using malloc at all.

  • So the point is only, that once you start using malloc, and free,

  • and memory more generally, you take on more responsibilities

  • than we did in week one.

  • Good question.

  • Any others?

  • All right.

  • So, turns out, there's one more tool, in all seriousness.

  • This is the thing.

  • [? DDB50. ?] So debug50 is an allusion to a very popular tool called GDB,

  • the GNU debugger.

  • It's an older tool that you won't use at the command line,

  • but it's what makes debug50 work.

  • Turns out, there's a thing.

  • And there's an actual Wikipedia article that you

  • might have clicked on in my email last night, called rubber duck debugging.

  • And frankly, you don't have to go as all out, as excessive, as we did here,

  • but the purpose of this technique, of rubber duck debugging,

  • is to keep, literally, like a rubber duck on your shelf, or on your desk.

  • And when you have a bug and you don't have the luxury of a teaching fellow,

  • or a roommate who took CS50, or a more technical friend who can help walk you

  • through your code, literally, start walking through your code

  • verbally, talking to the duck, saying, well, on line 2, I'm declaring main,

  • and on line 3, I'm allocating space for an array.

  • And then, on line 4, I'm calling-- ah!

  • That's what I'm doing wrong.

  • So if any of you have ever had that kind of moment, whether in office hours,

  • or alone, where you're either talking in your head,

  • or you're talking through your code to someone else.

  • And here, she doesn't even have to respond.

  • You just hear yourself saying the wrong thing, or having that aha moment.

  • You can approximate that by just keeping one of these little guys on your desk,

  • and have that conversation.

  • And it's actually not as crazy as it sounds.

  • It's that process of just talking through your code logically,

  • step by step, in a way that you can't necessarily do in your own mind.

  • At least I can't.

  • When you hear yourself say something wrong,

  • or that didn't quite follow logically, bam, you

  • can actually have that aha moment.

  • So on the way out today, by all means, take any one of these ducks.

  • That took quite a long time for [? Colten ?] to lay out today.

  • And we'll have more at office hours in the weeks to come, if you would like.

  • So some of you might recall such a duck from [? Currier ?] House

  • last year too, which was a cousin of his as well.

  • All right.

  • So that is rubber duck debugging.

  • Now, last week, recall that we began to take off training wheels.

  • We'd used, for a few weeks, the CS50 library.

  • And that's kind of in the past now.

  • That was just a technique, a tool, via which

  • we could get user input a little more pleasantly, than if we actually

  • started dealing with memory early on.

  • And we revealed last week that a "string", quote, unquote,

  • is just what, underneath the hood in C?

  • Say again.

  • An array of characters.

  • And even more specifically, it's a synonym S-T-R-I-N-G for what actual

  • data type?

  • char star, as we've called it.

  • So a char star is just the computer scientist's

  • way of describing a pointer to a character,

  • or rather the address of a character, which

  • is functionally equivalent to saying an array of memory, or sequence of memory.

  • But it's kind of the more precise, more technical way of describing it.

  • And so now that we know that we have char stars underneath the hood, well,

  • where is all of that coming from?

  • Well, indeed, it maps directly to that memory.

  • We keep pointing out that something like this is inside of your computer.

  • And we can think of the memory as just being chunks of memory,

  • all of whose bytes are numbered.

  • 0 on up to 2 gigabytes, or 2 billion, whatever the value might be.

  • But of course last week, we pointed out that you can think about this memory

  • not as being hardware per se, but as just being this pool of memory that's

  • divided into different regions.

  • The very top of your computer's memory, so to speak,

  • is what we call the text segment.

  • And what goes in the text segment of your computer's memory

  • when you're running a program?

  • Text is like, poor choice of words, frankly, but what is it?

  • Say again.

  • AUDIENCE: File Headers?

  • DAVID J. MALAN: Not the file headers, in this case.

  • This is in the context of running a program, not necessarily saving a file.

  • Yeah?

  • AUDIENCE: String literals.

  • DAVID J. MALAN: Not string literals here,

  • but they're nearby, actually, in memory.

  • AUDIENCE: Functions.

  • DAVID J. MALAN: Functions, closer.

  • Yeah.

  • The text segment of your computer's memory

  • is where, when you double click a program to run it,

  • or in Linux, when you do dot slash something, to run it.

  • That's where the zeros and ones of your actual program, the machine code,

  • that we talked about in week zero, is just loaded into RAM.

  • So recall from last week, that, you know, anything physical in this world--

  • hard drives, solid state drives, is slow.

  • So those devices are slow, but RAM, the stuff we keep pulling up on the screen,

  • is relatively fast.

  • If only because it has no moving parts.

  • It's purely electronic.

  • So when you double click a program on your Mac or PC,

  • or do dot slash something in Linux, that is

  • loading from a slow device, your hard drive,

  • where the data is stored long term, into RAM or memory,

  • where it can run much more quickly and pleasurably in terms of performance.

  • And so, what does this actually mean for us?

  • Well, it's got to go somewhere.

  • We just decided, humans, years ago that it's

  • going to go at the top, so to speak, of this chunk of memory.

  • Below that though, are the more dynamic regions of memory--

  • the stack and the heap.

  • And we said this a moment ago, and last week as well, what goes on the heap?

  • Or who uses the heap?

  • AUDIENCE: Dynamic memory.

  • DAVID J. MALAN: Dynamic memory.

  • Any time you call malloc, you're asking the operating system

  • for memory from the so-called heap.

  • Anytime you call free, you're sort of conceptually putting it back.

  • Like, it's not actually going anywhere.

  • You're just marking it as available for other functions and variables to use.

  • The stack, meanwhile, is used for what?

  • AUDIENCE: Local variables.

  • DAVID J. MALAN: Local variables and any of your functions.

  • So main, typically takes a sliver of memory at the bottom.

  • If main calls another function, it gets a sliver of memory above that.

  • If that function calls one, it gets a sliver of memory above that.

  • So they each have their own different regions of memory.

  • But of course, these arrows, both pointing at each other,

  • don't seem like such a good design.

  • But the reality is, bad things can happen.

  • You can allocate so much memory that, bam, the stack overflows the heap.

  • Or the heap overflows the stack.

  • Thus were born websites like Stack Overflow, and the like.

  • But that's just a reality.

  • If you have a finite amount of memory, at some point,

  • something's going to break.

  • Or the computer's going to have to say, mm-mm, no more memory.

  • You're going to have to quit some programs, or close some files,

  • or whatnot.

  • So that was only to say that that's how the memory is laid out.

  • And we started to explore this by way of a few programs.

  • This one here-- it's a little dark here.

  • This one here, was a swap function.

  • Now it's even darker.

  • It was a swap function that actually did swap two values, a and b.

  • But it didn't actually work in the way we intended.

  • What was broken about this swap function last week?

  • Like, I'm pretty sure it worked.

  • And when our brave volunteer came up and swapped the orange juice and the milk,

  • that worked.

  • So like, the logic was correct, but the program itself did not work.

  • Why?

  • AUDIENCE: It changed the values of the copy variables.

  • DAVID J. MALAN: Exactly.

  • It changed values in the copies of the variable.

  • So recall, that when main was the function

  • we called, and it had two values, x and y, that chunk of memory was here.

  • That chunk of memory was here.

  • And it had like the numbers 1 and 2.

  • But when it called the swap function, that got its own chunk of memory.

  • So main was at the bottom, swap was above that.

  • It had its own chunks of memory called a and b, which

  • initially, got the values 1 and 2.

  • 1 and 2 were indeed successfully swapped,

  • but that had no effect on x and y.

  • So we fixed that.

  • With the newer version of this program, of course,

  • it looked a lot more cryptic at first glance, but in English,

  • could someone just describe what it is that happens

  • in this example that was more correct?

  • Like, what does this program do line by line?

  • Yeah?

  • AUDIENCE: Instead of passing copies of the variables,

  • you pass pointers to their addresses.

  • DAVID J. MALAN: Exactly.

  • Instead of passing the values of the variables, thereby copying them,

  • it passes the addresses of those variables.

  • So that's like saying, I don't technically care where it is in memory,

  • but I do need to know that it is somewhere in memory.

  • So instead of passing in x, the number 1,

  • let's suppose that x is at location 100--

  • my go-to example.

  • It's actually the number 100 that's going to go there.

  • And if y is at the location like, 104, well, it's

  • 104 that's going to go there, which are not the values we want to swap,

  • but those are sort of like little maps, or breadcrumbs if you will,

  • that lead us to the right location.

  • So that when we execute this code, what we're ultimately

  • swapping in those three lines, is this and this, and all along the way,

  • recall, we're using a temporary variable there

  • that can be just thrown away after.

  • So that's what pointers allowed us to do.

  • And that's what allowed us to actually change values on the so-called stack,

  • even by calling another function.

  • All right.

  • Any questions then, on where we left off last time with the stack and with swap?

  • No?

  • All right.

  • So recall we introduced Binky as well, who lost his head at one point,

  • but why?

  • What went horribly, horribly awry with this scene from last week's film

  • from Stanford?

  • Binky was doing everything correctly, right?

  • Like, moving values.

  • 42 was successful.

  • And then, yeah?

  • AUDIENCE: He tried to dereference something that

  • wasn't pointing to any actual address.

  • DAVID J. MALAN: Exactly.

  • He tried to dereference a pointer, an address, that wasn't actually pointing

  • to a valid address.

  • Recall that this was the line of code in question that was unlucky and bad.

  • Star y, means, go to the address in y, and do something to it.

  • Set it equal to the number 13.

  • But the problem was, that in the code we looked at last week,

  • all we did at the start was say, hey, computer give me a pointer to an int,

  • and call it x.

  • Do the same, and call it y.

  • Allocate space and point x at it.

  • But we never did the same for y.

  • So whereas x contained, last week, the address of an actual chunk of memory,

  • thanks to malloc, what did y contain at that point in the story?

  • The yellow line there.

  • What did y contain?

  • What value?

  • AUDIENCE: Null.

  • DAVID J. MALAN: Null.

  • Maybe.

  • Maybe.

  • But it's not obvious because there's no mention of null in the program.

  • We might get lucky.

  • Null is just 0.

  • And sometimes we've seen that 0 is the default value in a program.

  • So maybe.

  • But I say maybe, and I'm hedging. Why?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: Yeah.

  • And it doesn't allocate-- well, allocate, is not quite the right word.

  • That suggests you are allocating actual memory.

  • It's a garbage value.

  • There's something there.

  • Right?

  • My Mac has been running for a few hours.

  • And your Macs, and PCs, and phones, are probably running all day long.

  • Or certainly when the lid is up.

  • And so, your memory is getting used, and unused, and used.

  • Like, lots of stuff is going on.

  • So your computer is not filled with all zeros or all ones.

  • If you look at it at some random point in the day,

  • it's filled with like bunches and bunches of zeros and ones

  • from previous programs that you quit long ago.

  • Windows you have in the background and the like.

  • So, the short of it is, when you're running

  • a program for the first time on a computer that's been running now for some time,

  • it's going to get messy.

  • That big rectangle of memory is going to have some ones over here

  • some zeros over here and vice versa.

  • So they're garbage values, because those bytes have some values in them.

  • You just don't necessarily know what they are.

  • So the point is, you should never ever dereference a pointer

  • that you have not set yourself.

  • Maybe it will crash.

  • Maybe it won't crash.

  • Valgrind can sometimes help you find these things.

  • But it's just not a safe operation.

  • And lastly, the last thing we introduced last week,

  • which will be the stepping stone for what problems we'll solve this week,

  • was struct.

  • So struct is kind of cool, in that you can design your own custom data

  • structures.

  • C is pretty limited out of the box, so to speak.

  • You only have chars, and bools, and floats, and ints, and doubles,

  • and longs, and str--

  • well, we don't even have strings, per se.

  • So it doesn't really come with many features, like a lot of languages do.

  • Like Python, which we'll see in a few weeks.

  • So with struct in C, you have the ability

  • to solve some problems of your own.

  • For instance, with the struct, we can actually

  • start to implement our own features.

  • Or our own data types.

  • For instance, let me go up here.

  • And let me go ahead and create a file called say,

  • student, or rather struct.h.

  • So recall that dot h is a header file.

  • Thus far, you have used header files that other people made.

  • Like, cs50.h, and stdio.h, and stdlib.h,

  • but you can make your own.

  • Header files are just files that typically contain code that you

  • want to share across multiple programs.

  • And we'll see more of this in time.

  • So let me go ahead and just save this file.

  • And suppose that I want to represent a student in memory.

  • A student of course, is probably going to have what?

  • For instance, how about a string for their name,

  • a string for their dorm-- but string is kind of two weeks ago.

  • Let's call this char star.

  • And let's call name char star.

  • And so you might want to associate like, multiple pieces of data with students.

  • Right?

  • And you don't want to have multiple variables, per se.

  • It would be nice to kind of encapsulate these together.

  • And recall at the very end of last week, we

  • saw this feature where you can define your own type,

  • with typedef, that is a structure itself.

  • And you can give it a name.

  • So in short, simply by writing these lines of code,

  • you have just created your own custom data type.

  • It's now called student.

  • And every student in the world shall have, per this code, a name

  • and a dorm associated with them.

  • Now, why is this useful?

  • Well the program, we looked at the very end of last time looked

  • a little something like this.

  • In struct0.c, we had the following.

  • I first allocated some amount of space for student.

  • I asked the user what's the enrollment in the class or whatnot?

  • That gives us an int.

  • And then, we allocated an array of type student, called students, plural.

  • This was an alternative, recall, to doing something

  • like this, string names[enrollment] and string dorms[enrollment].

  • Which would work.

  • You could have two separate arrays, and you'd just

  • have to remember that names[0] and dorms[0] belong to the same human.

  • But why do that if you can keep things together?

  • So with structs, we were able to do this.

  • Give me this many student structures, and call the whole array, students.

  • And the only new syntax we introduce to satisfy this goal, was what operator?

  • AUDIENCE: The dot.

  • DAVID J. MALAN: The dot.

  • Yeah.

  • So in the past, recall from like week two, we introduced arrays.

  • And arrays allow you to do square bracket notation.

  • So that is no different from a couple of weeks back.

  • But if your array is not storing just integers, or chars, or floats,

  • or whatever, it's actually storing a structure, like a student,

  • you can get at that student's name by literally just saying dot name.

  • And you can get at their dorm by doing dot dorm.

  • And then everything else is the same.

  • This is what's called, encapsulation.

  • And it's kind of like a fundamental principle of programming

  • where, if you have some real world entity, like a student,

  • and you want to represent students with code, yeah,

  • you can have a bunch of arrays called names, dorms, emails, phone

  • numbers, but that just gets messy.

  • You can instead encapsulate all of that related information about a student

  • into one data structure so that now you have, per week zero, an abstraction.

  • Like, a student is an abstraction.

  • And if we break that abstraction, what is a student actually?

  • Not in the real world, but in our code world here?

  • Student is an abstraction.

  • It's a useful word, all of us can kind of agree means something,

  • but technically, what does it apparently mean?

  • A student is actually a name and a dorm, which really kind of is

  • diminutive to everyone in this room, but we've distilled it in code

  • to just those two values.

  • So there we have encapsulation.

  • You're kind of encapsulating together multiple values.

  • And you're abstracting away to just have a more useful term,

  • because no one is going to want to talk in terms of lines of code

  • to describe anything.

  • So, same topic as in the past.

  • So, now we have the ability to come up with our own custom data structures

  • it seems.

  • That we can store anything inside of them that we want.

  • So let's now see how poorly we've been designing

  • some things for the past few weeks.

  • So it turns out that much of the code, hopefully

  • we've been writing in recent weeks has been correct,

  • but we've been not necessarily designing solutions in the best way.

  • Recall that when we have this chunk of memory,

  • we've typically treated it as at most, an array.

  • So just a contiguous chunk of memory.

  • And thanks to this very simple mental model, we get strings,

  • and now we get arrays of students.

  • But arrays aren't necessarily the best data structure in the world.

  • Like, what is a downside of an array, if you've encountered one thus far?

  • In C, what's a downside of an array?

  • Yeah?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: Can or cannot?

  • AUDIENCE: Cannot.

  • DAVID J. MALAN: You cannot.

  • That is true.

  • So in C, you cannot mix data types inside of an array.

  • They must all be ints, they must all be chars, they must all be students.

  • It's a bit of a white lie because technically, you

  • can have something called a void star, and you can actually map-- but yes.

  • That is true though, strictly speaking-- cannot mix data types.

  • Though frankly, even though other languages let you do that,

  • it's not necessarily the best design decision.

  • But sure, a limitation.

  • Other thoughts.

  • Yeah?

  • AUDIENCE: The size cannot change.

  • DAVID J. MALAN: The size cannot change.

  • Let's focus on that one.

  • Because that's sort of even more constraining it would seem.

  • So if you want an array for, say, two values, what do you do?

  • Well, you can do something like int x[2];

  • And what does that actually give you inside of your computer's memory?

  • It gives you some chunk that we'll draw a rectangle.

  • This is location 0.

  • This is location 1.

  • Suppose that, oh, a few minutes later, you change your mind.

  • Oh, darn, I just took a--

  • I want to type in a third value, or I want

  • to add another student to the array.

  • Where do you put that?

  • Well, you don't.

  • If you want to add a third value to an array of size 2,

  • what's your only option in C?

  • AUDIENCE: You make a new array.

  • DAVID J. MALAN: You make a new array.

  • So literally.

  • And if this array had the number like 42,

  • and this had the number 13, the only way to add a third number is to allocate

  • a second array, copy the values into the same locations, 42, 13, and then,

  • we'll add another value, 50.

  • And then, so that you're not using up twice as much space

  • almost permanently, now you can sort of free somehow,

  • or stop using that chunk of memory.

  • So that's fine.

  • It's correct what we just did.

  • But what's the running time of that process?

  • Recall a couple of weeks ago, we started talking about efficiency and design.

  • What's the running time of resizing an array.

  • AUDIENCE: Too long.

  • DAVID J. MALAN: Say again.

  • AUDIENCE: I said, too long.

  • DAVID J. MALAN: Too long.

  • Fair.

  • But let's be more precise.

  • Big O of-- big O of what?

  • AUDIENCE: N.

  • DAVID J. MALAN: N. What's n?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: OK.

  • True.

  • But what does n represent?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: Yeah.

  • So you don't actually have to know n in advance.

  • It's just a general answer.

  • In this case, however long the array is, call it n.

  • It is that many steps to resize it into that plus 1.

  • Technically it's big O of n plus 1.

  • But recall from our discussion of big O notation that we just

  • ignore the smaller terms-- the plus 1s, the divided by 2s, the plus n.

  • We focus only on the most powerful term in the expression, which

  • is just n here.

  • So yes, if you have an array of size 2, and you resize it

  • into an array of size 3, or really, n plus 1, that's

  • going to take me roughly n steps.

  • Technically n plus 1 steps.

  • But n steps.

  • Ergo, big O of n.

  • So it's a linear process.

  • So possible but not necessarily the fastest

  • thing because you literally had to move all those damn values around.

  • So what would be better than this?

  • And if you've programmed before, you might have the right instincts already.

  • How do we solve this problem?

  • Yeah?

  • AUDIENCE: Would you allocate more memory at the end of the array?

  • DAVID J. MALAN: Reallocate more memory at the end of the array.

  • So it turns out C does have a function called realloc.

  • Perfectly, if not obviously, named that reallocates memory.

  • And if you pass it the address of a chunk of memory you've allocated,

  • and the operating system notices, oh, yeah you got lucky.

  • I've got more memory at the end of this array,

  • it will then allocate that additional RAM for you, and let you use it.

  • Or worst case, if there's nothing available at the end

  • of the array in memory, because it's being

  • used by something else in your program.

  • That's fine.

  • Realloc will take on the responsibility of creating another array somewhere

  • in memory, copying all of that data for you into it,

  • and returning the address of that new chunk of memory.

  • Unfortunately, that's still linear.

  • Yeah?

  • AUDIENCE: Is this all being done in the heap?

  • Or--

  • DAVID J. MALAN: This is all being done in the heap.

  • Malloc, and realloc, and free, all operate on the heap.

  • Yes.

  • So that is a solution, but it doesn't really speak to the efficiency.

  • Yeah?

  • AUDIENCE: Could you use a linked list?

  • DAVID J. MALAN: Yeah.

  • What is a linked list?

  • Go ahead.

  • AUDIENCE: It's when you have an element that points to different elements.

  • DAVID J. MALAN: OK.

  • Points to other elements.

  • Yeah.

  • So let me speak to what's the fundamental issue here.

  • The fundamental problem is much like painting yourself into a corner,

  • so to speak, as the cliche goes.

  • With an array, you're deciding in advance how big the data structure is

  • and committing to it.

  • Well, what if you just do the opposite.

  • Don't do that.

  • If you want initially, room for just one value, say one integer,

  • only ask the computer for that.

  • Give me space for one integer and I'll put my number 42 in here.

  • And then, if and only if, you want a second integer,

  • do you ask the computer for a second integer.

  • And so the computer, by way of malloc or whatnot, will give you another one

  • like, the number 13.

  • And if you want a third, just ask the same question of the operating system.

  • Each time just getting back one chunk of memory.

  • But there's a fundamental gotcha here.

  • There's always a trade off.

  • So yes, this is possible.

  • You can call malloc three times.

  • Each time asking for a chunk of memory of size 1, instead of size 3,

  • for instance.

  • But what's the price you pay?

  • Or what problem do we still need to solve?

  • Yeah?

  • AUDIENCE: They're not stored next to each other.

  • DAVID J. MALAN: Yeah.

  • They're not being stored next to each other.

  • So even though I can think of this as being the first element, the second,

  • and the third, you do not have, in this story, random access to elements.

  • And random access, ergo, random access memory, or RAM,

  • just means that arithmetically, like, mathematically, you

  • can jump to location 0, location 1, location 2, randomly, or in constant

  • time.

  • Just instantly.

  • Because if they're all back to back to back, all you have to do is like,

  • add 1, or add 4, or whatever to the address, and you're there.

  • But the problem is, if you're calling malloc again and again

  • and again, there's no guarantee that these things are even

  • going to be proximal to one another.

  • These second chunks of memory might end up--

  • if this is a big chunk of memory we've been talking about,

  • where the heap's up here, and the stack's down here--

  • 42 might end up over here.

  • The next chunk of memory, 50, might end up over here.

  • The third chunk might end up over here.

  • So you can't just jump from location 0, to 1, to 2,

  • because you have to somehow remember where location 0, and 1, and 2, are.

  • So how do we solve this?

  • Even if you haven't programmed before, like, what would a solution be here?

  • AUDIENCE: Somehow store [INAUDIBLE].

  • DAVID J. MALAN: OK.

  • Somehow storing the addresses of--

  • AUDIENCE: Of the [INAUDIBLE]

  • DAVID J. MALAN: All right.

  • So let's just suppose, for the sake of discussion, that this chunk of memory

  • ended up at location 100.

  • This one ended up at like 150.

  • This one ended up at like 475.

  • Whatever those values are.

  • It would seem that somehow or other I need to remember three values--

  • 100, 150, and 475.

  • So where can I store that?

  • Well, it turns out, I can be a little clever but a little greedy.

  • I could say to malloc, you know what, every time I call you, don't just

  • give me space for an integer, give me space for an integer

  • plus the address of another integer.

  • So if you've ever kind of seen like popcorn strung together on a string,

  • or any kind of chain link fence where one link is linking to another.

  • We could create the equivalent of-- oops not that.

  • We could create the equivalent of this kind of picture,

  • where each of these squares, or nodes, we'll start calling them, kind of links

  • graphically to the other.

  • Well, we've seen these links, or these pointers,

  • literally arrows that are pointing implemented in code.

  • An arrow or a pointer is just an address.

  • So you know what?

  • We should just ask malloc not for enough space for just the number 42,

  • we should instead, ask for a little more memory in each of these squares,

  • making them pictorially rectangles now.

  • So that now, yes, we do have these arrows conceptually

  • pointing from one location to another.

  • But what values do I actually want to put in these new additional boxes?

  • AUDIENCE: The addresses of the next.

  • DAVID J. MALAN: The addresses of the next.

  • So they're like little breadcrumbs.

  • So in this box here, associated with the first value,

  • should be the address of my second value, 475.

  • Associated with my second value here, per the arrow--

  • and let me draw the arrow from the right place.

  • --from the arrow, should be the address 150, because that's the last.

  • And then, from this extra box, what should I put there?

  • Yeah?

  • AUDIENCE: Slash 0 or something?

  • DAVID J. MALAN: Yeah.

  • So probably, the equivalent of backslash 0, which in the world of pointers, recall,

  • is null.

  • So just a special value that means that's it, this is the end of the line.

  • That still leaves us with room to add a fourth value and point to it,

  • but for now, it signifies very clearly to us there's nothing actually there.

  • So what did we just do?

  • We created a list of values 50, oh sorry 42, 50, 13,

  • but we linked them together.

  • First, pictorially, with just arrows.

  • Like any human might with a piece of chalk.

  • But technically in code, we could do this

  • by just storing addresses in each of these places.

  • So just to be clear then, what might this actually translate to in code?

  • Well, what if I proposed this.

  • In code, we might do something like this.

  • If we want to store an integer.

  • We're of course, going to need to store like int n, we'll call it.

  • n will represent 42, or 50, or 13.

  • But if we want to create a data structure,

  • we might want to start giving this data structure a name.

  • I called it, a moment ago, node, which is a CS term for a node in a linked

  • list, so to speak.

  • And it looks like this.

  • So typedef means, give me my own type.

  • Struct means, make it a structure, like a student was.

  • And then, node, which is going to be the name of this thing.

  • And I'll explain in a moment why I have the word node twice this time.

  • But I left room on the board for just one more line.

  • In addition to an int, called n, or whatever,

  • I need to somehow represent in code, the additional memory

  • that I want malloc to give me for the address.

  • So first of all, these are addresses of what data types?

  • Each of those three new boxes.

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: They are the addresses of integers at that point in the story.

  • But technically, what is this box really pointing to?

  • Is it pointing specifically to the ints?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: It's pointing to that whole chunk of memory, if you will.

  • So if you start thinking about each of these rectangles as being a node,

  • and each of the arrows as pointing to another node,

  • we need to somehow express, I need to somehow store a pointer to a node.

  • In other words, each of these arrows needs to point to another node.

  • And in code, we could say this.

  • Right?

  • Like, let's give it a name.

  • Instead of n, which is the number, let's call it next.

  • So next, shall be the name of this field that points to the next node in memory.

  • And node star, what does that mean in English, if you will?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: Say again?

  • AUDIENCE: Pointing to an address.

  • DAVID J. MALAN: Pointing to an address.

  • Right?

  • It looks different.

  • Node is a new word today and that's fine.

  • But node star, just means a pointer to a node.

  • The address of a node.

  • And it turns out that this is a custom structure

  • so we actually have to say this.

  • But it's the same principle even though things are kind of escalating quickly

  • here, we just need two values, an int, and then, a pointer to another thing.

  • That other thing is going to be another node.

  • And we're just using a node, frankly, to encapsulate two values--

  • an int and a pointer.

  • And the way you express in C, albeit somewhat cryptically,

  • a pointer, or one of those arrows, is you say give me a variable called next,

  • have it point to a structure called node.

  • Or rather, have it be the address of a structure of type node.

  • Yeah?

  • AUDIENCE: How can you [? reveal ?] the timing of struct node [INAUDIBLE]?

  • DAVID J. MALAN: Good question.

  • So this feels like a circular kind of definition because I'm defining a node,

  • and yet, inside of a node is a node.

  • That is OK because of the star.

  • It is necessary in C--

  • remember that C always is kind of read top to bottom.

  • So accordingly, this very first line of code here, typedef struct node,

  • at that point in the story, when clang has read that line,

  • it knows that a phrase, struct node, exists.

  • AUDIENCE: That's why you say nodes [INAUDIBLE].

  • DAVID J. MALAN: Exactly.

  • Exactly.

  • We didn't need to do this with students because there were

  • no pointers involved to other students.

  • But yes, in this case.

  • So in short, this tells clang, hey, clang, give me a structure called node.

  • And then, in here, we say, hey, clang, each of those nodes

  • shall have two things, an integer called n,

  • and a pointer to another one of these data structures of type node,

  • and call the whole thing, node.

  • It's a bit of a mouthful.

  • But all this is, is the following.

  • Let me go ahead and erase all of this.

  • All this data type is--

  • if we get rid of the picture we draw on the fly there.

  • --is this says, hey, clang, give me a data structure

  • that pictorially looks like this.

  • It's divided into two parts.

  • The first part is called n, the second part is called next.

  • This part is of type int.

  • This is a pointer to another such node.

  • And that's it.

  • Even though the code looks complex, the idea is exactly that.

  • Yeah?

  • AUDIENCE: [INAUDIBLE]?

  • Why do you have to say struct node again?

  • DAVID J. MALAN: Good question.

  • The reason is, as just came up a moment ago, clang

  • and C, in general, are kind of dumb.

  • They just read code top to bottom.

  • And the problem is, you have to declare the name of this structure

  • as being a struct node before you actually use it.

  • It's similar in spirit to our discussion of prototypes-- why functions need

  • to be mentioned way up top.

  • This just says to clang, give me a type called struct node.

  • You don't know what it's going to look like yet.

  • But I'll finish my thought later.

  • And then in here, we're just telling clang, inside of that node

  • should be an integer, as well as, a pointer to the very type of thing

  • I'm in the middle of defining.

  • But if I had left off the word node up there, and just said struct,

  • you couldn't do that because it hasn't seen the word N-O-D-E yet.

  • That's all.

  • Other questions?

  • All right.

  • So if I now have a data structure called node,

  • I can use it to kind of stitch together these linked lists.

  • And maybe just to vary things a little bit,

  • and to start giving away some ducks, would folks

  • be comfortable with volunteering to solve a problem here?

  • Yeah?

  • OK.

  • Come on up.

  • 1, 2--

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: Sure.

  • Or you can take a duck and run.

  • OK.

  • 1, 2, how about 3?

  • Come on over here, 3.

  • So if you want to be our first pointer, you can be number 5.

  • Come on over here.

  • You want to be number 9.

  • And one more.

  • One more volunteer.

  • Come on over here.

  • Yeah.

  • All right.

  • So-- I'll meet you over here.

  • OK, 17.

  • All right.

  • So if you'd like to--

  • just so we pick this up for those following along at home.

  • If you would like to just say hello to the audience.

  • ANDREA: Hi, I'm Andrea.

  • [? COMEY: ?] Hi, [? I'm Comey. ?]

  • [? KYONG: ?] Hi, [? I'm Kyong. ?]

  • SPEAKER 2: Hi, I'm [INAUDIBLE].

  • DAVID J. MALAN: Wonderful.

  • OK.

  • If you wouldn't mind all just taking a big step back over the ducks,

  • just so that we're a little farther back.

  • Let's go ahead and do this.

  • If you're our first pointer, if you could come over here for instance,

  • and just stand outside the ducks.

  • And if you guys could come a little over here in front is still fine.

  • So here we have the makings of a linked list.

  • And what's your name again?

  • [? COMEY: ?] [? Comey. ?]

  • DAVID J. MALAN: [? Comey ?] is our first pointer, if you will.

  • Via [? Comey's ?] variable, we're just going

  • to keep track of the first element of the linked list.

  • So if you could, with your left hand, represent first.

  • Just point over at-- what was your name again?

  • ANDREA: Andrea.

  • DAVID J. MALAN: So Andrea is the number 9.

  • If you could use your left hand to point at number 5.

  • And if you could use your left hand, yep, to point at number 17.

  • And your left hand to just point at null, which we'll just call the ground.

  • So you don't want to just point it randomly

  • because that would be like following a bogus pointer, so here means null.

  • All right.

  • So this is a linked list.

  • All you need to store a linked list of three values

  • is three nodes, inside of which are three integers,

  • and their left hands represent that next pointer, so to speak.

  • [? Comey's ?] a little different, in that she's not holding a value.

  • She's not holding an integer.

  • Rather, she's holding just the name of the variable, first.

  • So you're the only one that's different here fundamentally.

  • So suppose I want to insert the number 20?

  • Could someone volunteer to be number 20?

  • OK.

  • Come on up.

  • All right.

  • And what's your name?

  • ERIC: Eric.

  • DAVID J. MALAN: Eric.

  • Eric, you're the number 20.

  • And Eric, actually, let's see.

  • Actually can we do this?

  • Let me give-- let me make this a little more different.

  • OK.

  • That never happened.

  • OK.

  • Eric, give me that please.

  • I want to insert Eric as number 5.

  • So Eric, I'm keeping this list sorted.

  • So where are you obviously going to go?

  • ERIC: Go right there.

  • DAVID J. MALAN: All right.

  • But before you do that, let's just consider what this looks like in code.

  • In code, presumably, we have malloced Eric from the audience.

  • I've given him a value, n, of 5.

  • And his left hand is a garbage value right now, because it's not

  • pointing to anything specific.

  • So he's got two values-- an integer, and a left hand representing

  • the next pointer.

  • If the goal is to put Eric in sorted order.

  • What should our steps be?

  • Like, whose hand should point where, and in what order?

  • Yeah.

  • Give us one step.

  • AUDIENCE: You should point to number 9.

  • DAVID J. MALAN: OK, so you should point at number 9,

  • which is equivalent to saying, point at whatever first is pointing at.

  • That's where [? Comey ?] is pointing.

  • So go ahead and do that.

  • All right next?

  • What's the next step?

  • Someone else?

  • Someone else.

  • Almost there.

  • Yeah?

  • AUDIENCE: First should point to 5.

  • DAVID J. MALAN: OK.

  • So first, or [? Comey, ?] could you point to 5?

  • And that's fine.

  • You don't even have to move.

  • Right?

  • This is the beauty of a linked list.

  • It doesn't matter where you are in memory,

  • it's the whole beauty of these pointers, where you can literally

  • point at that other location.

  • It's not an array where they need to be standing back to back to back.

  • They can be pointing anywhere.
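In C, Eric's two steps might be sketched like this. The `node` struct (an `int n` plus a `next` pointer, his "left hand") follows the lecture's naming; the helper name `prepend` is hypothetical, and this is a minimal sketch rather than CS50's own code. What matters is the order of the two pointer assignments.

```c
#include <stdbool.h>
#include <stdlib.h>

// A node as in the lecture: an int n plus a pointer ("left hand") to the next node
typedef struct node
{
    int n;
    struct node *next;
}
node;

// Hypothetical helper: puts a new node holding value at the front of the list.
// Returns false if malloc fails, leaving the list unchanged.
bool prepend(node **first, int value)
{
    node *new_node = malloc(sizeof(node));
    if (new_node == NULL)
    {
        return false;
    }
    new_node->n = value;

    // Step 1: the new node points at whatever first is pointing at
    new_node->next = *first;

    // Step 2: only now does first point at the new node
    *first = new_node;
    return true;
}
```

If the two assignments were swapped, `first` would point at the new node before anyone remembered the old head, so the new node would end up pointing at itself and the rest of the list would be lost.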

  • All right.

  • Let's go ahead and insert one more.

  • Who wants to be, say, 55?

  • Big value.

  • Yeah.

  • Come on down.

  • All right.

  • What's your name?

  • [? KYONG: ?] [? Kyong. ?]

  • DAVID J. MALAN: [? Kyong. ?] OK.

  • So come on over.

  • So we've just malloced [? Kyong ?] from the audience.

  • I've given him his n value of 55.

  • His left hand is just some garbage value right now.

  • How do we insert [? Kyong ?] in the right order?

  • Where is he obviously supposed to go?

  • In sorted order, he obviously belongs at the end.

  • But here's the catch with the linked list.

  • Just like when we've discussed searching and sorting in the past,

  • the computer is pretty blind to all but just one value.

  • And the linked list, at the moment--

  • like, I don't know that these three, these four, exist.

  • All I know really, is that [? Comey ?] exists.

  • Because this first pointer is my only access

  • to the rest of the elements.

  • And so what's cool about a linked list, but perhaps not obvious,

  • is that you only--

  • the most important value is the first.

  • Because from the first value, you can get to everyone else.

  • It's not useful for me to remember-- is it Andrea?--

  • Andrea alone, because if I do, I've just

  • lost track of [? Comey ?] and, more importantly, because his number is smaller,

  • Eric.

  • So all I have to do really, is remember [? Comey. ?]

  • So if the goal now is to insert number 55, what steps should come first?

  • No pun intended.

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: Say again.

  • AUDIENCE: Finding the first space.

  • DAVID J. MALAN: OK.

  • Finding the first space.

  • So I'm going to start at [? Comey, ?] and I'm going to follow this pointer.

  • Number 5, does 55 belong here?

  • No.

  • So I'm going to follow this pointer and get to Andrea.

  • Does 55 belong here?

  • No.

  • Gonna follow her pointer, and 22, does it belong here?

  • No.

  • I follow this pointer, 26?

  • No.

  • But you have a free hand, it turns out.

  • So what step should come next?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: We could have you point at 55, and now done.

  • So relatively simple, but what was the running time of this?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: It's big O of n.

  • It's linear.

  • Because I had to start at the beginning, even though we

  • humans have the luxury of just eyeballing it.

  • Saying, oh, obviously, he belongs way at the end.

  • Mm-mm.

  • Not in code.

  • Like, we have to start at the beginning to traverse the whole darn list,

  • until we get linearly to the very end.

  • And now we're done.
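A sketch of appending 55 at the end in C, assuming the same `node` struct and a hypothetical `append` helper. Because only `first` is remembered, the loop has to visit every node to find the one whose `next` is `NULL`, which is where the linear running time comes from.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int n;
    struct node *next;
}
node;

// Hypothetical helper: appends value at the end of the list.
// With only the first pointer available, this walks every node: O(n).
bool append(node **first, int value)
{
    node *new_node = malloc(sizeof(node));
    if (new_node == NULL)
    {
        return false;
    }
    new_node->n = value;
    new_node->next = NULL;   // the new last node points at nothing

    // Empty list: the new node simply becomes the first node
    if (*first == NULL)
    {
        *first = new_node;
        return true;
    }

    // Follow the pointers until we reach the node whose next is NULL
    node *ptr = *first;
    while (ptr->next != NULL)
    {
        ptr = ptr->next;
    }
    ptr->next = new_node;
    return true;
}
```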

  • Let's try one last one.

  • How about 20?

  • Yeah.

  • Great.

  • Come on down.

  • What's your name?

  • JAMES: James.

  • DAVID J. MALAN: James.

  • All right, James.

  • All right.

  • So we just malloced James, given him the number 20.

  • He obviously belongs roughly in the middle.

  • What's the first step?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: Sorry?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: All right.

  • So we start with [? Comey, ?] again.

  • All right.

  • First, OK.

  • 5, do you belong here?

  • No.

  • Let me follow the link.

  • OK 9, do you belong here?

  • No.

  • Do you belong at 22-- ooh.

  • But what did I just do wrong?

  • I went too far.

  • At least in this story.

  • Like, I literally-- Andrea is behind me now.

  • OK.

  • So can I follow the pointer backwards?

  • You can't.

  • Like in every picture we've drawn, and every example

  • we've done with an address, we only have the address of the next node.

  • We don't have what's called, a doubly linked list, at least in this story,

  • where I can just turn around.

  • So that was a bug.

  • So I need to start over instead.

  • First, OK 5, OK 9--

  • what I really need in code, ultimately, is

  • to kind of peek ahead and not actually move-- not that far.

  • Just to 22.

  • Peek ahead at 22 and realize, oh, that's going to be too far.

  • This is not yet far enough.

  • So let's go ahead and bring James over.

  • Well, actually, you can stay there physically.

  • But what step has to happen first?

  • I know now he belongs in here.

  • You want to point at him?

  • OK.

  • Point at him.

  • ANDREA: Oh.

  • I'm sorry, he points first.

  • DAVID J. MALAN: Well, let's do that anyway, just to see why it is incorrect.

  • That's fine.

  • OK.

  • Andrea proposed that we point here, but she just broke the whole linked list.

  • Why?

  • ANDREA: Because there's nothing to point at.

  • DAVID J. MALAN: Right.

  • No one is remembering-- what was your name again?

  • [? KYONG: ?] [? Kyong. ?]

  • DAVID J. MALAN: No one's remembered where [? Kyong ?] was.

  • So you can't do that.

  • Your left hand has to stay there.

  • So what steps should happen first instead?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: James should point at whatever

  • Andrea is pointing at, perhaps?

  • So a little redundantly at the moment, just like before.

  • OK.

  • Now what happens next?

  • That's step one.

  • ANDREA: Now I can point.

  • DAVID J. MALAN: Now you can point at him.

  • OK.

  • You could do that.

  • All right.

  • And so now, this looks like a complete mess,

  • but if we know that [? Comey ?] is first,

  • we can follow the breadcrumbs to Eric, and then to Andrea, and then to James,

  • and then the rest of our list step by step by step.
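The peek-ahead idea and the order of James's and Andrea's steps might be sketched as follows, again assuming the lecture's `node` struct; `insert_sorted` is a hypothetical name. The loop checks `ptr->next->n` rather than moving onto that node, so it stops one node early instead of needing to walk backwards.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int n;
    struct node *next;
}
node;

// Hypothetical helper: inserts value so the list stays in ascending order
bool insert_sorted(node **first, int value)
{
    node *new_node = malloc(sizeof(node));
    if (new_node == NULL)
    {
        return false;
    }
    new_node->n = value;
    new_node->next = NULL;

    // New smallest value (or empty list): it goes at the front
    if (*first == NULL || value < (*first)->n)
    {
        new_node->next = *first;
        *first = new_node;
        return true;
    }

    // Peek ahead at ptr->next instead of stepping onto it, so we stop
    // on the node before the insertion point (no walking backwards needed)
    node *ptr = *first;
    while (ptr->next != NULL && ptr->next->n < value)
    {
        ptr = ptr->next;
    }

    // Step 1: the new node points at whatever ptr is pointing at...
    new_node->next = ptr->next;
    // Step 2: ...and only then does ptr point at the new node
    ptr->next = new_node;
    return true;
}
```

Doing step 2 before step 1 is exactly Andrea's first proposal: `ptr->next` would be overwritten before anyone remembered the rest of the list.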

  • So it's a huge amount of like logic now.

  • But what problem have we solved?

  • And I think we identified it over here earlier.

  • What was the problem first and foremost with the arrays?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: You have to decide on their size in advance.

  • And once you do that, if you want to add an additional element,

  • you have to resize the whole darn thing.

  • Which is expensive because you have to move everyone around.

  • Now frankly, I'm being a little greedy here.

  • And every time we've inserted these new elements,

  • I've been keeping them in sorted order.

  • So it would seem that if you insert things in sorted order, it's big O of n

  • every time.

  • Because in the worst case, the new element

  • might end up all the way at the end.

  • But what if we relax that constraint?

  • What if I'm not so uptight and need everything nice and orderly and sorted?

  • What if I just want to keep growing the list in any random order?

  • And I allocate the number 34.

  • And I'll play the number 34.

  • Malloc 34.

  • Where is the quickest place for me to go?

  • Yeah?

  • AUDIENCE: Point to 5, and then have [INAUDIBLE]..

  • DAVID J. MALAN: OK.

  • I'll point to 5, and then, [? Comey, ?] if you could point to me.

  • Done.

  • One-- well, two steps.

  • All right.

  • Suppose now, I malloc 17 with someone else, who we'll

  • pretend is right here.

  • Where's the best place for 17 to go?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: Right after [? Comey ?] too.

  • So now, [? Comey ?] can point at 17, 17 can point at me, I can point at Eric,

  • and so forth.

  • And that's two steps again.

  • Two steps-- if it's the same number of steps every time,

  • we call that, constant time.

  • And we write it as big O of 1.

  • And so here too, it's just a trade off.

  • If you want really fast insertions, don't worry about sorting.

  • Just put them at the beginning and deal with it later.

  • If you want dynamic resizability, don't use an array, use a linked list,

  • and just keep allocating more and more as you go without wasting

  • a huge amount of space too.

  • Which notice, that's another big problem with an array.

  • If you over allocate space, and only use part of it, you're just wasting space.

  • So there's no one solution here.

  • But we do now have the capabilities, thanks to structs

  • and pointers, to stitch together, if you will, solutions to these new problems.

  • Yes, please.

  • SPEAKER 2: Why can't the node [INAUDIBLE]?

  • DAVID J. MALAN: And who am I in this story?

  • SPEAKER 2: [INAUDIBLE].

  • DAVID J. MALAN: Oh, OK.

  • Absolutely.

  • So another very reasonable idea would be, well,

  • why don't we just put the new ones at the end?

  • That's fine if I keep track of who is at the end.

  • The problem, is at the moment in the story,

  • and we'll ultimately see this in code, I'm only remembering [? Comey. ?] And

  • from [? Comey ?] am I getting everywhere else.

  • I could have another pointer, a second pointer,

  • and literally call it, last, that's equivalent to you.

  • Or that's always pointing at you.

  • I just need then two pointers, one literally called first,

  • one literally called last.

  • That's fine.

  • That's a nice optimization if I want to throw all the elements at the end.
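A sketch of the `first`/`last` optimization, assuming a hypothetical `list` struct that holds both pointers and a hypothetical `append_fast` helper. Appending no longer requires a traversal, so it takes the same number of steps regardless of the list's length.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int n;
    struct node *next;
}
node;

// Hypothetical wrapper that remembers both ends of the list
typedef struct
{
    node *first;
    node *last;
}
list;

// Appends in constant time: no traversal, because last already
// points at the final node
bool append_fast(list *l, int value)
{
    node *new_node = malloc(sizeof(node));
    if (new_node == NULL)
    {
        return false;
    }
    new_node->n = value;
    new_node->next = NULL;

    if (l->last == NULL)
    {
        // Empty list: the new node is both first and last
        l->first = new_node;
    }
    else
    {
        l->last->next = new_node;
    }
    l->last = new_node;
    return true;
}
```

The cost is one more pointer to keep correct: every function that removes or adds nodes must also update `last`.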

  • And frankly, I could get really fancy--

  • and to solve the problem that Andrea cited earlier--

  • if I store not just an int and a pointer, but instead,

  • an int and two pointers, I can even have each

  • of these guys pointing with their left and right hands

  • in a doubly linked list, so as to solve the problem Andrea identified, which

  • was if I go too far no big deal.

  • Take one step back.

  • I don't have to think as hard about that logic.
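A possible shape for a doubly linked list node and an insertion helper, both with hypothetical names (`dnode`, `insert_before`) chosen for illustration. Because each node also remembers `prev`, stopping one node too far (say, at 22 when inserting 20) is harmless: the helper reaches the previous node through `pos->prev`.

```c
#include <stdbool.h>
#include <stdlib.h>

// Each node now has two "hands": prev and next
typedef struct dnode
{
    int n;
    struct dnode *prev;
    struct dnode *next;
}
dnode;

// Inserts value immediately before node pos (which must not be NULL)
bool insert_before(dnode **first, dnode *pos, int value)
{
    dnode *new_node = malloc(sizeof(dnode));
    if (new_node == NULL)
    {
        return false;
    }
    new_node->n = value;

    // Wire up the new node's own hands first
    new_node->prev = pos->prev;
    new_node->next = pos;

    // Then repoint the neighbors at it
    if (pos->prev != NULL)
    {
        pos->prev->next = new_node;
    }
    else
    {
        *first = new_node;  // inserting before the head
    }
    pos->prev = new_node;
    return true;
}
```

The trade-off is one extra pointer per node, and twice as many pointers to update on every insertion.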

  • So there too, a trade off.

  • Let's go ahead and take a five minute break.

  • I'll turn on some music.

  • Grab a duck now, if you'd like.

  • And we'll return with some fancier data structures still.

  • Thanks.

  • All right.

  • We're back.

  • So let's now translate some of these ideas to code.

  • So that we can actually solve this problem a little more

  • concretely than just having humans pointing at each other.

  • So for instance, let's try to distill everything

  • we've been talking about into just a goal in code

  • of storing a list of numbers.

  • I would propose that we can take like three passes at this problem.

  • The first would be, let's just decide in advance how many numbers we

  • want to store so we don't have to deal with all

  • this complexity with the pointing and the pointers and all this,

  • and just hard code that value somehow, and just stop

  • when the user has inputted that many numbers and no more.

  • Two, we can improve upon that and at least let the user dynamically resize

  • their array.

  • So that if they decide to input more numbers than we intend,

  • it's going to grow, and deal with that.

  • Of course, arrays are not necessarily ideal

  • because they have to do all that damn copying from old to new.

  • That's linear time.

  • It would seem smartest to get to version 3, which

  • is actually going to use a linked list.

  • So we're just more modestly allocating space for another number,

  • and another number, and another number, or really a node.

  • One number at a time.

  • So let me go ahead and start as follows.

  • I'm going to go ahead and include some familiar lines in list0.c:

  • cs50.h, for the CS50 library, just to make it easy to get some user input for this,

  • and stdio.h, for printf.

  • And let me go ahead and declare my main function as usual.

  • And then, in here let's do a couple of things.

  • First, let's ask the user for the capacity of the array

  • that we're going to use.

  • Or rather, let's do this first.

  • Let me first rewind and say, you know what?

  • int numbers[50];

  • Well, that's going to be annoying to type in 50 numbers.

  • So let's give the user just two numbers at first that he or she can type in.

  • Next, let's go ahead and prompt the user for those numbers.

  • So let me go ahead and say--

  • let's do this.

  • Let's at least clean this up a little bit so that we can reuse this value.

  • So we don't have a magic number.

  • This just came up in discussion actually.

  • So while-- do I want to do that?

  • Nope.

  • Let me fix this.

  • This will be my capacity of size 2.

  • And that's going to give me that size.

  • And then, I'm going to keep track of how many integers

  • I've prompted the user for so far.

  • So initially, the size of this structure is going to be 0.

  • But its capacity, so to speak, is 2.

  • So size means how many things are in it.

  • Capacity means how many things can be in it.

  • And while the size of the structure is less than its capacity,

  • let's go ahead and get some inputs from the user.

  • Let's go ahead and ask them for a number, using our old friend, get_int.

  • And just say, give me a number.

  • And then, let me go ahead and insert the number

  • that they type in into this array at location size, like this: numbers[size] = number.

  • And then, do size++.

  • I think.

  • You know, I wrote it pretty quickly.

  • But let's consider what I just did.

  • I initialized size to 0, because there's nothing in it initially.

  • Then I say, while size is less than the capacity of the whole thing--

  • and capacity is 2 by default--

  • go ahead and do the following.

  • Give me an int from the user.

  • OK.

  • So int number gets get_int.

  • Then, put at location, size, in my numbers, array,

  • whatever the human typed in, number.

  • And then, increment size with ++.

  • All right.

  • So on the first iteration size is 0.

  • So numbers[0] gets the first number.

  • numbers[1] gets the second number.

  • Then, size equals capacity.

  • So it stops, logically.

  • Any questions on the logic of this code?

  • All right.

  • So once we have those numbers, let's just do something simple.

  • Like for (int i = 0; i < size; i++), where size is the actual size.

  • Let's just go ahead and print out the number

  • you inputted, with printf("You inputted %i\n", numbers[i]). All

  • right.

  • So if I made no typos in list0.c, then I'm going to go ahead

  • and do make list0 and then ./list0. I'm going to be prompted for a couple of numbers.

  • Let's go ahead and do 1, 2.

  • You inputted 1, you inputted 2.

  • All right.

  • So not bad.

  • But this is arguably bad design. Why?

  • Just find one fault. It's correct.

  • But bad design.

  • AUDIENCE: Repetitive.

  • DAVID J. MALAN: Repetitive, because I'm using a couple of loops, sure.

  • And it's fundamentally-- it's very limited in functionality.

  • Why?

  • Like how useful is this program?

  • AUDIENCE: It's hard coded at 2.

  • DAVID J. MALAN: Yeah.

  • It's hard coded at 2.

  • So let's at least improve upon this a little bit,

  • and get rid of this hard coding.

  • Why don't I at least ask the user for something like this?

  • Well, instead of just declaring the capacity, let me go ahead and say,

  • you know what?

  • Let's just replace the 2.

  • get_int, and just prompt for "Capacity," for instance.

  • All right.

  • And now if I do this, I'm going to be prompted--

  • so make list0.

  • ./list0.

  • The capacity will be 2.

  • 1, 2, that's nice.

  • But if I run it again, and give it a capacity of 3--

  • 1, 2, 3, I get more capacity.

  • So that's nice.

  • It's an improvement for sure.

  • There is a bug here.

  • Before I test it further, can anyone identify a bug or somehow crash this?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: Oh, go ahead.

  • AUDIENCE: If you don't input an integer.

  • DAVID J. MALAN: If I don't put an integer.

  • Or-- is that same comment up here?

  • AUDIENCE: I was going to say, what happens if you go back

  • and put in [INAUDIBLE] those other [INAUDIBLE] will be in the memory.

  • DAVID J. MALAN: Oh.

  • No.

  • Because I'm rerunning it each time.

  • I don't need to worry about previous runs of the program.

  • Yeah?

  • AUDIENCE: In the for loop, it just goes 1,

  • 2, 3, it doesn't actually care what you put in.

  • DAVID J. MALAN: [INAUDIBLE] 1, 2, 3-- well, I am iterating up to size,

  • which could be capacity.

  • Because now they do end up being equivalent.

  • Because I'm filling the whole thing.

  • But let's try this.

  • If you don't type in a value.

  • So let me go ahead and rerun this.

  • My capacity shall be duck.

  • All right.

  • So we did handle that.

  • Because get_int does that for me.

  • But I bet I can still break this.

  • Ooh, yeah, let's always try something negative.

  • Oh, OK.

  • So bad.

  • Like cryptic looking message, but clearly,

  • has to do with a negative value.

  • So I should probably be a little smarter about this.

  • And recall from like, Week 1, we did do this.

  • With Mario, you might have done this.

  • So I could do something like a do-while loop: do, capacity gets get_int of "Capacity,"

  • while capacity is less than 1.

  • So just a little bit of error checking to close the bug that you identified.

  • All right.

  • So let's go ahead and recompile this.

  • make list0-- oops we're going to start hearing that a lot today.

  • Aren't we [INAUDIBLE]?

  • make list0, ./list0.

  • Capacity will be 3.

  • 1, 2, 3.

  • Now capacity will be negative 1.

  • Doesn't allow it.

  • Capacity 0, doesn't allow it.

  • Capacity 1, yes.

  • So non-exhaustively, I've tested it.

  • It feels like it's in better shape.

  • OK.

  • But this program, while correct, and while more featureful,

  • still has this fundamental limit.

  • Wouldn't it be nice to allow the user to just keep typing numbers,

  • as many as they want, and then quit once they're done inputting numbers.

  • Right?

  • If you're making a program to compute someone's GPA,

  • different students might have taken different courses,

  • you don't want to make them type in all 32 courses

  • if they're younger and haven't taken all those courses.

  • Like there's a lot of scenarios where you don't know in advance how

  • many numbers the user wants to provide.

  • But you want to support a few numbers, lots of numbers, or beyond.

  • So let's do this in a second version.

  • In list1.c, let me go ahead and improve upon that example as follows.

  • First, let me give my familiar friends up here, cs50.h and

  • stdio.h, and then, in here, int main void.

  • And then, let's start writing this.

  • So now, I don't know in advance, necessarily, how many numbers the user

  • is going to type in.

  • Like the goal is, I want them to be able to type

  • in a number, another number, another number, and then

  • hit the equivalent of like, q, for quit, when they're done inputting numbers.

  • Like I don't want them to have to think about in advance, how many numbers it

  • is they're inputting.

  • But how do I do that?

  • Like I can't just come up with an array called numbers, and say, 50.

  • Because if the user wants to type in 51 numbers,

  • I'm going to have to resize that.

  • But how do you resize an array?

  • How do you resize an array?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: What's that?

  • AUDIENCE: You can't.

  • DAVID J. MALAN: You can't.

  • Right.

  • We've never seen an instance where you've resized an array.

  • We talked about it on the blackboard here.

  • Well, just like, allocate a bigger one and copy everything in.

  • And we did identify realloc.

  • But you can't actually use realloc on an array declared with square brackets.

  • Realloc actually accepts an address of a chunk of memory

  • that you want to grow, or shrink.

  • So it turns out, if we now start to harness

  • the sort of fundamental definition of what an array is, a chunk of memory,

  • we can actually build arrays ourselves.

  • If an array is just a chunk of memory, or more specifically,

  • it's like the address of the first byte of a chunk of memory,

  • it would seem that I could declare my array, not with square brackets

  • as we've been doing for weeks, but I can say,

  • you know what numbers really is, it's really just a pointer.

  • And I'm initially going to initialize it to NULL.

  • Because there is no array.

  • But now I have the ability to point that pointer

  • at any chunk of memory, small or big.

  • Now why is this useful?

  • Well, initially let me claim that my capacity is 0,

  • because nothing's going on yet.

  • I haven't called malloc or anything.

  • And initially, my size is 0 because there's nothing in the array.

  • And it doesn't even have a size.

  • But let me just do this forever.

  • Much like Scratch's forever block, you can use while (true) in C

  • to just say keep doing this until the user breaks out of this.

  • And let me go ahead and ask the user, give me a number, get_int.

  • And just ask them for a number.

  • And then, we just need a place to put that.

  • So where do I put this number?

  • Well, do I have, at the moment, any place to put the number?

  • No.

  • And technically speaking, how do you express that?

  • Like in pseudocode, I want to say, if no place for number.

  • But technically, I could do this.

  • Well, if the size of the array at the moment, equals its capacity,

  • that feels like a lower level way of expressing the same thing.

  • If whatever the capacity is, if the size is the same, there is no more room.

  • And that simple statement also covers the scenario where the capacity is 0,

  • the size is therefore 0.

  • So it's the same question.

  • Either we have no space at all, or we have some space

  • but we've used it all-- size == capacity.

  • So if the size equals capacity, or put more casually,

  • if I don't have enough space.

  • What do I want to do intuitively?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: Allocate more memory.

  • And it turns out, you proposed, or someone proposed earlier,

  • reallocating memory.

  • We can use this function for the very first time.

  • Let me go ahead and say this--

  • the catch with realloc is you have to be smart about it,

  • because it returns a pointer.

  • So let me propose this code first.

  • First, just give me a temporary variable, call it, temp,

  • that's going to store the following.

  • Actually, no.

  • Let me start this more simply.

  • Let me go ahead and say, numbers should be reallocated please,

  • realloc, by passing itself in.

  • And this time, give me the size of an int, times--

  • how many ints do I want this time?

  • How many numbers did the human just input presumably?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: Just one.

  • Right.

  • Because literally, we've only called get_int once in this story.

  • So whatever the size of this array is now, we need to increase it by 1.

  • That's all.

  • So this line of code here is saying, hey computer,

  • go ahead and reallocate this array from whatever its current size is,

  • and make it this size instead.

  • The size of whatever it is, plus 1, times the size of an int.

  • Because that's what we're trying to store, is an int.

  • So we have to do that multiplication.

  • And realloc, as mentioned earlier, is pretty fancy.

  • It's going to take a pointer, whatever chunk of memory

  • you've already allocated, and it's going to then reallocate

  • a bigger chunk of memory.

  • Hopefully, what's going to happen is this--

  • if your chunk of memory initially looks like this,

  • it's going to hopefully notice, oh, this memory is free.

  • Let me just give you back the same address.

  • So if this is address 100, and you get lucky

  • and this address is also available, the realloc function's

  • going to remember that for the operating system.

  • It's going to return the number 100 again.

  • And you're good to go.

  • You can safely touch memory here.

  • Or if this is in use already, this chunk of memory, and therefore we

  • can't fit another byte there because some other code you wrote

  • is using that memory.

  • But there is twice as much memory available down here.

  • What realloc will do, if you've stored the number 50,

  • is handle the process of copying 50 to the new chunk.

  • The old chunk is freed for you, though its bytes are left behind as garbage values.

  • And it's going to return to you the address of the new chunk of memory,

  • having done the copying for you.

  • So even though it's technically re-allocating the array,

  • it's not necessarily just going to grow it.

  • It might relocate it in memory to a bigger chunk,

  • and then give you the new address of that memory.
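Put together, the grow-by-one approach described above might look like the following sketch. The helper name `append_number` and its pointer parameters are assumptions made so the logic fits in one testable function. It also stores `realloc`'s result in a temporary variable first, a precaution the lecture returns to shortly: on failure `realloc` returns `NULL` and leaves the old block intact.

```c
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper: appends number to the heap-allocated array *numbers,
// which holds *size ints and has room for *capacity ints.
// Grows by exactly one int whenever it is full, as in the lecture.
bool append_number(int **numbers, int *size, int *capacity, int number)
{
    if (*size == *capacity)
    {
        // realloc(NULL, ...) acts like malloc, so this also covers the first call.
        // Use temp: if realloc fails it returns NULL and *numbers is still valid.
        int *temp = realloc(*numbers, sizeof(int) * (*size + 1));
        if (temp == NULL)
        {
            return false;
        }
        *numbers = temp;   // may be the same address or a moved block
        (*capacity)++;
    }
    (*numbers)[*size] = number;
    (*size)++;
    return true;
}
```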

  • Question?

  • AUDIENCE: Is that process really preferable

  • to just creating extra memory in its place.

  • And then saving the time and energy of reallocating them [? all at once. ?]

  • DAVID J. MALAN: That's a really good question.

  • Honestly, we could mitigate this problem somewhat by just doing, you know what,

  • give me at least--

  • go ahead and give me at least the size of an int, times--

  • I don't know, most humans are not going to type in more than 50 numbers.

  • Let's just pick 50.

  • So you could do this, and that would indeed save you time.

  • Because the approach I'm currently taking

  • is pretty inefficient because every damn time the user

  • calls get_int, and gives an int, we're resizing, resizing, resizing.

  • Very expensive.

  • As to what the best value is though--

  • 50?

  • Should it be 25?

  • Should it be 1,000?

  • I'm either going to underbet or overbet.

  • And it's up to you to decide which of those is the worse decision.

  • AUDIENCE: But, like, in terms of programs,

  • is it also pretty expensive to have memory that you're not using

  • or generally, is it usually more OK?

  • DAVID J. MALAN: Good question.

  • In programs you're writing, is it better to have more memory than you're using,

  • or should you really be conservative?

  • These days, memory is cheap.

  • We all have gigabytes of memory.

  • And so wasting 50 or 200 ints, times 4 bytes each, of memory, not a big deal.

  • Like, just get the job done quickly and easily.

  • But in resource constrained devices, maybe, things like phones

  • or little internet of things style devices

  • that have a lot fewer resources, you don't really want to go wasting bytes.

  • But honestly, the CPUs, the brains in our computers,

  • are so darned fast these days, even if you're calling malloc

  • 10 times, 1,000 times, it's happening so darned fast

  • that the human doesn't even notice.

  • So there too.

  • These are what are called design decisions.

  • And these are the kinds of things that, in the real world,

  • you might actually debate with someone at a whiteboard,

  • saying, no, this is stupid because of this reason.

  • Or he or she might push back for other reasons.

  • And no one's necessarily right.

  • The whole goal is to just go through that thought process first

  • so you're at least confident in what you chose.
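One common middle ground between resizing on every input and guessing 50 up front, not something used in this lecture, is geometric growth: double the capacity whenever the array is full. A minimal sketch, with the hypothetical helper name `append_doubling`:

```c
#include <stdbool.h>
#include <stdlib.h>

// Alternative growth policy (not the lecture's): when full, double the
// capacity, starting at 1. Far fewer realloc calls, at the cost of
// some allocated space sitting unused.
bool append_doubling(int **numbers, int *size, int *capacity, int number)
{
    if (*size == *capacity)
    {
        int new_capacity = (*capacity == 0) ? 1 : *capacity * 2;
        int *temp = realloc(*numbers, sizeof(int) * new_capacity);
        if (temp == NULL)
        {
            return false;   // old block untouched
        }
        *numbers = temp;
        *capacity = new_capacity;
    }
    (*numbers)[*size] = number;
    (*size)++;
    return true;
}
```

For n appends this calls `realloc` only about log2(n) times rather than n times, while leaving up to about half of the block unused right after a resize: the same time-versus-space trade-off discussed above.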

  • Yeah?

  • AUDIENCE: When we were writing to a file in the last PSET,

  • was it storing it in memory first or putting it right on the hard drive?

  • DAVID J. MALAN: When you were calling fread,

  • you were by definition in the forensics problem set

  • reading bytes from disk into memory.

  • When you were calling fwrite, you were copying bytes from memory back to disk.

  • If that answers the question.

  • OK.

  • Other questions?

  • Yeah?

  • AUDIENCE: Why did you say, size + 1, in line 16?

  • DAVID J. MALAN: Why do I say, size + 1, in line 16?

  • Because the whole goal is to make room in this array for the newly inputted

  • number that the human just typed in.

  • And so whatever the current size of the array is,

  • I clearly need one more space.

  • AUDIENCE: So that repeats on and on?

  • DAVID J. MALAN: It does repeat on and on.

  • Because at the moment, I'm inside of this while loop.

  • So we do need to ask a question, when is the human done inputting.

  • And it turns out-- and this is not obvious.

  • And it's not the best user experience on a keyboard for the human.

  • But we can actually detect the following sentiment--

  • if user is done inputting numbers, then let's go ahead and break.

  • But the question then is, how do you express that pseudocode?

  • Well, you could in some programs maybe type q for quit.

  • But is that going to work when using get_int?

  • Could we detect q?

  • Why not?

  • AUDIENCE: Because get_int immediately prompts you for another integer.

  • DAVID J. MALAN: Exactly.

  • Because get_int immediately prompts you for another int.

  • So because of the way we designed the CS50 library, you can't detect q,

  • or you can't have the human type quit, unless you don't use get_int.

  • You instead use?

  • AUDIENCE: get_string.

  • DAVID J. MALAN: We could use get_string.

  • And then every time the human types in a number, we could use,

  • like, atoi to convert it to an int.

  • But if the human types in q or Q-U-I-T--

  • a string also-- we could just have an if condition with strcmp and "quit."

  • But honestly, then you're reimplementing get_int--

  • so trade-off.
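The get_string-based alternative might be sketched as below. Since the CS50 library isn't assumed here, the function simply takes the line as a `const char *`; `parse_line` is a hypothetical name. Note that `atoi` returns 0 for text that isn't a number, so a real version would need more validation, which is part of the reimplementing cost just mentioned.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: interprets one line of input, as if it came from
// get_string. "q" or "quit" means stop; anything else goes through atoi.
// Returns false when the user asked to quit.
bool parse_line(const char *line, int *number)
{
    if (strcmp(line, "q") == 0 || strcmp(line, "quit") == 0)
    {
        return false;
    }
    *number = atoi(line);  // atoi gives 0 for non-numeric text
    return true;
}
```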

  • Anyhow, a common way to work around this would

  • be, you know that Control-C quits programs, perhaps,

  • cancels out of your program.

  • There's another popular keystroke, Control-D,

  • that sends what's called end of file.

  • It simulates the end of a file.

  • It simulates the end of the human's input.

  • So it's kind of like the period at the end of an English sentence.

  • So if you want to signal to a computer that's waiting for input from you that

  • you don't want to quit the program-- that would be Control-C--

  • but you just want to be done inputting input to the computer,

  • you hit Control-D, otherwise known as EOF.
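Outside the CS50 library, standard C reports end of input through the return value of `scanf`/`fscanf`, which is `EOF` once input runs out (for example, after Control-D at a terminal on Unix-like systems). A minimal sketch, assuming a hypothetical `read_until_eof` helper that reads from any `FILE *`, such as `stdin`:

```c
#include <stdio.h>

// Hypothetical helper: reads ints from stream until end of input or until
// max values are stored, and returns how many were read. fscanf returns 1
// per int read, EOF at end of input, or 0 if the next token isn't an int;
// anything other than 1 stops the loop.
size_t read_until_eof(FILE *stream, int numbers[], size_t max)
{
    size_t count = 0;
    int number;
    while (count < max && fscanf(stream, "%d", &number) == 1)
    {
        numbers[count] = number;
        count++;
    }
    return count;
}
```

Called as `read_until_eof(stdin, numbers, 100)`, this would keep reading until the user presses Control-D.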

  • And the way to express this-- and you would only know this from

  • documentation-- would be to say something like this,

  • if the number the human typed in equals end of file--

  • but there is no such thing in this context--

  • you actually do this, compare it against INT_MAX, because of how the CS50 library works.

  • It turns out that if the only values a function can return

  • are integers, that means you can return 0, 1, negative 1, 2 billion,

  • negative 2 billion give or take.

  • What humans did for years with old programming languages

  • is they would just steal one or a few numbers.

  • For instance, you'd steal the number two billion and call it INT_MAX--

  • the maximum integer.

  • And you'd just say, you can never actually type 2 billion,

  • because we're using that as a special value to signify

  • that the human hit Control-D. Or you could do negative 2 billion,

  • or you could do 0, or 50.

  • But at some point, you have to steal one of the 4 billion available numbers

  • to use as a sentinel value, a special value

  • that you can then check for as a constant.

  • So anyhow, this just means, when the user is done typing input,

  • go ahead and break out of this while loop.
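The sentinel idea might look like this in isolation. Here an array stands in for successive `get_int` results, and `count_before_sentinel` is a hypothetical name; `INT_MAX` (from `limits.h`) is the stolen value, so it can never count as real data.

```c
#include <limits.h>
#include <stddef.h>

// Hypothetical helper: counts values up to (not including) the sentinel.
// INT_MAX is reserved to mean "no more input".
size_t count_before_sentinel(const int values[], size_t length)
{
    size_t count = 0;
    while (count < length && values[count] != INT_MAX)
    {
        count++;
    }
    return count;
}
```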

  • And as an aside, let me fix one thing.

  • It turns out things can go wrong with realloc.

  • And if realloc fails to allocate memory, it

  • can return NULL, a special value that just means, eh, something went wrong.

  • It's an invalid pointer.

  • It's the address 0.

  • And so it turns out there's a subtle bug here where, technically, I

  • should actually do this--

  • store realloc's return value in a temporary variable.

  • Because if temp == NULL, something went wrong.

  • And I should actually go ahead and quit out of this program.

  • But let me wave my hand at that for now because it's more of a corner case.

  • But you'll see in the online version of this program we have additional error

  • checking that just checks, in the rare case that realloc fails,

  • clean it up and return properly.

  • But I'll wave to the online code for that.

  • All right.

  • Any questions on that example before we move on?

  • Yeah?

  • AUDIENCE: So in realloc, when it creates the new pointer for the [INAUDIBLE],

  • does it clear the memory from the original pointer?

  • Does it automatically clear it?

  • DAVID J. MALAN: Good question.

  • When you call realloc and it ends up allocating more space,

  • does it clear the original memory?

  • No.

  • And that is where garbage values come from, for instance.

  • Because they're just left in memory from the previous use.
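
Here is a small sketch of the practical upshot, assuming you want newly added slots to start at zero. `realloc` preserves the existing values and copies them if the block moves. It does not wipe the memory it gives up, and the bytes it adds have unspecified contents. The helper `resize_zeroed` is hypothetical and clears the new tail itself with `memset`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: resizes `old` (which holds old_n ints) to new_n ints.
// realloc keeps the first min(old_n, new_n) values, but any slots it adds
// hold leftover bytes ("garbage values"), so we zero them explicitly.
// Assumes new_n > 0, since realloc with a size of 0 is not portable.
int *resize_zeroed(int *old, size_t old_n, size_t new_n)
{
    int *bigger = realloc(old, new_n * sizeof(int));
    if (bigger == NULL)
    {
        return NULL;  // `old` is still valid and still owned by the caller
    }
    if (new_n > old_n)
    {
        memset(bigger + old_n, 0, (new_n - old_n) * sizeof(int));
    }
    return bigger;
}
```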

  • Other questions?

  • Yeah?

  • AUDIENCE: What does the user actually type to break?

  • DAVID J. MALAN: Oh, Control-D. Control-D. And it's not break.

  • It is to send end of file, end of input.

  • Control-C kills or breaks out of the program itself.

  • AUDIENCE: And that's the same as the INT_MAX kind of?

  • DAVID J. MALAN: Same as INT_MAX?

  • Yes.

  • AUDIENCE: Because you're not adding, like, a giant value.

  • DAVID J. MALAN: Correct.

  • In the CS50 library, INT_MAX, yes, is the symbol.

  • Yes.

  • Yeah?

  • AUDIENCE: Could you also just ask the user to say,

  • do you want to enter another number yes or no?

  • DAVID J. MALAN: Absolutely.

  • We could add more logic.

  • And you could use get_string.

  • And we could prompt him or her, hey, do you want to input another number.

  • The only downside of that would be, now, I

  • have to type in not only my number, but yes or no constantly.

  • So it's just a trade-off user interface-wise.

  • All right.

  • So let me go ahead.

  • And let me go ahead and return 0 here just as my simple solution

  • to this problem of something going wrong.

  • I've just compiled this program.

  • Let me go ahead and run it.

  • I'm going to type in one number, two numbers, three numbers.

  • And now I'm bored.

  • I don't want to keep doing this.

  • How do I tell the computer I'm done?

  • AUDIENCE: Control-D.

  • DAVID J. MALAN: Control-D. Oops.

  • Oh, OK.

  • That's correct behavior because I forgot a key step.

  • What's that?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: Yeah.

  • I'm not actually doing anything with the values.

  • I should probably do for (int i = 0; i < size;

  • i++), the code we had before.

  • And I should probably print out "You inputted %i", this.

  • Save that.

  • make list1.

  • So all I did was re-add the printing code.

  • Now if I rerun this-- one, two, three, Control-D--

  • dammit.

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: Oh.

  • OK.

  • Now I broke my code here.

  • Let me do this.

  • We're going to get rid of this error checking

  • because I'm not actually ever resizing.

  • numbers gets realloc.

  • Oh, and maybe someone chiming in with this--

  • numbers bracket size gets the user's input.

  • size++-- was this a key detail someone wanted me to do?

  • OK.

  • So I didn't actually finish the program earlier.

  • Notice we left off as follows--

  • hey, computer, give me an array of size 0 initially

  • that's null-- there's no memory for it.

  • Therefore, the size of this array is 0.

  • Do the following forever.

  • Get a number from the human.

  • If the number equals this special value, INT_MAX, just

  • break out because the program is done.

  • And actually, sorry.

  • This is why I write these in advance too.

  • OK.

  • Go ahead and prompt the user for a number.

  • If they have typed Control-D, just break out of this loop.

  • However, if the size of the array equals its current capacity,

  • go ahead and reallocate space for this thing so that it's one number bigger than it

  • previously was.

  • Now, assuming that succeeded and we have memory,

  • go ahead, and just like our list0 example, store in the numbers array

  • at the current location, which is 0, whatever number the human typed in.

  • And then increment the size by one to remember what we have done.

  • I'm also though going to need to do capacity++ here

  • to remember that we've increased the capacity of the array.

  • So again, two new measures.

  • capacity is how much space there is in total.

  • size is how much we're using.

  • They happen to be identical at the moment

  • because we're growing this thing step by step by step.
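
Putting the pieces of list1 together, a sketch might look like the following. It approximates the lecture's file and is not a copy of it. Instead of `get_int` and the `INT_MAX` sentinel, it stops when `fscanf` fails, such as at end of input after Control-D. The names `read_numbers` and `print_numbers` are hypothetical. As described above, it tracks `size` and `capacity` separately and grows by one slot at a time.

```c
#include <stdio.h>
#include <stdlib.h>

// Reads ints from `in` until end of input, growing the array one slot at a
// time. Returns the array (NULL if nothing was read or memory ran out) and
// reports how many numbers it holds via *out_size.
int *read_numbers(FILE *in, size_t *out_size)
{
    int *numbers = NULL;
    size_t capacity = 0;  // how many ints the block can hold
    size_t size = 0;      // how many ints we've actually stored

    int number;
    while (fscanf(in, "%d", &number) == 1)
    {
        if (size == capacity)
        {
            int *temp = realloc(numbers, (capacity + 1) * sizeof(int));
            if (temp == NULL)
            {
                free(numbers);
                *out_size = 0;
                return NULL;
            }
            numbers = temp;
            capacity++;
        }
        numbers[size] = number;
        size++;
    }

    *out_size = size;
    return numbers;
}

// Prints each number on its own line.
void print_numbers(const int *numbers, size_t size)
{
    for (size_t i = 0; i < size; i++)
    {
        printf("You inputted %i\n", numbers[i]);
    }
}
```

Call `free(numbers)` after `print_numbers(numbers, size)`. That final step is what keeps the program from leaking memory.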

  • All right.

  • Let me go ahead and hit Save.

  • Let me go ahead and compile this one last time.

  • ./list1 and input 1, 2, 3.

  • Control-D. OK.

  • Now it's just an aesthetic bug.

  • I forgot my \n.

  • So just to prove that I can actually program, ./list1; 1, 2, 3; Control-D.

  • Phew.

  • All right.

  • So you inputted 1.

  • And the reason it didn't move to another line

  • is because Control-D gets sent immediately without hitting Enter.

  • All right.

  • Phew.

  • That's all using arrays.

  • Now let's do the sort of cake baked already and pull it out of the oven.

  • The third and final example here is list two.

  • And actually, before we get there, let me note one thing.

  • Yeah, let's do one last thing here.

  • Let me go ahead and run, per earlier, our new friend valgrind on list1.

  • Enter.

  • It's waiting for me to type in 1, 2, 3.

  • Let me go ahead and hit Control-D. Interesting.

  • I seem to have a buggy program even though I claimed a moment ago that I

  • knew what I was doing.

  • 12 bytes in 1 blocks are definitely lost in loss record 1 of 1.

  • Again, I don't understand most of those words.

  • But 12 bytes definitely lost--

  • probably my fault. Why is it 12?

  • And what are those 12 bytes?

  • Yeah?

  • AUDIENCE: I think you made three integers.

  • DAVID J. MALAN: Yeah, 1, 2, and 3.

  • AUDIENCE: And each one is 4 bytes.

  • And you never freed them after you used malloc.

  • DAVID J. MALAN: Exactly.

  • I typed in three numbers--

  • 1, 2, and 3.

  • Each of those is 4 bytes on this computer.

  • That's 12-- 3 times 4.

  • And so the fact that I never freed them seems to be the source of the issue.

  • So at the end, let's just prove that valgrind

  • can detect correctness as well.

  • free(numbers), semi-colon.

  • Let me go ahead and rerun make list1.

  • And now let me increase the size of this and do valgrind again

  • on list1, typing in the same values--

  • 1, 2, and 3.

  • Control-D. All heap blocks were freed.

  • No leaks are possible.

  • So again, valgrind is your friend.

  • It finds problems that you didn't even necessarily notice.

  • And you didn't have to read through your lines of code again

  • and again to identify the source of the issue unnecessarily.

  • All right.

  • Any questions then on these arrays that are dynamically allocated

  • and the bugs we find therein with valgrind?

  • All right.

  • So the last demonstration of code is going to be this.

  • I have stolen, for this final example, some of the building blocks

  • that we had on the screen earlier.

  • In my code for list2.c, I need a structure called node.

  • And that node, as we claimed earlier with our human volunteers,

  • is going to contain a number called number,

  • we'll call it this time, instead of n.

  • And it's going to contain a pointer called next to another such node.

  • So that's copied and pasted from earlier, albeit with the integer renamed

  • to number for clarity.
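
In C, the node described above might be declared like this, using the field names `number` and `next`. The inner name `struct node` is needed because the shorter typedef name `node` doesn't exist until the declaration finishes.

```c
#include <stddef.h>

// A node in a singly linked list: one number, plus a pointer to the next
// node (NULL when this node is the last one).
typedef struct node
{
    int number;
    struct node *next;
}
node;
```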

  • Now, notice in main what I'm doing first.

  • Go ahead and allocate an array of no space initially.

  • So this was like when Comey was holding up first and representing

  • the beginning of our data structure.

  • This is the analog using an array, that the piece of paper that

  • would be held up here would be numbers.

  • And it's just pointing at nothing, null-- like left hand

  • down on the floor.

  • Because there is no memory yet allocated.

  • But then, while true, go ahead and get an integer

  • from the user with this code here.

  • Check if the user hit Control-D, as with this arcane technique.

  • And then our code is similar in spirit, but we

  • have to stitch these things together.

  • Allocate space for the number.

  • So when I malloc an additional volunteer from the audience

  • and he or she came down, the equivalent in code is this--

  • hey, computer, allocate with malloc enough space to fit the size of a node,

  • then store the results in a pointer called n.

  • So node *n just means, give me a pointer to a node, call it n,

  • and store the address that was just allocated from the audience as before.

  • Why do I have these lines of code here that I've highlighted in blue?

  • What's that expressing?

  • If bang n, or if not n would be how you pronounce it--

  • what's going on there?

  • Yeah?

  • AUDIENCE: If there is no more memory that you can point to, then it fails.

  • DAVID J. MALAN: Exactly.

  • This isn't going to happen all that often.

  • But if the computer is out of memory, and therefore malloc fails,

  • you don't want the program just to crash or freeze.

  • Like, all of us hate when that happens on Mac OS or Windows.

  • So check for it.

  • If not n, or equivalently, if n == NULL, just return 1.

  • Quit gracefully, even though annoyingly.

  • But don't just crash or do something unexpected.

  • So you can simplify that check to just if not n-- if n is not a valid pointer,

  • return 1.
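
Those allocation steps might be wrapped up like this, as a sketch. `make_node` is a hypothetical helper. Instead of doing `return 1` itself, it returns `NULL` on failure and leaves the decision to quit to its caller, such as `main`. It uses the arrow notation that is explained next.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// Hypothetical helper: allocates one node on the heap and fills it in.
// Returns NULL if malloc fails, so the caller can quit gracefully
// instead of crashing.
node *make_node(int number)
{
    node *n = malloc(sizeof(node));
    if (!n)  // same as: if (n == NULL)
    {
        return NULL;
    }
    n->number = number;
    n->next = NULL;
    return n;
}
```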

  • Now, here's the code with which we were implementing the demonstration

  • with our humans.

  • And this is the scariest looking or most cryptic at least

  • looking code we're going to see in C.

  • Today is our final day in C. We've been running up

  • a really steep hill of late, learning about memory,

  • and now data structures and syntax.

  • This is the last of our syntax in C.

  • So what are the symbols to be aware of?

  • This line of code here is how I handed one of our volunteers a piece of paper.

  • On the right-hand side is the number that was typed in--

  • 55, or 5, or 20, or whatever the value is.

  • On the left-hand side is where you want to put it.

  • n, then literally an arrow, then number-- n->number-- does this.

  • It has, with malloc a line or so prior, given me in memory

  • just one of these big rectangles.

  • And again, the top of this in this example is called the number

  • and the bottom is called next.

  • So that's our human having stood up from the back of the room.

  • When I hand that human a number, like 55, it visually goes there.

  • The line of code with which you achieve that is this here.

  • Because notice on line 31 here, when I malloc that node,

  • I stored its address in a variable called n.

  • And that's a pointer, as drawn with an arrow, to that big node.

  • Or if we really want to be nit-picky, if this is in address 100,

  • yes, then the pointer actually has the value 100 in it.

  • But again, that's rarely useful information.

  • So we can abstract away with just an arrow.

  • So line 31 is what creates those boxes on the screen.

  • Line 38 is what puts the number--

  • for instance, 55-- into the box exactly, much like I handed a piece of paper

  • over.

  • So what is this?

  • This is the only real new notation today,

  • even though we're using lots of stars elsewhere--

  • the arrow. This is wonderfully the first time in C that the syntax actually maps to our pictures.

  • If n is the variable and you do n arrow something,

  • that means follow the arrow--

  • kind of like Chutes and Ladders if you grew up playing that--

  • and then put the number where the arrow has led you in the field called number.

  • So as an aside, we can think about this a different way. n is what data type?

  • What is this thing in blue--

  • n?

  • AUDIENCE: Pointer.

  • DAVID J. MALAN: It's a pointer.

  • And it's a pointer to one of these things that we created earlier.

  • So we're not doing students anymore with our structures.

  • We're implementing nodes, which have numbers and next pointers.

  • So it turns out that if n is a pointer to a node--

  • recall that dot notation from before--

  • this is not how you access number in this case.

  • Because n is not a node itself.

  • It's a pointer.

  • But if n is a pointer, how do you go to a pointer?

  • How do you go to an address?

  • With what notation?

  • AUDIENCE: Star.

  • DAVID J. MALAN: Star.

  • So recall from last week, if we want to go to an address,

  • you could do syntax like this.

  • Ignore the parentheses for a moment.

  • Just *n means if n is an address of a chunk of memory, *n means go there.

  • Once you're there, you're conceptually right here-- top left-hand corner.

  • How do you access individual fields like number or next?

  • You use dot notation.

  • So if you literally do (*n).number, that means go to the address and access

  • the number field.

  • There is nice syntactic sugar in C, which

  • is just a fancy way of saying shorthand notation, where it's just the arrow: n->number.

  • But that's all it is.

  • This arrow notation doesn't do anything new.

  • It just combines, go there, with, access a field in a struct, all in one breath

  • if you will.

  • And this just looks a little prettier.

  • When I told our volunteers earlier, point your hand

  • down at the floor, that's all that line of code is doing.

  • It's saying, go to n's address, which is here, access the next field,

  • and write in that field null, which is just

  • the address 0-- the default, special address, like pointing at the floor.
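
Here is a small sketch of that equivalence. `fill_node` is a hypothetical function that uses each notation once. `(*n).number` and `n->number` designate the same field, so `&(*n).number == &n->number` holds. The parentheses matter: `*n.number` parses as `*(n.number)`, which doesn't compile when `n` is a pointer.

```c
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// Sets a node's fields using each notation once, to show that
// (*n).field and n->field name the same memory.
void fill_node(node *n, int number)
{
    (*n).number = number;  // go to the address in n, then access .number
    n->next = NULL;        // same idea, written with the arrow shorthand
}
```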

  • This line of code, 40, is just a quick check.

  • if (numbers)-- what is that equivalent to?

  • That's actually just saying, if numbers does not equal null.

  • So if numbers is legitimate, that is, if the list already has at least one node, then let's go ahead

  • and do the following.

  • Phew.

  • This is a mouthful.

  • What is going on here?

  • So this is a for-loop that's not using numbers.

  • Well, or is it?

  • Almost every for-loop we've written and you've probably written just

  • uses i, j, maybe k, but just integers probably.

  • But that doesn't have to be the case.

  • What is a pointer?

  • It's an address.

  • What is an address?

  • AUDIENCE: A place in memory.

  • DAVID J. MALAN: A place in memory, or a number really.

  • So you can certainly use for-loops just involving addresses.

  • But how?

  • So we'll consider this line of code.

  • This here looks different today, but it's everything

  • before that first semi-colon.

  • That's just where you initialize a value.

  • So this is like saying, hey, computer, go ahead and give me

  • a variable called ptr and initialize it to be the start of my list.

  • Then I'm saying, hey, computer, do this so long as ptr does not equal null.
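
The same pointer-driven loop can do other jobs too. In this sketch, `list_length` counts the nodes and `free_list` frees them; both names are hypothetical. When freeing, the loop has to save `ptr->next` before calling `free`, because the node is gone afterward.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// Walks the list with a pointer instead of an integer counter: start at the
// first node, stop when ptr becomes NULL, and advance by following next.
size_t list_length(const node *numbers)
{
    size_t count = 0;
    for (const node *ptr = numbers; ptr != NULL; ptr = ptr->next)
    {
        count++;
    }
    return count;
}

// Frees every node. The next pointer is saved before free(), because
// reading ptr->next after freeing ptr would be a use-after-free bug.
void free_list(node *numbers)
{
    node *ptr = numbers;
    while (ptr != NULL)
    {
        node *next = ptr->next;
        free(ptr);
        ptr = next;
    }
}
```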

  • And then what am I doing?

  • if-- and let's ignore this for now, it's an error check--

  • go ahead and-- sorry, let me think for one second.

  • OK.

  • Let's do this.

  • What are these lines of code doing?

  • This is the code that was actually suggested

  • at the very end of our human example.

  • Like, what if we wanted to insert all of the elements

  • at the end of the linked list?

  • How do you express that?

  • So in these highlighted lines of code, we're asking the question,

  • if the current pointer's next field is null, we've found the end.

  • Go ahead and update that next field to equal n and then break.

  • So let me translate this to an actual picture,

  • but using smaller boxes that make clear where something is going.

  • So suppose that this program's been running for a little while

  • and we have a linked list that looks like this,

  • where this one is pointing here and maybe this one's pointing here.

  • And this says null here.

  • And this points here.

  • And the numbers are, as we've been using today, 42, 50, 13.

  • So the start of this list is called numbers.

  • This points to the start of the list.

  • What am I doing in this for-loop?

  • I am just implementing the following logic with this loop--

  • give me a variable called ptr, as represented

  • in the story by my left finger, here, and initialize

  • that to be the start of the list.

  • If that node's next pointer is equal to null, add a new node here.

  • But this is not null.

  • I want to follow the bread crumbs to here.

  • And then, oh, we're at the end of the list.

  • I want to insert this new thing here.

  • So how do I actually express this code in C?

  • So if I look back up here, this is the line of code

  • that allocates my left finger here, called ptr, and initializes it

  • to equal numbers, which is the same as pointing at the first element.

  • It's kind of like Comey was representing first earlier.

  • But now our list is called numbers.

  • Next, what am I doing?

  • Does ptr equal null?

  • Well, no.

  • If my left hand is pointing here, it obviously doesn't equal null.

  • So we don't have to worry yet.

  • Then what do I want to do?

  • If ptr next equals null, well, what does that mean?

  • Well, ptr is here.

  • ptr arrow next means here.

  • Does this equal null in this story?

  • I mean, it literally doesn't.

  • Because null is not written there. null is way down there.

  • So the condition does not pass.

  • So what do I do next?

  • If ptr next equals null doesn't apply, here's a weird update.

  • ptr gets ptr next.

  • So it's cryptic-looking syntax.

  • But if ptr is pointing here, what is ptr next?

  • That's just this, right?

  • This is n.

  • This is next.

  • Or this is number.

  • This is next.

  • So ptr next is this.

  • So what is this value?

  • Well, this is a pointer pointing here.

  • So that highlighted block of code, ptr equals ptr next,

  • has the effect visually of doing this.

  • Why?

  • If the arrows are a little too magical, just think about these being addresses.

  • If this is saying, the next address is location 100,

  • ptr equals ptr next is like saying, well, this also equals 100.

  • Whatever 100 is, for instance, over here is

  • what both arrows should now point at.

  • And if you now repeat this process and repeat this process,

  • eventually that question we asked earlier is going to apply--

  • if ptr next equals null, what do I want to do?

  • Well, if ptr next equals null, there are two lines going on: ptr next gets n, and then break.

  • So ptr next is no longer null.

  • It should instead be pointing at n, which is the new node.

  • And then that's it.

  • Because this was already initialized to null.

  • And let's suppose this was 55.

  • And we're done.

  • So much easier to do, obviously, in person with just humans,

  • and moving around, and pointing with their left hands.

  • But in code, you just have to think about the basic building blocks.

  • What is each of these values?

  • Where is each of them pointing?

  • And which of those fields do you need to update?

  • And the only new code here-- even though we're kind of combining it all in one

  • massive example--

  • is this.

  • We are actually using arrow notation to say, go to that address

  • and access some value therein.

  • And this condition down here, which I'll wave my hand at for now,

  • just handles this situation where the list is initially empty.
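
Assembled into one function, the insert-at-the-end logic might look like the sketch below, which differs from the lecture's exact code in two ways. First, `append_node` is a hypothetical name. Second, it takes a pointer to the list's head pointer (`node **`), so the empty-list case can update the caller's variable. The lecture's version lives directly in `main` instead.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// Appends number to the end of the list whose first node is *numbers.
// Returns false if malloc fails.
bool append_node(node **numbers, int number)
{
    node *n = malloc(sizeof(node));
    if (!n)
    {
        return false;
    }
    n->number = number;
    n->next = NULL;

    if (*numbers)
    {
        // Follow the next pointers until reaching the node whose next is NULL.
        for (node *ptr = *numbers; ptr != NULL; ptr = ptr->next)
        {
            if (ptr->next == NULL)
            {
                ptr->next = n;
                break;
            }
        }
    }
    else
    {
        // Empty list: the new node becomes the first one.
        *numbers = n;
    }
    return true;
}
```

The loop walks every node to find the end, so each append takes O(n) steps on a list of n nodes.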

  • Any questions on this thus far?

  • All right.

  • So let's take a look more graphically at some final problems we can solve.

  • And what you'll see in the days ahead is the following

  • when it comes to these linked lists and more.

  • We now have the ability to actually allocate things in memory dynamically.

  • We don't necessarily know in advance how many numbers we have

  • or, in the case of the next problem set, how many words we have.

  • We have the ability though to use malloc, and maybe even realloc,

  • to grow and grow our data structure in memory.

  • And we have the ability in code to actually

  • traverse those values in such a way that we

  • can access memory that's all over the board now

  • and not necessarily back to back to back.

  • But what happens if we want to combine these ideas into fancier solutions

  • still?

  • Well, let's take a look at that.

  • In particular, if I go let's say over here to the following,

  • let's consider a problem we might now solve.

  • If I wanted to store everyone's name in this room in a data structure,

  • I could do what?

  • Well, we could use an array.

  • So I could actually decide how many people are in the room--

  • let's call it n--

  • and actually draw n boxes on the board, and then iteratively ask

  • everyone for their name, and actually write it down.

  • If I then wanted to take attendance thereafter and say, oh, is Alice here,

  • or is Bob here, or is Kareem here, or Brian,

  • I could just look through that array and say yes or no, that human is here.

  • But what's the running time of that algorithm?

  • How long would it take to look up a name in a data structure

  • where I've just drawn it as an array, a big list on the board?

  • AUDIENCE: Big O of n.

  • DAVID J. MALAN: What's that?

  • AUDIENCE: Big O of n.

  • DAVID J. MALAN: Big O of n, right?

  • Because if it's just a list of names, it's going to take big O of n.

  • And frankly, that seems a little slow.

  • How could I do an optimization?

  • Well, what if we combined some of these ideas?

  • Arrays are nice because they give me random, sort

  • of instant, access to memory locations.

  • But linked lists are nice because they allow me to dynamically add or remove

  • elements whenever I want.

  • So you know what?

  • Instead of writing down everyone's names, like Alice, and Bob,

  • and Charlie, like this in just one big array of some fixed size that might

  • paint me into a corner-- now I only have room for one more name--

  • what if I instead do things a little more cleverly?

  • So when I'm actually jotting down everyone's name in the room, what

  • if I instead did, OK, is Alice here.

  • All right.

  • Alice is here.

  • And then Brian is here.

  • I'm going to put Brian here.

  • And then maybe Charlie is here.

  • All right.

  • So Charlie.

  • And then maybe Arnold is here.

  • Where should I put Arnold?

  • Arnold also starts with A. You know what?

  • Let's just put Arnold here.

  • Arnold.

  • And Abby is here.

  • So you know what?

  • Let's just put Abby up here as well.

  • Bob came as well.

  • So Bob-- so what's the pattern I'm obviously following

  • as I'm hearing names called out?

  • AUDIENCE: Alphabetically sorted.

  • DAVID J. MALAN: Alphabetically sorted--

  • kind of.

  • Like, Abby kind of ended up in a weird place here.

  • But that's fine because I didn't hear her name first.

  • But I did kind of bucketize people into different rows of the board.

  • In other words, all of the A names I seem

  • to just write down for convenience at the top,

  • and then all of the B names together, and C names.

  • And probably if I kept going, I could do this all the way

  • through Z in the English alphabet.

  • So what's nice about this is that, yeah, I'm making lists of names,

  • but how long is each of those lists?

  • If there's n people in the room, each of my lists

  • is not going to be n long, which is slow.

  • It's going to be what? n divided by 26, give or take.

  • If we assume that there's an equal number of people

  • with Z names and A names, it's going to be roughly n divided by 26 so

  • that I have these chains of human names, but they're

  • much shorter than they would have been if I just grouped everyone together.

  • And this is a fundamental technique in programming called hashing.

  • It turns out there are things in this world called hash functions.

  • These are just mathematical, or verbal, or code-implemented functions

  • that take as input something and produce as output a number typically-- a number

  • from 0 to, say, 25, or from 1 to 26.

  • But they can also output strings in other contexts as well.

  • So my hash function here in my mind is, if you hand me a name,

  • I'm going to look at the first letter in your name.

  • And if it's A, I'm putting you in location 0.

  • If it's B, I'm going to put you in location 1.

  • If it's a Z, I'm going to put you in location 25 at the end.

  • So these are all buckets I've got, so to speak,

  • in computer science-- like 26 buckets or room

  • on the board that represent the starts of people's names.
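
The first-letter hash function described above takes only a few lines of C. This hypothetical sketch ignores case and assumes ASCII, where 'A' through 'Z' are consecutive. Names that don't start with a letter simply go to bucket 0.

```c
#include <ctype.h>

// Hypothetical hash function: maps a name to a bucket from 0 to 25 by its
// first letter, ignoring case ('A' or 'a' -> 0, ..., 'Z' or 'z' -> 25).
// Assumes ASCII letters; anything else goes to bucket 0.
unsigned int hash(const char *name)
{
    unsigned char first = (unsigned char) name[0];
    if (!isalpha(first))
    {
        return 0;
    }
    return (unsigned int) (toupper(first) - 'A');
}
```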

  • So what is that?

  • Well, it would seem that if I don't know in advance how many A names I have,

  • that's kind of like drawing this as a linked list, if you will,

  • that might just get longer and longer.

  • But I do know that I only have a finite number of first letters.

  • So that-- at the risk of drawing a little messily--

  • is kind of like drawing what data structure?

  • AUDIENCE: An array.

  • DAVID J. MALAN: Yeah.

  • It's kind of like drawing an array that just has 26 spots.

  • And what's nice about an array is that I have random access.

  • I can jump right to any letter of the alphabet in constant time, one step.

  • And once I get there, I'm still going to see a list of names.

  • Thankfully, thanks to linked lists, that list can be short or long.

  • But on average, let's say it's going to be

  • 1/26th the length that it would have been if I just used one array or one linked

  • list.

  • So this technique of using a hash function-- which, again,

  • I've defined as you give me a name; I take that as input;

  • I look at the first letter; and I return as output a number from 0 to 25--

  • a hash function lets you create a hash table.

  • And there's different ways to implement hash tables,

  • but perhaps one of the most common is indeed like this.

  • You decide in advance on the size of an array.

  • But that array does not contain the strings or the humans' names.

  • That array actually contains linked lists.

  • And it's the linked lists that contain the names.

  • So we borrow an idea from, like, week two: arrays.

  • We merge it with an idea from today, week four, combining arrays

  • with linked lists.

  • And we kind of get the best of both worlds.

  • Because I can immediately jump to any letter of the alphabet super fast.

  • And once I'm there, yeah, there's a list,

  • but it's not nearly as long as it would have been if I didn't use this trick.
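
A minimal hash table in this style, an array of 26 linked lists, might look like the sketch below. Everything here is illustrative:

- `name_node`, `add_name`, `is_here`, and `clear_table` are hypothetical names.
- Names are assumed to fit in 63 characters.
- New names go at the front of their bucket's list, since order within a bucket doesn't matter for lookups.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define N_BUCKETS 26

// One link in a bucket's chain.
typedef struct name_node
{
    char name[64];  // assumption: names are shorter than 64 characters
    struct name_node *next;
}
name_node;

// The table itself: an array of 26 linked lists, all empty (NULL) to start.
static name_node *table[N_BUCKETS];

// First-letter hash as described above, assuming ASCII letters.
static unsigned int hash(const char *name)
{
    unsigned char first = (unsigned char) name[0];
    return isalpha(first) ? (unsigned int) (toupper(first) - 'A') : 0;
}

// Adds a name to the front of its bucket's list. Returns false if malloc fails.
bool add_name(const char *name)
{
    name_node *n = malloc(sizeof(name_node));
    if (!n)
    {
        return false;
    }
    strncpy(n->name, name, sizeof(n->name) - 1);
    n->name[sizeof(n->name) - 1] = '\0';
    unsigned int bucket = hash(name);
    n->next = table[bucket];
    table[bucket] = n;
    return true;
}

// Jumps straight to one bucket in constant time, then walks only that chain.
bool is_here(const char *name)
{
    for (name_node *ptr = table[hash(name)]; ptr != NULL; ptr = ptr->next)
    {
        if (strcmp(ptr->name, name) == 0)
        {
            return true;
        }
    }
    return false;
}

// Frees every chain and leaves the table empty again.
void clear_table(void)
{
    for (int i = 0; i < N_BUCKETS; i++)
    {
        name_node *ptr = table[i];
        while (ptr != NULL)
        {
            name_node *next = ptr->next;
            free(ptr);
            ptr = next;
        }
        table[i] = NULL;
    }
}
```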

  • So what's the running time of all of this?

  • Well, it turns out that a hash table in the worst case

  • might still take you how many steps to find someone's name once it's

  • been added to the list?

  • In the very worst case, how many steps, if there's n people in the room?

  • AUDIENCE: n.

  • DAVID J. MALAN: Maybe n.

  • Why?

  • It's kind of a perverse situation.

  • But can you contrive a scenario in which,

  • even though we're doing this fanciness, it still

  • takes me n steps to confirm or deny that someone's here?

  • Yeah?

  • AUDIENCE: Everyone's name starts with the same letter.

  • DAVID J. MALAN: Everyone's name starts with the same letter

  • for some weird reason.

  • Now, it's a little silly in the human world.

  • But it could happen if you're just talking

  • data or whatever in the computer world.

  • This can devolve into, sure, an array with just one really long linked list.

  • But in practice, that's not likely going to happen, right?

  • If we actually spent the time here and asked everyone for their name,

  • we'd probably get a reasonably uniform distribution of letters,

  • at least as is statistically likely with just human names.

  • So that would kind of spread things out.

  • And so there's this fundamental distinction between sort of real-world

  • running time, or wall clock time-- how many seconds are actually spinning

  • on the clock--

  • versus asymptotic running time.

  • We've talked for a couple of weeks now about running time as being big O of n.

  • And that might be still the case, that a hash table-- yes, in the worst case,

  • it's still a big O of n data structure.

  • Because in the worst case, it's going to take n steps.

  • But in the real world, big O of n is really big O of n divided by 26,

  • even though we always ignore constant factors like that.

  • But when it's you, the human, running the code and analyzing the data,

  • running 26 times faster is actually real time saved,

  • even though a mathematician might say, ah, that's the same fundamentally.

  • And indeed, one of the problems ahead for the next problem set

  • is going to be to suss out exactly what the implications are

  • in your own code for actual wall clock running time.

  • And making smarter design decisions, like something like this,

  • can actually really speed up your code to be 26 times as fast, even

  • though, yes, a theoretician would say, ah,

  • but that's still asymptotically or mathematically

  • equivalent to just something linear.

  • So it's this fine tuning that will make your code even better and better.

  • Now, frankly, hashing on first names probably

  • isn't the smartest thing alone, right?

  • Like, does anyone's-- and this is going to be hard.

  • Does anyone's name start with Z here?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: [INAUDIBLE] is not here.

  • But thank you for that perfect counter-example.

  • But she's not here.

  • So look, there's no Zs.

  • So now we're down to 25 possible values.

  • And I could probably pick some less common letters too.

  • The point is there's probably a few more As than there are Zs

  • or a few more B's than there are Q's just by nature of human names.

  • So maybe just using the first letter isn't good enough.

  • And frankly, with 26 buckets-- suppose we did this for all of Harvard

  • and had thousands of names.

  • Each of my chains might still have hundreds or thousands of names.

  • So another design question is going to be, well, how many buckets should you

  • have, how big should the array be.

  • Maybe you shouldn't look at the first letter.

  • What if you look at the first and the second letter together-- so AA, and AB,

  • and AC, and then dot dot dot, BA, BB, BC, so you could come up

  • with more and more buckets?
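
The two-letter idea might be sketched as follows, turning 26 buckets into 26 * 26 = 676. `hash2` is a hypothetical name. It ignores case, assumes ASCII letters, and treats a missing or non-letter second character as if it were 'A'.

```c
#include <ctype.h>

// Maps one character to 0-25 ('A'/'a' -> 0, ..., 'Z'/'z' -> 25), assuming
// ASCII; non-letters count as 'A'.
static unsigned int letter_index(char c)
{
    unsigned char u = (unsigned char) c;
    return isalpha(u) ? (unsigned int) (toupper(u) - 'A') : 0;
}

// Hypothetical two-letter hash: "Aa..." -> 0, "Ab..." -> 1, ..., "Ba..." -> 26,
// up to "Zz..." -> 675.
unsigned int hash2(const char *name)
{
    unsigned int first = letter_index(name[0]);
    // Only look at name[1] if name isn't empty, to stay in bounds.
    unsigned int second = name[0] != '\0' ? letter_index(name[1]) : 0;
    return first * 26 + second;
}
```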

  • But what else?

  • How else might we kind of uniformly distribute people?

  • What do all of you have that we could use as input to a hash function?

  • AUDIENCE: A last name.

  • DAVID J. MALAN: OK.

  • Well, you could do last name, which might give us

  • a different or similar distribution.

  • Yeah?

  • AUDIENCE: ID number.

  • DAVID J. MALAN: What's that?

  • AUDIENCE: ID number.

  • DAVID J. MALAN: Yeah.

  • We could use your ID number and actually look at the first digit of your ID.

  • And odds are, it's 0 through 9.

  • So we could probably at least get 10 buckets that way.

  • And that's probably uniformly distributed.

  • I'm not sure.

  • We could use birth dates in some way.

  • Like, we could put all of the freshmen in one bucket,

  • all the seniors in another bucket, and everyone else,

  • and so forth, in their own buckets, which would also give us some input.

  • So again, a hash function is entirely up to you to program and design.

  • The goal though is to smooth things out.

  • You want to have roughly the same number of things in each linked list

  • just so that you have about the same performance

  • across all of these various inputs.

  • So let's take a look at a couple of other data structures,

  • again, in this abstract way.

  • Now that we know that, even though it's not obvious at first attempt,

  • we know how to construct arrays.

  • We kind of know now how to construct linked lists.

  • It stands to reason we could implement them together in code.

  • What else could we do now with these building blocks?

  • So for instance, this structure here is a very common one, known as a tree.

  • A tree like a family tree, where there's one patriarch or matriarch

  • at the top, and then their children, and then their grandchildren,

  • and great grandchildren, and so forth.

  • And what's nice about a tree structure is that, if you're storing data,

  • you can actually store the data in clever ways to the left child,

  • to the right child, and so forth, as follows.

  • Notice here, there's something curious about all the numbers in this data

  • structure.

  • What is noteworthy about them?

  • What is noteworthy?

  • Yeah?

  • AUDIENCE: Multiples of 11.

  • DAVID J. MALAN: What's that?

  • AUDIENCE: They're multiples of 11.

  • DAVID J. MALAN: They are multiples of 11.

  • That was just to make them look pretty though by the author here.

  • Yeah?

  • AUDIENCE: [INAUDIBLE].

  • DAVID J. MALAN: Yeah.

  • There's a mathematical significance too.

  • Like, no matter what node or circle you look at, the value in it

  • is bigger than the left child and it's smaller than the right child.

  • So it's kind of in-between.

  • Any circle you look at, the number to the left is smaller,

  • the number to the right is bigger.

  • And I think that applies universally all over the place.
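
You can check this ordering rule in code. One subtlety: for binary search to work on the tree, the rule must hold for the entire left and right subtrees, not just the immediate children. So this minimal C sketch passes down a range of allowed values. The `node` layout and the names `is_bst` and `within` are assumptions for illustration.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

// A node in a binary tree: a number plus pointers to two children.
typedef struct node
{
    int n;
    struct node *left;
    struct node *right;
}
node;

// True if every value under t lies strictly between lo and hi. Each child
// gets a narrower range: left children must be below t->n, right children
// above it. That is the binary search tree property for whole subtrees.
static bool within(const node *t, long long lo, long long hi)
{
    if (t == NULL)
    {
        return true;
    }
    if (t->n <= lo || t->n >= hi)
    {
        return false;
    }
    return within(t->left, lo, t->n) && within(t->right, t->n, hi);
}

bool is_bst(const node *root)
{
    // Bounds one past the int range, so any int is allowed at the root.
    return within(root, (long long) INT_MIN - 1, (long long) INT_MAX + 1);
}
```

Assume the figure holds 22 through 88, with 55 at the root, 33 and 77 below it, and 22, 44, 66, 88 on the bottom row. Now replace 44 with 60. The new value is still bigger than its parent 33, but it sits to the left of 55, so `is_bst` returns false.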

  • Yes?

  • So what does that mean?

  • Well, recall from, like, week 0 when we had a whole bunch of phone book pages

  • that we were searching--

  • 1, 2, 3, 4, 5, 6.

  • Let's give ourselves a 7th one.

  • Recall that when we did divide and conquer, or binary search,

  • we did it on an array.

  • And what was nice about binary search was we started in the middle,

  • and then we maybe went left, or we maybe went right,

  • and we kind of divided and divided and divided

  • and conquered the problem much more efficiently in logarithmic time

  • than it would have been if we did it linearly.

  • But we know now weeks later that arrays are kind of limiting, right?

  • If I keep storing all of my values in an array,

  • what can I not do with the array?

  • Make it bigger, right?

  • I can't add an element to it without copying every darn element,

  • as we've discussed thus far today.

  • But what if I was a little smarter about it?

  • What if I stored my values, not just in an array,

  • but I started storing them in these circles--

  • let's call them nodes--

  • and each of those nodes is really just an integer plus two additional values?

  • How would we implement this data structure in memory?

  • Well, here's an int n-- could represent the number in question.

  • And we could put that in a data structure

  • called a node that just has the same syntax as earlier today,

  • but I've left room for two more fields.

  • What is it that I want to represent in code if I

  • want to start storing my numbers, not in this old-school week 0 array,

  • but in a tree?

  • AUDIENCE: Two pointers.

  • DAVID J. MALAN: Two--

  • AUDIENCE: Pointers.

  • DAVID J. MALAN: Two pointers.

  • Right?

  • A tree, as drawn here literally with arrows,

  • is just like saying every one of these nodes or circles

  • has a left child and a right child.

  • How do you implement children?

  • Well, you can literally just use pointer notation as well here.

  • A left child is just a pointer to another struct on the left.

  • And a right child is just another pointer to the child on the right.
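
In C, such a node might look like the minimal sketch below: an `int` plus a `left` and a `right` pointer. The `insert` function walks down the tree the same way a search would, then attaches a new node where it finds `NULL`. The names `insert` and `free_tree`, and the choice to ignore duplicates, are assumptions.

```c
#include <stdbool.h>
#include <stdlib.h>

// A binary search tree node: the number itself plus two pointers, one to
// the left child (smaller values) and one to the right child (bigger values).
// NULL means "no child here".
typedef struct node
{
    int n;
    struct node *left;
    struct node *right;
}
node;

// Inserts n into the tree rooted at *root, keeping smaller values to the
// left and bigger values to the right. Duplicates are ignored.
// Returns false only if malloc fails.
bool insert(node **root, int n)
{
    node **cursor = root;
    while (*cursor != NULL)
    {
        if (n < (*cursor)->n)
        {
            cursor = &(*cursor)->left;
        }
        else if (n > (*cursor)->n)
        {
            cursor = &(*cursor)->right;
        }
        else
        {
            return true;  // already present
        }
    }
    node *leaf = malloc(sizeof(node));
    if (leaf == NULL)
    {
        return false;
    }
    leaf->n = n;
    leaf->left = NULL;
    leaf->right = NULL;
    *cursor = leaf;
    return true;
}

// Frees every node: children first, then the parent.
void free_tree(node *root)
{
    if (root == NULL)
    {
        return;
    }
    free_tree(root->left);
    free_tree(root->right);
    free(root);
}
```

Assume the figure's values are 22 through 88. Inserting 55, 33, 77, 22, 44, 66, 88 in that order rebuilds a three-row tree with 55 at the root. Inserting the same values in sorted order would instead produce a lopsided chain, so insertion order matters for performance.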

  • And what's nice about this ultimately is that we can now

  • traverse this tree just as efficiently as we can traverse this array.

  • Because notice if I want to search for the number 66,

  • how many steps does it take me if I start at the top?

  • Just like Comey represented the start of our linked list,

  • so in the world of a tree does the root have special significance.

  • And that's where we always begin.

  • So how many steps does it take me to find 66 given the top?

  • AUDIENCE: Three.

  • AUDIENCE: Two.

  • DAVID J. MALAN: It looks like--

  • yeah, two or three, right?

  • I start at the top.

  • I look at it and say, hmm, 55, which way do I go.

  • I go to the right.

  • Then I see 77.

  • OK.

  • Which way do I go?

  • I go to the left.

  • So it's the same logic as week 0 in dividing and conquering the phone book

  • or an array a couple of weeks later.

  • But we get to the number we care about pretty quickly.

  • And it's not linear.
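
The search just traced (55, then right to 77, then left to 66) uses the same halving logic as binary search on a sorted array. This minimal C sketch uses the same hypothetical `node` layout. It assumes the figure holds 22 through 88 with 55 at the root. `search` and its `steps` out-parameter are illustrative names.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node
{
    int n;
    struct node *left;
    struct node *right;
}
node;

// Starts at the root and goes left when n is smaller, right when it is
// bigger, until n is found or we fall off the bottom (NULL).
// If steps is not NULL, it receives the number of nodes visited.
bool search(const node *tree, int n, int *steps)
{
    int count = 0;
    bool found = false;
    while (tree != NULL)
    {
        count++;
        if (n < tree->n)
        {
            tree = tree->left;
        }
        else if (n > tree->n)
        {
            tree = tree->right;
        }
        else
        {
            found = true;
            break;
        }
    }
    if (steps != NULL)
    {
        *steps = count;
    }
    return found;
}
```

A search for 66 visits three nodes (55, 77, 66), or two moves after looking at the root, which matches the audience's "two or three." A search for a missing value such as 50 also stops after three nodes, at the bottom row.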

  • And in fact, if we actually did out the math, what's

  • really cool about a binary search tree is that if you have n elements,

  • n circles, and the tree is balanced like this one, its height is roughly log n.

  • So the height of the tree just so happens

  • to correspond to exactly how many times you can take n

  • and divide it, divide it, divide it, divide it in two.

  • And you can actually see this if you think about it the reverse direction.

  • On the bottom row, there are how many elements?

  • All right?

  • And on the middle row, there is?

  • AUDIENCE: Two.

  • DAVID J. MALAN: Two.

  • And on the top row, there's one.

  • So you can actually see it in the reverse direction.

  • This is like divide and conquer, but in a different conceptual way.

  • Every row in the tree has half as many elements as the one below it.

  • And so the implication of that is just like from week 0 in the phone book

  • when we're dividing, and dividing, and dividing in half, and half, and half.
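
The "height is log n" claim can be worked out for a tree whose rows are all full, as in the diagram. Each row holds twice as many nodes as the row above it:

```latex
n = 1 + 2 + 4 + \cdots + 2^{h-1} = 2^{h} - 1
\quad\Longrightarrow\quad
h = \log_2 (n + 1) \approx \log_2 n

\text{For the diagram: } n = 7 \;\Rightarrow\; h = \log_2 8 = 3
```

This relies on the tree being balanced. If values are inserted in sorted order, each new node becomes the right child of the previous one. The height then grows to n, and search degrades to linear time.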

  • So this is only to say, now that we have structures and pointers,

  • we can build something like this.

  • But let's try one other example here too.

  • This is a crazy looking example.

  • But it's kind of amazing.

  • Suppose that, if we wanted to store a dictionary of words--

  • so not humans' names this time, but English words.

  • So Merriam-Webster or the Oxford English Dictionary has what?

  • Thousands, hundreds of thousands of words

  • these days in English for instance?

  • How do you actually store those?

  • Well, if you just look up words in a dictionary back in yesteryear,

  • that is linear.

  • You have to start at the beginning and look through it

  • page by page, looking for words.

  • Or you could be a little smarter.

  • Because the words in any dictionary are hopefully alphabetized,

  • you can do the Mike Smith-style divide and conquer by going to the middle,

  • then the middle of the middle, and so forth--

  • log of n.

  • But what if I told you, you could look up words in constant time--

  • some fixed number of steps?

  • None of this divide and conquer complexity.

  • No log n.

  • Just constant time-- you want a word, go get it instantly.

  • That's where this last structure comes in, which is called a trie--

  • T-R-I-E-- short for retrieval, even though it's pronounced the opposite.

  • So a trie is a tree each of whose nodes is an array.

  • So it's like this weird Frankenstein's monster kind of data structure.

  • We're just really combining lots of different ideas, as follows.

  • And the way a trie works, as is implied by this partial diagram on the board,

  • is that if you want to store the name Brian, for instance,

  • in your dictionary-- it's the first word--

  • what you do is you start by creating a tree with just one node.

  • But that node is effectively an array.

  • That array is of size, let's say for simplicity, 26.

  • So A through Z. This location here therefore represents B for Brian.

  • So if I want to insert Brian into this tree, I create one node at the top.

  • And then for the second letter in his name, R,

  • I create another node, also an array, A through Z.

  • And so here, I put a pointer to this node here.

  • B-R-I. So I should have drawn some more boxes.

  • A, B, C, D, E, F, G, H, I. So here, I'm going to draw another pointer to B--

  • wait.

  • Bian.

  • [LAUGHTER]

  • OK.

  • That's wrong.

  • Billy shall be our name.

  • Billy is at B. Wait.

  • No.

  • Dammit.

  • B, B. B-I-A-- yes, this works.

  • This works.

  • OK.

  • Sorry.

  • So here we go.

  • We're inserting Billy into this fancy data structure.

  • So the first node represents the first letter.

  • The second node represents the second letter.

  • The third node represents the third letter.

  • And so forth.

  • But what's cool about this is the re-usability.

  • So notice if this is the second letter and I counted this out correctly,

  • I, this is going to lead to a third node deeper

  • in the tree where it's L that we care about next, and then another one

  • down here which represents another L.

  • And I'll start drawing the letters.

  • L. This is B. This is I. L. And we'll call this L.

  • And then, finally, another one over here, which is a Y. And this

  • gets pointing down here.

  • This gets pointing here.

  • And so forth.

  • So in short, we have one node essentially

  • for every letter in the word that we're inserting into the data structure.
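
The insertion just described might be sketched in C as follows: one node per letter, with a pointer from each letter's slot to the next node. This is a minimal sketch that assumes letters only (case-insensitive) and a 26-slot array per node. The `is_word` flag, the names `new_node` and `trie_insert`, and the `nodes_allocated` counter are all assumptions added for illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define ALPHABET 26

// A trie node: an array of 26 pointers, one per letter A through Z,
// plus a flag recording whether a word ends at this node.
typedef struct trie
{
    struct trie *children[ALPHABET];
    bool is_word;
}
trie;

// Counts nodes allocated so far, purely for illustration.
static size_t nodes_allocated = 0;

// Allocates a node with all 26 children NULL and is_word false.
trie *new_node(void)
{
    trie *t = malloc(sizeof(trie));
    if (t == NULL)
    {
        return NULL;
    }
    for (int i = 0; i < ALPHABET; i++)
    {
        t->children[i] = NULL;
    }
    t->is_word = false;
    nodes_allocated++;
    return t;
}

// Walks down one level per letter, creating a node only where the slot
// for that letter is still NULL. Returns false on a non-letter or if
// malloc fails.
bool trie_insert(trie *root, const char *word)
{
    trie *cursor = root;
    for (const char *p = word; *p != '\0'; p++)
    {
        if (!isalpha((unsigned char) *p))
        {
            return false;
        }
        int i = tolower((unsigned char) *p) - 'a';
        if (cursor->children[i] == NULL)
        {
            cursor->children[i] = new_node();
            if (cursor->children[i] == NULL)
            {
                return false;
            }
        }
        cursor = cursor->children[i];
    }
    cursor->is_word = true;
    return true;
}

// Frees a node and everything below it.
void trie_free(trie *t)
{
    if (t == NULL)
    {
        return;
    }
    for (int i = 0; i < ALPHABET; i++)
    {
        trie_free(t->children[i]);
    }
    free(t);
}
```

In this sketch the root is the array for the first letter, and the end-of-word flag lives in the node reached after the last letter. So "Billy" allocates five nodes beyond the root, six in total. Inserting "Bob" afterwards reuses the existing B slot and adds only two more nodes, eight in total. Reusing shared letters like this is what makes a trie less wasteful than it first looks.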

  • Now, this looks stupidly inefficient at the moment.

  • Because to store B, I, L, L, Y, how much memory did I just use?

  • 26 plus 26 plus 26 plus 26 plus 26.

  • Just to store five characters, I used 26 times 5.

  • But this is kind of thematic in computer science--

  • spend a little more space, and I bet I can decrease the amount of time

  • it takes to find anyone.
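
Here is that memory cost in numbers. This counts only the pointer arrays and assumes 8-byte pointers, which is typical on 64-bit systems:

```latex
5 \text{ nodes} \times 26 \text{ pointers per node} = 130 \text{ pointers}

130 \text{ pointers} \times 8 \text{ bytes} = 1{,}040 \text{ bytes}
```

By comparison, storing "Billy" as a C string takes 6 bytes: five characters plus the terminating `\0`.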

  • Because now no matter how many other students are in this data structure--

  • and for instance, let's do another one.

  • If we had another one, like Bob--

  • so B is the same first letter.

  • That leads us to this second node.

  • O is somewhere else in this array, say, over here.

  • So this represents O. And then Bob has another B.

  • So there's going to be another array here.

  • And this is why the picture above draws this so succinctly.

  • This is how we might store Bob.

  • So B, I, L, L, Y. Or you can follow a different route, B, O, B.

  • So we can start to reuse some of these arrays.

  • So there's where you start to get some of the efficiency.

  • Any time names share a few letters, then you start reusing those same nodes.

  • So it's not super, super wasteful.

  • But the question now is, if there's like 1,000 students in the class,

  • or 1,000 students in the room, we're going to have a lot of nodes

  • there on the board.

  • But how many steps does it take to find Billy,

  • or Bob, or any name with this data structure, and to conclude yes or no

  • that student is in the class?

  • So, like, five for Billy, three for Bob.

  • And notice none of that math has any relationship

  • to how many students are in the room.

  • If we instead wrote out a long list of 1,000 names, in the worst case,

  • it might take me 1,000 steps to find Billy or Bob.

  • Maybe I could be a little smarter if I sort it.

  • But in the worst case, big O of n, it's linear.

  • Or if I used a hash table before, and maybe there's

  • 1,000 students in the room, but, OK, there's

  • 26 letters in the English alphabet at least.

  • So that's 26 buckets.

  • So maybe it's 1,000 divided by 26, worst case,

  • if I'm using those linked lists inside my array.

  • But wait a minute.

  • If I'm using this structure, a trie, where every node in the tree

  • is just an array that leads me to the next node, à la breadcrumbs, B, I, L, L,

  • Y is 5 and always 5.

  • B, O, B is always 3.

  • B, R, I, A, N would have been 5 as well.

  • None of these totals is affected in any way

  • by the total number of names in the data structure.

  • So a trie in some sense is this amazing holy grail

  • in that, by combining these various data structures, now you get constant time,

  • but you do pay a price.

  • And just to be clear, what is the price we seem to be paying?

  • AUDIENCE: Memory.

  • DAVID J. MALAN: Memory.

  • And in fact, this is why I'm not really drawing it much more.

  • Because it just becomes a big mess on the screen because it's

  • hard to draw such wide data structures.

  • It's taking a huge amount of memory.

  • But theoretically, it's faster.

  • Yeah?

  • Question.

  • AUDIENCE: So how would you deal with a case where someone is Bob,

  • but then another kid is Bobby?

  • DAVID J. MALAN: Good question.

  • So it's a bit of a simplification.

  • If you were storing both Bob and Bobby, you would actually keep going.

  • So each of these elements is not just one letter.

  • You also have essentially a node there or some other data structure

  • that says either stop here or continue.

  • And you'll see actually in the problems that we'll

  • propose to you how you can represent that idea if you

  • choose to go this route.
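
Looking up a name follows one pointer per letter. The "stop here or continue" marker from the answer above then decides whether the path spells a stored word or merely a prefix of one, as Bob is inside Bobby. This is a minimal C sketch that assumes letters only and a boolean `is_word` flag in each node. The names `new_node`, `trie_insert`, `trie_lookup`, and `trie_free` are hypothetical, not the problem set's API.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define ALPHABET 26

// Each node: 26 child pointers (A through Z) plus an end-of-word flag.
typedef struct trie
{
    struct trie *children[ALPHABET];
    bool is_word;  // true means "stop: a word ends here"
}
trie;

// Allocates a node whose children are all NULL and whose flag is false.
trie *new_node(void)
{
    trie *t = malloc(sizeof(trie));
    if (t != NULL)
    {
        for (int i = 0; i < ALPHABET; i++)
        {
            t->children[i] = NULL;
        }
        t->is_word = false;
    }
    return t;
}

// Adds a letters-only word, creating any nodes that don't exist yet.
bool trie_insert(trie *root, const char *word)
{
    trie *cursor = root;
    for (const char *p = word; *p != '\0'; p++)
    {
        if (!isalpha((unsigned char) *p))
        {
            return false;
        }
        int i = tolower((unsigned char) *p) - 'a';
        if (cursor->children[i] == NULL && (cursor->children[i] = new_node()) == NULL)
        {
            return false;
        }
        cursor = cursor->children[i];
    }
    cursor->is_word = true;
    return true;
}

// Follows one pointer per letter, so the step count depends only on the
// length of word, never on how many words the trie holds. Stores the steps
// taken in *steps (if non-NULL). Returns true only if the whole path exists
// and its last node is marked as the end of a word.
bool trie_lookup(const trie *root, const char *word, int *steps)
{
    const trie *cursor = root;
    int count = 0;
    for (const char *p = word; *p != '\0' && cursor != NULL; p++)
    {
        if (!isalpha((unsigned char) *p))
        {
            cursor = NULL;  // this sketch stores letters only
            break;
        }
        cursor = cursor->children[tolower((unsigned char) *p) - 'a'];
        count++;
    }
    if (steps != NULL)
    {
        *steps = count;
    }
    return cursor != NULL && cursor->is_word;
}

// Frees a node and everything below it.
void trie_free(trie *t)
{
    if (t == NULL)
    {
        return;
    }
    for (int i = 0; i < ALPHABET; i++)
    {
        trie_free(t->children[i]);
    }
    free(t);
}
```

Suppose "Billy" and "Bobby" are stored. Looking up "billy" takes 5 steps. Looking up "Bob" takes 3 steps and returns false, because the node after B-O-B exists but is not marked as the end of a word. Once "Bob" is inserted too, the same 3-step lookup returns true. None of these step counts depends on how many names the trie holds.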

  • Indeed, the challenge ahead ultimately is something quite like this.

  • You will implement your very own spell checker.

  • And we will give you code that gets you started with this process.

  • And of course, a spell checker these days in Google Docs

  • and Microsoft Word just underlines in red misspelled words.

  • But what's going on?

  • And how is it that Word or Google Docs can

  • spell check your English or whatever language so quickly?

  • Well, it has a dictionary in memory, probably with tens of thousands

  • or hundreds of thousands of words.

  • And all they're doing constantly is, every time you type a word

  • and hit the Spacebar, or Period, or Enter,

  • it's quickly looking up that new word or those words in its dictionary

  • and saying, yes or no, should I squiggle a red line underneath this word.

  • And so what we're going to do is give you a big text file, ASCII text,

  • containing 100-plus thousand words.

  • You're going to have to decide how to load those

  • into memory, not just correctly, but in a way that's well designed.

  • And we'll even give you a tool, if you choose to use it,

  • that times how long your code takes.

  • And it even counts how much RAM you're actually using.
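
Here is one possible design for loading the dictionary, as a minimal C sketch. It reads whitespace-separated words from an already-opened file into a hash table of linked lists, the structure from earlier in the lecture, and checks words case-insensitively. The names `load`, `check`, and `unload`, the 45-character word limit, and the first-letter hash are all assumptions. The actual distribution code for the problem may define things differently.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45  // assumed maximum word length (matches "%45s" below)
#define N 26       // buckets: one per first letter

typedef struct word_node
{
    char word[LENGTH + 1];
    struct word_node *next;
}
word_node;

static word_node *table[N];  // each bucket is a linked list, initially NULL

// Hypothetical hash: first letter, ignoring case. Non-letters still map
// into 0..N-1 (the unsigned wraparound is then reduced modulo N).
static unsigned int hash(const char *word)
{
    return (unsigned int) (tolower((unsigned char) word[0]) - 'a') % N;
}

// Reads whitespace-separated words from file, lowercases them, and
// prepends each to its bucket's list. Returns false if malloc fails.
bool load(FILE *file)
{
    char buffer[LENGTH + 1];
    while (fscanf(file, "%45s", buffer) == 1)
    {
        word_node *n = malloc(sizeof(word_node));
        if (n == NULL)
        {
            return false;
        }
        for (int i = 0; buffer[i] != '\0'; i++)
        {
            buffer[i] = (char) tolower((unsigned char) buffer[i]);
        }
        strcpy(n->word, buffer);
        unsigned int h = hash(buffer);
        n->next = table[h];
        table[h] = n;
    }
    return true;
}

// Returns true if word (in any case) was loaded.
bool check(const char *word)
{
    char lower[LENGTH + 1];
    size_t len = strlen(word);
    if (len > LENGTH)
    {
        return false;
    }
    for (size_t i = 0; i <= len; i++)  // <= also copies the '\0'
    {
        lower[i] = (char) tolower((unsigned char) word[i]);
    }
    for (const word_node *n = table[hash(lower)]; n != NULL; n = n->next)
    {
        if (strcmp(n->word, lower) == 0)
        {
            return true;
        }
    }
    return false;
}

// Frees every node in every bucket.
void unload(void)
{
    for (int i = 0; i < N; i++)
    {
        word_node *n = table[i];
        while (n != NULL)
        {
            word_node *next = n->next;
            free(n);
            n = next;
        }
        table[i] = NULL;
    }
}
```

A trie would be another option. Which one is faster or smaller in practice depends on the dictionary and the hash function, and the timing and memory tool would let you measure that trade-off.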

  • But the key goals for this week and our final week in C

  • is to take some of these basic building blocks,

  • like arrays, and pointers, and structures,

  • and decide for yourselves how you're most comfortable stitching them

  • together, to what extent you want to really fine tune your code beyond just

  • getting it correct, and to give you a better sense of the underlying code

  • that people have had to write for years in libraries

  • to make programming doable, à la Scratch.

  • Because in just a few weeks, we're going to transition to Python.

  • And the dozens of lines of code you've been writing now

  • are going to be whittled down to one line, two lines,

  • because we're going to get a lot more features from these newer,

  • fancier languages.

  • But you'll hopefully have an appreciation of what is actually

  • going on underneath that hood.

  • So I'll stick around for any one-on-one questions.

  • Let's call it a day.

  • Take a duck on your way out for roommates as well.

  • And we'll see you next time.

[MUSIC PLAYING]
