Subtitles section Play video Print subtitles [MUSIC PLAYING] DAVID J. MALAN: All right, this is CS50, and this is week 4. So let's consider first what we did last time, which was focus on numbers and focus on how we can search them and how we can sort them. And in particular, we took a look at a number of algorithms that we finally gave names to, a couple of which we actually saw back in week 0, but didn't treat as formally. So we looked more formally last time at linear search, this process of searching elements from left to right or right to left as though you're walking along a line looking for some value. We looked at binary search, a divide and conquer algorithm whereby we started, in this case, in the middle, and then we looked left to right and then went to the left, went to the right, and then did the same problem again and again. We looked also at bubble sort. How do you get to the point where you can use an algorithm like binary search, which assumes that your inputs are sorted? So how do you get to that sorted state? Well, bubble sort was one algorithm that essentially swapped elements pairwise from left to right, from left to right, until finally the biggest elements bubbled up to the top, and the whole list was ultimately sorted. But we also looked at selection sort. And selection sort is the name implied, selected, then smallest element again and again and again, plucking it out of the list, and putting it where it belongs. Insertion sort, meanwhile, walked through the list really just once, and as it encountered each element, it then inserted it into the sorted half of the list, the left half of the list in its proper place. And then lastly, and where we really concluded and really got more bang for our buck was this final algorithm merge sort, which actually solved this same problem no less correctly, but so much more efficiently. And indeed, lends itself perhaps to a fundamentally better design in so far as it leverages a much better running time, so to speak. In fact, we started to ascribe these kinds of formulas to the running times of or the performance of our algorithms n squared, otherwise known as quadratic, whereby you take n times n number of steps or minutes or seconds or some unit of measure, n log in, which is actually what we achieved with the merge source, which was much faster recall than the end squared algorithms like bubble sort and selection sort and insertion sort in their upper bounds. n would be something linear, like a straight line relationship. Logarithmic of n, log of n, log base 2 of n, technically, would be something like week 0's divide and conquer approach looking for Mike Smith, as well as last time's binary search. And then one, which doesn't have to be literally one step. Maybe it's two, maybe it's 10, but it is a fixed finite constant number of steps. And that too might be the running time of some algorithm. Now how do we describe the running times of algorithms? Well, we use some special notation, asymptotic notation, so to speak, which while it might look cryptic at first glance, really is just a handy way of succinctly expressing the fact that you know what the upper bound on some algorithms running time is, what the lower bounds on some algorithms running time is. And if those are one in the same formulas in theta, do you have a coincidence of the two of big O, so speak, and capital theta. So that, while Greek, literally is just a way of expressing a bit more succinctly what these running times are. And we'll continue to revisit this issue as we look at more algorithms and soon data structures still. But this time, we apologize. We pull back a layer here and admit that there is no such thing as a string. Indeed, all this time we've been saying that there's ins, and there's floats, and there's chars, and there's doubles and more. And we've also been saying there are strings, but there really aren't strings. This is sort of a figment of the imagination of our so-called CS50 library. But it's a pedagogical simplification that we've been using for the past several weeks, so as to not get lost in the weeds, the lower level implementation details of what a string is, so that we can just get real work done in these first weeks. But now we'll begin to look underneath that hood and see what a string actually is and what the implications are. And it turns out, while more complicated in some sense, it really just boils down to some first principles, what it is the computer is doing underneath that hood. So let's take a look at string first by way of a couple of examples. Let's go in to CS50 IDE, create a new file, save it as Compare zero dot C, and look at the little program that actually doesn't necessarily do what we think it's going to do. In particular, let me go ahead and include our typical, include CS 50 dot H. And let me go ahead and include standard IO dot H as well. Let me go ahead and use main void, so I'm not going to worry about any command line arguments for now. And then I'm going to go ahead and just prompt the user. Hey user, give me a string called S for instance. And then I'm going to have no newline. I just want that all in one same line. Now I'm going to go ahead and do string S, gets gets string, open paren and close parens, so as to get a string from the user. And then, let me do this. Let me also print T colon and ask the user, essentially, for a string that I'll call T, since T comes after S quite simply. And now let me just compare these strings as the filename suggests. So I know how to compare, not with if s equals t, because that's the assignment operator. But we know that s equals equals t should compare the values on the left and on the right. So let's try this. So if s equals equals t, then I'm going to go ahead and print out same. Elts, they are presumably difference, so I'm going to go ahead and print out different with a newline character and then save it. So a pretty simple program, so let me go down into my terminal window and run make compare 0. Now let me go ahead and run dot slash compare 0. s, I'm going to go ahead and type in Zamyla. And I'm going to go ahead now and type in Maria. OK, and they're different. I expected as much. Now let's go ahead and run this again. Again with Zamyla. And let's just say Zamyla again, different. What did I do wrong? Let me try this again. Maybe it's the capitalization. So Zamyla in all lower case, different. Well, maybe it's just Zamyla's name. Let me try Rob or RLB? How about RLB? Those are different, as is Rob and Rob. So what is going on? Those strings pretty much look the same to me. I'm typing the same incantation of strings, so what is it that's going on here? You know what, let me do a test with something else. Let me go ahead and create a new file here. I'm going to call this copyzero.c. Maybe I'm just misunderstanding how comparison works. But surely I should be able to copy a string and make an identical copy, so let's do that now. Let me go ahead and create a file called copyzero.c. And let me do an include cs50.h. Include standard io.h into main void, so just as before. And now, let me go ahead and just prompt the user like before. Give me a string s. And I'll put that in a variable, s, calling get string as before. And now, I remember from prior classes that I'm supposed to do, if s equals equals null, maybe I need to do some error checking here. And if that's the case, I'm going to arbitrarily return 1. So recall that main can return values. 0 means all is well. 1 or any other non-zero value means something is wrong. So maybe that's what I was doing wrong earlier, just not error checking. So let's try that. Now, let me declare a variable of type string called t and assign it to s. So in other words, I want to copy s into t. And I know that this happens from the right to the left. So that's as I think it should be. And then, if strleng of t is greater than 0-- you know what, I'm going to do one other thing. Not just copy it. I'm going to go ahead and do this. Recall, that you can treat strings as arrays. So I'm going to say the zeroth character, the first character of t-- you know what? I want to uppercase it. I want to really make sure that these two strings are indeed different as I intend. And let's go ahead now and do this. Let me go ahead now and do printf of s. And let me plug-in s's value in a new line. Plug-in s. And then printf t:%s for a placeholder again. Comma t semi-colon. So in other words, if the length of t is greater than zero, let's capitalize it by changing that first character. And then, just print out s and t. And surely, by capitalizing only t, I should see only one capitalized word. Now, I'm using both strleng and two upper. So rather than let Clang have a chance to yell at me, I'm going to go in here and preemptively add ctype.h. Which recall is a library you might have seen or certainly will soon see that has a number of functions in it. Among them, two upper. And then, I also need to include string.h, so that I can use strlen-- the function that gives me the length of the string. So Clang would have yelled at me if I forgot that, but let me preemptively solve that problem. And now, do make copies zero. All seems to be well. Let's run dot slash copy 0. All right. Let's go ahead and type in-- how about just my own name in all lowercase. Huh? Now, why is this confusing? So I wrote code that got a string from the user and stored it in s. I then wrote code that declared a second variable, t. And I set t equal to s, thereby making I would think a copy-- as in past weeks of using the assignment operator. Then down here, I made sure t was long enough. That's just a quick sanity check. And then on this line here, I'm just saying, set the first character of t, t bracket 0, equal to the result of upper casing the first character of t. So the only code that's touching t is this one here. And yet, somehow my name gets capitalized both in s and in t. So what is it that's actually going on here? It seems to be broken still. In fact, let me go ahead and open another example, rather than type this one out ourselves. Let me go ahead and open up an example called no swap. As the name suggests, it's a bit of a spoiler. I wrote this program in advance to do the following. First, I've included standard io.h, so I can use printf. I have a prototype of my function called swap up here, because indeed, the goal at hand I decided was I just want to write a simple function that swaps two values. Given a and b, make b a and a, b. And then, return. So I'm just going to arbitrarily test this out by declaring a variable x. Setting it equal to 1. A variable y setting it equal to 2. And then, just as a sanity check, I'm going to print out x is such and such, y is such and such. And then, I'm going to claim swapping dot dot dot. And then, the key line is apparently this one. Call a function called swap, passing in x and y. And if I implemented swap correctly, this should swap the two variables. Thereafter, I'm going to claim swapped x as such and such, y as such and such. So let's run this program and see what else is apparently a lie. Make no swap in my source directory. ./noswap enter. And if we scroll back up in my history, you'll see x is 1, y is 2. Swapping swapped. x is 1, y is 2. All right. So maybe just the swap function is buggy. This isn't necessarily indicative of a misunderstanding. So let's look at the implementation of swap. Swap returns no value and I think that's OK, so long as it takes inputs. Swap takes two inputs, an ints and an ints, called a and b respectively. And then, let's consider how this works. So I've declared a temporary variable, called temp-- though I could call it anything I want. And I'm storing in it the value a. So I'm taking a-- and it's the number 1. And I'm just temporarily storing the number 1 in this additional temporary variable, so that I now have two copies of the number one-- in a and in temp. Then, I change the value of a to equal b. So at this point in the story, a should have a value of 2. b should have a value of 2. Which could be problematic, except I've saved the original value 1 in my temporary variable. And so, indeed I change b to equal that temporary value. So indeed, I can't do this magical switcheroo. Because if I simultaneously took one value and one value and put them in each other's places, I would essentially risk clobbering one or the other. Because the computer typically can only do one thing at a time underneath the hood, even though it's super fast. So I run the risk of somehow getting two copies of that same value, unless I'm careful. I really want to take one value, put it in a temporary space. Then, copy one value over. And then, do that switcheroo. But it doesn't seem to work. And yet, this is a good opportunity for one more tool. Let me do a further check. Let me go ahead and run debug 50 on no swap. And you know what? I'm going to set a break point right at the beginning of this program on line 10 by simply clicking in the so-called gutter of the line number here. A red dot appears, so that now when I run debug 50, notice my right hand bar pops out. Notice that the line 10 in question is highlighted, because execution is paused at this line. Let's take a look at what's going on the right hand side of my debugger. Notice that x has a value of 1 and y has a value of 2. And that's what I intend. Now, let's begin to step over these lines. So let's step over the printf. And we see on my terminal window that x is 1. Step over the next printf and we see that y is 2. Now, let's step over that and we see indeed swapping dot dot dot. But rather than step over my call to the swap function, let's step into it. So I can actually see maybe my logic is buggy. So let's go ahead and step into swap. And notice that debug 50 immediately jumps down to the first line and swap. And notice now on the right hand side, what's going on? a is 1 and b is 2. And that's as expected, because I passed an x and y. Swap calls it's two inputs a and b respectively, so a is 1, b is 2. Temp though, very curiously is 32,767. And where is that value coming? Well, it turns out when you declare a variable, but haven't yet said it's of some value it's going to by definition have an undefined value. A so-called garbage value. Remnants of other aspects of your program having used that chunk of memory before. And we'll see why this is in just a bit. So this is anomalous. It's not a bug in the program. It's to be expected, because I haven't yet assigned temp a value until-- let me zoom out. Until I step over line 21 and actually let temp equal a. And notice, if you watch the right hand side of my screen now, temp is currently 32,767. And when I step over it, OK. It becomes 1. So that's as expected. Now, a is about to become b. And indeed, we now see that a is 2 b is 2 and temp is still 1. So now we have to put temp into b, so let's step over that line. And indeed, we've done a switcheroo of a and b. a is 2, b is 1. OK. Let's go ahead and play this out. Let's continue here. I claim that I've swapped. I'm printing x-- and somehow x and y are still 1 and 2. Even though the debunker told me it would seem that my swap function was correct. So what is happening after all of these weeks of CS50 and all of these problems solved thus far? It would seem that all of our understanding of things is kind of unraveling. And that's because we've been very careful over the past few weeks, at least in class. And in sections to try to avoid tripping over some of these lower-level implementation details. And there's not that many of them. But today, is now a time to peel back this layer and understand exactly what it is that's going on. Indeed, all of this time when you run a program on your computer, double-click an icon or run the program's name with dot slash something or other at your terminal window-- what happens is that your computer gives that program the illusion of a really big chunk of memory all of its own. Maybe two gigabytes of memory, even though it doesn't necessarily use all of that memory. And that memory-- if you just think of your computer's memory, as we've done before, is like a rectangular region. And we could number of the bytes in my memory. But it doesn't really matter what the addresses are, what the numbers are, just that it exists. It turns out that when you run a program, your operating system typically lays out your program's memory in a fairly standard way. There's a chunk of memory down here for something called environment variables. There's a bigger growable chunk of memory down here called the so-called stack. On the opposite side of the picture is a so-called heap. Another chunk of memory that actually grows in the other direction. So long story short, bad things can happen if both of those grow bigger than you intend. Then, there's some kind of uninitialized and initialized data up top. And then, text. Now, it turns out text is the segment of memory where your program's zeros and ones live. So when you double-click an icon on your Mac or PC or when you run the command dot slash something, those zeros and ones are loaded from your hard disk or solid state disk into RAM or memory. And it's put conceptually at the top of the memory that your computer program is using. And below that is the actual data that your program is using. The variables and the values inside of it. Now, each of these types of memory have different purposes. And we'll see in just a moment what it is that's going on. And we'll ultimately peel back these layers. So what is it that's actually going on underneath the hood here? Well, let's consider this to be my computer's memory. So focusing on just that bottom most portion, which I called the stack a moment ago. So if we draw just the bottom of my computer's memory kind of like this, the bottom of it has technically environment variables. But let's focus on the region known as the stack. And the stack, as the name implies, is kind of like the stack of trays that you might see in a cafeteria or a dining hall, where you put trays on top of another until it can get potentially pretty tall. And it turns out when you run a program, not only is your program given the illusion of this really big memory space laid out as proposed, but it also by convention uses this memory in a fairly standard way. Specifically, when main is called, main is given a chunk of memory at the bottom, so to speak, of this stack space. And so, let me go ahead here and write main. And any local variables that main has and any arguments to main, namely argc an argv, end up inside here. So if indeed you are using something like argv and argc, you might have a value like this down here. And you might have another chunk of memory carved out here for argv. And if you have a couple of local variables, for instance x and another one, y, those two would be allocated space in this slice if you will. This frame of memory. Meanwhile, if main calls a function, like swap-- swap is allocated a swath of memory, a frame of memory, above main by design. So if I've called swap, its memory ends up here. And if the swap function itself has arguments, like a and b or any other local variables, those values too are put inside of the so-called stack frame. So this might be a and this might be b. In other words, the concepts that we've been taking for granted, both in Scratch and in C, at the end of the day, boil down to values needing to go somewhere physically. And so, if you assume that the big rectangular region here is your computer's memory. And then, you consider that the operating system really just slices and dices this memory, such that mains memory is down here. Any function that main calls is immediately above it. And frankly, if swap called its own function, it would end up above it. But now, given this basic definition of memory management and the layout of computer program's memory space, you can perhaps start to infer why all of these failures have started to happen in my program. A moment ago, I didn't have argv and argc. I just had for instance x and y. And I had the value 1 and I had the value 2. Main then called swap and put a copy of 1 there and a copy of 2 there. And indeed, that's the key insight. When a function calls another function, passing in arguments as inputs, that the function is being passed to copies of those inputs. So at this point in time, if you opened up the lid of your computer and looked inside digitally, you would see 1 and 2 down here. And you would see another pattern of bits representing 1 and 2 up here in duplicate. So now when my swap function operates, it declares a temporary variable recall, called temp. So let me draw that here. And as I recall, it stores in temp, which value? The value of a. The value of a is 1. It then took the value of b, put it in a-- which puts that value here. And now at this point in the story, a and b are incorrect. We still need to put the value 1 in b. And that's why we then took temps value, put it in b. Thereby giving the me ultimately the number 1 in b's slot as well. And so, at this point in the story, temp still exists. And a and b have the correct answers, 2 and 1 respectively. But the catch is the moment that swap returns, this happens. Essentially, everything that was being used above main disappears. It's not actually deleted. All of those bits are still there, so technically the numbers are still there. But the computer just forgets about those values. And indeed, the key takeaway here is that when execution returns to main-- that is when swap says, return I'm done. And main takes over operation again, 1 and 2 are completely unchanged. So all of the word that we just did up there was a complete waste of time in some sense. Because even though it was correct in so far as it swapped a and b, it did not actually have a permanent impact on the variables that I actually cared about. Let's consider another example now. Recall my previous example where when I whipped up some code to compare values, I called get string once and stored its return value in s. I called get string twice and stored its return value in t. And then, I just tried to compare s and t. Well, what does that actually mean? It turns out that when you call get string, it's not technically returning a string to per say. It's not technically returning to you a sequence of characters. It's actually returning something much simpler. When I call get string and do this-- string s gets get underscore string, all that's happening is this. The left hand side of this expression is telling the computer, hey, computer. I need a variable called s that's big enough to fit a string so to speak. On the right hand side, get string indeed gets a string from the user, like D-A-V-I-D or Z-A-M-Y-L-A. But where does that string come from? It turns out it comes from somewhere else in memory. And suppose that the user has indeed typed in Z-A-M-Y-L-A quote unquote so to speak. That's just kind of floating somewhere in my computer's memory. And if this string happens to exist at address number 123, byte number 123 in my computer's memory. What's actually going to get returned is the value 123. In other words, a string is technically just the address of a character in memory. In fact, if we zoom in further on Zamyla, recall that Zamyla's name really looks like this underneath the hood. It's an array, a block of contiguous memory with a backslash 0 at the very end. And if this first byte just happens to be the number 123, the second byte is going to be 124. The third byte it's going to be 125. And so forth. In other words, if my computer has a billion bytes of memory, 2 billion bytes of memory-- like a gigabyte or two gigabytes, you can certainly number each of those bytes and the computer does underneath the hood. And so Z-A-M-Y-L-A backslash 0 is simply a sequence of 7 characters that live somewhere physically in memory. And what a string technically is, is just kind of a breadcrumb-- specifically the address of, the number of the first byte of Zamyla's name. So when you return a string, you're not handing back Z-A-M-Y-L-A backslash 0, you're handing back something much smaller. A little scrap of paper, if you will, that's a map to Zamyla's name somewhere in your computer's memory. And so over here, this number 123 is generally called, not just a number, but an address. Much like post boxes and houses have addresses physically that uniquely identify them, 123 uniquely identifies that byte in memory. And so via this address, this map if you will, can my program later on find Zamyla's name and do something with it. Like printed out, or capitalize it, or compare it, or anything else. And so, how though does the computer know where Zamyla's name begins and ends? Well, it begins at 123. And again, recall from two weeks ago, it ends with backslash 0. So with those two markers, where does it begin? Where does it end? Can you infer the entirety of someone's name or the entirety of a string in between? Now, consider my second line of code. String t gets get string. So what does this mean? This is another function call. And suppose that I just happen to type in Z-A-M-Y-L-A enter, just like before. What's happening there is that the computer, blind to the fact that, that might already exist somewhere in memory, is going to give me another chunk of memory somewhere with Z-A-M-Y-L-A. And then, backslash 0. Whereby, this is effectively an array of its own. And I don't know where it is. Maybe it's at address number 234, followed by 235 dot dot dot. It's somewhere else in memory. I don't know or need to even care where it is. But the key detail here is that t in so far is it is a variable itself. It gets what value? Well, if the string's Zamyla, the second time I type it in it just happens to end up at location 234. What ends up in t is simply the number 234. So if we change back to my code where I was comparing s and t, as I am here. I've called get string once, stored the value in s. Called get string twice stored the value in t. And then, I'm quite simply saying, if s equals equals t, then say same. Else, say different. Now, when you consider what is really going on underneath the hood all these weeks, certainly 123 is not the same as 234. And so, s does not equal t. Capitalization meanwhile. What happens with that other program? Well, we'll do this one more quickly. But if I for instance, call string s gets get underscore string open paren close paren. That again, is going to give me something like this. And it's going to give me something like Z-A-M-Y-L-A backslash 0, all of which I can think of, as before, like an array-- if sloppily drawn. Maybe starting again at address 123. And so that's what ends up in s. But in my copy 0 program, recall that I did this. String t gets s. I didn't call get string again. I just said, store s inside of t. Effectively, I thought, making a copy of it. What this left-hand side gives me is another box called t. If this is s, this is t. But what goes inside of this new box t? Well, what goes inside is literally a copy of s. What is s? s is 123. OK. t then, is 123, which means later in my copy program, when I simply decided to capitalize t bracket 0, the first character in my string t-- that's kind of misleading, because my string t is really just my string s. They're sort of synonyms for one another at this point, because one indeed equals the other. And so what has happened is that I have gone to the first character in t, which of course is z. And recall from my example earlier when I typed it in all lowercase, David, for instance, with a lowercase d, it capitalized it not only in t, but also in s. In other words-- and frankly, it doesn't really matter, typically, what these addresses are. I'm just using 123 and 234 because they're sort of easy to say. But you can really think of s as again, being a map that leads you to the string you care about. A pointer, if you will. Literally an arrow. And t, similarly, can be thought of as a pointer. And the key detail here is that because I've set s equal to t-- or t equal to s-- they are effectively pointing at the same thing. So strings are a lie. There is no such thing as a string data type. There are things called chars, characters that can live somewhere in memory. And we humans can arbitrarily decide that hey, if we put a backslash 0 character at the end of a sequence of other characters, we can all just agree universally to treat that as the end of a quote unquote "string," that is a word or a phrase or paragraph or anything even bigger. But we need a convention for remembering where strings begin. We've already solved the where do they end problem. So where does the string begin? It begins at an address. It begins with a pointer. And so this special data type that we declare in CS50's library called string really is just in CS50 IDE an 8-byte value, a 64-bit value that is just a really big number that represents the address in memory of a string. And I say really big just because the IDE gives you access to lots of memory, certainly numbers bigger than 123. But a string is just a number, is just an address, AKA a pointer. And that explains, then, why all three of these examples did not behave as I might have hoped, because rather they were taking things a little too literally. Or I was failing to appreciate what's actually going on. Let's pause for a moment, take things down a notch. Make things a little more real with a bit of claymation that will motivate, eventually, peeling back this layer further and seeing what's really going on. [STRUMMED CHORD] NICK PARLANTE: Hey, Binky. Wake up. It's time for pointer fun. BINKY: What's that? Learn about pointers? Oh, goodie! DAVID J. MALAN: Binky, who exist here only in claymation form, is the product of a good friend of ours, Nick Parlante at Stanford, who teaches computer science there. You'll see more of Binky and hear more of Nick in just a moment. But these here are sort of metaphorically the training wheels that we've had on for the past few weeks. And the goal now at hand is to take these off, and to finally start looking at what's really going on underneath the hood. And starting to remove, if you will, let's see if-- [BANG] --probably not the best idea. Remove, if you will, these training wheels, and actually see what's going on, and understand and take advantage of the same. As follows. Let's go ahead now into CS50 IDE, and go ahead and open up, let's say, compare1.c, which I wrote in advance to look as follows. And you'll notice that it works a little differently from version zero. Here we have a prompt for string s. And we store in it the return value of get string. But notice what's on the left-hand side. Char star s, all of a sudden. Indeed, all of this time, I've been treating things as though they are strings, literally. But it turns out a string is just a synonym for a data type known as a char star. And the new syntax today, then, is this star operator. The asterisk that actually has special meaning in certain contexts, not just multiplication. But in this case, it specifies that the type of a variable is not a char literally, but a char star, the address of a char, a pointer to a char. Now, why char? I thought we were talking about strings. But again, recall that a string is just a sequence of characters back to back to back, and therefore, you can define a string by the address of its first character. Ergo, what we really need underneath the hood is a data type that lets us store the address of a character. There is no string. And so what does this allow us to do? Let me go into CS50 IDE, and let me declare then, on this line here, that s, this time, will not be a "string" quote unquote. That was from the CS50 library. But rather it's going to be a char star. It turns out that all this time get string, again, does not return a string, it returns the address of a string, AKA the address of the first character in a string. And so the type of value it's returning is not just a number. It's not just an int. It's a special type of an int. It's used for a different purpose. It's simply an integer that represents the address of a char. And the way you type out address of a char is literally char star. So this, then, is identical to my previously in weeks past having typed string s. Now I'm going to start typing it as char star. Meanwhile, t is going to be the same. So when I prompt the user for another string there, I'm going to store that return value inside of t. And now, notice, just for good measure, I'm making sure that both s and t are not null. I'm using a bit of conditional logic there, saying if s is not null and t is not null, it's safe to proceed. Because recall that get string can accidentally return null sometimes if your computer is out of memory, or something else goes wrong. Or not so much accidentally, but by design. But notice this new chunk of code. It turns out-- and we know now from a moment ago-- you can't just compare s against t. They're not going to equal the same thing if you type in two independent strings. We need a special function that actually compares strings in a conceptual way. I mean that a string is equal to another string if every one of its characters equals every one of the other string's characters. Thankfully, there exists in C a function that does exactly that called strcomp, string compare, and it takes two arguments. The first is a string. The second is a string. Or more properly, the first is the address of a string. Or even more properly, the first is the address of a character. The second is the address of a character. And str compare is just going to hope that both of those strings eventually end in a backslash zero, so that they don't loop forever through memory. They eventually hit that special null terminating byte. And if so, and those characters are all entirely equal, you print same. Else, as before, we print out different. So now, when I compile this program, make compare 1, and then I do ./compare1. Now I'll type in david in all lowercase, david in all lowercase. They're indeed the same. Let's do it again. Zamyla with a capital Z. Zamyla with a capital Z. There are indeed the same. Let's do Zamyla with a capital Z. zamyla with a lowercase z. Different. And then FUBAR, clearly different. Now we're actually comparing these things properly. Because now we're appreciating what it actually means to be a string, and we are underneath the hood comparing what we should be doing. Now, underneath the hood, what is str compare doing? Honestly, it's probably just a while loop or a for loop that is iterating over the string and their lengths and looking at the i-th character in each string, and making sure they're all in fact equal. Let's go ahead and fix copy with version 1 here. Copy 1.c that I've written in advance now looks like this. I still prompt the user for a string s with these lines here. I then, just to be safe, say hey, wait a minute, if s equals equals null, return 1. And again, 1 is just arbitrary. I just want to get out, lest I break something later. Now, down here, this is a new line of code. And this is perhaps one of the most powerful ingredients we'll see this week, is this new function called malloc, memory allocate. This is a special function via which you can ask the operating system, Linux in the case here, or Mac OS, or Windows, if you're running the code locally, hey operating system, please give me a bunch of bytes of memory. Now, why do I want this? This program is copy 1.c. The goal at hand is to create a program that copies a string s and stores the copy in t, so to speak. Last time, it was not sufficient just to say t equals s, because that copied the addresses. That didn't give me a copy of Z-A-M-Y-- Z-A-M-Y-L-A backlash 0. It instead just gave me a copy of the address. So how do I get a complete copy of Zamyla's name? I need to preemptively do a little bit of arithmetic and say, all right, how long is Zamyla's name? Well, it's the length of s, str len of s. But plus one. Why plus one? Why plus one? Yeah? Exactly. We now were hit the lowest level of the computer. If we don't ask the operating system for memory for that extra backslash 0 byte, we're not going to get it. So we have to explicitly say, give me one more byte, because I know how strings are implemented underneath the hood. I need to put that backslash zero there, ultimately, and then whatever that expression is, the length of Zamyla, so Z-A-M-Y-L-A, six, plus one, seven bytes. Times the size of a character. Turns out it's always going to be one, by definition. But just for good measure, I'm clearly saying, give me seven times the size of a char, which is going to be one. That gives me seven total bytes. So just to simplify. If you multiply all this out, because the line looks unnecessarily cryptic at the moment, this really is equivalent, at the moment, to just this. Call the function malloc. Give it the number seven, so that malloc, and in turn, the operating system, looks inside of its memory bank, so to speak, and says, hmm, where are there are seven available bytes that aren't currently in use? Ah, here is a chunk of them. And it's a contiguous chunk. It's going to find a block of memory, a rectangular region, if you will, and grab seven bytes, and return them to my main function. But what do I mean to return a chunk of memory? Well, just as get string returns a string by returning the address of the first character in that string, so does malloc equivalently simply return the address of the first byte of memory. But the danger now is that unlike a string, malloc is not giving you characters. It's just giving you seven bytes in a row that you are now free to use. It does not give you a backslash zero at the end of them. If you want to remember the length of the chunk of memory you just allocated, the burden is entirely on you, the programmer. And indeed, one of the most common sources of bugs in writing code in C is to forget about how long was this chunk of memory, and to accidentally, with a loop, go too far past the end of it. And we'll see what can happen in those cases. So now, assuming I do have in t the address of that chunk of memory, let me just say, if t equals equals null, return 1. Something happened that's bad, probably the operating system just didn't have seven extra bytes of memory to give me. So fine, I'll quit. Then down here, what do I want to do? Well, I now need to implement, at least in this example, my own copying process. Here, at this point in the story, I have two variables, s and t. s contains the address of Zamyla's name. t contains the address of a new chunk of memory of length seven. So here's what I want to do. Just like a couple of weeks ago, I'm going to iterate from zero on up to the length of the string. But not up to, but up through the length of the string. Because in this case, I actually want to iterate with a for loop up through that backslash 0 byte. And then just this syntax from a couple of weeks ago, when we simply manipulated strings as for our cryptography ciphers, character by character. Make the i-th character of t equal to the i-th character of s. And this is perfectly valid, because so long as this loop doesn't go past n, the number of characters that I allocated, seven, in this case, I can go to t bracket 0, bracket 1, bracket 2, all the way up through n, effectively copying the string. And so now when I actually print out s and t, I should see truly a copy of t. Because even when I force its first character to lower case with this same line of code here as before, I'm actually changing different memory. So let's compile this. Make copy 1, ./copy1. And let me go ahead and type in zamyla in all lowercase, and now notice the program does seem to work. Zamyla is reprinted in lower case for s, but it's then print in uppercase for its first letter for t. And because the z's look pretty similar, let's do my name again, whereby I type david in all lowercase. Type Enter. And now, you see s is still david all lowercase, but t has only now been capitalized itself. It hasn't had a side effect of some sort on s, because they're different chunks of memory. Why? Well, what has just happened in this program is this. We have, again, done string s gets get string. And when we typed get string, this gives me a chunk of memory for the address of s. Get string gives me a name, like D-A-V-I-D in all lowercase, plus that backslash 0. Which again, is really just an array underneath the hood like this, that starts at some byte, and maybe it's again, by coincidence, 123. Just because it's easy to say, but that's not where it's necessarily going to end up. And so what ends up here is 123. And then later, when I allocate t, I again get this little chunk of memory that's supposed to store the address of a character. And actually, recall that we're now doing this as char star, not even string. So t is similarly a char star. And what happens, malloc, when I ask it for seven bytes, gives me 1, 2, 3, 4, 5, 6, 7 bytes of memory. There's no null terminating character just yet. It's just a block of memory. And frankly, there could be some random values here, as denoted with question marks here. It's just a chunk of memory that might have been used previously in my program for some other purpose. But what gets stored here, if this happens to be at address 234, is this value here, 234. And if you're not liking the numbers, again, you can think of these as just being pointers, arrows, to these chunks of memory. But now, in my C code, when I have a few lines above this loop whereby I am copying from s bracket i into t bracket i, each of the characters in my loop, what's actually happening? Well, fairly intuitively, this lower case d ends up getting copied here. This lower case a ends up getting copied here. V-I-D, on through. And David can't count, so-- backslash-- oh, right. David's name is shorter than Zamyla's name, which means we didn't actually ask for this many characters over here. But we have taken the computer more literally now. Give me six bytes, not seven bytes, in this case. And then literally copy each of the characters from the original string into this new string all the way up through that backslash 0 character. And then when you capitalize the first character in t, you are literally only changing this-- we can do better than this. We are only changing this first character here, which looks like that now. And that's what's going on underneath the hood. So this is why, then, in the beginning of the class, we don't introduce strings as char stars, because you very quickly get caught up in a lot of this minutia. But at the end of the day, it's not all that complicated once you realize that a string is just an address, and malloc, this new function, also just returns an address. This is very powerful, because now you have these sort of breadcrumbs that can lead you to different places in memory. A little map, so to speak, that can lead you to actual strings in memory, and can actually now solve problems more effectively. For instance, we can go back and solve one other problem we saw a moment ago, which was swap. So this version of swap was broken why? What was the source of this fundamental problem? Yeah? STUDENT: [INAUDIBLE] DAVID J. MALAN: Yeah. When you went back to main, you erased the memory on top of it, and the fundamental problem there was when I passed in x and y, they became copies called a and b, in different chunks of memory. And so the fundamental problem seems to be that swap is incorrectly implemented. It's logically correct. It does swap two values. And we saw that with debug 50, but it's kind of fundamentally flawed in so far as it requires, it seems, by design of C, that a and b be passed in by value as copies, so to speak. We need some way to change this function to say main, hey, uh-uh, don't give me copies of your variables. Give me a treasure map that will lead me to your variables. Give me the address of x. Give me the address of y. And I'll still call them a and b, or whatever I want, but lead me to the original values. Don't just pass me copies of those values. And so we can change swap as follows from a program or a function that's incorrect entirely, but into one that is correct. And we need to change the syntax a little bit. So before is what we had here. After is what we now have. Before, after. Before, after. So if you see it in rapid succession there, all you see is that a whole bunch of stars are appearing in the code. And unfortunately, C was not designed in the best of ways to make clear what star means in different contexts, but it's all related as follows. The fact that I have now changed a and b to be not ints, but ints stars, int pointers, if you will, means that when main calls swap, it is by design of the C language, going to pass in the address of x and the address of y. So that's what the star means. Give me the address of an int and the address of an int, not actual ints. Now, down here, the star unfortunately means something slightly different, but related in spirit. Int temp just gives me an integer, an int variable called temp. Star a, in this context, without the word int in front of it again, means go to that location. Follow the treasure map, so to speak. Go to the address that is in a. So for instance, if the address of a is say, 33 Oxford Street, Cambridge, Massachusetts. That happens to be the address of the CS building at Harvard. Star a means go to 33 Oxford Street in Cambridge, Massachusetts. The star just means go to that particular address. So what does that mean, then, down here, when I say star a gets star b? That means go to the address in b and get its value, and store it at whenever a is pointing at to. So go to a and wait for me for a value. Go to b, get a value, and put that value at the location in a. And then lastly, this just means go to the location in b, go to whatever building that is, so to speak, and put the value that is in temp inside of that building. So a pointer is just an address. These stars just mean pointers are involved. Give me the address of an int, give me the address of an int. And again, confusing, admittedly, the star in this context where we don't have the word int in front of it again, on the side of the equal sign, just means go to that address. Go to that building. Go to that other building and put something there. So we can now fix our swap program correctly. We can now open up, as I will here, swap dot c, which I wrote in advance. That looks almost the same, except that I've changed the swap function as follows. a is now int*, b is now int*, and I also borrowed the stars inside of the function, as well. But something's gotta change. There's one more line of code I need to change for all of this to work. What is that? What line needs to change? Well who cares about swap? It's main that was calling this thing in the first place, so let's go back to the original story. Main, here, declares x and y as 1 and 2, does some printfs here, as before. But notice this line has to change. So one more piece of syntax today. And we're running out of new symbols. We've seen most of C already. &x and &y means get me the address of x, and get me the address of y, and pass those in instead. So x,y would just mean pass in a copy of x and a copy of y, or the values thereof. &x &y means give me a little, you know, map with the address of x and a little map with the address of y, so that swap-- who's receiving those maps-- can go there. So what does this mean in pictorial form? If we now go back to the beginning of this story, where we were looking at my computer's memory as this big rectangular region like this. With main's chunk of memory at the bottom here. And inside of main was two variables, like x, and another variable y. And inside of those were the numbers 1 and 2. And then I called swap. And so swap gets its own frame on the stack, so to speak. This is swap's frame. It, too, had a variable called a and a variable called b. But what goes in there now? It's not 1 and 2. We need to know a little something more about my computer's memory. And I don't know where everything's laid out, but let me just arbitrarily assume that, you know, it's inside of my computer's memory. Maybe this is byte number 90. This is going to be 91. This here is going to be 92, 93, 94, 95, and so forth. I just need to know that there's some kind of numbering scheme there. So what goes inside of a is 91. What goes inside of b is 92. And not the values 1 and 2, but rather the addresses of those values 1 and 2. Because now my code for the swap function, consider what it does. It says, upon receiving the address of an integer, called a, upon receiving the address of another integer, called b, go there and store that value in temp. Go to the address in b and store that value at the address in a. Store the value in temp at the address in b. So let's see what happens then. So first of all, I need another variable here, called temp. Temp, meanwhile, is not a pointer. It's just an integer, but what does it store? Well, according to my code, temp gets the value of going to a, going to the address in a. So what is a? a is 99. That's like a treasure map leading to, OK, this chunk of memory down here in my computer. And what value is there once I've gone there? Once I've gone to the CS building inside of it, I see the number 1, and so I put the number 1 in temp. Meanwhile, my second line of code says go to the address in b and grab its value, and put it at the address that's in a. So what does that mean? Well, star b means start here and go to 92. So it's like an arrow-- kind of like chutes and ladders, if you know the game-- like go to address 92. What value is there? The number 2. And the other half of the equation, on the left, said, go to the address in a, which is here, and copy the 2 into that location. And then the last line-- only one more line-- is this, get the value in temp-- that's easy-- and put it at the address in b. So we have to go to b and put temp in there. So to do that, here's temp, it's the number 1. I have to go to the address in b. The address and b is 92. So let's go there, and aha, let me go ahead then, and overwrite the value that's there with the number 1. So now this frame of memory on the stack-- the 91, the 92, and the temporary variable-- they are, by design of my new function, disposable. I really don't care, after swap returns, if those things continue-- I did care a little bit about that. I don't care if those things continue to exist. All I care about is that x and y continue to exist. So in this way is the new and improved version of the swap function actually having a permanent impact on my data? And with the frame, the memory still looks like that, because it's gone to the address in a. Gone to the address in b, which leads it to the original x and y. And so by way of pointers, by way of these addresses, do we have the ability to actually go much, much deeper into a program and actually get at values that previously we had no way of even expressing. So it's at this point in the story where I usually admit that, at least for me, this has been among the most challenging topics when I, myself, was a student. And in fact, all these years later-- it's like, 20, 20 year-- yeah, I think we're up to 20 years ago. 20 years ago-- I didn't take this photo then-- but I sat in what was, at the time, the back right hand corner of Elliot House's dining hall, here at Harvard. And I sat down with my teaching fellow, who of all the TFs I had as an undergrad, still remember to this day, [? Nishat ?] [? Meda ?]. And we just reconnected on Facebook, all these years later. Very exciting. And it was he who wonderfully sat down with me at office hours one day in the dining hall, trying to help me understand pointers, because it was just so much more technical than all the other stuff. Like, there is no puzzle piece in Scratch for the address of something that leads you somewhere so powerfully as these stars seem to be able to, here. And this is only to say that this is among those topics that might take a little bit of time to sink in, but it does. And when it does, it really is that proverbial light bulb that goes off. And for me, that light bulb went off right then and there. Now, what more can we do with these things, after that motivational speech? Pointer arithmetic. So, sort of complicated sounding topic, but really, it just goes back to first principles, as to what a pointer actually is. And it allows us now to do things like this. Let me go ahead and open up one other program that I wrote in advance here, called pointers dot c. And take a look at what this thing does here. It works a little differently from the syntax we're used to, and from any of our crypto problems thus far. So notice on this first line here, I get string and I store in s. No more string right now, just char star. We can be real and talk about it as the address of a char. A little sanity check, is s equal equal to null? If so, just return. Something went wrong, so let's not deal with it now. Down here, a for loop. For i gets 0 all the way up to n. So this is just a standard syntax we've used a few times now, even back in week 1 when we just wanted iterate over. Or in week 2, when we wanted iterate over the characters in a string. But we've never seen this kind of craziness before. A star, and then some arithmetic in parentheses. In the past, when we wanted to print out a character, as implied by %c here, we quite simply, as I recall, did this. Which was nice and intuitive, right? The square brackets denote to treat the string as though it's an array, which it really is, an array of characters. And that means get the i-th character of s. But now that we understand what s is, we don't need to use this syntactic sugar, as it's called. Any time a language has a feature that's convenient to use, and easier to read sometimes, but isn't fundamentally necessary to express yourself, it's often called syntactic sugar, which means it's just kind of a nicety to have. And indeed, that square bracket notation is just sugar for this more arcane, but perhaps more well-defined syntax now. The star operator in this context is the dereference operator, technically. It's the go there operator, as I've been describing it. Go to some address. Well s, recall, is a string. But there is no string. Strings are just the addresses of characters now. The first in a string. So initially in this loop, what am I doing? s is the address of a string, the address of its first character. And I'm saying, add to s, the value i. Well, i is just this variable in my for loop that's initialized to 0. So s plus 0 is obviously just s. s is the address of a char. *s means go to s. What do you find when you get there? A character, because s is a char star, the address of a character. And so printing out %c *s effectively means, go print that character right there. On the second iteration of the loop, i is, of course, 1. So s plus 1 is 1 byte farther from the beginning of the string. And the star means go to that character and print it with %c. One more iteration, i is now 2. s plus 2 is 2 bytes away from the start of the string. Go there and print that character in the string. And so forth. And do this up until the length of the string. Now this is perfectly correct. And if you really kind of want to look cool with your code you can use pointer arithmetic in this way, because all it is is just expressing more precisely what is going on underneath the hood. But it's a little more cryptic. It's certainly a couple more characters. But it is functionally equivalent to what we've been doing for weeks, which has been, again, just s [i]. So whereas some of today's ideas are admittedly new-- allocating memory and actually looking underneath the hood at what a string is-- we're not really getting newfound capabilities that we didn't already have when it comes to manipulating existing strings. So this is pointer arithmetic, so to speak, insofar as we are doing arithmetic with pointers. Math with pointers. All right, let's take a look now at where things can go wrong. So this is a program written by our friend Nick Parlante at Stanford, inspired by Binky, who will return in just a moment. And it's fundamentally broken. This code is incorrect. It also doesn't really do anything useful. But it's meant to be demonstrative of things that can go wrong. So at the top of this program, notice we are declaring two variables, x and y. But today those variables are not ints, as they might have been in weeks past, they are int*s, the addresses ints. They're not being initialized yet to any value, so that's fine. So really, this is just giving us, if you will, like, two boxes on the screen. So x at this point in the story looks like this, y at this point in the story looks like this. I have no idea what's inside of them. They have garbage value, so to speak. Because if we didn't assign them a value, the computer is not going to do it for us, so they might just have remnants of some past usage of that memory. So we don't know what's inside of them. But that's OK, because on this next line I call malloc and actually allocate enough memory for one int. Now this is kind of a silly use. It's not the best way to give yourself an int. We've seen for weeks now how you get an int. You say, like, int x, or int y, or int z, and the computer gives you an int. But if we want to use malloc in the simplest way possible, we can just say, hey, malloc, give me enough space for an int. And recall from the past that an int here is generally 4 bytes. So give me 4 bytes of memory, specifically the address of the first of those bytes. That's all malloc is doing, and it's storing that address in x. For good measure I should check for null, but we're not, in this case, per Binky. So what's the next line do? This next line means go to the address in x and put the number 42 there. That seems OK. Because assuming malloc returns the address of a chunk of memory, the first address of 4 bytes, we can go there and put the number 42 in binary, in 4 bytes worth of bits. But what about this line? I've flagged it in red because this program probably is going to go no further. In fact, something very, very bad is about to happen. Why? Well, what is the value in y? Well, originally x was a garbage value until I called malloc and asked malloc, hey, malloc, give me enough space for an int. So I'll draw it as a box here. Give me the address of that chunk of memory. So this question mark is really now an arrow to that chunk of memory. And then I said, hey, computer, go ahead and go there and put the value 42. But then my next line of code said, hey, computer, go to the value in y and put the unlucky number 13 there. So that's like saying, go-- I don't know where to go, because this has no value. And so something very bad is going to happen. Because the question mark implies, this is a garbage value. Maybe it's 0, maybe it's 1,000, maybe it's some number in between or bigger. It's some garbage value, which means if I just go there, who knows where I'm going to end up? But odds are I'm going to end up somewhere I shouldn't, because I should not be touching memory that is not my own. And indeed, thanks to Binky, we're about to see that bad things indeed happen when you touch memory that you shouldn't. Let's take a look. [VIDEO PLAYBACK] -Hey, Binky, wake up. It's time for pointer fun. -What's that? Learn about pointers? Oh, goody! -Well, to get started, I guess we're going to need a couple pointers. -OK. This code allocates two pointers which can point to integers. -OK, well, I see the two pointers, but they don't seem to be pointing to anything. -That's right. Initially, pointers don't point to anything. The things they point to are called pointees, and setting them up's a separate step. -Oh, right, right, I knew that. The pointees are separate. Er, so how do you allocate a pointee? -OK, well, this code allocates a new integer pointee, and this part sets x to point to it. -Hey, that looks better. So make it do something. -OK, I'll dereference the pointer x to store the number 42 into its pointee. For this trick I'll need my magic wand of dereferencing. -Your magic wand of dereferencing? Uh, that's great. -This is what the code looks like. I'll just set up the number and-- [POP] -Hey, look, there it goes. So doing a dereference on x follows the arrow to access its pointee. In this case, to store 42 in there. Hey, try using it to store the number 13 through the other pointer, y. -OK, I'll just go over here to y and get the number 13 set up, and then take the wand of dereferencing and just-- [BUZZER] Oh! -Oh, hey, that didn't work. Say, Binky, I don't think the dereferencing y is a good idea, because, you know, setting up the pointee is a separate step, and I don't think we ever did it. -Oh, good point. -Yeah, we allocated the pointer y, but we never set it to point to a pointee. -Hm, very observant. -Hey, you're looking good there, Binky. Can you fix it so that y points to the same pointee as x? -Sure. I'll use my magic wand of pointer assignment. -Is that going to be a problem, like before? -No, this doesn't touch the pointees. It just changes one pointer to point to the same thing as another. -Oh, I see. Now y points to the same place as x. So wait, now y is fixed. It has a pointee, so you can try the wand of dereferencing again to send the 13 over. -Oh, OK, here goes, -Hey, look at that. Now dereferencing works on y. And because the pointers are sharing that one pointee, they both see the 13. -Yeah, sharing, whatever. So are we going to switch places now? -Oh look, we're out of time. -But-- -Just remember the three pointer rules. Number one, the basic structure is that you have a pointer and it points over to a pointee. But the pointer and pointee are separate, and the common error is to set up a pointer, but to forget to give it a pointee. Number two, pointer dereferencing starts at the pointer and follows its arrow over to access its pointee. As we all know, this only works if there is a pointee, which kind of gets back to rule number one. Number three, pointer assignment takes one pointer and changes it to point to the same pointee as another pointer. So after the assignment the two pointers will point to the same pointee. Sometimes that's called sharing. And that's all there is to it, really. Bye bye, now. [END PLAYBACK] DAVID J. MALAN: So what else can go wrong now that we have the ability to touch, correctly or incorrectly, any memory that we actually have access to inside of our program? Well, memory leaks are one such problem. Now that you have the capability to ask the operating system for memory via the malloc function, you have the ability to accidentally not give that memory back. In fact, a very common mistake in some programming languages is to ask the operating system for a chunk of memory, use it, and then never actually free it. To give it back so that other parts of your program, or other programs on the computer, can actually make use of that memory. But fortunately there exist tools via which we can detect this, and one of them is called Valgrind. So this is another debugging tool that you'll now be able to use once you start dynamically allocating memory in your program. So up until now you probably have not used malloc, and therefore you have not likely actually asked the operating system for more memory in this very dynamic way. Instead, you have just declared an integer or an array, and asked the computer for memory in other ways, not using malloc. But suppose that we have a program like this. In memory dot c I've made some mistakes. Deliberately, but mistakes nonetheless. And indeed, this draws inspiration from the documentation for this very tool to highlight a couple of its most useful features. So let's look at this program here. There's no prototype for this function f, just because the example that Valgrind, this tool, gives just puts it up top, and that's fine. But let's focus on main first. Main takes no command line arguments, per it's mention of void, and it returns an int as usual. The function f gets called here. So f open bracket, close bracket just means call the function f, but pass it no inputs, and then eventually return 0. So let's look at f. What happens? Well, f takes no inputs, produces no outputs. It only apparently has these two lines of code. And let's consider what it does. It calls malloc, memory allocate, 10 times the size of int. So that just means, hey, malloc, give me enough space for 10 integers. Now we know on CS50 IDE that the size of an int recall is 4 bytes, or 32 bits. So this is like saying, hey, computer, give me 40 bytes total. 4 times 10. Malloc recall returns the address of that chunk of memory, and it stores it in x, which, according to the star, is the address of an int. So that's all. So borrowing some of the same ideas from the Binky example. What is bad about this is this next line. Why is x[10]=0 a problem? Well, the size of this chunk of memory is what, 10 ints worth. So 10 times 4, 40 bytes. But remember how chunks of memory are indexed when you use square bracket notation. When you use square bracket notation, you better start counting at 0, which means if we have 10 ints, first one is 0, the last one is 9, the 10th one does not exist. And so if I'm saying xx[10] gets any value, that's like saying go there and put the value 0 there, but I don't have access to that chunk of memory. I did not ask the computer for that memory. I asked the computer for all of this other memory, only 10 of these bytes. So this feels problematic. Moreover, I have asked the operating system for memory via malloc, hey, give me 40 bytes. I never handed it back. Thereby introducing what we would call a memory leak. Now, as an aside, once your program quits, all of the memory it's allocated is actually automatically given back, but the problem is that, for long running programs, things like your browsers, or word processing programs, or any number-- Skype-- or any number of other programs that you might run locally on your Mac or PC, if those bigger programs have memory leaks, such that they keep asking Mac OS or Windows for memory and never actually remember to give it back, you might experience what you might have in the past of your Mac or PC really starting to slow down and kind of crawl. And that can be for any number of reasons, but one of them is if your program or programs inside of them have some form of memory leak. Not your fault. It's the programmer's fault. But memory leaks, nonetheless. So what is Valgrind good for here? Well, it turns out, that in Valgrind you can run the command like here. Valgrind. Leak check equals full. Dot slash. And then the name of your program. And so let me do that. Let me go ahead back into CS50 IDE and let me go ahead and make memory. Compiles OK, so it doesn't appear to be syntactically flawed. Dot slash memory. Nothing bad seems to happen. So that's all fine and good. And you might think that, OK my program works, it's time to submit, all is well, but not necessarily. Let me go ahead and run Valgrind. And let me actually go ahead-- let me go up here and not create a new file, but a new terminal. Just so that we have a much bigger window. And let me go ahead into this directory and run Valgrind, dot slash memory. Without any command line arguments besides that. And hit Enter. And the downside of Valgrind, frankly, is that it's just-- output is atrocious. Like it's just not easy to read. But you start to learn to notice patterns in it. So I've learned to notice when I make mistakes in programs that invalid right of size 4, not a good thing. That's indeed a problem of some sort. But what's nice about Valgrind is that it tells me that the source of this invalid right, whatever that means, starts in memory.c line 13 and really rears its head in line 8 of memory.c. Well, what are those lines? Let's go to line 13. Ah, that's my call to F. And line 8. Oh, that's my use of x bracket 10. So why is Valgrind telling me invalid right of size 4? Well, size 4 sounds familiar. That's the size of an int. Invalid just means bad. Right means change something, like mutate a value. So invalid right of size 4 maybe just means this. I am in an invalid way changing 4 bytes of memory by storing 0 from the right hand side into x bracket 10 on the left hand side because, again, that x bracket 10 is an 11th byte, rather, an 11th int, that I never ask the operating system for. I only asked it for 10, not 11. And so Valgrind is telling me, somewhat cryptically, that I've screwed up. On line 8 ultimately I've screwed up by touching 4 bytes of memory that I shouldn't have. Meanwhile, a little more worrisome is this. Address such and such as zero bytes after a block of size 4 allots, da-da da-da. Oh, not that. Let's focus on-- here we go. Uh oh. Leak summary. Definitely lost. Oh, my god. Like I have lost 40 bytes of the computer's memory. Now what does that mean? This simply means that Valgrind has noticed that I must have called malloc or some similar function, asked for memory, 40 bytes' worth, and never gave it back. Well, we haven't introduced the way of giving memory back so let me at least address that now. But let me take this advice. Rerun with leak check full to see details of leaked memory. So let me go ahead and do that. Let me go ahead and copy this so I can rerun Valgrind. Whoops. Valgrind. With that command line argument. Dot slash memory. And now I'll see a little more detail sometimes. Although, it appears in this case it's already pretty verbose. And so indeed I've still lost 40 bytes in one block. And up here again is a reiteration of that invalid right of size 4. So how do I fix this? Well, let me go back into my program. And you know what? The program's still pretty useless, but at least let me fix that mistake. And go ahead now and go at x bracket 9. Let me remake my program. Make memory. And let me in my bigger terminal window do Valgrind dot slash memory. And now, oh, damn it. Still have definitely lost 40 bytes in one blocks, but notice the output is much shorter now. I don't have a mistake that was higher up here before. So I fixed one problem. Let me go ahead and fix that other problem. After using this memory, let me go ahead and free x. So free is the opposite of malloc in this case. And if I go ahead and recompile make memory and then run it in my bigger terminal window, nice. All heap blocks were freed. No leaks are possible. Now what does that mean? Well, it turns out, when we look at our computer's memory, there is indeed that stack at the bottom that we talked about earlier. And the stack is where frames of memory go, slices of memory go, that are used when functions are being called. And they get layered on top of each other and get un-layered only once the function's return. The heap, meanwhile, is a chunk of memory above the stack, by design. At least as we've drawn it here. And in the heap you have just a pool of available memory that you can draw from at any point. And what's powerful about the heap is that, whereas the stack keeps growing up and then as functions return it grows back down thereby throwing away, effectively losing track of, any memory that we had on the stack, like x's and y's and a's and b's and temp variables still, the heap does not experience that kind of disposal of memory. If you ask for memory via malloc, malloc gives you a chunk of memory over here, or over here, or over here, up above the stack inside of the so-called heap. And even when your function returns, whether it's f, or swap, or something else, that use of memory persists permanently. Only once you call free do you give that chunk of memory back to the operating system and allow it to reuse that memory for something else. But unfortunately this picture kind of suggests a problem and bad design in some sense. Although, given a finite amount of memory, eventually something's got to bump up against something else if you want to keep asking for and using memory. Those arrows kind of suggest that the stack and the heap are really destined to collide with each other if what happens? Well, if the stack grows too large. In other words, if you keep calling function after function after function after function and never return. And maybe this is accidental. Maybe you're using recursion, per last time, and you accidentally don't have a base case so you just keep calling functions. So much so that you run out of stack space. Or maybe you call malloc so many times that the heap keeps growing down and down and hits the stack, meanwhile. Either of those problems can persist. And you might recognize some familiar terms. The first of those scenarios describe stack overflow. So if any of you have ever discovered the website, Stack Overflow, which is a wonderful place to go for advice and tips and tricks with programming, more generally. stackoverflow.com draws its name from exactly that programming risk and problem. Heap overflow is similar in spirit. It's just the opposite. When the heap grows down too far, as we've drawn it. And they're both examples, more generally, of what you might call buffer overflows. A buffer is a chunk of space that's typically finite in some form and you can eventually overflow it by trying to use more memory then you should have. And a buffer overflows typically actually relates to arrays and chunks of memory. And, in fact, a great example from Wikipedia is this one here. The code is a little cryptic too because it doesn't really do anything useful, but it's fundamentally flawed in the following way. Let's consider exactly what happens when you have a stack overflow. It turns out that this is a bad enough problem that you can actually lose control over your computer itself. It can be hacked into if an adversary, a bad guy, has the ability somehow to inject his or her own adversarial code, code that deletes your files, or encrypt your files, or does something malicious, into your program because of a memory related mistake you've made. Let's consider this example. Here's the main function. In this particular example, it takes command line arguments arg-c and arg-v and just apparently calls this function foo, passing in the second word that the user typed in. So the first command line argument that the user ran at the prompt. And it just blindly does this. It doesn't check arg-c, it doesn't check for null, it doesn't check anything. It just blindly passes in arg-v bracket 1. And that's generally bad practice. Right? If you're not error checking, bad things are likely going to happen eventually. Well, what does foo do? Foo takes a char star as inputs. Now, a week ago, we would have said string bar, but we don't need to hide that detail anymore. Bar is just the address of a character. It's a string. Foo returns nothing, but has two lines of code. First, it declares an array called c-- just because. This is kind of a contrived example online, but the pictures will tell it all-- of size 12. So this is saying, hey, computer. Give me an array of 12 characters. Or give me a chunk of memory of size 12 that I plan to put characters into. And then, mem copy, you might not have seen before, but it essentially does this it copies into this chunk of memory whatever is at that chunk of memory up to this many bytes. So mem copy, as the names suggests, copies memory from here to here a total number of this many bytes. So why is this worrisome? Well, if for whatever reason you have only allocated 12 bytes and the user has typed a word at the prompt when running your program that is more than 12 bytes, it would seem that the user is able to touch memory that you the programmer never intended. Why? Well, your use of mem copy is checking the length of something, but it's checking the length of bar. So if the user types in not 12 characters, but 20 characters, strleng of bar is going to be 20. The user's input is 20 characters long. So your code is saying copy 20 characters from bar into c. Unfortunately, c is only of size 12 and so that's 8 bytes you end up copying that you shouldn't. So if you were given a chunk of memory here that's only of size 8 and you're just 12, you're touching 8 more bytes to the right of it, so to speak, that don't actually belong to you. Now, at best, nothing's going to happen. Your computer, your program is not going to notice. Things are just going to hum along and all is going to seem fine. Much like my memory program a moment ago. Seemed fine, but Valgrind thought otherwise. In this case, really bad things can happen. Because if what the human adversary, the bad guy, has typed at the prompt is not just some word like foo, or bar, or David, or Zamyla, or any number of sort of innocuous strings, but is actually the ASCII equivalent, so the textual equivalent, of some malicious code-- so if he or she actually somehow typed at the prompt a pattern of characters that really represent a pattern of bits that do something bad, like delete all your files, or spam everyone in your address book, or any number of things. If they provide a string that's longer than 12 characters, the overflow is going to end up somewhere in memory. And where is that? Well, let's zoom in a little lower level to what your computer's memory looks like on the stack. So it's a little more detail. So let's absorb this for just a moment. Think of this now as the bottom of the stack, parent routines stack. So you can think of this as main, or foo, or some function on the stack. It turns out-- and I didn't mention this earlier-- that besides there being arguments or parameters on the stack and local variables on the stack, it turns out that the computer also uses the stack just to kind of store a reminder for itself of the address in memory of the function that called it. So if this picture represents foo, the function right now, who called foo? Main called foo and so what foo does is, on its own stack, it just kind of jots down a note to itself here in pink, where it says, return address, when I'm done, return to this address of main. The computer does that for itself. So when foo is done executing, previously, I just wipe the screen and remove foo. But how does the program know where to go back to after foo is done? It's because it put a little bread crumb for itself. The address of main, itself. Not the address of variables, the address of the function itself, which, recall, is in the text segment of your computer's memory. But never mind that for now. So now, let's look at what the stack frame for foo is actually like. Here is that array of size 12. And it's drawn as a rectangle. If we made it wider, we could just do 0, 1, 2, all the way through C 11, but the author of this graphic has just drawn things a little more square like. So this is the first byte in C and this is the last byte in C, 0 and 11, respectively. And then bar, recall, is the only argument to the foo function and that, as I did mention before, belongs on the stack as well. But here's the problem. And this is by design. It appears that whoever designed the stack, has it growing upwards, as in this picture. And that's how I described it earlier. So the stack itself grows up. Frames go up and up and up and up. But within a frame, notice what happens. Within a frame, if you've got a buffer, an array of memory, a chunk of memory, and you write to it, you start writing to that memory, by design, at the top left corner, so to speak, to the right, and then down, and then down. So top left to bottom right, so to speak. Pictorially. This was not necessarily a good thing, at least in this case. Because if the stack is growing up and this is the top of foo's frame and it's use of the array c is by design going down, that's all fine if the user only provides a string of length 12 or less, 12 or shorter, but in this case if the adversary types in a 20 character string or 200 character string, the computer is not just going to stop there. Unless you wrote code with an if condition to check the length that the user is trying to pass in and protect against that, it's going to overwrite all of these bytes, all of these bytes, whatever this is, all of these bytes, and this is what's worrisome. If you have a really smart adversary up against you, trying to hack into your system, and he or she is smart enough or lucky enough to figure out how to inject a pattern of bits into your program in this way, such that he or she overwrites this return address, a really clever adversary can trick your program into returning to, not the function that called foo, main, but returning to the address of the adversary's own function that he or she has somehow injected by way of typing input at the prompt, in this case. In other words, if the user types in just, hello, h-e-l-l-o, backslash, 0, all is fine and good and it fits perfectly in that frame. But because we didn't have good error checking on the length of the memory we're copying, suppose the adversary includes a whole bunch of adversarial code, which the author here abstracted away as a, a, a, adversary, or attack, attack, attack, what might actually happen here? Well, if that adversary gets lucky enough or is smart enough, he or she might be able to overwrite these 4 bytes here with the address, brilliantly, of his or her own attack code, which is the bits or the characters that he or she typed at the prompt. So again, a lot of this is luck. Or a lot of this is trial and error sometimes, to attack a program in this way, but if you can trick the computer into jumping to an address that happens to point at data you injected into the program, you have effectively taken over that user's program, tricking the program into running any and all code that in this case you provided by a arg-v. So if you've ever heard of a server getting hacked, or in the future you will hear of a server getting hacked, or a program getting compromised, could mean any number of things, but one of them, and an all too common approach, is that the programmer who wrote the software used a buffer, used an array, a chunk of memory, and he or she did not check the boundaries of that array sufficiently. Did not make sure that his or her own code was not going too far as was this particular example here. So now that we have this power of going anywhere we want in memory, it's incredibly easy to accidentally or maliciously use, but that's it. That boils down to just this basic understanding of what's going on underneath the hood and what you can now do with the computer's memory. And now, let's transition to the second of our real world demands. Now that we have the ability to talk about memory addresses and actually do things at this lower level, turns out, we can start solving some really interesting problems, among them related to images and forensics, the art and science of recovering information. In fact, you might recall from various TV shows or movies, that it's all too common for the good guys in the movies and the TV shows to be looking over the shoulder of a tech and seeing some footage of a burglary or some other crime that's committed and say, hey, can you clean that up? Or can you enhance that? And, indeed, let's take a look at one such clip from the real world and see how it relates to the actual world and how we can leverage some of this new found savvy with memory addresses and computing to actually solve some of these problems for real. [VIDEO PLAYBACK] -He's lying. -About what? -I don't know. -So what do we know? -That at 9:15 Ray Santoya was at the ATM. -OK, so the question is, what was he doing at 9:16? -Shooting the 9 mm at something. Maybe he saw the sniper. -Or was working with him. -Wait. Go back one. -What do you see? -Bring his face up. Full screen. -His glasses. -There's a reflection. -That's the Nuevita's baseball team. That's their logo. -And he's talking to whoever's wearing that jacket. [END PLAYBACK] DAVID J. MALAN: OK. So this is just nonsense. Like, you can't just clean up an image, or look at something that's very pixelated, so to speak, where pixels are dots on the screen, and just kind of zoom in and clean that up and enhance it. And, indeed, that's kind of this running joke. Enhance doesn't really generally mean enhance when you have a finite amount of information. For instance, here is a finite amount of information. A photograph of Zamyla that looks to be of very high quality. You can see lots of detail. And so you might think that we can maybe see the reflection of someone in her eye if we really just zoom in and enhance, but no. This is what you see, and it's actually kind of creepy at this distance. This is what you see when you zoom in on that exact same photo of Zamyla to find that glint in her eye of the bad guy, or the badge on his shoulder, or whatever it was. There is only a finite amount of information here. There is literally only a finite number of rows and columns and therefore really big pixels from which to glean information. Now, there do exist algorithms and software that can smooth this out. You can enhance the image in the sense that you can kind of maybe tweak the color so that it looks a little more gradual, the color changes, and a little less jarring, but you can't just put pixels there, put bits there that are not existent in the first place. And so, with this, invites the question well, then what is an image, how do you represent it, and what can you actually glean from it and do from it? Well, let's consider in the simplest case what an image might be. Here is an image of a wonderfully simple smiley face. It's black and white. And if it's black and white, frankly, we can get away with just ones and zeros. I might arbitrarily say that the number 1 shall represent the color white and the number 0 shall represent the color black and, as such, if I just have a bunch of patterns of 8 bits, ones and zeros, I can effectively think of them as a grid. Rows and columns. And if I see a 0 pixel, assume it's a black dot, if I see a 1 pixel, it's a white dot. And, as such, I can create a bitmapped image. Bitmap, it's like a map, x's and y's, of bits, bitmapped image. There is the simplest, perhaps, smiley face we can draw. Now it's not all that interesting. It would be a pretty useless photograph if you only had dots of that level of detail. And indeed we saw, when you really zoom in on Zumilah's eyes, you have really big grids. And those were colorful, but there isn't that much information. We need more resolution. We need more dots, more pixels, more bits. And so the general case is to use not something as simple as this, but a more standard format like JPEG. JPEG is what you see on Facebook and on your cameras, most likely. It's an image format that photographs are commonly stored in, because it stores millions of colors-- potentially for photographs-- and it also allows you to compress them. You can compress JPEGs in such a way that you throw away some of the 0's and 1's, thereby degrading the image. So it's a little blurrier, or a little splotchy, but at least it's much, much smaller and can be emailed or texted or stored with far less space involved. And it turns out, if we put on our forensics hats now-- it turns out that JPEGs all share something in common. Any of the JPEGs you see on the internet or have on your hard drive or on your phone start with the same three bytes, typically. Specifically, the first three bytes are the numbers 255, 216, 255. Why that? It's just the standard that was adopted some time ago for JPEGs. It's like a magic number, if you will, at the start of the image. Now it turns out these are decimal numbers. And frankly, in the world of computing, and really the world of file inputs and output-- file management-- we don't really talk in decimal. It's just not commonly done. More commonly done is to talk, not in decimal, not in binary as we did in week 0, but in hexadecimal. So it looks a little spooky at first, but it really is an alphabet of 16 characters instead of 10, instead of 2. And you start counting at 0, as the real world does-- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9-- but there is no 10, because these are single digits. So after 9 comes A, B, C, D, E. So with hexadecimal, you can effectively count from 0 to 15 so long as you borrow some letters of the alphabet to do 10, 11, 12, 13, 14, and 15. So let's map that onto this pattern of numbers that demarcates the start of a JPEG. These numbers, if we do them out in binary, equals this. And we can leave this as an exercise at home if you'd like, but trust me, per week 0, that these are the patterns of bits that are equivalent to 255, 216, and 255. Those are the light bulbs you would need to turn on and off. Meanwhile, we can add some space there-- so before, after-- and I did this deliberately. Because it turns out that hexadecimal, insofar as you can count as high as 15 for 16 total characters-- it turns out that hexadecimal is really useful. Because with four bits, four 0's and 1's, you can count from zero all the way up to what? All the way up to 15, which is perfect, because if you have four 1 bits, that's 15. If you have four 0 bits, that's 0. And it turns out that patterns of 4 bits-- so half of a byte-- map perfectly to hexadecimal characters. You can express chunks of 4 bits with hexadecimal characters perfectly. As such, it turns out that 15, again, is F. So we have FF. And it turns out that 1101 and 1000 from 216. If you take those 8 bits and just spread them out slightly into two groups of four, is the letter D in hexadecimal and the number 8 in hexadecimal. And then we have another couple of 15's-- two sets of four 1 bits. So that's FF. Which is to say that the first three bytes at the start of any JPEG happen to be FF D8 FF. And it's human convention to prefix hexadecimal digits for clarity in the real world with just 0x. It means nothing fundamentally other than, hey world, here comes a hexadecimal number, so that the world just knows what it's looking at and doesn't confuse it for some other numeric or base system. So 0x just means here comes a series of hexadecimal digits. So that is what is at the beginning of a JPEG. And in fact, one of the problems we put before you in CS50 each year is to recover some JPEGs. In fact, it's all too common, unfortunately, to lose photographs from a memory card or to accidentally delete files from your computer. And typically, you might freak out or be worried that, damn, I didn't mean to delete those photos or delete those files. And sometimes they are gone for good. But it turns out that computers and even phones don't necessarily delete photos right away. They forget where they are, but they don't outright delete them. They don't change the 0's to 1's and the 1's to 0's unless you start taking more and more photos immediately. But if you have accidentally formatted or corrupted your memory card in a phone or a computer, such that you think you've now lost your photos, what if you did this? What if you wrote software that iterated from the start of your memory card or your phone all the way to the end of your phone or your memory card's memory looking for this pattern of bits, this pattern of numbers or hexadecimal digits, this pattern of 24 bits-- 8 plus 8 plus 8? With high probability, if, upon iterating through your phone's memory or camera's memory, you see FF D8 and FF, you can perhaps infer that the following megabytes-- 2 megabytes, however big the photograph is-- is the entirety of a photo. And if you just grab those bits and save a copy, maybe you can recover all or some of the photo. And indeed, one of the challenges you'll soon see in our problems that focus on forensics is going to be exactly this. We will accidentally format a memory card from a camera or a phone. We will give you a forensic image of that memory card, so to speak, a perfect copy of all of the 0's and 1's therein, give it to you as a file, and challenge you with writing software that iterates over that file's bytes, from the 0th byte all the way to the end of the file, looking for this pattern. And any time you see this pattern, you will need to say, this is probably a JPEG. Let me start copying these bytes into a separate file name-- something.jpg. And indeed, if you get the code right, suddenly out of this forensic image pops any number of photographs that we thought were lost forever. Bitmap is another type of image file-- older, perhaps, than JPEG and simpler-- truly a bitmapped image with slightly less fancy algorithms for compression. But if you remember this world, this is perhaps a perfect example of a bitmap image from Windows XP back in the day. The wallpaper or desktop image that many of you might have had on your computers looked like this. As an aside, if you go poking around I think on Wikipedia, someone wonderfully went to this same spot, found it where the artist had taken this photograph for Microsoft, and it now looks like this. So it's not quite the beautiful place it once was, but that, indeed, is the same hill. And this is really just an opportunity to talk about this picture, which is what's really underneath the hood of that beautiful grassy meadow. It's this kind of file header. We very quickly can make nature seem very boring quickly with this. But this is a very complicated, convoluted looking way of just formalizing what we mean by a file. Like, what is a file? A file is just a bunch of bits, 0's and 1's, stored somewhere on your laptop, or desktop, or phone, or wherever. But what does it mean to be a Microsoft Word file versus a JPEG versus a bitmap versus an Excel file versus any number of other file formats that you might have on your computer? MPEG4, or MP3, or AAC, or any number of media formats as well-- well really, what it means to be a file, is to be filled with 0's and 1's-- that's true-- but to follow a certain pattern of 0's and 1's such that the first several bits or first many bits of a file typically follow a pattern. And indeed, JPEGs have a pretty simple pattern. Those first three bytes, those first 24 bits, follow a certain pattern of 255, 216, 255. And that demarcates the start of a JPEG. And there's some more complexity therein, too. In a bitmap, this looks overwhelming at first glance, but this is how the world decided to standardize what's inside of the first few bytes of a bitmap file like that grassy meadow from Windows XP. And we'll come back to this in more detail and the problem set before long. But you'll notice some identifiers here that might jump out at you. So width, height, size, compression-- so there are some key words in there that might make some intuitive sense. And indeed, somewhere inside of a bitmap image, like that grassy meadow, is information stored, like what is the width of that image? What is the height of that image? Is it compressed, and how? And then more interestingly, at the bottom of this picture is the beginning of what bitmap calls RGB triples-- Red, Green, Blue triples. It turns out with patterns of three bytes, you can represent, top to bottom, left to right, essentially, the color of every one of the dots in an image. So when we enhanced Zamyla's image, and you literally saw those really big dots or pixels. Each of those dots or pixels has 24 bits representing its color-- how much red, how much green, how much blue per our discussion of colors in week 0. And so that's what most of a bitmap file is here. The dot dot dot just means these can go on for quite some time. What color is this dot? What color is this dot? What color is this dot? All of that is standardized inside of the file. And it turns out in C, we can express and represent that kind of structure using one last keyword for this week. In C, there is also a keyword called struct-- structure-- that allows you to create an actual structure inside of which you store information. For instance, the programming language C does not come with a data type that represents students. It comes with ints, and chars, and floats-- heck, it doesn't even come with string. It certainly doesn't come with student. But I would generally think of a student as being the combination of a few things. And let's keep it simple. A student is a combination of the student's name and their dorm. And bunches of other detail, but for now, let's just focus on name and dorm. Now the syntax is a little new for us here, but all this is saying is, hey, C, define a type for me that is a structure containing two fields inside of it, name and dorm, and call this new type students. So it's a little more complicated than our typedef for a string, because a string was just a synonym for char star. This case, a student, is a synonym for this container of things. It's like encapsulating two pieces of data-- name and dorm. But this is really useful as follows. With this kind of C code, can we now do things like this? Let me go ahead and open up structs.h. And you'll see a little header file I whipped up here that contains very little information, just that same typedef for struct, but I've also included CS50.h in my own header file. Why? Well, just for today's purposes, I wanted to relax our discussion of char star for a moment and talk about string. But technically, I could just do this. And technically I could do this. And then I could get rid of that, because now this is true C code. There is no need for the CS50 library. But either is equivalent, and whatever you're more comfortable with for now is certainly fine. But let me now open up structs-0.c and do something with this. So in this program, notice that the top of this file, I first define a value called students to be 3. This is an example of a constant. Sharp define or pound define here says, define, not a variable, but a constant-- and the convention in programming is to use all capital letters-- define a constant equal to some value. This is not a variable whose value can change. This is just a keyword, STUDENTS, in all caps, that is fixed now on that value 3. And it's one way to define a constant in C, a value that does not change. What am I now going to do with this value? I first, here, declare a variable called students containing three students. So it's a little confusing that we've got students, STUDENTS, student, but they all mean different things. students is just going by the name of my array. STUDENTS in all caps is the value 3 from that constant up above. And then student is the type of this array per the header file, structs.h. So this is just really a succinct way of saying, hey, computer, give me an array of size 3 inside of which are three students, ultimately. Now here's an array that iterates from i up to three. I print out name colon, prompting the user with getstring for a name. And then notice this new operator dot. Because struct is being used, and students bracket i is a structure, not an int, or a float, or a string itself, we have to go inside of that structure to get its field called name. So dot name says, here's the structure student-- go inside, and get at the name field, and put at that name field whatever the return value of getstring is. And do the same thing for dorm. Go into the i-th student's structure, and store at the dorm field the string that has been typed in next. Now let's scroll down a little lower and take a look at what this program proceeds to do. It's a very simple loop that sort of, just for demonstration's sake, says so-and-so is in such-and-such, printing out the student's name and dorm respectively. In other words, if I go down here and compile structs-0, and then run structs-0-- I'm going to go ahead and type in David Mather and Rob and Kirkland and Zumilah and Courier, and you see literally a regurgitation of what I wrote. But what's key here is the syntax that I've introduced. I'm using a struct, and I'm accessing its properties or fields within by way of this dot operator. Just means go inside the struct and get that value. But more cool than that is that now that we understand files, and now that we have the ability to store arbitrary structures that did not come with C, so to speak, notice what else we can do. In this version of the program, I'm adding one final feature, combining today's discussion of pointers with this most recent discussion of files. It turns out, in C-- and it's a little inconsistent-- there is a data type called FILE in all capital letters, for whatever reason. In lower case here, I'm just saying hey, computer, give me a variable called file of type file star. That is, give me a variable that will store a pointer to a file, the address of a file, if you will. fopen is a new function that we've not used before that says, hey, browser, open up the file called students.csv in write mode. w means write mode. So I want to save a file, not open and read a file, but write and save a file. Sanity check-- is file not equal to null? If so, we're going to proceed. I just want to make sure nothing went wrong. And something might go wrong, if maybe I don't have permission on the computer to create files, maybe the computer's out of space or something-- something could go wrong and null is returned per the documentation for fopen or its man page. So here I have a loop. For i from 0 to 3-- so do this 3 times-- fprintf. So fprintf is identical to printf that you've come to get to know in recent weeks, but fprintf is file printf. And it allows you to print strings, not to the screen, but to a file. So its first argument is the name of the file that you've opened that you want to print to. Second argument-- and all the others-- are just the same as printf as always-- a string, maybe with some format codes, and then some values that you want to plug in here and here. So if you've ever seen a CSV file in the real world-- a Comma Separated Values, these are very simple Microsoft Excel files or Apple Numbers files or Google Spreadsheets can export these as well-- they are just text files whereby all of the fields are separated by commas. And so we now, today, have the ability not only to understand what's going on underneath the hood memory-wise, but we now have the ability to use the star operator-- and therefore pointers more generally, which are required in order to manipulate files with fopen and this file data type-- and with fprintf can I write information? Can I print information to those files? So for the first time ever, I have written a program that, when I run in just a moment, is going to have the ability to save information. I have effectively implemented the equivalent of the File Save menu for the first time. Every other program we've written so far just throws its information and its memory away, but not this time. Let me go ahead and do make structs-1, and then dot slash structs-1, and let me go ahead and type in David and Mather and Rob and Kirkland and Zumilah and Courier. Enter. Nothing seems to happen. But let me go to my file browser in the IDE, and notice this-- students.csv. If I open that, I have created this file-- David comma Mather, Rob comma Kirkland, Zamyla comma Courier. And if I downloaded this file, double-clicked it as you can now if you'd like, and open it on your Mac or PC, if you have Numbers installed or Excel installed or some comparable program. That should open that program, because it will recognize the .csv extension. And it will display all of those names and all of those dorms or houses in separate columns. Because the world decided some time ago that the format known as CSV, Comma Separated Values, is just simple text file with commas separating values. And so we now have the ability to express all of this and more. So on the horizon is to solve some of these same kinds of problems and to actually implement for yourself code that writes files and reads files. But to solve actual problems motivated from the real world domain, to recover information that's been deliberately scrambled or obscured, to recover information that's been accidentally or maliciously deleted, to recover things, photographs, memories that you actually care about-- all with a fundamental understanding of what's going on inside of your own computer. But until then, keep an eye out for such TV shows and movies as these. [VIDEO PLAYBACK] -OK. Now let's get a good look at you. -Hold it. Run that back. -Wait a minute. Go right. -There. Freeze that. -Full screen. -OK, freeze that. -Tighten up on that, will you? -Vector in on that guy by the back of you. -Zoom in right here on this spot. -With the right equipment, the image can be enlarged and sharpened. -What's that? -It's an enhancement program. -Can you clear that up any? -I don't know. Let's enhance it. -Enhance section A6. -I enhanced the detail and-- -I think there's enough to enhance. Release it to my screen. -Enhance the reflection in her eye. -Let's run this through video enhancement. -Edgar, can you enhance this? -Hang on. -I've been working on this reflection. -Someone's reflection. -Reflection. -There's a reflection of the man's face. -The reflection. -There's a reflection. -Zoom in on the mirror. -You can see a reflection. -Can you enhance the image from here? -Can you enhance it right here? -Can you enhance it? -Can you enhance it? -Can we enhance this? -Can you enhance it? -Hold on a second. I'll enhance. -Zoom in on the door. -Times 10. -Zoom. -Move in. More. -Wait, stop. -Stop. -Pause it. -Rotate it 75 degrees around the vertical, please. -Stop. And go back to the part above the doors again. -Got an image enhancer that can bitmap. -Maybe we can use the Pradeep Sen method see into the windows. -This software is state of the art. -The eigenvalue is off. -With the right combination of algorithms-- -He's taken illumination algorithms to the next level, and I can use them to enhance this photograph. -Lock on and enlarge the z-axis. -Enhance. -Enhance. -Enhance. -Freeze and enhance. [END PLAYBACK] DAVID J. MALAN: All right. That's it for CS50. We will see you next time. [VIDEO PLAYBACK] -Everyone knows you went to Yale, but nobody knows what happened. What can you tell me about that weekend? What can you tell me about Rosebud? [DRAMATIC CHORD] [DRAMATIC MUSIC PLAYING] [END PLAYBACK]
B1 memory string address program swap file CS50 2016 - Week 4 - Memory 187 29 Amy.Lin posted on 2017/01/30 More Share Save Report Video vocabulary