Subtitles section Play video Print subtitles And in fact, we can visualize this even a little metaphorically. So for instance, here is, for instance, a mailbox. And suppose that this is address 123. What is in address 123? Well it's a variable of type int, called n, looks like it's storing the number 50. Right? We saw these letters-- these numbers last week. So here's the number 50, which is an integer inside of this variable, today, represented as a mailbox instead of as a locker. Well suppose that this mailbox over here is not n but suppose this is p. And it happens to be an address 456. But who really cares? If this variable p is a pointer to an integer, namely that one over there, when I open this door, what am I going to find? Well I'm hoping I find the equivalent of-- we picked these up at the Coop earlier --the equivalent of a conceptual pointer saying the number n is over there. But what specifically, at a lower level, is actually inside this mailbox if that variable n is at location 0x123? What's probably inside this mailbox? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Yeah, the address, indeed, 123. So it's sort of like a treasure map if you will. Oh, I have to go to 123 to get this value. Oh, the integer in question is indeed 50. And that's the fundamental difference. This is the int that happens to be inside of this variable of type int. This is the address that's a pointer that's in this other variable, p, but that is conceptually, simply pointing from one variable to another, thereby giving any sort of conceptual breadcrumbs. And we'll see-- frankly, in one week --how amazingly powerful it is. When you can have one piece of memory pointing at another, pointing at another, pointing at another, you can start to construct very sophisticated data structures, as they're called, things like family trees, and lists, and other data structures that you might have heard of. Or even if you haven't, these will be the underpinnings next week of all of today's fanciest algorithms used by, certainly the Googles, and the Facebooks, and the Microsofts of the world to manage large data sets. That's where we're going next week, in terms of application. So questions about that representation? Yeah, in the middle. AUDIENCE: Does that mean that your memory has to be twice as big? DAVID J. MALAN: Sorry can you say it once more? AUDIENCE: Is that to say your memory has to be twice as big to store pointers? DAVID J. MALAN: Ah, really good question. Is it the case that your pointers need to be twice as big? Not necessarily, just, this is the way life is these days. On most modern Macs and PCs, pointers use 64 bits-- the equivalent of a long, if you recall that brief discussion in Week 1. So I deliberately drew my pointer on the screen here as taking up 8 bytes or 64 bits. I've deliberately drawn my integer n as taking up 4 bytes or 32 bits. That is convention these days on modern hardware. But it's not necessarily the case. Frankly, I could not find a bigger mailbox at Home Depot, so we went with two identical different colored ones. So metaphor is imperfect. All right. So moving from this to something more familiar now, if you will. Recall that we've been talking about strings for quite some time. And in fact, most of the interesting programs we've written thus far involve maybe input from the human and some form of text that you are then manipulating. But string we said in Week 1 is a bit of a white lie. I mean, it is the training wheels that I promised we would start taking off today. So let's consider what a string actually is now in this new context. So if we have a string like EMMA here, declared in a variable called s, and quote unquote, EMMA in all caps, as we've done a couple of times now. What does this actually look like inside of the computer? Well somewhere in my computer's memory there are four, nay, five bytes, storing E-M-M-A, and then additionally, that null terminating character that demarcates where the end of the string is. This is just eight individual 0 bits. So that's where EMMA might be represented in the computer's memory. But recall that the variable in question was s. That was my string. And so that's why over the past few weeks any time you want to manipulate a string, you use its name, like s. And you can access bracket 0, bracket 1, bracket 2, bracket 3, to get at the individual characters in that string like EMMA, E-M-M-A, respectively. But of course it's the case, especially per today's revelation, that really, all of those bytes have their own addresses. Right? We're not going to care after this week what those addresses are but they certainly exist. For instance, E might be at 0x123. M might be at 0x124-- 1 byte away --0x125, 0x126, 0x127. They're deliberately 1 byte away because remember a string is defined by characters back-to-back-to-back. So let's say for the sake of discussion that EMMA name in memory happens to start at 0x123. Well, what then really is that variable s? Well, I dare say that s is really just a pointer. Right? It can be a variable, depicted here just as before, called s. And it stores the value 0x123. Why? That's where Emma's name begins. But of course, we don't really have to care about this level of precision, the actual numbers. Let's just draw it as a picture. s is, if you will, a pointer to Emma's actual name in memory, which might be down over here. It might be over here. It might be over here, depending on where in the computer's memory it ended up by chance. But this arrow just suggests that s is pointing to Emma, specifically at the first letter in her name. But that's sufficient though, right? Because how-- if s stores the beginning of Emma's name, 0x123. And that's indeed where the E is but we just draw this pictorially with an arrow. How does the computer know where Emma's name ends if all it's technically remembering is the beginning? AUDIENCE: The null terminating character. DAVID J. MALAN: The null terminating character. And we stipulated a couple of weeks ago that that is important. But now it's all the more important because it turns out that s, this thing we've been calling a string, has no familiarity with MMA or the null terminator. All s is pointing at technically, as of today, is the first letter in her name, which happens to be in this story at 0x123. But the computer is smart enough to know that if you just point it at the first letter in a string, it can figure out where the string ends by just looking-- as with a loop --for that null terminating character. So this is to say ultimately, that there is no such thing as string. And we'll see if this strikes a chord. There is no such thing as a string. This was a little white lie we began telling in Week 1 just so that we could get interesting, real work done, manipulating text. But what is string most likely implemented as would you say? AUDIENCE: An array of characters. DAVID J. MALAN: An array of characters, yes. But that was Week 1's definition. What technically now, as of today, must a string be? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Sorry, over here. AUDIENCE: A pointer. DAVID J. MALAN: A pointer. Right? s, the variable in which I was storing Emma's name would seem to manifest a pattern just like we saw with the numbers a moment ago, the number 50. s seems to be storing the address of the first character in that sequence of characters. And so indeed, it would seem to be a string. Well, how do we actually connect these dots? Well suppose that we have this line of code again where we had int n equals 50. And then we had this other line of code where we said, go ahead and create a variable called p and store in it the address of n. That's where we left off earlier. But it turns out that this thing here is our data type from Week 1. This thing here, int star, is a new data type as of today. The variable stores, not an int, but the address of an int. It turns out that something like this line of code, with Emma's name, is synonymous with char star. Right? If a star represents an address and char represents the type of address being pointed at, just as int star can let you point at a value like n-- which stored 50 --so could a char star-- by that same logic --allow you to store the address of and therefore point at a character. And of course, as you said, from Week 1, a string is just a sequence of characters. So a string would seem to be just the address of the first byte in the sequence of characters. And the last byte happens to be all 0s by convention, to help us find the end. So what then more technically is a string and what is the CS50 library that we're now going to start taking off as training wheels? Well last week we introduced you to the notion of typedef, where you can create your own customized data type that does not exist in C but does exist in your own program. And we introduced this keyword, typedef. We proposed last week that this was useful because you could actually declare a fancy structure that encapsulates multiple variables, like name and number, and then we called this data structure, last week, a person. That was the new data type we invented. Well it turns out you can use typedef in exactly the same way even more simply than we did last week by saying this. If you say typedef char star string-- typedef means give me a new data type, just for my own use. Char star means the type of value is going to be the address of a character. And the name I want to give to that data type is going to be string. And so literally, this line of code here, this is one of the lines of code in CS50 dot h-- the header file you've been including for several weeks, where we are creating a data type called string to make it a synonym for char star. So that if you will, it's an abstraction, a simplification on top of the idea of a sequence of characters being pointed at by an address. Any questions? And honestly, this is why-- and maybe those sort of blank stares --this is why we introduced strings in Week 1 as being an actual type as opposed to not existing at all. Because who really cares about addresses and pointers and all of that when all you want to do is like, print, hello world, or hello, so and so's name? Yeah, question. AUDIENCE: What other-- what other functions are created-- major functions are created by CS50 are not intrinsic to-- DAVID J. MALAN: Really good question. We'll come back to this later today. But other functions that are defined in the CS50 library that are training wheels that come off today are getString, getInt, getFloat, and the other get functions as well. But that's about it that we do for you. Other questions? Yeah. AUDIENCE: Can you define all of these words again? Like, it's-- so string is like a character pointer which points-- I was confused about that. Can you repeat that? DAVID J. MALAN: Sure. A string, per this definition, is a char star, as a programmer would say. What does that mean? A string is quite simply a variable that contains the address of a character. By our human convention, that character might be the beginning of a multi character sequence. But that's what we called strings in Week 1. So a string is just the address of a single character. And we leave it to human convention to know that the end of the string will just be demarcated by eight 0 bits, a.k.a. the null terminator. And this is the sense in which-- especially if you have some prior programming experience --that C is much more low level. In Python, as you'll soon see in a few weeks, everything just works so splendidly easily. If you want a string, you can have a string. You don't have to worry about any of these low level details. But that's because Python is built here, conceptually, where C is built down here-- so to speak --closer to the computer's memory. But there's no magic. If you want to string, fine. Just remember where it starts, remember where it ends. And boom, you're done. The star in the syntax today is just a way of expressing those ideas in code. So let's go ahead then and experiment with this string, just as we did a moment ago using Emma's name now instead of an int. So let me go ahead and erase those lines earlier. And let me go back to Week 1 style stuff, where I just say string s equals quote unquote, Emma. And then of course, if I to print this, I can simply say this as before. So just as a quick safety check, let me go ahead and make address again. Whoops. What did I do wrong? Let me scroll up to the first-- of many it seems --errors. Yeah. AUDIENCE: You're using string, [INAUDIBLE] DAVID J. MALAN: Yeah, I kind of shouldn't have taken off all the training wheels just yet. I'm still using string. So let me go ahead and put that back just for now. That will give me access to that typedef for string. Let me recompile it as make address. That worked. So that was the solution, thank you. And then address again. We just see Emma. So what can we now do that's a little bit different here? Well, one, you know what I can actually do? I can get rid of this-- the solution a moment ago --and say, I don't need string anymore. I don't need those training wheels. If s is going to represent a string, technically, s is just going to store the address of the first character. And it suffices actually, just to write this. So literally instead of string, you write char star. Technically, you don't need-- you can have extra space to the left or right. But most programmers write it just as I have here, char star variable name. That looks scarier now but it's no different from what we've been doing for weeks. If I now do make address without the CS50 library, still works, because C knows what I'm talking about. And if I run address now, I still see Emma. But now I can start to play around. Right? If s is the address of a character, what was the format code I can use to print an address? Not percent i, but-- AUDIENCE: Percent p. DAVID J. MALAN: Percent p, a pointer. So let me go ahead and recompile this now. Make address, that compiles too. And when I run dot slash address, I'm not going to see Emma now. What should I see instead? Some address, right? I have no idea what it is. It looks like Emma's name is stored at 0x42A9F2, whatever that number translates to decimal, somewhere in the computer's memory. But it turns out then too, what about this? Let me go ahead and add another line of code and say, you know what, I'm really curious now. What is the address of the first letter in Emma's name? How do I express in C, the first letter only of Emma's name if Emma is stored in s. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: s bracket zero, right? That would seem to be that. But that is what? That's a char. s bracket 0 is a char. How do I get the address of s bracket 0? AUDIENCE: Ampersand. DAVID J. MALAN: Yeah, I can just say ampersand. Right? So it's ugly looking but that's fine for now. Make address, enter. Whoops. It's uglier because I forgot my semicolon. Let me go ahead and make address again, enter. Seems to compile. And when I run dot slash address now, notice I get the same thing. And this is because C is taking me literally. When you print out s, a string, it's technically just the address of the first character. And indeed, I can corroborate as much by running s bracket zero then get the address of the first character. And they are indeed one in the same. So a string is this sort of abstraction on top of a bunch of characters. But again, s is just an address. And that's all we're emphasizing now. And if I get really curious-- not that you would necessarily do this in a real program --what if I print out a few more characters in Emma's name, like s bracket 1, 2, and 3? Let me go ahead, just out of curiosity and make this program and dot slash address. Now notice what I see, is again, s's address is at 42AB52. The first character in s is at the same thing, by definition of what a string is. And then notice what's kind of neat-- if this is-- if-- for some definition of neat --53, 54, 55 is noteworthy. Why? They're one byte apart. So this whole time, whenever you implemented Caesar, or substitution, or some other cipher in problem set two, anytime you were manipulating individual characters-- you didn't know it --but you were just visiting different mailboxes. You were just visiting different addresses in the computer's memory in order to manipulate them somehow. All right. Can I do one last demo that's a little arcane and then we'll make things more-- more real? All right. So it turns out if all that's going on underneath the hood is just addresses, watch what I can do here. If I want to go ahead and print out what is at the address s, what will I find in memory if I go to the address in s? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Sorry, a little louder. AUDIENCE: The first letter. DAVID J. MALAN: The first letter in Emma's name, right? If we can all agree-- even if it's a little unfamiliar still --that s is just the address of a character, and I say, go to s, what should I see specifically? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Probably E in Emma Right? If s is the address of the first character of her name, star s would mean go to that character. So let me go ahead and print that as a char. So let me go ahead now and make address dot slash address, enter. There is the E because I can say, go to that address and print what's there. And I can actually do this for all of her letters in her name. Let me go ahead and print out another one here. So how do I get at the second letter in Emma's name? Previous-- normally, like last week, we would have done this. And that just magically gets you to the second letter in her name. But I can do it a little differently. What if I go to s and then, from where do I want to go from s to get the second letter? AUDIENCE: Plus one. DAVID J. MALAN: Plus one, right? I mean, maybe we can literally just do arithmetic here. If s is the address of her first letter, it stands to reason that s plus 1 is the address of her second letter. So make address now dot slash address. And I should see EM. And I can do this twice more maybe and go ahead and do this and then this. But this time add 2 and this time add 3, just doing some simple arithmetic. Make address dot slash address, there is Emma but in a much lower level detail. So what is this bracket symbol? In computer science, this is what's called syntactic sugar. It's kind of a silly name. But it just refers to a handy feature so that you, the programmer, can say, s bracket 0 or bracket 1. But what the computer is actually doing underneath the hood-- the compiler, Clang --it's actually converting all of your uses of square brackets since Week 1 to this format here. It's just doing arithmetic underneath the hood. Now you don't have to do this moving forward. But I point out this low level detail just to give you a sense of, there really is no magic. When you say, go print an address or go do this, the computer is taking you literally. Whew. OK, that was a lot. Yes, question. AUDIENCE: So [INAUDIBLE] DAVID J. MALAN: Star s would mean go to the address in s. AUDIENCE: So why for instance, if you [INAUDIBLE] character [INAUDIBLE] DAVID J. MALAN: Really good question. Why, when you print out s, does it print out the whole string and not just the character? That's what the printf format code is doing for you. When you tell printf to use percent s, that has special meaning to printf. And it knows to go to the first address and not just print the second-- the first char, but print every character thereafter until it sees what? AUDIENCE: The null terminator. DAVID J. MALAN: The null terminating character. So printf and percent s are special and have been special since the Week 1. They just know to do exactly what you've described. So pointer arithmetic, to be clear, is just taking addresses and like, doing arithmetic with them, adding 1, adding 2, adding 3, or any other manipulation like that. All right. So [CHUCKLE] let's take another stab at a meme here. [CHUCKLE] OK, a few of us. All right. All right, it's trying too hard. All right. So what then do we have when it comes to strings? Well, let's now try to learn from these primitives and actually trip over some mistakes that we might otherwise make. I'm going to go ahead and open up a new file. I'm going to go ahead and call this one, compare. So we'll save this as compare dot c. And this will be reminiscent of something we started doing last week. And you've done this past week, particularly for implementing voting and comparing strings. I'm going to go ahead and make a quick program that just compares two integers. I'm going to put the training wheels back on temporarily, just so that we can get some numbers from the user pretty easily, including CS50 dot h and standard I/O dot h. I'm going to do int main void as my program. I'm going to get an integer called i and ask the human for that. I'm going to get another integer called j, ask the human for that. And then I'm going to go ahead and say if i equals equals j, then go ahead and print with printf that they're the same. Else, if i does not equal j, I'm going to go ahead quite simply and print out different backslash n. So if i equals equals j, it should say, same. Else, if it's different, it should say different. So let me go ahead and make compare dot slash compare. And I should see, hopefully, if I type in say, 1, 2, they're different. And if I instead do 1, 1, they're the same. All right. So it stands to reason that logically this is pretty straightforward when you want to compare things. So instead of using numbers, let me go ahead and change this. Let me go ahead and do, say, string s gets getString, just as before but using getString instead and ask the human for s. Then give me another string, t, just because it's alphabetically next. And I'll ask the human for t. And then I'm going to go ahead and ask this question, if s equals equals t, print same, else, print different. So now let me go ahead and make compare again. I'm going to go ahead and type in dot slash compare. We'll type in Emma. We'll then type in Rodrigo. And of course, it's going to say different. But if I instead run it again and type in Emma and all right, I'll type Emma again-- hmm, different. Maybe it's a capitalization thing? No. But why as of today, are they indeed different? Last week we kind of waved our hands and said, ah, they're arrays, you have to do some stuff. But why are they different? AUDIENCE: They're stored in different locations. DAVID J. MALAN: Exactly, they're stored in different locations. So when you get a string with getString and call it s, and then you get another string with t and call it t, you're getting two different chunks of memory. And yes, maybe the human has typed the same thing into the keyboard, but that doesn't necessarily mean that they're going to be stored in the exact same place. In fact, what we really have here is a picture not unlike this. If I have a variable called s-- and I'm just going to draw it as a box there --and if I have a variable called t-- I'll draw it as another box here --and I typed in Emma-- E-M-M-A --that's going to give me somewhere in memory, E-M-M-A backslash 0. And I'll try it as an actual array, albeit a little messily. And then here, if I type EMMA again in all caps, it's going to end up-- thanks to getString, at a different location in memory. By nature of how getString works, it's going to store anything you type in it. And what's going to get stored in s and t? Well, for the sake of discussion, let's suppose that this chunk of memory with the first input-- sorry --happens to be at 0x123. And the second chunk of memory happens to be at 0x456, just by chance. Well, what am I technically storing in s? 0x123. And what am I storing in t? 0x456. So when you say, is s equal equal t. Is it? Well, no. You're literally comparing 123 versus 456. The computer is not going to presumptuously go to that address for you unless you somehow tell it to. Put another way, if I instead draw these boxes, not as actual numbers, what we really have-- sorry --what we really have is what we'll draw as an arrow more generally, just a pointer to that value. Who really cares where the address is? So this is why last week we kind of waved our hand and said, eh, you can't just compare two strings because you probably have to compare every character. And that was true. But what you're technically comparing is indeed the addresses of those two variables. Any questions then on this here? Yeah. Sure, yes. AUDIENCE: So you said earlier that the, I guess, the pointer, and the actual thing it's pointing are like kind of somewhere in the memory not in a specific-- they're just somewhere, right? DAVID J. MALAN: OK. AUDIENCE: So do you need something that points to the point-- how does the computer know where the pointer is? DAVID J. MALAN: Oh, how does the computer know where these pointers are? So that's a really good question. And let's answer it right here. All this time when you've been calling getString to get a string, you've probably been assigning it to a variable like I have here on line six, with string s. But we know as of today that if we get rid of the CS50 library, technically, string is just synonymous with char star. And so both here and with t, do you technically have char star, right? It's just a find and replace if we get rid of that training wheel. Char star just means s is storing the address of a character. And char star t means t is storing the address of a character. Ergo, all this time since the Week 1 of CS50, what type of value has getString been returning, even though we never described it as such? What must getString be returning? Yeah. AUDIENCE: The index of the first letter. DAVID J. MALAN: Not even the index per se, but rather the-- AUDIENCE: It houses the memory of that. DAVID J. MALAN: The address of the first character. So anytime you called getString, getString code we wrote is finding in your computer's memory some free space, enough bytes to fit whatever the word was that got typed in. getString then, if we looked at its code, is designed to return the address of the first byte of that chunk of memory. So getString, this whole time, has been returning, if you will, what's called a pointer. But again, nuances that we didn't want to get into in the very first week certainly, of C programming. All right. Well, let's go ahead and make this a little more concrete. If I pull up this code, I don't have to just check if they're same or different, let me just go ahead and print them out. If I do percent p backslash n, I can literally print out s. And if I go ahead and print out the same thing for t using percent p, I can print out the value of t. So let me go ahead and make compare. Seems to compile OK. And I don't know what the addresses are in advance. But let me go ahead and type in, for instance, Emma and Emma. So even though those strings look the same notice, it's a little subtle this time, the first Emma's at 0xED76A0. The second Emma's at 0xED76E0, which is a few numbers away from the first Emma. So that just corroborates the instincts last week that we can't just compare them like that. So what are the implications then? Let's do one other example here. Let me go ahead and save this as copy dot C. And let's try a very reasonable goal. If I want to go ahead and get the user's input and actually copy a string and capitalize the string from the user, let's see this. So let me go ahead and give myself the temporary training wheels again, just so I can get a string from the human. Let me go ahead and include standard I/O dot h and then an int main void. Let me do a simple example, the goal of which now, is to get a string from the user and capitalize a copy thereof. So I'm going to go ahead and do string s gets getString and call it s, as before. I'm going to go ahead and then do string t equals s to make a copy of the variable. And then I'm going to go ahead and say what? Let me go ahead and capitalize the copy. And to capitalize the copy, I can just change the first character in t, so t bracket 0, to what? I think we had toupper a while back. Does this seem familiar? You can call the toupper function. And the toupper function, if you don't recall, you technically have to use C type dot h. This might be reminiscent of the second c problem set, where you might have used this in Caesar, or substitution, or the like. All right. And now, let me go ahead and print out these two strings. Let me go ahead and print out s. And let me go ahead and print out t. So again, all I've done in this program is get a string from the user, copy that string, capitalize the copy called t. And let's just print out the end results. So let me go ahead and save the file. Let me go ahead and make copy. Seems to compile OK. Let me go ahead and run copy. And let me go ahead and type in emma, in all lowercase, deliberately, because I want to see that t is capitalized but not s. Hmm. But somehow they're both capitalized. Notice, that emma in all lowercase ended up being both capitalized in s and capitalized in t per the two lines of output. That's a bug? Right? I only capitalized t, how did I accidentally also capitalize s do you think? Any thoughts? Doesn't matter if I avert the lights, I still can't see any hands. OK, how about here in front? Yeah. AUDIENCE: So when you say t equal s you have to [INAUDIBLE] DAVID J. MALAN: Exactly. When I say t equals s on this line, I am getting a second variable called t. And I am copying s. But I'm copying s literally. s as of today, is an address. After all, string is the same thing as char star for both s and t. And so technically, all I'm doing is copying an address. So if I go back to my picture from before, this time, if I've gone ahead and typed in an array of emma, with all lowercase-- e-m-m-a --and then a backslash 0, somewhere in memory using getString, and I've gone ahead initially and stored that in a variable called s-- and I don't care about the addresses anymore. I'm just going to use arrows now to depict it graphically. When I created a second variable called t and I set t equal to s, that's like literally copying the arrow that's in s and storing it in t, which means t is also pointing at the same thing. Because again, if I didn't do this hand wavy arrow notation, I literally wrote out 0x123. I would have just written out 0x123 in both s and t. So when, in my code, I go ahead and say, you know what, go to the first character in t and then go ahead and uppercase it. Guess what the first character in t is? Well, it's this e. But guess what the first character in s is, literally that same e. So this does not suffice to copy a string by just saying t equals s, as it has up until now with every other variable. Any time you've needed a temporary variable or a copy of something this worked. Intuitively, what do we have to do probably instead to truly copy Emma into two different places in memory? Yeah. AUDIENCE: Probably create a char or create a variable exactly the same size and copy each character individually. DAVID J. MALAN: Nice. So maybe we should give ourselves a variable that has more memory, the same amount of memory being stored for the original Emma, and then copy the characters from s into the space we've allocated for t. And so we can actually do this. Let me go ahead and get rid of all but that first line, where I've gotten s as before. And I'm going to go ahead and do this, I'm to say that t is a string-- but you know, we don't need that training wheel anymore. String, char star, even though it looks uglier. Let me go ahead and allocate more memory for myself. How do I do that? Well, it turns out-- we've not used this before --there's a C function called malloc, for memory alloca. And all it asks as input is how many bytes you want. So how many bytes do I want for Emma to store her name? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: I heard 4, 5. Why, 5? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: So we need the null terminating character, e-m-m-a and then backslash 0. So that's 5. So I could literally hard code this here. Of course, this feels a little fragile because I'm asking for any string via getString. I don't know it's going to be Emma. So you know what, let me go ahead and ask a question? Whatever the length is of the human's input in s, go ahead and add 1 to it for the null character and then allocate that many bytes. So now my program's more dynamic. And once I have this, well, how can I go ahead and copy this? Well, let me just do old school loop. So for int I get 0, i is less than the string length of s, i plus plus-- so this is just a standard for loop iterating over a string --and I think I can just do t bracket i equals s bracket i in order to copy the two strings. There's a subtle bug and a subtle inefficiency though. Anyone want to critique how I've gone about copying s into t? Yeah. AUDIENCE: [INAUDIBLE] getString [INAUDIBLE].. DAVID J. MALAN: Yeah. This was inefficient. We said a couple of weeks ago this is bad design to just keep asking the question, what's the length the s? What's the length of s? So remember that we had a little optimization a couple of weeks ago. Let's just declare n to equal the string length of s and then do a condition of i is less than n. So we've improved the design there. It's a little more efficient. We're wasting less time. There's still a subtle bug here. How many byte-- yeah. AUDIENCE: Aren't you not copying the null terminator DAVID J. MALAN: I'm not copying the null terminator. So every other time we've iterated over a string, this has been correct. Iterate up to the length but not through the length of that string. But I technically do want to go one more step this time, or equivalently, one more step. Because I also want to copy not just e-m-m-a, which is str length 4-- e-m-m-a is 4 --I also want to do it a fifth time for the null character. So in this case, I'm deliberately going one step past where I usually want to go to make sure I copy 5 bytes for Emma, not just 4. All right. Let's go ahead now and capitalize Emma. So t bracket 0 gets toupper of Emma's first character in the copy. And now let's go ahead and print out both strings s and t, just as before, with percent s of t. And let me make one change, I use strlen now. So I know I'm going to get an error if I don't do this. I need to use string dot h-- recall --anytime you use string length. So I'm going to go proactively add that. So what's different? This line is the same as before. I'm getting a string from the user. This line is the same as before. I'm capitalizing the first letter. And these two lines are the same. I'm just printing out s and t. So the new idea here is, with my malloc, am I allocating as many bytes as I need to store a copy of Emma, and then with this for loop am I actually doing the actual copy? Let me go ahead and do make copy again. Seems to run OK. Run dot slash copy. Type e-m-m-a in all lowercase. And voila, now I've capitalized t but not s. Yeah? AUDIENCE: When you use malloc, it's just allocating number of bytes, it doesn't matter where? DAVID J. MALAN: It is just allocating that many bytes for you. It does not matter where. You indeed should not care where it is because you're just being handed the address and using C code, can you just go there as you want. All right. Let's clean this up too. Surely, people copy strings for years. And in fact, we don't need to do this for loop ourself. It turns out we can simplify this code a little bit by enhancing this as follows. It turns out, if you look in the manual page for strings, you can actually use something called strcopy-- no-- without any vowels. And you can copy into t, the contents of s. strcpy is a function written a long time ago by some other human. And they went ahead and implemented, probably, that loop for us. And it tightens up our code here a little bit more. AUDIENCE: Professor? DAVID J. MALAN: Yeah. AUDIENCE: What if I forgot to copy in the null character at the end? DAVID J. MALAN: Really good question. What if you forgot to copy in the null character at the end? It is unclear what would happen. If there just happened to be some bits in that location in memory from earlier-- from some other part of your program --and you try printing out s and printing out t, you might print out many more characters than you actually intended-- if there's no backslash 0 actually there. We'll see this more and more. Anytime you don't initialize the value of a variable, it's what's called a garbage value, which means who knows what 0s and 1s are there. You might get lucky and it's all 0s. But most likely it's going to print some garbage value instead. All right. Any questions on this? Yeah. AUDIENCE: Is the string length function only in the CS50 library? DAVID J. MALAN: Is the-- which function? AUDIENCE: String length. DAVID J. MALAN: Oh, strlen, no, that's in string dot h. That is a standard C thing. AUDIENCE: OK. If string length is a standard function but strings are not-- DAVID J. MALAN: So what's the dichotomy here then? If strings don't exist-- as I've noted multiple times. And yet, there's functions like strcpy and strlen --what's going on? C calls them char stars. It is c that does not call them strings. We, CS50, and the world in general, calls addresses of sequences of characters, strings. So the only training wheel here, really is the semantics. We gave you a data type called string so that in the first week of C and CS50, you don't have to see or type char star, which would arguably be a lot more cryptic so early on. It's arguably a bit cryptic today too. Other questions? All right, yeah. AUDIENCE: So is char star ID type [INAUDIBLE] DAVID J. MALAN: Is-- say that once more. AUDIENCE: Char star ID type [INAUDIBLE]. DAVID J. MALAN: Not all of them, but any of them that take a string, yes. In fact, any time you have seen us or TF in CS50 say string, you can literally, starting today, change that expression to char star and it will be one and the same. Phew. OK. That was a lot. Let's take our five minute break here with cookies outside. All right. So we are back. That was a lot. Let me draw our attention to what the newest feature was just a moment ago, this notion of malloc, memory allocation. So recall that getString I claim as of today, all this time, it's just returning to you the address of the string that was gotten from the human. malloc, similarly, has a return value. And when you ask malloc for this many bytes-- maybe it's five, for emma, plus the null terminator, malloc's purpose in life is to return to you the address of the first byte of that memory as well. So memory alloc means, go get me a chunk of memory somewhere, hand me back a pointer there too. And the onus is on me to remember that address, as I'm doing here, by storing it in t. But it turns out, now that we're taking the training wheels off, unfortunately, we have to kind of do a bit more work ourselves. And there's actually a latent bug in this program. It turns out that I am mal-allocating memory with this but I'm never actually freeing it. The opposite of malloc is a function called free, whose purpose in life is to hand back the memory that you asked for so that you have plenty of memory available for other parts of your program and so forth. And long story short, if you've ever-- on your Mac or PC --been running a program that maybe is a little bit buggy --you might notice your computer is getting slower, and slower, or maybe it even runs out of memory explicitly, per some error message --that might be quite simply, that the programmer of that program kept using mallc, and malloc, and malloc to grow, and grow, and grow their use of memory, but they never got around to freeing any of that memory. So programs can run out of memory. Your computer can run out of memory. So it's good practice, therefore, to free any memory you're not using. However, how do you find this mistake? So we've got one final debugging tool for you. This one's not CS50 specific like debug50. This one is called Valgrind. Unfortunately, it's not the easiest thing to understand at first glance. So I'm going to go ahead and do this. I'm going to run Valgrind on this program, dot slash copy, and hit Enter. And unfort-- AUDIENCE: [INAUDIBLE] [CHUCKLE] [COUGH] DAVID J. MALAN: Gotcha. OK. I'm going to go ahead and-- there we go. AUDIENCE: [INAUDIBLE] So what you missed was a very scary message. So I'm going to go ahead and run Valgrind on dot slash copy. We see this esoteric output up top and then my prompt for s-- because it's the same program. It's prompting me for a string --so I'm going to give it emma, all lowercase, and enter. And you'll notice now, that there's some summary going on here but also some mention of error. So heap summary-- we'll come back to that in a bit --5 bytes in 1 blocks are definitely lost in loss record 1 of 2. Leak summary, I've got 5 bytes leaking in 1 blocks. I mean, this is one of these programs in Linux-- the operating system that we use, that's quite common in industry too --I mean, my god. There's so-- there's so many more characters on the screen that are actually enlightening for me. Let's see if we can focus our attention on what matters. Memory leaking, bad. So how do we go about chasing down where memory is leaking? Well, as before, we can use help50. And in fact, help50 will analyze the output of Valgrind-- it's still going to prompt me first string. So I'm going to again, type in emma --it's going to look at that. It's to ask for help. And voila, highlighted in yellow, is a message that we, help50, recognize. And notice our advices, looks like your program leaked 5 bytes of memory. Did you forget to free memory that you allocated via malloc. Take a closer look at line 10 of copy dot C. Now once you've done this a couple of times and made the same mistake, you can probably scroll up and glean for yourself where the error is. We're not revealing any more information than is right in front of you. And in fact, you can see here, ah, in main on copy dot C, line 10, there's some kind of 5 bytes in 1 blocks are definitely lost. So there's a lot of words there but it does draw attention to the right place. So let me go ahead and scroll down, focus on line 10. And indeed, line 10 is where I allocated the memory. So it turns out the solution for this is quite simple. Down here, I'm just going to go ahead and free t, the address of the chunk of memory that malloc returned to me. So I'm undoing the effects of allocating memory by de-allocating memory. So now let me go ahead and run copy. And if I run copy, it's not going to seem to run any differently. It's still going to work correctly. But now if I analyze it for mistakes with Valgrind, so Valgrind of dot slash copy-- I'm going to again type in emma in all lowercase and I cross my fingers --that indeed now, leaked summary, 0 bytes in 0 blocks. So unfortunately, even when all is well, it still spits out a mouthful. But now I see no mention of blocks that are actually leaked, at least in the top part here. And we'll see more of this over the next couple of weeks as we use it to chase down more complicated bugs. But it's just another tool in the toolkit that allows us to detect these kinds of errors. Let me try one other thing actually. This is a program that I wrote in advance. This one is called memory dot C. And as always, these are all on the course's website if you'd like to tinker after. And it's a little pointless. It's just meant for demonstration purposes. So here is a program. And it's copied from this online manual for Valgrind, the tool I just used. So let's see what's going on. Here I have main, at the bottom of my code. I copied it. I didn't use a prototype. I just copied what they did. And see here, it calls a function called f and then returns 0. Well what does f do? f is this random function up here that takes no inputs per the void. And in English, how would you describe what's happening in line 7 now-- that we've introduced malloc and stars-- or pointers? What's this doing? Yeah. AUDIENCE: It's allocating enough memory in [INAUDIBLE] for [INAUDIBLE].. DAVID J. MALAN: Good. Allocate enough memory for 10 integers-- and then let me add-- elaborate on your words --and then store the address of that chunk of memory in a pointer called x, if you will. So sizeof is new. But it literally does what it says. If you say sizeof open paren, close paren, and then the name of a data type, it will tell you that an int is 4 bytes. It will tell you that a long is 8 bytes. It will tell you that a char is one byte. It's just a dynamic way of avoiding having to memorize those kinds of things. So this just means give me 10 times the size of an int, which happens to be 4 bytes. So that means give me 10 times 4, or 40 bytes of memory. That's effectively an array of memory that I can store integers in. And malloc, per its definition, is going to return to me the address of the first byte of that memory. What is now scary about line 8, relatively speaking? What might worry you with line 8, which is buggy, unfortunately? Yeah. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Exactly. I'm doing x bracket 10 and just arbitrarily storing the number 0. Why? Just because. But 10 does not exist. Right? If I have 10 int, it's bracket 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, not bracket 10. So this is an example of overflowing a buffer, so to speak. Anytime you're talking about memory, any time you're talking about an array of memory-- which this effectively is, 10 integers, room for 10 integers back to back to back --if you go one step too far, that's what's called a buffer overflow, whereby the buffer is the array. And in fact, this would make it even more clear. Suppose I tried to go there, bracket 10,000. That is definitely not among the bytes of memory I allocated. That's definitely going beyond the boundaries of my array. But so is it true that bracket 10 is one step too far. So what's nice about Valgrind is this. Let me go ahead and rerun Valgrind after compiling this memory program-- whoops --in my source directory. Let me go ahead and make memory. All right. It compiled OK. Valgrind dot slash memory-- and unfortunately, we're going to see some crazy arcane error messages for a moment. But let's see what it says. Notice here, invalid write of size 4-- that sounds bad --and 40 bytes in one blocks are-- OK, they didn't really add an if condition in Valgrind. --40 bytes in 1 blocks-- plural --are definitely lost. So let's fix the second of those first. Why am I leaking 40 bytes exactly? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: I'm never freeing it. So I think I can get away with just doing this here, just free the memory after I'm done using it-- even though I'm not really using it for anything purposeful here. So let me try this again. Make memory, now let me do Valgrind dot slash memory. And-- OK, better. I don't see 40 bytes lost anymore. So that's good. But I do still have this issue. But here's where it's sometimes useful to understand the various data types and their sizes. Invalid write of size 4. Writing in a program just means changing a value. And it mentioned line 8 here. In what sense is this an invalid write of size 4? Well, how big is an int? Four bytes. You're trying to change it arbitrarily to 0. But I could have made that 50 or any other number. But I'm trying to touch an int that should not be within the memory I have allocated for myself. I asked for 40 bytes, or 10 ints, but again, because arrays are zero indexed, this is like going one beyond the boundary. So let me fix this and just arbitrarily say, let's go touch that part of it. Let me go here and do make memory. Let me go ahead and do Valgrind dot slash memory. And now, arcane output aside, notice that that error message went away too. So this will be helpful over the coming couple of weeks as we continue to use C to implement a number of programs that now start to manipulate memory. It's just a tool that helps you spot errors that certainly, your TF might otherwise, or that might be causing your program to crash, or to freeze, or to segfault-- if you've seen that yourselves before. All right. So that's just a tool. Let's go ahead and transition now to some actual use cases here. Recall from last week that it was pretty useful to be able to swap values. Right? With bubble sort, with selection sort, we needed to be able to exchange values so that we could put things into the right place. Turns out this is pretty straightforward. Right? And we can actually mimic this in the real world. We just have opportunity for one volunteer today, one volunteer. Can we get a little-- OK, over here. Yeah. What's your name? FARRAH: Farrah. DAVID J. MALAN: Sorry. FARRAH: Farrah. DAVID J. MALAN: Vera. FARRAH: Farrah. DAVID J. MALAN: Oh, here, come on up. Then I can hear you up here. OK, what's your name? FARRAH: Farrah. DAVID J. MALAN: Vera. FARRAH: With an F. DAVID J. MALAN: Fera. FARRAH: Farrah. DAVID J. MALAN: Farrah. FARRAH: Yes. DAVID J. MALAN: Farrah. Yes, OK. Good. Come on up. Still come up. [CHUCKLE] Thank you. Thank you. [APPLAUSE] [CHEERS] OK, nice to meet you. FARRAH: Hi, nice to meet you. DAVID J. MALAN: Farrah. FARRAH: Yes. DAVID J. MALAN: OK. So let's go ahead here. Let me give you this so that you can be mic'd as well. OK, so the goal at hand is here, I have two glasses of colored water. So we have some purple here. [WATER POURING] OK. And we've got some green here. [WATER POURING] And the only goal at hand is to do a very simple operation like we needed to do quite a bit last week, which is to swap two variables just like we swap two numbers. So if you could go ahead and get the purple liquid in here and the green liquid in here, go. [CHUCKLE] FARRAH: Is it OK if they overlap? DAVID J. MALAN: Ideally, no. We want to put only purple here, and only green here, and no temporary store. [LAUGHTER] FARRAH: Oh. DAVID J. MALAN: OK, but you're hesitating. Why? FARRAH: Because you told me they couldn't touch [CHUCKLE] and-- DAVID J. MALAN: Well, you can touch the glasses. But you're hesitating to swap them, why? [CLINK] OK, that's just cheating. [LAUGHTER] [APPLAUSE] [CHEERS] OK, very clever. Supposing you can't just move things around in memory, what if I gave you a temporary variable. FARRAH: OK. DAVID J. MALAN: Does this help? FARRAH: Yes. DAVID J. MALAN: So how can we now get purple in there and green in there? [CHUCKLE] FARRAH: Can I put purple in here first? DAVID J. MALAN: Sure. FARRAH: I'm going to spill it. DAVID J. MALAN: It's OK. [WATER POURING] OK. So purple goes into the temporary, very nice. [APPLAUSE] FARRAH: Thank you. DAVID J. MALAN: Green goes into what was purple. FARRAH: Yes DAVID J. MALAN: OK, good. And then purple goes in-- from the temporary variable into the original green glass. Now, a proper round of applause if we could. OK. [APPLAUSE]
B1 david malan malan string address emma memory CS50 2019 - Lecture 4 - Pointers 4 0 林宜悉 posted on 2020/04/15 More Share Save Report Video vocabulary