[Valgrind] [Nate Hardison, Harvard University] [This is CS50. CS50.TV]

Some of the most difficult bugs in C programs come from the mismanagement of memory. There are a huge number of ways to screw things up, including allocating the wrong amount of memory, forgetting to initialize variables, writing before or after the end of a buffer, and freeing heap memory multiple times. The symptoms range from intermittent crashes to mysteriously overwritten values, often at places and times far removed from the original error. Tracing the observed problem back to the underlying root cause can be challenging, but fortunately there's a helpful program called Valgrind that can do a lot to help.

>> You run a program under Valgrind to enable extensive checking of heap memory allocations and accesses. When Valgrind detects a problem, it gives you immediate, direct information that allows you to more easily find and fix the problem. Valgrind also reports on less deadly memory issues, such as memory leaks: allocating heap memory and forgetting to free it. Like our compiler, Clang, and our debugger, GDB, Valgrind is free software, and it is installed on the appliance. Valgrind runs on your binary executable, not your .c or .h source code files, so be sure you have compiled an up-to-date copy of your program using Clang or Make. Then, running your program under Valgrind can be as simple as prefixing the standard program command with the word valgrind, which starts up Valgrind and runs the program inside of it. When starting, Valgrind does some complex jiggering to configure the executable for the memory checks, so it can take a bit to get up and running. The program will then execute normally, albeit much more slowly, and when it finishes, Valgrind will print a summary of its memory usage. If all goes well, it will look something like this. Here the command is valgrind ./clean_program, where ./clean_program is the path to the program I want to run.
This one doesn't take any arguments, but if it did, I'd just tack them on to the end of the command as usual. clean_program is just a silly little program I created that allocates space for a block of ints on the heap, puts some values inside of them, and frees the whole block. This is what you're shooting for: no errors and no leaks.

>> Another important metric is the total number of bytes allocated. Depending on the program, if your allocations are in the megabytes or higher, you're probably doing something wrong. Are you unnecessarily storing duplicates? Are you using the heap for storage when it would be better to use the stack? So, memory errors can be truly evil. The more overt ones cause spectacular crashes, but even then it can still be hard to pinpoint what exactly led to the crash. More insidiously, a program with a memory error can still compile cleanly and can still seem to work correctly because you managed to get lucky most of the time. After several "successful outcomes," you might just think that a crash is a fluke of the computer, but the computer is never wrong.

>> Running Valgrind can help you track down the cause of visible memory errors as well as find lurking errors you don't even yet know about. Each time Valgrind detects a problem, it prints information about what it observed. Each item is fairly terse (the source line of the offending instruction, what the issue is, and a little info about the memory involved), but often it's enough information to direct your attention to the right place. Here is an example of Valgrind running on a buggy program that does an invalid read of heap memory. We see no errors or warnings in compilation. Uh-oh, the error summary says that there are two errors: two invalid reads of size 4 (bytes, that is). Both bad reads occurred in the main function of invalid_read.c, the first on line 16 and the second on line 19. Let's look at the code.
Looks like the first call to printf tries to read one int past the end of our memory block. If we look back at Valgrind's output, we see that Valgrind told us exactly that. The address we're trying to read starts 0 bytes past the end of the block of size 16 bytes: the four 32-bit ints that we allocated. That is, the address we were trying to read starts right at the end of our block, just as we see in our bad printf call. Now, invalid reads might not seem like that big of a deal, but if you're using that data to control the flow of your program (for example, as part of an if statement or loop), then things can silently go bad. Watch how I can run the invalid_read program and nothing out of the ordinary happens. Scary, huh?

>> Now, let's look at some more kinds of errors that you might encounter in your code, and we'll see how Valgrind detects them. We just saw an example of an invalid read, so now let's check out an invalid write. Again, no errors or warnings in compilation. Okay, Valgrind says that there are two errors in this program: an invalid write and an invalid read. Let's check out this code. Looks like we've got an instance of the classic strlen-plus-one bug. The code doesn't malloc an extra byte of space for the \0 character, so when strcpy went to write it at s[strlen("cs50 rocks!")], it wrote 1 byte past the end of our block. The invalid read comes when we make our call to printf. Printf ends up reading invalid memory when it reads the \0 character as it looks for the end of the string it's printing. But none of this escaped Valgrind. We see that it caught the invalid write as part of the strcpy on line 11 of main, and the invalid read as part of printf. Rock on, Valgrind. Again, this might not seem like a big deal. We can run this program over and over outside of Valgrind and not see any error symptoms.

>> However, let's look at a slight variation of this to see how things can get really bad.
So, granted, we are abusing things more than just a bit in this code. We're only allocating space on the heap for two strings the length of "cs50 rocks!", this time remembering the \0 character. But then we throw a super-long string into the memory block that s is pointing to. What effect will that have on the memory block that t points to? Well, if t points to memory that's just adjacent to s, coming just after it, then we might have written over part of t. Let's run this code. Look at what happened. The strings we stored in our heap blocks both appeared to have printed out correctly. Nothing seems wrong at all. However, let's go back into our code and comment out the line where we copy "cs50 rocks!" into the second memory block, pointed to by t. Now, when we run this code, we should only see the contents of the first memory block print out. Whoa! Even though we didn't strcpy any characters into the second heap block, the one pointed to by t, we get a printout. Indeed, the string we stuffed into our first block overran the first block and spilled into the second block, making everything seem normal. Valgrind, though, tells us the true story. There we go: all of those invalid reads and writes.

>> Let's look at an example of another kind of error. Here we do something rather unfortunate. We grab space for an int on the heap, and we initialize an int pointer, p, to point to that space. However, while our pointer is initialized, the data that it's pointing to just has whatever junk is in that part of the heap. So when we load that data into int i, we technically initialize i, but we do so with junk data. The call to assert, which is a handy debugging macro defined in the aptly named assert.h header, will abort the program if its test condition fails, that is, if i is not 0. Depending on what was in the heap space pointed to by p, this program might work sometimes and fail at other times. If it works, we're just getting lucky.
The compiler won't catch this error, but Valgrind sure will. There we see the error stemming from our use of that junk data.

>> When you allocate heap memory but don't deallocate (free) it, that is called a leak. For a small, short-lived program that runs and immediately exits, leaks are fairly harmless, but for a project of larger size and/or longevity, even a small leak can compound into something major. For CS50, we do expect you to take care of freeing all of the heap memory that you allocate, since we want you to build the skills to properly handle the manual process required by C. To do so, your program should have an exact one-to-one correspondence between malloc and free calls. Fortunately, Valgrind can help you with memory leaks too. Here is a leaky program called leak.c that allocates space on the heap and writes to it but doesn't free it. We compile it with Make and run it under Valgrind, and we see that, while we have no memory errors, we do have one leak. There are 16 bytes definitely lost, meaning that the pointer to that memory wasn't in scope when the program exited. Now, Valgrind doesn't give us a ton of information about the leak, but if we follow the little note near the bottom of its report, which says to rerun with --leak-check=full to see the full details of leaked memory, we'll get more information. Now, in the heap summary, Valgrind tells us where the memory that was lost was initially allocated. Just as we know from looking in the source code, Valgrind informs us that we leaked the memory allocated with a call to malloc on line 8 of leak.c in the main function. Pretty nifty.

>> Valgrind categorizes leaks using these terms: Definitely lost--this is heap allocated memory to which the program no longer has a pointer. Valgrind knows that you once had the pointer but have since lost track of it. This memory is definitely leaked. Indirectly lost--this is heap allocated memory whose only pointers are themselves stored in memory that has been lost.
For example, if you lost your pointer to the first node of a linked list, then the first node itself would be definitely lost, while any subsequent nodes would be indirectly lost. Possibly lost--this is heap allocated memory for which Valgrind cannot be sure whether there is a pointer or not. Still reachable is heap allocated memory to which the program still has a pointer at exit, which typically means that a global variable points to it. To check for these leaks, you'll also have to include the option --show-reachable=yes in your invocation of Valgrind.

>> These different cases might require different strategies for cleaning them up, but leaks should be eliminated. Unfortunately, fixing leaks can be hard to do, since incorrect calls to free can blow up your program. For example, if we look at invalid_free.c, we see an example of bad memory deallocation. What should be a single call to free the entire block of memory pointed to by int_block has instead become an attempt to free each int-sized section of the memory individually. This will fail catastrophically. Boom! What an error. This is definitely not good. If you're stuck with this kind of error, though, and you don't know where to look, fall back on your new best friend. You guessed it: Valgrind. Valgrind, as always, knows exactly what's up. The alloc and free counts don't match up. We've got 1 alloc and 4 frees. And Valgrind also tells us where the first bad free call (the one that triggered the blowup) is coming from: line 16. As you see, bad calls to free are really bad, so we recommend letting your program leak while you're working on getting the functionality correct. Start looking for leaks only after your program is working properly, without any other errors.

>> And that's all we've got for this video. Now what are you waiting for? Go run Valgrind on your programs right now. My name is Nate Hardison. This is CS50. [CS50.TV]