[MUSIC PLAYING] DAVID MALAN: All right. This is CS50, and this is the day before our test, of course. But this is lecture 8, in which we're actually going to finally transition from C, this lower-level language that we've been spending quite some time to. And the goal today isn't so much to focus on Python per se, but honestly to do what we hope will be one of the most empowering aspects of the class, which is to emphasize that this has not been a semester learning C. This has been a semester learning programming, a certain type of programming called procedural or imperative programming. But more on that in another higher-level class perhaps. But really, that this class is about ultimately teaching yourself to learn new languages. And indeed, what you'll find is that as we explore some of the features and the syntax of Python, odds are today, it might look as cryptic as C did just a few weeks ago. But you'll find that once you start recognizing patterns, as you have with C, it will be all the more accessible and all the more useful when solving some problems. So unrelatedly, just earlier this week, I happened to be in Mountain View with some of the team. And you might recall from last lecture at Harvard we offered this glimpse of one of the earliest racks of servers that Google itself had. Well, turns out they changed buildings. But we happened to stumble upon the actual display. So pictured here is a photo from my own phone, which was actually really cool to see. So inside of this, you'll see all of the old hard drives they've used. We actually looked at some of the labels. And indeed, hard drives manufactured in 1999, which is when Google started getting some of its momentum. You can see the green circuit boards here, on which would be CPUs and other things, potentially. So if you'd like a stroll down memory lane, feel free to read up on this on Wikipedia or even on the excerpts here.
And then strangely enough, at the conference some of us were at did we discover this-- perhaps the biggest duck debugger made up of smaller duck debuggers, one of whom was our own. So that, too, was how we spent this past week. All right. So how are we going to spend this week and the weeks to come? So you'll recall that when we transitioned from Scratch to C, we drew a couple of comparisons between syntax and features. And I thought it'd be useful to take that same approach here, really to emphasize that most of the ideas we're going to explore today are themselves not new. It's just how you express them and how you write the syntax in the language known as Python that's indeed going to be different from Scratch, from C, and now here we are with Python. So back in the day, in week 0, when you wanted to say something in Scratch, you would literally use this blue purple puzzle piece, say hello. And we called that a function or a statement. It was some kind of verb action. And in C, of course, it looked a little something like this. Henceforth, starting today in Python, it's going to look like this. So before, after. Before, after. So it's pretty easy to visually diff these two things. But what are just a couple of the differences that jump out at you immediately? C, Python. So there's no more backslash n, it would seem, in this context. So that's kind of a nice relief to not have to type anymore. What else seems to be different? No semicolon, thank god. Right? Perhaps the stupidest source of frustration that you might have experienced by just omitting one of those. And someone over here? Yeah, so printf is now just print, which is pretty reasonable unto itself. So these are terribly minor differences. But it's sort of testament to the kinds of mental adjustments you're going to have to start to make. 
Fortunately, thus far we've seen that you can start leaving things off, which is actually a guiding principle of Python in that one of its goals is it's meant to be easier to write than some of its predecessors, among them C. So in C we might have implemented this hello, world program that actually ran when you clicked the green flag using code like that at the right. And this was, if those of you who had no programming experience coming in to CS50, what probably looked like the proverbial Greek to you just a few weeks ago. And we teased apart what those various lines meant. But in Python, guess what? If you want to write a program whose purpose in life is to say, hello, well, just write def main. Print hello, world. So it's a little similarly structured. And in fact, it does not lack for some of the more arcane syntax here. But we'll see soon what this actually means. But it's a little simpler than the one before. And let's tease this apart. So def here simply means define me, a function. So whereas in C we've historically seen that you specify the type that the function should return, we're not going to do that in Python anymore. Python still has data types, but we're not going to explicitly mention what data types we're using. Meanwhile, here is the name of the function. And main would be a convention, but it's not built into the language in the same way as it is in C, as we shall see. Meanwhile, this silly incantation is just a way of ensuring that the default function to be executed in a Python program is indeed going to be called main. But more on that when we actually start creating. But this is perhaps the most subtle but most important difference, at least early on. And it's even hard to see at this scale. But notice the colons both here and here that I've highlighted now in yellow, and these dots, which are not to be typed, but are just meant to draw your attention to the fact that I hit the space bar four times in those locations. 
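The slide being described isn't reproduced in the transcript, so here is a minimal sketch of what the Python version of hello, world with a main function likely looks like. The layout is an assumption; the pieces themselves (def, the colons, the four-space indentation, and the check on __name__) are the ones named above.

```python
# hello.py: a sketch of the program described above.

def main():
    # Everything indented under the colon belongs to main.
    print("hello, world")


# Call main only when this file is run directly, e.g. `python hello.py`.
if __name__ == "__main__":
    main()
```

Removing the indentation before print, or indenting it inconsistently, would make Python refuse to run the file at all.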
So if you have ever sort of gotten some feedback from your TA or TF that your style could be better, closer to 5 out of 5, because of lack of indentation or pretty formatting, Python's actually gonna help us out with this. So Python code will not run if you have not indented things properly. So gone are the curly braces that encapsulate related lines of code within some block of functionality. And instead, they're replaced generally with this general structure. You have a colon, and then below that and indented are all of the lines that are somehow related to that earlier line of code. And the indentation must be consistent. So even though your own eye might not quite distinguish four spaces from three, the Python environment will. And so this will actually help implicitly enforce better style, perhaps, than might have been easily done from the get-go. So then, of course, in Scratch, we had a forever block, which says, hello, world forever, much like in C, we could implement it like this. Now there's actually a pretty clean mapping in Python. We already know we can get rid of the semicolon. We already know we can get rid of the curly braces. We're going to have to add in a colon. But it turns out we can get rid of a little more, too. So what more is absent from this translation of hello, world to Python? This one's more subtle. So we definitely got rid of the curly braces, relying now just on indentation. OK, so there's no parentheses around while. And so this, too, is actually meant to be a feature of Python. If you don't logically need parentheses to enforce order of operations, like in arithmetic or the like, then don't use them because they're just a distraction. They're just more to type. And the code now is just visually cleaner and easier to read. There's a minor difference, too-- True and False are going to be capitalized in Python. But that's a fairly incidental detail. But notice this kind of captures already the spirit of Python.
It's not a huge leap to go from one to the other. But we've just kind of started to get rid of some of the clutter and the stuff that never really intellectually added much, and if anything was annoying to have to remember early on. So True here is our Boolean. And now we have a finite number of iterations. We might want to say hello, world exactly 50 times. In C, this was a crazy mess if you wanted to do this. You'd have to initialize a variable with which to count up to, but not including 50, plus plussing along the way and so forth. In Python, it's going to be a little cleaner. And we'll come back to what this means exactly. But if you kind of read it from left to right, it kind of says what you mean. Right? For i in the range of 50. So i is probably going to be a variable. And notice we're not mentioning its type. It's going to be implied by whatever the context is, which in this case has to do, apparently, with numbers, per the 50. Range is actually going to be a data type unto itself. It's a little funky in that sense. It's called a class. But this essentially is a special feature of Python that, unlike in C, where if you want to iterate over an array of values or 50 such values, you would literally have an array of 50 values. Range is kind of cool in that it kind of stands there. And every time you iterate through a loop, it hands you the next number, but just one at a time, thereby using maybe as little as one 50th the amount of memory because it only has to keep one number around at a time. And there's a bit more overhead than that. It's not a perfect savings, quite so. But this just says for i in range 50, and that's going to implicitly count from 0 up through 49. And meanwhile, what's below it is what's going to get printed this time. So meanwhile, here was one of our bigger Scratch blocks early on. And it translates pretty literally to code in C.
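The two loops described above can be sketched as follows. The counter and break in the first loop are not part of the lecture's version; they are added here only so the demonstration stops instead of printing forever.

```python
# A "forever" loop: no parentheses around the condition, and True is capitalized.
# The counter and break are an addition so that this demo terminates.
count = 0
while True:
    print("hello, world")
    count += 1
    if count == 3:
        break

# A counted loop: range(50) hands back 0, 1, ..., 49, one number at a time.
for i in range(50):
    print("hello, world")

# range is a type of its own rather than a stored list of 50 numbers.
numbers = range(50)
print(type(numbers))                          # <class 'range'>
print(numbers[0], numbers[-1], len(numbers))  # 0 49 50
```

After the for loop finishes, i still holds the last value it was given, which is 49 rather than 50.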
And you can perhaps guess, if you've never seen Python before today, what the Python code might now look like. If this here on the right is the C code, what are some of the features syntactically that we're about to throw away? Yeah. AUDIENCE: You can throw away the curly braces and the parentheses. DAVID MALAN: Curly braces and parentheses are going to go away. What else might go away? The semicolons are going to go away. The backslash n inside of the print statements. Great. One more thing, I think. The if. So we don't strictly need the parentheses because it's not like I'm combining things logically like this or that or this and that. So it should suffice to get rid of those two. And there's a couple of other tweaks we're going to have to make here. But indeed, the code's going to be a lot tighter, so to speak. Now you're just going to say what you mean here. And there is one weird thing. And it's not a typo. What apparently are we going to have to start knowing now? Elif whatever. So elif is not a typo. It's indeed how you express the notion of else-if. But otherwise, everything is exactly the same. And notice the colons. Frankly, ironically, whereas previously it might have been annoying to occasionally forget a semicolon, now the colons may take on that role. But at least everything below them is meant to be indented. So here's a fundamental difference beyond the sort of silly syntactic differences of this and, say, other languages-- the flow of work that we've been using thus far has been essentially this in C. You write source code in a file generally ending in .c. You run a compiler, which, as a quick check, is called clang. So it's not technically make. Make is just this helpful build utility that automates the process of calling clang. So clang is, strictly speaking, the compiler. And clang outputs zeros and ones, otherwise known as machine code.
And your computer-- Mac, PC, whatever-- has a CPU, Central Processing Unit inside, made by Intel or some other company. And that CPU is hardwired to understand certain patterns of bits, zeros and ones, otherwise known as machine code. So that's been our world in C. With Python-- so the code that you might have compiled in C, for instance, might have been this, which we said we run clang on like this. And if you don't specify a default file name as output, you'll instead just get in your file all of the zeros and ones, which can then be executed by way of ./a.out, the default name for the assembler's output here. So in Python, though, the world gets here, too, a little simpler, as well. So we just now have source code and an interpreter. So there's no machine code, it would seem. There's no compiler, it would seem. And frankly, there's one fewer arrow, which suggests to me that the process of running Python code itself is actually going to be a little easier. Running C code has typically been two steps. You run clang, or via make you run clang. Then you run the program. And it's fine. It's not all that hard. But it's two steps. Why not reduce to one step what you could otherwise do in two? And we'll see exactly what this means. Now, technically, that's a bit of an oversimplification. Technically, underneath the hood, if you wanted to run a program like this that simply prints out hello, world, you would simply run python hello.py. And the result of that would be to see hello, world on the screen, as we'll soon see. But technically, underneath the hood, there is some other stuff going on. So there actually kind of is a compiler. But there's not something called machine code, per se. It's called bytecode. There's even something called a Python virtual machine. But all of this is abstracted away for us, certainly for the sake of today's conversation, but also in the real world, more generally.
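For the curious, the bytecode mentioned above can be inspected from Python itself. This is a small sketch using the standard library's dis module and assumes the usual CPython interpreter; the exact instructions it prints differ from one Python version to the next, so none are shown here.

```python
import dis


def main():
    print("hello, world")


# Every function carries a code object; co_code holds its compiled bytecode.
bytecode = main.__code__.co_code
print(type(bytecode))  # <class 'bytes'>

# dis.dis prints a human-readable listing of those bytecode instructions.
dis.dis(main)
```

None of this is needed to write or run Python programs; it only shows that a compilation step really is happening behind `python hello.py`.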
Humans have gotten better over the decades at writing software and writing tools via which we can write software. And so a lot of the more manual processes and a lot of the lower-level details that we've been focusing on, if not struggling on, in C, start to go away. Because much like in week 0, where we started layering on idea after idea-- zeros and ones, ASCII, colors, and whatnot-- similarly with our actual tools, we're going to start to do the same. So whereas in actuality what's going on underneath the hood is this process here, we can start to think about it, really, as something quite simpler. Now, if you're curious, and if you take some higher-level class like CS61 or another, you'll actually talk about things like bytecode and assembly code and the like. And we saw a glimpse of the latter a bit ago. This happens to be an intermediate language that Python source code is converted into before it's run by the computer. But again, we're going to turn a blind eye to those lower-level details. So here are some of the tools now in our toolkit. In Python, there are data types, though as of now we've not seen any examples whereby I specify what types of values are going to be in my variables or what types of values a function's going to return. But they are there. Everything is sort of loosely typed in that whatever you want a variable to be, it will just take on that data type, whether it's an int or string or the like. It's not going to be the full word string. In Python, it's literally called str. But there are some familiar types here-- bool and float and int and others. And, in fact, among the others, as we'll soon see, are features like range. But before that, note too that we'll provide for at least our first foray into Python a few familiar functions. So Python has different mechanisms than C for getting input from the user.
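The point about data types can be seen directly with the built-in type function. This is a small sketch; the variable names and sample values are illustrative only.

```python
# The same name can refer to values of different types over time.
x = 50
first_type = type(x)    # int
x = "fifty"
second_type = type(x)   # str

# A few of the built-in types mentioned above, with their Python names.
for value in [True, 0.5, 50, "hello"]:
    print(repr(value), "has type", type(value).__name__)
```

The types are never written down, but each value still has exactly one, which is why the name str shows up here rather than string.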
We've abstracted some of those details away in a new CS50 library for Python that you'll really just use one or a few times before we transition away from even that, but will give you functions like get_char, get_float, get_int, get_string that handle all the requisite error checking so that at least for your first few programs, you can just start to get some real work done without diving into underneath the hood there. And then lastly, here are some other tools in our toolkit. And we'll just scratch the surface of some of these today. But what's nice about Python and what's nice about higher-level languages more generally-- like more modern languages that learned lessons from older languages like C-- is that you get so much more for free, so much more out of the box. There's so much more of a kitchen sink. There's so many metaphors we can use here, all of which speak to the fact that Python has more features than C, much like Java, if you took AP CS or something else, had more than C. So does Python have a whole toolkit for representing complex numbers, for representing dictionaries, otherwise implemented as hash tables, as you now know; lists, which is kind of synonymous with an array. But a list is an array that can sort of automatically grow and shrink. We don't have to jump through hoops as we did in C. Range we've seen briefly, which just hands you back one number after another in some range, ideally for iteration. Set is the notion from mathematics, where if you want to put bunches of things into a data structure and you want to make sure you have only one of each such thing without duplicates, you can use a set. And a tuple is also a mathematical notion, typically where you can combine related things without complicating things with actual structs. Like, x, y is a common paradigm in lots of programs-- graphics, or videos, or certainly math and graphing itself. You don't really need a whole full-fledged data structure. You might just want to say, x, y.
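Each of the built-in types just listed can be tried in a line or two. The sketch below uses made-up sample data (the names and numbers are hypothetical) purely to show the syntax.

```python
# complex: built-in complex numbers (the imaginary part is written with j).
z = complex(1, 2)                  # same value as 1 + 2j

# dict: key/value pairs, implemented as a hash table.
ages = {"alice": 20, "bob": 21}    # hypothetical sample data

# list: like an array, but it grows and shrinks on its own.
scores = [72, 73]
scores.append(33)

# set: keeps only one copy of each value.
unique = set([1, 1, 2, 3, 3])

# tuple: a lightweight grouping, such as an (x, y) coordinate.
point = (3, 4)
x, y = point

print(z * z)          # (-3+4j)
print(ages["bob"])    # 21
print(scores)         # [72, 73, 33]
print(len(unique))    # 3
print(x, y)           # 3 4
```

Nothing here needed malloc, realloc, or a hand-written hash function, which is the "more out of the box" point being made.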
And so Python gives us that kind of expressiveness. So let's actually now dive in with that quick mapping from one world to the other and focus on what you can actually do with Python. So here I am in the familiar CS50 IDE. Much like we have pre-installed for you clang and make and other tools, we've also installed for you a program. That program is called Python, which is a little confusing at first glance because Python is apparently the name of the language. But it's also the name of the program. And here's where Python is different. Whereas C is again compiled, and you use something like clang to convert it to machine code, Python is both the name of the language and the name of the program you use to interpret the language. So pre-installed in CS50 IDE, and frankly, these days, probably on your own Macs or PCs, even if you don't know it, is a program called Python that if fed Python source code as input will do what that code says. So let's go ahead and try something just like that. Let me go ahead and save a file preemptively as hello.py. So .py will be the convention now instead of .c. And I'm going to go ahead and actually keep this pretty simple. I'm just going to print the first thing-- muscle memory. So it's not printf anymore. It's just hello, world. Save, done. That's going to be my first program in Python. Why? It's one line of code. It's consistent with the features I've claimed Python has. So how do I run it? Well, in C, we would have done, like, make hello. But make knows nothing about this because make is typically used with C, at least in this context here. So maybe it's, like, ./hello.py. No. It seems I don't have permission there. But there's a step that I teased us with earlier on just the slide alone. How do I go about running a program, did I say? AUDIENCE: Python hello.py. DAVID MALAN: Yeah. I have to be a little more explicit. So python, which is the name of the interpreter that understands Python. And now I need to feed it some input. 
And we know from our time in C that programs can take command-line arguments. And indeed, this program itself does, Python. You just give it the name of a program to run. And there it is, our very first program. So that's all fine and good. But what if I wanted to do something a little more interesting, like getting a string from the user? Well, turns out in Python, in CS50 IDE especially, I can do something like this. s gets get_string. And I can ask someone, for instance, for their name, like this. Now, CS50 IDE is already yelling at me-- undefined variable get_string. and let's actually see if maybe it's just buggy. No. So this is a little more arcane than usual. But traceback, most recent call last. File "hello.py," line 2, in module-- whatever that is. So I see a line of code from line 2. NameError-- name get_string is not defined. This is not the same language we've seen before, but what does this feel reminiscent of? Yeah, like in the past, when you've forgotten cs50.h, you've gotten something about an undeclared identifier, something like that. It just didn't understand something related to the CS50 library. So in C, we would have done include cs50.h. That's no longer germane because now we're in Python. But it's somewhat similar in spirit. Now I'm going to say instead from cs50 import get_string, and now save that. And hopefully momentarily, the errors will go away as the IDE realizes, oh, you've now imported the CS50 library, specifically a method or function, rather, inside of it called get_string. So there, too, it's different syntax, but it kind of says what it means-- from cs50, which is apparently the name of the library, import a function called get_string. Now if I go ahead and rerun python hello.py, I can go ahead and type in, say, Maria's name and ignore her altogether because I need to make a fix here. What's the obvious bug-- obvious now, to me-- in the program? AUDIENCE: You need to include the variable for s. DAVID MALAN: Yeah. 
So I need to include s, which I got on line 3, but didn't thereafter use in any way. So this is going to be wrong, of course, because that's going to say, literally, hello s. This is kind of how we used to do it. And then we would put in s. But this is not printf. This is print. So the world is a little different. And it turns out we can do this in a couple of different ways. Perhaps the easiest, if least obvious, would be something like this, where I could simply say hello, open curly brace, close curly brace. And then inside of there, simply specify the name of the variable that I want to plug in. And that's not quite all the way there. Let me go ahead and run this once more. Now if I type in Maria's name, oh. Still not quite right. I need to actually tell Python that this is a special type of string. It's a formatted string, similar in spirit to what printf expected. And the way you do this, even though it's a little different from C, is you just say f. This is an f string. So literally before the quotes, you write the letter f. And then if I now run this program here, i'm going to actually see Maria's name as hello, Maria. And I'll take care of that red X later. So that's a format string. And there's one other way. And this is not very obvious, I would say. You might also see in online documentation something like this. And let's just tease this apart for just a second. It turns out in Python that what I've highlighted in green here is known as a string, otherwise known as a str. str is the name of this data type. Well, unlike in C, where string was kind of a white lie, where it was just a pointer at the end of the day, a string is actually a first-class object in Python, which means it's not just a sequence of characters. It has built-in functionality, built-in features. 
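The three attempts just described can be compared side by side. In this sketch a fixed string stands in for what get_string would return (an assumption, so the snippet runs without the CS50 library).

```python
s = "Maria"  # stand-in for the value get_string would have returned

plain = "hello, {s}"                  # no f prefix: the braces print literally
formatted = f"hello, {s}"             # f-string: {s} is replaced by the value of s
with_method = "hello, {}".format(s)   # same result via the str format method

print(plain)         # hello, {s}
print(formatted)     # hello, Maria
print(with_method)   # hello, Maria
```

The first line is the "still not quite right" output from the demo; the only difference between it and the working version is the single letter f before the opening quote.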
So much like a struct in C had multiple things inside of it, so does a string in Python have multiple things inside of it, not just the sequence of characters, but functions that can actually do things. And it turns out you access those functions by way of the same dot operator as in C. And then you would only know from the documentation or examples in class what functions are inside of the string object. But one of them is format. And that's just a function that takes an argument-- what do you want to plug into the string to the left of the dot? And so simply by specifying, hey, Python, here's a string with a placeholder. Inside of this string is a built-in function-- otherwise known as a method, when a function is inside some object or structure-- pass in the value s. So if I now go ahead and rerun this after saving my changes, I should now see that Maria's name is still plugged in. So that's it. But a simple idea that now even strings have things inside of them besides the characters alone, and you can access that via the dots. So let's go ahead now and ramp things up to a more familiar example from a while back. Let me go ahead and open up two side-by-side windows and see if we can't translate one to the other. I'm going to go ahead and open up, for instance, int.c from some time ago. So you might recall from int.c, we had this program, whose purpose in life was to get an integer from the user and actually now plug it into printf, and then print it out. So what's going to be different now in Python? Well in Python, if I go ahead and implement this as, say, int.py, I'm going to go ahead and do the following. Let me scroll down to kind of line things up roughly. I can go ahead and say def main, as I saw in the slides before. And then over here, I can say i gets get_int, quote, unquote, integer. And then down here, I'm going to say not printf but print, quote, unquote, "hello," and then the placeholder. What's the simplest way to do this now, per our past example? 
Curly brace i. And then I just need to be super clear this is a special f string or format string, into which you can plug in values. And now I'm going to go ahead and save that. And I've got most of the pieces together, ignoring, for now, the red X. So what more remains to be done? I've made one same mistake as before. Yeah, so the get_int. so up here, really the equivalent of line 3 would be from cs50 import get_int this time. Saving that. And now if in my terminal window I go ahead and run python of int.py-- hmm. That seems strange. It's not an error, in terms of, like, erroneous output. Just nothing happened. So why might this be? How might you go about troubleshooting this, even with very little Python under your belt? Was that a hand, or no? No? OK. Yeah? AUDIENCE: Is there a line break? DAVID MALAN: Is there a line break? That's OK. I was just doing that to kind of make everything line up. But it's no big deal. Everything's indented properly, which is the important aesthetic. Yeah. AUDIENCE: We didn't call the function. DAVID MALAN: We didn't call the function. And this is where Python's a little different from C. In C, recall, main just gets called automatically for you. Humans years ago decided that shall be the default name of a function. In Python, line 6 here, calling something main is just a convention. I could have called it foo or bar or any other word. It has no special meaning. And so in Python, if you want to actually call main, you need to do something, frankly, that's, I think, one of the stupider distractions early on. But you have to literally say this-- if the name of this file happens to equal something that's specially called main, then call main. So long story short, when you run the Python interpreter on a file, as we've been doing with python, space, int.py or hello.py, there is a special global variable that your program has access to called __name__. 
And if that default name happens to be __main__, then you know that you have the ability to call any function you want by default. So for now, much like we did in week one, where we glossed over certain details that just weren't all that interesting, lines 11 and 12, for now, let's consider not all that interesting. But it's how we're going to kick-start these programs. Because now if I run python, space, int.py, type in a great number-- hello, 42. That's the meaning of life, the universe, and everything. So let's now actually do something more powerful than just getting a single int from the user. Let me go ahead and close off this one and close off this one and open up, say, ints.c after splitting my window again into two windows here. And let's open ints.c. So this one was a little different in that we did some arithmetic. And so here is going to be another difference in Python. Here's what we did in C. And what was curious or worth noting about math in C? Which of these did not quite behave as you might expect in the real world? Division? Yeah, why? What did division do? Yeah, it chopped off or rounded down. It floored the value by throwing away everything after the decimal point. So this line here, 18, where it's such-and-such divided by such-and-such is such-and-such. And we literally just said x divided by y. If you divided, for instance, 1 divided by 2 in grade school, hopefully, you would get the value 1/2 or 0.5. But in C, what did we get instead? AUDIENCE: Zero. DAVID MALAN: Zero. So it gets truncated to an int, the closest int without a decimal point being 0, because 0.5 is really 0 and .5, and the .5 gets thrown away. And thus we had that effect. So in Python, things are going to be similar in spirit. But this is kind of a feature that was fixed or a bug that was fixed. In Python-- let me go ahead here and open up an example I wrote in advance called ints.py, which is actually now going to look like this.
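The ints.py file itself isn't reproduced in the transcript. Here is a sketch of what such a program might contain, with fixed values standing in for the get_int prompts (an assumption, so that it runs without the CS50 library); the wording of the printed sentences is likewise a guess.

```python
# Fixed values stand in for get_int("x: ") and get_int("y: ").
x = 1
y = 2

print(f"{x} plus {y} is {x + y}")
print(f"{x} minus {y} is {x - y}")
print(f"{x} times {y} is {x * y}")
print(f"{x} divided by {y} is {x / y}")                  # true division: 0.5
print(f"{x} divided by {y} (and floored) is {x // y}")   # floor division: 0
print(f"remainder of {x} divided by {y} is {x % y}")     # 1
```

One detail worth knowing: // always rounds down, so -1 // 2 is -1 in Python, whereas integer division in C truncates toward zero and gives 0. For positive numbers the two behave the same.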
So the Python equivalent now, which I've roughly lined up, looks a little different. And there's a few distractions because we have all these f strings now in the way. But notice I'm just plugging in x's and y's. But what's a new feature, apparently, in Python, arithmetically? So floor division. So this was the more proper term for what C has been doing all this time. In C, when you use the slash and you divide one number by another, it divides, and then floors it to the nearest int. In Python, if you want that same old-school feature, you're going to now use slash slash, not to be confused with the C comment. And if you want division to work the way you always knew it did in grade school, you continue using just the slash. So a minor point, but one of the differences to keep in mind. So if we actually run this here in Python, if I go into source 8 today and our week's directory for week 1, and I run python ints.py, here now we're going to see 1 and 2. And there's all of the values that we would expect to see. All right. So without dwelling too much on this, let's fast forward to something more powerful like conditions. So in Python, if we want to do something only conditionally, laying out my browser like this, let me go ahead and open up, let's say, conditions.py. Sorry, conditions.c, which once upon a time looked like this. So in this example here, notice that we have a program that gets two ints from the user, and then just compares x and y and prints out whether they're greater than, less than, or equal to, ultimately. So let's actually do this one from scratch over here on the right. So let me go ahead and save this as conditions.py. And then at the top, what's the very first thing I'm going to apparently now need? Yeah, so the CS50 library. So from cs50 import-- it looks like get_int is the one we want this time. Now, how do I go about getting an int? Or what's the translation of line 9 on the left to the right-hand side of the screen?
x equals get_int of the same prompt. OK, what comes next, if I line it up roughly here? y gets get_int of quote, unquote, y. And what's down here? The condition. So if x less than y? No parentheses are necessary. It's not wrong to put them, but it's unnecessary. And now enters a word into our terminology-- it's not Pythonic, so to speak. If you don't need them, don't put them. So if x is indeed less than y, what do we want to do? We want to print x is less than y, yes? No. OK. All right, good. So else if x-- OK, good. Right. So, kind of goofily, elif, then go ahead and print out x is greater than y. And as an aside, I actually did that accidentally. But it turns out in Python, too, you can use double quotes or single quotes. Either is fine, whereas in C, single quotes had a very specific meaning, which meant what? AUDIENCE: Char. DAVID MALAN: Char. So single characters. And double quotes meant strings, sequence of characters, which meant zero or more characters, followed by backslash 0. In Python, all of that is gone. Single quotes and double quotes are equivalent. I'll almost always use double quotes, just for consistency, as should you, for consistency, within your own files. But sometimes it's useful to drop into one or the other if you nest, for instance, quote marks, as you might have once in a while in C. OK. So finally, else print out x is equal to y. So it's cleaner. And frankly, I don't need all this whitespace. So let's go ahead and just make this a little tighter still. You can see that in 11 lines, we've now done what took 27 or so last time. But I have omitted something, to be fair. What did I omit? Yeah, I didn't do that whole calling of function thing. There's no mention of main. And it actually turns out that's not strictly necessary in Python. If you're going to be interpreting a file that contains Python code, and it's a simple enough program that you don't really need to factor code out and organize it into separate functions, then don't.
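The finished conditions.py can be sketched roughly as below. Two liberties are taken, both assumptions made so the snippet runs without the CS50 library and its result is easy to check: fixed values replace the get_int prompts, and the sentence is stored in a variable named message before being printed, whereas the version built in lecture prints inside each branch.

```python
# Fixed values stand in for get_int("x: ") and get_int("y: ").
x = 1
y = 2

# Compare x and y; no parentheses are needed around the conditions.
if x < y:
    message = "x is less than y"
elif x > y:
    message = 'x is greater than y'   # single quotes work just as well
else:
    message = "x is equal to y"

print(message)  # x is less than y
```

There is no main function and no check on __name__ here; the lines simply run from top to bottom when the file is interpreted.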
If this is what would now be called a command-line script, a program that just has lines of code that you can execute, literally, at the prompt. So if I go into this directory and run python of conditions.py, Enter. x will be 1. y will be 2. x is indeed less than y. And that's it. I don't need to bother doing all of this, as I proposed earlier. def main, and then I could go in here. And if you've never known this, and now it's useful, especially, for Python, you can highlight lines and just tab them all at once. I could do this, but then I would need this thing, which I'd probably have to go look up how to remember it, if you're doing it for the first time. There's just no value in this case to doing that. But at least it can be there as needed. So let me go ahead and undo that. And we're back to a porting of one to the other. All right. So that might then be conditions. And let's see if we can't-- noswitch there. Let's take a look at this one. Let me open up, rather than comparing all of them side-by-side, let me just open up this one now called noswitch.py, which is reminiscent of a program we ran some time ago called noswitch.c. And you can perhaps infer what this does from the comments alone. What does this program do in English? Because logical operators is not all that explicit at top. What's that? Yeah. So if you've ever interacted with a program that asked you for a prompt, yes or no, here's some code with which you might implement it. And we could do this in C. We're just comparing characters here. But there's a few differences if you kind of now think back to how you might implement this in C, even if you don't recall this specific program. I'm importing my library right up here. I'm then calling get_char this time, which is also in CS50's library for Python. And then notice there's just a couple of things different down here syntactically. Besides the colons and the indentation and such, what else is noteworthy? Yeah. Yeah. 
Thank god, you can just say more of what you mean now. If you want to do something or something, you literally say or. And if we were instead-- albeit nonsensically here-- trying to do the conjunction of two things, this and that, you could literally say and. So instead of the two vertical bars or the two ampersands, here's another slight difference in Python. Let's now take a look at another example reminiscent of ones past, this one called return.py. So here's an example where it's actually more compelling to have a main function because now I'm going to start organizing my code into different functions still. So up here, we are importing the get_int function from CS50 library. Here I have my main function just saying x gets get_int. And then print out the square of x. So how do you go about defining your own custom function in Python that's not just main? Well, here on line 11 is how I would define a function called square-- that takes, apparently, an argument called n, though I could call this anything I want-- colon, return, n, star star, 2. So a few new features here. But again, it's no big deal once you just kind of look these features up in a manual or in a class. What is star star probably doing? AUDIENCE: Square root. DAVID MALAN: Not square root. The power of, yeah. So n star star 2 is just n raised to the power of 2. That was not a feature we had in C. So now we get this in Python. And what's this line 12 in green with the weird use of double quotes? Yeah, it's a comment. And it's a different type of comment than we've seen before because in my previous example, I did have a few comments. Recall that just a moment ago, in conditions.py, we had a whole bunch of comments. Prompt the user for x. Prompt the user for y. Compare x and y. So whereas in C we were using slash slash, Python, unfortunately, uses that for floor division, so to speak. So we instead just use the hashtag or the pound sign to specify a line that should be thought of as a comment. 
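As a rough illustration of the last few points, the sketch below combines the word or, the star star operator, and both comment styles. The helper is_yes is a hypothetical stand-in for the yes-or-no check in noswitch.py, and the answers are hard-coded rather than read with get_char.

```python
def is_yes(answer):
    """Return True if answer is an upper- or lowercase y."""
    return answer == "Y" or answer == "y"   # "or" replaces C's ||


def square(n):
    """Return the square of n."""
    return n ** 2                           # ** raises n to a power


# A line that starts with a hash is an ordinary comment.
print(is_yes("y"), is_yes("n"))
print(square(5))
```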
But here is something a little different. And we won't dwell too much on this for now. But Python has different types of comments, one of which is this. This is technically called a docstring or document string. And what's nice about Python, as well as languages like Java and others still, is that you can put comments in your code that special programs can read, and then generate documentation for you. So if you ever took AP CS and you ever saw a Javadoc, this was a way of commenting your methods and your code in Java using funky @ signs and other syntax so that if you ran a special command, it could generate a user's manual for all of your functions and tell you or colleagues or friends or teachers exactly what all your functions are, what their arguments are, what their return values are, and all of that. Similarly in Python can you use these funky quote quote quote docstrings to document your function. So whereas in C our style has been to put comments above the functions, in Python it's going to be to put them as the first line inside and indented within the function. All right. So now let's actually try to port a program from code again, thinking back on week one in C when we had this program here. So there's quite a bit going-- oops, spoiler. Don't look at that. Hopefully, that didn't sink in just yet. So in week one, we had this program in C, get_positive_int. And its purpose in life was to write a program that gets a positive integer from the user, in and of itself not all that interesting. But it was an opportunity to introduce a few things. One, we introduced this line 6 several weeks ago, which is known as a prototype. And what was the purpose of having that function prototype up there? Yeah, you declare the function, but why? Because it's already implemented down here on line 15. AUDIENCE: The way the program runs, it needs to be in order or something like that. DAVID MALAN: Yeah. 
Because of the way the program's run, and frankly because of how sort of naive or dumb that clang is by design, it does not know that a function exists until it actually sees it. So the problem is that if in C, you have main, inside of which is a call to a function like get_positive_int, but it's not implemented until a few lines later, clang is going to be dumb and just not know that it even exists. And it's not going to compile your code. So this prototype, as we called it, is kind of a teaser, a hint, that doesn't implement the whole function. It just shows the compiler its return type and its types and order of parameters so that that's enough information to then just trust that if I just blindly compile main, eventually I'm going to see the actual implementation of the function. So I can compile its bits, as well. So in here, in C, we called get_positive_int, and then we passed in a prompt. We stored it in a variable called i, and then printed it out. And then to implement this, we used kind of a familiar construct that you've used in other programs. Pretty much anytime you want to prompt the user for input, and you want to keep pestering him or her until they cooperate with whatever your conditions are, you would use the so-called do-while loop. And because the do-while loop, recall, is distinct from the while loop how? AUDIENCE: It runs at least once. DAVID MALAN: It runs at least once, which just kind of makes intuitive sense if you want to prompt the user for something. And then if he or she doesn't cooperate, only then do you want to prompt them again. By contrast with a while loop, it's going to happen again and again and again no matter what, from the get-go. So let's see if we can't now convert this or port this, as people would say, to Python. So here I'm going to go ahead and save a new file called positive.py. And I'm going to go ahead and do everything here in main, as before. 
So I'm going to go ahead and do, let's say, from cs50 import get_int, because I do need that. And then I'm going to go ahead and have my main method here. And then inside of main, just like on the left-hand side, I'm going to do i gets get_positive_int-- positive integer, please. It's going to wrap a little bit now. That's fine. And then I'm going to go ahead and print this, which, recall, is just print an f string where the placeholder is i, although, frankly, this is kind of stupid, to just create a string that has nothing other than the value we want to print. Nicely enough in Python, just print what you want. And so that simplifies that argument. So now it remains to implement get_positive_int, which is going to take some kind of prompt as its input. And notice I'm not specifying the data type of prompt, which is string. I'm not specifying the return type of this function. But both actually do exist underneath the hood. So in the past, to get a variable, I would do something like this, semicolon. But I know I don't need the semicolon. I know I don't need the data type. And this just looks stupid to just put a variable there to need it. You don't need to do this in Python. If you want to use a variable, just start using it. And unfortunately, whereas almost every other feature we've seen in Python thus far kind of maps directly back to a feature in C, Python does not have a do-while. So it has the for-in, and it has while. And maybe it has other things we haven't told you about. But it doesn't have do-while. So knowing that, and knowing only what we've presented thus far, how do we still go about getting an int from the user and ensuring it's positive and reprompting him or her if and only if it's not? Put another way, how would you do this in C if we took away from you the do-while construct? Exclamation points? OK. So we could invert something, maybe, using that logically. AUDIENCE: You can just do a while loop. DAVID MALAN: We could just use a while loop. How? 
AUDIENCE: So while prompt is less than 1. DAVID MALAN: So while prompt is-- OK, so the prompt is the string we're going to display to the user. So it's not prompt, I think. So maybe i or n, to be consistent with the other side. So you know what? Why don't I-- what about this? What if I just do-- you know what? I know I need a loop. This is by far the easiest way to just get a loop, right? It's infinite, which is not good. But I can break out of loops, recall. So what if I do something like this? What if I do n gets get_int, passing in the same prompt? And then what do I want to do next? I'm inside of an infinite loop. So this is going to keep happening, keep happening, keep happening until-- is positive? So Python's not quite that user-friendly. We can't just say that. But we can say what? AUDIENCE: Greater than 1. DAVID MALAN: Greater than-- close. AUDIENCE: Equal to. DAVID MALAN: OK, that's fine. Greater than or equal to one. Then what do we want to do? Break. So it's not quite as cool as, like, a do-while loop, which kind of gives us all these features, though frankly, this was never that pretty, right? Especially the fact that you had to deal with the issue of scope by putting the variable outside. So in Python, the right way to do this would be something like this. Just induce an infinite loop, but make sure you break out of it logically when it's appropriate to do so. And so now if I go ahead and add in that last thing that I keep needing-- so if name equals main, and it's always fine to copy-paste something like that, call main. Let me go ahead now and in my terminal window run python of positive.py. And let me go ahead and give it negative 5. How about negative 1? How about 0? Whoops. How about that? How about 0? 1? Hmm. I screwed up. None is interesting. It's kind of our new null, so to speak. But whereas in C, null can, potentially, if used in the wrong way, crash your program, Python might just print it, apparently. Where did I screw up? 
Yeah, so I didn't return an actual value. And whereas clang might have noticed something like this, Python, the interpreter's not going to be as vigilant when it comes to figuring out if your code is missing something. Because after all, we never said we were going to return anything. And so we don't strictly need to. So what could I instead do here instead of break? I could just return n here. Or I could equivalently do this, and then just make sure I return n here. And another difference in Python, too, is that the issue of scope isn't quite as difficult as it was in C. As soon as I've declared n to exist up here, it now exists down below. So even though it was declared inside of this indentation, it is not scoped to that while loop alone. So either way could we actually make this work. OK, so now let's try to run this again. Positive integer. Negative 1. 0. 1. And now we're actually seeing the number 1. All right. Let me pause here for just a moment and see if there's any questions. No? Yes. AUDIENCE: Do you have to call things from the CS50 library individually, or can you just import the entire library? DAVID MALAN: Ah, good question. Do you have to call things inside of the CS50 library individually, or can you import the whole thing? You can technically import the whole thing as follows. If you want access to everything in the CS50 library, you can literally say star. And a star in programming-- well, in many computer contexts, star generally is a wildcard character. And it means anything that matches this string here. This is generally considered bad practice, though. Because if CS50 staff happens to give you functionality or variables that you don't want, you have now just imported into your namespace, so to speak, all of those functions. So for instance, if the CS50 library had public inside of it a variable called x and y and z in addition to functions like get_string and get_int and get_char, your program is now seeing variables x and y and z. 
And if you have your own variables called x and y and z, you're going to shadow those variables inside ours. And it just gets messy quickly. So generally, you want to be a little more nitpicky and just import what you want. Or, another convention in Python is to not specify it like this, but instead to do import cs50. This does not have the same effect of importing all of those keywords like get_int and get_string into your program's namespace, like the list of symbols you can actually type in. But what you then have to do is this-- you have to now prefix any usages of the functions in that library with the now familiar or more familiar dot operator. So this is just a stylistic decision now. I have consciously chosen the other approach so that initially, you can just call get_int, get_string, just like we did in C. But technically and probably more conventionally would people do this to make super clear this isn't my get_int method. It's CS50's get_int function. OK. Other questions? Yeah. AUDIENCE: Is it good coding practice to do the if __name__ or just-- because you can run hello, world without defining main. Do you really need to do-- DAVID MALAN: Oh, it's a good question. Short answer, no. So I'm showing you this way because you'll see this in various examples online and in programs that you might look at that are open source. Strictly speaking, this is not necessary. If you end up making your own library, this tends to be a useful feature. But otherwise, I could equivalently do this, which is perfectly fine as well. I can still define get_positive_int. I can get rid of main altogether. And I can just now do this. So this program is equivalent and just as fine for now. OK. So with that said, let's do a couple of more examples here to kind of paint a picture of some of the things that are similar and different. And let's go ahead and open up, for instance, overflow.c from some weeks ago, splitting our windows again. 
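For reference, the loop-and-return idea behind get_positive_int can be sketched as follows. This is not the lecture's exact file: it uses the built-in input and int in place of CS50's get_int, and the read parameter is a hypothetical addition so the function can be exercised with scripted answers instead of a live keyboard.

```python
def get_positive_int(prompt, read=input):
    """Keep asking until the answer is an integer greater than or equal to 1."""
    while True:                      # Python has no do-while, so loop forever...
        try:
            n = int(read(prompt))
        except ValueError:           # not a number at all: ask again
            continue
        if n >= 1:
            return n                 # ...and leave the loop by returning


# Scripted answers stand in for a person typing at the prompt.
answers = iter(["-5", "0", "1"])
print(get_positive_int("Positive integer, please: ", read=lambda prompt: next(answers)))
```

Returning from inside the loop sidesteps the bug seen in the demo, where breaking out without a return statement left the function handing back None.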
And then on the right-hand side, let me open up something called overflow.py, which I put together in advance. So here we have on the left an example of integer overflow, whereby if I start counting at 1, and then don't even have a condition, and I just keep multiplying i by 2, by 2, by 2, doubling it, doubling it, doubling it, doubling it, we know from C that bad things happen if you just kind of keep incrementing something without any boundary in sight. So this program is just going to print out each of those values, and it's going to sleep one second in between. Same program in Python looks pretty similar. But notice I'm initializing i to 1, doing the following forever-- printing out i, multiplying i by 2, and then sleeping for one second. But sleep is also not built into Python in the way that print is. Notice what I had to include up here. And I wasn't sure what that was. And so honestly, just a few days ago, I googled, like, "sleep one second Python," saw that there's this time library, inside of which is a sleep function. And that's how I knew which library to actually include. And so just as there are man pages for C, there's a whole documentation website for Python that has all of this information, as well. So let me go ahead and do this. And let me actually try to create two windows here. What's the best way for me to do this? Split one to two. OK. So let's do this, just so I can run this in the same place. So if I go into my source-- [POPPING NOISE] Jeez. My source 8 directory, and I go into weeks and one, and I make overflow-- nope, sorry. Week one. OK. So if I go into source one, and I do make overflow, which is kind of cute semantically, I'm now going to be able to run a program called overflow. Meanwhile, over here, let me go ahead and split this window, too. Dammit, not there. Let's put this over here. Oh, no! OK. One second. Sorry. Overflow.py. OK. So now we're-- oh, now I lost the other window. Oh, that's cool. OK. So let's do this. OK. 
Now I know how to use the IDE. All right. So on the left-hand side, I'm about to run overflow. And then lastly, without generating that beep again, I'm going to go in here. And I'm about to run python of overflow.py. All right. And so the left will run the C version. The right will run the Python version. And we'll start to see-- no pun intended-- what happens with these programs. Oh, damn it. I got to scroll. OK, so I'll just keep scrolling for us. This is fun. OK. OK. Next time, Google how to sleep for half a second instead. OK. So there we go. Something bad has happened here. And now C is just completely choking. Things are in a funky state. So what happened on the left, before the answer scrolls away? Integer overflow, right? We had so many bits becoming ones, that eventually, it was mistaken for a negative number temporarily. And then the whole thing just kind of got confused and became permanently zeros. Whereas on the right-hand side, like, yeah, Python. Look at you go. Still counting higher and higher and higher. And even though we haven't talked about the underlying representation of these types in Python, what can we infer from the apparent better correctness of the version on the right in Python? It's not an eight-bit representation. And even C, to be fair, uses 32 bits for its ints. And that's why we got as high as 2 billion or 4 billion in total. But same idea. How many bits must Python be using? AUDIENCE: 64? DAVID MALAN: Yeah, maybe 64. I don't know exactly. But I know it's not 32 because it keeps counting up and up and up. And so this is another feature of Python. Whereas int in C has typically been for us 32 bits-- although that is technically machine-specific-- Python integers are now going to be 64, which just means we can do much bigger math, which is great for various data-science applications and stats and whatnot, where you actually might have some large data sets to deal with. Unfortunately, we still have some issues of imprecision. 
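The doubling experiment can be reproduced without the one-second sleep. The sketch below assumes Python 3, where an int is not pinned to a fixed width at all: it grows as needed, so the count sails past both the 32-bit and the 64-bit boundaries. The list named values is only there to make the results easy to inspect.

```python
i = 1
values = []
for _ in range(70):      # double 70 times, no sleeping in between
    values.append(i)
    i *= 2

print(values[31])        # 2147483648, one past the largest 32-bit signed int
print(values[64])        # 18446744073709551616, one past the largest 64-bit unsigned value
print(i.bit_length())    # 71
```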
Let me go ahead and close a whole bunch of these windows and go ahead and open up, for instance, just this one here. OK. No, I'm going to skip this and do something slightly more fun, which is this. So in Python here, let's do a quick warm-up. This is going to print for me what? AUDIENCE: Four question marks. DAVID MALAN: 4 question marks, right? And this is reminiscent-- this is like a really cheap version of "Super Mario Bros." And if you think back to week one, where we explored this, there was a screenshot I had of "Super Mario Bros," one of the worlds where we just had four question marks which Mario could hit his head against to actually generate a coin. So we stepped up from there in C to do this instead. And this is going to give us another feature. But let's see if we can't start to infer from context what these programs do. Here's another one, mario1. What's this do? It's using a loop, for sure. And it's using how many iterations, apparently? Four. So from 0 to 1 to 2 to 3, total. Each time, it's going to print out, apparently, a question mark. But now, just infer from this-- I haven't answered this question already-- what else is going on line 4 and why? AUDIENCE: It's not going to a new line. DAVID MALAN: Not going to a new line, right? So there's always this trade-off in programming and CS more generally. Like, yay, we took away the backslash n, which was annoying to type. But now if it's always there, how do you turn it off? So this is one way to do that, and it also reveals another fundamental feature of Python. Notice that print apparently takes, in this case, more than one argument. The first is a string-- literally quote, unquote, and a question mark. The second is a little funkier. It's like a word, end. It's then an equal sign, and then it's a quote mark. So what is this here? So it turns out Python supports what are called named parameters. So in C, any parameters you pass to a function are defined, ultimately, by way of their order. 
Because even if a function takes arguments that have names, like x and y or a and b or whatever, when you call the function, you do not mention those names. You know they exist, and that's how you think about them in the documentation or in the original code. But you don't name the arguments as you pass them in and call a function. You instead pass them in in the appropriate order per the man page or per the documentation. In Python, you can actually be a little more flexible. If a function takes multiple arguments, all of which have names, you can actually mention the names explicitly, thereby freeing you from the minor inconvenience of having to remember and always get right the actual order of arguments. So in this case, print apparently takes at least two arguments in this case, one of which is called end. And if you want to use that one, which is clearly optional because I haven't used it yet, you can literally mention it by name, set an equal sign, and then specify the value that you want to pass in. So if I actually now go into this and go into weeks and 1 and do python of mario1.py, I'll still get-- in week two. If I get mario1.py, I still get four question marks. But that's the result of printing this with a line ending of quote, unquote. If I do this, meanwhile, it's a little stupid because I'm going to get that for free if I just omit it altogether. But now I get four question marks here. And if you really want to be funky, you can do something like this, which is just going to be taken literally to give you that instead. Unclear utility of taking this approach. But that's all-- [POPPING NOISE] Sorry-- that's going on. Let's take a look at mario2. This one works a little differently, as well. And how would you describe the feature offered by this version of mario? Prints any number of question marks perfectly. So it's parameterized by first getting an int from the user using CS50's get_int function. 
And now I'm iterating from i to the range of n, whatever that is, and then actually printing out the question marks. Meanwhile, in mario3.py, a little fancier still. But what am I doing a little better now? AUDIENCE: You're making sure that the n is positive. DAVID MALAN: Yeah, I'm just making sure that the n is positive. So I didn't bother implementing a whole function called, like, get_positive_int. I don't need that. This is a super-short program. I'm just using the same logic up here-- inducing, deliberately, an infinite loop, breaking out of it only when I've gotten back a positive integer, and then printing out that many hashtags, reminiscent of the bricks in Mario. And then lastly, we have this slightly more sophisticated version that actually prints out a different shape altogether. You can infer from the comments, but focus more on why. So this first line 12 iterates from i to n, whatever n is that the user typed in. Meanwhile, line 15, indented, iterates from j from 0 up to n, as well. So this is kind of like our canonical for int i gets 0, dot, dot, dot, for int j gets 0, dot, dot, dot, where we've had nested loops in the past. So notice now that we have this building block, which is a line of code or kind of conceptually just a Scratch piece. We can embed one inside of the other. Here I can print out a hashtag, making sure not to put a new line after every single hashtag I print out, only printing out a new line on line 17, on each iteration of the outer loop. And now notice whereas in C we would have done this historically-- and that's fine-- in Python, we don't need the f. And we also don't need the backslash n. So ergo, you can simply do print, and you'll get, if nothing else, a backslash n automatically. So that now when I run this version of Mario, we now get something more interesting. And I'll increase the size of my terminal window for this so that I can enter a positive number like this and print 10. And now we've got a whole block. 
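The nested-loop version just described can be sketched like so. The function name draw_block is hypothetical, and the size is passed in directly rather than read from the user, but the two uses of print mirror the ones in the walkthrough: one with the named parameter end to suppress the newline, one bare to emit it.

```python
def draw_block(n):
    """Print an n-by-n block of hashes."""
    for i in range(n):               # one pass per row
        for j in range(n):           # one pass per column
            print("#", end="")       # named parameter: stay on the same line
        print()                      # bare print: just the newline


draw_block(3)
```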
So that was a lot. Let's go ahead and take our five-minute break here. And when we come back, we'll look at some more sophisticated examples still. All right. So let's begin to start to transition to actually solving problems with Python after introducing just a couple of additional features that aren't so much syntactic but actual features of the language. So here on the left was an old program we wrote in week three called argv0.c. And its purpose in life was simply to allow you to run a command-line argument for the very first time. And that was a nice tool to have in our toolkit. So how might we go ahead and map this? Well, we actually need to know how Python works a little bit differently as follows. If I go ahead and open a new file called-- let's call it argv0.py. I'm going to go ahead and translate this, just as we did earlier. So I'm going to go ahead and want to use the following. So if argc-- so there is no argc. And actually, so def main-- there was also no argc or argv. And it's not actually correct to do this and this, as you might assume. It turns out that the feature of command-line arguments are provided by a Python package, so to speak, or a library, much like the CS50 library is a package that you can import in Python speak. So to do this, I actually need to do this-- import sys, which gives me access to a whole bunch of system-related stuff, like what the user has typed at the command prompt. And if I want to check if the number of words that the human typed at the prompt is two, I actually am going to do this-- if the length of sys.argv equals 2, then I'm going to go ahead and print out quote, unquote, "hello," and then a placeholder here. I know for placeholders, I need to turn this into a formatted string, so an f string there. And now inside of the curly braces, it turns out I can do sys.argv[1]. So it's a little different from before. But notice I'm borrowing almost all the same ideas as earlier, including how we're printing out strings. 
And even though this is a little more verbose, what is between these two curly braces? Well, it's the result of looking in the system package, which has a variable called argv, for argument vector, just like in C. It is itself an array, AKA a list in Python. And here we have the result of indexing into element one of that list. And the way that I have access to this is because I've imported that whole package. So if on the right-hand side here I go ahead, after saving that file, and I do python of argv0.py, I see nothing. But if I actually say, like, my name here, I see, "hello, David." So a very similar program, but implemented a little differently. And you'll notice, too, that the length of an array, henceforth known as a list, is not something that you yourself have to remember or keep around. You can just ask a list how long it is by calling the len-- or L-E-N for length-- function, passing it in as an argument. So that's one of the takeaways there. And if we actually want to do something a little more clever, like print out all of the strings in argv, well, back in the day in C, you might recall this example-- argv1.c, wherein I had this for loop. And I iterated from zero on up to argc, the argument count, printing out each of the arguments in that vector. Python actually makes even something like this even simpler. Let me go ahead and create a new file here. And I'll call this, say, argv1.py. And it turns out in Python, I can similarly just import sys, and then do, honestly, for s in sys.argv, print s. Done. So again, kind of just says what it means. So I've imported the system library. sys.argv I know to be a list, apparently, of command-line arguments. For something in something is a new syntax we have for for loops. So for some variable s inside of this list, go ahead and print it. And so it's a much cleaner, much more succinct way of honestly getting rid of all of the complexity of this by just saying instead what we mean. 
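Both command-line programs above fit in a few lines. In this sketch the helper greet is hypothetical: it takes the argument list as a parameter so the length check can be tried with a made-up list, while the final loop walks the real sys.argv exactly as argv1.py does.

```python
import sys


def greet(argv):
    """Return a greeting if exactly one word follows the program's name."""
    if len(argv) == 2:               # len asks the list how long it is
        return f"hello, {argv[1]}"   # index into the list inside the f string
    return None


print(greet(["argv0.py", "David"]))

# Print whatever this script was actually run with, one argument per line.
for s in sys.argv:
    print(s)
```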
Meanwhile, if I wanted to print out every character, I can take this one step further. So back in the day in C if I wanted to print out every command line argument and every character therein I could do this. I just need a couple of nested loops, wherein via the outer loop, I iterate over all of the arguments passed in. And on the inner loop, I iterate over the current string length of whatever argument I'm printing. And this had the effect of printing out all of the command-line arguments' letters one at a time. I can do this in Python, honestly, so much easier. So let me go over here. Let me create a new file called argv2.py. Let me import sys, as I did. So import sys. And then for s in sys.argv, for c in s, print c. Done. So what is this doing? Gone is all of that overhead of for int i and for int j and so forth. For s in sys.argv iterates over all of the elements of that list, one string at a time. For c in s is a little different because s is technically a string or a str object, as we're going to start calling it. But at the end of the day, a string is just a sequence of characters. And it turns out Python supports, out of the box, the ability to use a for loop even to iterate over all of the characters in a string. And so c-- I just mean char. So for c in s, that gives me each of the characters. So now at the end here, if I go ahead and run python of argv2.py with nothing, I get just the program's name because that's, of course, the very first thing in argv, as in C. And if I write, say, a word like "Maria" here, I get argv2.py, Maria, all in one long column because of the additional prints that are happening and the implicit new lines. So any questions before we proceed on this use of a package called sys using these functions therein? All right. So let me skip ahead, then, to something slightly familiar, too. 
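The two nested for loops can be tried without touching the real command line. The sketch below assumes a hypothetical helper, letters, that collects the characters into a list instead of printing them straight away; the iteration itself is the same for s in the list, for c in s.

```python
def letters(argv):
    """Collect every character of every argument, in order."""
    result = []
    for s in argv:          # each element of the list is a string
        for c in s:         # each element of a string is a one-character string
            result.append(c)
    return result


for c in letters(["argv2.py", "Maria"]):
    print(c)                # one character per line, as in the demo
```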
Let me go ahead-- and you might recall initials.c from some time ago, wherein we accepted as input to get_string a user's name, and then we printed out their initials. So let's go ahead and do that. So from CS50, let me go ahead and import get_string. Then let me go ahead and say, get me a string. And I want the user to be prompted for their name, as we might do here. Then let me go ahead and say, all right, their initials-- I don't know what they are yet. So let me just initialize an empty string. But then do this-- for c in s, which is for each character in the person's name, if-- and I don't know how to say this yet. If c is an uppercase letter, then go ahead and append c to initials. And then down here, print initials. So I've left a couple of blanks. That's just pseudocode for the moment. But this line 5, just to be clear, is doing what for me? What is being iterated over? The string. So for each character in the string, for c in s, I'm going to ask two questions. So in C, we did this in a couple of different ways. We can actually do it with inequality checks and actually considering what the underlying ASCII values are. The ctype library had that isupper function and islower that we use. Well, it turns out because c is itself not a char, there is no such thing, technically, as a char in Python. You have only strings of length 1. And this is why single quotes no longer have any special meaning. It turns out c is technically just a one-character string. Strings are what we've started calling objects, which is a fancier name for struct. So inside of an object like a string is functionality. And we saw one piece of functionality earlier, which was what? Not length, though that is another one. It was format. We saw it briefly. But when I did the string.format, I proposed that there's actually built-in functionality to a string called format. Well, you know what? It turns out there is a method or function inside of the string class also called isupper. 
And I can ask the very string I'm looking at that question by saying if c.isupper is true, then go ahead and append c to initials. So in C, if initials were technically a string, how could you go about appending another character to a string in C? AUDIENCE: c.append? DAVID MALAN: Not in C. In C. So in C, the language. OK, so what's a string in C? A string in C is a sequence of characters, the last one of which is backslash 0. All right. So it's an array of characters, last of which is backslash 0. So if I, for instance, typed in my first name, "David," and now I want to append "Malan" to the end of it, how do I do that in C? AUDIENCE: [INAUDIBLE] DAVID MALAN: Exactly. It's like an utter pain in the neck. You have to create a new array that's bigger, that can fit both words, copy the "David" into the new array, then copy the last name in, then put the null terminator at the new array, then free, probably, the original memory. I mean, it's a ridiculous number of hoops to jump through. And you've done this on occasion, especially for things like, perhaps, problem set five. But my god, we're kind of past that. Just go ahead and append to the array the character you care about. So in this case, not an array, but a list. Sorry, not an array but a string object that's initially blank. It turns out that Python supports this syntax-- plus equals typically means arithmetic and add one number to another. But it also means append. So you can simply append to initials by doing plus equals c, one additional character. So even though the string starts like this, and this big in memory, it's then going to grow for one character, grow, grow, grow, grow, until it has all of the user's initials. And as for where that memory is coming from, who cares? This is the point that we're now past. You leave it to the language. You leave it to the computer to start to manage those details. And yes, if it needs to call malloc, fine. Do it. Don't bother me with that detail. 
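A minimal sketch of the initials logic described here. It assumes the name is passed in as an argument rather than read with CS50's `get_string`, and the function name `initials_of` is hypothetical.

```python
def initials_of(name):
    """Collect the uppercase letters of name, in order (hypothetical helper)."""
    initials = ""             # start with an empty str
    for c in name:            # c is a one-character str
        if c.isupper():       # str method, similar in spirit to isupper in ctype.h
            initials += c     # += appends to a str
    return initials


if __name__ == "__main__":
    print(initials_of("David J. Malan"))  # prints DJM
```

Note that `initials += c` builds a new string each time under the hood; the language handles the memory, which is the point being made above.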
We can now start thinking and writing code sort of conceptually at this level, instead of at this level. So again, we're sort of abstracting away what a string even is and leaving it to the language itself. So if I now go ahead and run python of initials.py and type in, for instance, "Maria Zlatkova" here, with a capital M and a capital Z, I then see her initials because I've plucked out the capital letters. And if we do something else, like "David J. Malan," even with a period in there, it infers from the capitalization what my initials should actually be. So again, a much tighter way of doing things. Let me go ahead and now open up another example we didn't see a few weeks ago, though it was included in some of our distribution code, if you wanted to look. Some weeks ago, we had this program among the distribution code, where I declared an array of strings called book. And I proposed that there were these several names in the phone book, so to speak-- all of the past instructors of CS50 sorted alphabetically. And then down below in this C program, I used that global variable called book to implement, it seems, linear search. And to implement linear search in C, I'm going to need, of course, a loop to iterate over all of the strings. This line 26 does exactly that. I then in C, recall, had to use strcmp because remember we tripped over this issue early on where you can't just compare two strings in C because you'd be comparing, accidentally, their addresses, their pointers, not the actual value. So we used strcmp. And I could pass in the name that I'm looking for and the i'th name in book, one at a time, checking for equals zero. And then I can call Mike or David or whoever I'm trying to call, or just quit if the user isn't found. So what did this program actually do? If I go into this example, which, again, was from week 3, and I do make linear-- nope, not that. Wrong directory again.
If I go into source3 and make linear, this program is supposed to behave as follows. So if I go ahead and run ./linear, look for our old friend Smith, it found Smith. If I go ahead and search for, say, Jones, who did not previously teach CS50, it says "quitting." All right. So meanwhile, in Python, bless its heart, we can get rid of all of that. And in our source8 directory here and our subdirectory 3, let me go ahead and open this instead. In Python, I can declare an array, otherwise known as a list, almost in the same way. But what's different, just to be super clear? AUDIENCE: Brackets? DAVID MALAN: So the brackets are now square brackets instead of curly braces. And frankly, unless you statically initialized an array in C-- like hardcoded the values for your array in C-- you might not have even known you could use curly braces. So that's not a huge deal here. But in Python, square brackets here and here represent a list of elements, literally. And what else is different? Didn't declare the size of the array. And I technically don't have to do that in C, either, if you're hardcoding all of the values all at once. But there is something missing on line 7. AUDIENCE: Type. DAVID MALAN: Sorry? AUDIENCE: The type? DAVID MALAN: The type. I didn't specify string. But otherwise, this is pretty similar to what we've done in C. But what's beautiful here-- and let me go ahead and hide that for just a second. Let me go ahead and prompt the user for his or her name. So let's ask for the name here. And then if I want to search the book, which is just a list of names, how do I implement linear search? Well, I could just do if name in book, print "Calling name." And let's make this an f string. And then down here, that's it. So that's how you implement linear search in Python. You don't need a loop. You can just ask the question yourself. So if book is a list, and name is the string that you're looking for, just ask the language to figure this out for you.
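A minimal sketch of that search. The two names in `BOOK` are placeholders, since the full list from the lecture's file isn't reproduced in this transcript, and the function `lookup` is a hypothetical wrapper so the result can be checked without prompting a user.

```python
# Placeholder phone book; the lecture's actual list of names is not shown here
BOOK = ["Malan", "Smith"]


def lookup(name, book):
    """Report whether name is in book, using Python's `in` operator."""
    if name in book:                 # membership test over the list
        return f"Calling {name}"     # f-string plugs in the value of name
    return "Quitting"


if __name__ == "__main__":
    print(lookup("Smith", BOOK))   # prints Calling Smith
    print(lookup("Jones", BOOK))   # prints Quitting
```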
If name in book is the syntax you can use to ask literally that question. And then Python will use, probably, linear search over that list because it doesn't necessarily know it's sorted, even though it happens to be alphabetically. But it will find it for you, thereby saving us a lot of the complexity and time of having had to implement that ourselves. Meanwhile, if I want to compare two strings, let me propose this-- let me write a quick program here, compare1.py. And let me go ahead and from CS50 import get_string, as before. And now let me go ahead and get one string that I'll call s. And let me get another string that I shall call t, just as we did a few weeks ago. And now in C, this was buggy, right? If I print same, else I print different. So in C, just to be super clear, why was this incorrect, this general idea of using equals equals? Yeah, they're comparing addresses, right? This was like the day before we peeled back the layer of what a string actually is. And it turns out that s and t in C were char stars or addresses, which means certainly if you get two different strings, even if you've typed the same characters, you're going to be comparing two different addresses. They're not going to be the same. Now, you can perhaps infer from the theme of today-- what is Python going to do if asked if s and t are equal? It's gonna answer that question as you would expect as the human. Equals equals now, in Python, is going to compare s and t, look at their actual values because they are strings, and return same if you literally typed the same words. So in here, if I go in here and I do python of compare1.py, and I type in, for instance, "Maria," and then I type in "Maria," they're indeed the same. If I type in "Maria" and, say, "Stelios," they're different because it's actually now comparing the strings, as we would have hoped some time ago. So let's take a look at another that kind of led to some interesting quandaries. 
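A minimal sketch of the comparison in compare1.py, assuming the two strings are passed in rather than read with `get_string`; the function name `compare` is hypothetical. The second string in the usage example is assembled at run time to underline that `==` looks at the characters, not at where the strings live in memory.

```python
def compare(s, t):
    """Return "same" if s and t hold the same characters, else "different"."""
    if s == t:          # == on str compares values, not addresses
        return "same"
    return "different"


if __name__ == "__main__":
    first = "Maria"
    second = "".join(["Ma", "ria"])     # a separately built string with equal contents
    print(compare(first, second))       # prints same
    print(compare("Maria", "Stelios"))  # prints different
```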
You might recall in week four, we had this example in C-- noswap, so named because this just did not work. It was logically seemingly correct. But swap did not actually swap x and y, but it did swap a and b. Why? AUDIENCE: The memory locations? DAVID MALAN: The memory locations were different, right? So x and y, recall, are variables in C that exist in a certain slice of memory that we called a frame on the stack, main's frame on the stack. Meanwhile, a and b are from a slightly different location in memory. We sort of kept drawing it slightly above, like a tray at the dining hall on the so-called stack. a and b had the same values as x and y-- 1 and 2-- but their own copies of them. So even though we logically, as with Kate, I think, with the Gatorade, swapped the two values, we ultimately swapped the wrong two values without actually permanently mutating the original x and y. So unfortunately-- well, fortunately and unfortunately in Python, there is no such thing as a pointer. So those are now gone. So we no longer have the expressiveness with which to solve this problem that way. But let me propose that we do it in another, oh-so-clever way. Here let me go ahead and declare x is 1, y is 2. Let me go ahead and print out as much. So with a format string, I'm going to go ahead and say x is x, y is y, plugging in their respective values. I'm going to do that twice. But in between, I'm going to try to perform this swap. And if your mind's ready to be blown, you can do that in Python, do the old switcheroo in Python. And this actually does swap the two values as you would expect. Now this is not a very common case. And to be fair, this is an incredibly contrived example because if you needed them swapped, well, maybe you should have just done this in the first place. But it does speak to one of the features of Python, where you can actually do something like that. Let me introduce now one additional feature that we only recently acquired in C.
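The swap itself isn't spelled out in the transcript; the sketch below assumes it is Python's tuple-style assignment, which is the usual way to do this.

```python
x = 1
y = 2

print(f"x is {x}, y is {y}")   # prints x is 1, y is 2

# The right-hand side is evaluated first, then assigned left to right,
# so no temporary variable is needed
x, y = y, x

print(f"x is {x}, y is {y}")   # prints x is 2, y is 1
```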
And that's the notion of a struct. And let me go ahead and do this in code from scratch. So let me go ahead and save this file proactively as struct0.py, reminiscent of one of our older programs. And let me go ahead and do this. From CS50 import get_string. And then let me give myself an empty list. So that would be a conventional way of giving yourself an empty list in Python. And much like in C, you can declare an empty array. But in C, you have to know the size of it or, if not, you have to use a pointer. And then you have to malloc. No. All of that is gone. Now in Python, you want a list? Just say you need a list. And it will grow and shrink as you need. Now I'm going to go ahead and just three times, arbitrarily, for i in the range of 3, let me go ahead and ask the user for a name using get_string. And I'll ask him or her for their name. Dorm will use get_string, as well. Dorm here. And then I want to append to the array this student. So I could do something like this-- students.append name. And it turns out-- and we've not said this yet. But there is inside of the list data type a method-- that is, a function-- built into it called append that literally does that. So if you've got an otherwise empty list, and you call that list's name dot append, you'll add something to the end of the list. And if there's not enough memory for it, no big deal. Python will find you the memory, allocate it, move everything into it, and you move on your way without having to worry about that. But I don't want to store just the name. I want to store the name and the dorm. So I could do this. I could do-- well, maybe this isn't really students. Maybe this is now, like, dorms. And then here I could do dorms.append dorm. But why is this devolving now into bad design if my goal was to associate a student with his or her dorm, and then keep those values together? Why is this not the best approach in Python or, back in the day, even in C, to have two separate arrays? AUDIENCE: Like struct?
DAVID MALAN: What's that? AUDIENCE: Struct? DAVID MALAN: Well, it's twice as many things to maintain, for sure. And what else? AUDIENCE: You can't map them to each other. DAVID MALAN: You can't map one to the other. It's just-- it's very arbitrary. It's sort of this social contract that I will just assume that student 0 lives in dorm 0. And student 1 lives in dorm 1. And that's fine. And that's true. But one of the features of programming and computer science is this idea of encapsulation, like, associate related memory with each other. And so what did we do in C instead? We did not have two arrays. AUDIENCE: We had a struct. DAVID MALAN: Yeah, we had a struct. And so Python doesn't have structs per se. It instead has what are called classes. And it has a few other things like tuples and namedtuples, but more on those some other time. So it turns out I could actually implement my own notion of a student. And I could import it like this. The convention in Python is if you create your own struct, henceforth called a class, you capitalize the name of it by convention. So a little different from C conventions. So what is a student going to look like? This is perhaps the most complex syntax that we'll have today, but it just has a few lines. If you want to implement the notion of a student, how might you do this? Well, in Python, you literally say class Student, where class is similar in spirit to-- just to be clear-- struct or typedef struct. But in Python, we're just saying class. And then this is the funky part. You can declare a function that by convention must be called init for initialize that takes as its first argument a keyword called self, and then any number of other arguments like this. And then, for reasons that will hopefully be clear momentarily, I can write some code inside of this method. So long story short, what am I doing? I have declared a new type of data structure called Student. 
And implicitly inside of this data structure, there are two things inside of itself-- something called name and something called dorm. And this is how you would in a C struct typically do things with the data types and semicolons inside of the curly braces. Meanwhile, there's this method here. And it's a method insofar as it is inside of a class. Otherwise it's a function, just by a different name. This method init takes whatever self is-- more on that another time. But it then takes zero or more custom arguments that you can provide. And I called it name and dorm. So it turns out this special method init is a function that's going to be called automatically for you any time you create a student object. So what does that actually mean? That means in your code, what you can actually do is this. I can create a student in memory by saying s gets capital Student, passing in name and dorm. And we don't have this feature in C. On the right-hand side, what I've highlighted is the name of the class and its two arguments-- name and dorm, which are just what the user has typed in. What this class does for me now is it allocates memory underneath the hood for a Student. It's got to be big enough for their name and big enough for their dorm. So it's, like, yay big in memory, so to speak. It then puts in the name and the dorm strings into that object, and then returns the whole object. So you can kind of think of this as a much fancier version of malloc. So this is allocating all the memory you need. But it's also installing inside of that memory the name and the dorm. And it's bundling it up inside of not just an arbitrary chunk of memory, but something you can call a Student object. And all that means that now for our students, we can just go ahead and append that student to the list. So now if later I want to iterate over for student in students, I can go ahead and print out, for instance, that student.name lives in student.dorm, close quote. 
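A minimal sketch of the Student class and the loop over the list. The method the lecture calls init is spelled `__init__` in code. The names and dorms are hardcoded here in place of the `get_string` prompts, and the helper `describe` is hypothetical.

```python
class Student:
    """A student's name and dorm kept together in one object."""

    def __init__(self, name, dorm):
        # Called automatically whenever Student(...) is evaluated
        self.name = name
        self.dorm = dorm


def describe(students):
    """Return one sentence per student (hypothetical helper)."""
    return [f"{student.name} lives in {student.dorm}" for student in students]


if __name__ == "__main__":
    students = []                        # an empty list that grows as needed
    for name, dorm in [("Maria", "Cabot"), ("David", "Mather"), ("Rob", "Kirkland")]:
        s = Student(name, dorm)          # allocates and initializes the object
        students.append(s)
    for line in describe(students):
        print(line)
```

Keeping name and dorm inside one object avoids the two-parallel-lists design questioned above, where index 0 in one list silently has to match index 0 in the other.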
And if now over here-- whoops, close that. Now over here, if I go ahead and run python on struct0.py-- oh, no! Oh, thank you. That goes there. So now-- dammit. Missing curly-- oh, thank you. OK. So now if I want to go ahead and type "Maria" and "Cabot" and "David" and "Mather" and "Rob" and, say, "Kirkland," now we get all three of those names. And there are other ways, too, if we wanted to actually store this thing on disk. But I'll defer that to an example online. Let's look at one final example that will hopefully either make you regret the past several weeks or embrace the next several instead. So you'll recall that-- though the former, I suppose, could be true even without my help. So if I go into now today's distribution code, you will see this program. And we won't walk through all of its lines. But this is a program written in Python called speller. And what I did was literally sit down with speller.c from problem set 5. And I just converted it from left to right, from C to Python, implementing it in Python in as close to an identical way as I could, just using features of Python. So just skimming this, you'll see that apparently my implementation of speller in Python has a class called Dictionary which is very similar in spirit to dictionary.h in C. Notice that I still have a constant here. Or it's not technically a constant, but a variable called length equals 45. I hardcoded in dictionary/large, as speller.c did, too. I'm using command-line arguments, as we saw earlier, but this time in Python instead of C. Notice you can do funky things like this, which is reminiscent of our swap trick just a little bit ago. If you want to declare multiple variables all on the same line and initialize them, you can just enumerate them all with commas. Then on the other side of the equal sign, enumerate with commas the values that you want to assign to those variables.
And then down here, if I keep scrolling, you'll see code that we won't get into the weeds of, but some familiar phrases. So this is the program that actually runs a student's dictionary on some input, and then prints out per all of this stuff at the bottom all of the familiar phrases that you might recall from problem set five. So this took a lot of work, most likely, to implement in C. And understandably, you might have used a linked list initially, or ultimately you might have used a hash table or a trie or struggled with something in between those two. And that is a function of C. C is difficult. C is challenging because you have to do everything yourself. An upside, though, of it is that you end up getting a lot of great performance, theoretically. Once you have implemented the code, you're kind of as close to the hardware as possible. And so your code runs pretty darn well and is dependent only then on your algorithms, not on your choice of language. So here let me go ahead and implement a file called dictionary.py. And let me propose that the words-- the equivalent, sorry, of dictionary.h would be this file here. And it's going to have a function called check, which takes in an argument called word. It's going to have a function called load, which takes in an argument called dictionary. It's going to have a method called size, which takes in no arguments other than itself. And then it's going to have a method called unload, which also takes no arguments other than itself. So if we were instead to have assigned problem set five in Python, we essentially would have given you a file called dictionary.py with these placeholders for you because recall in pset five, those were all to-dos. Strictly speaking, there would be one other here. We would probably have a def init because every class in Python, we'll see, will typically have this init method, where we just are able to do something to initialize the data structure. So let me go ahead and do this.
We don't know that much Python yet. And we're taking for granted that speller, in fact, works. But let me go ahead and load some words in a dictionary. So here is my method called load. Dictionary is going to be the name of the dictionary to load. So you guys implemented this yourself by loading those files from disk. In Python, I'm going to do this as follows. Give me a file and open it in read mode. Iterate over each line in the file. Then go ahead and add to my set called words the result of that line by stripping off the end of it backslash n. Then go ahead and close the file, and then return true because I'm done implementing load. So that is the load method in Python. Happy, yes. OK. So check. Check was a struggle, too, right? Because once you had your hash table, or once you had your trie, now you had to actually navigate that structure in memory, maybe recursively, maybe iteratively, following lots of pointers and the like, following a linked list. How about I just do-- let's just say if word lowercase in self.words, return true. Else return false. Done. So that one's done. Size-- we actually can kind of infer how to do this. Return the length of the words. That's done. Unload-- don't have to worry about memory in Python, so that's done. And there you have a problem set five. [APPLAUSE] Thank you. So what then are the takeaways? Either great elation that you now have this power or great sadness that you had to implement this first in C. But this was really ultimately meant to be thematic. Hopefully moving forward, even if you struggled with any number of these topics-- linked lists and hash tables and pointers and the like-- hopefully you have a general understanding of some of these fundamentals and what computers are doing underneath the hood.
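A minimal sketch of the dictionary.py walked through above, under a few assumptions: the method called init is spelled `__init__`, the set is stored as `self.words`, `unload` returns True to mirror the C version, and a `with` block is swapped in for the explicit open and close described in the lecture (it closes the file automatically).

```python
class Dictionary:
    """A spell-checker dictionary backed by a Python set."""

    def __init__(self):
        self.words = set()                 # starts out empty

    def check(self, word):
        """Return True if word, lowercased, is in the dictionary."""
        return word.lower() in self.words

    def load(self, dictionary):
        """Load one word per line from the file named dictionary."""
        with open(dictionary, "r") as file:
            for line in file:
                self.words.add(line.rstrip("\n"))   # strip the trailing newline
        return True

    def size(self):
        """Return the number of words loaded."""
        return len(self.words)

    def unload(self):
        """Nothing to free by hand; Python manages the memory."""
        return True
```

The set does the work that the hash table or trie did in the C version: adding a word and testing membership with `in` are both built in.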
And now with languages like Python and soon with JavaScript and SQL, with a little bit of HTML and CSS mixed in for our user interfaces, do you have the ability to now solve problems, taking for granted both your understanding of those topics and the reality that someone else has now implemented those concepts for you so that when it comes to solving problem sets six and seven and eight, and then leaving CS50 and solving problems in your own domain, you have so many more tools in your toolkit. And the goal really for you is going to be to pick whichever one is most appropriate. So let's adjourn here. I'll stick around for questions. And we'll see you next time. Best of luck on the test.