Placeholder Image

Subtitles section Play video

  • DAVID MALAN: Hi, everyone.

  • This is CS50's own David Malan and--

  • COLTON OGDEN: Colton Ogden.

  • DAVID MALAN: So this is a new approach for us.

  • We, of course, have lectures once a week here in Cambridge.

  • And of course, we've done a number of live streams over the past few years.

  • But we thought we'd try something new with our Twitch channel here.

  • Colton is a big fan of Twitch.

  • He's quite the gamer himself, and a programmer of games.

  • And we thought we'd use this as a more interactive time to actually chat with

  • some folks in the Twitch chat room, and actually walk through some code in real

  • time-- live coding--

  • to get into more depth on some of the things

  • that we might otherwise talk about in class, and then, ultimately,

  • take people's questions and steer the conversation in any direction folks out

  • there would like to go.

  • COLTON OGDEN: Yeah.

  • It's a very flexible format.

  • Everybody who's in, feel free, definitely, to chime today and chat.

  • If you aren't following the CS50 TV account,

  • I believe there's a 10 minute waiting period.

  • So definitely follow us so you can talk to us.

  • But I see we do have one person, Srini Vasank says hello.

  • So hello, Srini Vasank.

  • Nice to see you here.

  • DAVID MALAN: Hello.

  • Nice to meet you as well.

  • COLTON OGDEN: So yeah.

  • I guess we can take a little transition here

  • to your laptop where we have your code there.

  • DAVID MALAN: Magic.

  • Yep, so we're actually sitting here in Cambridge, Massachusetts

  • on Harvard's campus with a magical green screen behind us.

  • And so we're superimposing ourselves, the two humans,

  • which is why you see this nice glow on both of us today.

  • COLTON OGDEN: I wish I had a shot of just the green screen.

  • All I have is this.

  • DAVID MALAN: That's green, but you see it as gray, perhaps.

  • And this allows us to superimpose ourselves over the code,

  • like you've seen in some other channels, so that we can actually

  • talk in greater proximity.

  • COLTON OGDEN: We do it for lecture for the code it itself.

  • DAVID MALAN: We do.

  • We do this in Sanders Theater.

  • But the green screen's way behind.

  • And we use some camera magic to actually capture it across the room as well.

  • COLTON OGDEN: Yeah.

  • DAVID MALAN: All right, anymore hellos to tend to?

  • COLTON OGDEN: Not just yet, no.

  • Just Srini Vasank.

  • We have 10 viewers currently active.

  • DAVID MALAN: All right, wonderful.

  • Well, so glad to have you here in class.

  • It's like a CS50 section here on campus.

  • What we thought we'd do today is--

  • our first trial of this is-- take a closer look at one of the topics

  • we look at in CS50.

  • Those of you have been following along for some time

  • and are pretty much through half of the course

  • or so might have implemented your own spell checker at some points.

  • And for those unfamiliar, roughly halfway during CS50's semester

  • here on campus and online, we challenged students

  • to build their own spell checker, which is a piece of software that looks over

  • all the words you've typed out in a file,

  • and tells you, often with read underlines, which words you've

  • misspelled, that aren't in a dictionary.

  • And that's where the CS comes in.

  • That dictionary is just a really big list of words,

  • hopefully alphabetically sorted.

  • And it's a non-trivial problem to actually look up

  • every one of the words in your file against every word in the dictionary.

  • If you recall from our discussion of asymptotic notation,

  • that's like an n squared problem, if n is the number of words in each,

  • just to simplify a bit.

  • And that can actually take quite a bit of CPU cycles and memory.

  • And so one of the challenges in CS50 is, do that efficiently.

  • Minimize your use of memory.

  • Minimize your use of CPU time, so to speak.

  • And just write the fastest spell checker possible.

  • Now, what language, of course, do we tend to write this in though?

  • COLTON OGDEN: Well we start it in C, generally, for the course.

  • Yeah.

  • DAVID MALAN: And that's great.

  • Because C is super low level.

  • It's super fast.

  • And really takes advantage of the hardware.

  • But what are some of the prices you'd say you pay by writing stuff in C?

  • COLTON OGDEN: Programmer time would be number one.

  • Obviously looking for memory leaks, and seg faults,

  • and all those sorts of things that trip up a lot of programmers.

  • DAVID MALAN: Yeah, those of you programmed in C, maybe C++,

  • have dealt with pointers, memory addresses, segmentation faults,

  • buffer overflow exploits, and the like.

  • All of that comes into play.

  • And so why don't we start there?

  • Why don't we actually take a look at what it is you have had to do,

  • or if you're following along with CS50 for the first time,

  • might have to do if you tackle this particular problem down the road.

  • And then let's actually do it in a much more friendly environment of Python.

  • But consider what prices we actually pay for doing that.

  • So we have up here behind us dictionary.h.

  • This is the so-called header file that we give to CS50 students.

  • And in this header file, we define four functions.

  • Do you want to summarize what functions they have to implement?

  • COLTON OGDEN: Actually, offhand, I don't remember.

  • I have to look at the source code.

  • DAVID MALAN: Oh, that's cool.

  • COLTON OGDEN: Sorry.

  • DAVID MALAN: We have it right here if you'd like.

  • OK.

  • Well, we clearly prepped here.

  • COLTON OGDEN: It looks like check.

  • DAVID MALAN: Yeah, check is one of them, isn't it?

  • COLTON OGDEN: Load size and unload.

  • DAVID MALAN: Yeah, now fortunately, the file's completely commented.

  • So we can follow along here.

  • So these four functions characterize, if you will, the API for spellchecker.

  • API is just application programming interface.

  • And while this might sometimes mean some web based service or third party you're

  • using, it can also refer to local software that you yourself are writing,

  • or that someone else has started writing and you now need to finish.

  • And indeed, the challenge for this problem is to flesh out that API

  • and write the actual implementation thereof.

  • All right, so in order of operation, not alphabetically,

  • load is the first function that students in this class have to write.

  • This is a function that takes as input in C--

  • it looks like "const char* dictionary".

  • And how should folks think about what that means?

  • COLTON OGDEN: Well, const meaning that it can't be mutated.

  • So whatever data that they pass into this function

  • should be not tampered with by the function.

  • A char* is just a sequence of chars or a pointer to a char,

  • which can be any number of chars after that.

  • But it's a string, effectively.

  • DAVID MALAN: Yep.

  • I agree.

  • COLTON OGDEN: Text data that's going to become their dictionary.

  • DAVID MALAN: And then the name of this parameter, of course,

  • is just dictionary.

  • And so the idea is that this is the name or the path to the file containing

  • all of those words that I mentioned earlier

  • that represent the actual dictionary.

  • And load's purpose in life, per the comment atop it,

  • is to load dictionary into memory.

  • And it's supposed to return a bool--

  • true if successful, else false.

  • For those unfamiliar, we are using standard bool.h,

  • which is a header file that you can include in C. That actually gives you

  • access to two definitions of true and false as 1 and 0,

  • respectively, effectively.

  • So after load is called, once implemented,

  • then it's the function called "check" that's supposed to get it called again,

  • and again, and again for every word in the file

  • you're actually spell checking.

  • So it's that one that's going to be especially performant.

  • Super fast, super minimal memory usage, hopefully,

  • if you're really trying to optimize those two things.

  • What does it appear that this takes as input to,

  • if you want to tackle that one?

  • COLTON OGDEN: The check function.

  • So it looks like the same thing.

  • So char*, meaning a pointer to a char.

  • So a variable length string, or a sequence of chars.

  • And then words.

  • So it's basically going to be any size word.

  • And we're going to, basically, check for the string header representing

  • the signature, whether the word is actually in the dictionary.

  • So you check our dictionary for this word

  • and see whether it's actually inside that dictionary, that data structure.

  • DAVID MALAN: Yeah.

  • And so strictly speaking, you don't have to use const

  • in either of these definitions.

  • This is just a nice mechanism for really protecting yourself from yourself,

  • lest you accidentally try changing word or try changing dictionary.

  • The fact that you defensively-- or whoever wrote this file-- defensively

  • put const there means, you just won't accidentally

  • screw up, change the value, and introduce

  • some bug that could be very easily avoidable by making it read only,

  • as const effectively does.

  • So lastly, there's two other functions in this file.

  • Per the comments, those are size and unload.

  • Unload is meant to do the opposite of load.

  • So whatever work we end up doing in load is supposed to be undone,

  • freed up in unload.

  • And then size is supposed to just return the number of words in the dictionary.

  • Now, maybe that's a linear operation.

  • You just iterate over whatever your data structure

  • is, counting up every one of the words and return that value.

  • Or if you're smart about it, you could probably just

  • keep like an int or a long around, and just, as you're loading words,

  • keep track of how many words there are.

  • And then just spit out in constant time that same value.

  • So in C, this is a pain in the neck to actually implement this thing.

  • You can implement it as a big array, a linked list to give you some dynamism

  • and growth.

  • You can implement it as a hash table, which is often implemented

  • as an array with multiple linked lists.

  • Or you can implement it using a try, an even fancier data structure, that

  • just consumes lots of memory, but theoretically,

  • is constant time in its operation, at least asymptotically.

  • So we're not going to do that though in C.

  • Indeed, that's one of the challenges in CS50 itself.

  • And one of the great moments in a course like this

  • is when you actually change context, and you switch from C--

  • a very low level language-- to something like Python, which

  • is higher level, so to speak, that has lots

  • more abstractions, lots more features, lots more capabilities built

  • into the language itself.

  • You can whittle down a 10, 20, 30, 40 hour project into truly just minutes

  • if not seconds, once you're actually super learned in the language.

  • Shall we dive into some actual live code?

  • COLTON OGDEN: Sure.

  • Before we do, Srini Vasank has a question for you.

  • DAVID MALAN: Yeah.

  • So pass by value can be used here instead of const.

  • Short answer, no.

  • You are already technically passing in the dictionary by its value.

  • But the catch is that the value you're passing in is its address.

  • And in so far as passing in an address allows you to dereference that address

  • and go to that address in memory, means that it's potentially

  • vulnerable to being changed.

  • And so we're declaring const here for the very reason

  • that we're technically passing by reference here, or by value.

  • But that value is an address.

  • And so this is why we have this defense mechanism in place.

  • If we were instead passing in a specific char, or an int, or a float,

  • or any other primitive type that doesn't involve memory addresses,

  • then yes, pass by value avoids this.

  • Because the worst you can do is change a copy of the argument.

  • But in this case, by definition, we are passing in a string,

  • which is indeed a char*, or address of a character, or a sequence of character.

  • So we can't-- it's not as easily said as passing by value.

  • COLTON OGDEN: Correct.

  • DAVID MALAN: The values here are what are [INAUDIBLE]..

  • COLTON OGDEN: I guess the equivalent would be maybe duplicating

  • the data structure in the function.

  • But then that would be like horribly inefficient for something

  • massive, like a big dictionary file.

  • DAVID MALAN: Yeah, you can totally copy the whole dictionary.

  • But even then, you'd have this window of a few lines of code

  • where you could still screw up and accidentally change things just

  • by nature of making a mistake.

  • COLTON OGDEN: Yeah, that's true.

  • DAVID MALAN: Really good question.

  • So keep the questions coming if you have any as we move ahead.

  • But let me go ahead and open up a file.

  • Let's call it dictionary.py, and actually implement this API,

  • but using Python, and therefore Python syntax and the equivalent.

  • And then if we have time, we can take a look

  • at a new speller.py, which is a porting or translation of speller.c,

  • which we haven't looked at yet, but CS50 students do in the class.

  • That actually implements the spell checker itself.

  • But the real intellectual work and the data structure stuff

  • really takes place in dictionary.c, or today, dictionary.py.

  • So let me go ahead and proactively save this as dictionary.py

  • so we get our syntax highlighting.

  • And let me propose that we just have place holders initially

  • for those four functions.

  • So again, we can't use dictionary.py with speller.c.

  • And so this is just a porting of the whole project to another language.

  • So in Python, you declare functions using slightly different syntax.

  • In Python, you don't specify a return type.

  • And you don't specify the types of your arguments to functions.

  • But you do specify their names.

  • And recall that one of those names was "check".

  • It does take an argument, in this case, which we'll call "word",

  • just as we did in C. But you have to define the function, which

  • in Python uses the def keyword.

  • And then of course, instead of currently braces in Python, if you're familiar,

  • you instead indent--

  • instead of using those curly braces--

  • often with colons, implying that here comes some indentation.

  • And I'm not ready to do this yet.

  • So I'm just going to say "pass", literally

  • passing on implementing the function.

  • But we'll come back to that.

  • But that will get the IDE here.

  • We're using CS50 IDE, or cloud nine, to just stop

  • yelling at us with some red marks.

  • And quite welcome for the pass by value question there.

  • So let's go ahead and do two others.

  • Let's go ahead and define load.

  • And load takes in a dictionary.

  • Here, too, I'm just going to pass for now on actually implementing it.

  • I'm going to go ahead and implement size, which takes no arguments.

  • So you just specify open paren and closed paren.

  • You don't need to say void as you do in C. Let's pass

  • on implementing that for now.

  • And then unload, which frankly, we're probably

  • not going to need at all in Python, because memory is managed for you.

  • You might have to allocate it, albeit implicitly.

  • But you're not going to have to worry about managing it like in C.

  • So there's the API ported to Python.

  • And again, you'll notice no return types and no parameter types.

  • Python's loosely typed.

  • It still has type strings, and ints, and floats, and other values.

  • But it doesn't actually require that the programmer, like us,

  • actually specify those things.

  • All right, so what do you think we should do to think about this?

  • We have to implement a dictionary for a spellchecker.

  • A dictionary, by definition, is a whole bunch

  • of words that are passed in in a big text file.

  • How would you go about thinking about how to begin solving this in Python?

  • COLTON OGDEN: I think the first thing I would want to do

  • is understand the dictionary file so that we can parse it

  • and we can load it into the dictionary.

  • DAVID MALAN: OK.

  • All right, well so if we want to parse the dictionary file,

  • it's nice that we're actually in Python because it's even easier in Python

  • than in C, where you have to really think about how long your lines are

  • and how many bytes of memory you might need.

  • So let me go ahead and go up there.

  • Let's implement load first, or at least the beginning of it.

  • And if I want to open a file in Python, it's actually

  • as simple as saying Open the file name-- dictionary being the file name--

  • and then you can specify, do you want to read the file,

  • or maybe read and write the file?

  • And so just as in C, with the F open function, if you recall,

  • you can say quote unquote, "read", which just says,

  • hey Python, open the file called "dictionary",

  • or whatever that value actually is, and then read it for me.

  • But this is going to return, essentially, a file handle to the file,

  • so to speak-- kind of like a reference there, too.

  • So I want to keep that around.

  • And I'm just going to declare a variable on the left called "file".

  • But I could call it anything I want--

  • F or whatever-- and assign that the return value of "open".

  • Notice, I don't have to specify the type of file.

  • It's going to return--

  • it's going to store some kind of file object.

  • But I don't need to worry about that.

  • And now in C, you would probably proceed to iterate using F read, or F scan F,

  • F get as, or F get C, or any number of other file I/O reading functions.

  • But in Python, there's these higher level abstractions, where

  • if you know that text file just contains word after word after word,

  • you can just say as much in Python by literally saying,

  • you know what, for each line in the file, go ahead and do this.

  • And so just as I kind of rattled that off verbally, so

  • can you type it in Python.

  • And it's a lot more English like, if you will, in that sense.

  • Now, I don't really know what we want to do with this word yet.

  • So let's just say for now, store word in some data structure.

  • Because that's kind of an interesting design question to come back to.

  • But once we're done doing that, we're just

  • going to go ahead and call close on the file.

  • Or rather-- I'm using C syntax already--

  • we're going to call file close.

  • And here's where Python is also object oriented.

  • The file reference that came back itself has functions or methods built into it.

  • And if you look at the documentation, one of those is close.

  • So file.close actually closes that specific file.

  • And then after that, let's assume for the sake of discussion

  • that everything went well.

  • So I'm just going to go ahead and return true, here in this case.

  • So returning true just signifies everything is good.

  • I do seem to have an error here, but that's only because I haven't actually

  • done something in this for loop.

  • So let me also just say pass on implementing that for now.

  • And that error should go away as well.

  • COLTON OGDEN: There's an even cleaner way

  • to do what you've done with the file, right?

  • DAVID MALAN: Oh yeah?

  • COLTON OGDEN: Using what's called a context manager.

  • DAVID MALAN: All right, how do we do that?

  • COLTON OGDEN: With the "with" keyword.

  • So if you have the-- instead of needing to open a file handler with file--

  • well, you still need a file handler.

  • But instead of having, basically, the dock close,

  • you can enclose all of these operations that

  • require this file within its own indented block using the "with"

  • keyword.

  • And this is essentially just what's called a contact manager.

  • It manages the context of this file and the operations on it.

  • So basically, it saves you a line of code.

  • DAVID MALAN: Yeah.

  • So great point to make.

  • This is more Pythonic, if you will.

  • Here on line 7 what I've done per Colton's suggestion

  • is actually say, with the opening of this file in read only mode,

  • call the return value "file".

  • And then notice how I've indented by tabbing everything below that-- line

  • 8, 9, and 10 at the moment, inside of that indentation-- thereby

  • implying that this file variable is useful within every indented line

  • underneath.

  • And as Colton remarks, this notion of context is important.

  • Because what's going to happen ultimately

  • is that Python is going to close this file for me by the time we hit line 11.

  • So I just don't need to think about it.

  • So this is a very common approach just to clean up

  • your thinking, focus on the work you actually care about,

  • which is opening the file and getting at it.

  • It's less interesting, intellectually, to close the file

  • and, indeed, much better if the language itself

  • deals with that overhead for you.

  • COLTON OGDEN: Plenty of people forget to close their files in C as well.

  • DAVID MALAN: Indeed.

  • In fact, one of most common sources of memory leaks,

  • where you allocate RAM and forget to give it back

  • is, indeed, because you forgot to close some data structure, like a file.

  • COLTON OGDEN: We have a few people in the chat.

  • We have Trebinator who says, greetings from Germany.

  • DAVID MALAN: Greetings from America.

  • COLTON OGDEN: Amit [INAUDIBLE],, says, greetings from India.

  • DAVID MALAN: Nice to meet you, Amit, as well from Cambridge.

  • COLTON OGDEN: And then Srini Vasank has another that says, comment in line 9

  • is stores w.

  • Comment in line 9.

  • Store word in some data structure.

  • DAVID MALAN: Oh yeah.

  • So entire line, instead of one-- yes, it is.

  • Well, it depends what I do with this.

  • So I can save myself here.

  • I haven's actually written this line of code,

  • so I can claim it's going to do whatever I want.

  • But yes, if we do no other action here in the commented line on line 9,

  • yeah, we'd be storing the whole line.

  • But if we assume, as we're allowed to in CS50 C version of the PSet,

  • that the dictionary contains words, one word per line.

  • Then, with a bit of fanciness, can I actually extract

  • from that line the one, and only one, word that's on it,

  • get rid of maybe any leading or trailing whitespace or messiness,

  • and just load that word.

  • And indeed, that's the to do on line 9 that remains.

  • COLTON OGDEN: For context, do you happen to have the dictionary file available

  • for us to look at?

  • DAVID MALAN: Yeah, absolutely.

  • If you want to take a look at the dictionary,

  • let's go into another folder.

  • And in that folder, we have a dictionaries folder.

  • And I'm going to go ahead and open up a large dictionary here,

  • which has got 140,000 words.

  • Even the IDE here was struggling for a moment to open it.

  • And here we go.

  • So here are the lines in the file.

  • "A" is the shortest English word I can think

  • of that starts with A. Apparently "aaa" and "aaas" have meaning too.

  • Aardvark--

  • COLTON OGDEN: So we've gone ahead--

  • we've lower cased all the words, in this case, in advance.

  • DAVID MALAN: We have.

  • And so that's kind of an assumption we're making.

  • If we didn't want to make that assumption, that's fine.

  • We can deal with it in code.

  • But yes, that would have been one of the setups in the C version

  • of the problem, which is that, yeah, it's all lowercase.

  • COLTON OGDEN: We got [INAUDIBLE],, greetings from Italy.

  • Hello, [INAUDIBLE].

  • DAVID MALAN: Buongiorno, [INAUDIBLE].

  • COLTON OGDEN: We have a SB Venner, greetings from London.

  • Nice.

  • DAVID MALAN: Hello.

  • COLTON OGDEN: Good to see you.

  • DAVID MALAN: Nice from the Cambridge here.

  • COLTON OGDEN: Srini Vasank.

  • Sorry if I am jumping.

  • No, that's totally fine.

  • I think it was a good thing to catch in the comment.

  • DAVID MALAN: Yeah.

  • No, that's fine.

  • Call me on anything that we're missing here.

  • So if you want to see all 140,000 words, we can scroll here for quite a while.

  • You can see there's a lot of A words in English.

  • And this is why it's actually important at the end of the day

  • to think about how much memory you're using,

  • how many CPU cycles you're using.

  • Because at the end of the day, this is going to add up.

  • And indeed, I got bored with scrolling.

  • We haven't even gotten through all of the A words.

  • So there's a lot of words there.

  • COLTON OGDEN: And that's why we have a small and a large, I'm guessing,

  • so that we have a dichotomy.

  • We can performance measure between the small--

  • our algorithm operating on the small database versus the large database.

  • DAVID MALAN: Yes.

  • You do not want to try to debug a problem in your code

  • with 140,000 inputs coming at you.

  • Can you even imagine setting breakpoint in a debugger and walking through that?

  • COLTON OGDEN: That would be terrible.

  • DAVID MALAN: So we have small, which just

  • has two words-- cat and caterpillar.

  • So this is actually-- fun fact.

  • So this is kind of curious that we have these two words.

  • Why might you, in writing some test code, use two words like this,

  • as opposed to cat and dog?

  • COLTON OGDEN: I'm thinking from the perspective of the try,

  • because they're both going to operate on the same nodes in the tree.

  • Cat and then caterpillar--

  • caterpillar is kind of a superset of cat in that sense.

  • That's my first inclination.

  • DAVID MALAN: Yeah.

  • No, and that was the motivation.

  • Because you want to try to think, when testing your code, of potential

  • corner cases.

  • And it's definitely potentially worrisome if two of your words

  • are so similar but different.

  • So whether you're using a try or anything else,

  • just choosing those kinds of corner cases--

  • one letter words, two letter words, substrings of other words--

  • it's a really good defensive mindset to go into debugging with.

  • Fun fact two-- it was just a few years ago that I

  • learned how to spell caterpillar.

  • Because it occurred to me, since childhood, I could pronounce this word.

  • I knew what a caterpillar was.

  • And then the one time, off the cuff, I said, oh,

  • why don't we spell check cater--

  • pill-- ar.

  • I had to Google the darn word in the front of class.

  • COLTON OGDEN: How did you think it was spelled?

  • DAVID MALAN: A cata-- a cat-a-pillar?

  • I don't know.

  • COLTON OGDEN: Oh, I see.

  • DAVID MALAN: You just take for granted for 30 years--

  • COLTON OGDEN: It's a weird word.

  • I think there's other words like that too.

  • DAVID MALAN: I was embarrassed.

  • COLTON OGDEN: Srini Vasank says, sorry, I

  • didn't realize that dictionary falcon needs only one word.

  • DAVID MALAN: No need to apologize.

  • We didn't tell you, so you wouldn't have known.

  • No big deal.

  • All right, so let's go back to the Python API

  • that we've been in the midst of implementing here,

  • and see if we can't flesh this out a little more.

  • So I feel like the big to do is really this.

  • We've kind of cut some corners, and we've only thought about--

  • we have a special guest here today--

  • but we've only really thought about what to do logically.

  • But where are we going to tuck this data away?

  • So how do you think about--

  • how should anyone think about actually storing these words, would you say?

  • COLTON OGDEN: Well in CS50, I think normally,

  • we use what's called a hash table.

  • So some way for us to essentially map a word to some storage bucket

  • somewhere, using a function that's optimized

  • for even distribution of our words, so we don't have all of the same words

  • in, basically, one long linked list.

  • But I think we should show people maybe how

  • to implement a hash table in Python.

  • DAVID MALAN: Oh yeah.

  • You want to do a hash table in Python?

  • COLTON OGDEN: Yeah.

  • Let's do a hash table in Python.

  • DAVID MALAN: All rigth.

  • So if you want to declare a hash table-- for instance,

  • a variable called "hash table in Python"--

  • you could technically just do that.

  • COLTON OGDEN: Oh, wow.

  • DAVID MALAN: And then done.

  • What would you like me to do next?

  • COLTON OGDEN: OK, so now I think--

  • done.

  • I think we're all set.

  • DAVID MALAN: Kind of.

  • But this actually does invite some interesting design questions.

  • So this is indeed the definition of a hash table-- or in Python,

  • it's called a dictionary or dict--

  • where you map keys to values.

  • And that's the fundamental definition of a hash table.

  • It's a mapping of keys to values.

  • COLTON OGDEN: So cat equals 1, dog equals 2, for example.

  • DAVID MALAN: It could be.

  • Yeah, but you don't have to use numeric indices like that.

  • It really suffices in a dictionary to say, yes or no, the word is in it.

  • So you could probably just say, cat true, dog true, caterpillar true.

  • But even that feels a little redundant.

  • And so when it comes to designing a data structure like this,

  • hash table might not necessarily need be what you want.

  • You can actually use bunches of others.

  • So you could actually get away with just using a list in Python,

  • which is a dynamically realizable array or vector,

  • if you're coming from the C++ world.

  • That's nice too, because we'll automatically grow to fill the data.

  • And you're also not unnecessarily storing true, true, true,

  • true, or 1, 2, 3, 4.

  • You're just storing the words themselves-- the keys.

  • COLTON OGDEN: More efficient terms of memoy.

  • DAVID MALAN: Yeah, I mean, you're saving some number of bytes.

  • However Python implements true and false--

  • it's got to be non-zero, presumably.

  • But would you caution against using just a list, would you say?

  • COLTON OGDEN: I would, for access purposes.

  • Because if you want to find out whether something's in a list,

  • it's going to be an O of n operation.

  • DAVID MALAN: Yeah.

  • So a big O of n is a linear time operation.

  • Because if that word is "zoo", and it's all the way at the end of your list,

  • if it's alphabetized, then you've got to search all 140,000 words just looking

  • for the darn word that you care about.

  • COLTON OGDEN: We've got a couple of questions also.

  • Trebinator-- are you doing PSet5 in Python?

  • Yes.

  • DAVID MALAN: Yes, essentially.

  • And fun fact-- PSet5 just became PSet4 this year.

  • So stay tuned for CS50x 2019.

  • COLTON OGDEN: And do you want to read off

  • SB Venner as the last question or two?

  • DAVID MALAN: Yeah, sure.

  • SB Venner asks, "what does the passed reserve keyword do in Python?

  • I've not really seen that before.

  • Steve."

  • So it's actually just literally a pass.

  • When you want to just have a placeholder line that does nothing,

  • as I'm doing here sort of pedagogically, you can just say pass.

  • That's a bit of a contrived use case.

  • It really means, when you need a line of code,

  • but you don't want to take any action there, you can use pass.

  • So just to pull up a separate file here--

  • let me just create foo.py for the sake of discussion.

  • This would not be the best design, but it's one possible use of this.

  • If you wanted to say something like, if x is greater than y--

  • but in that case, you don't want to do anything about it,

  • you could just say pass.

  • Now, I say this is bad design because, frankly,

  • if you just want to check a condition and then not do something,

  • and instead do something only if the opposite is true-- like do this--

  • you can just flip the logic, frankly.

  • And you can, instead, say, if x is less than or equal to y, then do this.

  • And you can avoid the pass altogether.

  • But I've been in the habit on some applications of actually using it

  • in the context of exceptions.

  • So we don't tend to talk about these in CS50,

  • but you might in a higher level class, or in your own professional work.

  • Often, it's the case in Python that you want

  • to try to do something inside of which you might have some lines of code.

  • Except, something bad might happen.

  • And so you might say, except--

  • exception as e, for exception.

  • But if you want to do something with the exception,

  • you might not want, sometimes, to do something with the exception.

  • But you do want to catch it and then ignore it.

  • Because you know more than the computer.

  • And you know that sometimes an exception might happen.

  • But if you decide, not a big deal, you can literally just

  • say pass, effectively ignoring the exception,

  • but still handling it so the whole program doesn't abort.

  • So I would say that's a more compelling use of pass that's not pedagogical,

  • but it is functional.

  • COLTON OGDEN: Essentially, because in Python,

  • if you try to do a function definition without a body,

  • it will just give you a compiler error.

  • DAVID MALAN: Yeah.

  • Yeah, indeed.

  • So if you actually want to have the equivalent of abstract methods inside

  • of a class that you then override in a child's class,

  • you might just say pass, because you don't want the function

  • to have to do anything by default. But it

  • does need to be well-defined grammatically in the language.

  • That's a really good question.

  • Keep them coming.

  • Hope that helps, Steve.

  • All right, so if we turn our attention back to dictionary.py,

  • I would propose we could take this one step further.

  • Python has a whole buffet, if you will, of data structure.

  • It's not just lists, not just hash tables,

  • but it also has one that we introduced in class, albeit briefly.

  • It actually has what are called sets.

  • You might not have thought about these recently.

  • But mathematically, a set is a collection of numbers, typically.

  • But a key detail is that the set does not

  • contain duplicates, which is great because your dictionary probably

  • shouldn't contain duplicates.

  • The word is either a word or it's not.

  • And so a set is kind of nice because it maintains that property.

  • But more importantly, if you read the documentation for Python,

  • set is more performant than list.

  • It will give you an approximation of constant time access.

  • So it's probably implemented with a hash table underneath the hood,

  • much like a dictionary.

  • But it does have this property where you can just

  • think about membership therein, not an association of keys with values.

  • COLTON OGDEN: It's kind of like the hash table,

  • but rather than needing to store that true every time, which like you said,

  • probably amounts to some non-zero amount of bytes, we just get all of that

  • sorted for free with the same functionality of the membership check.

  • DAVID MALAN: Yeah, indeed.

  • And so a set is what you would often call an abstract data type.

  • But you don't necessarily know or have to care how

  • it's implemented underneath the hood.

  • All you care about is its properties-- the fact that you can add to it,

  • remove from it, and get an approximately constant time access to it.

  • That's what you care about.

  • But underneath the hood, it probably is implemented with some kind of tree,

  • or a hash table, or anything.

  • That's not for us to worry about in a language like Python.

  • So let's call this actually what it is.

  • This is our dictionary of words.

  • So let's just say-- we've got a variable called "words",

  • and it's initialized to be just an empty set,

  • containing no words initially, until we're ready to put them into the file.

  • COLTON OGDEN: We've got a question from Andre.

  • Andre-- we see Andre on the Facebook group all the time, actually.

  • DAVID MALAN: Indeed.

  • Hello, Andre.

  • Nice to see you here on Twitch.

  • COLTON OGDEN: "Would checking access aligned bounding box

  • collisions be a good use case for pass, since you are typically testing

  • whether for cases are not true?"

  • In that case, you still would check all four conditions.

  • So I don't think you would, because you want to make sure all of them

  • aren't true.

  • Because that's just part of how that algorithm works.

  • So yeah, I don't think pass, in that particular instance,

  • would be appropriate.

  • DAVID MALAN: Yeah, you can still flip the logic even if syntactically it

  • starts to get a little messy or sloppy.

  • Generally, in any language, having just empty code blocks

  • where you take no action or explicitly say pass,

  • it usually suggests that better design is possible.

  • COLTON OGDEN: Yeah, I usually see it as a non-implemented sort of thing like a

  • to do.

  • DAVID MALAN: It's like in the real world, at least in the US

  • and in English, you sometimes have silly business or legal documents that say

  • "this page intentionally left blank".

  • I mean, everyone kind of rolls their eyes at that,

  • because well, this is kind of silly.

  • And I think the same idea there.

  • You don't want to just have place holders for code.

  • You want to get the logic right from the get go.

  • COLTON OGDEN: Good question.

  • DAVID MALAN: Yeah, nice to see you here on another venue.

  • All right, so this line 9 still kind of remains.

  • Store line in some data structure.

  • So honestly, this is pretty straight forward in Python.

  • If I have this data structure called "words",

  • and I just want to go ahead and add to it,

  • I can literally just add the line to the set.

  • Now, this isn't quite right.

  • You notice this earlier--

  • I think one of our--

  • COLTON OGDEN: I think Srini Vasank.

  • DAVID MALAN: Yes-- noted earlier that, doesn't it contain a full line?

  • And it does.

  • But if we know that the dictionary contains one word per line,

  • but every line probably ends with a backslash n, or CRLF, or the like,

  • we're going to want to strip that off.

  • And if you read the CS50 specification, you'll

  • know that, indeed, it ends with backslash n, or a new line, or a line

  • feed.

  • So what's nice in Python is, you can actually strip this off pretty nicely.

  • You can say, reverse strip--

  • rstrip a specific character or characters from the end of the string,

  • hence a write strip from the end.

  • And that's actually going to strip off one or more new lines

  • from the end of the string, thereby giving us,

  • effectively, the whole word from the file.

  • COLTON OGDEN: Nice.

  • Clean.

  • DAVID MALAN: And that's it.

  • Now, even though Colton and I have been talking for quite a few minutes

  • here, at the end of the day, we wrote what,

  • six or so lines of code to get done much of a CS50 problem, namely the load

  • function here.

  • So let's take a bit of a breather.

  • And let's go ahead and just implement size.

  • This function, according to the problem statement,

  • is return the number of words in the dictionary file, which

  • is now loaded into memory, presumably.

  • How would you go about thinking about this?

  • COLTON OGDEN: Well, one approach might be to use some sort of loop

  • to basically go over our data structure and say, for-- in Python,

  • for example-- for something in our collection-- for word in words.

  • And then just increment the counter that you've declared-- size.

  • So size plus equals 1.

  • And then yeah.

  • Just returned the size.

  • DAVID MALAN: OK.

  • All right, so that's pretty straightforward.

  • It feels like, if words is a data structure,

  • it probably supports the notion of length.

  • Could we whittle this down into something more Pythonic?

  • COLTON OGDEN: Yeah, I would say most abstract data types probably

  • do have some sort of length function, or length field on them.

  • So I believe you can just call len on set-- the length function, which

  • is a function that works on pretty much any core Python data structure.

  • And that should give you the number of words.

  • It should be stored internally.

  • DAVID MALAN: Indeed.

  • This is a nice generalization.

  • It's a little inconsistent across Python that you don't call words.lang,

  • but this is essentially an API in Python.

  • leng, or length, is a function that works across various different data

  • structures and data types and can even work across your own customized ones

  • that you yourself invent, so long as your data structure implements

  • itself a magical method called "leng" itself, which is built into the object.

  • This len function will call that length method,

  • thereby making sure that you can get the length of any data structure

  • in a standard way, in this case.

  • So actually, just letting the data structure,

  • the set itself, tell us what's your length,

  • it could probably now be done in constant time, instead of linear.

  • And again, that's the goal--

  • speed everything up today.

  • Well, let's do a spoiler here with unload in Python.

  • Even though in C, you would have to allocate all of this memory--

  • maybe it's on the stack, or the heap, or wherever-- once you've decided that,

  • you have to actually free that memory.

  • In Python, there's really no work to be done here.

  • So I'm just going to go ahead and boldly say "done".

  • COLTON OGDEN: Return true.

  • DAVID MALAN: Yeah.

  • COLTON OGDEN: What's the mechanism called that does that for us?

  • DAVID MALAN: So it would be called "garbage collection".

  • And phew, I wasn't sure if what you were fishing for there.

  • It'd be called "garbage collection", which

  • is kind of a cool way of describing a process that happens behind the scenes.

  • Whenever that program isn't really doing much else, or has a breather,

  • or just needs to collect garbage, it will free up any memory

  • that you are known not to be using anymore,

  • so that it can be used for other purposes in your program or somewhere

  • else.

  • Yeah.

  • COLTON OGDEN: Languages like Java, and Ruby, and such also include-- typically

  • interpreted languages, and some compiled languages have this feature.

  • DAVID MALAN: Gotcha.

  • COLTON OGDEN: C definitely does not have this feature.

  • DAVID MALAN: No.

  • No, and that's what makes it all the more painful sometimes to use.

  • And Srini, you got it.

  • Thank you for the summary there to everyone.

  • A few people have some long usernames here today.

  • So we're kind of left with just one function

  • that we passed on earlier, literally.

  • That is the check.

  • So check, recall, is going to be the function

  • that we're going to call again, and again,

  • and again to spell check any of the words that are actually passed in.

  • So let's consider how we can go implement this up top.

  • We know that the data structure in question is called "words".

  • It's a set.

  • And it turns out, it's pretty easy to just ask the question,

  • is this string in this set?

  • You can literally just say something like, if word in set, then go ahead

  • and return true.

  • Or if that's not the case, with your colons here,

  • you can go ahead and return false.

  • But in Python, anytime you find yourself in the habit of very verbosely saying,

  • if this, then return true, else if that, return false,

  • you can probably whittle it down even more cleanly.

  • And indeed, you can actually do this in C.

  • Though we tend not to do it in the earliest weeks of CS50.

  • I could actually just be a little more elegant here, and just say, what--

  • return word in words.

  • And that in itself is a Boolean expression

  • that will return true or false.

  • I might want to do one enhancement here.

  • And you would only know this from reading the problem statement.

  • Even though the dictionary that we were given,

  • with 140,000 words is sorted in all lowercase, is in the problem statement,

  • the file your spellchecking might not be.

  • Because that could be any file that some human typed out.

  • So we should probably, if we're looking for words in a lowercase dictionary,

  • we should probably force every word that we're being passed to lowercase,

  • just to make sure that we're comparing apples and apples, so to speak.

  • COLTON OGDEN: Good point.

  • DAVID MALAN: So any questions, then, on how we've tackled this?

  • Believe it or not, in 17 lines or fewer, can you

  • implement a spellchecker in Python that conforms to CS50's own API, which

  • takes many more lines and, undoubtedly, many, many, many more hours,

  • if not tears in C to implement.

  • COLTON OGDEN: Sure.

  • Maybe we have a couple of minutes for questions.

  • In the meantime, when was the first year that we had this PSet?

  • Do you remember?

  • DAVID MALAN: I do remember.

  • In CS50 here at Harvard, 2007.

  • And AI has yet to replace a spellchecker.

  • So we haven't had to replace it yet.

  • COLTON OGDEN: Did you come up with this from scratch?

  • Or was this an inherited PSet?

  • DAVID MALAN: I did come up with this from scratch, I believe.

  • It's a while ago now.

  • And in fact, I think I first wrote this PSet, or problem set, even years prior.

  • Let's roll back to 2002 or 2003. i was teaching

  • at Tufts University, another school up the road,

  • in their computer science department.

  • And I was making all new homework assignments for that class.

  • And I think, unless I'm rewriting history here,

  • I think I did write that from scratch way back when.

  • And it's evolved a little bit over time.

  • And in fact, it started off as a C++ problem set, if this was true.

  • That, or I'm completely making all of this up, and in 2007

  • did I first write this PSet.

  • COLTON OGDEN: Was it because you were having issues with Word's spellchecker?

  • DAVID MALAN: [INAUDIBLE] No, I mean, we had Microsoft Word in my day.

  • [INTERPOSING VOICES]

  • You can actually just ask the software to do this.

  • COLTON OGDEN: ZAlien's got a couple of kappas for us.

  • Hey, ZAlien, good to see you.

  • DAVID MALAN: Thank you very much.

  • COLTON OGDEN: Trebinator, "love your course and self teaching.

  • [INAUDIBLE]

  • DAVID MALAN: Oh, thank you.

  • Oh, PSet5 and C--

  • I'm sorry we showed you the answer in Python.

  • Otherwise--

  • [INTERPOSING VOICES]

  • dictionary.h, the file we started with, is actually given to students.

  • So that's not even a spoiler there.

  • So fun fact-- when I was a kid, Microsoft Word

  • came on four floppy disks.

  • COLTON OGDEN: Oh man.

  • DAVID MALAN: And it had all the features I actually needed or cared

  • about back in, what, 1995 give or take.

  • Microsoft Word 5.1A.

  • COLTON OGDEN: How much smaller was the dictionary back then?

  • DAVID MALAN: We had as many words, I think, in English.

  • Although, you hear about more and more stupid words coming out every year

  • that are apparently immortalized in the dictionary.

  • COLTON OGDEN: I can't remember any from this year.

  • DAVID MALAN: I don't know.

  • You always roll your eyes.

  • COLTON OGDEN: You have the floppy Word in your office at 125, don't you?

  • DAVID MALAN: Oh, maybe.

  • COLTON OGDEN: Or Excel.

  • No, you have the Excel floppy disk.

  • DAVID MALAN: Oh, possibly.

  • Yeah.

  • It was cool.

  • Actually, a couple of years ago now, CS50's video team went on the road

  • to visit our friends at Microsoft.

  • So I probably shouldn't have said ill things about Microsoft Word just now.

  • And they have a wonderful archive.

  • And we got to explore some of the boxes and shelves of old stuff.

  • And they just had all of the oldest software right there on floppy disks.

  • We have some fun photos of picking up entire boxes of really, really

  • heavy products.

  • But the stupid thing is, the heavy products were not

  • like the actual software.

  • It was the damn instruction manuals.

  • They used to come with dictionary sized user manuals

  • that we held up above our head.

  • I don't think we can find the photo on such quick demand.

  • Maybe?

  • Can we do that?

  • All right.

  • CS50's own Dan Coffey here is suggesting that we go to our own photo website

  • here.

  • So let's see if we can't dig up a little memory from yesteryear.

  • If I go here, I think we can probably search.

  • COLTON OGDEN: Srini is asking, is that a 3.5 inch floppy disk?

  • Oh, did you already answer that question?

  • DAVID MALAN: Hey, look at this.

  • Hey, look at that search.

  • That's a good dictionary we've got here.

  • COLTON OGDEN: Wow.

  • DAVID MALAN: So this is me looking very surprised at how heavy Microsoft

  • Office was a long time ago.

  • COLTON OGDEN: That's really how big the manuals were?

  • Are those the manuals or the full software?

  • DAVID MALAN: This is mostly the manual.

  • So let's enhance here.

  • And yeah, there were a few floppy disks or maybe CDs by then in there.

  • But those are mostly manuals.

  • That's not even the biggest one.

  • Here we go.

  • This is Visual C++.

  • What the hell?

  • That's the compiler.

  • Wow, that's me a long time ago.

  • That's a compiler and the manuals for it.

  • Slightly creepily, too, is that photo.

  • But also, Microsoft apparently-- what are those called?

  • Teletubbies?

  • COLTON OGDEN: Teletubbies, yeah.

  • DAVID MALAN: Teletubbies used to be powered by Microsoft.

  • COLTON OGDEN: Microsoft did--

  • I had no idea.

  • DAVID MALAN: Yeah.

  • COLTON OGDEN: Windows 3.1 was distributed on 8 3.5 inch floppies,

  • says Andre.

  • I believe it, 100%.

  • DAVID MALAN: Indeed.

  • Oh, and it's apropos here.

  • Off camera is our friend Dan Coffey, who is right there on camera with that.

  • And then of course, it wouldn't be CS50 without our friend Natasha

  • Chornesky, Nacho, who's at Microsoft with other colleagues of hers as well.

  • And there she is, CS50 proud, having taken CS50 herself.

  • COLTON OGDEN: Do you remember per Srini's comment

  • whether it was 3.5 or 5.5 inch floppies?

  • DAVID MALAN: Let's see, for Office?

  • In my day, the version we had but on Macs was 3.5.

  • But I don't think Macs ever had 5.25 disks.

  • But Windows might have, certainly.

  • If there was Windows.

  • I mean, DOS and Windows 3.1 and onward might have.

  • COLTON OGDEN: Yeah.

  • I remember having a Windows 95 machine with floppies on it.

  • But it's been a long time.

  • DAVID MALAN: Yeah.

  • Hey, we're all sounding really old right now.

  • So if there's any you kids tuning in wondering what the hell

  • this channel just became--

  • COLTON OGDEN: I think we should have prefaced it

  • with a picture of what a floppy disk is.

  • DAVID MALAN: That is true.

  • So here.

  • Let's do that.

  • What's a floppy disk?

  • Floppy disk.

  • They came in different shapes and forms.

  • They're mostly died out now.

  • COLTON OGDEN: You used to have a really good presentation in CS50

  • where you would have people rip open a floppy disk.

  • You still have those in storage, by the way.

  • DAVID MALAN: We do still have some floppy disks in stock.

  • Yeah.

  • Fun fact-- it actually got really expensive to do.

  • Because the price of storage and media just went down, and down, and down,

  • and down.

  • So for a few pennies, we could give 1,000 students in the room

  • their own floppy disk, and then tear it apart in class.

  • But then once nobody started using them, you

  • could only get them on eBay, not even Staples.

  • And so then it's $1 each, and that makes for a really expensive demo.

  • COLTON OGDEN: I remember when we bulk ordered those.

  • There were thousands of them.

  • DAVID MALAN: Yeah.

  • So this is a somewhat hard floppy disk.

  • Inside is the actual floppy material.

  • And then as Srini was referring, we had 5 and 1/4 inch disks,

  • and even bigger ones back in the day--

  • small, medium, and large, if you will.

  • And I started off life, I think, with the middle disks there.

  • But fun fact two-- if we enhance this, these 5 and 1/4 inch floppy disks,

  • which were indeed flexible, came with a little divot out of the corner,

  • which was to help align them so you would know which is on the left

  • and which is on the right, so you don't put the disk in upside down.

  • But the catch is that the materials inside the disk

  • are actually perfectly symmetric.

  • And so, if you actually take a scissors or a special $5 device,

  • you could snip a hole in the floppy disk exactly symmetrically

  • on the other side, thereby tricking the floppy disk

  • drive to write bits onto the other side of the actual floppy disk inside.

  • Now I'm sure if I had any notion of electrical engineering at the time,

  • I would know that this just creates interference and bad things.

  • But to a kid trying to double his storage space,

  • it doubled my storage space.

  • So all of us used to do this and make your floppy disks double sided.

  • COLTON OGDEN: That's a TIO for me.

  • I didn't know you could do that.

  • DAVID MALAN: Yeah, well it probably wasn't recommended.

  • The better practice would be just buy more floppy disks.

  • But that's what we did back in the day.

  • COLTON OGDEN: Yeah.

  • That's cool.

  • That's a good piece of history.

  • DAVID MALAN: Yeah.

  • So any questions from the audience there?

  • Anything we can pluck off, either on C, Python, or anything in between?

  • COLTON OGDEN: Yeah, if you guys have any sort of small exercises or examples

  • you want us to implement in Python, let us know.

  • You said potentially resize as well?

  • Is the one that you have?

  • I can print that for you.

  • DAVID MALAN: We do have resize.

  • But I think we also have the speller, which we can go from C to Python.

  • COLTON OGDEN: Oh, that's true.

  • Yeah, we have that other spelling.

  • Yeah, we could do that if folks are interested.

  • We should let people decide what they want.

  • My desk still has a few of those [INAUDIBLE]..

  • Do you folks prefer us to do resize or to take a closer look at speller in C

  • maybe?

  • A good chance for some feedback.

  • DAVID MALAN: I do think speller in C is nice.

  • Because we could literally do it side by side.

  • COLTON OGDEN: That's true.

  • That's a good point.

  • DAVID MALAN: Let's see how the lines line up.

  • COLTON OGDEN: That's a good point.

  • DAVID MALAN: All right.

  • So let's do that.

  • COLTON OGDEN: Let's start on that.

  • DAVID MALAN: All right.

  • So let's go ahead and let me unenhance-- close a couple of our distractions

  • there.

  • And let me go into my same folder as before.

  • Let me go ahead and open up speller.c, which

  • is indeed the version of this program that students are given.

  • So I'm afraid, too, this won't help you out,

  • Trebinator, with PSet5, since you two are given this distribution code.

  • But what's nice about the IDE here is we can split our windows

  • and actually do some side by side coding,

  • and essentially comport or translate C from Python from left to right here.

  • COLTON OGDEN: We could also do this magic.

  • And now we're out of the scene.

  • Look at that.

  • DAVID MALAN: Oh, nice.

  • We can even get rid of you and me.

  • Nice.

  • All right.

  • So here we go.

  • Let me go ahead and create a new file.

  • We'll call this speller.py.

  • And essentially, we're going to try to recreate this file over

  • at the right hand side.

  • So we'll do some simple, some more sophisticated things for a moment.

  • Keep watching, Trebinator.

  • We won't show you dictionary.c.

  • That's where the real magic is.

  • COLTON OGDEN: Actually, do you want me to maybe make us smaller

  • and put us in the middle so we can still see--

  • DAVID MALAN: Yeah, can we unenhance us?

  • COLTON OGDEN: Yeah, let's do that.

  • DAVID MALAN: Goodbye.

  • Oh, there we go.

  • COLTON OGDEN: There we go.

  • DAVID MALAN: Oh, we could do a little Mac versus PC, like C versus Python--

  • [INTERPOSING VOICES]

  • COLTON OGDEN: I think I should get on the other side then.

  • DAVID MALAN: Yeah, this is kind of backwards.

  • But the keyboards-- that's OK.

  • I'll look this way, you look this way.

  • And we'll focus on our own code.

  • All right, so one thing to note-- first of all,

  • in Python, there's different ways to write comments.

  • And so for multi-line comments like this,

  • how would you go about writing them in your file?

  • COLTON OGDEN: In C or-- oh, in Python.

  • Python has a cool syntax--

  • I mean, one way you could do it is just preface

  • every single line with the hash mark.

  • DAVID MALAN: So just do every single line like this.

  • COLTON OGDEN: Which would get very tedious.

  • DAVID MALAN: Yeah, for sure.

  • COLTON OGDEN: Python has another way you can

  • do it using either a triple full quotes or single quotes, which

  • will allow you to do the same thing, essentially,

  • as this, except on the left.

  • But you don't need to have the star on each line.

  • You can just write strings, literally, as you want.

  • DAVID MALAN: Yeah, So problem set 5 implements a spellchecker.

  • And then down here, we can close that thought using double or single quotes,

  • so long as we're actually consistent across the two.

  • And now those are our comments.

  • And you can see it's nicely highlighted.

  • Its color coded a little differently, just because the IDE treats comments

  • in C and Python a little differently.

  • But same exact idea.

  • COLTON OGDEN: I think Srini had the right idea.

  • He had the triple hash mark.

  • But just off on just the quote being the key there.

  • DAVID MALAN: Indeed.

  • You definitely want to use the single or the double quotes

  • there and then be consistent throughout.

  • So on the left in C, we had a whole bunch of header files

  • that we included-- one of which was our own, dictionary.h--

  • and then a bunch of which were some lower level things really used

  • for instrumentation.

  • No problem at all.

  • And the comparables that we'll need are a little different.

  • So there's not necessarily a one-to-one mapping left to right.

  • And in fact, rather than do these preemptively,

  • we'll come back to some of our imports.

  • And let's just trip over the libraries we need once we actually get to them.

  • Otherwise, it's magical just by knowing where we're going with this.

  • But I will say this.

  • We do know that there's a file called dictionary.py from earlier.

  • So I can immediately say, you know what, from dictionary,

  • go ahead and import those functions-- check, load, size and unload.

  • Those are the four functions we did implement earlier.

  • And this way, we can just keep the logic of our spellchecker

  • separate from the implementation details of the dictionary.

  • We could even swap in and out different dictionaries if we want,

  • and therefore implement the API in different ways.

  • COLTON OGDEN: Basically how you would port a header

  • file-- the idea of a header file from C into Python.

  • DAVID MALAN: Yeah, exactly.

  • Now, I've seen in some code-- and frankly, some of my own code--

  • this approach, what would this do?

  • Do you know?

  • COLTON OGDEN: So from dictionary import * means--

  • * being sort of the glob or a catch all operator in Unix and other systems--

  • basically means everything.

  • So from dictionary-- from the dictionary.py file,

  • just import everything and pretend like I wrote this in my own file.

  • So I'll have access to check.

  • I'll have access to unload.

  • I'll have access to size.

  • All those different ones.

  • DAVID MALAN: Yeah.

  • And that otherwise seems pretty handy.

  • Because then I don't have to enumerate all four of those functions.

  • But the catch is, you can potentially scoop in more than you intend.

  • So for instance, we had a global variable in that file called "words",

  • which was that set.

  • And by scooping it up with *, we're accidentally

  • introducing to this namespace, so to speak, speller.py,

  • that same variable, which means I could somehow accidentally screw

  • up and mutate or change that variable, when it really is meant

  • to be used only by the dictionary.

  • Now, we can defend against that in a bunch of ways.

  • We can introduce classes in dictionary.py.

  • We can introduce an actual module and kind of separate that out.

  • But in general, just because * is convenient,

  • does not mean it's good design.

  • And in fact, it would be more conventional and proper to enumerate,

  • even tediously, the functions that you actually

  • care about so that you're importing only what it is you

  • care about, and nothing extra-- especially if someone else goes

  • and changes the implementation of the dictionary, and all of a sudden

  • your variables are colliding.

  • COLTON OGDEN: Exactly.

  • DAVID MALAN: All right.

  • So I've imported the dictionary.

  • Let's go ahead now and start translating some of the code.

  • And for the most part, it will line up, but not necessarily one for one.

  • So we can point out some of the differences as we go.

  • And we might even disagree how we might implement things.

  • But we'll try to do it line for line here.

  • So over here on line 20, in our C version,

  • was the definition of a constant, as it would be called.

  • It's, by convention, in all caps.

  • And then this just hardcodes the path to that big file with 140,000 words.

  • What would you propose be the equivalent in Python here?

  • COLTON OGDEN: Well, Python doesn't really

  • have the idea, or at least the hardcoded idea of constant variables,

  • unfortunately.

  • So what I would probably just do is something like capital dictionary,

  • just like you have right there, gets--

  • then the string, "dictionaries/large".

  • So almost the same thing, just without the define being necessary.

  • DAVID MALAN: So if you can capitalize the word,

  • though, is this not now just a constant?

  • COLTON OGDEN: Not in functionality, but--

  • not ad hoc-- but implicitly a constant.

  • Other programmers reading this are going to see the fact that dictionary is all

  • capitalized, and just by convention across multiple programming languages,

  • all capitalized variables with underscores

  • tend to be constants, even if that idea is not

  • enforced by the compiler interpreter.

  • DAVID MALAN: Yeah, and this is just kind of a language decision.

  • The reality is that different communities and different languages--

  • authors disagree.

  • And the mentality with Python is, well yeah, you could change that value,

  • but just don't, and be a little more mature about it.

  • Now, it goes both ways.

  • I mean, that's fine to be said.

  • And OK, I promise not to touch that variable.

  • But when programs get big and complicated,

  • you might forget which is yours, which is

  • someone else's, even though the capitalization is supposed to help.

  • So a lot of languages--

  • C among them, Java, C++--

  • really try to protect you from yourself, albeit at a price of tedium,

  • when you have to write even more code or more thinking just

  • to get the same kind of work done.

  • COLTON OGDEN: Exactly.

  • DAVID MALAN: All right, so that's a pretty straightforward translation

  • of that line.

  • Below that is a prototype, which, in C, is

  • a proactive declaration of a function's name, return value,

  • and arguments-- at least the types and order thereof.

  • There isn't really a comparable in Python.

  • So we'll skip over that.

  • And in Python, there isn't necessarily a main function.

  • You can absolutely have one.

  • But we won't bother because there's actually going

  • to be no other functions in this file.

  • So we'll keep it simple.

  • So let's focus on argc here.

  • In C, argc is the argument count--

  • the number of arguments that have been passed in.

  • But in Python, you have to do this a little differently.

  • We need a module, right?

  • COLTON OGDEN: We do, yeah.

  • Because Python-- C gives you argc and argv

  • for free as part of its implementation.

  • But it's not something that's necessarily

  • part of Python's implementation.

  • Real quick, though, Srini has a question.

  • "All caps mean generally object or class?"

  • Sorry, I'm not sure.

  • DAVID MALAN: Yeah, good question.

  • COLTON OGDEN: [INAUDIBLE]

  • DAVID MALAN: Oh, you're testing me again.

  • All caps generally means, either you're yelling in an email, or a text message

  • or whatever--

  • sorry for those of you wearing headphones, actually--

  • or it means a constant, or a social agreement

  • that this should be a constant value.

  • When only the first letter is capitalized,

  • or you have a multi-word variable name or a symbol that has capitalization

  • throughout, but not the whole thing, that typically signifies, yes, a class.

  • And then, if you use more traditional camel case, like lowercase letters

  • to begin, or more conventionally, underscore variable names,

  • otherwise known as snake case, then that's just a variable, typically.

  • COLTON OGDEN: Snake case because Python?

  • DAVID MALAN: No, I think name comes from back in the day.

  • Because it looks like a snake.

  • It's just your long string of words with the underscore.

  • COLTON OGDEN: Oh, interesting.

  • DAVID MALAN: Hey, you learned something new here on CS50 live.

  • COLTON OGDEN: It's interesting how coincidental that is, though.

  • DAVID MALAN: Snake case.

  • Why?

  • COLTON OGDEN: Because Python.

  • DAVID MALAN: Oh, interesting.

  • No, I'm pretty sure it predates it.

  • But I'm sure Wikipedia could answer this too.

  • COLTON OGDEN: That's fascinating.

  • DAVID MALAN: We'll check, and next week we'll let you know.

  • So to gain access to our command line arguments in Python,

  • we need to import the system module, which actually gives us access to that

  • and a bunch more features.

  • Or frankly, if we want to be a little more precise

  • and make it look a little more like C, we

  • can just say, from the sys library, import argv, or argument vector,

  • which is a subset of the functionality therein.

  • And so down here, if I actually want to check

  • did the user run this program with the right number of command line arguments,

  • I can actually say something like, well, if the length of argv does not equal 2,

  • and length of argv does not equal 3, then

  • I can go ahead and exit and complain.

  • So I can actually do this in a couple different ways.

  • In Python, you generally use print instead of print def.

  • So you would say "usage: speller" and then,

  • if I mimic exactly what we have in C, I would do something like this.

  • This is going to be the name of the program.

  • If I actually add a shebang and put it in a file called "speller",

  • or install it via package, dictionary is in square brackets

  • because it's meant to be optional.

  • That's the file that you want to load optionally.

  • And then text is the file name that you actually want to spellcheck.

  • And then I can go ahead just like in C, exit with a certain value.

  • But for this, I need to do exit 1, which if I want to return a 1 status code,

  • I would do that.

  • But honestly, you can actually do this a little more simply.

  • You can get rid of all that.

  • And you can just exit with a string.

  • And if you read the documentation for Python, when you do exit with a string,

  • it's going to return 1 for you.

  • But it's also going to print out that value.

  • Now Python-- or the IDE here isn't liking this.

  • Because I've done something wrong.

  • Oh, I'm taking C too literally.

  • You have to literally say "and"--

  • professionals here-- and change it back to some human English.

  • COLTON OGDEN: And so, because dictionary is

  • optional, that means that we're going to essentially use that hardcoded

  • as a default, I'm guessing?

  • DAVID MALAN: Exactly.

  • That's why we have the line 14, a constant that we'll use as a default.

  • Why?

  • Just because we decided that we don't want

  • to force the student to have to specify a dictionary every time

  • they run their program.

  • Instead, we want them to have some default built in.

  • COLTON OGDEN: And we saw len before--

  • the len function-- when we used a set.

  • And now we're using it on argv.

  • So what is argv?

  • What is the data structure that argv is, if it's

  • following the same model of data structures

  • being able to use the length function?

  • DAVID MALAN: Yeah, in this case, it happens to be a list.

  • And this is a list of the command line arguments

  • that the human is presumed to have typed in from index location 0 on up.

  • And so, leng just tells us the length of that list,

  • just like the leng in the context of sets

  • gives us the whole size of the set itself.

  • COLTON OGDEN: OK, awesome.

  • DAVID MALAN: All right.

  • So we're already trimming away some lines.

  • We're on line 17 instead of 32.

  • So it's a little more compact.

  • And frankly, a lot of our lines were comments anyway.

  • So let's proceed here.

  • Now, in C, we had to do something a little different using certain structs,

  • or data structures called rusage structs, before and after.

  • We don't strictly need those in Python.

  • Because we can actually time our program a little more cleanly.

  • But more on that in just a moment.

  • But you'll notice that in C, we did define a few variables--

  • time_load, time_check, time_size, and time_unload-- all of which

  • seem to be floats, because they have decimal points in them.

  • And they're all initialized to 0.

  • I did this on one line on the left hand side, just because you could.

  • You don't strictly have to do that.

  • You could have four separate lines or variables.

  • But I can do roughly the same here.

  • I can say time_load, time_check, time_size, and time_unload equals 0.0,

  • 0.0, 0.0, 0.0, effectively inducing what's called a tuple on the left hand

  • side and the right hand side.

  • A tuple is a collection of one or more values, often written in parentheses,

  • but not necessarily so.

  • But if I want to be really explicit as to what I just did,

  • it's kind of like doing this, which is a nice one liner of doing

  • four things on the right to four things on the left.

  • And in this case, the assignment is what you're actually doing here.

  • COLTON OGDEN: And we can call this tuple unpacking, correct?

  • DAVID MALAN: Yeah, indeed.

  • Because you're going from a tuple with four elements in it

  • to a tuple with four elements in it.

  • But you're really doing a one-to-one mapping

  • between all of the tuple elements.

  • And so the parentheses are not strictly required.

  • The commas imply the presence thereof.

  • All right, so what else?

  • Let's go ahead and actually initialize.

  • Let me bring us back to the middle here.

  • Here we go.

  • Let's put this wall between us again.

  • There we go.

  • All right, so now if we want to go ahead and initialize

  • this variable, dictionary, to be either argv1, which

  • is the name of the dictionary the human maybe typed in, or the default.

  • We can do this with the ternary operator here in Python as well.

  • Let me go ahead here and first, let's see what Python's--

  • oh, accidental breakpoint.

  • No complaints at all.

  • Let me go ahead and just initialize a dictionary variable,

  • no type needed, to either argv1, if the length of argv

  • equals 3, else initialize it to dictionary, just like that there.

  • So you'll see that this is a little different from C

  • syntax in that I'm using if and else there.

  • But this is a nice one liner, a ternary operator

  • effectively, that's assigning dictionary to argv1

  • if, literally, the length argv is 3, else it's assigning it to dictionary.

  • COLTON OGDEN: So dictionary, lowercase now, is our actual 100% dictionary.

  • And then capital dictionary is our ostensible dictionary

  • when we start off, if we haven't actually

  • input a dictionary into the command line when they ran the program.

  • DAVID MALAN: Exactly.

  • Yep.

  • And so we have here another example of Python code

  • that kind of reads like English.

  • On line 21 here on the right, dictionary equals argv1, if the length of argv

  • is 3, else dictionary is what it equals.

  • So it reads a little more naturally than the more arcane C

  • syntax for ternary operator.

  • All right, let's just do a few initializations.

  • Again, this one doesn't map perfectly to Python, because get rusage

  • is very much a C thing.

  • But the closest I could come up with was to declare a few variables like this--

  • the time involved before we're going to run some code.

  • Then we're going to go ahead here and let's get a variable called "loaded"

  • and call the load function in our actual file.

  • Then let's go ahead and get the time after that

  • by using process_time again, which is the same function built

  • into the time module like this.

  • And then, just as a sanity check here, let's ask the same question.

  • Well, wait a minute.

  • If the dictionary was not loaded per its return value there, let's go ahead

  • and bail out and say something like, exit could not load dictionary.

  • You'll notice, though, we might want to substitute in a full sentence there.

  • So I can actually paste in the variable's value,

  • so long as I make this a format string, otherwise known as an F string.

  • Now the IDE is yelling at us here.

  • Do you have a sense of why this red X appeared?

  • COLTON OGDEN: OK, so before time.process_time-- so before--

  • I'm a little confused as to what you're doing there.

  • DAVID MALAN: All right, well let me explain to you a la my rubber duck

  • here.

  • So what I'm trying to do is just figure out what is the current time of day,

  • if you will, right now.

  • Then I want to call my load function.

  • Then I want to ask another question-- what is the time of day now?

  • Because you can infer from that, presumably,

  • if you subtract after minus before, how many seconds, milliseconds,

  • whatever, transpires.

  • COLTON OGDEN: It's basically like having a stopwatch.

  • Time functions like a stopwatch-- start and then stop.

  • DAVID MALAN: Exactly.

  • COLTON OGDEN: But I notice you're still getting the error though, even

  • though you put the equals sign there.

  • So the IDE is helpful.

  • It's giving us some information-- undefined variable time.

  • And so time, presumably, is a Python module just like sys.

  • So it looks like we just haven't included that--

  • [INTERPOSING VOICES]

  • DAVID MALAN: We have to import time, which is kind of a magical concept.

  • But yeah.

  • And that's it.

  • Import time.

  • COLTON OGDEN: What was the--

  • there's a fun little thing you can do in Python.

  • It's like import something, and it redirects you

  • to a web page or something, right?

  • What is that?

  • DAVID MALAN: Oh, I don't know.

  • But you can import functions from the future, which is cool.

  • COLTON OGDEN: Python-- hold on.

  • Python funny import.

  • What is it?

  • What is it?

  • What is it?

  • What is it?

  • Import this, I think.

  • DAVID MALAN: Import this?

  • What is that?

  • What do you mean?

  • COLTON OGDEN: I think it does--

  • let me look it up before we do it on screen.

  • Python import this-- foo.py.

  • [INAUDIBLE] wants me to do what?

  • COLTON OGDEN: Import this.

  • DAVID MALAN: Import this.

  • COLTON OGDEN: You can do it in interpreter too.

  • And I think that will be-- if you can open interpreter.

  • DAVID MALAN: All right, slight digression here.

  • Let's open a terminal window here.

  • Let's run Python 3, the latest version.

  • Let's run a line of code with dash-C and just say import this.

  • COLTON OGDEN: And this is "The Zen of Python" by Tim Peters.

  • DAVID MALAN: That's a wonderful Easter egg.

  • Mind you, on the second screen, off-camera,

  • we are googling funny Python imports.

  • COLTON OGDEN: I remembered that--

  • I didn't remember what it was.

  • Oh, we missed a few comments or things in the chat here.

  • So argv[1] gets the second argument--

  • DAVID MALAN: Hey, wait, wait.

  • I want to import antigravity first.

  • That's do important things first, if we could.

  • COLTON OGDEN: I missed that one.

  • I missed that one.

  • DAVID MALAN: This is trying to make a connection somewhere.

  • COLTON OGDEN: This is the one that does the web page.

  • Here, we can-- do we have a computer on [INAUDIBLE]

  • DAVID MALAN: I can do it on my Mac here, probably.

  • All right.

  • Here, Andre, here we go.

  • Here we go-- Python 3-3, import anti-gravity.

  • COLTON OGDEN: Nice.

  • DAVID MALAN: That's amazing that this is built into an actual language in 2018.

  • I had never actually seen that.

  • That's pretty beautiful.

  • COLTON OGDEN: I think I saw it a couple of years ago.

  • But I forgot what it was.

  • DAVID MALAN: OK, now Serenity is being a very good student here

  • and is following along.

  • Yes, time was missing.

  • So, Serenity, you are absolutely correct.

  • Apologies for the digression here.

  • We got a little carried away with xkcd.

  • But, yes, import time was missing.

  • So replacing that does actually fix that problem.

  • And then, let's see-- we had another question from SB Venner.

  • COLTON OGDEN: SB Venner--

  • on line 16, would it not be a little better to say something

  • like argv underscore line equals line argv?

  • And then on the next line, say, if line underscore argv is not equal to 2

  • and line underscore argv is not equal to 3,

  • it would save calling the length function method twice?

  • DAVID MALAN: It's a good question.

  • I don't know if we'll agree on this.

  • Short answer-- yes, theoretically, by tucking away that answer in a variable,

  • you don't have to ask the same question again.

  • I would generally argue, though, that it's

  • a premature optimization, if you will.

  • There are so many other things I could be doing poorly in this program.

  • And just counting the number of command line arguments efficiently probably

  • isn't one of them.

  • The reality is, too, the human is the one

  • that's going to be providing these command line arguments most likely.

  • And he or she is not going to type all that many words.

  • So argv is probably going to be size one, two, three,

  • maybe 10, but most likely not.

  • And so even if we are, in linear time, checking the length of those things,

  • it's still going to be super fast.

  • And, frankly, if Python's interpretation of the implementation of a list

  • is efficient, Python is probably using its own variable

  • to keep track of the length of a list, so that it

  • can return its length in constant time.

  • So good instincts.

  • But those are the kinds of details that you should generally

  • relegate to the library and trust or check

  • the documentation to see if it's doing things as smartly

  • as you might be inclined.

  • COLTON OGDEN: If we were incurring this function call six million times

  • instead of a nested for loop, I would say

  • that would be a good point and to cash that value.

  • DAVID MALAN: OK.

  • That's fair.

  • COLTON OGDEN: For this app's, I don't think that's necessary at all.

  • Two function calls versus six million function calls--

  • I think that's a different kind of optimization, for sure.

  • DAVID MALAN: I agree.

  • And I would say it doesn't have to be six million.

  • If we're talking about a dictionary with 140,000 lines,

  • I would worry about anything you're doing inefficiently with that input,

  • not the human's two or three command line argument.

  • But totally valid, but probably not something you should worry about.

  • Better to spend those mental cycles on harder problems.

  • COLTON OGDEN: Exactly.

  • Trinity up above also asked-- argv[1] gets the second argument or the first

  • one?

  • DAVID MALAN: So in argv[0], as in C, is the name of the program itself

  • or the name of the Python file, which would be speller.pi in this case.

  • So argv[1] gets the first command line argument,

  • but the second word that the human provided at the command line,

  • if that distinction makes sense.

  • COLTON OGDEN: And the word Python itself doesn't even get included.

  • If I'm remembering correctly, it doesn't get included in argv at all.

  • DAVID MALAN: Correct.

  • Just the name of the file is in argv[0].

  • COLTON OGDEN: In C, though, you do get the dot slash,

  • whatever your script is, as the first--

  • DAVID MALAN: Whatever the name of the file.

  • But same in Python 2.

  • If you did Python space dot slash foo.py,

  • you'd see that as well in argv[0].

  • DAVID MALAN: Trebinator--

  • COLTON OGDEN: Do you want to go ahead and read his question?

  • DAVID MALAN: Yeah.

  • When having a try with a struct try letters--

  • an array-- what is a letters initialized when using malloc?

  • So the initialization of that is totally up to you.

  • If you're just calling malloc, it's not getting

  • initialized with anything whatsoever.

  • You're just getting back a chunk of memory--

  • if enough is available-- equal to the number of bytes that you've requested.

  • If you want to clear that memory, setting it equal to all zeros--

  • so not technically null-- null is a pointer value--

  • but all zeros, you can use calloc, or clear alloc,

  • which is just another C function.

  • Usage is identical, but it will do the clearing.

  • Now, historically, people would use calloc very sparingly, or not at all,

  • because why bother clearing your memory if you're just

  • going to overwrite it yourself.

  • But in the case of a try, and in a CS50 spell checker,

  • if you're planning to clear that memory anyway,

  • as with a for loop or a while loop, then, yeah, let calloc do it for you.

  • Because it probably is using a for loop or a while loop itself to do that.

  • COLTON OGDEN: I was just reading an article recently about calloc and how

  • it was not really used historically too.

  • DAVID MALAN: No, because--

  • I mean, performance.

  • Like, literally, by calling calloc you are asking the computer to do,

  • let's say, twice as much work.

  • Give me memory.

  • Oh, and by the way, clear it for me.

  • I mean, clear your own memory was kind of the mindset.

  • And only initialize it when you absolutely have to.

  • COLTON OGDEN: And that would've been O of N, too, right?

  • Because it doesn't have to go over every single byte in that set and [INAUDIBLE]

  • DAVID MALAN: Most likely, unless there's some hardware

  • optimizations where you can actually clear

  • a whole page of memory efficiently.

  • COLTON OGDEN: But I guess that'd be up to the [INAUDIBLE]

  • DAVID MALAN: Beyond my bi familiarity.

  • All right.

  • So where did we leave off?

  • We had just benchmarked, if you will, the running time of the load

  • function by computing after and before.

  • So if we actually want to compute the total time involved in loading,

  • let's go ahead and call that time underscore load,

  • and go ahead and calculate that value-- which is actually pretty easily done--

  • just do after minus before.

  • In C, we have to do some annoying calculations

  • because of how those structures work.

  • Python, you just don't need to worry about that.

  • You can just compute it with arithmetic.

  • COLTON OGDEN: Did we write those functions

  • ourselves that calculate functions?

  • DAVID MALAN: We wrote calculate ourselves.

  • If you go to the bottom of this file, you'll

  • see an annoying function that actually dives

  • into this data structure being used.

  • You just don't need to do that in Python at all.

  • COLTON OGDEN: Import time.

  • DAVID MALAN: Import time, or anti-gravity if you prefer.

  • So this is going to be a heck of an implementation before long.

  • So let's see what we're actually going to code live versus just kind of whip

  • out a sample solution from the oven here.

  • But let's see how this actually maps to the code in question.

  • Because I think we'll probably just want to pull up a spoiler at some point

  • here.

  • COLTON OGDEN: It's getting close.

  • It's 4:06.

  • We have around 24 minutes-ish before the stream officially ends.

  • DAVID MALAN: So let's see how far along we get here.

  • Let's go ahead and open the actual text.

  • So on the left-hand side here, we get some pointers involved.

  • Thankfully, those are not going to be involved here in Python.

  • But let me go ahead and convert this as closely as I can here.

  • Give me a variable called text on the left.

  • Let me go ahead and go into argv[2].

  • If the length of argv equals 3, else argv[1].

  • So this is just giving me the last command line argument,

  • which is either position 2 or position 1,

  • depending on how many total words were passed in.

  • Then let me go ahead and open the file called Text in read-only mode.

  • And just for good measure, I can actually specify an encoding,

  • though this might not strictly be necessary

  • given what dictionary you're using.

  • But I'm going to specify Latin 1, which is essentially Ascii, as it

  • is in our C problem set here.

  • Notice I misspelled argv, so the IDE is yelling at me

  • with that little red circle.

  • And then I'm just going to do some error checking.

  • So if there's not actually a file, we might have screwed up,

  • or the user might have messed up.

  • So I'm just going to go ahead and say something like print could not

  • open the name of that text.

  • Let's go ahead and make this an F string, or format string.

  • Then let's go ahead and call on load--

  • whatever it does-- even though we secretly know

  • it doesn't need to do anything in Python.

  • And then just go ahead and exit with 1.

  • So some of these lines aren't strictly necessary.

  • We could introduce a context manager, like you proposed before.

  • But I wanted to handle the errors exactly like we are in C,

  • just so that you can kind of see C to Python left to right.

  • COLTON OGDEN: There's one thing on line 33

  • that I would suggest for folks who might not

  • know some of the shortcuts you can use with lists in Python.

  • For example, all we're essentially trying to do here on line 33

  • is just take the last--

  • no matter what it is, we're always going to take the last thing, right?

  • So instead of using argv[2] or argv[1], we can always take,

  • without any condition, argv[-1].

  • And that will just give us the last element in that list.

  • DAVID MALAN: Indeed.

  • Now this does assume that earlier in the file,

  • we have validated that this has the right number of command line arguments.

  • Because if it's got like 10 arguments, we

  • don't want to take the 10th, or the 9th, or whatever.

  • We want to make sure it has exactly the right number.

  • But we did that, indeed, earlier already.

  • I like that a lot.

  • Let's keep that.

  • All right.

  • So after that, notice that we want to prepare to report the misspelled words.

  • That's actually kind of an easy one-liner.

  • So let's give ourselves some room and scroll up here

  • so we have a place to put this.

  • All right.

  • So right up here, let's go ahead and just

  • print out exactly as we did in C, using some new lines to format it.

  • Nothing really interesting there.

  • Misspelled words, couple new lines.

  • Notice-- I only need one new line, though,

  • because we get one for free with Python.

  • Because the end of a line in Python is by default that.

  • But if we know that, we can just leverage that assumption.

  • Now let's go ahead and prepare to spell-check the word.

  • And I'm going to go ahead and give myself

  • an empty word, which is the equivalent of this line over here.

  • In C, you have to know how many characters you're going to fit.

  • It turns out from the problem statement--

  • if you read into CS50's homework-- you'd see that the maximum length of any word

  • is going to be "length," which is a constant, which is like 45

  • or so characters, defined elsewhere.

  • But we don't care in Python, because the words can grow and shrink.

  • Then I'm going to go ahead and initialize some variables on the Python

  • side to be like an index of misspellings and words,

  • all equaling 0, 0, and 0, using this fancy [INAUDIBLE] approach, as before.

  • All right.

  • Very welcome, Trebinator.

  • All right.

  • So now let's go ahead and spell-check every word in the file.

  • This is painful in C because you have to take baby steps in reading the input.

  • The problem in C is that you can't really

  • safely read in from an arbitrary text file an arbitrary number of bytes.

  • You have to decide in advance how many bytes maximally you're going to read.

  • The problem, though, is if we're reading words from a file,

  • I don't know how long or how short my lines are going to be.

  • It could be infinitely long, or however much memory I actually have on disk.

  • And so here the best way in C is to take baby steps--

  • read one little character at a time and just accumulate those characters,

  • rather than blindly read as many as you can.

  • Python-- I don't have to worry about any of that.

  • I'm going to go ahead and do the following in an infinite loop,

  • while true as follows.

  • If I want to read a character from the file, let me go ahead and just say--

  • oops, that's not how you spell true in Python.

  • Let me just go ahead and say file.read one character here.

  • If there's not a character, then I'm going

  • to go ahead and break out of this.

  • That means I'm at the end of the file.

  • But if I actually want to now assemble words from the file-- and this

  • is what's tricky here.

  • Unlike the dictionary, where I can just read one word at a time from a line,

  • the file we're spell-checking might have sentences and paragraphs.

  • They might have apostrophes and punctuation and all this nastiness.

  • So we need to actually distinguish.

  • So I'll do this one a little quickly, essentially

  • porting our C from the left to our Python code here at the right.

  • But among the approaches I could take here is this--

  • if re.match this regular expression--

  • A-Za-z, c-- or it is the case that c equals an apostrophe

  • and I have already read in part of a word,

  • implying that this variable called index from our C version is greater than 0,

  • then I can go ahead and append this current character to the word.

  • And I'm going to go ahead and increment my index counter advancing in the word.

  • So what am I doing here?

  • I'm tip-toeing through the file that we're spell-checking,

  • taking one character at a time.

  • So if the word in the file is "Colton," I'm getting a C-O-L-T-O-N--

  • yay, ra ra.

  • So we're reading in Colton and one character at a time

  • and then just appending it to a variable called "word," incrementally

  • building up your name.

  • COLTON OGDEN: And index you're using to keep track of the fact

  • that you're inside a word, right?

  • DAVID MALAN: Exactly.

  • Now if, meanwhile, I've read too many lines,

  • I might actually say this-- so if the index is actually

  • now longer than the maximum length of a word that's allowed,

  • I'm actually going to go ahead and consume the rest of the string

  • and just throw this away, because it's not supported.

  • Something is wrong.

  • So while true, I'm going to go ahead and read in one more character.

  • And if it is not a character or it is not the case

  • that that character matches the following regular expression,

  • or pattern, as it really is, A-Z or a-z, then what I'm going to go ahead and do

  • is just break out of this all together.

  • Essentially, something has gone wrong.

  • If nothing has gone wrong, I'm going to go ahead and reset index and word to 0

  • in the empty string, with a little one-liner there.

  • But I could write that syntactically a little different

  • and actually proceed as follows.

  • So I don't like how the IDE is yelling at me in so many different places here.

  • Let's see-- undefined variable re.

  • COLTON OGDEN: Well, we solved that problem before.

  • DAVID MALAN: We did.

  • So that's re-- regular expression-- refers to pattern matching

  • capabilities of a library.

  • So let's try that trick before-- import re, scroll back down,

  • come on-- magic, nice-- we've whittled this down further.

  • How about here?

  • Undefined variable length.

  • So I cut some corners here and I didn't actually

  • define the length of the longest word.

  • Should've done that way early on.

  • Looks like I skipped a step.

  • Length equals 45.

  • I think I just said that one verbally.

  • All right.

  • So let's go back down here.

  • Voila.

  • All my code looks correct, of course.

  • And now let's go ahead and check a couple other things.

  • So this line up here was saying if each of the characters we're reading one

  • at a time out of the file is alphabetical--

  • A-Z-- uppercase or lowercase-- or it is an apostrophe,

  • like Colton's-- the possessive-- then we're going to go ahead and accumulate

  • it.

  • But if we run into other things, we want to handle those differently.

  • So else if it is the case that C is a digit--

  • this is a method built into strings--

  • then I'm going to go ahead and just consume these.

  • Let's just get rid of them, because they're not part of the file

  • I want to read.

  • So while true, let's go ahead and read one more character with file-read(1).

  • Then let me go ahead and say, well, if that's not actually a character,

  • or it's not an alphabetical character, and it's not a digit,

  • then let me go ahead and do this with it--

  • all on one line here--

  • let's go ahead and just break out.

  • And then let's go ahead and reset ourselves-- index word gets 0 here.

  • So we're going kind of quickly through the Python version.

  • But, again, the goal here is to convert the C code on the left to the Python

  • code on the right.

  • So just to fast forward to where we are in the story,

  • we've just implemented on the left this C code with isdigit with the equivalent

  • Python code on the right.

  • COLTON OGDEN: Is this all distrib code that they usually get, or this

  • is the solution that we're looking at?

  • DAVID MALAN: This is all distribution code.

  • So everything on the left students were given.

  • They didn't have to come up with this all on their own.

  • COLTON OGDEN: Because they just had to implement the dictionary files,

  • or dictionary functions, rather.

  • DAVID MALAN: Indeed.

  • So I think, why don't we go ahead and--

  • COLTON OGDEN: Oh, sorry, we got a question from Andre.

  • DAVID MALAN: We got a decent amount left.

  • Let's take this question from Andre.

  • Of course.

  • COLTON OGDEN: Can I just ask-- line 76 in C--

  • let's take a look at line 76--

  • is a for loop that sometimes causes some confusion.

  • Is this something you would typically use?

  • Or is it there just for pedagogical purposes

  • to show they aren't strictly limited to simple incrementation

  • within a for loop?

  • DAVID MALAN: It's a good question.

  • And, Andre, I do think you make a valid point.

  • In CS50 we tend not to show for loops involving non-numeric initialization

  • and updates in for loops.

  • We pretty much always use I and J and K and so forth.

  • I would say we're using this because it is the clearest design.

  • The for loop does exactly this-- initialize it

  • to some value, check a condition, and then iteratively do something

  • again and again and again.

  • And as such, that is the right looping construct.

  • You could use a while loop and build it that way.

  • But this is what a for loop is good at.

  • So if anything, it's my fault or the course's fault

  • that we don't show students different paradigms of for loops.

  • But all the better if they get to see it here in the distribution code.

  • COLTON OGDEN: Yeah, I think if you have a discreet way of knowing what the end

  • result of your loop is, without having some non-deterministic thing happening

  • in the loop-- which is normally infinite--

  • I think that's a great reason to use a for loop at all costs.

  • I think it's easier to read.

  • DAVID MALAN: But if you want us to sound smarter then, yeah, absolutely,

  • this is a pedagogical decision.

  • It's informed by years of experience.

  • And we really just wanted students to glean some additional insight

  • from our distribution code.

  • COLTON OGDEN: I think it is illustrative of that paradigm, though.

  • I think it's a good point.

  • DAVID MALAN: I know.

  • That's fair.

  • But I think we should just 'fess up, that, well,

  • we just did it that way because.

  • COLTON OGDEN: It just happened to be what we felt was the best.

  • DAVID MALAN: Indeed.

  • And, amazingly, we've been talking so long, that Trebinator has implemented

  • load in C. So congratulations.

  • That's some really good progress.

  • I suppose it helps that we're showing the logic in Python.

  • But that's fine.

  • Zamyla has walked through, similarly walks students

  • through the process of thinking about how you load.

  • But nicely done, Trebinator.

  • COLTON OGDEN: I think somebody taking some Python and implementing in C

  • is much more impressive than the opposite.

  • DAVID MALAN: So for all those of you who have just done the opposite--

  • C to Python-- Colton is not impressed.

  • Just kidding, just kidding.

  • So why don't we go ahead and finish up this last for loop.

  • And then I think I'll probably pull out some of the actual solution code,

  • because it's going to get tedious to type

  • out some of the last stuff, the printing stuff.

  • So here we just claim if, at this point in our code,

  • we have found a whole word, then we can go ahead and do the following--

  • elif index is greater than 0, thereby implying

  • that we've accumulated at least one character,

  • let's go ahead and increase our counter of words, which is a variable we

  • initialized way up above to 0.

  • Then let's do some timing.

  • So the time before we're about to do this benchmark we'll call before again.

  • And let's just call time process time with no arguments.

  • Then let's go ahead and get the return value of check word,

  • thereby checking if it's misspelled or not.

  • And then let's get the time after-- time.process.time.

  • And this is actually cool.

  • Notice here that we did this one-liner.

  • So we want a variable called misspelled to be true if the word is

  • misspelled and false otherwise.

  • The catch is that check returns true if a word has checked out

  • and it's not misspelled.

  • But that's OK.

  • In C you can invert true and false, or false and true

  • with a bang, or an exclamation point.

  • In Python, you don't do that.

  • You actually literally say not.

  • And that reverses the logical effect from true to false or false to true.

  • COLTON OGDEN: And a lot of people will read the line on the left

  • as not check word either way.

  • DAVID MALAN: Yeah, they'll still say not, even though in Python you

  • literally have to write "not."

  • COLTON OGDEN: We got a few-- it looks like Trebinator says--

  • DAVID MALAN: Hey, we rock.

  • Nice.

  • Or maybe that's for Trebinator.

  • Good job.

  • COLTON OGDEN: I missed initializing my letter array with--

  • [INTERPOSING VOICES]

  • DAVID MALAN: I don't think it's Trebinator then.

  • That's OK.

  • That's OK, you caught it pretty fast.

  • We'll be posting on Facebook.

  • We'll take a look.

  • This is our first one.

  • We're going to do some post-processing.

  • And just ping us on email or on Facebook to ask after this if you don't mind.

  • COLTON OGDEN: I think it'll at least be on YouTube.

  • DAVID MALAN: Yeah.

  • All right.

  • So let's finish this up.

  • So I've just done a benchmark here with checking for misspelled.

  • Let me do some total time checking here.

  • So time check is going to now equal the difference between after minus before.

  • That's the total amount of time.

  • And we're doing this in our loop, which is why I'm now accumulating.

  • We want to do this for every word, see how much time we've totally spent.

  • And now here, if a word was misspelled, let's

  • go ahead and print that word, which is required by the problem specification

  • in the homework assignment itself.

  • And then let's just increment misspellings by 1

  • to keep track of how many words are misspelled.

  • And then let's get started to do another loop.

  • Index and word should be reset to 0 and quote-unquote.

  • Could do this in different ways, but I just

  • adopted that little tuple approach.

  • And then assuming all of this went super well, let's close that file.

  • And as before, we could use context manager and let that happen for us.

  • But we're trying to do a one-to-one mapping with the C code.

  • And indeed, if I scroll down here, you'll

  • see a called fclose, which is also checking.

  • There's some f error code which isn't necessary in Python,

  • but checking if there was an error, what went wrong.

  • We would ideally do these in Python using exceptions anyway.

  • You know what?

  • We're almost at the bottom of the file.

  • Let's take it home.

  • COLTON OGDEN: I think we should.

  • We should do it.

  • DAVID MALAN: Let's make ourselves a spell checker here.

  • COLTON OGDEN: Scotty says, "You guys are awesome."

  • Thanks, Scotty.

  • DAVID MALAN: All right.

  • Here we go.

  • Here we go.

  • Don't say that until we get to the end here.

  • Now let's go ahead and count how long it takes to compute the dictionary size,

  • even though it's probably pretty fast since we're just

  • checking a variable's value.

  • N shall be the size, the result of calling our size function.

  • And after, of course, becomes time.process.time.

  • Then here let's store in our variable time

  • underscore size, which we initialized earlier to 0, which

  • strictly speaking wasn't necessary.

  • We were just doing that for parity with the C version.

  • Now let's go ahead and time the unload function, which should really be fast.

  • Process.time.

  • COLTON OGDEN: As an aside, for the P set,

  • is there a maximum time that the students are

  • given for their solution to work?

  • DAVID MALAN: Not really.

  • I mean, they're given a week to implement it.

  • But the code itself can run as long as they'd like.

  • Realistically, if they run it through check 50,

  • our auto-grading infrastructure, probably

  • gets stopped automatically after 30 seconds.

  • But even with the slowest of data structures, like an array,

  • that should be plenty of time, even to spell-check some of those larger files.

  • We're really talking milliseconds, not so much seconds or minutes.

  • After we've done this, let's do an after benchmark, calling time.process.time

  • again.

  • And then let's go ahead and check--

  • if not unloaded, let's go ahead and yell as much to the user--

  • could not load our dictionary, just as in C.

  • COLTON OGDEN: 104's got a very subtle bug.

  • DAVID MALAN: If not unload--

  • oh, you are right.

  • I was accidentally checking a function value.

  • That was not going to work.

  • Nice catch.

  • Let's go ahead and exit with this one there.

  • Let's scroll up.

  • And I honestly don't really want to type out all these prints in Python.

  • This is very ugly looking.

  • So just like Julia Child, we're going to go ahead and go

  • in to our Python version here, which is in our speller.pi file.

  • And voila, let's move it over to the right-hand side.

  • Voila--

  • COLTON OGDEN: Magic.

  • DAVID MALAN: Look at that.

  • I suppose it's mostly copy-paste, although the format strings change

  • and so forth.

  • So if you're not familiar, let's point out a couple of things here.

  • One, we're using print instead of print def.

  • Two, we are using f strings or format strings to plug those things in there.

  • And, three, we're using these format codes with f strings,

  • where you specify the variable name that you want to print, colon,

  • and then dot some number, which is the number of decimal points

  • you want to place.

  • Print-- 2, in this case-- and then f for a floating point value.

  • And then the nicest line of all is this exit success.

  • We don't need to do sys, since I didn't import sys in that way.

  • You can just use exit in this case.

  • But sys would be more proper version, I'd say.

  • COLTON OGDEN: For folks who might have familiarity

  • with older Python, but not 3.6 and over, the f string is a new feature.

  • DAVID MALAN: It is.

  • You need this in Python 3.6.

  • COLTON OGDEN: Previously, you used percentage signs

  • and the string.format function.

  • And it was a lot uglier.

  • But now you can just preface a string with f

  • and then include the sort of bracket variables

  • in there with format specifiers and it'll just work.

  • DAVID MALAN: Indeed.

  • And Scotty529 asks, "Are there any more in-person visits to Harvard available?"

  • Indeed.

  • Let me see if I can remember the URL off-hand.

  • I think it's CS50 lectures 2018--

  • no.

  • CS50 lectures-- there we go.

  • Is this the right year?

  • Let's just check.

  • 2018.

  • Yeah.

  • So if you go to-- let me enhance--

  • picatic.com/cs50-lectures, you will see a landing page where you can sign up

  • for free tickets to any of CS50's lectures that remain this fall.

  • We only have a few left, if this is of interest to you,

  • and you can travel to Cambridge or you'll be in the Boston area

  • and want to drop by.

  • They're all on Friday mornings.

  • Thereafter, you can go on a free tour of the campus

  • with some of our undergraduate friends, who are kindly

  • offering those right after lecture itself,

  • and then see a bit of the university too.

  • Any other questions from folks here tuned in?

  • COLTON OGDEN: We didn't run it.

  • We got to run it, right?

  • Are we going to run this?

  • DAVID MALAN: Oh, you want to see that my code actually works?

  • Oh, I missed-- did we miss that question?

  • COLTON OGDEN: No.

  • DAVID MALAN: OK.

  • Now you're really putting me on the spot.

  • Now we have to trust that the code actually works.

  • So let's go ahead and open up a terminal window here.

  • Let me go into, if you will, just the pre-baked version,

  • just in case I did anything stupid here, and go into speller.

  • In this directory, notice, I've got speller.py and dictionary.py

  • from before.

  • In dictionaries, recall that we have large and small.

  • We use the big one.

  • And if I do ls on text, we see a whole bunch of sample texts.

  • Let's say, Shakespeare, which is a pretty big file.

  • In fact, if you want to look at Shakespeare in the text directory,

  • this will actually be a pretty big file, as indicated by the progress bar there.

  • Come on, come on, come on.

  • And you'll see the complete works of Shakespeare.

  • So, mind you, there's going to be a lot of misspelled words

  • that are really just old English.

  • But that's fine, because we are spell-checking against something.

  • So let's go ahead and run this.

  • Let me go ahead and run Python, or Python 3, to be specific on, let's say,

  • the large dictionaries.

  • So we'll go into dictionaries-large, and then we'll run this on--

  • let's do this on texts and Shakespeare.

  • Something went wrong.

  • Wait.

  • What am I doing?

  • Oh, I'm not running--

  • I'm trying to interpret--

  • COLTON OGDEN: Can't interpret the dictionary file, right?

  • DAVID MALAN: Yes, key step, key step.

  • OK, so let's actually run speller.py.

  • OK, that was going to be the worst ending to any stream here.

  • And now let's do this on large.

  • Aardvarks is not a Python command.

  • Here we go.

  • All right, here we go.

  • We are spell-checking all these words coming out,

  • recall, are the result of that print statement.

  • Because I have a pretty tall screen right now

  • and because this is a web-based IDE, my spell-checker seems to be god awful.

  • But that's just an internet latency thing.

  • It's just taking some time for all the print statements to get spit out.

  • And, indeed, if you actually compute the total CPU time running that spell

  • checker on the complete works of Shakespeare, it only took 1.7 seconds.

  • The number of words that were misspelled was 45,000 or more.

  • The number of words in that dictionary is indeed just over 140,000.

  • The number of words in Shakespeare is a very talkative 904,000-plus words.

  • And you can see a breakdown of the number

  • of milliseconds we spent in each of those functions checking

  • taking the most time.

  • COLTON OGDEN: And those print statements--

  • those are all about misspelled words, correct?

  • DAVID MALAN: All the misspelled words were what got spit out.

  • And so if we actually hid that print statement,

  • we would actually see these results much more quickly.

  • It's just because of the internet speed and the printing

  • thereof that creates the illusion that the spell-checker is slower than it is.

  • But those running times are the result of time.process.time,

  • which is indeed an accurate count.

  • COLTON OGDEN: Based on the words that I saw coming up on the terminal,

  • it looked like it was all pretty correct to me.

  • DAVID MALAN: I think that's proof by correctness.

  • COLTON OGDEN: Scotty says, "Do you guys play video games?

  • What's your favorite game right now?

  • Do any help you be a better programmer?"

  • I'll let you answer first and then I'll follow up.

  • DAVID MALAN: I don't really anymore.

  • I think the last video game I played was with Colton.

  • We played the newish Zelda--

  • COLTON OGDEN: Breath of the Wild--

  • DAVID MALAN: Breath of the Wild on Nintendo Switch.

  • But I realized that if I only allowed mys--

  • we played until I essentially needed to fall asleep, at which point

  • I decided, OK, that's it for Switch, Wii, Nintendo for the year.

  • COLTON OGDEN: I'm pretty sure you fell asleep too.

  • DAVID MALAN: Did I fall--

  • OK.

  • So, yes.

  • But it was fun.

  • I think it was after last year's CS50 fair.

  • We decided, no, we're going to get through the school year.

  • Then we're going to play a few hours of Switch.

  • COLTON OGDEN: Was it after the hackathon?

  • I think--

  • DAVID MALAN: No, it wouldn't have been after the hackathon.

  • COLTON OGDEN: Maybe it was after the fair.

  • DAVID MALAN: I think it was after CS50 was all done.

  • This is like a year ago now.

  • COLTON OGDEN: Two years ago.

  • It was 2016.

  • DAVID MALAN: We didn't even play anything after last year's CS50 fair.

  • COLTON OGDEN: No, you're right.

  • It was like during J term of 2017.

  • DAVID MALAN: All right, so the point is no, is really--

  • when I was a kid, I used to play Commodore 64, Macintosh games.

  • We weren't allowed to have a Nintendo, but I

  • used to play on my best friend's Nintendo when he let me borrow it.

  • So--

  • COLTON OGDEN: That's rough.

  • DAVID MALAN: So did any of that help me be a better programmer?

  • Yes.

  • You can-- yes.

  • CS50 at Harvard says that playing video games makes you a better programmer.

  • Honestly, no, not in my case.

  • I don't think I played enough or thought about it

  • to any kind of sophisticated level where that's an issue.

  • But, Colton, maybe?

  • COLTON OGDEN: Certainly, games back then were less flexible in that area.

  • Scottie mentions, I heard you can program in Minecraft.

  • That's absolutely true.

  • There are certain mods in Minecraft-- one of them

  • is called Computer Craft, which actually lets you

  • program in Lua, which is pretty cool.

  • You can actually program the game blocks in the world in Lua in real time.

  • It was called Computer Craft.

  • I'm not sure if it's been redone recently.

  • This was several years ago.

  • But there's games like SHENZHEN I-O, which is a hacking game.

  • And TIS-800, I believe, are also another programming game.

  • Very sort of-- well, not realistic, but kind of realistic.

  • It requires the same logic.

  • And I would say if you're looking for games that could help you program,

  • definitely check those out.

  • As for what I'm playing, right now, I'm playing Fallout New Vegas, which

  • is not helpful for programming at all.

  • DAVID MALAN: And Colton's being modest here.

  • He's not plugged a course he teaches here

  • at Harvard's Extension School, that's now freely available on edX.

  • If you go to CS50.edx.org/games, that will whisk you away to this website

  • here, where you can register for free.

  • They want to know our location for some reason.

  • But you can enroll now for free by clicking the audit option.

  • And this will actually introduce you to Lua the programming language,

  • Love 2D, which is a gaming library framework that you

  • can use to make your own games, and, actually then, yes,

  • claim it is homework to be playing games.

  • COLTON OGDEN: True.

  • Yeah, and I think, going forward too, I would definitely

  • like to start streaming games made using these same technologies.

  • So if any of those appeal to you, definitely check the course out.

  • DAVID MALAN: And a question here--

  • "Guys, I'm planning to enroll in a computer science degree,

  • but it will be after a year from now.

  • I finished CS50.

  • What course do you recommend me to study during my preparing to the CS degree?"

  • Totally depends.

  • A typical undergraduate here at Harvard, for instance,

  • would generally take another software class,

  • specifically a class in functional programming

  • or object-oriented programming.

  • And I think that's-- for those folks interested in software especially

  • and introductory computer science--

  • it's a nice way to kind of counterbalance

  • the very procedural programming that we do in a class like CS50,

  • and a lot of courses also do, albeit some courses

  • do focus on object-oriented stuff.

  • But I also recommend of course by some of our friends at Princeton.

  • There's a free introduction to algorithms class that's on Coursera.

  • If I do a quick Google here--

  • Coursera with Kevin Wayne and his colleague on algorithms at Princeton--

  • this should be freely available, part 1 and part 2.

  • You should be able to sign up with Bob and with Kevin here on Coursera.org,

  • also for free.

  • And this is more theoretically oriented class.

  • But that also gives you another perspective

  • as to what CS is and should definitely challenge you in a different way.

  • COLTON OGDEN: Do you know Robert, by chance?

  • Have you met him?

  • DAVID MALAN: A little bit.

  • We've met once or twice.

  • But I know Kevin much better from past conferences.

  • COLTON OGDEN: Interesting.

  • I happen to know his--

  • I've seen his books, Robert's book.

  • DAVID MALAN: Yeah, the two of them now actually writes

  • quite a bit I think, together as well.

  • COLTON OGDEN: Nice.

  • And I think Trebinator mentioned-- you can program mods for it.

  • I think you have to use Java.

  • That is correct, yes, for Minecraft.

  • DAVID MALAN: So please feel free to join us here again online.

  • We're going to figure out what the next day and time is.

  • We'll publicize it on the Facebook channel, Twitter, subreddit

  • and the like.

  • We'll figure out how to get some of these assets online too,

  • if you want to follow along later today or thereafter.

  • Certainly tell your friends if you'd like to join as well.

  • COLTON OGDEN: Yeah, and definitely let us know what sorts of topics

  • you all are interested in us going over and teaching.

  • I know I'm going to probably cater towards game programming,

  • but I'm more than happy to do like Python and web stuff,

  • even though web's not necessarily as much of a strong suit.

  • And I think we're also talking to Brian about him setting up--

  • I think Brian is going to maybe set up a microblog-- do a live microblog

  • implementation.

  • DAVID MALAN: Does Brian know this yet?

  • COLTON OGDEN: Yeah, he does.

  • DAVID MALAN: Oh, he does now.

  • OK.

  • He didn't this morning.

  • COLTON OGDEN: Oh, no, I talked to him yesterday.

  • DAVID MALAN: Oh, you did.

  • Oh, OK, he knew as of yesterday.

  • COLTON OGDEN: But, yeah, we have a bunch of ideas.

  • But let us know what's obviously most valuable to you,

  • so we can have a better idea of where to go.

  • DAVID MALAN: Yeah, I think we're pretty much at the end point here.

  • Special thanks to Dan, who's been off-camera here, helping us

  • run the show.

  • And all the credit for this idea goes to Colton,

  • who spearheaded this idea with Twitch.

  • So we're really glad you all tuned in today.

  • COLTON OGDEN: Yeah, thanks, everybody for tuning in.

  • It was a lot of fun.

  • DAVID MALAN: Yeah.

  • See you online soon.

DAVID MALAN: Hi, everyone.

Subtitles and vocabulary

Click the word to look it up Click the word to find further inforamtion about it