  • [MUSIC PLAYING]

  • DAVID J. MALAN: All right, this is CS50 and this

  • is lecture 6 and as you may recall, today we

  • begin to transition away from this low-level world of C

  • and command line programming into a domain that's probably

  • a little more familiar, that of the web, and yet

  • all the ideas that we've been exploring thus far like functions and loops

  • and conditions and so forth are still going to be relevant.

  • It's just we're going to start using slightly different syntax and the user

  • interface, or UI, is now going to be your browser instead

  • of a black and white terminal window with just a simple textual prompt,

  • but how did we get here?

  • Well, recall that we've looked recently at structs

  • and what was nice about structs in C was that we had the ability

  • to make our own custom data types and to kind of encapsulate together

  • related data, and that became pretty powerful

  • when it came time for forensics to actually manipulate bitmap files

  • or JPEGs, and even though this struct is way more complicated

  • than a student structure, at the end of the day

  • it's just individual data types that are all somehow interrelated,

  • and by putting them in a struct you can move them all around

  • and copy them and save them all together as you might have done as well,

  • but then most recently did we introduce a somewhat fancier structure,

  • which is still the same idea.

  • It's got like one or more things inside of it,

  • but now, more powerfully, one of those things

  • had this star or asterisk, which gave us, of course, a pointer or an address,

  • but what was so powerful about this simple idea and this seemingly

  • simple symbol is that now we can kind of stitch together in our computer's

  • memory any kind of structure we want.

  • It doesn't have to just be one entity.

  • It can somehow be linked to another and you

  • can keep linking these structures together as well, and this of course

  • was an improvement on perhaps our simplest of data structures early on,

  • an array or a list, but of course, as soon

  • as you have pointers can you begin to link things together

  • until we got something like this and perhaps now with the dictionary

  • implementation you yourself might be exploring a linked list, a hash table,

  • a trie or some variant in between.

  • And then lastly, there was this painting of a picture, whereby

  • this is your computer's memory put a little more descriptively,

  • and this is germane only insofar as your computer uses

  • different chunks of memory differently.

  • All of your function calls end up using the stack.

  • All of your uses of malloc and its cousins

  • end up using the heap, and then of course there's this up here,

  • and what was the text segment, which we didn't really dwell on?

  • What was the text segment all about?

  • Text-- you're being volunteered.

  • Yes, what's the text segment?

  • AUDIENCE: Files information?

  • DAVID J. MALAN: Files information, yeah.

  • Specifically, the 0s and 1s that compose the actual program.

  • So when you compile your source code, like hello.c, into 0s and 1s,

  • those end up getting stored in this location in memory

  • while the program is running.

  • So long-term they're stored on disk or your hard drive or whatever's inside

  • of the computer or the server so that the files persist even when the power

  • goes off or you walk away from the keyboard,

  • but as soon as you double click a program on your Mac or PC or as soon

  • as you do ./hello or some command like that,

  • those same 0s and 1s get loaded into your computer's RAM,

  • the picture we keep showing, and that's where they live while they're in use

  • by your Mac or PC or the actual server,

  • but thus far we've been running all of these programs with something like

  • ./hello or some similar command and running them just in the so-called

  • terminal window.

  • But you are probably most familiar, certainly

  • with more graphical apps on your phones these days

  • and any time you visit a web browser on your phone

  • or on your desktop or laptop, you're still interacting with a program.

  • It's just that program is not only running on your Mac or PC.

  • Your browser is, like Chrome or Edge or Firefox or Safari

  • or whatever it is you use.

  • That's running on your Mac or PC, but what

  • you're communicating with is a program elsewhere,

  • somewhere else on the internet and those programs are called web servers.

  • A web server is just a piece of software that some human or humans wrote

  • and their purpose in life is to serve web pages.

  • When you request the homepage of Facebook,

  • there is a server out there, a program someone wrote,

  • that essentially spits out the 0s and 1s that compose Facebook's homepage,

  • but nicely enough, those 0s and 1s are not written as 0s and 1s

  • by Facebook engineers.

  • They're actually written as something a little more English-like,

  • a little more familiar, and it's not even programming code, per se.

  • It's what's called markup language, and we'll soon see that and more today.

  • So we've gone from compiling your code and running it

  • like this to actually doing that in a web-based environment,

  • but of course, when you're running your own programs in CS50 IDE down here,

  • you're actually using another piece of software

  • that fills the screen, CS50IDE, a.k.a.

  • Cloud9, which is the program essentially running somewhere in the cloud.

  • And we'll start to make this distinction and through examples

  • will the distinction among these different types of software

  • begin to make sense, but where is something like CS50IDE running?

  • Where is Facebook running?

  • Where is Google.com running?

  • Well, back in 1998, Google.com was running on this.

  • This was Larry and Sergey, the founders of Google's, very first implementation,

  • apparently, of their first rack of servers.

  • So servers are generally stored literally in a rack like this.

  • It's usually like 19 inches wide by convention

  • and you just stack computer on top of computer on top of computer,

  • but things were very bare bones back in the day of Google

  • and so there weren't even plastic or metal

  • cases around a lot of their computers.

  • They were trying to minimize cooling, minimize cost, presumably, and cram

  • as much hardware into that footprint as they could,

  • and so you actually see a lot of the wires and hardware

  • kind of sticking out, and this is on display now out west.

  • Of course these days, fast forward just a decade or two, and this

  • is one of Facebook's data centers where it's the exact same idea,

  • but much fancier, much prettier, much better lit servers,

  • but who serve, at the end of the day, the exact same role.

  • There are bunches of servers around the world that are just

  • sitting there waiting for you on the internet

  • to make a request for a homepage, for an email,

  • for any other type of information so that it ends up

  • getting sent from server to client.

  • And in fact, if you've ever thought about those words server and clients,

  • which is probably the lesser used of the two,

  • but a server-client relationship is what you have when you go into a restaurant,

  • and you ask the waiter or waitress for something to eat.

  • He or she brings something back to you, thereby serving you,

  • the client, and the relationship on the web is pretty much the same thing.

  • We are the clients.

  • Our browsers are the clients and out there

  • are servers like these who are serving up content and information,

  • such as on Facebook.

  • So let's consider how all this data even gets to us.

  • So odds are, these days, if you want to visit Facebook.com on your laptop

  • or desktop or phone without using the app, you probably just

  • type in Facebook.com and hit Enter.

  • If you're a little older school or literally older,

  • you might just actually type out the entirety of www.Facebook.com.

  • Both work and there are technical reasons for that related to the topic

  • we'll talk about today, but both of these work just

  • because Facebook has configured their website

  • to work in either of those addresses.

  • Now, as an aside, why are so many websites therefore prefixed with "www"

  • if both of them actually work?

  • Like, why have both?

  • It seems just like redundant to type "www."

  • if it's implied by Facebook.com.

  • Yeah, what do you think?

  • AUDIENCE: Is the "www" required?

  • DAVID J. MALAN: Is it required?

  • Nope, not required.

  • Not required.

  • Yeah?

  • AUDIENCE: Is it to identify that it's part of the World Wide Web?

  • DAVID J. MALAN: Kind of, yeah.

  • It's to identify that it's part of the World Wide Web

  • and no one really says World Wide Web these days.

  • We of course just say web, but back in the day and back in my day, frankly,

  • it wasn't obvious to a lot of human beings

  • what Facebook.com might actually even mean, irrespective of the fact

  • that it didn't exist at some point.

  • And so there was this sort of signal to the world

  • whereby you just started prefixing domain names with "www" just

  • to make super clear to users, oh, this is a website!

  • This is one of those things on the internet or the like,

  • and also back in the day, there were also

  • different services that have fallen into disuse these days,

  • like FTP was quite popular and Gopher, we used it

  • when I was here and other such things.

  • And so "www" was just an arbitrary prefix that just kind of said what it

  • is, but these days we humans pretty much know what a .com is and .net and .edu,

  • but even that kind of road is changing again because there's dozens,

  • hundreds of top-level domains.

  • It's not just .com and .edu and others now.

  • I mean, there's hundreds of these things out there

  • and so it might even be non obvious to this day.

  • So some people, therefore, go really all the way in and type out http://

  • and then the address that they want to visit,

  • but odds are most of us don't do this because our browsers just help us out

  • and prefix that, but that's where our focus will be today.

  • Like, this actually has significance because it specifies

  • what protocol or language, what convention your computer,

  • your laptop should use when talking to that server's address,

  • and actually, if you want the communications to be secure,

  • odds are your typing or your browser is doing it for you.

  • Adding an s there, denoting secure or encrypted a la Caesar and Vigenere

  • from some weeks ago and technically your browser

  • is also probably adding a trailing slash even

  • if it's not shown to you, which denotes you want the root of the server,

  • like the default homepage or something else.

  • In fact, maybe you do want something else.

  • You don't want just Facebook's website, you want Mark's page,

  • and so you could specifically /zuck or whatever the username actually is.

  • So this is a very long way of saying all this kind of stuff

  • that we type or autocomplete and take for granted these days actually

  • has some very fundamental meanings, all of which

  • make possible the entirety of the web.
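
As a rough illustration of the pieces just described (the protocol, the server's address, and the trailing path), here is a minimal Python sketch that splits a URL apart with the standard library. The helper name `describe` is hypothetical, and treating an empty path as `/` mirrors what the lecture says browsers do implicitly.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL into the protocol, host, and path discussed above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,     # "http" or "https": which protocol to speak
        "host": parts.hostname,     # the server's name, e.g. "www.facebook.com"
        "path": parts.path or "/",  # no path at all means the root, "/"
    }


if __name__ == "__main__":
    for url in ("https://www.facebook.com/zuck", "http://facebook.com"):
        print(url, "->", describe(url))
```

The second URL has no explicit path, so the sketch reports `/`, the default page of the site.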

  • So what actually goes on with HTTP and what does that actually mean?

  • So HTTP is a protocol.

  • It is a set of conventions that dictate how a computer

  • client, like a browser on your Mac or PC, talks to a web server,

  • and it's a protocol in the sense that it's not a language, per se.

  • It's really just a set of conventions and so

  • like this is kind of an arbitrary and awkward human convention.

  • Hello, I'm David.

  • AUDIENCE: I'm Kara.

  • DAVID J. MALAN: Kara, so Kara and I just introduced ourselves.

  • I extended my hand and she kind of knew instinctively

  • that it would be awkward not to shake my hands or to shake my hand

  • and so we exchanged pleasantries and said hello.

  • So this is just kind of a silly human convention

  • whereby we've agreed sort of socially in advance

  • how to greet each other in that way.

  • So HTTP is pretty much the same thing, but in this case

  • you're not actually physically doing something like that.

  • You're kind of sending a message from client to server.

  • You're putting a sort of handwritten note into an envelope like this,

  • addressing it somehow and then sending it off on the internet for Kara

  • or for Facebook.com or Google.com to actually receive,

  • and then when Google or Facebook or Kara receives that note,

  • reads and sees what I want, the server or the human

  • responds in some according way.

  • So what then goes inside of this envelope?

  • Well it turns out that when a web browser,

  • like Chrome or Edge or Firefox, Safari, make a request,

  • the message they put inside of one of those envelopes, albeit virtually,

  • is literally this text.

  • It's like if I had it written down on a piece of paper literally GET / HTTP/1.1

  • Host: www.facebook.com and then the "..."

  • just means there's other stuff in there, but it's less fundamentally interesting

  • right now.

  • So what's this all mean?

  • GET is just a verb and it kind of says what it means.

  • Go get something from the server.

  • HTTP/1.1 mentions the version of HTTP that I am using or the human convention

  • that Kara and I were actually implementing there,

  • and so 1.1 tends to be the one most in use these days, and then /, again,

  • it's just like the default identifier for the homepage of a website,

  • the default page that you see in the absence of typing something like

  • /zuck or some other suffix.

  • Host: is the same thing as whatever's on the outside of the envelope.

  • So if I'm sending a message to www.Facebook.com,

  • I'm just making super clear inside of the envelope which server should expect

  • this request just in case there are multiple websites running

  • on the same physical server, which is possible for economic and performance

  • reasons these days.
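
To make the contents of the envelope concrete, here is a minimal Python sketch that assembles the same request text. The function name `build_request` is made up for illustration, and a real browser sends many more headers (the "..." mentioned above); the only detail added beyond the lecture is that HTTP/1.1 ends each line with a carriage return and line feed, and ends the headers with a blank line.

```python
def build_request(host, path="/"):
    """Return the text of a bare-bones HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # verb, what we want, protocol version
        f"Host: {host}",         # which website on that server we mean
        "",                      # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_request("www.facebook.com"))
    print(build_request("www.facebook.com", "/zuck"))
```

Only the first line changes between the two requests: `/` asks for the default page and `/zuck` asks for one specific page.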

  • So alternatively, if I were trying to visit Mark Zuckerberg's homepage,

  • the request in that envelope's going to look almost the same,

  • but I'm going to be more precise.

  • /zuck instead of just /.

  • Meanwhile, if I'm requesting something from Yale's homepage,

  • the request would look like this, or from Harvard's web page,

  • the request would look like this and so forth.

  • So once Harvard or Yale or Facebook actually

  • received the request in that envelope, opened it up, look at it,

  • how do they decide how to respond?

  • Well at the end of the day, I'm probably expecting

  • to get back from the web server some kind of, excuse me,

  • web page whereby I want to see my news feed on Facebook

  • or I want to see the search page on Google

  • or I want to see Harvard's homepage, Yale's homepage

  • or whatever it actually is.

  • So there's a lot of information probably packed into that envelope,

  • but there's also a conventional, a standard,

  • response that looks literally like this.

  • So at the very top, for instance, of the "letter" that comes back from Google

  • or Facebook is a message like this.

  • Got it.

  • I'm speaking HTTP version 1.1 also.

  • Everything is OK and 200.

  • We'll come back to that in a second and then

  • the type of content inside of the envelope,

  • if I keep digging deeper into it, is going

  • to be text, but more specifically, HTML, and we're

  • going to focus on that today too.

  • HTML, Hypertext Markup Language.

  • This is going to be the language in which web pages themselves are written.

  • Then there's usually some other stuff and way down there

  • is the actual contents of Yale's or Harvard's or Facebook's homepage,

  • but let's zoom in on this for just a moment.

  • 200, odds are you've never seen or cared to see this kind of number before,

  • but have you ever used the web and requested a web page

  • and seen some number that for some reason keeps popping up in your life?

  • AUDIENCE (IN UNISON): 404.

  • DAVID J. MALAN: Yeah, 404.

  • It's just kind of a weird thing that many of us in the room

  • know 404 even if we're not necessarily technophiles

  • and know what HTTP is, but it turns out that in these envelopes coming back

  • from servers sometimes are not just 200 OK, but instead--

  • dammit, typo.

  • This would be much more effective if I said it's not this.

  • It's not found.

  • So inside of the envelope is 404 not found, which means exactly that.

  • The file was not found that you were actually seeking.

  • You mistyped the URL, the page was deleted.

  • Somewhere or other, there was some kind of typographical error

  • and it turns out there's a lot of these status codes in HTTP

  • and there are even more than these, but these

  • are the ones we might see the most commonly.

  • 200 OK means all is indeed well.

  • 404 means not found.

  • 403 forbidden might be if you've not logged in

  • or don't have the right access in order to access some folder

  • or file on some website.

  • This is really bad and we'll get to know this over the coming weeks as we

  • ourselves start implementing code on a server.

  • 500 internal server error, if you will, shall be our new segmentation fault,

  • but hopefully not too frequently.

  • It means something is wrong in the code on the server.

  • This was an April Fools' joke back in 1998 I believe, yeah.

  • So April Fools', some humans decided it would be funny to announce to the world

  • that there's yet another code, which is 418 I'm a Teapot,

  • which kind of comes up from time to time in actual code and then there's this

  • one--

  • 301 Moved Permanently.

  • It's kind of a scary sounding thing, as though a website just

  • kind of up and left and went elsewhere, but it's a powerful mechanism

  • in the following way.

  • If a server inside of one of these envelopes

  • responds with a response like this, there

  • tends to be one other piece of information at least.

  • So if I visit a website like http://harvard.edu,

  • I might get back in the response from Harvard's web server

  • this answer, 301 Moved Permanently.

  • Like where the heck did Harvard go?

  • Well you can see the location based on this other line

  • and all of these things collectively moving forward we're

  • just going to call HTTP headers.

  • Anytime you see a word and a colon, that's

  • an HTTP header with a name and a value and the first one

  • one's a little anomalous in that there's no colon,

  • but that's the only one without the colon.

  • So location colon http://www.harvard.edu.
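
The "name, colon, value" shape of headers can be sketched in a few lines of Python. The sample response below is a hand-written stand-in modeled on the redirect just described, not captured server output, and `parse_head` is a hypothetical helper name.

```python
def parse_head(raw):
    """Parse the status line and headers at the top of an HTTP response."""
    lines = raw.split("\r\n")
    # The first line is the anomalous one: no colon, just version, code, reason.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:  # a blank line ends the headers
            break
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


# Illustrative sample only; a real response carries more headers.
SAMPLE = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: http://www.harvard.edu/\r\n"
    "\r\n"
)

if __name__ == "__main__":
    print(parse_head(SAMPLE))
```

Splitting at the first colon only matters here, because the value of `Location` is itself a URL that contains a colon.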

  • Well what's going on?

  • Well, if I actually visit Harvard's homepage exactly as follows,

  • let's take a look at what happens.

  • I'm going to go to http://harvard.edu, Enter.

  • And notice there's a whole bunch of more stuff happening on the screen thanks

  • to what's called autocomplete, which is a feature of Chrome or my browser.

  • It has nothing to do with the topic at hand.

  • This is just Chrome trying to be helpful today as on your computer too

  • and suddenly, even though I tried to go to http://harvard.edu,

  • where did I clearly end up?

  • HTTPS, so they added the s somehow and what else has it added?

  • [VARIOUS ANSWERS FROM AUDIENCE]

  • DAVID J. MALAN: Yeah, the web.

  • The www prefix was added.

  • So this is not sort of all that important to the user

  • like I got to my destination somehow but the reason for that is as follows.

  • I'm going to go ahead and open up, in the IDE actually,

  • just a terminal window here and I'm going to use a new program called Curl

  • for connect to a URL, http://harvard.edu, Enter.

  • And I get back some cryptic looking things and that's actually HTML,

  • and we're going to come back to this in just a moment

  • because it turns out there's two parts to the messages coming back.

  • There's the headers and then there's the content, and we're seeing the content.

  • So more on that in a bit.

  • I want to look a little higher up in the response and literally just look

  • at the headers, and to do that-- and you would only know this from reading

  • the documentation--

  • -I means show me just the headers that are coming back.

  • So here now we see the headers coming back

  • and you'll see indeed we got back a 301 Moved Permanently,

  • and then there's some other stuff we haven't really focused on,

  • but at the bottom is something we have--

  • location, which says to the browser go to this URL instead.

  • All right, so let me do that.

  • Let me save time and just copy paste this and then do curl -I of this,

  • Enter, and pretend to be a browser requesting that page now, but now

  • where are they trying to send me?

  • HTTPS.

  • So this suggests via some mechanism, some human at Harvard

  • decided one, uh-uh.

  • We're not going to be called like harvard.edu.

  • We shall be www.harvard.edu for whatever reason

  • and then they also decided that if a user visits us using HTTP, which is not

  • encrypted, not secure, we're going to forcibly tell them to come back

  • via secure channel, and we won't dwell today on how that's implemented,

  • but much like in Caesar or Vigenere where there was a way to encrypt or scramble

  • information, browsers can do that too and it's

  • implied by using the HTTPS instead of just HTTP.

  • All right, so let's actually visit this one more time.

  • Let me go ahead and highlight that location.

  • curl -i of that address and now an overwhelming amount of information

  • coming back, and that's why I kept putting the ...'s, but the juicy stuff

  • is at the top.

  • Now everything is 200 OK and indeed, if I run it without -I

  • so I see the contents of the envelope, it's

  • like looking deeper inside of the envelope,

  • now I actually see a lot more content, which collectively

  • composes Harvard's homepage, and it turns out

  • we can see this even in Chrome.

  • Let me go over to my browser again and if you've not done this before,

  • it turns out that you can go to your View menu, Developer,

  • and go to Developer Tools-- and we'll do this in upcoming problem sets--

  • and I can go here and see a whole bunch of features, only a couple

  • of which we might look at today.

  • Specifically, I'm going to click on this Network tab.

  • So to be clear, Developer Tools in Chrome still shows me the homepage,

  • but it kind of dedicates part of the screen

  • to these special developer tools that make it easy to understand and actually

  • create websites.

  • So eventually we'll start using this ourselves,

  • but what's nice about the Network tab is that you can sniff or monitor

  • all of the requests going back and forth between browser

  • and server in the so-called envelopes.

  • So I'm going to hit a little Clear symbol here first just

  • to get a clean slate.

  • I'm going to click preserve log so I can actually see what's happening

  • and now I'm going to go ahead--

  • actually, I'm going to go ahead and do this.

  • http://harvard.edu, so the sort of incorrect version that I'm going to

  • expect the browser to fix for me.

  • I hit Enter.

  • A whole bunch of stuff is flying across the screen

  • and in fact if we zoom in on this, you can

  • see that just visiting Harvard's home page

  • requires 85 envelopes it would seem going back and forth with pieces

  • of the webpage and we'll see soon what some of those pieces are,

  • but it's not just one file coming back.

  • It's bunches of files.

  • Maybe images, maybe fonts, or some other things too,

  • but I'm going to scroll up in this output

  • and now notice the story that's been told here too.

  • So the very first request, which I can hover over and see,

  • came back with a 301, which we now know is Moved Permanently,

  • or it's a redirect.

  • Then if I hover over the second one, you'll

  • see that it's a slightly more precise URL, www, but still with HTTP.

  • So that got redirected and then lastly, if we look at the third line here,

  • this is the one we ultimately ended up at

  • and indeed it comes back 200, as do bunches of other results thereafter,

  • and we'll see what those 200s actually mean.
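
The chain of requests seen in the Network tab can be imitated without touching the network. In this minimal Python sketch, `FAKE_WEB` is a made-up lookup table standing in for the servers, with entries modeled on the three steps described above, and `follow` is a hypothetical helper that keeps requesting whatever the `Location` says until the status is no longer 301.

```python
# Hypothetical stand-in for the network: URL -> (status code, Location or None).
FAKE_WEB = {
    "http://harvard.edu/": (301, "http://www.harvard.edu/"),
    "http://www.harvard.edu/": (301, "https://www.harvard.edu/"),
    "https://www.harvard.edu/": (200, None),
}


def follow(url, web, max_hops=10):
    """Return the (url, status) pairs visited while following 301 redirects."""
    visited = []
    for _ in range(max_hops):
        status, location = web[url]
        visited.append((url, status))
        if status != 301:  # anything other than a redirect ends the chain
            return visited
        url = location     # otherwise go where the server told us to go
    raise RuntimeError("too many redirects")


if __name__ == "__main__":
    for url, status in follow("http://harvard.edu/", FAKE_WEB):
        print(status, url)
```

With this table the sketch makes three requests in total: two redirects followed by one 200.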

  • Now, you can do a little better than this

  • and it's perhaps fitting that our friends down the road indeed did.

  • Let me go back to the IDE.

  • Let me go ahead and clear this and instead of curling harvard.edu,

  • let me do http://yale.edu and ask the question,

  • what would be a better approach-- knowing these ingredients that we now

  • have of how redirects work.

  • How could Harvard do better in terms of getting the user to the address

  • that we intend them to be at?

  • Yeah.

  • AUDIENCE: By not forcing like, two redirects?

  • DAVID J. MALAN: Yeah, by not forcing two redirects, right?

  • Even if some of this material is new, we've

  • long talked now about correctness and design and style

  • and we've seen some messy style on the screen and that's fine for now.

  • More on that later.

  • It seems to be correct because it's working,

  • but it feels like it could be better designed

  • because why make one request then make another request just

  • to fix the first request then make a third request just

  • to fix the second request?

  • Why not combine them?

  • And, as it turns out, someone down the road had that same intuition

  • and so if we visit yale.edu with just HTTP and without the www,

  • they, in one fell swoop, actually redirect us

  • to the right place in this case.

  • So, with that said, it's perhaps fitting that just

  • a few years, well, some years ago now, you

  • might have tried to visit this particular address,

  • and this is something I can only do in Cambridge.

  • If I go ahead and open a new browser and go to http:// shall we say

  • safetyschool.org and hit Enter if you've never been.

  • Oh, interesting!

  • [STUDENTS LAUGH]

  • DAVID J. MALAN: And apologies for those of you tuning in online

  • live from New Haven.

  • So how is this possibly working?

  • It's actually a very simple heuristic.

  • If instead of selecting Yale or Harvard or any other address,

  • if I literally do like safetyschool.org, we can wrap our mind around

  • what's going on underneath the hood safetyschool.org has moved permanently

  • to New Haven it would seem, but it's via this very simple mechanism that someone

  • back in 2000 registered this domain name,

  • and so actually as I was looking this up in the history last night,

  • I was amused to find that whoever bought the domain has been paying for this

  • domain name now for 17 years for this joke annually, but it's well worth it,

  • but I think it would be--

  • [STUDENTS LAUGH]

  • DAVID J. MALAN: But I think it's only fair now,

  • it's only fair if we take a look at another one too.

  • It turns out that if you visit harvardsucks.org, that one has also

  • redirected, this time to www.

  • So let's follow this little breadcrumb. curl -I harvardsucks.org,

  • and this one's OK.

  • So that means something lives at harvardsucks.org

  • and it does not as cleverly redirect to harvard.edu,

  • but to introduce this, let me actually introduce

  • a friend of ours who's now very awkwardly visiting from New Haven

  • today.

  • Hi Natalie.

  • Do you want to come on up and say hello for just a moment?

  • So this is Natalie, who is our head of the class with Benedict Brown

  • and [? Anushri ?] and with [? Staleos ?] in New Haven.

  • If you'd like to say a quick hello?

  • AUDIENCE: Hi, everyone.

  • DAVID J. MALAN: So nice to have you here today and as you know--

  • do you want to make mention of what we're about to see here?

  • What happened back in 2004 just a few years later?

  • AUDIENCE: We did a prank back, basically.

  • DAVID J. MALAN: OK, so perfect set-up.

  • Thank you very much.

  • Hello to Natalie.

  • Let me go ahead and hit play on three minutes

  • that are kind of hard to justify academically,

  • but it's perhaps one of the best pranks that's ever been played.

  • Long story short, our friends down the road

  • got together with a few of themselves just before Harvard Yale, which

  • was to be at Harvard that year and actually

  • mapped out using software, a sort of grid system

  • that lined up with all of the seats in the Harvard stadium,

  • whereby you assume that a human each takes up some amount of space,

  • and then they used special software to figure out

  • how they might spell something out in the audience in a way that

  • would be readable to the opponents, the Yalies, on the other side.

  • So if we could dim the lights for this look back

  • at yesteryear and a slight use of software.

  • [MUSIC PLAYING]

  • - All the way at the top.

  • - This is for you Yale.

  • We love you Yale.

  • - We're here to cheer for Harvard.

  • - Yeah!

  • Let's go Harvard!

  • - Yeah, Harvard!

  • - Take the top one and pass it down.

  • - It's not going to say something like Yale sucks is it?

  • - It says Go Harvard.

  • - We're nice.

  • - You see that shit?

  • Look at them, they have the paper!

  • It's gonna happen!

  • It's actually gonna happen!

  • I can't [BLEEP] believe this.

  • - What do you think of Yale?

  • - They don't think good!

  • - It may be a complete mess, I don't know.

  • - Does everyone have it?

  • Does everyone have their stuff?

  • - The probability that it's gonna be legible is very small.

  • - It's gonna happen!

  • It's gonna happen!

  • - It's too complicated.

  • - Look, look at all the signs.

  • - I know but it's too complicated.

  • - Uh, what houses are you guys in?

  • That's not a real house.

  • - Ho-fo?

  • - Yeah.

  • You guys aren't from Harvard are you?

  • - No, fo-ho.

  • Pforzheimer!

  • - Yeah, but he said ho-fo.

  • - Let's just make sure everyone has it.

  • - Well she's probably drunk.

  • - Are all the cards distributed?

  • - Almost!

  • [APPLAUSE]

  • [CHEERING]

  • - Hold up your signs!

  • - They [BLEEP] did it!

  • [CROWD CHANTING "YOU SUCK!"]

  • - They [BLEEP] did it!

  • They [BLEEP] did it!

  • [CROWD CHANTING "YOU SUCK!"]

  • - What do you think of Yale sir?

  • - They suck!

  • - One more time!

  • - One more time!

  • - Oh and there it goes again!

  • [CROWD CHANTING "HARVARD SUCKS!"]

  • [END PLAYBACK]

  • DAVID J. MALAN: All right, we've been talking

  • about what goes on inside of this envelope,

  • but what goes on on the outside?

  • So when you hand off this envelope from your laptop or your phone

  • to the internet, how does it actually get to its destination?

  • Well you've probably heard this acronym IP, or internet protocol,

  • and it turns out that every computer on the internet and every phone

  • in this room and every laptop in this room has a unique address.

  • That unique address is known as an IP address and it's much like the address

  • of a building in the real world, like the Science Center might be 1 Oxford

  • Street Cambridge, Mass 02138, USA.

  • Down the road is the CS building.

  • 33 Oxford Street Cambridge, Mass 02138, USA.

  • So those long strings uniquely identify buildings

  • in the world for the mail service and the like

  • and similarly do IP addresses uniquely identify computers on the internet.

  • These addresses are much more succinct though.

  • They're not long strings they're instead just numbers that have four parts

  • and each of those numbers within the IP address are a value from 0 to 255.

  • So the lowest IP address is all zeros and the biggest IP address

  • is all 255s with some constraints.

  • You can't quite use all of those numbers.

  • So just as a sort of quick teaser, if the smallest number is 0

  • and the biggest number for each of these sections of the IP address is 255,

  • how many bits are being used for each of those four numbers?

  • AUDIENCE: 8.

  • DAVID J. MALAN: Yeah, 8.

  • So remember like 8 bits gives you 2 times 2 times 2 times 2 times 2 times

  • 2 times 2 times 2, which is 256, and indeed we have 256 total values from 0

  • on up to 255.

  • So an IP address is 8 plus 8 plus 8 plus 8, or 32 bits total,

  • or, just to come really full circle with week zero, if you have 32 bits,

  • roughly how high can you count?

  • Like what's 2 to the 32 power?

  • Yeah, it's roughly 4 billion.
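
The arithmetic above can be checked with a minimal Python sketch. The helper name `ipv4_to_int` is made up for illustration; it packs the four 8-bit parts of an address into one 32-bit number.

```python
def ipv4_to_int(address: str) -> int:
    """Pack a dotted IPv4 address (four parts, 0 to 255 each) into one 32-bit integer."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated numbers")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError(f"{octet} does not fit in 8 bits")
        value = (value << 8) | octet  # make room for 8 more bits, then add them
    return value


print(2 ** 8)                           # values available per part
print(2 ** 32)                          # total addresses, roughly 4 billion
print(ipv4_to_int("0.0.0.0"))           # lowest address
print(ipv4_to_int("255.255.255.255"))   # highest address, 2**32 - 1
```

Any part above 255 is rejected because it cannot be stored in 8 bits.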

  • So, long story short, the implication of this very simple definition

  • is that apparently there can only be, in this model, four billion computers,

  • phones, refrigerators, internet of things, devices on the internet at once

  • if they do all need an IP address that's unique.

  • So I've been telling a slight white lie in that they

  • don't have to all technically be unique because there's

  • ways we can share addresses, and it turns out

  • there's even bigger addresses these days that aren't just 32 bits but 128

  • bits, which is just massive and daresay unpronounceable how big that number is.

  • So we've gotten ahead of this issue, but you'll find that in a lot of locations,

  • companies and internet service providers like Comcast and Verizon and the like

  • and campuses like Harvard and Yale, you can notice that they tend to follow

  • patterns, like many of the IP addresses here at Harvard start with

  • 140.247.something.something or 128.103.

  • Down the road in New Haven, a lot of the IP

  • addresses there start with 130.132 or 128.36,

  • which is not at all interesting to the humans who are using these IP

  • addresses, but it is useful to the servers or the devices that

  • are actually routing these envelopes from one place to another.

  • Meanwhile, in our homes and even sometimes on campus these days,

  • there are also what are called private IP addresses, which

  • are numbers within these ranges, and this

  • has been a solution so that when you sign up for Verizon or Comcast

  • back home or your parents do for internet service,

  • you technically only get one IP address from your internet service provider.

  • That's what you're paying for per month, but thanks to something

  • called network address translation and other technologies,

  • you can actually give all of your siblings and parents

  • and family members or roommates in the household their own unique address.

  • It's just private in the sense that no one else on the outside world

  • can access it unless you initiate the connection.
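
The ranges on the slide are not reproduced in this transcript. Assuming they are the three standard private IPv4 blocks (RFC 1918), a rough Python sketch of the check looks like this; `is_private` is an illustrative name, not something from the lecture.

```python
import ipaddress

# Assumption: the "ranges" mentioned are the three RFC 1918 private blocks.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_private(address: str) -> bool:
    """Return True if the address falls inside one of the private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)


print(is_private("192.168.1.10"))  # a typical home address
print(is_private("140.247.0.1"))   # an address with the Harvard prefix from the lecture
```

Python's built-in `ip.is_private` property covers additional reserved ranges (loopback, link-local and so on), which is why the sketch spells the three blocks out explicitly.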

  • So this is generally why at home you can reach

  • any website you want, any service on the internet that you want,

  • but you can't have like random people necessarily

  • trying to get into your laptop or your device at home

  • because there's a device, a home router, that translates these private addresses

  • into otherwise public addresses, but for now the takeaway really

  • is just that every computer on the internet has an IP address,

  • and if you've ever poked around your Mac, like under System Preferences,

  • you can actually see this.

  • So I've just pulled up a screenshot here of a network control panel on Mac OS

  • and if you look roughly there on your own Mac,

  • you should see that your IP address is something.

  • It will completely vary by person and by geography,

  • but you'll see your IP address there.

  • On Windows, at least Windows 10, you can see your IP address

  • under Settings here as highlighted here.

  • So this has a very different address, but that's

  • just because this person was on a different network all together.

  • So, where did these IP addresses come from?

  • Well back in the day someone would literally

  • come to your home to set up your Comcast or your Verizon internet service

  • and he or she would like type in these numbers into your Mac or PC

  • and then leave, and you would have one computer on the internet.

  • These days it's a lot more dynamic.

  • You don't need someone coming by.

  • That certainly doesn't scale very well because there's other protocols.

  • HTTP is this protocol we talked about earlier about web pages,

  • but there's other protocols like Dynamic Host Configuration Protocol, which

  • is a mouthful but it just means that our Macs, our PCs, Android phones, iPhones

  • and the like, if they speak this protocol, when you first

  • turn on your phone or boot up your laptop it knows,

  • if it has support for this protocol, to just announce to the internet,

  • hello world.

  • I'm awake.

  • What should my IP address be?

  • This just kind of broadcast message and if Harvard or Yale or Comcast

  • or Verizon or wherever you are in the world

  • has a DHCP server whose purpose in life is just to listen for those hellos,

  • that server should respond using the same protocol with your actual IP

  • address, and it figures out which one to give you based on an available pool

  • of numbers typically.
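
As a toy illustration of the "available pool" idea only, here is a small Python sketch. It is not the real DHCP protocol, which exchanges several messages and expires leases; the class name, device names and addresses are all hypothetical.

```python
class ToyDhcpServer:
    """Hands out addresses from a fixed pool, one per device (toy model, not real DHCP)."""

    def __init__(self, pool):
        self.available = list(pool)  # addresses nobody is using yet
        self.leases = {}             # device name -> address already handed out

    def request(self, device):
        """Answer a device's "what should my IP address be?" question."""
        if device in self.leases:
            return self.leases[device]  # same device gets the same answer again
        if not self.available:
            return None                 # pool exhausted
        address = self.available.pop(0)
        self.leases[device] = address
        return address


server = ToyDhcpServer(["192.168.1.2", "192.168.1.3"])
print(server.request("laptop"))
print(server.request("phone"))
print(server.request("fridge"))  # None: no addresses left in this tiny pool
```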

  • So that's how you might get this but there's

  • other things in these control panels.

  • In fact, if we look a little lower on Windows, there's DNS servers too.

  • Domain Name System.

  • Another acronym and a bit of a mouthful, but you can also see this on Mac OS too

  • if you actually click Advanced and actually take a look.

  • Here, for instance, there's mention of something else altogether, a router.

  • So there's lots of different addresses going on here

  • and lots of different servers.

  • So how do these all piece together?

  • Well, DNS is an interesting one in that it's

  • going to be the one that translates domain names to IP addresses, right?

  • None of us ever probably visits http:// and then a number, right?

  • Like, we visit facebook.com, google.com or the like,

  • but that's because our computers know how to translate one to the other.

  • So in fact if I do this command, nslookup for name server look up

  • and then I type in something like google.com, I'm asking the computer,

  • in this case, the IDE, what is the IP address of google.com.

  • I know it as the human as google.com, but the internet knows it

  • by its numeric unique address, and it turns out Google has several,

  • and even this is a bit of a white lie because they have thousands,

  • but the ones that my computer is being told to use

  • is, for instance, this one or this one or any of these other addresses.
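
The same kind of lookup can be sketched in Python with the standard `socket` module, which asks the operating system's resolver much as `nslookup` does. The function name `resolve_ipv4` is an illustrative choice, and the addresses returned for any real domain will vary by network and over time.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Ask the system resolver for the IPv4 addresses behind a name."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry ends with a (address, port) pair; keep the unique addresses.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_ipv4("google.com")` on a machine with internet access should return one or more addresses, though not necessarily the ones shown in the lecture.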

  • So let me see what actually happens here.

  • If I highlight that address and open up a browser and go to http:// and that IP

  • address and hit Enter, notice it actually seemed to work.

  • Well, why is that?

  • It's a little hard to see it in Chrome, but let's

  • go ahead and open up the Inspect tab and go to Network just like before.

  • Let me click Preserve Log so that it saves everything here,

  • and I could be using curl.

  • So the curl was just the simpler version.

  • Now I'm using the more familiar graphical version.

  • Let me go ahead and do that again and go to http:// and that IP address and hit

  • Enter.

  • A whole bunch of stuff flew by even just for Google's homepage,

  • but notice what happened.

  • On that very first-- whoops--

  • request, if I hover over it, I see http:// and then the number that I

  • typed in, but it's a 301 because, what was the response?

  • We can actually see these responses.

  • Let me click on the status code here, or the row, go to Headers

  • and notice here, if we zoom in, we'll see that Google

  • responded with this location.

  • So someone at Google just decided, OK, fine.

  • You figured out one of our IP addresses.

  • That's great, but we don't want you to see that in the URL.

  • It's bad for branding.

  • We don't want you to bookmark an IP address

  • because it might change later on.
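
What the browser does with that response can be sketched roughly in Python. The raw response text below is hypothetical, written only to match the 301 status and `Location` header described above; the exact headers Google sends are not shown in this transcript.

```python
def parse_response(raw: str):
    """Split a raw HTTP response into its status code and a dict of headers."""
    head = raw.split("\r\n\r\n", 1)[0]   # everything before the blank line
    lines = head.split("\r\n")
    _version, code, _reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers


# Hypothetical response, for illustration only.
raw = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: http://www.google.com/\r\n"
    "\r\n"
)
status, headers = parse_response(raw)
if status == 301:
    print("redirected to", headers["location"])
```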

  • So we're using the same mechanisms as before,

  • but that's how we might do the lookup and we can see the same thing

  • for any number of websites.

  • Here we go nslookup of harvard.edu and we get back just a couple here.

  • If I do the same on Yale, I'm going to get back different IP addresses.

  • Yale has even more in this case and so this

  • is how the computer's figuring out to where to send the data.

  • So what goes on this envelope then, it's going

  • to be not facebook.com harvard.edu or yale.edu,

  • it's actually going to be the address like 1.2.3.4

  • or whatever the actual IP address is of the server I'm trying to send to.

  • Now, of course, I expect a response from the server.

  • I want to get back my news feed or I want

  • to get back Harvard or Yale's homepage.

  • So what more should I probably put on this virtual envelope,

  • just intuitively?

  • Yeah.

  • AUDIENCE: Your own IP address?

  • DAVID J. MALAN: What's that?

  • AUDIENCE: Your own IP address.

  • DAVID J. MALAN: My own IP address, yeah.

  • So just like in the human world, just in case something

  • goes wrong with the post office, I might put my own address, 5.6.7.8,

  • and actually put that on the envelope so that if something goes wrong or, better

  • yet, if something goes right and they're ready to give me a 200 OK,

  • it can actually come back to me because they know from which

  • address this thing actually came from.

  • So who is it or what is it that's doing all of this routing?

  • Well it turns out there's servers on the internet called quite simply

  • routers, otherwise known as gateways, which is just a synonym,

  • and they're kind of artistically pictured here as just dots

  • across the world, and there's hundreds, thousands, tens of thousands

  • of routers.

  • Odds are you yourself at home, if you had internet access,

  • have at least one such router and its purpose in life,

  • again, is to take data from inside your household and send it to the internet,

  • and then any responses you get, to send it

  • back to the appropriate laptop or desktop or phone

  • or smart device that happens to be in your own home.

  • And we can actually see this too.

  • Let me go ahead and in CS50 IDE, try one other command.

  • I'm going to go ahead and type traceroute and I'm

  • going to trace the route, say, to yale.edu from here,

  • or technically from the IDE.

  • So if I hit Enter here, we're going to see a few lines of output,

  • and if you try this at home, just realize

  • I've configured my IDE a little differently to simplify the output.

  • So it looks like there's five steps between Cambridge

  • and New Haven or technically the IDE and New Haven,

  • but what are each of these steps?

  • Well between here and Yale, if we continue that version of the story,

  • there are, it seems, five routers.

  • There are five computers that have like lots of RAM, big CPUs

  • that can handle a lot of internet traffic

  • that are figuring out how to get my envelope from this origin

  • to this router, to this router, to this router, to this anonymous router,

  • to this one.

  • Sometimes the routers are configured not to answer these questions

  • from this program traceroute.

  • They sort of keep it to themselves, and you

  • can see on the right of each of these IP addresses some numbers.

  • So just take a guess, what do each of these numbers represent, perhaps?

  • What's that?

  • No it's okay.

  • AUDIENCE: Milliseconds?

  • DAVID J. MALAN: Milliseconds, yep.

  • So milliseconds that are measuring what do you think?

  • Time to go, or time to reach that specific router.

  • So we can kind of infer--

  • and this is the kind of amazing thing.

  • To get me to New Haven takes like two plus hours,

  • but to get an email, to get an envelope with a message

  • takes like 10.597 milliseconds to get data from here to there,

  • and then hopefully back if it's a request for a page.

  • Let's do something a little farther away.

  • So let's do like stanford.edu, tracing the route here,

  • and already we can see that the numbers are a little bit higher,

  • and that makes intuitive sense in that Stanford's

  • a little farther away than New Haven and it takes as many as 41 milliseconds

  • to reach that.

  • If I go even further and I read like a company's news

  • like cnn.co.jp, which is the top-level domain for a lot of servers in Japan,

  • you can see a real uptick in just how many milliseconds it takes,

  • and in fact, there's something curious here.

  • Why does it take so much more time to get from router number three

  • to router number four do you think?

  • AUDIENCE: The ocean.

  • DAVID J. MALAN: The ocean, yeah.

  • So there's a really big body of water in between the US's west coast and Japan's

  • coast, which probably explains why not just between three and four,

  • but really every router thereafter is that many milliseconds away.

  • So these aren't cumulative.

  • We're measuring constantly from here to there,

  • from here to slightly farther, from here to slightly farther.

  • So it makes sense that once you cross that ocean,

  • that's kind of the total value that you're actually going to see,

  • and it's fascinating really.

  • I mean, throughout the entire world there

  • are not only wireless technologies today, but very much wire technologies

  • and if we take just a few seconds, we can

  • see this visualization of so many of the transoceanic cables that have actually

  • been dropped by big ships that carry many, many, many, many bits from one

  • coast to another.

  • [VIDEO PLAYBACK]

  • [MUSIC PLAYING]

  • [END PLAYBACK]

  • So, with all of those cables capable of transmitting data all around the world,

  • it turns out there's still one more problem.

  • Even if we want to do something simple like

  • download an internet image of a cat because there's

  • different types of servers out there.

  • There's my computer here like my laptop.

  • I'm running Mac OS or Windows.

  • There's all those servers in Google's data center

  • and in their racks and Facebook's and the like

  • and in between all of those servers there are lots of routers,

  • but it turns out that those servers in those racks at Google,

  • at Facebook, even at Harvard and Yale, there

  • are servers that can do multiple things because technically,

  • even though we humans tend to talk about servers as being physical devices,

  • a server is, as we started today, really just a program.

  • It is a piece of software that someone wrote that, when run,

  • listens for requests on the internet and responds to those requests,

  • generally by spitting out information, text or 0s and 1s or, in some cases,

  • cats.

  • So upon receiving an envelope, then, how is

  • it that a server knows whether it's a request for a web page or it's an email

  • or it's a chat message or a voice message or any number of other things?

  • It turns out we need one more piece of information at least on this envelope.

  • It turns out that the world has standardized

  • via another protocol called TCP, Transmission Control Protocol, that you

  • need at least one other number on these envelopes,

  • and that number corresponds to the type of service

  • that you're trying to access or the type of data

  • that you're trying to send or receive.

  • So, for instance, 22 is for something called SSH, Secure Shell.

  • This is something that most CS majors might use, but most people in the world

  • wouldn't use this because it's entirely command line

  • and it allows you to connect securely to some remote server

  • without using something like a browser, but all of us generally do use browsers

  • and HTTP, it turns out, all this time has had a unique number associated

  • with all of those requests.

  • 80 is the number and if we visited any URL starting with https,

  • turns out there was a special number, 443,

  • that humans years ago decided just uniquely identify encrypted web

  • traffic requests and responses.

  • 587 is used for Simple Mail Transfer Protocol, which is for email.

  • Excuse me, 53 itself is used for DNS.

  • So if you ever send a message to a server saying

  • what is the IP address of google.com, you're

  • using number 53 to identify whatever machine or software can

  • answer that type of question, and so we can actually see this too.

  • If I go back to my IDE and I actually do curl -I https://www.harvard.edu,

  • this of course, worked before and it was 200 OK,

  • but it also will work if I more precisely say specifically send this

  • request to TCP port, or number, 80 and--

  • damnit.

  • Oh, it's wrong because I made a compelling pedagogical mistake.

  • So what did I do wrong?

  • AUDIENCE: Https.

  • DAVID J. MALAN: Yeah, so I kind of screwed up my numbers here.

  • So I said https, but I meant to say http if I'm using port 80

  • or, conversely, if I want to talk to the secure port which is known,

  • I actually want to say 443, and that one in fact works,

  • and I can do it again even in Chrome.

  • If I go up to my browser and go to http://yale.edu:80

  • and let the redirects happen, that too will work.

  • It's just browsers, to keep our minds focused on the website we're actually

  • trying to visit and not distracted by technical details like :80

  • or slashes or even sometimes http itself,

  • just hide that from the URL bar.
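
A small Python sketch of the rule being described: if the URL names a port, use it; otherwise fall back to the default for the scheme. The helper name `effective_port` is made up for illustration, and only the two schemes from the lecture are handled.

```python
from urllib.parse import urlsplit

# Default TCP ports for the two schemes discussed in the lecture.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the port a request for this URL would be sent to."""
    parts = urlsplit(url)
    if parts.port is not None:        # an explicit ":80" or ":443" in the URL
        return parts.port
    return DEFAULT_PORTS[parts.scheme]  # otherwise implied by http or https


print(effective_port("http://yale.edu:80"))
print(effective_port("https://www.harvard.edu"))
```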

  • It's all there.

  • It's all happening, but we humans are getting a little more comfortable

  • with the internet over the years so Chrome and other browsers

  • are just starting to hide some of these lower-level implementation details.

  • So that really means, when I actually want to send a request to a web server,

  • I should really write :80 on the envelope

  • to make clear that that's going to a web server listening

  • on port 80 or maybe 443, and then, you know what?

  • It turns out, and we won't dwell too much on the details,

  • even my Mac or your PC also has its own port number for all of these requests,

  • right?

  • And it would be pretty annoying if you could only visit one website at a time

  • or you could use Gmail or Skype but not both

  • at the same time, or Facebook Messenger or Google Chat but only one

  • at the same time.

  • That would be pretty limiting, especially when

  • we have all this computing power.

  • So it's also the case that your own computer,

  • any time you send a request on the internet,

  • chooses a random or pseudo-random number to uniquely identify

  • the piece of software on your computer that's waiting for the reply.

  • So this might be not port 80.

  • This is going to be a bigger number like 1025,

  • or some large-ish value all the way up to 65,000, even,

  • or 32,000 that now uniquely identifies the port on my computer,

  • and that's how your computer can do multiple things at a time,

  • and when I get the response those values are just flipped,

  • but there's one more piece.
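
The addressing on the envelope so far can be sketched as a small Python data structure. It reuses the lecture's placeholder addresses 1.2.3.4 and 5.6.7.8 and the example client port 1025; the class and field names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """The addressing written on the outside of a packet."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def reply(self) -> "Envelope":
        """Build the addressing for the response: source and destination flipped."""
        return Envelope(self.dst_ip, self.dst_port, self.src_ip, self.src_port)


request = Envelope(src_ip="5.6.7.8", src_port=1025, dst_ip="1.2.3.4", dst_port=80)
response = request.reply()
print(request)
print(response)
```

Because each outgoing request gets its own source port, responses for different programs on the same computer can be told apart even though they arrive at the same IP address.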

  • Like cats can be pretty high quality and videos certainly

  • take up a huge amount of data.

  • Netflix videos and any streaming videos are taking up

  • a huge amount of information and it would

  • be pretty annoying to your neighbors if any time you

  • were watching a movie on Netflix, you had

  • to be done watching the movie in order for a neighbor

  • to also watch a video on his or her computer as well.

  • So it turns out that what computers also do thanks to IP

  • and TCP is, when they're used together, they offer one more feature still.

  • It turns out that if I want to download a picture of a cat,

  • and we have a nice printed version here, I'm

  • not going to get the whole cat in the one envelope most likely.

  • This cat or this video file or whatever it is

  • is actually going to be divided up into a few different pieces.

  • So this message might get chopped or fragmented into four pieces.

  • Each of those four pieces now might go in each of one of these envelopes

  • here, here, and then here with the third and fourth,

  • and what's nice, though, about TCP/IP is that it

  • provides at least two features for us.

  • One, IP ensures that every computer on the internet that speaks this protocol

  • has an address.

  • So IP handles the getting of the data to some destination.

  • TCP, the other half of this, ensures or guarantees

  • with high probability delivery-- that the data actually gets there.

  • Because as you might have gleaned from even the animation of all

  • of the transatlantic cables and all of the interconnections among routers,

  • things can go wrong, right?

  • Routers, it turns out, can get overloaded.

  • Their buffers can overflow such that they

  • can't handle all of the traffic coming into them and in fact,

  • if you try to watch Game of Thrones, some episode on HBO

  • and you couldn't access it at some point or [INAUDIBLE] or some tool like that.

  • If they're overloaded, what does that mean?

  • It just means the server, or the routers between us and the server,

  • are getting so many darn envelopes that they just can't keep up

  • and can't hold onto them all at once, and so sometimes packets do get,

  • so to speak, dropped, both physically and also digitally,

  • and this means some packet is lost.

  • And so what's nice about the internet is that when my computer here

  • talks to the nearest Harvard router that may very well have antennas

  • in a room like this or an access point, I might send off a packet here and here

  • and let's send this all the way to the back if you could,

  • but these packets, as you can see, don't necessarily

  • need to travel the same [? path ?] because-- what's

  • your name in the second row?

  • AUDIENCE: Monsi.

  • DAVID J. MALAN: Monsi.

  • So Monsi is getting a little busy.

  • So Kara, if you could route to someone else.

  • This is literally the effect that happens on the internet.

  • If one router, like Monsi, gets a little bit busy

  • and her attention is elsewhere or just has too many packets to deal with,

  • she won't even necessarily drop it but maybe

  • their path will just be routed around her,

  • and that's what's nice about having this mesh network around the internet.

  • Now unfortunately, one of those packets can get dropped

  • and in fact this is a perfect example.

  • If you want to drop it, drop it.

  • Uh-oh, a packet was dropped!

  • What TCP does for us is the following.

  • Once those envelopes reach hopefully one specific person--

  • OK, you are the lucky winner.

  • Whoever, wants to-- how many do we have?

  • Two there?

  • Where did the third go?

  • That's OK.

  • TCP can handle multiple packets being lost.

  • AUDIENCE: It's over there.

  • DAVID J. MALAN: Oh, and so packets also don't take the shortest path sometimes

  • on the internet.

  • So what might happen?

  • So let's assume for the sake of discussion

  • that those packets did make their way to at least one of our audience members

  • here.

  • He or she, upon receiving them, would also

  • see not just the origin address and the destination address.

  • There would also be some notation, like a memo line on the envelope saying

  • 1 of 4, 2 of 4, 3 of 4, 4 of 4, so that the recipient can infer

  • from that little hint whether or not they received all 4 or just,

  • as in this case, a subset thereof, and in that case,

  • assuming the computer speaks TCP, it can simply say,

  • hey David, resend me packet number 1 or packet number 3

  • or whichever were actually lost.
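
The "1 of 4, 2 of 4" idea can be sketched in Python as follows. This mirrors the envelope analogy rather than real TCP, which numbers bytes and relies on acknowledgments; the function names and the sample payload are illustrative.

```python
def split_into_packets(data: bytes, count: int) -> list[tuple[int, int, bytes]]:
    """Chop data into `count` pieces, each labeled (number, total, payload)."""
    size = -(-len(data) // count)  # ceiling division so nothing is left over
    return [
        (number, count, data[(number - 1) * size : number * size])
        for number in range(1, count + 1)
    ]


def missing_numbers(received, total: int) -> list[int]:
    """Work out which packet numbers never arrived."""
    seen = {number for number, _, _ in received}
    return [n for n in range(1, total + 1) if n not in seen]


def reassemble(received) -> bytes:
    """Put the payloads back in order, whatever order they arrived in."""
    ordered = sorted(received, key=lambda packet: packet[0])
    return b"".join(payload for _, _, payload in ordered)


packets = split_into_packets(b"picture of a cat", 4)
arrived = [packets[3], packets[1]]             # out of order; 1 and 3 were dropped
need = missing_numbers(arrived, total=4)       # the numbers to ask for again
arrived += [packets[n - 1] for n in need]      # the sender resends just those
print(need)
print(reassemble(arrived))
```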

  • And so together all of this happens at blazing speeds.

  • 10 milliseconds to do all that back and forth to New Haven,

  • let alone even faster here on campus, but those really

  • are the basic principles and building blocks

  • that are just getting our data from one place to another.

  • Of course, the real interesting stuff happens

  • when we dig deeper into this envelope and look at the contents.

  • Not just the cat as in this case, but the language, HTML and something else

  • called CSS which we'll do shortly, but I thought

  • it might be fun, especially on the heels of our look at forensics,

  • to take a look at just how sort of presumptuous Hollywood

  • tends to be when presenting us humans with technical details

  • that now you'll perhaps have an even better eye for in addition

  • to the age-old "enhance" line.

  • [VIDEO PLAYBACK]

  • - It's a 32-bit IPv4 address.

  • - IP as in internet?

  • - Private network.

  • [? Tamia's ?] private network.

  • [STUDENTS LAUGHING]

  • - She's so amazing.

  • - Oh, Charlie.

  • - It's a mirror IP address.

  • She's letting us watch what she's doing in real time.

  • [END PLAYBACK]

  • DAVID J. MALAN: OK, so we'll hold it on this screen

  • here because one, a few of you laughed when you saw the bogus IP

  • address because the number was what?

  • AUDIENCE: 275.

  • DAVID J. MALAN: 275, which is too high and that one

  • we could forgive because you don't want like random people pausing

  • their videos on the internet then trying to hack into or get access

  • to that URL, but even funnier is when the hacker is being described

  • as doing this on the screen as part of their attack.

  • This is like the source code in a language called Objective-C

  • for some kind of drawing program, as suggested

  • by the use of crayons in the code as a variable.

  • So let's pause there and when we come back in five minutes, we'll take a look

  • at HTML itself.

  • All right, so we're back and we're about to learn a new language.

  • Though this might feel like a lot to do in just an hour,

  • this one's a markup language.

  • So it's not a programming language, which

  • means you're not going to see loops.

  • You're not going to see functions.

  • You're not going to see conditions or any of the kind of logic

  • that we have built into C and into Scratch and eventually

  • Python and JavaScript.

  • You're instead going to see just what are called tags, pieces

  • of English-like syntax that just tell the browser what to do

  • and what to stop doing.

  • So we're going to see tags that say start making this text centered.

  • Stop making this text centered.

  • Start making the text bold.

  • Stop making the text bold.

  • So these very deliberate kind of statements

  • that we're going to express using something that's code-like,

  • but it doesn't give you logical control.

  • So as such, there's a pretty small language ahead of us

  • and a lot of what you'll do when learning HTML is just

  • check an online reference or an example online or look at the source

  • code of actual web pages to just figure out how these things are done

  • and today, we will focus on the fundamentals.

  • So this is perhaps one of the simplest web pages you

  • can write in a language called HTML.

  • It's a text-based language.

  • All of the tags resemble some English words

  • and there's a pattern to the kinds of things that you might type.

  • First of all, if you're using the very latest version of HTML, which

  • happens to be version 5, it's been around for a while,

  • you simply start every web page with this cryptic incantation at the top

  • here.

  • Open bracket, !doctype html, closed bracket,

  • as those things are called.

  • Angled brackets, which you've probably not had many occasions to type

  • on your keyboard, but starting soon you will.

  • Then after that, they start a pattern.

  • So <html>, and then all the way at the bottom is what we'll call the opposite

  • of that tag.

  • If this is a start tag, this will be an end tag, or if this is an open tag,

  • this will be a close tag, differing only with this forward slash that's

  • inside of the tag.

  • So this says, hey browser, here comes a web page.

  • This says, hey browser, that's it for the web page.

  • Again, this sort of starting and stopping mentality.

  • Meanwhile, inside of the web page as denoted by the HTML tag,

  • there are two parts, a head and a body.

  • The head of a web page tends to contain very little.

  • It's usually just like the title bar in the tab

  • that we humans see when you visit a website,

  • and the body is like 95% of the contents

  • of the page, the actual viewport or the rectangular region

  • that contains actual content.

  • What is that content?

  • Well here in the head we have a title that's

  • going to be "hello, title" just because and then

  • in the body of the web page, this web page,

  • there's going to be "hello, body."

  • That's it.

  • That's HTML.

  • If you save this text in a file, open it in a browser,

  • you will see a really lame web page that says hello title and hello body,

  • but that's a web page using HTML tags as they're called.

  • Anything in these angled brackets are tags.

  • So I can actually see this pretty clearly even on my Mac

  • and you could do this on your PC as well.

  • I've opened up TextEdit and I've configured it to be simpler than

  • the default, so know that I've done a little something in advance,

  • but you could use notepad on Windows or any other number of other programs,

  • even Microsoft Word if you save it in the right way or Google Docs,

  • but let me go ahead and just recreate this as !DOCTYPE html, open bracket,

  • html, and just to kind of remember to do things,

  • I'm going to tend to get ahead of myself and sort of start and finish

  • the thought and then dive in inside.

  • Let me go ahead and do head here, close head tag here,

  • and I'm indenting, just for good measure, one, two, three, four tabs,

  • though so long as you're consistent the browser will

  • be perfectly content, as will we.

  • hello, title, title, open bracket, open bracket body, closed bracket body,

  • and then hello, body.

  • So that's it.

  • I've just typed out the exact same thing as before.

  • Let me go ahead and save this as not hello.txt or certainly

  • not hello.c but hello.html by convention.

  • I'm going to hit Save.

  • Mac OS is kind of warning me that this is text, not something called HTML,

  • but I know what I'm doing and I'm going to say use HTML,

  • and now I have a file called hello.html, and if I go to my desktop, here in fact

  • it is.

  • And if I double click on it, there, in fact, is that pretty simple web page

  • and if I actually reveal the tab, there it is.

  • Hello, title in the very top tab of the page and once I get rid of that

  • do I see the body again.
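
Since the slide and the editor window are not visible in a transcript, here is a minimal Python sketch that writes the page just described to hello.html. The markup is reconstructed from the spoken description (doctype, html, head with a title, body), so the exact indentation is an assumption.

```python
from pathlib import Path

# Reconstructed from the description above; whitespace is for human readers only.
HELLO_HTML = """<!DOCTYPE html>
<html>
    <head>
        <title>hello, title</title>
    </head>
    <body>
        hello, body
    </body>
</html>
"""

path = Path("hello.html")
path.write_text(HELLO_HTML, encoding="utf-8")
print(f"wrote {path}; open it in a browser to see the page")
```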

  • So that's it for HTML at least in terms of its basic structure,

  • but there are some other features that we can take advantage of as well,

  • and let's actually tease these apart.

  • Notice, first of all, that there is indeed this symmetry.

  • What is opened is almost always closed as well in the opposite order.

  • Just as head here and title here, and then

  • followed by body and then the contents therein,

  • but because there is this structure, you can actually

  • think about this in relation to the past couple of weeks

  • when we've talked about data structures.

  • I would argue that this HTML on the left is

  • kind of equivalent to this tree on the right,

  • and we didn't spend a huge amount of time talking about trees,

  • and even when we did we used them for algorithmic reasons

  • like a binary search tree to search data pretty efficiently,

  • but if you think about it, here is the document, which I'm just

  • drawing with this shape here kind of arbitrarily

  • and it has one child like the entire page as I'm drawing it,

  • which is the HTML tag here.

  • The HTML tag has two children, so to speak, to borrow

  • our language from our data structures.

  • So head and body from left to right.

  • Head has a child called title and then title has a child of some sort,

  • even though it's just raw text.

  • It's not another tag with angled brackets,

  • just as body has its own content there, just hello, body.

  • So that hierarchy and the deliberate indentation, which is there just for us

  • humans-- the browser does not care about whitespace--

  • lends itself to an implementation in memory,

  • and so long story short, when your browser receives an envelope,

  • inside of which are not just those HTTP headers, outside of which

  • are not just the IP address and TCP port,

  • but inside of which is a text file containing HTML like that,

  • all the browser does is load that file into memory,

  • read it top to bottom, left to right and essentially build

  • a tree structure in memory so that it knows how to represent it

  • underneath the hood, so to speak.
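
The hierarchy described above can be sketched alongside the markup itself. The outline in the comment is an illustration of the idea, not the browser's literal internal representation, which also tracks details such as whitespace.

```html
<!DOCTYPE html>

<!--
    A sketch of the tree the browser builds for this page
    (elements and text only; indentation shows parent and child):

    document
      html
        head
          title
            "hello, title"
        body
          "hello, body"
-->
<html>
    <head>
        <title>hello, title</title>
    </head>
    <body>
        hello, body
    </body>
</html>
```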

  • And in fact, you've seen HTML all around you

  • even if you've just never looked underneath the hood, as we say.

  • In fact, if I go to like harvard.edu and let

  • the redirects happen in the usual way, let me go ahead and inspect the page.

  • This is another way in Chrome and in other browsers

  • to get at the developer tools.

  • You can control click or right click on the web page and choose Inspect.

  • That opens up the same tab.

  • Previously, we used the network panel, but if I click on Elements

  • you can actually see all of the HTML that composes Harvard's page,

  • and it looks beautiful here.

  • It's nicely color-coded.

  • It's prettily indented.

  • I can dive in deeper with all of these arrows,

  • but that's probably not how the humans made it

  • because if I also right click or control click and choose View Page Source,

  • and you can do this in any browser as well,

  • here is the mess that actually came back from Harvard's server.

  • This is HTML and my god, like, it's a lot.

  • I see no indentation, so style 0 here, but that's OK

  • because it's a browser reading it.

  • It's not a human in this case and similarly,

  • if we visit something like yale.edu, and let's go ahead and open up their page

  • source, it's similarly going to be kind of overwhelming and a lot of it,

  • but rest assured that even though these web pages might look really,

  • really sophisticated-- like, my god, we've never written a C program with

  • 500 plus lines of code--

  • a lot of this stuff is generated, and in fact,

  • one of the challenges of pset7 and pset8 when we explore web programming

  • is going to be not to write hundreds of lines of HTML, which would just

  • get mind numbing quickly, but to write a few lines of Python

  • or a few lines of JavaScript that programmatically, like with loops,

  • generates all of the structure of your web page.

  • So if it's like a web page of photos like a Facebook photo album,

  • Facebook doesn't have people writing out thousands of lines of HTML code

  • every time you upload a photo.

  • They have code in PHP or some other language

  • that has a for loop that iterates over all of the photos you've uploaded

  • and spits out the same HTML but different image for each of the photos

  • you've uploaded, and that's where web programming comes into play.

  • You're not writing the HTML, you're generating it

  • by actually writing programs.

  • So today we set the stage for that capability

  • but first we just need a framework for actually doing this.

  • So rather than use, now, my local Mac, which

  • is kind of lame because I can open the web page but no one else in the world

  • can access it, and in fact, if we do that again, you'll

  • notice here, if I double click on hello.html and open the URL bar,

  • it's curiously clearly not on the internet.

  • Like, it's not http, it's not https, it's literally file://,

  • which just means it's a file on my local computer.

  • So none of you could reach that because of course

  • this user jharvard on my laptop exists only on my local Mac.

  • So fortunately we have a web-based IDE with which to put stuff

  • on the internet, but there's a catch.

  • The IDE itself, recall, is a web application, right?

  • It's code that friends at Amazon wrote and that we added to, that runs

  • on a server somewhere and, as we'll see, somewhat in your browser too,

  • but more on that when we talk about JavaScript,

  • but CS50 IDE already has a URL like https://cs50.io or https://ide.cs50.io

  • slash whatever your username is.

  • So we're already using port 80 or maybe 443 for the IDE itself.

  • So how in the world could you write web pages in the IDE

  • and then serve them on the internet if the IDE itself

  • is already using the standard port?

  • Well fortunately you can write on the envelopes,

  • when trying to access your own web pages, a hardcoded TCP port number.

  • It doesn't have to be 80, it doesn't have to be 443.

  • Those are just the defaults.

  • If I want to actually visit pages in my IDE,

  • I can just run a web server on a different port number,

  • like 8080 by convention or 8081, 8082.

  • Just a pretty big number that odds are no one else is using on some system.

  • So let's see this as follows.

  • Let me go ahead and in the IDE here create a new file.

  • I'm going to call it hello.html and I'm just

  • going to go into that text file, whoops, which I closed.

  • Let me go ahead and just grab the code that we've

  • been using here, which is right here, go back to the IDE,

  • paste it into the text file here, click Save, and now I have in the IDE

  • a file called hello.html, and indeed if I look at the file browser

  • and I look on the left-hand side, there, in addition to the sample code,

  • is hello.html, but if I double click this file it's not

  • very useful because it's going to open the editor, which

  • is not like a web page.

  • It's the source code for my web page.

  • So I actually now need to run a program that

  • serves this file just like Facebook does, just like Google and Harvard

  • and Yale do, and I'm going to do this literally by running http-server,

  • and I'm going to say on port 8080.

  • So -p in this particular program means port

  • and I'm just going to say, hey CS50 IDE, start a program called http-server

  • whose purpose in life is to listen for requests on the internet,

  • but specifically on that port number, and serve up whatever requests come in.

  • So I've gone ahead and hit Enter here.

  • Starting up http-server.

  • It tells me the long URL that this is available at.

  • Your URL will be a little different with your username

  • and if I open this now in another tab, it's a little cryptic at first glance.

  • I'm just seeing the index or contents of my directory and in there is like

  • a secret .c9 for Cloud9 directory.

  • Don't delete that or change that.

  • That just has metadata related to the IDE.

  • Source6 I downloaded earlier and you can too from the course's web site,

  • but there's hello.html, and on the left-hand side

  • here, you'll see some cryptic looking permissions.

  • This has to do with who can read and who can write your files,

  • but for today all I care about is that the file exists.

  • So now, like a user on the internet, I'm going to go to here, click on it,

  • and voila!

  • There is my actual web page.

  • So notice, the URLs are very similar.

  • Here I am on cs50.io and here I am on cs50.io

  • even though your user names will of course be different,

  • but the IDE is running on the default port, 443.

  • I'm now temporarily serving up my HTML files

  • using port 8080 just because and so that's

  • how a server can do multiple things and how you can do

  • multiple things on the server at once.

  • So let's do something else besides that.

  • Let me actually introduce a few other fundamentals that

  • might be handy when writing HTML and let's go ahead and do this.

  • Let me go ahead and create a new file and we'll call this one

  • paragraphs.html, and let me go ahead and just name this like paragraphs and down

  • here I'm going to have some paragraphs of text,

  • and I don't really know what I want to say so I'm going to Google some--

  • so standard Latin-like text.

  • Oh, I want like three paragraphs of Latin-like text and so here we go.

  • Then there's a random website that just generates

  • placeholder text in faux Latin.

  • So, Paste.

  • There are my three paragraphs.

  • I'll be a little nice and tidy and indent them

  • so it looks at least somewhat nicely styled.

  • Save the file and now let me go back to the URL I was at a moment ago.

  • Now notice I have two files being served by this HTTP server program.

  • Click paragraph-- oh.

  • OK, one, Chrome thinks the page is in Latin.

  • [STUDENTS LAUGH]

  • Actually, soccer inferior element estate planning time.

  • Tomorrow soss quiver before as the--

  • that does sound like the Latin I learned years ago.

  • All right, so Show Original.

  • So the point is not to focus on the Latin, but the apparent bug.

  • Like, what's it not doing that maybe you thought it should a second ago?

  • AUDIENCE: No indentation.

  • DAVID J. MALAN: Yeah, there's no indentation and also there's no what?

  • There's no break.

  • I mean this is one big Latin-like paragraph.

  • It's not three.

  • Well this is simply because a browser only does what you tell it to do.

  • Let me go ahead and shrink this window and, as an aside, what you're

  • seeing here, all this mess in the bottom terminal window,

  • as the http-server program is running, it is logging all of the HTTP requests

  • that come in from browsers just so you can kind of debug or diagnose,

  • but we're going to just ignore that for now

  • and let this thing run down here in the background.

  • But if I want paragraphs I need to be a little more pedantic

  • and actually say, hey browser, make a paragraph with what's called the p tag,

  • and let me go ahead now and indent even though the indentation clearly

  • doesn't matter.

  • It's just to keep my code nice and tidy.

  • So, hey browser, start a paragraph.

  • Here's the text.

  • Hey browser, stop the paragraph.

  • Same thing here.

  • Let me go ahead and start a paragraph.

  • Then let me go ahead and stop the paragraph.

  • Notice the IDE is trying to be helpful.

  • This is not helpful.

  • This is not a password, but it's trying to autocomplete my thoughts.

  • That's fine.

  • I'm just going to ignore it.

  • Then let me go ahead and close the paragraph and save.

  • So it's a little more verbose, but anything in the tags the human is not

  • going to see, but when you reload the page, as with command or control+R,

  • or if you go up here by clicking the reload icon,

  • whatever it looks like in your browser.

  • Now I have three Latin-like paragraphs.
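
A minimal sketch of what paragraphs.html might look like once the p tags are added. The placeholder sentences are an assumption standing in for whatever the generator site produced.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>paragraphs</title>
    </head>
    <body>
        <!-- placeholder text; not the lecture's exact copy -->
        <p>
            Lorem ipsum dolor sit amet, consectetur adipiscing elit.
        </p>
        <p>
            Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
        </p>
        <p>
            Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris.
        </p>
    </body>
</html>
```

Without the p tags, the browser collapses the line breaks and indentation in the source into single spaces, which is why everything ran together as one block.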

  • So it's a little more deliberate here.

  • So that's all fine and good, but the web is kind of more interesting

  • when you can actually link to things.

  • So let's actually do that instead.

  • Let me go ahead and create a new file called, let's say, link.html.

  • Go ahead and paste this here and say we'll name the title link.

  • Let me get rid of all of this just so I have some placeholder

  • and I can say something like "Hello, world!

  • My favorite school is..."

  • and just to play it safe today, "stanford.edu."

  • Save, reload, click link.html and nothing.

  • So here too it looks like a domain name and it certainly is, and frankly,

  • all of us now are probably conditioned in tools like Slack and Gmail and other

  • tools and Facebook that just kind of figure out that, oh,

  • if something looks like a domain name, make it a link,

  • but that's because someone at Facebook, someone at Google knows HTML and knows

  • how to use if conditions and elses and just says, oh,

  • if a string that the human has typed in looks like a domain name ending

  • in .edu, make it a link.

  • But how do you make it a link?

  • We can now do this manually.

  • It turns out you need an anchor tag abbreviated as a

  • and then I'm going to close the anchor tag at the end of the text

  • that I want to anchor a link to, but this isn't enough.

  • I need to be ever so explicit as to where I want this link to go,

  • and so it turns out HTML also supports what are called attributes.

  • So tags are the things in angled brackets.

  • Attributes are also inside those angled brackets,

  • but they come after the tag's name, and they're just going

  • to modify the behavior of the tag, and it makes sense here

  • to need to modify the behavior because 20,

  • 30 years ago when HTML was invented, we didn't make up

  • a tag that leads to stanford.edu.

  • We made up a more generic tag that anchors to some destination,

  • and so here I can now do www.stanford.edu, save the file,

  • and notice, this is like saying to the browser, hey browser,

  • here comes a link or hyperlink to Stanford's web site,

  • and then the end here it says hey browser, that's it for the link,

  • and thankfully it's not super verbose.

  • You don't have to repeat the attribute at the end.

  • You just repeat the tag's name, otherwise

  • you'd be typing the same thing again and again.
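
A sketch of link.html with the anchor tag and its href attribute. The https:// scheme is an assumption here; the transcript only records the host name, and an href without a scheme would be treated as a path relative to the current page.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>link</title>
    </head>
    <body>
        <!-- the attribute appears only on the open tag; the close tag repeats just the name -->
        Hello, world! My favorite school is <a href="https://www.stanford.edu/">stanford.edu</a>.
    </body>
</html>
```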

  • If I now go back here and reload the page as with command or control+R,

  • now it becomes the familiar blue, underlined link,

  • and if I click on that, notice first it's super small.

  • You can see where the link is actually going to lead,

  • and so if I click on this we'll see Stanford's website and voila.

  • So now we've visited their page as well, but there's an interesting side note

  • here, and if you want to kind of think about things called phishing attacks

  • or frankly, Harvard once in a while and Yale once in a while

  • will email out warnings like "beware of this phishing attack."

  • P-H-I-S-H-I-N-G.

  • This is when people on the internet generally

  • send you emails or some kind of spam trying to trick you

  • into visiting a phony website to harvest your usernames, passwords, credit card

  • numbers and whatnot, and honestly, most of those phishing attacks

  • boil down to this 10-line example of HTML

  • because what's to stop me from saying something like "Hello, world!

  • Confirm your password at..."

  • and then we'll say like paypal.com and then

  • over here, I can change this to like davidsphishingsite.com,

  • which hopefully doesn't exist.

  • One year I went to badplace.com and--

  • anyhow, so--

  • [STUDENTS LAUGHING]

  • Here I've gone ahead and saved the file, reloaded, and the link is indeed blue,

  • but before I click on it, only the most astute of users

  • is going to even bother checking the bottom left hand

  • corner to see where they're about to be whisked away to

  • and even most of us in this room, myself included,

  • are not so paranoid that we're constantly

  • checking those kinds of things.
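
The mismatch being described can be sketched in a few lines. The destination phishing.example is a hypothetical, reserved placeholder domain used here instead of the name improvised in the lecture.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>link</title>
    </head>
    <body>
        <!-- the visible text says paypal.com, but the href (a hypothetical placeholder) leads elsewhere -->
        Hello, world! Confirm your password at <a href="https://phishing.example/">paypal.com</a>.
    </body>
</html>
```

The text between the tags is only a label. The browser follows whatever the href says, which is why hovering and checking the destination is worthwhile.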

  • Odds are, if I get an email like this, oh

  • my god, my account's been compromised.

  • I've got to go confirm my password for PayPal to protect my money.

  • You might very well just follow the link,

  • but of course it can go anywhere you want just via this very basic building

  • block, but this is just one way you can vet actually

  • what's going on underneath the hood, but of course the internet

  • is more interesting than just text alone.

  • Let me go ahead and open up an example that I whipped up

  • in advance here using image.html and we'll see another tag here.

  • So here is another opportunity to use an attribute

  • and one that's also not necessarily visible to the user.

  • So here's an image tag.

  • Humans years ago decided to be succinct.

  • It's <img> for image, just like it's just <a> for anchor.

  • The source, src, of which is going to be that file, dan.jpeg,

  • which I downloaded in advance from the URL up above,

  • and in fact, this is gray in the cs50 IDE because it's syntax highlighting it

  • just like in C. This is what's a comment in HTML.

  • So if you want to make notes to yourself or to viewers,

  • some sentence or like a citation like this,

  • you can use an HTML comment by doing <!-- and --> and you can write anything

  • between those things-- for the most part-- that you want.

  • So just like in C do we have the //.

  • So here's the source of this image and this

  • is like an alternative explanation of it, alt. Why might this be compelling?

  • I want to show the image to a user.

  • Yeah?

  • AUDIENCE: Is it for like if they hover their mouse over it,

  • they can see what's happening.

  • DAVID J. MALAN: Yeah, so a couple of reasons.

  • If you hover over the image you can actually see some descriptive text.

  • So like Handsome Dan here, like Yale's mascot.

  • If the user has trouble seeing or is blind,

  • you might need a screen reader to actually tell you

  • what it is that's on the screen, and it's not

  • obvious from dan.jpeg what that could be,

  • but if you have this alternative text, a computer

  • can recite verbally Handsome Dan, which might then

  • jog the person's memory as to what it is that's actually on the screen.

  • Or if you have a really slow internet connection,

  • sometimes you'll see a placeholder for an image

  • that just says what it is before the image actually downloads.

  • So being mindful of these kinds of things

  • will just make, ultimately, your websites more accessible,

  • and indeed if I go to this one now and go into my source6 directory

  • where we have even more examples at our disposal and go to image.html,

  • here is their adorable Handsome Dan as of this past year.

  • So there's an image.
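
A sketch of what image.html might contain, based on the description above. The comment's wording is an assumption, and the photo's source URL is left out because the transcript does not record it.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>image</title>
    </head>
    <body>
        <!-- a citation for the photo would go here (URL not recorded in the transcript) -->
        <img alt="Handsome Dan" src="dan.jpeg"/>
    </body>
</html>
```

The alt text is what a screen reader recites and what shows if the image fails to load. In current browsers, a hover tooltip generally comes from a separate title attribute rather than from alt.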

  • We can kind of do funky things now with nesting.

  • So this is not all that interesting because it doesn't go anywhere,

  • but I could just combine these ideas.

  • I could do a href = http://www.yale.edu or, because I don't want the user

  • to bother getting redirected, I could just proactively make

  • it secure because I know Yale supports that per earlier,

  • and I can nest these tags like this.
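
The nesting just described might look like the following sketch, with the image placed between the anchor's open and close tags so that the picture itself becomes the link.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>image</title>
    </head>
    <body>
        <!-- the img sits inside the a element, so clicking the image follows the href -->
        <a href="https://www.yale.edu/">
            <img alt="Handsome Dan" src="dan.jpeg"/>
        </a>
    </body>
</html>
```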

  • Now if I go here, reload, it still looks the same

  • but notice my cursor changes to like a pointer, and if indeed I click on that,

  • now the image leads to Yale's web site, but I skimmed over something.

  • One of these is not like the other.

  • What detail have I kind of not mentioned?

  • Yeah.

  • AUDIENCE: The image file closes within itself.

  • DAVID J. MALAN: Yeah, the image tag kind of closes in and of itself,

  • and so there are some of these anomalies within HTML

  • where there really isn't a notion of, like, start doing something

  • and then eventually stop doing something.

  • Like, an image is either there or it's not.

  • Like, you can't kind of put something in between it conceptually,

  • and so some of these tags in HTML are what are called empty.

  • Like, they should not have anything after the open tag

  • or before the close tag.

  • So if you wanted to be really sort of precise you could say this,

  • but you should not put anything where my cursor now

  • is because it would make no sense to try to put something inside of an image,

  • but this is just kind of lame to have this unnecessary verboseness.

  • So you can just put the slash in there and technically in HTML5 you

  • don't even need the slash in this case, but at least this way,

  • and I think for pedagogical purposes, doing it, even for empty tags,

  • makes sure, and makes more clear visually, that your tags are

  • balanced.
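
As a small illustration of the two spellings mentioned for an empty tag, both lines below are accepted by browsers. The file name and alt text are carried over from the earlier example.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>image</title>
    </head>
    <body>
        <!-- HTML5 allows the tag with no trailing slash -->
        <img alt="Handsome Dan" src="dan.jpeg">

        <!-- the trailing slash is optional, but makes it visually clear the tag is closed -->
        <img alt="Handsome Dan" src="dan.jpeg"/>
    </body>
</html>
```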

  • So that's the only anomaly there and then

  • there's bunches of others which we can fly through really quickly here.

  • So if I go back to our examples here, I whipped up headings.html.

  • So if you want to do something like this if you're

  • writing like a book or a website that has like chapters and sections

  • and subsections and so forth, HTML lets you

  • easily format things as big and bold, slightly smaller and bold,

  • slightly smaller and bold, and so forth by using the h1 through h6 tags.

  • So if I go into headings, this is how I made this web page.

  • I simply have h1, h2, h3, h4 opened and closed and that's it.

  • So any time you're reading some kind of online text,

  • odds are they're using one or more of these tags to format the page.
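
A sketch of a headings page along the lines described. The heading text is an assumption; only the tag names come from the lecture.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>headings</title>
    </head>
    <body>
        <!-- h1 is rendered largest by default, h6 smallest -->
        <h1>One</h1>
        <h2>Two</h2>
        <h3>Three</h3>
        <h4>Four</h4>
        <h5>Five</h5>
        <h6>Six</h6>
    </body>
</html>
```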

  • If we look at another example in here, we have something like list.html.

  • Lists are not uncommon on the internet, you'll never believe number three,

  • and here's how you might do something with a bulleted list by just marking up

  • three words-- foo, bar and baz--

  • and the HTML for this, if I open up list.html,

  • simply looks a little more verbose in that we need a parent element so

  • to speak, borrowing our tree terminology,

  • but here we have an unordered list, or ul, each of which

  • has one or more list items, or li, each of which

  • open and close foo, bar and baz.

  • And if I really want it numbered, I can also do this.

  • I can change unordered list to ordered list, ol, reload and now the browser

  • figures out the numbering for me, which is nice if you have lots of data

  • and you don't want to deal with actually laying it out yourself.
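
The list markup described above can be sketched as follows, showing the unordered and ordered variants side by side.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>list</title>
    </head>
    <body>
        <!-- unordered list: rendered with bullets -->
        <ul>
            <li>foo</li>
            <li>bar</li>
            <li>baz</li>
        </ul>

        <!-- ordered list: the browser supplies 1, 2, 3 -->
        <ol>
            <li>foo</li>
            <li>bar</li>
            <li>baz</li>
        </ol>
    </body>
</html>
```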

  • Meanwhile, we can go one or two steps further before we actually

  • get to something functional.

  • Here is kind of the most complicated of all,

  • but it too just kind of tells the browser what to do.

  • So before we look at the result, this says, hey browser, here comes a table,

  • like tabular data.

  • Rows and columns like Excel or Google Spreadsheets.

  • Hey browser, here comes a table row, or tr.

  • Hey browser, within that row, here comes some table data, a.k.a.

  • a cell or column.

  • Here comes another cell.

  • Here comes another cell.

  • So that's one, two, three cells in a row.

  • Hey browser, here comes three more cells.

  • Hey browser, here comes three more cells.

  • Hey browser, here comes three more cells and if we actually

  • render this in the browser, you can see the layout of a sort of old school

  • phone pad on your phone.
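
A sketch of the table being described: four rows of three cells each. The cell contents are an assumption based on a standard phone keypad.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>table</title>
    </head>
    <body>
        <table>
            <!-- each tr is a row; each td is one cell within that row -->
            <tr>
                <td>1</td>
                <td>2</td>
                <td>3</td>
            </tr>
            <tr>
                <td>4</td>
                <td>5</td>
                <td>6</td>
            </tr>
            <tr>
                <td>7</td>
                <td>8</td>
                <td>9</td>
            </tr>
            <tr>
                <td>*</td>
                <td>0</td>
                <td>#</td>
            </tr>
        </table>
    </body>
</html>
```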

  • It's not very pretty, it's not very well formatted,

  • but if we zoom in you really do see that it is lined up in rows and columns

  • as I sort of verbally implied, but this is all very kind of underwhelming.

  • Like, Google is cool because you can go to it

  • and you can actually search for cats and find lots of cats on the internet,

  • but how is it that this actually works?

  • So, aww, bad news today.

  • OK, so we'll just zoom in on this one.

  • OK, so let's try to focus on the pedagogy here--

  • of cats-- as follows.

  • Let me go ahead and focus on really the URL, which is kind of long and cryptic,

  • but let me just throw away honestly anything that kind of looks confusing

  • or I don't understand.

  • I have no idea what source means so I'm going to get rid of that.

  • I have no idea what the rest of this means.

  • I'm going to get rid of that and I'm going to try to distill-- granted,

  • with some foresight because I knew how Google works here--

  • I changed the URL to something much, much, much simpler,

  • where it's www.google.com/search?q=cats.

  • It seems that, somehow or other, Google's behavior

  • is controlled by information that's conveyed in the URL,

  • and it's not just that I'm searching.

  • It's that I'm searching for cats.

  • So in fact, on a whim, I'm going to search for dogs instead and hit

  • Enter, and indeed a few things change.

  • We have all these dog images appear here on the right.

  • We have the text pre-populated up here and we

  • can search for any number of other things

  • here, like Harvard Yale prank 2004, Enter,

  • and there you have a Wikipedia article on the video we saw earlier.

  • So it seems that you can parameterize the behavior of Google

  • just by understanding how this URL works.

  • So here is kind of the path that's being requested,

  • the file or folder or whatever that is.

  • A question mark says, hey browser, or hey server,

  • rather, here come some HTTP parameters.

  • Some inputs from a human who's either filled out a form or apparently

  • is kind of hacking the URL bar here, and then the name of the parameter

  • comes next. q, meaning query, and this is what Larry and Sergey decided years

  • ago for their search box, an equals sign,

  • and then whatever it is the human typed in.

  • Now it got a little funky here quickly.

  • Now you see %20.

  • That is the web's way of encoding a space so

  • that it's not a physical space, it's all one contiguous string.

  • So it's just one contiguous string for the server to actually look at or read,

  • and so why is this useful?
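
The URLs discussed above can be written out as ordinary links, which shows the path, the question mark, the parameter name q, and the %20 encoding in one place. The https:// scheme and the link labels are assumptions.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>queries</title>
    </head>
    <body>
        <ul>
            <!-- path is /search; after the ? comes name=value, here q=cats -->
            <li><a href="https://www.google.com/search?q=cats">cats</a></li>
            <li><a href="https://www.google.com/search?q=dogs">dogs</a></li>
            <!-- each space in the query is encoded as %20 -->
            <li><a href="https://www.google.com/search?q=Harvard%20Yale%20prank%202004">Harvard Yale prank 2004</a></li>
        </ul>
    </body>
</html>
```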

  • Well it turns out I can leverage this information

  • and kind of implement my own Google pretty easily.

  • Let me go ahead and go into search.html, one of the other examples I whipped up,

  • and you'll see another tag all together.

  • Inside of the body of this page is an HTML form tag,

  • and the form tag takes a couple of attributes I know.

  • One is action, which is the URL to which you

  • want to send the form's information, and the other

  • is the method that you want to use.

  • Now it's a little inconsistently lowercased here just because,

  • but we did see that verb before.

  • Where?

  • Where did we see this verb?

  • This was like the somewhat arcane message that was going, supposedly,

  • inside one of these envelopes when we said GET in all caps, / HTTP/1.1

  • and so forth.

  • So it seems that if you want, as the web developer,

  • to create an HTML form that has text boxes and maybe checkboxes and dropdown

  • menus and so forth that submits its information when the user clicks Enter

  • or a button to this address, and you want it to go inside of a virtual

  • envelope using that GET verb, you literally just say method=GET.

  • And then down here I seem to have two inputs, one of whose names

  • is q, the type of which is a text box, and the other of which

  • is a submit type, whatever that is, the value of which is search.
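
A sketch of search.html matching the description: a form whose action is Google's search URL, sent via GET, with a text input named q and a submit button. The attribute order and the exact capitalization of the button label are assumptions.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>search</title>
    </head>
    <body>
        <!-- on submit, the browser requests action + "?" + name=value for each named input -->
        <form action="https://www.google.com/search" method="get">
            <input name="q" type="text"/>
            <input type="submit" value="Search"/>
        </form>
    </body>
</html>
```

Typing birds and submitting should send the browser to a URL ending in /search?q=birds, the same shape as the hand-edited URLs earlier.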

  • Now you would only know what these things mean by seeing them demoed

  • or looking at some online reference, but if we pull this up to see the results

  • we have a super simple--

  • and I'll zoom in--

  • very, very simple version of Google, right?

  • It doesn't even have the logo, but it does have, I claim, all of the functionality

  • because watch what happens if I type in, for instance, whoops, birds and click

  • Search.

  • Oh my god, I implemented Google with just like 15 lines of code,

  • but not really, right?

  • Like, I've implemented the front end of Google,

  • which-- I've got to start Googling these things in advance.

  • OK, uh, these are very sad stories.

  • [STUDENTS LAUGH AT MORBID NEWS HEADLINES]

  • DAVID J. MALAN: OK, so the point though is, the point-- look up, look up.

  • The point is that the URL is what I generated.

  • So using those HTML tags coupled with the human's cooperation

  • and actually clicking a button did I then

  • generate this URL, whisk the user away from the IDE

  • to google.com, where Google is handling the back end,

  • like all of the hard work, actually checking their database,

  • rendering the HTML, but I made the front end,

  • the user interface via which you can actually interact with Google's search

  • engine there.

  • And it boils down to just these basic heuristics,

  • but of course this is a pretty ugly search engine, right?

  • Black and white text box, a gray button and that's it.

  • Like, even Google, simple though it is, has a little bit of style and color

  • to it and things are centered and kind of spaced differently.

  • So there's an art to this ultimately and indeed

  • being a web designer in itself is a profession

  • and in fact, you'll find in industry that some people are

  • good at front end design.

  • Some people are bad at it.

  • I'm among the worse ones.

  • Like, my web pages look like that search box just a moment ago,

  • but some people really prefer the non-graphical stuff, the back-end,

  • the database stuff, and indeed one of the takeaways over the next few weeks

  • will be for you to figure out for yourselves if you like any of this

  • at all certainly, but also like what your preferences are.

  • And you might hear terms in industry these days

  • like front-end developer, back-end developer.

  • That just means do you work on what the user sees in their browser or app

  • or do you work on the back-end, the database stuff that's

  • really important and sometimes quite difficult,

  • but that the user doesn't interact with directly.

  • Or are you a full-stack developer, which means you just

  • do all of this, which all of you from CS50

  • are effectively, albeit after just one or so semesters of background.

  • So how do we start, though, to make things prettier?

  • Well it turns out that HTML, for the most part, is just a markup language.

  • It's for structuring a web page and semantically tagging things,

  • and by semantically tagging things I mean

  • like, hey browser, here's the head of my page

  • and that's a concept, semantically.

  • Hey browser, here's the body of my page, and that too

  • is a concept, semantically.

  • I didn't say anything about bold facing or font size or colors

  • or all this stuff that's important for a good user experience, or UX,

  • but that can be decoupled from HTML, and in fact,

  • one of the challenges as you learn HTML for the first time

  • is to try to make your way through various online resources and references

  • that will sometimes combine these ideas.

  • So, again, today we'll focus not just on correctness, getting things to work,

  • but design as well.

  • So here, for instance, is a super simple web

  • page for someone named John Harvard that has

  • a header and a main part and a footer, and header is distinct from head.

  • It's sort of poorly named here.

  • Head of the web page is just the tab bar and other such things up top,

  • but semantically you might have a page with like three parts.

  • Like the header, like the title on the body of the page itself,

  • like the main part where the actual contents

  • are, and then a footer like a copyright symbol or something like that.

  • So this might be a general division of a page,

  • but notice I've styled it a little differently.

  • Let me go ahead and open this up in a browser as I did just a moment ago

  • and go to, sorry, I'm going back through my entire internet history here.

  • Let's go ahead and open this up just as we did before at this URL

  • so that we can go ahead and open up CSS0.html.

  • Notice that, oh, this is already marginally better than the pages

  • we've looked at before if only because it's centered, which is a step forward

  • from everything just being left.

  • The first line is a little bigger.

  • The second line is kind of medium and the bottom line is the smallest.

  • So there's a little bit of style here, but not all that much.

  • So how did I actually do this?

  • Well take a look at the code here.

  • I have added, now, a style attribute to several of my tags.

  • So the header, the main and the footer really

  • aren't styled in any specific way.

  • They're just a way of telling the browser this

  • is the important stuff for the title, this

  • is the important stuff for the main part,

  • this is the important stuff for the footer,

  • but the stylization or aesthetics come from this yellow text

  • here, thanks to the IDE syntax highlighting it,

  • and notice this text follows a different pattern.

  • Up until now, we've been using angled brackets and words

  • and equals signs and quotes.

  • Now, inside of those quotes, we also have another pattern

  • when you're using this second of two languages today, CSS.

  • font-size: large is the stylization for this particular element's content.

  • text-align should be center.

  • These are two CSS properties.

  • CSS, cascading style sheets, and we'll see what that means in a moment,

  • but this is just how you configure the style of those elements,

  • and indeed that's why one is a little bigger and then a little smaller

  • and then even smaller because, notice, I did font-size: large, font-size: medium,

  • font-size: small.
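The inline-style approach described above can be sketched roughly as follows. This is a minimal approximation, not the lecture's actual file: the page text and the title are assumptions, and only the two properties mentioned (font-size and text-align) are shown.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>css0</title>
    </head>
    <body>
        <!-- Each element repeats text-align: center; only font-size differs. -->
        <!-- The text content below is assumed for illustration. -->
        <header style="font-size: large; text-align: center;">
            John Harvard
        </header>
        <main style="font-size: medium; text-align: center;">
            Welcome to my home page!
        </main>
        <footer style="font-size: small; text-align: center;">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```

Each style attribute holds one or more property: value pairs separated by semicolons, which is the pattern inside the quotes that the lecture points out.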

  • All right, but as we've often done, let's iteratively improve upon this.

  • Even if you've never seen HTML or CSS before,

  • there's some poor design manifest in this simple example.

  • What might you say seems wrong or seems a little copy paste-like?

  • Yeah.

  • AUDIENCE: They're all centered [INAUDIBLE].

  • DAVID J. MALAN: Yeah, they're all centered

  • and I literally like copied and pasted that CSS property, its key value

  • pair, its name and value, again and again

  • and again, but remember the hierarchy of HTML

  • and the DOM, Document Object Model, the tree we drew a little bit ago.

  • All of these elements-- header, main, and footer--

  • have a parent element called what?

  • AUDIENCE: Body.

  • DAVID J. MALAN: Yeah, body.

  • So one level higher, which is indented this way

  • or in the tree is higher up in that family tree-like drawing, all of these

  • are children of body.

  • So why don't I just move or factor out text-align: center

  • into the element above them?

  • And herein lies the cascading of CSS.

  • Cascading style sheets means that if you have a property up here,

  • it will cascade down to all of the children and descendants below it

  • and it means another thing, too.

  • You can even override these properties somehow,

  • but we'll see that before long.

  • So if I go ahead now and open up CSS1.html,

  • notice that I did exactly that improvement.

  • The code's a little tighter now.

  • It's fewer characters, easier to maintain

  • because now if I want to change it to left or right or center,

  • I change it one place, not three.

  • And so this is kind of consistent with some of our design takeaways from C

  • and indeed, if I visit this page, CSS1.html, it looks the same,

  • but it's better design underneath the hood.
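A sketch of that refactoring, under the same assumptions as before (the page text is illustrative): the centering is stated once on body and is inherited by its children.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>css1</title>
    </head>
    <!-- text-align is declared once here and cascades to header, main, and footer -->
    <body style="text-align: center;">
        <header style="font-size: large;">
            John Harvard
        </header>
        <main style="font-size: medium;">
            Welcome to my home page!
        </main>
        <footer style="font-size: small;">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```

Changing the alignment to left or right now means editing a single attribute rather than three.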

  • But we can do a little better still.

  • If I open up CSS2.html, notice that I've done this.

  • I rather like this design now because it's even more succinct.

  • I'm not using the style attribute anymore.

  • I'm using a different attribute called class,

  • and class is kind of a way to define--

  • much like a struct in C lets you define your own data types, a class in CSS

  • allows you to define a name for a whole bunch of properties,

  • and so here I just said let's call this class large, medium, and small,

  • and I don't know what those mean, and frankly I

  • might be working with a friend who's much better at design

  • than I am so I'm going to let him or her actually define these meanings.

  • I'm just going to kind of tag things in this way semantically,

  • but if we scroll up in this file, you'll see that for now I have no such friend,

  • and so I implemented it myself, and here's, for the first time,

  • one other thing in the head of the page.

  • Up until now, we've just had the title, but it turns out

  • you can have a style tag.

  • Not just an attribute, but a style tag inside of which,

  • it's a little cryptic at first glance, but there's some pattern here, clearly.

  • You have all of those properties, but the new syntax here

  • is that if you want to define a word called centered,

  • you literally do a period and then the word centered.

  • If you want a word like large, you say .large.

  • So it's similar in spirit, though not quite the same as like typedef in C,

  • but you say .centered, .large, .medium, .small.

  • You use our old friends curly braces, which we'll now see in CSS,

  • and this just defines one or more properties

  • to be associated with that new keyword.

  • And so, if we scroll down here to the bottom,

  • you'll see that I centered the body.

  • I made large the header, medium the main, and small the footer,

  • and the result is going to be exactly the same.
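The class-based version might look something like the sketch below. The class names (centered, large, medium, small) come from the lecture; the page text is an assumption.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <style>
            /* A leading period defines a class; the braces hold its properties */
            .centered
            {
                text-align: center;
            }

            .large
            {
                font-size: large;
            }

            .medium
            {
                font-size: medium;
            }

            .small
            {
                font-size: small;
            }
        </style>
        <title>css2</title>
    </head>
    <body class="centered">
        <header class="large">
            John Harvard
        </header>
        <main class="medium">
            Welcome to my home page!
        </main>
        <footer class="small">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```

The markup now only names the classes; what each name means lives entirely in the style tag.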

  • Very underwhelming, but again, marginally better

  • design because now we are just one step away from really improving this.

  • If I do finally have that friend, it's not

  • going to be very easy to collaborate, ultimately,

  • if we're both working on the same file and moreover, it

  • seems unnecessary to introduce these semantics.

  • Like, why do I have to have tags like header and main and footer

  • and classes called large and medium and small and centered?

  • Like, why don't I leverage the names of these tags themselves?

  • And this is where HTML can be pretty powerful.

  • Notice I've simplified some of my CSS up top.

  • I've dropped the period, which was like typedef.

  • Like, give me something called large, give me something called medium.

  • Now I'm just saying literally a word, but those words are identical to what?

  • AUDIENCE: The tags.

  • DAVID J. MALAN: The tags themselves.

  • So preexisting tags, if I just mention them by name without a period,

  • which gives me a new name--

  • I just mention the body, the header, the main and footer,

  • and then, inside of the curly braces, define my properties,

  • now I can just stylize the actual tags as they exist in my page,

  • and this now looks like really readable, maintainable HTML.

  • There are no aesthetics associated with the markup language here,

  • but rather there's useful tag names that come with HTML--

  • you can't just make up your own tags.

  • They're in, sort of, the documentation, but now it's just much more readable,

  • and this might look different on my phone or your phone or your laptop,

  • but my friend who's good at stylization can figure out

  • how to style all of these things, and better yet, he or she doesn't even

  • need my file.
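A sketch of the tag-selector version described above, again with assumed page text: the selectors are the tag names themselves (no leading period), so the markup needs no class or style attributes at all.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <style>
            /* No period: these selectors match existing tags by name */
            body
            {
                text-align: center;
            }

            header
            {
                font-size: large;
            }

            main
            {
                font-size: medium;
            }

            footer
            {
                font-size: small;
            }
        </style>
        <title>css3</title>
    </head>
    <body>
        <header>
            John Harvard
        </header>
        <main>
            Welcome to my home page!
        </main>
        <footer>
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```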

  • In the fifth example here, notice that's it for the page.

  • We've gotten rid of the big style tag and replaced it apparently with what?

  • AUDIENCE: Href, a link?

  • DAVID J. MALAN: Yeah, link href, which is a horrible, horrible name

  • because it's not like a link in the page and hyperreference

  • was already used for a link in a page, but this is what we're stuck with.

  • This just says, hey browser, include this CSS file

  • that is elsewhere on the server.

  • The name of this file is arbitrarily CSS4.css

  • because this is our fifth example here-- zero-indexed.

  • The relationship of this file to this page

  • is that it's a style sheet, which is just a list of aesthetics or properties

  • that should characterize its layout and indeed, if I open up CSS4.css,

  • I just copied and pasted everything in there,

  • but this is nice now in principle, even though we're just

  • creating work for ourselves today, because now I

  • can share this file with someone else.

  • He or she can work on it on their own.

  • Then we can merge our work together because my work's in the HTML file.

  • Their work's in the CSS file.
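The split into two files could be sketched like this. The file name css4.css follows the lecture's naming; its contents, shown here only inside an HTML comment, are assumed to be the same rules that previously lived in the style tag.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <!-- rel says how the file relates to this page; href says where it is -->
        <link href="css4.css" rel="stylesheet">
        <title>css4</title>
    </head>
    <body>
        <header>
            John Harvard
        </header>
        <main>
            Welcome to my home page!
        </main>
        <footer>
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
<!--
    Assumed contents of css4.css, a separate file in the same directory:

    body   { text-align: center; }
    header { font-size: large; }
    main   { font-size: medium; }
    footer { font-size: small; }
-->
```

Any number of pages can carry the same link tag, so one edit to the CSS file restyles all of them.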

  • Better still, if we're making a whole website that has a dozen pages or 100

  • pages, consider this.

  • Just like in a C header file, I can include bitmap.h

  • in all sorts of programs.

  • Similarly can I include CSS4.css in all of my web pages.

  • So if I want to change the font size or the layout

  • or whatever in all of my website all at once, I change in one place,

  • not in every darn web page that might have been created by me or by someone

  • else, and so there's just that maintainability to it too,

  • but we can do even better than that because even the CSS we're

  • looking at here is only so good, and what's really nice

  • is if we go to bootstrap-- let Google tell me where to go.

  • We're safe.

  • OK, so Bootstrap is a library-- formerly from Twitter, now

  • a much larger community-- that's a whole bunch of CSS libraries.

  • So just as in C, we have code and functions that other people wrote.

  • So in the world of web development do we have

  • code that other people wrote and we use that for JavaScript and Python,

  • but even for aesthetics are there sites like Bootstrap

  • and other popular things that allow us to make our sites prettier

  • and build them more quickly without having to reinvent wheels.

  • So for instance, if I go down to let's say Content and I go to Typography

  • and skim through here, you'll indeed see like h1, h2 and h3,

  • but if you want things even bigger than that there's like a display heading.

  • There's this fancy version, which has a fancy display heading

  • with some faded secondary text.

  • So pretty marginal, but I don't have to figure out how to do that now myself.

  • If I want to actually have tables, I can do much prettier tables

  • than I did with my little old school phone pad a moment ago.

  • Like I can make things different colors.

  • I can shade the columns like this and in fact, you can do even fancier things.

  • If I go ahead and open up a web page and go

  • to our big board for speller.cs50.net, you'll

  • see that this is a pretty good looking table as tables go.

  • Certainly much better than the one before, but that's

  • because we're using the Bootstrap library,

  • and even more compelling than the aesthetics is this:

  • suppose that you visit speller.cs50.net on your phone,

  • it starts to get pretty ugly once your window gets smaller,

  • but notice stuff can just disappear magically

  • when you're on a mobile device or, in this case,

  • simulating it by using just a smaller browser window.
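One way this kind of disappearing content can be achieved is with a CSS media query. The sketch below is not how Bootstrap or the Big Board is actually written; the class name optional, the 600px breakpoint, and the table values are all hypothetical placeholders.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <!-- Ask mobile browsers to use the device's real width -->
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <style>
            /* Hypothetical rule: hide less important cells on narrow windows */
            @media (max-width: 600px)
            {
                .optional
                {
                    display: none;
                }
            }
        </style>
        <title>responsive</title>
    </head>
    <body>
        <table>
            <tr>
                <th>Name</th>
                <th class="optional">Memory</th>
                <th>Time</th>
            </tr>
            <tr>
                <!-- Placeholder values, not real results -->
                <td>Alice</td>
                <td class="optional">1 MB</td>
                <td>1 s</td>
            </tr>
        </table>
    </body>
</html>
```

Narrowing the browser window below the breakpoint should hide the middle column, which approximates the effect demonstrated in the lecture.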

  • So using CSS and the aesthetic power that it provides,

  • we can also dynamically change our files to just render differently

  • on different devices, and then lastly, let me open up, for instance,

  • this under Components.

  • This is where the really juicy stuff is.

  • If you want fancy alerts to yell at the user or say everything is OK,

  • you get nice little colored boxes like this.

  • The forms are much prettier.

  • I mean, already this looks much more like the web you and I use

  • and not the mess of a form that I created a moment ago

  • and long story short, just like in C it's

  • pretty easy to include these things in your own site, so can I do this.

  • Let me go ahead and open up form0.html, and this is literally

  • an approximation of the very first web application I made,

  • even before web application was a phrase, in 1997.

  • I had taken CS50 and CS51.

  • I hadn't learned web stuff at the time.

  • I just kind of taught it to myself and learned

  • from some friends and the first thing I did

  • was build an interactive website via which first years could register

  • for intramural sports because literally that year in 1996 it was paper-based.

  • You'd walk across the yard, open up Wigglesworth, one of the dorms,

  • slide a piece of paper-- old school-- under the door

  • and you were registered for a sport.

  • We could do better even in 1997, and so we did it with the web,

  • and so this form0 back in the day looked a little something ugly like this,

  • but there's a text box where you could type in your name

  • and then there's the dorm where you could select Matthews.

  • So I could actually do David Malan and Matthews and then click Register,

  • but we don't have the ability to make back ends yet.

  • So this form goes nowhere for today, but you at least

  • get these kinds of aesthetics, which are kind of 1997 aesthetics, literally.

  • But if we go into this other example, form1.html,

  • it looks much prettier now.

  • It's maybe a little big in retrospect, looking at the display font,

  • but all I've done is now use this Bootstrap library, and notice,

  • it's a little hard to see on the projector here,

  • but everything's kind of like nicely outlined.

  • There's like Mark Zuckerberg sample text there which

  • we can override by actually typing in our own email address here.

  • We have a prettier looking box, a prettier looking button, and that's

  • just because if we open up, as down here,

  • form1.html, notice that in addition to my HTML

  • down below and in addition to a couple of other things

  • that I've added to make things more mobile-friendly in particular,

  • I just added this.

  • I read the documentation on getbootstrap.com

  • and I went ahead and added Bootstrap's library to my own code

  • in order to have access to its actual features,

  • and then down here, it's a little overwhelming at first glance,

  • but I just followed the directions.

  • There's something called div in HTML for a division of the page.

  • It means give me this invisible rectangular region.

  • The class I associated with it is called form-group.

  • I didn't make this word up.

  • This comes from Bootstrap.

  • I just did what they told me to do.

  • I then have a label, which makes things more accessible

  • and you can click in different places.

  • I have another class here but long story short,

  • I just read the documentation because I know what tags are,

  • I know what attributes are.

  • I know a little bit of CSS now and I know how HTTP works,

  • and so really I have enough building blocks in order to work on this myself.
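A form along these lines might be sketched as follows. The class names form-group, form-control, and btn btn-primary are Bootstrap 4 classes; the CDN URL and version, the field names, and the dorm options are assumptions, so it is worth copying the current link tag from getbootstrap.com rather than relying on this one.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <!-- Makes the page scale sensibly on mobile devices -->
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <!-- Assumed CDN URL and version; check Bootstrap's documentation -->
        <link href="https://cdn.jsdelivr.net/npm/bootstrap@4.6.2/dist/css/bootstrap.min.css" rel="stylesheet">
        <title>form1</title>
    </head>
    <body>
        <div class="container">
            <h1>Register</h1>
            <!-- No action attribute: without a back end, the form goes nowhere -->
            <form>
                <div class="form-group">
                    <!-- for="name" ties the label to the input with id="name" -->
                    <label for="name">Name</label>
                    <input class="form-control" id="name" name="name" placeholder="Mark Zuckerberg" type="text">
                </div>
                <div class="form-group">
                    <label for="dorm">Dorm</label>
                    <select class="form-control" id="dorm" name="dorm">
                        <option>Matthews</option>
                        <option>Wigglesworth</option>
                    </select>
                </div>
                <button class="btn btn-primary" type="submit">Register</button>
            </form>
        </div>
    </body>
</html>
```

Because each label is associated with its field, clicking the label text focuses the field, which is the accessibility benefit mentioned above.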

  • So that then is CSS and there's one last detail I thought I'd show us here.

  • In all of these John Harvard examples, as in just a moment ago,

  • we had something like this at the very bottom.

  • This ampersand, #169, semicolon: &#169;.

  • What was that rendering as, if you notice, in the web page?

  • AUDIENCE: Copyright.

  • DAVID J. MALAN: Yeah, the copyright symbol.

  • There is, on my US keyboard, no copyright symbol.

  • So you need kind of a pattern of characters

  • with which to represent those in HTML.

  • So just like we have \n and other special escape characters in C,

  • you have what are called HTML entities in HTML that you would only know from

  • reading the documentation, but that's the copyright symbol,

  • but I thought it was rather timely to point that out because just yesterday

  • or this morning, Apple announced that with the very new version of iOS that

  • you can soon download, they added even more damn Emojis to the Emoji character

  • set.

  • So these are certainly in vogue these days

  • and not only do we see, now, a way to represent special characters that you

  • couldn't otherwise type using HTML, it turns out all this time

  • that Emojis are actually just characters, chars,

  • but they're not 8 bits.

  • Recall that C as we've been using it uses

  • ASCII, which uses only 7 or 8 bits total and Emojis, my god.

  • There's so many of them right now and we need more than 8 bits

  • to represent them, and thus was born something called Unicode.

  • Well, that is not why Unicode was invented,

  • but this is what Unicode is now being used for because these emojis are

  • simply like ASCII characters but multiple bytes, up to four bytes

  • in an encoding like UTF-8, and in fact, if you go on unicode.org,

  • you can see that the number in hex 1F600 represents the grinning face,

  • which happens to be implemented differently by different companies

  • on different devices, but if in closing here,

  • I open up this same file and I change this to 1F600 in hex, 1-F-6-0-0, save,

  • and I go back to my browser and I go back to CSS0,

  • now we have a very happy web page for you.
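Both kinds of numeric entity can be sketched side by side. The page text is an assumption; the notable detail is that a hexadecimal code point needs an x after the #, so the grinning face is written &#x1F600; (equivalently &#128512; in decimal).

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <title>entities</title>
    </head>
    <body>
        <footer>
            <!-- Decimal entity: 169 is the copyright symbol -->
            Copyright &#169; John Harvard
        </footer>
        <footer>
            <!-- Hexadecimal entity: the x marks 1F600 as hex (grinning face) -->
            Copyright &#x1F600; John Harvard
        </footer>
    </body>
</html>
```

How the emoji is drawn depends on the fonts available on the device, which is why it looks different across companies' platforms.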

  • So that's it for today.

  • I'll stick around for questions and we'll see you next time.

[MUSIC PLAYING]
