[MUSIC PLAYING]
DAVID J. MALAN: All right, this is CS50 and this
is lecture 6 and as you may recall, today we
begin to transition away from this low-level world of C
and command line programming into a domain that's probably
a little more familiar, that of the web, and yet
all the ideas that we've been exploring thus far like functions and loops
and conditions and so forth are still going to be relevant.
It's just we're going to start using slightly different syntax and the user
interface, or UI, is now going to be your browser instead
of a black and white terminal window with just a simple textual prompt,
but how did we get here?
Well, recall that we've looked recently at structs
and what was nice about structs in C was that we had the ability
to make our own custom data types and to kind of encapsulate together
related data, and that became pretty powerful
when it came time for forensics to actually manipulate bitmap files
or JPEGs, and even though this struct is way more complicated
than a student structure, at the end of the day
it's just individual data types that are all somehow interrelated,
and by putting them in a struct you can move them all around
and copy them and save them all together as you might have done as well,
but then most recently did we introduce a somewhat fancier structure,
which is still the same idea.
It's got like one or more things inside of it,
but now, more powerfully, one of those things
had this star or asterisk, which gave us, of course, a pointer or an address,
but what was so powerful about this simple idea and this seemingly
simple symbol is that now we can kind of stitch together in our computer's
memory any kind of structure we want.
It doesn't have to just be one entity.
It can somehow be linked to another and you
can keep linking these structures together as well, and this of course
was an improvement on perhaps our simplest of data structures early on,
an array or a list, but of course, as soon
as you have pointers can you begin to link things together
until we got something like this and perhaps now with the dictionary
implementation you yourself might be exploring a linked list, a hash table,
a trie or some variant in between.
And then lastly, there was this painting of a picture, whereby
this is your computer's memory put a little more descriptively,
and this is germane only insofar as your computer uses
different chunks of memory differently.
All of your function calls end up using the stack.
All of your uses of malloc and its cousins
end up using the heap, and then of course there's this up here,
and what was the text segment, which we didn't really dwell on?
What was the text segment all about?
Text-- you're being volunteered.
Yes, what's the text segment?
AUDIENCE: Files information?
DAVID J. MALAN: Files information, yeah.
Specifically, the 0s and 1s that compose the actual program.
So when you compile your source code, like hello.c, into 0s and 1s,
those end up getting stored in this location in memory
while the program is running.
So long-term they're stored on disk or your hard drive or whatever's inside
of the computer or the server so that the files persist even when the power
goes off or you walk away from the keyboard,
but as soon as you double click a program on your Mac or PC or as soon
as you do ./hello or some command like that,
those same 0s and 1s get loaded into your computer's RAM,
the picture we keep showing, and that's where they live while they're in use
by your Mac or PC or the actual server,
but thus far we've been running all of these programs with something like
./hello or some similar command and running them just in the so-called
terminal window.
But you are probably most familiar, certainly
with more graphical apps on your phones these days
and any time you visit a web browser on your phone
or on your desktop or laptop, you're still interacting with a program.
It's just that program is not only running on your Mac or PC.
Your browser is, like Chrome or Edge or Firefox or Safari
or whatever it is you use.
That's running on your Mac or PC, but what
you're communicating with is a program elsewhere,
somewhere else on the internet and those programs are called web servers.
A web server is just a piece of software that some human or humans wrote
and their purpose in life is to serve web pages.
When you request the homepage of Facebook,
there is a server out there, a program someone wrote,
that essentially spits out the 0s and 1s that compose Facebook's homepage,
but nicely enough, those 0s and 1s are not written as 0s and 1s
by Facebook engineers.
They're actually written as something a little more English-like,
a little more familiar, and it's not even programming code, per se.
It's what's called markup language, and we'll soon see that and more today.
So we've gone from compiling your code and running it
like this to actually doing that in a web-based environment,
but of course, when you're running your own programs in CS50 IDE down here,
you're actually using another piece of software
that fills the screen, CS50IDE, a.k.a.
Cloud9, which is the program essentially running somewhere in the cloud.
And we'll start to make this distinction and through examples
will the distinction among these different types of software
begin to make sense, but where is something like CS50IDE running?
Where is Facebook running?
Where is Google.com running?
Well, back in 1998, Google.com was running on this.
This was apparently the very first implementation, by Larry and Sergey,
the founders of Google, of their first rack of servers.
So servers are generally stored literally in a rack like this.
It's usually like 19 inches wide by convention
and you just stack computer on top of computer on top of computer,
but things were very bare bones back in the day of Google
and so there weren't even plastic or metal
cases around a lot of their computers.
They were trying to minimize cooling, minimize cost, presumably, and cram
as much hardware into that footprint as they could,
and so you actually see a lot of the wires and hardware
kind of sticking out, and this is on display now out west.
Of course these days, fast forward just a decade or two, and this
is one of Facebook's data centers where it's the exact same idea,
but much fancier, much prettier, much better lit servers,
but they serve, at the end of the day, the exact same role.
There are bunches of servers around the world that are just
sitting there waiting for you on the internet
to make a request for a homepage, for an email,
for any other type of information so that it ends up
getting sent from server to client.
And in fact, if you've ever thought about those words server and client,
the latter of which is probably the lesser used of the two,
a server-client relationship is what you have when you go into a restaurant,
and you ask the waiter or waitress for something to eat.
He or she brings something back to you, thereby serving you,
the client, and the relationship on the web is pretty much the same thing.
We are the clients.
Our browsers are the clients and out there
are servers like these who are serving up content and information,
such as on Facebook.
So let's consider how all this data even gets to us.
So odds are, these days, if you want to visit Facebook.com on your laptop
or desktop or phone without using the app, you probably just
type in Facebook.com and hit Enter.
If you're a little older school or literally older,
you might just actually type out the entirety of www.Facebook.com.
Both work and there are technical reasons for that related to the topic
we'll talk about today, but both of these work just
because Facebook has configured their website
to work in either of those addresses.
Now, as an aside, why are so many websites therefore prefixed with "www"
if both of them actually work?
Like, why have both?
It seems just like redundant to type "www."
if it's implied by Facebook.com.
Yeah, what do you think?
AUDIENCE: Is the "www" required?
DAVID J. MALAN: Is it required?
Nope, not required.
Not required.
Yeah?
AUDIENCE: Is it to identify that it's part of the World Wide Web?
DAVID J. MALAN: Kind of, yeah.
It's to identify that it's part of the World Wide Web
and no one really says World Wide Web these days.
We of course just say web, but back in the day and back in my day, frankly,
it wasn't obvious to a lot of human beings
what Facebook.com might actually even mean, irrespective of the fact
that it didn't exist at some point.
And so there was this sort of signal to the world
whereby you just started prefixing domain names with "www" just
to make super clear to users, oh, this is a website!
This is one of those things on the internet or the like,
and also back in the day, there were also
different services that have fallen into disuse these days,
like FTP was quite popular and Gopher, we used it
when I was here and other such things.
And so "www" was just an arbitrary prefix that just kind of said what it
is, but these days we humans pretty much know what a .com is and .net and .edu,
but even that kind of road is changing again because there's dozens,
hundreds of top-level domains.
It's not just .com and .edu and others now.
I mean, there's hundreds of these things out there
and so it might even be non-obvious to this day.
So some people, therefore, go really all the way in and type out http://
and then the address that they want to visit,
but odds are most of us don't do this because our browsers just help us out
and prefix that, but that's where our focus will be today.
Like, this actually has significance because it specifies
what protocol or language, what convention your computer,
your laptop should use when talking to that server's address,
and actually, if you want the communications to be secure,
odds are you're typing it or your browser is doing it for you,
adding an s there, denoting secure or encrypted a la Caesar and Vigenere
from some weeks ago and technically your browser
is also probably adding a trailing slash even
if it's not shown to you, which denotes you want the root of the server,
like the default homepage or something else.
In fact, maybe you do want something else.
You don't want just Facebook's website, you want Mark's page,
and so you could specify /zuck or whatever the username actually is.
So this is a very long way of saying all this kind of stuff
that we type or autocomplete and take for granted these days actually
has some very fundamental meanings, all of which
make possible the entirety of the web.
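As a rough illustration of the pieces just described (the protocol, the server's address, and the path after it), here is a minimal Python sketch that splits a URL apart with the standard library. The helper name describe is made up for this example, and the rule of treating an empty path as / mirrors the trailing slash the browser is said to add.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL into the parts discussed above (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # http or https: which protocol to speak
        "host": parts.netloc,        # the server's address
        "path": parts.path or "/",   # an empty path means the root, /
    }


print(describe("https://www.facebook.com/zuck"))
print(describe("http://facebook.com"))
```

The second call shows the default: with nothing after the host, the path falls back to /.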
So what actually goes on with HTTP and what does that actually mean?
So HTTP is a protocol.
It is a set of conventions that dictate how a computer
client, like a browser on your Mac or PC, talks to a web server,
and it's a protocol in the sense that it's not a language, per se.
It's really just a set of conventions and so
like this is kind of an arbitrary and awkward human convention.
Hello, I'm David.
AUDIENCE: I'm Kara.
DAVID J. MALAN: Kara, so Kara and I just introduced ourselves.
I extended my hand and she kind of knew instinctively
that it would be awkward not to shake my hands or to shake my hand
and so we exchanged pleasantries and said hello.
So this is just kind of a silly human convention
whereby we've agreed sort of socially in advance
how to greet each other in that way.
So HTTP is pretty much the same thing, but in this case
you're not actually physically doing something like that.
You're kind of sending a message from client to server.
You're putting a sort of handwritten note into an envelope like this,
addressing it somehow and then sending it off on the internet for Kara
or for Facebook.com or Google.com to actually receive,
and then when Google or Facebook or Kara receives that note,
reads and sees what I want, the server or the human
responds accordingly.
So what then goes inside of this envelope?
Well it turns out that when a web browser,
like Chrome or Edge or Firefox or Safari, makes a request,
the message it puts inside of one of those envelopes, albeit virtually,
is literally this text.
It's like if I had it written down on a piece of paper literally GET / HTTP/1.1
Host: www.facebook.com and then the "..."
just means there's other stuff in there, but it's less fundamentally interesting
right now.
So what's this all mean?
GET is just a verb and it kind of says what it means.
Go get something from the server.
HTTP/1.1 mentions the version of HTTP that I am using or the human convention
that Kara and I were actually implementing there,
and so 1.1 tends to be the one most in use these days, and then /, again,
it's just like the default identifier for the homepage of a website,
the default page that you see in the absence of typing something like
/zuck or some other suffix.
Host: is the same thing as whatever's on the outside of the envelope.
So if I'm sending a message to www.Facebook.com,
I'm just making super clear inside of the envelope which server should expect
this request just in case there are multiple websites running
on the same physical server, which is possible for economic and performance
reasons these days.
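To make the contents of that envelope concrete, here is a minimal Python sketch that assembles the request text described above. The function name build_request is hypothetical, and a real browser sends additional headers (the "..." in the lecture) that are left out here. HTTP/1.1 ends each line with a carriage return and newline, and a blank line marks the end of the headers.

```python
def build_request(host, path="/"):
    """Assemble a bare-bones HTTP/1.1 GET request (illustrative only)."""
    lines = [
        f"GET {path} HTTP/1.1",  # verb, what we want, and the protocol version
        f"Host: {host}",         # which website on that server we mean
        "",                      # blank line: end of headers
        "",
    ]
    return "\r\n".join(lines)


print(build_request("www.facebook.com"))
print(build_request("www.facebook.com", "/zuck"))
```

The only difference between the two requests is the path on the first line, / versus /zuck.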
So alternatively, if I were trying to visit Mark Zuckerberg's homepage,
the request in that envelope's going to look almost the same,
but I'm going to be more precise.
/zuck instead of just /.
Meanwhile, if I'm requesting something from Yale's homepage,
the request would look like this, or from Harvard's web page,
the request would look like this and so forth.
So once Harvard or Yale or Facebook actually
receives the request in that envelope, opens it up, and looks at it,
how do they decide how to respond?
Well at the end of the day, I'm probably expecting
to get back from the web server some kind of, excuse me,
web page whereby I want to see my news feed on Facebook
or I want to see the search page on Google
or I want to see Harvard's homepage, Yale's homepage
or whatever it actually is.
So there's a lot of information probably packed into that envelope,
but there's also a conventional, a standard,
response that looks literally like this.
So at the very top, for instance, of the "letter" that comes back from Google
or Facebook is a message like this.
Got it.
I'm speaking HTTP version 1.1 also.
Everything is OK and 200.
We'll come back to that in a second and then
the type of content inside of the envelope,
if I keep digging deeper into it, is going
to be text, but more specifically, HTML, and we're
going to focus on that today too.
HTML, Hypertext Markup Language.
This is going to be the language in which web pages themselves are written.
Then there's usually some other stuff and way down there
is the actual contents of Yale's or Harvard's or Facebook's homepage,
but let's zoom in on this for just a moment.
200, odds are you've never seen or cared to see this kind of number before,
but have you ever used the web and requested a web page
and seen some number that for some reason keeps popping up in your life?
AUDIENCE (IN UNISON): 404.
DAVID J. MALAN: Yeah, 404.
It's just kind of a weird thing that many of us in the room
know 404 even if we're not necessarily technophiles
and know what HTTP is, but it turns out that in these envelopes coming back
from servers sometimes are not just 200 OK, but instead--
dammit, typo.
This would be much more effective if I said it's not this.
It's not found.
So inside of the envelope is 404 not found, which means exactly that.
The file was not found that you were actually seeking.
You mistyped the URL, the page was deleted.
Somewhere or other, there was some kind of typographical error
and it turns out there's a lot of these status codes in HTTP
and there are even more than these, but these
are the ones we might see the most commonly.
200 OK means all is indeed well.
404 means not found.
403 forbidden might be if you've not logged in
or don't have the right access in order to access some folder
or file on some website.
This is really bad and we'll get to know this over the coming weeks as we
ourselves start implementing code on a server.
500 internal server error, if you will, shall be our new segmentation fault,
but hopefully not too frequently.
It means something is wrong in the code on the server.
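The codes mentioned so far can be sketched as a small lookup table in Python. The names STATUS and describe_status are invented for this example. Grouping codes by their first digit (2xx success, 3xx redirect, 4xx client error, 5xx server error) is how HTTP organizes them, though the lecture itself only lists individual codes.

```python
# Only the codes discussed so far; HTTP defines many more.
STATUS = {
    200: "OK",
    403: "Forbidden",
    404: "Not Found",
    500: "Internal Server Error",
}


def describe_status(code):
    """Return a readable label for a status code (hypothetical helper)."""
    phrase = STATUS.get(code, "Unknown")
    if 200 <= code < 300:
        kind = "success"
    elif 300 <= code < 400:
        kind = "redirect"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "other"
    return f"{code} {phrase} ({kind})"


for code in (200, 403, 404, 500):
    print(describe_status(code))
```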
This was an April Fools' joke back in 1998 I believe, yeah.
So April Fools', some humans decided it would be funny to announce to the world
that there's yet another code, which is 418 I'm a Teapot,
which kind of comes up from time to time in actual code and then there's this
one--
301 Moved Permanently.
It's kind of a scary sounding thing, as though a website just
kind of up and left and went elsewhere, but it's a powerful mechanism
in the following way.
If a server inside of one of these envelopes
responds with a response like this, there
tends to be one other piece of information at least.
So if I visit a website like http://harvard.edu,
I might get back in the response from Harvard's web server
this answer, 301 Moved Permanently.
Like where the heck did Harvard go?
Well you can see the location based on this other line
and all of these things collectively moving forward we're
just going to call HTTP headers.
Anytime you see a word and a colon, that's
an HTTP header with a name and a value and the first
one's a little anomalous in that there's no colon,
but that's the only one without the colon.
So location colon http://www.harvard.edu.
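Here is a minimal Python sketch of reading those headers, assuming the response arrives as one string with lines ending in a carriage return and newline. The function name parse_response_head and the sample text are made up for illustration; the sample contains only the two lines discussed here, not what Harvard's server actually sends in full.

```python
def parse_response_head(raw):
    """Split a response's first line and headers apart (illustrative only)."""
    head = raw.split("\r\n\r\n", 1)[0]  # everything before the blank line
    lines = head.split("\r\n")
    # The first line has no colon: version, status code, and reason phrase.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, since values like URLs contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers


# Hypothetical sample, trimmed to the two lines from the lecture.
sample = "HTTP/1.1 301 Moved Permanently\r\nLocation: http://www.harvard.edu\r\n\r\n"
version, code, reason, headers = parse_response_head(sample)
print(code, reason)
print(headers["location"])
```

Header names are lowercased here because HTTP treats them as case-insensitive.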
Well what's going on?
Well, if I actually visit Harvard's homepage exactly as follows,
let's take a look at what happens.
I'm going to go to http://harvard.edu, Enter.
And notice there's a whole bunch of more stuff happening on the screen thanks
to what's called autocomplete, which is a feature of Chrome or my browser.
It has nothing to do with the topic at hand.
This is just Chrome trying to be helpful today as on your computer too
and suddenly, even though I tried to go to http://harvard.edu,
where did I clearly end up?
HTTPS, so they added the s somehow and what else has it added?
[VARIOUS ANSWERS FROM AUDIENCE]
DAVID J. MALAN: Yeah, the web.
The www prefix was added.
So this is not sort of all that important to the user
like I got to my destination somehow but the reason for that is as follows.
I'm going to go ahead and open up, in the IDE actually,
just a terminal window here and I'm going to use a new program called curl,
for connect to a URL: curl http://harvard.edu, Enter.
And I get back some cryptic looking things and that's actually HTML,
and we're going to come back to this in just a moment
because it turns out there's two parts to the messages coming back.
There's the headers and then there's the content, and we're seeing the content.
So more on that in a bit.
I want to look a little higher up in the response and literally just look
at the headers, and to do that-- and you would only know this from reading
the documentation--
-I means show me just the headers that are coming back.
So here now we see the headers coming back
and you'll see indeed we got back a 301 Moved Permanently,
and then there's some other stuff we haven't really focused on,
but at the bottom is something we have--
location, which says to the browser go to this URL instead.
All right, so let me do that.
Let me save time and just copy paste this and then do curl -I of this,
Enter, and pretend to be a browser requesting that page now, but now
where are they trying to send me?
HTTPS.
So this suggests via some mechanism, some human at Harvard
decided one, uh-uh.
We're not going to be called like harvard.edu.
We shall be www.harvard.edu for whatever reason
and then they also decided that if a user visits us using HTTP, which is not
encrypted, not secure, we're going to forcibly tell them to come back
via secure channel, and we won't dwell today on how that's implemented,
but much like in Caesar or Vigenere where there was a way to encrypt or scramble
information, browsers can do that too and it's
implied by using the HTTPS instead of just HTTP.
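The chain of redirects just walked through by hand can be sketched in Python without touching the network. The table CANNED below is an assumption reconstructed from the narrative (first add www, then switch to HTTPS); the exact URLs, trailing slashes, and the status of the second hop are guesses, and the function name follow is invented for this example.

```python
# Hypothetical canned responses: URL -> (status code, Location header or None).
CANNED = {
    "http://harvard.edu/": (301, "http://www.harvard.edu/"),
    "http://www.harvard.edu/": (301, "https://www.harvard.edu/"),
    "https://www.harvard.edu/": (200, None),
}


def follow(url, responses, max_hops=10):
    """Follow 301 redirects the way a browser would, recording each stop."""
    chain = []
    for _ in range(max_hops):
        status, location = responses[url]
        chain.append((status, url))
        if status != 301 or location is None:
            return chain  # reached a page that is not a redirect
        url = location    # go where the Location header says
    raise RuntimeError("too many redirects")


for status, url in follow("http://harvard.edu/", CANNED):
    print(status, url)
```

The max_hops limit is there so that two pages redirecting to each other cannot loop forever.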
All right, so let's actually visit this one more time.
Let me go ahead and highlight that location.
curl -I of that address and now an overwhelming amount of information
coming back, and that's why I kept putting the ...'s, but the juicy stuff
is at the top.
Now everything is 200 OK and indeed, if I run it without -I
so I see the contents of the envelope, it's
like looking deeper inside of the envelope,
now I actually see a lot more content, which collectively
composes Harvard's homepage, and it turns out
we can see this even in Chrome.
Let me go over to my browser again and if you've not done this before,
it turns out that you can go to your View menu, Developer,
and go to Developer Tools-- and we'll do this in upcoming problem sets--
and I can go here and see a whole bunch of features, only a couple
of which we might look at today.
Specifically, I'm going to click on this Network tab.
So to be clear, Developer Tools in Chrome still shows me the homepage,
but it kind of dedicates part of the screen
to these special developer tools that make it easy to understand and actually
create websites.
So eventually we'll start using this ourselves,
but what's nice about the Network tab is that you can sniff or monitor
all of the requests going back and forth between browser
and server in the so-called envelopes.
So I'm going to hit a little Clear symbol here first just
to get a clean slate.
I'm going to click preserve log so I can actually see what's happening
and now I'm going to go ahead--
actually, I'm going to go ahead and do this.
http://harvard.edu, so the sort of incorrect version that I'm going
to expect the browser to fix for me.
I hit Enter.
A whole bunch of stuff is flying across the screen
and in fact if we zoom in on this, you can
see that just visiting Harvard's home page
requires 85 envelopes it would seem going back and forth with pieces
of the webpage and we'll see soon what some of those pieces are,
but it's not just one file coming back.
It's bunches of files.
Maybe images, maybe fonts, or some other things too,
but I'm going to scroll up in this output
and now notice the story that's been told here too.
So the very first request, which I can hover over and see,
came back with a 301, which we now know is Moved Permanently,
or it's a redirect.
Then if I hover over the second one, you'll
see that it's a slightly more precise URL, www, but still with HTTP.
So that got redirected and then lastly, if we look at the third line here,
this is the one we ultimately ended up at
and indeed it comes back 200, as do bunches of other results thereafter,
and we'll see what those 200s actually mean.
Now, you can do a little better than this
and it's perhaps fitting that our friends down the road indeed did.
Let me go back to the IDE.
Let me go ahead and clear this and instead of curling harvard.edu,
let me do http://yale.edu and ask the question,
what would be a better approach-- knowing these ingredients that we now
have of how redirects work.
How could Harvard do better in terms of getting the user to the address
that we intend them to be at?
Yeah.
AUDIENCE: By not forcing like, two redirects?
DAVID J. MALAN: Yeah, by not forcing two redirects, right?
Even if some of this material is new, we've
long talked now about correctness and design and style
and we've seen some messy style on the screen and that's fine for now.
More on that later.
It seems to be correct because it's working,
but it feels like it could be better designed
because why make one request then make another request just
to fix the first request then make a third request just
to fix the second request?
Why not combine them?
And, as it turns out, someone down the road had that same intuition
and so if we visit yale.edu with just HTTP and without the www,
they, in one fell swoop, actually redirect us
to the right place in this case.
So, with that said, it's perhaps fitting that just
a few years, well, some years ago now, you
might have tried to visit this particular address,
and this is something I can only do in Cambridge.
If I go ahead and open a new browser and go to http:// shall we say
safetyschool.org and hit Enter if you've never been.
Oh, interesting!
[STUDENTS LAUGH]
DAVID J. MALAN: And apologies for those of you tuning in online
live from New Haven.
So how is this possibly working?
It's actually a very simple heuristic.
If instead of selecting Yale or Harvard or any other address,
if I literally do like safetyschool.org, we can wrap our mind around
what's going on underneath the hood. safetyschool.org has moved permanently
to New Haven it would seem, but it's via this very simple mechanism that someone
back in 2000 registered this domain name,
and so actually as I was looking this up in the history last night,
I was amused to find that whoever bought the domain has been paying for this
domain name now for 17 years for this joke annually, but it's well worth it,
but I think it would be--
[STUDENTS LAUGH]
DAVID J. MALAN: But I think it's only fair now,
it's only fair if we take a look at another one too.
It turns out that if you visit harvardsucks.org, that one has also
redirected, this time to www.
So let's follow this little breadcrumb. curl -I harvardsucks.org,
and this one's OK.
So that means something lives at harvardsucks.org
and it does not as cleverly redirect to harvard.edu,
but to introduce this, let me actually introduce
a friend of ours who's now very awkwardly visiting from New Haven
today.
Hi Natalie.
Do you want to come on up and say hello for just a moment?
So this is Natalie, who is our head of the class with Benedict Brown
and [? Anushri ?] and with [? Staleos ?] in New Haven.
If you'd like to say a quick hello?
Hi, Hi, everyone.
DAVID J. MALAN: So nice to have you here today and as you know--
do you want to make mention of what we're about to see here?
What happened back in 2004 just a few years later?
AUDIENCE: We did a prank back, basically.
DAVID J. MALAN: OK, so perfect set-up.
Thank you very much.
Hello to Natalie.
Let me go ahead and hit play on three minutes
that are kind of hard to justify academically,
but it's perhaps one of the best pranks that's ever been played.
Long story short, our friends down the road
got together with a few of themselves just before Harvard Yale, which
was to be at Harvard that year and actually
mapped out using software, a sort of grid system
that lined up with all of the seats in the Harvard stadium,
whereby you assume that a human each takes up some amount of space,
and then they used special software to figure out
how they might spell something out in the audience in a way that
would be readable to the opponents, the Yalies, on the other side.
So if we could dim the lights for this look back
at yesteryear and a slight use of software.
[MUSIC PLAYING]
- All the way at the top.
- This is for you Yale.
We love you Yale.
- We're here to cheer for Harvard.
- Yeah!
Let's go Harvard!
- Yeah, Harvard!
- Take the top one and pass it down.
- It's not going to say something like Yale sucks is it?
- It says Go Harvard.
- We're nice.
- You see that shit?
Look at them, they have the paper!
It's gonna happen!
It's actually gonna happen!
I can't [BLEEP] believe this.
- What do you think of Yale?
- They don't think good!
- It may be a complete mess, I don't know.
- Does everyone have it?
Does everyone have their stuff?
- The probability that it's gonna be legible is very small.
- It's gonna happen!
It's gonna happen!
- It's too complicated.
- Look, look at all the signs.
- I know but it's too complicated.
- Uh, what houses are you guys in?
That's not a real house.
- Ho-fo?
- Yeah.
You guys aren't from Harvard are you?
- No, fo-ho.
Pforzheimer!
- Yeah, but he said ho-fo.
- Let's just make sure everyone has it.
- Well she's probably drunk.
- Are all the cards distributed?
- Almost!
[APPLAUSE]
[CHEERING]
- Hold up your signs!
- They [BLEEP] did it!
[CROWD CHANTING "YOU SUCK!"]
- They [BLEEP] did it!
They [BLEEP] did it!
[CROWD CHANTING "YOU SUCK!"]
- What do you think of Yale sir?
- They suck!
- One more time!
- One more time!
- Oh and there it goes again!
[CROWD CHANTING "HARVARD SUCKS!"]
[END PLAYBACK]
DAVID J. MALAN: All right, we've been talking
about what goes on inside of this envelope,
but what goes on on the outside?
So when you hand off this envelope from your laptop or your phone
to the internet, how does it actually get to its destination?
Well you've probably heard this acronym IP, or internet protocol,
and it turns out that every computer on the internet and every phone
in this room and every laptop in this room has a unique address.
That unique address is known as an IP address and it's much like the address
of a building in the real world, like the Science Center might be 1 Oxford
Street Cambridge, Mass 02138, USA.
Down the road is the CS building.
33 Oxford Street Cambridge, Mass 02138, USA.
So those long strings uniquely identify buildings
in the world for the mail service and the like
and similarly do IP addresses uniquely identify computers on the internet.
These addresses are much more succinct though.
They're not long strings. They're instead just numbers that have four parts
and each of those numbers within the IP address is a value from 0 to 255.
So the lowest IP address is all zeros and the biggest IP address
is all 255s with some constraints.
You can't quite use all of those numbers.
So just as a sort of quick teaser, if the smallest number is 0
and the biggest number for each of these sections of the IP address is 255,
how many bits are being used for each of those four numbers?
AUDIENCE: 8.
DAVID J. MALAN: Yeah, 8.
So remember like 8 bits gives you 2 times 2 times 2 times 2 times 2 times
2 times 2 times 2, which is 256, and indeed we have 256 total values from 0
on up to 255.
So an IP address is 8 plus 8 plus 8 plus 8, or 32 bits total,
or, just to come really full circle with week zero, if you have 32 bits,
roughly how high can you count?
Like what's 2 to the 32nd power?
Yeah, it's roughly 4 billion.
So, long story short, the implication of this very simple definition
is that apparently there can only be, in this model, four billion computers,
phones, refrigerators, internet of things devices on the internet at once
if they do all need an IP address that's unique.
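The arithmetic above can be checked with a short Python sketch: four numbers of 8 bits each pack into a single 32-bit value. The function name ipv4_to_int is made up for this example.

```python
def ipv4_to_int(address):
    """Pack a dotted address like 1.2.3.4 into one 32-bit number."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError("expected four parts")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError("each part must be between 0 and 255")
        value = (value << 8) | octet  # make room for 8 more bits, then add them
    return value


print(2 ** 8)                          # values per part: 256
print(2 ** 32)                         # total addresses: 4294967296
print(ipv4_to_int("0.0.0.0"))          # lowest address: 0
print(ipv4_to_int("255.255.255.255"))  # highest address: 4294967295
```

The highest address is one less than 2 to the 32nd because counting starts at 0.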
So I've been telling a slight white lie in that they
don't have to all technically be unique because there's
ways we can share addresses, and it turns out
there's even bigger addresses these days that aren't just 32 bits but 128
bits, which is just massive and I daresay unpronounceable how big that number is.
So we've gotten ahead of this issue, but you'll find that in a lot of locations,
companies and internet service providers like Comcast and Verizon and the like
and campuses like Harvard and Yale, you can notice that they tend to follow
patterns, like many of the IP addresses here at Harvard start with
140.247.something.something or 128.103.
Down the road in New Haven, a lot of the IP
addresses there start with 130.132 or 128.36,
which is not at all interesting to the humans who are using these IP
addresses, but it is useful to the servers or the devices that
are actually routing these envelopes from one place to another.
Meanwhile, in our homes and even sometimes on campus these days,
there are also what are called private IP addresses, which
are numbers within these ranges, and this
has been a solution so that when you sign up for Verizon or Comcast
back home or your parents do for internet service,
you technically only get one IP address from your internet service provider.
That's what you're paying for per month, but thanks to something
called network address translation and other technologies,
you can actually give all of your siblings and parents
and family members or roommates in the household their own unique address.
It's just private in the sense that no one else on the outside world
can access it unless you initiate the connection.
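The ranges themselves are not spelled out in this transcript, so the Python sketch below assumes the three standard private IPv4 blocks (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16) and checks whether an address falls inside one of them. The names PRIVATE_RANGES and is_private are invented for this example, and the sample addresses are arbitrary.

```python
import ipaddress

# Assumed: the standard private IPv4 blocks, not quoted from the lecture.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_private(address):
    """Return True if the address sits in one of the private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)


print(is_private("192.168.1.10"))  # a typical home address
print(is_private("140.247.0.1"))   # starts with a Harvard prefix from above
```

An address for which this returns True is the kind a home router hands out and then translates to the one public address from the internet service provider.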
So this is generally why at home you can reach
any website you want, any service on the internet that you want,
but you can't have like random people necessarily
trying to get into your laptop or your device at home
because there's a device, a home router, that translates these private addresses
into otherwise public addresses, but for now the takeaway really
is just that every computer on the internet has an IP address,
and if you've ever poked around your Mac, like under System Preferences,
you can actually see this.
So I've just pulled up a screenshot here of a network control panel on Mac OS
and if you look roughly there on your own Mac,
you should see that your IP address is something.
It will completely vary by person and by geography,
but you'll see your IP address there.
On Windows, at least Windows 10, you can see your IP address
under Settings here as highlighted here.
So this has a very different address, but that's
just because this person was on a different network all together.
So, where did these IP addresses come from?
Well back in the day someone would literally
come to your home to set up your Comcast or your Verizon internet service
and he or she would like type in these numbers into your Mac or PC
and then leave, and you would have one computer on the internet.
These days it's a lot more dynamic.
You don't need someone coming by.
That certainly doesn't scale very well because there's other protocols.
HTTP is this protocol we talked about earlier about web pages,
but there's other protocols like Dynamic Host Configuration Protocol, which
is a mouthful but it just means that our Macs, our PCs, Android phones, iPhones
and the like, if they speak this protocol, when you first
turn on your phone or boot up your laptop it knows,
if it has support for this protocol, to just announce to the internet,
hello world.
I'm awake.
What should my IP address be?
This just kind of broadcast message and if Harvard or Yale or Comcast
or Verizon or wherever you are in the world
has a DHCP server whose purpose in life is just to listen for those hellos,
that server should respond using the same protocol with your actual IP
address, and it figures out which one to give you based on an available pool
of numbers typically.
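The "pool of numbers" idea can be sketched as a toy in Python. To be clear, this is not how a real DHCP server is implemented: real DHCP involves broadcast messages, leases that expire, and more. The class name, device names, and addresses here are all hypothetical.

```python
class ToyDhcpServer:
    """A toy stand-in for a DHCP server: hands out addresses from a fixed pool."""

    def __init__(self, pool):
        self.available = list(pool)   # addresses not yet handed out
        self.leases = {}              # device name -> address already assigned

    def request(self, device: str) -> str:
        # A device that already has an address gets the same one back.
        if device in self.leases:
            return self.leases[device]
        if not self.available:
            raise RuntimeError("no addresses left in the pool")
        address = self.available.pop(0)
        self.leases[device] = address
        return address

server = ToyDhcpServer(["192.168.1.2", "192.168.1.3", "192.168.1.4"])
print(server.request("laptop"))   # first free address in the pool
print(server.request("phone"))    # next free address
print(server.request("laptop"))   # same address the laptop got before
```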
So that's how you might get this but there's
other things in these control panels.
In fact, if we look a little lower on Windows, there's DNS servers too.
Domain Name System.
Another acronym and a bit of a mouthful, but you can also see this on Mac OS too
if you actually click Advanced and actually take a look.
Here, for instance, there's mention of something else altogether, a router.
So there's lots of different addresses going on here
and lots of different servers.
So how do these all piece together?
Well, DNS is an interesting one in that it's
going to be the one that translates domain names to IP addresses, right?
None of us ever probably visits http:// and then a number, right?
Like, we visit facebook.com, google.com or the like,
but that's because our computers know how to translate one to the other.
So in fact if I do this command, nslookup for name server look up
and then I type in something like google.com, I'm asking the computer,
in this case, the IDE, what is the IP address of google.com.
I know it as the human as google.com, but the internet knows it
by its numeric unique address, and it turns out Google has several,
and even this is a bit of a white lie because they have thousands,
but the ones that my computer is being told to use
is, for instance, this one or this one or any of these other addresses.
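The same kind of lookup that nslookup does can be sketched in Python with the standard socket module. This is just an illustration, with a made-up function name; it asks the operating system's resolver rather than talking to a DNS server directly, and the addresses you get back will vary by network and over time.

```python
import socket

def lookup(hostname: str) -> list:
    """Return the IPv4 addresses the operating system's resolver reports for hostname."""
    results = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                                 type=socket.SOCK_STREAM)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    addresses = []
    for *_, sockaddr in results:
        ip = sockaddr[0]
        if ip not in addresses:       # drop duplicates, keep the original order
            addresses.append(ip)
    return addresses
```

On a machine with internet access, calling lookup("google.com") should give back one or more addresses, much like the list on the screen.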
So let me see what actually happens here.
If I highlight that address and open up a browser and go to http:// and that IP
address and hit Enter, notice it actually seemed to work.
Well, why is that?
It's a little hard to see it in Chrome, but let's
go ahead and open up the Inspect tab and go to Network just like before.
Let me click Preserve Log so that it saves everything here,
and I could be using curl.
So the curl was just the simpler version.
Now I'm using the more familiar graphical version.
Let me go ahead and do that again and go to http:// and that IP address and hit
Enter.
A whole bunch of stuff flew by even just for Google's homepage,
but notice what happened.
On that very first-- whoops--
request, if I hover over it, I see http:// and then the number that I
typed in, but it's a 301 because, what was the response?
We can actually see these responses.
Let me click on the status code here, or the row, go to Headers
and notice here, if we zoom in, we'll see that Google
responded with this location.
So someone at Google just decided, OK, fine.
You figured out one of our IP addresses.
That's great, but we don't want you to see that in the URL.
It's bad for branding.
We don't want you to bookmark an IP address
because it might change later on.
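What the browser is doing with that 301 can be sketched in a few lines of Python: read the status line, read the headers, and pull out the Location. The response text below is a hypothetical example modeled on what the Network tab showed, not a capture of Google's actual reply, and parse_response is a made-up helper name.

```python
def parse_response(raw: str):
    """Split the top of a raw HTTP response into (status_code, headers)."""
    lines = raw.split("\r\n")
    # The status line looks like: HTTP/1.1 301 Moved Permanently
    version, code, *reason = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        if not line:                  # a blank line ends the headers
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers

# Hypothetical response, modeled on the redirect described above.
sample = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: http://www.google.com/\r\n"
    "\r\n"
)
status, headers = parse_response(sample)
print(status, headers["location"])
```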
So we're using the same mechanisms as before,
but that's how we might do the lookup and we can see the same thing
for any number of websites.
Here we go nslookup of harvard.edu and we get back just a couple here.
If I do the same on Yale, I'm going to get back different IP addresses.
Yale has even more in this case and so this
is how the computer's figuring out where to send the data.
So what goes on this envelope then, it's going
to be not facebook.com, harvard.edu, or yale.edu,
it's actually going to be the address like 1.2.3.4
or whatever the actual IP address is of the server I'm trying to send to.
Now, of course, I expect a response from the server.
I want to get back my news feed or I want
to get back Harvard or Yale's homepage.
So what more should I probably put on this virtual envelope,
just intuitively?
Yeah.
AUDIENCE: Your own IP address?
DAVID J. MALAN: What's that?
AUDIENCE: Your own IP address.
DAVID J. MALAN: My own IP address, yeah.
So just like in the human world, just in case something
goes wrong with the post office, I might put my own address, 5.6.7.8,
and actually put that on the envelope so that if something goes wrong or, better
yet, if something goes right and they're ready to give me a 200 OK,
it can actually come back to me because they know from which
address this thing actually came from.
So who is it or what is it that's doing all of this routing?
Well it turns out there's servers on the internet called quite simply
routers, otherwise known as gateways, which is just a synonym,
and they're kind of artistically pictured here as just dots
across the world, and there's hundreds, thousands, tens of thousands
of routers.
Odds are you yourself at home, if you had internet access,
have at least one such router and its purpose in life,
again, is to take data from inside your household and send it to the internet,
and then any responses you get, to send it
back to the appropriate laptop or desktop or phone
or smart device that happens to be in your own home.
And we can actually see this too.
Let me go ahead and in CS50 IDE, try one other command.
I'm going to go ahead and type traceroute and I'm
going to trace the route, say, to yale.edu from here,
or technically from the IDE.
So if I hit Enter here, we're going to see a few lines of output,
and if you try this at home, just realize
I've configured my IDE a little differently to simplify the output.
So it looks like there's five steps between Cambridge
and New Haven or technically the IDE and New Haven,
but what are each of these steps?
Well between here and Yale, if we continue that version of the story,
there are, it seems, five routers.
There are five computers that have like lots of RAM, big CPUs
that can handle a lot of internet traffic
that are figuring out how to get my envelope from this origin
to this router, to this router, to this router, to this anonymous router,
to this one.
Sometimes the routers are configured not to answer these questions
from this program traceroute.
They sort of keep it to themselves, and you
can see on the right of each of these IP addresses some numbers.
So just take a guess, what do each of these numbers represent, perhaps?
What's that?
No it's okay.
AUDIENCE: Milliseconds?
DAVID J. MALAN: Milliseconds, yep.
So milliseconds that are measuring what do you think?
Time to go, or time to reach that specific router.
So we can kind of infer--
and this is the kind of amazing thing.
To get me to New Haven takes like two plus hours,
but to get an email, to get an envelope with a message
takes like 10.597 milliseconds to get data from here to there,
and then hopefully back if it's a request for a page.
Let's do something a little farther away.
So let's do like stanford.edu, tracing the route here,
and already we can see that the numbers are a little bit higher,
and that makes intuitive sense in that Stanford's
a little farther away than New Haven and it takes as many as 41 milliseconds
to reach that.
If I go even further and I reach like a company's news site
like cnn.co.jp, where .jp is the top-level domain for a lot of servers in Japan,
you can see a real uptick in just how many milliseconds it takes,
and in fact, there's something curious here.
Why does it take so much more time to get from router number three
to router number four do you think?
AUDIENCE: The ocean.
DAVID J. MALAN: The ocean, yeah.
So there's a really big body of water in between the US's west coast and Japan's
coast, which probably explains why not just between three and four,
but really every router thereafter is that many milliseconds away.
So these aren't cumulative.
We're measuring constantly from here to there,
from here to slightly farther, from here to slightly farther.
So it makes sense that once you cross that ocean,
that's kind of the total value that you're actually going to see,
and it's fascinating really.
I mean, throughout the entire world there
are not only wireless technologies today, but very much wired technologies
and if we take just a few seconds, we can
see this visualization of so many of the transoceanic cables that have actually
been dropped by big ships that carry many, many, many, many bits from one
coast to another.
[VIDEO PLAYBACK]
[MUSIC PLAYING]
[END PLAYBACK]
So, with all of those cables capable of transmitting data all around the world,
it turns out there's still one more problem.
Even if we want to do something simple like
download an internet image of a cat because there's
different types of servers out there.
There's my computer here like my laptop.
I'm running Mac OS or Windows.
There's all those servers in Google's data center
and in their racks and Facebook's and the like
and in between all of those servers there are lots of routers,
but it turns out that those servers in those racks at Google,
at Facebook, even at Harvard and Yale, there
are servers that can do multiple things because technically,
even though we humans tend to talk about servers as being physical devices,
a server is, as we started today, really just a program.
It is a piece of software that someone wrote that, when run,
listens for requests on the internet and responds to those requests,
generally by spitting out information, text or 0s and 1s or, in some cases,
cats.
So upon receiving an envelope, then, how is
it that a server knows whether it's a request for a web page or it's an email
or it's a chat message or a voice message or any number of other things?
It turns out we need one more piece of information at least on this envelope.
It turns out that the world has standardized
via another protocol called TCP, Transmission Control Protocol, that you
need at least one other number on these envelopes,
and that number corresponds to the type of service
that you're trying to access or the type of data
that you're trying to send or receive.
So, for instance, 22 is for something called SSH, Secure Shell.
This is something that most CS majors might use, but most people in the world
wouldn't use this because it's entirely command line
and it allows you to connect securely to some remote server
without using something like a browser, but all of us generally do use browsers
and HTTP, it turns out, all this time has had a unique number associated
with all of those requests.
80 is the number and if we visited any URL starting with https,
turns out there was a special number, 443,
that humans years ago decided would just uniquely identify encrypted web
traffic requests and responses.
587 is used for Simple Mail Transfer Protocol, which is for email.
Excuse me, 53 itself is used for DNS.
So if you ever send a message to a server saying
what is the IP address of google.com, you're
using number 53 to identify whatever machine or software can
answer that type of question, and so we can actually see this too.
If I go back to my IDE and I actually do curl -I https://www.harvard.edu,
this of course, worked before and it was 200 OK,
but it also will work if I more precisely say specifically send this
request to TCP port, or number, 80 and--
damnit.
Oh, it's wrong because I made a compelling pedagogical mistake.
So what did I do wrong?
AUDIENCE: Https.
DAVID J. MALAN: Yeah, so I kind of screwed up my numbers here.
So I said https, but I meant to say http if I'm using port 80
or, conversely, if I want to talk to the secure port which is known,
I actually want to say 443, and that one in fact works,
and I can do it again even in Chrome.
If I go up to my browser and go to http://yale.edu:80
and let the redirects happen, that too will work.
It's just browsers, to keep our minds focused on the website we're actually
trying to visit and not distracted by technical details like :80
or slashes or even sometimes http itself,
just hide that from the URL bar.
It's all there.
It's all happening, but we humans are getting a little more comfortable
with the internet over the years so Chrome and other browsers
are just starting to hide some of these lower-level implementation details.
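Here is a rough Python sketch of that idea: a URL either spells out its port after a colon, or it quietly implies the default for its scheme. The helper name effective_port and the example.com URL are made up for illustration; urlsplit is from Python's standard library.

```python
from urllib.parse import urlsplit

# Default TCP ports for the two web schemes mentioned in lecture.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url: str) -> int:
    """Return the port a URL refers to, falling back to the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None:        # an explicit :port was written in the URL
        return parts.port
    return DEFAULT_PORTS[parts.scheme]

print(effective_port("http://yale.edu/"))                    # port is implied
print(effective_port("http://yale.edu:80/"))                 # port is spelled out
print(effective_port("https://www.harvard.edu/"))            # the secure default
print(effective_port("http://example.com:8080/hello.html"))  # a non-default port
```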
So that really means, when I actually want to send a request to a web server,
I should really write :80 on the envelope
to make clear that that's going to a web server listening
on port 80 or maybe 443, and then, you know what?
It turns out, and we won't dwell too much on the details,
even my Mac or your PC also has its own port number for all of these requests,
right?
And it would be pretty annoying if you could only visit one website at a time
or you could use Gmail or Skype but not both
at the same time, or Facebook Messenger or Google Chat but only one
at the same time.
That would be pretty limiting, especially when
we have all this computing power.
So it's also the case that your own computer,
any time you send a request on the internet,
chooses a random or pseudo-random number to uniquely identify
the piece of software on your computer that's waiting for the reply.
So this might be not port 80.
This is going to be a bigger number like 1025,
or some large-ish value all the way up to 65,000, even,
or 32,000 that now uniquely identifies the port on my computer,
and that's how your computer can do multiple things at a time,
and when I get the response those values are just flipped,
but there's one more piece.
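That "flipping" of values can be sketched in Python as a tiny envelope with four fields. This is only a model of the idea, not a real packet format; the class name is made up, the addresses reuse the 1.2.3.4 and 5.6.7.8 placeholders from earlier, and 50123 is an arbitrary stand-in for the random port your computer would pick.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Envelope:
    src_ip: str      # who is sending
    src_port: int    # which program on the sender is waiting for the reply
    dst_ip: str      # who should receive it
    dst_port: int    # which service on the receiver, e.g. 80 for HTTP

    def reply(self) -> "Envelope":
        """Build the envelope for the response: source and destination swap."""
        return Envelope(self.dst_ip, self.dst_port, self.src_ip, self.src_port)

# Placeholder addresses from lecture; 50123 is an arbitrary made-up port.
request = Envelope("5.6.7.8", 50123, "1.2.3.4", 80)
response = request.reply()
print(response)
```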
Like cats can be pretty high quality and videos certainly
take up a huge amount of data.
Netflix videos and any streaming videos are taking up
a huge amount of information and it would
be pretty annoying to your neighbors if any time you
were watching a movie on Netflix, you had
to be done watching the movie in order for a neighbor
to also watch a video on his or her computer as well.
So it turns out that what computers also do thanks to IP
and TCP is, when they're used together, they offer one more feature still.
It turns out that if I want to download a picture of a cat,
and we have a nice printed version here, I'm
not going to get the whole cat in the one envelope most likely.
This cat or this video file or whatever it is
is actually going to be divided up into a few different pieces.
So this message might get chopped or fragmented into four pieces.
Each of those four pieces now might go in each of one of these envelopes
here, here, and then here with the third and fourth,
and what's nice, though, about TCP/IP is that it
provides at least two features for us.
One, IP ensures that every computer on the internet that speaks this protocol
has an address.
So IP handles the getting of the data to some destination.
TCP, the other half of this, ensures or guarantees
with high probability delivery-- that the data actually gets there.
Because as you might have gleaned from even the animation of all
of the transatlantic cables and all of the interconnections among routers,
things can go wrong, right?
Routers, it turns out, can get overloaded.
Their buffers can overflow such that they
can't handle all of the traffic coming into them and in fact,
if you try to watch Game of Thrones, some episode on HBO
and you couldn't access it at some point or [INAUDIBLE] or some tool like that.
If they're overloaded, what does that mean?
It just means the server, or the routers between us and the server,
are getting so many darn envelopes that they just can't keep up
and can't hold onto them all at once, and so sometimes packets do get,
so to speak, dropped, both physically and also digitally,
and this means some packet is lost.
And so what's nice about the internet is that when my computer here
talks to the nearest Harvard router that may very well have antennas
in a room like this or an access point, I might send off a packet here and here
and let's send this all the way to the back if you could,
but these packets, as you can see, don't necessarily
need to travel the same path because-- what's
your name in the second row?
AUDIENCE: Monsi.
DAVID J. MALAN: Monsi.
So Monsi is getting a little busy.
So Kara, if you could route to someone else.
This is literally the effect that happens on the internet.
If one router, like Monsi, gets a little bit busy
and her attention is elsewhere or just has too many packets to deal with,
she won't even necessarily drop it but maybe
their path will just be routed around her,
and that's what's nice about having this mesh network around the internet.
Now unfortunately, one of those packets can get dropped
and in fact this is a perfect example.
If you want to drop it, drop it.
Uh-oh, a packet was dropped!
What TCP does for us is the following.
Once those envelopes reach hopefully one specific person--
OK, you are the lucky winner.
Whoever, wants to-- how many do we have?
Two there?
Where did the third go?
That's OK.
TCP can handle multiple packets being lost.
AUDIENCE: It's over there.
DAVID J. MALAN: Oh, and so packets also don't take the shortest path sometimes
on the internet.
So what might happen?
So let's assume for the sake of discussion
that those packets did make their way to at least one of our audience members
here.
He or she, upon receiving them, would also
see not just the origin address and the destination address.
There would also be some notation, like a memo line on the envelope saying
1 of 4, 2 of 4, 3 of 4, 4 of 4, so that the recipient can infer
from that little hint whether or not they received all 4 or just,
as in this case, a subset thereof, and in that case,
assuming the computer speaks TCP, it can simply say,
hey David, resend me packet number 1 or packet number 3
or whichever were actually lost.
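The "1 of 4, 2 of 4" bookkeeping can be sketched in Python like so. This follows the envelope analogy rather than the real protocol: actual TCP numbers bytes and uses acknowledgments instead of writing "n of total" on each packet. All function names and the sample data are made up for illustration.

```python
def split_into_packets(data: bytes, size: int):
    """Chop data into numbered packets: (sequence number, total, chunk)."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    total = len(chunks)
    return [(n, total, chunk) for n, chunk in enumerate(chunks, start=1)]

def missing_packets(received):
    """Return the sequence numbers that never arrived."""
    if not received:
        return []   # nothing arrived, so the total is unknown in this toy model
    total = received[0][1]
    seen = {n for n, _, _ in received}
    return [n for n in range(1, total + 1) if n not in seen]

def reassemble(received) -> bytes:
    """Put the chunks back in order, regardless of the order they arrived in."""
    return b"".join(chunk for _, _, chunk in sorted(received))

cat = b"pretend these bytes are a picture of a cat"
packets = split_into_packets(cat, 12)        # 42 bytes -> 4 packets
arrived = [packets[3], packets[1]]           # out of order; 1 and 3 were dropped
lost = missing_packets(arrived)
print("resend:", lost)
arrived += [packets[n - 1] for n in lost]    # the sender resends the lost ones
print(reassemble(arrived) == cat)
```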
And so together all of this happens at blazing speeds.
10 milliseconds to do all that back and forth to New Haven,
let alone even faster here on campus, but those really
are the basic principles and building blocks
that are just getting our data from one place to another.
Of course, the real interesting stuff happens
when we dig deeper into this envelope and look at the contents.
Not just the cat as in this case, but the language, HTML and something else
called CSS which we'll do shortly, but I thought
it might be fun, especially on the heels of our look at forensics,
to take a look at just how sort of presumptuous Hollywood
tends to be when presenting us humans with technical details
that now you'll perhaps have an even better eye for in addition
to the age-old "enhance" line.
[VIDEO PLAYBACK]
- It's a 32-bit IPv4 address.
- IP as in internet?
- Private network.
[? Tamia's ?] private network.
[STUDENTS LAUGHING]
- She's so amazing.
- Oh, Charlie.
- It's a mirror IP address.
She's letting us watch what she's doing in real time.
[END PLAYBACK]
DAVID J. MALAN: OK, so we'll hold it on this screen
here because one, a few of you laughed when you saw the bogus IP
address because the number was what?
AUDIENCE: 275.
DAVID J. MALAN: 275, which is too high and that one
we could forgive because you don't want like random people pausing
their videos on the internet then trying to hack into or get access
to that URL, but even funnier is when the hacker is being described
as doing this on the screen as part of their attack.
This is like the source code in a language called Objective-C
for some kind of drawing program, as suggested
by the use of crayons in the code as a variable.
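As for why 275 is too high: each of the four numbers in a 32-bit address gets 8 bits, so each has to be between 0 and 255. A minimal Python check, with a made-up function name and a made-up bogus address standing in for the one on screen, might look like this.

```python
def is_valid_ipv4(text: str) -> bool:
    """Check that text is four dot-separated numbers, each between 0 and 255."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Each part must be plain digits and small enough to fit in 8 bits.
        if not (part.isascii() and part.isdigit()) or int(part) > 255:
            return False
    return True

print(is_valid_ipv4("140.247.0.1"))    # every number fits in 8 bits
print(is_valid_ipv4("275.3.6.28"))     # hypothetical address: 275 does not fit
```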
So let's pause there and when we come back in five minutes, we'll take a look
at HTML itself.
All right, so we're back and we're about to learn a new language.
Though this might feel like a lot to do in just an hour,
this one's a markup language.
So it's not a programming language, which
means you're not going to see loops.
You're not going to see functions.
You're not going to see conditions or any of the kind of logic
that we have built into C and into Scratch and eventually
Python and JavaScript.
You're instead going to see just what are called tags, pieces
of English-like syntax that just tell the browser what to do
and what to stop doing.
So we're going to see tags that say start making this text centered.
Stop making this text centered.
Start making the text bold.
Stop making the text bold.
So these very deliberate kind of statements
that we're going to express using something that's code-like,
but it doesn't give you logical control.
So as such, there's a pretty small language ahead of us
and a lot of what you'll do when learning HTML is just
check an online reference or an example online or look at the source
code of actual web pages to just figure out how these things are done
and today, we will focus on the fundamentals.
So this is perhaps one of the simplest web pages you
can write in a language called HTML.
It's a text-based language.
All of the tags resemble some English words
and there's a pattern to the kinds of things that you might type.
First of all, if you're using the very latest version of HTML, which
happens to be version 5, it's been around for a while,
you simply start every web page with this cryptic incantation at the top
here.
Open bracket, !doctype html, closed bracket,
as those things are called.
Angled brackets, which you've probably not had many occasions to type
on your keyboard, but starting soon you will.
Then after that, they start a pattern.
So <html> and then all the way at the bottom is what we'll call the opposite
of that tag.
If this is a start tag, this will be an end tag, or if this is an open tag,
this will be a close tag, differing only with this forward slash that's
inside of the tag.
So this says, hey browser, here comes a web page.
This says, hey browser, that's it for the web page.
Again, this sort of starting and stopping mentality.
Meanwhile, inside of the web page as denoted by the HTML tag,
there are two parts, a head and a body.
The head of a web page tends to contain very little.
It's usually just like the title bar in the tab
that we humans see when you visit a website,
and the body is like 95 percent of the contents
of the page, the actual viewport or the rectangular region
that contains actual content.
What is that content?
Well here in the head we have a title that's
going to be "hello, title" just because and then
in the body of the web page, this web page,
there's going to be "hello, body."
That's it.
That's HTML.
If you save this text in a file, open it in a browser,
you will see a really lame web page that says hello title and hello body,
but that's a web page using HTML tags as they're called.
Anything in these angled brackets are tags.
So I can actually see this pretty clearly even on my Mac
and you could do this on your PC as well.
I've opened up TextEdit and I've configured it to be simpler than
the default, so know that I've done a little something in advance,
but you could use Notepad on Windows or any number of other programs,
even Microsoft Word if you save it in the right way or Google Docs,
but let me go ahead and just recreate this as !DOCTYPE html, open bracket,
html, and just to kind of remember to do things,
I'm going to tend to get ahead of myself and sort of start and finish
the thought and then dive in inside.
Let me go ahead and do head here, close head tag here,
and I'm indenting, just for good measure, one, two, three, four tabs,
though so long as you're consistent the browser will
be perfectly content, as will we.
hello, title, title, open bracket, open bracket body, closed bracket body,
and then hello, body.
So that's it.
I've just typed out the exact same thing as before.
Let me go ahead and save this as not hello.txt or certainly
not hello.c but hello.html by convention.
I'm going to hit Save.
Mac OS is kind of warning me that this is text, not something called HTML,
but I know what I'm doing and I'm going to say use HTML,
and now I have a file called hello.html, and if I go to my desktop, here in fact
it is.
And if I double click on it, there, in fact, is that pretty simple web page
and if I actually reveal the tab, there it is.
Hello, title in the very top tab of the page and once I get rid of that
do I see the body again.
So that's it for HTML at least in terms of its basic structure,
but there are some other features that we can take advantage of as well,
and let's actually tease these apart.
Notice, first of all, that there is indeed this symmetry.
What is opened is almost always closed as well in the opposite order.
Just as head here and title here, and then
followed by body and then the contents therein,
but because there is this structure, you can actually
think about this in relation to the past couple of weeks
when we've talked about data structures.
I would argue that this HTML on the left is
kind of equivalent to this tree on the right,
and we didn't spend a huge amount of time talking about trees,
and even when we did we used them for algorithmic reasons
like a binary search tree to search data pretty efficiently,
but if you think about it, here is the document, which I'm just
drawing with this shape here kind of arbitrarily
and it has one child like the entire page as I'm drawing it,
which is the HTML tag here.
The HTML tag has two children, so to speak, to borrow
our language from our data structures.
So head and body from left to right.
Head has a child called title and then title has a child of some sort,
even though it's just raw text.
It's not another tag with angled brackets,
just as body has its own content there, just hello, body.
So that hierarchy and the deliberate indentation, which is there just for us
humans-- the browser does not care about whitespace--
lends itself to an implementation in memory,
and so long story short, when your browser receives an envelope,
inside of which are not just those HTTP headers, outside of which
are not just the IP address and TCP port,
but inside of which is a text file containing HTML like that,
all the browser does is load that file into memory,
read it top to bottom, left to right and essentially build
a tree structure in memory so that it knows how to represent it
underneath the hood, so to speak.
And in fact, you've seen HTML all around you
even if you've just never looked underneath the hood, as we say.
In fact, if I go to like harvard.edu and let
the redirects happen in the usual way, let me go ahead and inspect the page.
This is another way in Chrome and in other browsers
to get at the developer tools.
You can control click or right click on the web page and choose Inspect.
That opens up the same tab.
Previously, we used the network panel, but if I click on Elements
you can actually see all of the HTML that composes Harvard's page,
and it looks beautiful here.
It's nicely color-coded.
It's prettily indented.
I can dive in deeper with all of these arrows,
but that's probably not how the humans made it
because if I also right click or control click and choose View Page Source,
and you can do this in any browser as well,
here is the mess that actually came back from Harvard's server.
This is HTML and my god, like, it's a lot.
I see no indentation, so style 0 here, but that's OK
because it's a browser reading it.
It's not a human in this case and similarly,
if we visit something like yale.edu, and let's go ahead and open up their page
source, it's similarly going to be kind of overwhelming and a lot of it,
but rest assured that even though these web pages might look really,
really sophisticated-- like, my god, we've never written a C program with
500 plus lines of code--
a lot of this stuff is generated, and in fact,
one of the challenges of pset7 and pset8 when we explore web programming
is going to be not to write hundreds of lines of HTML, which would just
get mind numbing quickly, but to write a few lines of Python
or a few lines of JavaScript that programmatically, like with loops,
generates all of the structure of your web page.
So if it's like a web page of photos like a Facebook photo album,
Facebook doesn't have people writing out thousands of lines of HTML code
every time you upload a photo.
They have code in PHP or some other language
that has a for loop that iterates over all of the photos you've uploaded
and spits out the same HTML but different image for each of the photos
you've uploaded, and that's where web programming comes into play.
You're not writing the HTML, you're generating it
by actually writing programs.
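As a small taste of what that looks like, here is a sketch in Python of a loop that spits out the same HTML with a different image each time. It is not Facebook's code, of course; the function name and file names are hypothetical, and the img tag it generates is one we haven't formally introduced yet.

```python
from html import escape

def photo_album(title: str, photo_urls) -> str:
    """Generate an HTML page with one <img> tag per photo."""
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        "    <head>",
        f"        <title>{escape(title)}</title>",
        "    </head>",
        "    <body>",
    ]
    for url in photo_urls:                     # same markup, different image
        lines.append(f'        <img src="{escape(url, quote=True)}">')
    lines += ["    </body>", "</html>"]
    return "\n".join(lines)

# Hypothetical file names, purely for illustration.
print(photo_album("my album", ["cat1.jpg", "cat2.jpg", "cat3.jpg"]))
```

Three photos or three thousand, the program stays the same length; only the list changes.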
So today we set the stage for that capability
but first we just need a framework for actually doing this.
So rather than use, now, my local Mac, which
is kind of lame because I can open the web page but no one else in the world
can access it, and in fact, if we do that again, you'll
notice here, if I double click on hello.html and open the URL bar,
it's curiously clearly not on the internet.
Like, it's not http, it's not https, it's literally file://,
which just means it's a file on my local computer.
So none of you could reach that because of course
this user jharvard on my laptop exists only on my local Mac.
So fortunately we have a web-based IDE with which to put stuff
on the internet, but there's a catch.
The IDE itself, recall, is a web application, right?
It's code that friends at Amazon wrote and that we added to that runs
on a server somewhere and, as we'll see, somewhat in your browser too,
but more on that when we talk about JavaScript,
but CS50 IDE already has a URL like https://cs50.io or https://ide.cs50.io
slash whatever your username is.
So we're already using port 80 or maybe 443 for the IDE itself.
So how in the world could you write web pages in the IDE
and then serve them on the internet if the IDE itself
is already using the standard port?
Well fortunately you can write on the envelopes,
when trying to access your own web pages, a hardcoded TCP port number.
It doesn't have to be 80, it doesn't have to be 443.
Those are just the defaults.
If I want to actually visit pages in my IDE,
I can just run a web server on a different port number,
like 8080 by convention or 8081, 8082.
Just a pretty big number that odds are no one else is using on some system.
So let's see this as follows.
Let me go ahead and in the IDE here create a new file.
I'm going to call it hello.html and I'm just
going to go into that text file, whoops, which I closed.
Let me go ahead and just grab the code that we've
been using here, which is right here, go back to the IDE,
paste it into the text file here, click Save, and now I have in the IDE
a file called hello.html, and indeed if I look at the file browser
and I look on the left-hand side, there, in addition to the sample code,
is hello.html, but if I double click this file it's not
very useful because it's going to open the editor, which
is not like a web page.
It's the source code for my web page.
So I actually now need to run a program that
serves this file just like Facebook does, just like Google and Harvard
and Yale do, and I'm going to do this literally by running http-server,
and I'm going to say on port 8080.
So -p in this particular program means port
and I'm just going to say, hey CS50 IDE, start a program called http-server
whose purpose in life is to listen for requests on the internet,
but specifically on that port number, and serve up whatever requests come in.
So I've gone ahead and hit Enter here.
Starting up http-server.
It tells me the long URL that this is available at.
Your URL will be a little different with your username
and if I open this now in another tab, it's a little cryptic at first glance.
I'm just seeing the index or contents of my directory and in there is like
a secret .c9 for Cloud9 directory.
Don't delete that or change that.
That just has metadata related to the IDE.
Source6 I downloaded earlier and you can too from the course's web site,
but there's hello.html, and on the left-hand side
here, you'll see some cryptic looking permissions.
This has to do with who can read and who can write your files,
but for today all I care about is that the file exists.
So now, like a user on the internet, I'm going to go to here, click on it,
and voila!
There is my actual web page.
So notice, the URLs are very similar.
Here I am on cs50.io and here I am on cs50.io
even though your user names will of course be different,
but the IDE is running on the default port, 443.
I'm now temporarily serving up my HTML files
using port 8080 just because and so that's
how a server can do multiple things and how you can do
multiple things on the server at once.
So let's do something else besides that.
Let me actually introduce a few other fundamentals that
might be handy when writing HTML and let's go ahead and do this.
Let me go ahead and create a new file and we'll call this one
paragraphs.html, and let me go ahead and just name this like paragraphs and down
here I'm going to have some paragraphs of text,
and I don't really know what I want to say so I'm going to Google some--
so standard Latin-like text.
Oh, I want like three paragraphs of Latin-like text and so here we go.
Then there's a random website that just generates
placeholder text in faux Latin.
So, Paste.
There are my three paragraphs.
I'll be a little nice and tidy and indent them
so it looks at least somewhat nicely styled.
Save the file and now let me go back to the URL I was at a moment ago.
Now notice I have two files being served by this HTTP server program.
Click paragraph-- oh.
OK, one, Chrome thinks the page is in Latin.
[STUDENTS LAUGH]
Actually, soccer inferior element estate planning time.
Tomorrow soss quiver before as the--
that does sound like the Latin I learned years ago.
All right, so Show Original.
So the point is not to focus on the Latin, but the apparent bug.
Like, what's it not doing that maybe you thought it should a second ago?
AUDIENCE: No indentation.
DAVID J. MALAN: Yeah, there's no indentation and also there's no what?
There's no break.
I mean this is one big Latin-like paragraph.
It's not three.
Well this is simply because a browser only does what you tell it to do.
Let me go ahead and shrink this window and, as an aside, what you're
seeing here, all this mess in the bottom terminal window,
as the http-server program is running, it is logging all of the HTTP requests
that come in from browsers just so you can kind of debug or diagnose,
but we're going to just ignore that for now
and let this thing run down here in the background.
But if I want paragraphs I need to be a little more pedantic
and actually say, hey browser, make a paragraph with what's called the p tag,
and let me go ahead now and indent even though the indentation clearly
doesn't matter.
It's just to keep my code nice and tidy.
So, hey browser, start a paragraph.
Here's the text.
Hey browser, stop the paragraph.
Same thing here.
Let me go ahead and start a paragraph.
Then let me go ahead and stop the paragraph.
Notice the IDE is trying to be helpful.
This is not helpful.
This is not a password, but it's trying to autocomplete my thoughts.
That's fine.
I'm just going to ignore it.
Then let me go ahead and close the paragraph and save.
So it's a little more verbose, but anything in the tags the human is not
going to see, but when you reload the page, as with command or control+R,
or if you go up here by clicking the reload icon,
whatever it looks like in your browser.
Now I have three Latin-like paragraphs.
So it's a little more deliberate here.
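As a rough sketch of the fix just described, here is what paragraphs.html might look like once each block of text is wrapped in its own p tag. The Latin-like sentences are short stand-ins invented for illustration, not the text pasted in lecture.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>paragraphs</title>
    </head>
    <body>
        <!-- Each p element renders as its own paragraph. -->
        <!-- The indentation and line breaks in this source are ignored by the browser. -->
        <p>
            Lorem ipsum dolor sit amet, consectetur adipiscing elit.
        </p>
        <p>
            Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
        </p>
        <p>
            Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris.
        </p>
    </body>
</html>
```

Removing the p tags from this sketch should reproduce the bug from a moment ago: the three sentences run together as one block.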
So that's all fine and good, but the web is kind of more interesting
when you can actually link to things.
So let's actually do that instead.
Let me go ahead and create a new file called, let's say, link.html.
Go ahead and paste this here and say we'll name the title link.
Let me get rid of all of this just so I have some placeholder
and I can say something like "Hello, world!
My favorite school is..."
and just to play it safe today, "stanford.edu."
Save, reload, click link.html and nothing.
So here too it looks like a domain name and it certainly is, and frankly,
all of us now are probably conditioned in tools like Slack and Gmail and other
tools and Facebook that just kind of figure out that, oh,
if something looks like a domain name, make it a link,
but that's because someone at Facebook, someone at Google knows HTML and knows
how to use if conditions and elses and just says, oh,
if a string that the human has typed in looks like a domain name ending
in .edu, make it a link.
But how do you make it a link?
We can now do this manually.
It turns out you need an anchor tag abbreviated as a
and then I'm going to close the anchor tag at the end of the text
that I want to anchor a link to, but this isn't enough.
I need to be ever so explicit as to where I want this link to go,
and so it turns out HTML also supports what are called attributes.
So tags are the things in angled brackets.
Attributes are also inside those angled brackets,
but they come after the tag's name, and they're just going
to modify the behavior of the tag, and it makes sense here
to need to modify the behavior because 20,
30 years ago when HTML was invented, we didn't make up
a tag that leads to stanford.edu.
We made up a more generic tag that anchors to some destination,
and so here I can now do www.stanford.edu, save the file,
and notice, this is like saying to the browser, hey browser,
here comes a link or hyperlink to Stanford's web site,
and then the end here it says hey browser, that's it for the link,
and thankfully it's not super verbose.
You don't have to repeat the attribute at the end.
You just repeat the tag's name, otherwise
you'd be typing the same thing again and again.
If I now go back here and reload the page as with command or control+R,
now it becomes the familiar blue underlined link,
and if I click on that, notice first it's super small.
You can see where the link is actually going to lead,
and so if I click on this we'll see Stanford's website and voila.
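A minimal sketch of link.html along the lines described above. The https:// scheme is an assumption added here: without a scheme, a browser would treat a bare www.stanford.edu as a path relative to the current page rather than as another site.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>link</title>
    </head>
    <body>
        Hello, world! My favorite school is
        <!-- a is the anchor tag; href is the attribute that says where the link goes. -->
        <!-- The closing tag repeats only the tag's name, not the attribute. -->
        <a href="https://www.stanford.edu/">stanford.edu</a>.
    </body>
</html>
```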
So now we've visited their page as well, but there's an interesting side note
here, and if you want to kind of think about things called phishing attacks
or frankly, Harvard once in a while and Yale once in a while
will email out warnings like "beware of this phishing attack."
P-H-I-S-H-I-N-G.
This is when people on the internet generally
send you emails or some kind of spam trying to trick you
into visiting a phony website to harvest your usernames, passwords, credit card
numbers and whatnot, and honestly, most of those phishing attacks
boil down to this 10-line example of HTML
because what's to stop me from saying something like "Hello, world!
Confirm your password at..."
and then we'll say like paypal.com and then
over here, I can change this to like davidsphishingsite.com,
which hopefully doesn't exist.
One year I went to badplace.com and--
anyhow, so--
[STUDENTS LAUGHING]
Here I've gone ahead and saved the file, reloaded, and the link is indeed blue,
but before I click on it, only the most astute of users
is going to even bother checking the bottom left hand
corner to see where they're about to be whisked away to
and even most of us in this room, myself included,
are not so paranoid that we're constantly
checking those kinds of things.
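The mismatch being described can be sketched in one line of markup. The destination below uses the reserved .example domain as a hypothetical stand-in for the phony site; it is not a real address.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>link</title>
    </head>
    <body>
        Hello, world! Confirm your password at
        <!-- The visible text says paypal.com, but the href decides where a click goes. -->
        <!-- phishing.example is a hypothetical placeholder destination. -->
        <a href="https://phishing.example/">paypal.com</a>.
    </body>
</html>
```

The only place the real destination shows up is the small status text the browser displays on hover, which is why the trick works so often.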
Odds are, if I get an email like this, oh
my god, my account's been compromised.
I've got to go confirm my password for PayPal to protect my money.
You might very well just follow the link,
but of course it can go anywhere you want just via this very basic building
block, but this is just one way you can vet actually
what's going on underneath the hood, but of course the internet
is more interesting than just text alone.
Let me go ahead and open up an example that I whipped up
in advance here using image.html and we'll see another tag here.
So here is another opportunity to use an attribute
and one that's also not necessarily visible to the user.
So here's an image tag.
Humans years ago decided to be succinct.
It's <img> for image, just like it's just <a> for anchor.
The source, src, of which is going to be that file, dan.jpeg,
which I downloaded in advance from the URL up above,
and in fact, this is gray in the cs50 IDE because it's syntax highlighting it
just like in C. This is what's a comment in HTML.
So if you want to make notes to yourself or to viewers,
some sentence or like a citation like this,
you can use an HTML comment by doing <!-- --> and you can write anything
between those things-- for the most part-- that you want.
So just like in C do we have the //.
So here's the source of this image and this
is like an alternative explanation of it, alt. Why might this be compelling?
I want to show the image to a user.
Yeah?
AUDIENCE: Is it for like if they hover their mouse over it,
they can see what's happening.
DAVID J. MALAN: Yeah, so a couple of reasons.
If you hover over the image you can actually see some descriptive text.
So like Handsome Dan here, like Yale's mascot.
If the user has trouble seeing or is blind,
you might need a screen reader to actually tell you
what it is that's on the screen, and it's not
obvious from dan.jpeg what that could be,
but if you have this alternative text, a computer
can recite verbally Handsome Dan, which might then
jog the person's memory as to what it is that's actually on the screen.
Or if you have a really slow internet connection,
sometimes you'll see a placeholder for an image
that just says what it is before the image actually downloads.
So being mindful of these kinds of things
will just make, ultimately, your websites more accessible,
and indeed if I go to this one now and go into my source6 directory
where we have even more examples at our disposal and go to Image 6,
here is their adorable Handsome Dan as of this past year.
So there's an image.
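A sketch of what image.html might contain, assuming dan.jpeg sits in the same directory as the page. The comment text is a placeholder for the citation mentioned above. One hedge on the hover behavior: in many current browsers the tooltip comes from the title attribute rather than alt, so the sketch sets both, and the title attribute is an addition of this sketch.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>image</title>
    </head>
    <body>
        <!-- An HTML comment: not displayed on the page, but visible to anyone who views the source. -->
        <!-- alt is what a screen reader recites and what appears if the image fails to load. -->
        <img alt="Handsome Dan" src="dan.jpeg" title="Handsome Dan"/>
    </body>
</html>
```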
We can kind of do funky things now with nesting.
So this is not all that interesting because it doesn't go anywhere,
but I could just combine these ideas.
I could do a href = http://www.yale.edu or, because I don't want the user
to bother getting redirected, I could just proactively make
it secure because I know Yale supports that per earlier,
and I can nest these tags like this.
Now if I go here, reload, it still looks the same
but notice my cursor changes to like a pointer, and if indeed I click on that,
now the image leads to Yale's web site, but I skimmed over something.
One of these is not like the other.
What detail have I kind of not mentioned?
Yeah.
AUDIENCE: The image file closes within itself.
DAVID J. MALAN: Yeah, the image tag kind of closes in and of itself,
and so there are some of these anomalies within HTML
where there really isn't a notion of, like, start doing something
and then eventually stop doing something.
Like, an image is either there or it's not.
Like, you can't kind of put something in between it conceptually,
and so some of these tags in HTML are what are called empty.
Like, they should not have anything after the open tag
or before the close tag.
So if you wanted to be really sort of precise you could say this,
but you should not put anything where my cursor now
is because it would make no sense to try to put something inside of an image,
but this is just kind of lame to have this unnecessary verboseness.
So you can just put the slash in there and technically in HTML5 you
don't even need the slash in this case, but at least this way,
and I think for pedagogical purposes, doing it, even for empty tags,
makes it more clear, visually, that your tags are
balanced.
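Combining the two ideas, a sketch of the nested version might look like this. It assumes the same dan.jpeg file as before and uses the https form of Yale's address, as suggested above.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>image</title>
    </head>
    <body>
        <!-- The anchor wraps the image, so the image itself becomes the clickable link. -->
        <a href="https://www.yale.edu/">
            <!-- img is an empty element: nothing goes inside it, so it has no separate close tag. -->
            <!-- The trailing slash is optional in HTML5 and is kept here only for visual balance. -->
            <img alt="Handsome Dan" src="dan.jpeg"/>
        </a>
    </body>
</html>
```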
So that's the only anomaly there and then
there's bunches of others which we can fly through really quickly here.
So if I go back to our examples here, I whipped up headings.html.
So if you want to do something like this if you're
writing like a book or a website that has like chapters and sections
and subsections and so forth, HTML lets you
easily format things as big and bold, slightly smaller and bold,
slightly smaller and bold, and so forth by using the h1 through h6 tags.
So if I go into headings, this is how I made this web page.
I simply have h1, h2, h3, h4 opened and closed and that's it.
So any time you're reading some kind of online text,
odds are they're using one or more of these tags to format the page.
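A small sketch of a headings page in the spirit of headings.html. The heading text is invented for illustration.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>headings</title>
    </head>
    <body>
        <!-- h1 is the largest heading and h6 the smallest; browsers render them bold by default. -->
        <h1>One</h1>
        <h2>Two</h2>
        <h3>Three</h3>
        <h4>Four</h4>
        <h5>Five</h5>
        <h6>Six</h6>
    </body>
</html>
```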
If we look at another example in here, we have something like list.html.
Lists are not uncommon on the internet, you'll never believe number three,
and here's how you might do something with a bulleted list by just marking up
three words-- foo, bar and baz--
and the HTML for this, if I open up list.html,
simply looks a little more verbose in that we need a parent element so
to speak, borrowing our tree terminology,
but here we have an unordered list, or ul, each of which
has one or more list items, or li, each of which
open and close foo, bar and baz.
And if I really want it numbered, I can also do this.
I can change unordered list to ordered list, ol, reload and now the browser
figures out the numbering for me, which is nice if you have lots of data
and you don't want to deal with actually laying it out yourself.
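The list just described can be sketched as follows, using the same three words. Both variants are shown on one page here purely for comparison.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>list</title>
    </head>
    <body>
        <!-- ul is the parent element; each li child is one bulleted item. -->
        <ul>
            <li>foo</li>
            <li>bar</li>
            <li>baz</li>
        </ul>

        <!-- Changing ul to ol makes the browser number the items itself. -->
        <ol>
            <li>foo</li>
            <li>bar</li>
            <li>baz</li>
        </ol>
    </body>
</html>
```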
Meanwhile, we can go one or two steps further before we actually
get to something functional.
Here is kind of the most complicated of all,
but it too just kind of tells the browser what to do.
So before we look at the result, this says, hey browser, here comes a table,
like tabular data.
Rows and columns like Excel or Google Spreadsheets.
Hey browser, here comes a table row, or tr.
Hey browser, within that row, here comes some table data, a.k.a.
a cell or column.
Here comes another cell.
Here comes another cell.
So that's one, two, three cells in a row.
Hey browser, here comes three more cells.
Hey browser, here comes three more cells.
Hey browser, here comes three more cells and if we actually
render this in the browser, you can see the layout of a sort of old school
phone pad on your phone.
It's not very pretty, it's not very well formatted,
but if we zoom in you really do see that it is lined up in rows and columns
as I sort of verbally implied, but this is all very kind of underwhelming.
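For reference, the phone-pad table can be sketched like this. The four rows of three cells follow the description above; the exact contents of the bottom row are an assumption based on a typical phone keypad.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>table</title>
    </head>
    <body>
        <table>
            <!-- Each tr is one row; each td inside it is one cell in that row. -->
            <tr>
                <td>1</td>
                <td>2</td>
                <td>3</td>
            </tr>
            <tr>
                <td>4</td>
                <td>5</td>
                <td>6</td>
            </tr>
            <tr>
                <td>7</td>
                <td>8</td>
                <td>9</td>
            </tr>
            <tr>
                <td>*</td>
                <td>0</td>
                <td>#</td>
            </tr>
        </table>
    </body>
</html>
```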
Like, Google is cool because you can go to it
and you can actually search for cats and find lots of cats on the internet,
but how is it that this actually works?
So, aww, bad news today.
OK, so we'll just zoom in on this one.
OK, so let's try to focus on the pedagogy here--
of cats-- as follows.
Let me go ahead and focus on really the URL, which is kind of long and cryptic,
but let me just throw away honestly anything that kind of looks confusing
or I don't understand.
I have no idea what source means so I'm going to get rid of that.
I have no idea what the rest of this means.
I'm going to get rid of that and I'm going to try to distill-- granted,
with some foresight because I knew how Google works here--
I changed the URL to something much, much, much simpler.
Cats, where it's www.google.com/search?q=cats.
It seems that, somehow or other, Google's behavior
is controlled by information that's conveyed in the URL,
and it's not just that I'm searching.
It's that I'm searching for cats.
So in fact, on a whim, I'm going to search for dogs instead and hit
Enter, and indeed a few things change.
We have all these dog images appear here on the right.
We have the text pre-populated up here and we
can search for any number of other things
here, like Harvard Yale prank 2004, Enter,
and there you have a Wikipedia article on the video we saw earlier.
So it seems that you can parameterize the behavior of Google
just by understanding how this URL works.
So here is kind of the path that's being requested,
the file or folder or whatever that is.
A question mark says, hey browser, or hey server,
rather, here come some HTTP parameters.
Some inputs from a human who's either filled out a form or apparently
is kind of hacking the URL bar here, and then the name of the parameter
comes next. q, meaning query, and this is what Larry and Sergey decided years
ago for their search box, an equals sign,
and then whatever it is the human typed in.
Now it got a little funky here quickly.
Now you see %20.
That is the web's way of encoding a space so
that it's not a physical space, it's all one contiguous string.
So it's just one contiguous string for the server to actually look at or read,
and so why is this useful?
Well it turns out I can leverage this information
and kind of implement my own Google pretty easily.
Let me go ahead and go into search.html, one of the other examples I whipped up,
and you'll see another tag all together.
Inside of the body of this page is an HTML form tag,
and the form tag takes a couple of attributes I know.
One is action, which is the URL to which you
want to send the form's information, and the other
is the method that you want to use.
Now it's a little inconsistently lowercased here just because,
but we did see that verb before.
Where?
Where did we see this verb?
This was like the somewhat arcane message that was going, supposedly,
inside one of these envelopes when we said GET in all caps, / HTTP/1.1
and so forth.
So it seems that if you want, as the web developer,
to create an HTML form that has text boxes and maybe checkboxes and dropdown
menus and so forth that submits its information when the user clicks Enter
or a button to this address, and you want it to go inside of a virtual
envelope using that GET verb, you literally just say method=GET.
And then down here I seem to have two inputs, one of whose names
is q, the type of which is a text box, and the other of which
is a submit type, whatever that is, the value of which is search.
Now you would only know what these things mean by seeing them demoed
or looking at some online reference, but if we pull this up to see the results
we have a super simple--
and I'll zoom in--
very, very simple version of Google, right?
It doesn't even have the logo, but it does have, I claim, all of the functionality
because watch what happens if I type in, for instance, whoops, birds and click
Search.
Oh my god, I implemented Google with just like 15 lines of code,
but not really, right?
Like, I've implemented the front end of Google,
which-- I've got to start Googling these things in advance.
OK, uh, these are very sad stories.
[STUDENTS LAUGH AT MORBID NEWS HEADLINES]
DAVID J. MALAN: OK, so the point though is, the point-- look up, look up.
The point is that the URL is what I generated.
So using those HTML tags coupled with the human's cooperation
and actually clicking a button did I then
generate this URL, whisk the user away from the IDE
to google.com, where Google is handling the back end,
like all of the hard work, actually checking their database,
rendering the HTML, but I made the front end,
the user interface via which you can actually interact with Google's search
engine there.
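A sketch of a form along the lines of search.html. The action URL is inferred from the simplified Google address shown earlier, and method is written in lowercase here, which should be fine because the attribute's value is not case-sensitive.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>search</title>
    </head>
    <body>
        <!-- action: where the form's data is sent. method: which HTTP verb carries it. -->
        <form action="https://www.google.com/search" method="get">
            <!-- name="q" becomes the parameter name after the question mark in the URL. -->
            <input name="q" type="text"/>
            <!-- A submit input renders as a button labeled with its value. -->
            <input type="submit" value="Search"/>
        </form>
    </body>
</html>
```

Typing birds and clicking Search should send the browser to https://www.google.com/search?q=birds, which is the same shape of URL that was edited by hand in the address bar.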
And it boils down to just these basic heuristics,
but of course this is a pretty ugly search engine, right?
Black and white text box, a gray button and that's it.
Like, even Google, simple though it is, has a little bit of style and color
to it and things are centered and kind of spaced differently.
So there's an art to this ultimately and indeed
being a web designer in itself is a profession
and in fact, you'll find in industry that some people are
good at front end design.
Some people are bad at it.
I'm among the worse ones.
Like, my web pages look like that search box just a moment ago,
but some people really prefer the non-graphical stuff, the back-end,
the database stuff, and indeed one of the takeaways over the next few weeks
will be for you to figure out for yourselves if you like any of this
at all certainly, but also like what your preferences are.
And you might hear terms in industry these days
like front-end developer, back-end developer.
That just means do you work on what the user sees in their browser or app
or do you work on the back-end, the database stuff that's
really important and sometimes quite difficult,
but that the user doesn't interact with directly.
Or are you a full-stack developer, which means you just
do all of this, which all of you from CS50
are effectively, albeit after just one or so semesters of background.
So how do we start, though, to make things prettier?
Well it turns out that HTML, for the most part, is just a markup language.
It's for structuring a web page and semantically tagging things,
and by semantically tagging things I mean
like, hey browser, here's the head of my page
and that's a concept, semantically.
Hey browser, here's the body of my page, and that too
is a concept, semantically.
I didn't say anything about bold facing or font size or colors
or all this stuff that's important for a good user experience, or UX,
but that can be decoupled from HTML, and in fact,
one of the challenges as you learn HTML for the first time
is that, as you try to make your way through various online resources,
references will sometimes combine these ideas.
So, again, today we'll focus not just on correctness, getting things to work,
but design as well.
So here, for instance, is a super simple web
page for someone named John Harvard that has
a header and a main part and a footer, and header is distinct from head.
It's sort of poorly named here.
Head of the web page is just the tab bar and other such things up top,
but semantically you might have a page with like three parts.
Like the header, like the title on the body of the page itself,
like the main part where the actual contents
are, and then a footer like a copyright symbol or something like that.
So this might be a general division of a page,
but notice I've styled it a little differently.
Let me go ahead and open this up in a browser as I did just a moment ago
and go to, sorry, I'm going back through my entire internet history here.
Let's go ahead and open this up just as we did before at this URL
so that we can go ahead and open up CSS0.html.
Notice that, oh, this is already marginally better than the pages
we've looked at before if only because it's centered, which is a step forward
from everything just being left.
The first line is a little bigger.
The second line is kind of medium and the bottom line is the smallest.
So there's a little bit of style here, but not all that much.
So how did I actually do this?
Well take a look at the code here.
I have added, now, a style attribute to several of my tags.
So the header, the main and the footer really
aren't styled in any specific way.
They're just a way of telling the browser this
is the important stuff for the title, this
is the important stuff for the main part,
this is the important stuff for the footer,
but the stylization or aesthetics come from this yellow text
here, thanks to the IDE syntax highlighting it,
and notice this text follows a different pattern.
Up until now, we've been using angled brackets and words
and equals signs and quotes.
Now, inside of those quotes, we also have another pattern
when you're using this second of two languages today, CSS.
font-size: large is the stylization for this particular element's content.
text-align should be center.
These are two CSS properties.
CSS, cascading style sheets, and we'll see what that means in a moment,
but this is just how you configure the style of those elements,
and indeed that's why one is a little bigger and then a little smaller
and then even smaller because, notice, I did font-size: large, font-size: medium,
font-size: small.
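A sketch of this first version, with a style attribute on each element. The visible text inside the three elements is assumed for illustration.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>css0</title>
    </head>
    <body>
        <!-- Inside the quotes the syntax switches to CSS: property, colon, value, semicolon. -->
        <header style="font-size: large; text-align: center;">
            John Harvard
        </header>
        <main style="font-size: medium; text-align: center;">
            Welcome to my home page!
        </main>
        <footer style="font-size: small; text-align: center;">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```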
All right, but as we've often done, let's iteratively improve upon this.
Even if you've never seen HTML or CSS before,
there's some poor design manifest in this simple example.
What might you say seems wrong or seems a little copy paste-like?
Yeah.
AUDIENCE: They're all centered [INAUDIBLE].
DAVID J. MALAN: Yeah, they're all centered
and I literally like copied and pasted that CSS property, its key value
pair, its name and value, again and again
and again, but remember the hierarchy of HTML
and the DOM, Document Object Model, the tree we drew a little bit ago.
All of these elements-- header, main, and footer--
have a parent element called what?
AUDIENCE: Body.
DAVID J. MALAN: Yeah, body.
So one level higher, which is indented this way
or in the tree is higher up in that family tree-like drawing, all of these
are children of body.
So why don't I just move or factor out text-align: center
into the elements above it?
And herein lies the cascading of CSS.
Cascading style sheets means that if you have a property up here,
it will cascade down to all of the children and descendants below it
and it means another thing, too.
You can even override these properties somehow,
but we'll see that before long.
So if I go ahead now and open up CSS1.html,
notice that I did exactly that improvement.
The code's a little tighter now.
It's fewer characters, easier to maintain
because now if I want to change it to left or right or center,
I change it one place, not three.
And so this is kind of consistent with some of our design takeaways from C
and indeed, if I visit this page, CSS1.html, it looks the same,
but it's better design underneath the hood.
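The factored-out version might be sketched like this, again with assumed text content. The centering is stated once on body and inherited by the three children.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>css1</title>
    </head>
    <!-- text-align is set once on the parent and cascades down to header, main, and footer. -->
    <body style="text-align: center;">
        <header style="font-size: large;">
            John Harvard
        </header>
        <main style="font-size: medium;">
            Welcome to my home page!
        </main>
        <footer style="font-size: small;">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```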
But we can do a little better still.
If I open up CSS2.html, notice that I've done this.
I rather like this design now because it's even more succinct.
I'm not using the style attribute anymore.
I'm using a different attribute called class,
and class is kind of a way to define--
much like a struct in C lets you define your own data types, a class in CSS
allows you to define a name for a whole bunch of properties,
and so here I just said let's call this class large, medium, and small,
and I don't know what those mean, and frankly I
might be working with a friend who's much better at design
than I am so I'm going to let him or her actually define these meanings.
I'm just going to kind of tag things in this way semantically,
but if we scroll up in this file, you'll see that for now I have no such friend,
and so I implemented it myself, and here's, for the first time,
one other thing in the head of the page.
Up until now, we've just had the title, but it turns out
you can have a style tag.
Not just an attribute, but a style tag inside of which,
it's a little cryptic at first glance, but there's some pattern here, clearly.
You have all of those properties, but the new syntax here
is that if you want to define a word called centered,
you literally do a period and then the word centered.
If you want a word like large, you say .large.
So it's similar in spirit, though not quite the same as like typedef in C,
but you say .centered, .large, .medium, .small.
You use our old friends curly braces, which we will only see in CSS,
and this just defines one or more properties
to be associated with that new keyword.
And so, if we scroll down here to the bottom,
you'll see that I centered the body.
I made large the header, medium the main, and small the footer,
and the result is going to be exactly the same.
Very underwhelming, but again, marginally better
design because now we are just one step away from really improving this.
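A sketch of the class-based version. The class names follow the ones mentioned in lecture; the text content is assumed.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <style>
            /* A leading period defines a class: a name for a group of properties. */
            .centered {
                text-align: center;
            }

            .large {
                font-size: large;
            }

            .medium {
                font-size: medium;
            }

            .small {
                font-size: small;
            }
        </style>
        <title>css2</title>
    </head>
    <!-- The class attribute applies a named group of properties to an element. -->
    <body class="centered">
        <header class="large">
            John Harvard
        </header>
        <main class="medium">
            Welcome to my home page!
        </main>
        <footer class="small">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```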
If I do finally have that friend, it's not
going to be very easy to collaborate, ultimately,
if we're both working on the same file and moreover, it
seems unnecessary to introduce these semantics.
Like, why do I have to have tags like header and main and footer
and classes called large and medium and small and centered?
Like, why don't I leverage the names of these tags themselves?
And this is where HTML can be pretty powerful.
Notice I've simplified some of my CSS up top.
I've dropped the period, which was like typedef.
Like, give me something called large, give me something called medium.
Now I'm just saying literally a word, but those words are identical to what?
AUDIENCE: The tags.
DAVID J. MALAN: The tags themselves.
So preexisting tags, if I just mention them by name without a period,
which gives me a new name--
I just mention the body, the header, the main and footer,
and then, inside of the curly braces, define my properties,
now I can just stylize the actual tags as they exist in my page,
and this now looks like really readable, maintainable HTML.
There is no aesthetics associated with the markup language here,
but rather there's useful tag names that come with HTML--
you can't just make up your own tags.
They're in, sort of, the documentation, but now it's just much more readable,
and this might look different on my phone or your phone or your laptop,
but my friend who's good at stylization can figure out
how to style all of these things, and better yet, he or she doesn't even
need my file.
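Dropping the classes in favor of the tag names themselves might look like the following sketch, with the same assumed text content as before.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <style>
            /* No leading period: each selector matches an existing tag by name. */
            body {
                text-align: center;
            }

            header {
                font-size: large;
            }

            main {
                font-size: medium;
            }

            footer {
                font-size: small;
            }
        </style>
        <title>css3</title>
    </head>
    <!-- The markup below carries no styling at all, only structure. -->
    <body>
        <header>
            John Harvard
        </header>
        <main>
            Welcome to my home page!
        </main>
        <footer>
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```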
In the fifth example here, notice that's it for the page.
We've gotten rid of the big style tag and replaced it apparently with what?
AUDIENCE: Href, a link?
DAVID J. MALAN: Yeah, link href, which is a horrible, horrible name
because it's not like a link in the page and hypertext reference
was already used for a link in a page, but this is what we're stuck with.
This just says, hey browser, include this CSS file
that is elsewhere on the server.
The name of this file is arbitrarily CSS4.css
because this is our fifth example here-- zero index.
The relationship of this file to this page
is that it's a style sheet, which is just a list of aesthetics or properties
that should characterize its layout and indeed, if I open up CSS4.css,
I just copied and pasted everything in there,
but this is nice now in principle, even though we're just
creating work for ourselves today, because now I
can share this file with someone else.
He or she can work on it on their own.
Then we can merge our work together because my work's in the HTML file.
Their work's in the CSS file.
Better still, if we're making a whole website that has a dozen pages or 100
pages, consider this.
Just like in a C header file, I can include bitmap.h
in all sorts of programs.
Similarly can I include CSS4.css in all of my web pages.
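A sketch of the page once the styles live in a separate file. It assumes CSS4.css sits in the same directory as the page and holds the same four rules as the previous version, reproduced in a comment here for reference.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <!-- href names the file to include; rel says how it relates to this page. -->
        <link href="CSS4.css" rel="stylesheet"/>
        <title>css4</title>
    </head>
    <body>
        <header>
            John Harvard
        </header>
        <main>
            Welcome to my home page!
        </main>
        <footer>
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>

<!--
Assumed contents of CSS4.css (plain CSS, with no style tag around it):

    body   { text-align: center; }
    header { font-size: large; }
    main   { font-size: medium; }
    footer { font-size: small; }
-->
```

Any other page on the site can carry the same link line, so a change to CSS4.css should show up everywhere at once.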
So if I want to change the font size or the layout
or whatever in all of my website all at once, I change in one place,
not in every darn web page that might have been created by me or by someone
else, and so there's just that maintainability to it too,
but we can do even better than that because even the CSS we're
looking at here is only so good, and what's really nice
is if we go to bootstrap-- let Google tell me where to go.
We're safe.
OK, so Bootstrap is a library-- formerly from Twitter, now
a much larger community-- that's a whole bunch of CSS libraries.
So just as in C, we have code and functions that other people wrote.
So in the world of web development do we have
code that other people wrote and we use that for JavaScript and Python,
but even for aesthetics are there sites like Bootstrap
and other popular things that allow us to make our sites prettier
and build them more quickly without having to reinvent wheels.
So for instance, if I go down to let's say Content and I go to Typography
and skim through here, you'll indeed see like h1, h2 and h3,
but if you want things even bigger than that there's like a display heading.
There's this fancy version, which has a fancy display heading
with some faded secondary text.
So pretty marginal, but I don't have to figure out how to do that now myself.
If I want to actually have tables, I can do much prettier tables
than I did with my little old school phone pad a moment ago.
Like I can make things different colors.
I can shade the columns like this and in fact, you can do even fancier things.
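
A rough sketch of what using those pieces might look like follows. It
assumes Bootstrap 4 loaded from a public CDN; the exact URL and version are
assumptions, so check getbootstrap.com for the current ones. The class names
(display-4, text-muted, table, table-striped) are Bootstrap 4's, and the
table contents are placeholders.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <!-- Assumed CDN URL and version; see Bootstrap's documentation -->
        <link href="https://cdn.jsdelivr.net/npm/bootstrap@4.6.2/dist/css/bootstrap.min.css" rel="stylesheet">
        <title>typography</title>
    </head>
    <body>
        <!-- A heading bigger than a plain h1 -->
        <h1 class="display-4">Display heading</h1>

        <!-- A heading with faded secondary text -->
        <h3>
            Fancy display heading
            <small class="text-muted">With faded secondary text</small>
        </h3>

        <!-- A table with alternating shaded rows -->
        <table class="table table-striped">
            <thead>
                <tr><th>#</th><th>Name</th></tr>
            </thead>
            <tbody>
                <tr><td>1</td><td>First row</td></tr>
                <tr><td>2</td><td>Second row</td></tr>
                <tr><td>3</td><td>Third row</td></tr>
            </tbody>
        </table>
    </body>
</html>
```
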
If I go ahead and open up a web page and go
to our big board for speller.cs50.net, you'll
see that this is a pretty good looking table as tables go.
Certainly much better than the one before, but that's
because we're using the Bootstrap library,
and even more compelling than the aesthetics is this.
Suppose that you visit speller.cs50.net on your phone.
It would start to get pretty ugly once your window gets smaller,
but notice stuff can just disappear magically
when you're on a mobile device or, in this case,
when simulating one by using just a smaller browser window.
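
One way this kind of disappearing act can be done in plain CSS is with a
media query, sketched below. This is an illustration only, not necessarily
how speller.cs50.net or Bootstrap does it; the class name optional, the
575px breakpoint, and the column names are all assumptions.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <!-- Ask mobile browsers to use the device's real width -->
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <style>
            /* When the window is 575px wide or narrower, hide anything
               marked with the (hypothetical) class "optional" */
            @media (max-width: 575px) {
                .optional {
                    display: none;
                }
            }
        </style>
        <title>responsive</title>
    </head>
    <body>
        <table>
            <tr>
                <th>Rank</th>
                <th>Name</th>
                <th class="optional">Details</th>
            </tr>
            <tr>
                <td>1</td>
                <td>Placeholder</td>
                <td class="optional">Only shown on wider windows</td>
            </tr>
        </table>
    </body>
</html>
```

Dragging the browser window narrower than the breakpoint makes the third
column vanish, with no change to the HTML itself.
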
So using CSS and the aesthetic power that it provides,
we can also dynamically change our files to just render differently
on different devices, and then lastly, let me open up, for instance,
this under Components.
This is where the really juicy stuff is.
If you want fancy alerts to yell at the user or say everything is OK,
you get nice little colored boxes like this.
The forms are much prettier.
I mean, already this looks much more like the web you and I use
and not the mess of a form that I created a moment ago
and long story short, just like in C it's
pretty easy to include these things in your own site, so can I do this.
Let me go ahead and open up form0.html, and this is literally
an approximation of the very first web application I made,
even before web application was a phrase, in 1997.
I had taken CS50 and CS51.
I hadn't learned web stuff at the time.
I just kind of taught it to myself and learned
from some friends and the first thing I did
was build an interactive website via which first years could register
for intramural sports because literally that year in 1996 it was paper-based.
You'd walk across the yard, open up Wigglesworth, one of the dorms,
slide a piece of paper-- old school-- under the door
and you were registered for a sport.
We could do better even in 1997, and so we did it with the web,
and so this form0 back in the day looked a little something ugly like this,
but there's a text box where you could type in your name
and then there's the dorm where you could select Matthews.
So I could actually do David Malan and Matthews and then click Register,
but we don't yet have the ability to make back ends.
So this form goes nowhere for today, but you at least
get these kinds of aesthetics, which are kind of 1997 aesthetics, literally.
But if we go into this other example, form1.html,
it looks a good bit prettier now.
It's maybe a little big in retrospect, looking at the display font,
but all I've done is now use this Bootstrap library, and notice,
it's a little hard to see on the projector here,
but everything's kind of like nicely outlined.
There's like Mark Zuckerberg sample text there which
we can override by actually typing in our own email address here.
We have a prettier looking box, a prettier looking button, and that's
just because if we open up, as down here,
form1.html, notice that in addition to my HTML
down below and in addition to a couple of other things
that I've added to make things more mobile-friendly in particular,
I just added this.
I read the documentation on getbootstrap.com
and I went ahead and added Bootstrap's library to my own code
in order to have access to its actual features,
and then down here, it's a little overwhelming at first glance,
but I just followed the directions.
There's something called div in HTML for a division of the page.
It means give me this invisible rectangular region.
The class I associated with it is called form-group.
I didn't make this word up.
This comes from Bootstrap.
I just did what they told me to do.
I then have a label, which makes things more accessible
and you can click in different places.
I have another class here but long story short,
I just read the documentation because I know what tags are,
I know what attributes are.
I know a little bit of CSS now and I know how HTTP works,
and so really I have enough building blocks in order to work on this myself.
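
A minimal sketch of a form along these lines is below. It is not the
lecture's actual form1.html: the CDN URL and version, the field names, and
the placeholder text are assumptions, while form-group, form-control, and
btn btn-primary are class names from Bootstrap 4's documentation.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <!-- Makes the page scale sensibly on mobile devices -->
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <!-- Assumed CDN URL and version; see getbootstrap.com -->
        <link href="https://cdn.jsdelivr.net/npm/bootstrap@4.6.2/dist/css/bootstrap.min.css" rel="stylesheet">
        <title>form1</title>
    </head>
    <body>
        <div class="container">
            <!-- No action attribute: with no back end yet, the form goes nowhere -->
            <form>
                <!-- Each div is an invisible rectangular region grouping a field -->
                <div class="form-group">
                    <!-- for matches the input's id, so clicking the label focuses the box -->
                    <label for="name">Name</label>
                    <input class="form-control" id="name" name="name" placeholder="Your name" type="text">
                </div>
                <div class="form-group">
                    <label for="dorm">Dorm</label>
                    <select class="form-control" id="dorm" name="dorm">
                        <option value="Matthews">Matthews</option>
                        <option value="Wigglesworth">Wigglesworth</option>
                    </select>
                </div>
                <button class="btn btn-primary" type="submit">Register</button>
            </form>
        </div>
    </body>
</html>
```

Removing the Bootstrap link tag leaves the same form working but with the
browser's default, 1997-looking styling.
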
So that then is CSS and there's one last detail I thought I'd show us here.
In all of these John Harvard examples, as in just a moment ago,
we had something like this at the very bottom.
This &#169;: ampersand, #169, semicolon.
What was that rendering as, if you notice, in the web page?
AUDIENCE: Copyright.
DAVID J. MALAN: Yeah, the copyright symbol.
There is, on my US keyboard, no copyright symbol.
So you need kind of a pattern of characters
with which to represent those in HTML.
So just like we have \n and other special escape characters in C,
you have what are called HTML entities in HTML that you would only know from
reading the documentation, but that's the copyright symbol,
but I thought it was rather timely to point that out because just yesterday
or this morning, Apple announced that with the very new version of iOS that
you can soon download, they added even more damn Emojis to the Emoji character
set.
So these are certainly in vogue these days
and not only do we see, now, a way to represent special characters that you
couldn't otherwise type using HTML, it turns out all this time
that Emojis are actually just characters, chars,
but they're not 8 bits.
Recall that C as we've been using it uses
ASCII, which uses only 7 or 8 bits total and Emojis, my god.
There's so many of them right now and we need more than 8 bits
to represent them, and thus was born something called Unicode.
Well, that is not why Unicode was invented,
but this is what Unicode is now being used for because these emojis are
simply like ASCII characters but multiple bytes, up to four bytes
in an encoding like UTF-8, and in fact, if you go on unicode.org,
you can see that the number in hex 1F600 represents the grinning face,
which happens to be implemented differently by different companies
on different devices, but if in closing here,
I open up this same file and I change this to 1F600 in hex, 1-F-6-0-0, save,
and I go back to my browser and I go back to CSS0,
now we have a very happy web page for you.
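
As a small sketch of the two notations, an entity can give the character's
number in decimal (&#169;) or, with an x after the #, in hexadecimal
(&#x1F600;). The surrounding page here is an assumption for illustration,
not the lecture's actual file.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>entities</title>
    </head>
    <body>
        <!-- Decimal: 169 is the copyright symbol -->
        <p>Copyright &#169; John Harvard</p>

        <!-- Hexadecimal: A9 in hex is 169 in decimal, so this is the same symbol -->
        <p>Copyright &#xA9; John Harvard</p>

        <!-- Hexadecimal 1F600 is the grinning face; how it looks depends on the device -->
        <p>&#x1F600;</p>

        <!-- The same emoji written in decimal: 1F600 in hex is 128512 -->
        <p>&#128512;</p>
    </body>
</html>
```
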
So that's it for today.
I'll stick around for questions and we'll see you next time.