Placeholder Image

Subtitles section Play video

  • [MUSIC PLAYING]

  • SPEAKER 1: Cryptography.

  • What is it, and why is it important?

  • We're going to answer those two questions in exactly that order.

  • Let's start with what cryptography is.

  • It's the art and science of obscuring, and ideally protecting, information.

  • Now it's an art and a science because there's math involved with it.

  • It's pretty straightforward to manipulate characters in some way

  • by adding some constant number to them or to change them

  • in some systematic manner.

  • But it's an art, because doing so in a way to defend against potential attacks

  • is not as easy as it might first appear.

  • There's a lot of guesswork and calculation

  • that needs to go into play to find a really strong cipher.

  • Cryptography gives us the opportunity to have

  • a basic level of security against an adversary who might

  • do bad things with the information.

  • We usually contrast, in cipher information,

  • with information that is presented in the clear, which

  • is to say there's no protection surrounding it at all.

  • And it's generally considered better to protect information using cryptography

  • than to have information just freely available out there.

  • Now a cipher, we're going to start by talking about cryptography

  • sort of through history.

  • We'll lead up to more modern forms of cryptography,

  • which are derived from more ancient forms of cryptography.

  • But a cipher is one of the most fundamental forms of cryptography.

  • And ciphers are algorithms.

  • And recall that an algorithm is just a step-by-step set of instructions

  • that we use to complete a task.

  • And in case, the task is to obscure or encipher information.

  • And ciphers can also be used in reverse to unobscure, or decipher,

  • that same information that was previously encoded or enciphered.

  • Now there are many different ciphers out there

  • that have varying levels of security potential.

  • Some of the more ancient ciphers that we're going to start with

  • should be [INAUDIBLE] be considered to have no security potential at all

  • considering how easy they are to crack.

  • But again, this leads into the more modern approach to cryptography,

  • which is much more secure than some of these basic ones.

  • And now let's start by imagining that we have possession of this device.

  • Now if you're looking at this device and it seems somewhat familiar to you,

  • it may be because you've recently seen the movie A Christmas Story,

  • where Ralphie, the character there, obtains

  • one of these, which is a little orphan Annie's secret society decoder pin.

  • And this decoder pin has a set of numbers going sequentially one

  • through 26 around the inner edge, and a set

  • of letters, which is not presented in any particular order,

  • around the outer edge.

  • And what would happen is the radio announcer would provide,

  • set your pins to some combination.

  • So line up one number with one letter.

  • And then it would read off some secret message

  • that, ostensibly, only individuals who possessed this pin,

  • or many of the duplicate versions of this pin that were distributed

  • to children around the country, could then decipher

  • by taking the numbers that were given over the radio

  • and transforming them back into letters so that it makes sense.

  • So if you can, if you zoom in on this image,

  • it might be a little difficult to see, but you

  • can see that the 3 corresponds to the letter L, and the 4 corresponds to an M

  • based on this particular setting of this decoder pin.

  • So this is one potential, what we would call a substitution cipher,

  • where we're changing, we're substituting a letter in this case for a number,

  • and that number will henceforth represent that letter

  • for the rest of this message.

  • But what is the problem with this cipher?

  • Or more generally, when we think about issues in computer science

  • where we have adversaries who are trying to penetrate some system,

  • or break a code, or break in, or hack into anything,

  • hack your password, we sometimes frame this in terms of asking the question,

  • what is the attack vector?

  • Where is the vulnerability that is potentially

  • part of this particular cipher?

  • And in this case, it's that anybody who has access to this pin

  • is able to break any cipher that is made with this pin.

  • And again, this pin was distributed pretty extensively in 1930s and 40s

  • to children who listened to this very popular radio program.

  • So these pins were in the hands of many people.

  • And anybody who had access to the pin would

  • be able to understand the message.

  • And so that is, how we might frame this attack vector,

  • is the key, in this case, the pin, which we will call a key for this purpose,

  • is just very prevalent.

  • It's pretty well known how to use this key and manipulate this key.

  • A lot of people have access to that key.

  • But that's just one example of a substitution cipher.

  • We have many different examples of substitution ciphers that we could use.

  • Let's just take another very simple, straightforward one,

  • which is imagine we have all of the letters of the alphabet

  • and we're just going to assign the ordinal position of that letter

  • as its cipher value.

  • So with the secret society pin, there was this sort of random element

  • to it, right?

  • The letters were being skipped.

  • There wasn't a rhyme or reason to them, although the numbers were sequential.

  • Here let's just line up both.

  • Let's use sequential letters and map them to their sequential numbers.

  • So A becomes 1, B becomes 2, and so on.

  • Both of these things are increasing linearly.

  • Now you may recall that as computer scientists,

  • we ordinarily start counting from zero rather than counting from one.

  • I'm counting from one here because this mapping of A to 1 and Z to 26

  • is much more familiar to us intuitively as humans,

  • and I want to keep us grounded in this discussion of cryptography right now.

  • But ordinarily, you might actually instead see this as 0 to 25, 0 being A,

  • through Z being 25 as opposed to 1 through 26.

  • But this cipher would work exactly the same

  • and has roughly the same security potential

  • as Annie's secret society cipher does.

  • And we can actually make this a little bit better because we are consistently

  • increasing the letters, A through Z, and consistently increasing

  • the numbers, 1 through 26.

  • We could also, instead of just doing this direct mapping,

  • we could rotate around.

  • We could start the 1 somewhere else as opposed to being A.

  • And now instead of having just one cipher where A maps to 1, B maps to 2,

  • we have a variety of different ciphers, depending

  • on where we decide we want to have our starting point.

  • So for example, we might instead add two to every number.

  • So instead of going from 1 to 26, we go from 3 to 28.

  • Now think about it.

  • If you're trying to break this cipher and you see patterns

  • like this with all these numbers in them, what might jump out at you?

  • Well, if you're used to seeing ciphers that are 1 through 26, for example,

  • something where you don't see any 1s or 2s

  • and suddenly you're seeing 27s and 28s potentially in the message that might

  • be long enough to have, in this case, Ys or Zs in it

  • might seem to you that this is slightly off.

  • Like this cipher must be shifted in some way.

  • Instead of being this straightforward line,

  • there's some modification that's been made to it.

  • That's kind of a tip off if you're trying to defend

  • against somebody figuring that out.

  • And so instead of going 27, 28 at the end,

  • we might instead wrap around the alphabet.

  • Once we have exhausted the 26 possible values that we started with,

  • the 26 letters of the alphabet, we might instead, once we have X is 26,

  • say, well, instead of Y being 27, Y is 1 and Z is 2.

  • And this is not a massive improvement on the security of this cipher.

  • Like I said, it's still quite fragile and quite easy to break.

  • But it doesn't give quite as much of a clue to a potential adversary

  • as to how to crack it, how to decipher the message.

  • And this can be done for any different value

  • to obtain any number of different ciphers.

  • Instead of going forward by two positions,

  • we could add 20 to every letter's value, again,

  • wrapping around the alphabet when we exhaust,

  • when we get to 26, instead of having 27, 28, we would just reset at 1

  • and continue on.

  • But we can also add 26 to it.

  • But that doesn't look very different than what we had before.

  • And that's where this cipher's vulnerability comes into play.

  • There's only 26 possible ways to rotate the alphabet

  • while keeping the order of the letters preserved, right?

  • Unless we start skipping A, D, G, and then,

  • you know, rearranging the other letters in some other way.

  • If we want to keep everything straightforward in a line,

  • again, wrapping around 26 when necessary, there's

  • only 26 ways to do it.

  • That is to say that shifting the alphabet forward by 26

  • is exactly the same as shifting the alphabet forward by 0.

  • And so that's our limitation.

  • We have a very small number of, again, this word keys that can

  • be used to decipher using this cipher.

  • Now this is an example of something called a rotational cipher,

  • and it's actually a rather famous rotational cipher

  • known as the Caesar Cipher.

  • It's attributed to Julius Caesar and was apparently used

  • more than two millennia ago for him to encode messages to his troops

  • on the line.

  • And at the time, this was revolutionary.

  • And generally what you're going to find with cryptography

  • is there's just this pattern of breaking the mold and doing something new

  • and trying to stay one step ahead.

  • And oftentimes, other people will then catch up.

  • And this cipher, which was once, you know,

  • lauded as being a wonderful cipher, is no longer as strong

  • as it once was thought to be.

  • And so we keep having to advance and improve and get ahead

  • of it for whatever kind of adversary that is, whether that's

  • a potential enemy on the battle line, as might have been the case with Julius

  • Caesar, or whether that's a hacker who's trying to break into your system

  • as might be the case today.

  • And fortunately, again, we're not using Caesar Cipher today

  • to uncipher any of our information.

  • We're using much more modern techniques.

  • But these modern techniques evolved from seeing

  • codes being created, ciphers being created and broken,

  • and then having to be created anew to try

  • and defend against new vulnerabilities that have been exposed.

  • So like I said previously, very easy to decipher or to crack the Caesar Cipher,

  • but at the time, very, very difficult.

  • The limitation, again, limited number of keys.

  • There's only 26 ways to rotate the alphabet for it to make sense.

  • In the English alphabet, of course.

  • If you're using a different alphabet, you're

  • number of keys might be different if you're

  • using the same rotational approach.

  • But the fundamental limitation is you are

  • confined by how many letters are in your alphabet

  • that you're using to encipher information.

  • So let's take things one step further.

  • What is an improvement that we might be able to make to Caesar?

  • That would lead us to this idea potentially of the Vigenere Cipher.

  • So Caesar had this limitation of there's one key

  • and there's only 26 possible values for that key.

  • What Vigenere Cipher does is it, instead of using a single key,

  • uses multiple keys.

  • Instead of picking a number to shift by, we're

  • instead going to define a keyword.

  • And we're going to use the letters of that keyword in sequence as we

  • go to change what our key is at any given

  • time, such that our enciphered message, instead of being enciphered using one

  • key, might use three keys or five keys or 10 keys,

  • depending on the length of the keyword that we use,

  • if that keyword is three or five or 10 letters long.

  • So this keyword becomes the interesting twist

  • that made Caesar much more challenging for an adversary

  • to crack by using different keys.

  • Now let's walk through an example of how the Vigenere Cipher works

  • because I think it makes more sense to see this visually rather than just

  • discussing it verbally.

  • So what we want to do here is encrypt the message HELLO

  • using the keyword LAW.

  • So here our message HELLO is what also might be called plain text.

  • It is in the clear.

  • It is not enciphered.

  • It is not hidden against any adversary.

  • And our key is LAW.

  • All right, so let's take a look at how we might do this.

  • So it oftentimes helps, especially when trying to encipher or decipher

  • using the Vigenere Cipher, to consider all

  • of the inputs that go into determining the final outputted character.

  • So we're going to take a look at plain text,

  • and we're going to convert it, just like we did

  • with Caesar, to its ordinal position.

  • We're going to see where in the alphabet that is.

  • Is it the first letter?

  • Then it's 1.

  • If it's the last letter, it's 26.

  • And so on.

  • We're going to do the exact same thing with each letter of our keyword.

  • So we're going to take a look at the keyword,

  • figure out what that letter's numerical correspondence would be.

  • We're going to then add those two things together.

  • If we go over 26, just as we did with the Caesar Cipher,

  • we're going to wrap back around such that we're confining ourselves

  • to that range of 1 through 26.

  • And then we're going to take that number and transform it into a letter.

  • So for example, if the result there is 2, we're going to change that into a B.

  • And the reason for that is that B is the second letter of the alphabet.

  • So let's walk through this with HELLO as our plain text and LAW as our key.

  • So the first letter of our plain text is H, and the ordinal position

  • of that H is 8.

  • It is the eighth letter of the alphabet.

  • We do the same thing with the first L for LAW, the first letter of LAW.

  • L is 12, it's the 12th letter of the alphabet.

  • So our next step is to add those two values, eight and 12 together.

  • We get 20.

  • We don't need to wrap around.

  • We didn't go over 26, so we're still OK.

  • And the 20th letter of the alphabet is T. So the first step of this

  • is enciphering process with HELLO, using the Vigenere Cipher, using the key LAW,

  • is to turn the H into a T.

  • So we can do this again, we can take a look

  • at the E, the second letter of our plain text.

  • We use the second letter of our keyword now.

  • So we're not using the same key.

  • We're not using 12 over and over and over.

  • We're using a different key.

  • We're now using the A, the second letter of our keyword,

  • whose ordinal position is 1.

  • So 5 plus 1 is 6, and that results in F.

  • Next, we use the first L of HELLO, and the W of LAW.

  • So L is 12, W is the 23rd letter of the alphabet, we add those together,

  • we're at 35.

  • 35 is not a legal value in terms of this cipher.

  • We are confined to 1 through 26.

  • And so we just subtract 26 and we get down to 9, and now we have I.

  • So now what do we do?

  • We've exhausted our keyword, but we still

  • have plain text that we need to encipher.

  • Well, as you might expect, the logical thing to do

  • is just go back to the beginning of the keyword and continue on.

  • And so we will.

  • So we'll use the L, the second L of our plain text, and the first L--

  • because we've now exhausted all of those letters,

  • we have to go back to the beginning--

  • the L for LAW.

  • 12 plus 12 is 24.

  • 24, the 24th letter of the alphabet is X.

  • And we do that finally as well for the O, advancing it one position, because

  • of the A in LAW, to 16, and that is P.

  • So ultimately, HELLO in this case becomes

  • this random set of characters, TFIXP.

  • And some advantages might also immediately jump out at you.

  • With the Caesar Cipher, anytime we changed a letter,

  • it always was that same letter every time we

  • saw it in the enciphered message.

  • So if we had a B and we were advancing everything by two characters,

  • every B in the original message would always

  • be a D because D comes two letters after B.

  • So again, if our Caesar Cipher key is two, every time we see a B,

  • it becomes a D, every time we have an A, it becomes a C, always.

  • Here with the Vigenere Cipher, because we have different keys

  • and we're rotating these keys differently,

  • depending on which letter of the keyword we are

  • and which letter of the plain text we are,

  • those two Ls are not the same, right?

  • Instead of H-E-L-L-O, we don't have some mapping.

  • Those two Ls are I and X. They are not the same character.

  • And so already we're seeing a bit more security here

  • because there's not this potential to guess.

  • Caesar is also much more secure when you consider

  • how many keys are available to you.

  • With the Caesar Cipher we had 26 keys available to us.

  • With the Vigenere Cipher we have 26 to the n keys,

  • where n is the length of our keyword.

  • So for example, if we're using a two letter long keyword,

  • for example, AA or AB or all the way up, that leaves us with 26 squared,

  • or 676 possibilities.

  • Now if we extend to three letter keywords or four letter keywords,

  • we're getting even more and more possibilities.

  • And as we start to increase the number of possibilities,

  • we start to really increase the difficulty for some adversary

  • to figure out what the key is.

  • And that's really the goal of cryptography, right?

  • We want to be able to protect information

  • and we want to defend that information from being determined by other people.

  • So the more work we put into making more challenging keys, the more likely

  • we are to be successful in our attempt to encipher information.

  • So again, Vigenere much more of a secure cipher.

  • It's still not secure and it's definitely

  • not a cipher that is used today.

  • There are computer programs that are capable of figuring out how to decipher

  • using the Vigenere Cipher pretty well.

  • But it's more secure than Caesar for sure

  • because of its changing alphabets and its much larger number of keys.

  • Let's go back to this decoder pin and think about another potential problem

  • that we have.

  • Now assume that your adversary is actually

  • not a member of Annie's secret society.

  • They don't have this pin.

  • So that's already a step up.

  • We previously had assumed that anybody who had the pin could crack it,

  • and that's still true.

  • But let's assume your adversary, lucky you, doesn't have this pin.

  • Is there still a way that they would be able to crack the code without the pin?

  • Think about it for a second.

  • Think about what our characteristics of the English language

  • are that might suggest people figure out what this cipher is.

  • Think about some unique features of the English language,

  • which is one letter words, like I and A, which might appear in the message.

  • If you see a single letter word in a message,

  • you're probably going to guess that it's either the letter I,

  • and every time I see that character or that number I'm going to assume it's

  • an I, or you're going to assume it's an A

  • and you're going to try and plug in an A everywhere.

  • And some trial and error might reveal some patterns that emerge.

  • And there is a very prevalent pattern in the English language,

  • which is that letters appear with a pretty regular frequency.

  • Given any arbitrary text in the English language,

  • it's pretty likely that the distribution of letters within that text

  • is going to follow this pattern roughly 13% of the time, give or take.

  • Any arbitrary letter selected from a text

  • is going to be the letter E. And only 1/10 of 1% of the time will it be a Z.

  • And only 2/10 might it be a J. So there are some letters that

  • appear very frequently and there are other letters that

  • appear very infrequently.

  • And that is still a problem in this generic substitution cipher,

  • even with the letters being scrambled, which seems at first

  • blush to perhaps be much more secure than one where

  • the letters are increasing sequentially and the numbers

  • are increasing sequentially.

  • Even this scattershot mapping of letters to numbers,

  • as long as we're still confined to these two domains

  • where we have A through Z and 1 through 26

  • and there's always a mapping between them,

  • whether they're ordered or not ordered, is still

  • a problem, in the English language anyway, because of frequency analysis.

  • These are actually very common puzzles.

  • Humans might find it kind of tedious to try and solve these puzzles,

  • but otherwise, this is well known as a cryptogram.

  • You may, if you are the puzzling type, this type of puzzle

  • is called a cryptogram.

  • And this pattern is definitely something that is across all messages

  • that appear in the English language.

  • There are plenty of other ciphers that appear, that are used,

  • that are more secure than any of these what

  • we might call one-to-one ciphers, mapping

  • a single character to a different character or to a number.

  • There are some ciphers that substitute pairs or triples of characters

  • at a time.

  • And these ciphers, again, form the basis for what

  • eventually becomes more modern cryptography, which

  • we're getting to in just a moment.

  • There are also transposition ciphers, where

  • instead of substituting one character for another,

  • we simply use an algorithm to rearrange all the letters in some systematic way.

  • And the defect there is that all the letters of our original plain text

  • message are still there and all we need to do is unscramble them.

  • And because there's an algorithm that was

  • used to scramble them in the first place,

  • there's got to be a way to undo it as well.

  • With a little bit of trial and error, we can probably sort that out.

  • Finally, the most egregious issue with these classical ciphers

  • is, how do you distribute the key?

  • How do you tell someone who you want to share information with?

  • How do you tell your ally what the key is for the cipher

  • that you are going to use?

  • You can't encrypt it because if you encrypt the key,

  • how will they know what the real key is?

  • If you say, if you send them a message and they

  • don't know how to interpret it, or they see it and they interpret it

  • as something else, that's not going to be helpful to you.

  • You want them to see the key in the plain text.

  • You want them to see the key in the clear, rather.

  • You want them to just have it.

  • You don't want to encrypt that as you hand it to them.

  • That doesn't do them any good.

  • But if you're giving the key to your ally

  • and your adversary is within earshot, or they

  • have access to that same piece of paper because your ally carelessly throws it

  • away and they can just pick it up, now all

  • of a sudden all of your messages using basic ciphers are fairly insecure.

  • But let's take a step forward in modern cryptography.

  • Perhaps you've seen a screen that looks like this at some point

  • when you're trying to log in to some system.

  • Enter your email and we'll email you a link to change your password.

  • Well, why don't you just email me my password?

  • Like you're going to give me a link to change it,

  • you must know it if I use my credentials to log in

  • to your service any given day.

  • But OK, I guess, sure.

  • The reason for this is actually a reason of security.

  • So let's distinguish ciphers, which we've been talking about, from hashes.

  • So one of the most critical distinctions is

  • that ciphers are generally reversible.

  • You can undo what you did.

  • That's the whole reason why it's important to share with your allies

  • the key.

  • But hashes are generally not reversible.

  • Or certainly, they're not supposed to be reversible.

  • And so it turns out, and we'll learn about this a little bit

  • later, when you log in to some service, if that service

  • is doing a good job of protecting your data,

  • the reason they can't just send you your password is because they actually

  • don't know your password.

  • And that might seem strange because clearly, there

  • must be something-- if I type in my password then I get logged in.

  • But a good service is one that does not store your password in the database.

  • That's probably a good thing if you think about it.

  • In case there was ever a data breach, you

  • wouldn't want your password to be in their database.

  • Instead what they do is they store a hash of your password in the database.

  • And then when you provide your password to them,

  • they run that hash through the same things,

  • called a hash function, which is just a generic idea for a function that

  • takes any arbitrarily large amount of data and maps it to some other range

  • or some other set of values.

  • Now that might be a arbitrarily long string of information.

  • It might be some fixed string where if I run my password through this,

  • I'm going to get back something that is always 20 characters long.

  • But it looks nothing like my original password.

  • I've just made some weird manipulations to it.

  • And that's what happens in log-in systems more generally

  • is you will log in to some service, you'll

  • type in your password, when that information is then submitted

  • to the organization to check your log-in credentials,

  • they will run your password through that same hash function again.

  • And if that value matches what they have in their database for you,

  • that is how they know that you have provided the correct credentials.

  • They're mapping-- they're matching some mapping of your password to the one

  • that they have stored, but they're not actually checking your actual password.

  • And that should probably give you some sense of security.

  • And if you ever use a service where you end up having to click on that link

  • and they actually send you your password,

  • you probably don't want to use that service anymore

  • because they're not taking strong enough precautions to protect your data.

  • So as I said, once we have a password stored in the database,

  • it is actually stored as a hash rather than as the password itself.

  • The service should not be able to tell you what your password really is.

  • So this idea of a hash function-- what is it?

  • Well, as I said, it's something that takes any arbitrary data--

  • and eventually we'll get into hashing things like files and not just words

  • or strings, but for now let's keep it to strings, strings

  • being a sequence of characters or letters, like a word

  • or a phrase or a sentence--

  • and mapping it to some other range.

  • So we'll start out by just mapping a string, a set of letters, to a number.

  • But it could be to a different string, a string that's

  • always 10 characters long, and so on.

  • So there are some properties that good hash functions have.

  • Let's take a look at what some of these are.

  • So they should use only the data being hashed.

  • There shouldn't be anything else that comes into play.

  • They shouldn't be bringing in any outside information.

  • It should rely exclusively on whatever data is

  • being passed in to the hash function.

  • They should also use all of the data being hashed.

  • It becomes a bit less effective if every time I provide a word or a string

  • to my hash function, I'm only using the first letter of that string,

  • such that my hash function for every word

  • or every string I provide that starts with A

  • is going to return the same value.

  • That's not terribly useful to me.

  • I want to get a better distribution of values.

  • Your hash function should be deterministic.

  • And when we say deterministic, we mean no random elements to it.

  • Oftentimes we think that random numbers are nice to jumble things up.

  • But the problem is we want our hash function to always output

  • the same value for the same inputs.

  • So if I give you my password and hash it and I get

  • some output, every time I provide my password

  • and run it through that same hash function,

  • I want to get the same output every time.

  • And that's what sites rely on when they're using hashed passwords as part

  • of the credentialing check.

  • They're relying on the fact that they will always get

  • the same output given the same input.

  • So that's a requirement of a hash function.

  • Hash functions should uniformly distribute data.

  • So oftentimes you're mapping these strings,

  • let's say, to some set of values.

  • Those could be numbers, again, those could be strings.

  • You want to spread those out evenly, ideally,

  • across all of the possible values that you have.

  • You don't want everything to hash to 15 if your range is 0 to 100.

  • You'd ideally like everything to be spread out such

  • that there's an equal number of 0s, 1s, 99s,

  • and so on, as we talked about a little bit when we discussed hash tables.

  • Finally, we also want to be able to generate very different hash codes,

  • very different values for very similar data.

  • For example, LAW and LAWS should hash two very different values.

  • That would be ideal if a tiny bit of variation

  • created a really dramatic ripple effect.

  • And creating this really dramatic ripple effect

  • is pretty key when we're talking about cryptographic hash functions, which

  • we'll get to in a second, which form the basis of almost

  • all modern cryptography, which form the basis of everything

  • that we do that we rely on when we think of security in the computational field,

  • it's almost always relying on these hash functions being really, really

  • good at making small changes have very dramatic ripple effects

  • in the hash code or the hash value, the data that comes out

  • of the hash function.

  • So after all this talk about good hash functions,

  • let's take a look at a pretty bad hash function.

  • And we'll talk about why.

  • We'll talk about one of its virtues, but some of its potential problems as well.

  • So instead, let's add up all of the ordinal positions

  • of all the letters in the hash string.

  • So this ordinal position idea is exactly the same

  • as we had a moment ago when we were talking about Caesar and Vigenere.

  • So A is 1, B is 2, and so on.

  • So for example, for a word like STAR, if we want to add up the ordinal positions

  • of all of the letters in that word, we have S-T-A-R.

  • That's 19 plus 20 plus 1 plus 18.

  • So if you do that math quickly, that ends up being 58.

  • So what is a good thing about this hash function?

  • Well, it's not reversible.

  • If I get a 58, I don't necessarily know that the input that I had there

  • was STAR.

  • It could have been any one of a whole variety of things.

  • It could have been ARTS or RATS or SWAP or PAWS

  • or WASP or MULL or this whole random set of 29 Bs in a row.

  • All of these things, when run through this really terrible hash

  • function that I've defined here, all add up to 58

  • when I follow the rules of this algorithm.

  • So I never know what my input was given my output.

  • That is a good thing.

  • That is what a hash function should do.

  • Hash functions, unlike ciphers, should not be reversible.

  • But the problem that I have here is that I have a lot of collisions, right?

  • There are a lot of different things that map to 58.

  • And when we talked about collisions a little bit previously,

  • we were talking about them in the context of a hash table.

  • And collisions were OK in that context.

  • We were just clustering things together.

  • If they all happened to have the same hash value,

  • we'll just put them in the same bucket.

  • When we're talking about cryptography though,

  • when we start to get into relying on cryptography to keep our data secure,

  • we can't have collisions at all.

  • In fact, pretty much we rely on the fact that it is so mathematically

  • unlikely, neigh impossible to have a collision in order for these things

  • to work.

  • And so collisions, when we're talking about cryptographic hash functions,

  • are definitely not a good thing.

  • So to recap, to check that a user gave us the correct password, if we're

  • storing a hash of the password in the database versus just storing

  • the plain text password in the database, which hopefully no one is storing

  • a plain text password in the database, we

  • run the actual password, the real password through the hash function.

  • We get a hash value as an output, some string or some number

  • or what have you as the output.

  • And if we get a match, odds are they entered the right password.

  • Now I'm saying odds are because we can't be 100% sure.

  • And we can never be 100% sure.

  • We can be really, really, really sure, but there's always

  • a chance of a collision.

  • Even with the best designed hash functions, even

  • with the best designed cryptographic hash functions,

  • there's always a chance of a collision.

  • But ideally, that chance is quite infinitesimal.

  • Very, very, very, very, very, very unlikely.

  • So odds are if we get this hash, comes out of this hash function,

  • it's quite likely, like 99.9% plus likely

  • that they entered the correct password, this is, in fact,

  • the user whose credentials are being verified, and we should log them in.

  • Modern cryptography is just hashing.

  • It's just hashing that's quite a bit more clever, certainly than the example

  • that I just talked about a moment ago.

  • Also, these algorithms tend not to work on a character by character basis.

  • It's the algorithm that I just did as well where

  • I was adding up every single letter.

  • I was looking at each one individually.

  • They tend to take, these modern algorithms

  • tend to take clusters of letters, pairs or triples or so on at a time,

  • maybe do even more things.

  • They might rearrange the letters before they do things to them.

  • So there's multiple layers going on with these encryption algorithms.

  • And unlike some of the ones I've discussed earlier,

  • most of these also have the property where given data of arbitrary size--

  • and now we're starting to really expand our minds into not just words

  • or strings, but also images, files, videos, documents, PDFs,

  • and so on; anything can be run through a hash function to get a value--

  • but we're always going to get a string of bits, a bit string, that

  • is always exactly the same size.

  • So depending on the algorithm, maybe it's

  • going to be a 160-bit long string, or a 256-bit long string.

  • But our range is finite.

  • It's always going to be exactly 256 bits.

  • But the combination of those bits will be different, ideally,

  • for every single piece of data we might throw at it, no matter what.

  • OK, so let's expand our definition of a hash function

  • to this idea of a cryptographic hash function.

  • What properties should they have?

  • They should be very difficult, very, very difficult, basically impossible

  • to reverse.

  • It should be computationally impossible for anybody to undo the encryption.

  • That's pretty much the same as a regular hash function.

  • We're just really hammering the point home when we say this here.

  • They should still be deterministic.

  • We don't want any random elements to it.

  • We still want to a hash a value and always

  • get the same output no matter what if we run that same value through the hash

  • function an arbitrary number of times.

  • They should still generate very different hash codes

  • for very similar data.

  • We still want things to be spread out and we want

  • minor changes to have dramatic effect.

  • And they should never--

  • and this is one of those words that computer scientists love--

  • they should never allow two different sets of data to hash to the same value.

  • Do you see a potential problem when we frame it in this way?

  • When we say they should never be able to do that?

  • We've already restricted ourselves to a finite domain, right?

  • I said a moment ago, maybe this hash function maps to 160-bit long strings.

  • There's only so many combinations of 160 bits.

  • Now that might be an unfathomably large number, but using the word never

  • there becomes a bit dangerous.

  • We can't really rely on that.

  • And we'll see why this could potentially be a problem.

  • This static length string, by the way, is usually referred to as a digest

  • in this context.

  • When we start to talk about more modern cryptography techniques,

  • the output of a cryptographic hash function

  • is usually referred to as a digest.

  • Let's take a look at one of these cryptographic hash functions.

  • And certainly I'm not going to dive into the mathematics of it.

  • I wouldn't be able to explain the mathematics.

  • I wouldn't be able to do it justice if I tried to explain the mathematics of it.

  • But let's just take a look at some of the basics of this.

  • So SHA-1.

  • SHA-1 is quite a famous algorithm.

  • It was designed by the National Security Agency in the mid-1990s.

  • So these are really smart people who are tasked with working

  • with things like military intelligence.

  • These are people who are dedicating their lives to trying to protect data

  • as best as they possibly can.

  • Far more brilliant minds than I, for sure.

  • And this hash function-- and this is a published paper.

  • Hash functions tend to be, actually it's this very strange dichotomy where

  • you describe exactly how the function works,

  • but it still should be irreversible.

  • And this just really becomes a question of incredibly complicated mathematics

  • involved, such that even if you knew so many of the pieces going in,

  • you still might not-- you still wouldn't be able to undo it, even if you tried.

  • It's kind of amazing actually.

  • SHA-1's digests are always 160 bits in length.

  • So this is one of those ones I just said a moment ago.

  • That means that there are 2 to the 160 different SHA-1 digests, which

  • is a bit over 10 to the 48th power.

  • And again, 2 to 160 means for every single one of the 160 bits,

  • that could be a 0 or a 1.

  • So we have that, two options times two options times two options, 160 times.

  • Just to try and make it fathomable, to understand how large this number is,

  • let me try and paint a picture for you.

  • So imagine that you are looking on Earth for a specific grain of sand.

  • You're looking for one specific grain of sand on Earth.

  • That is easier by far than trying to have SHA-1 have a collision where

  • two values would map to the same thing.

  • There's about 10 to the 18 grains of sand on Earth.

  • So that's eight quintillion--

  • I had to look up that word--

  • eight quintillion grains of sand.

  • So way easier to find the grain of sand on Earth

  • than it is to have a collision.

  • In fact, we go even further and say that imagine

  • that every single one of those grains of sand

  • was another planet Earth, each of which also had sand on it.

  • So you have eight quintillion planet Earths.

  • You're trying to find a specific grain of sand

  • on one of those eight quintillion planets.

  • It's still easier than trying to have a collision with SHA-1.

  • SHA-1 is such an important algorithm that it's actually

  • one of the algorithms that is required in federal regulations

  • to be used by the government for encrypting information.

  • There are others as well, but SHA-1 is listed

  • by the National Institute for Science and Technology as a standard algorithm.

  • But there's a problem, which is that SHA-1 is broken.

  • And it has this clever website called SHAttered, shattered.io.

  • So the research team that figured out how to create a collision intentionally

  • create a collision.

  • And intentionally creating collision has the effect of basically saying,

  • this cryptographic hash function is broken.

  • And they have proven that there is a way that they can systematically

  • generate collisions.

  • So that's bad.

  • And we'll see why that's bad in just a moment.

  • But you can go to this URL, shattered.io,

  • and read quite a bit about how the researchers do it.

  • They explain it in different levels.

  • So if you really want to dive into the technology and the mathematics of it,

  • you're certainly welcome to.

  • If you just want to understand it at a base level and why this is a problem,

  • I definitely encourage you to take a look at this site

  • and read more about this.

  • So what did these researchers do?

  • So they said, It is now practically possible

  • to craft two colliding PDF files and obtain a SHA-1 digital signature

  • on the first PDF file, which can also be abused

  • as a valid signature on the second PDF file.

  • In short, what they're basically saying here

  • is we were able to create two PDF files such

  • that if I run them through the SHA-1 algorithm, the digest that I get

  • is the same.

  • Why is this potentially bad?

  • For example, by crafting the two colliding PDF files

  • as two rental agreements with different rent,

  • it is possible to trick someone to create

  • a valid signature for a high-rent contract

  • by having him or her sign a low-rent contract.

  • If you can take a PDF and twist it into anything

  • you want it to be, but have a valid signature,

  • a valid SHA hash associated with it, that's not great.

  • Now before alarm bells start going off because SHA-1 is still

  • use quite extensively, even now, this SHAttered research result

  • was developed in 2017 it was released, but SHA-1 is still

  • being used now, even then.

  • Before you panic though, it has not been broken that many times,

  • although they did very--

  • they worked for two years to create this PDF collision.

  • And they demonstrated a method for how to do it.

  • It has still not happened that many times.

  • Cryptographic hash functions, once they've demonstrated one collision,

  • are broken.

  • That is certainly true.

  • But the actual effects of this have not yet really materialized.

  • The computational power required to create this

  • is well beyond the capabilities of most people, or most syndicates even.

  • So no cause for alarm yet.

  • But it does show that there is a limitation with SHA-1,

  • and we still want to always be staying one step ahead.

  • Just like when Julius Caesar's enemies figured out

  • how to crack the Caesar Cipher, the goal was, we need to get one step ahead.

  • As technologists, we always want to stay one step ahead

  • to make sure that we are doing our best job protecting our data.

  • And as lawyers, we want to make sure we're

  • doing our best job protecting our clients' data

  • against potential adversarial attacks.

  • So as I mentioned, there are other standards

  • that are in use by other organizations, including the federal government.

  • SHA-1, as I mentioned, is just one of a few different options that they use.

  • SHA-2 and SHA-3 are much more robust algorithms.

  • They use more bits, basically, in their digest.

  • So instead of being 160 bits, you can have anywhere between 220

  • and 500 or so bits.

  • So way larger of a domain, even reducing the likelihood

  • of a collision that much more.

  • Again, imagine how unlikely it was with 2 to the 160.

  • Now we make it even more so.

  • 500 bits, that's unfathomably large and difficult to duplicate.

  • MD5 and MD6 are other cryptographic hash functions, or hash functions

  • that you may encounter.

  • MD5 in particular I've highlighted here in yellow because it's not actually

  • considered secure anymore, but it's still very, very

  • commonly used as a checksum.

  • Basically, what we do is we run a file through MD5.

  • And say we're a distributor of a file and we

  • want people to come download our source, and they

  • want to be able to trust our source, we might run our file through MD5 and say,

  • if you run this file through MD5, the hash will be blah blah blah blah blah.

  • And other people can then download the file and run it through MD5.

  • It's usually a program that is available on computers for people

  • to just run any arbitrary data through to get a hash result.

  • And they can check, OK, the hash value that I received from this trusted

  • source matches the hash value that I was told I would receive,

  • and so I will trust this.

  • Versus perhaps getting that same software

  • versus some corner of the internet that you don't really trust.

  • If you find the MD5 hash of the trusted source

  • does not match what you downloaded and what you thought was that same file,

  • it's probably a sign that something has changed in it

  • and you don't really want to--

  • you might want to be skeptical about trusting that file rather than just

  • diving right into it.

  • So what do we do that relies on cryptography on the internet today?

  • Or you know, just using our computers every day.

  • Email.

  • Email relies pretty extensively on cryptography,

  • particularly when we start to use secure email services, of which Gmail might

  • not be considered one, but there are services out there,

  • for example, ProtonMail and others, that do encrypt email completely

  • from point to point.

  • Much safer in terms of protecting one's communications.

  • Similarly, you may be familiar with the mobile app Signal is also

  • used to encrypt communications between two people over the text messaging

  • network rather than over email and the internet.

  • Secure web browsing, you may be familiar with this distinction

  • between HTTP and HTTPS.

  • And if you're not, that's OK.

  • We're going to be talking about that a little bit later as well.

  • But you want to make sure that your web traffic is encrypted against people

  • who are able to just monitor the network for all the traffic that is going by.

  • You probably don't want your searches to be someone else's

  • fodder for entertainment.

  • VPNs.

  • If you use a VPN, that's a great thing to do if you're traveling, for example,

  • and you may be on less secure networks than you might find at your business

  • or at home or at a university institution, for example.

  • VPNs allow you to encrypt communications with a network,

  • and also allow the network to pretend to do something on your behalf so that

  • your web traffic cannot be traced back to you directly,

  • which might be advantageous in some situations as well.

  • Document storage as well.

  • So if you use services like Dropbox, for example, generally what

  • Dropbox is going to do is break your document into pieces

  • and encrypt those pieces.

  • Rather than just storing the whole file writ large in some server somewhere

  • on the cloud, it's going to encrypt it before it sends it over

  • so that you have some more comfort that your data is being

  • protected by these cloud services.

  • And certainly, we're going to talk a bit more about what the cloud is

  • and what cloud services are and what they can be used for a little bit

  • later in the course as well.

  • Hash functions and cryptographic hash functions are great,

  • but they are well documented and there's only the one.

  • There's only one version of SHA-1.

  • There's only one version of SHA-3.

  • And that is a limitation.

  • Now it might not be a severe one because it's pretty strong.

  • They're pretty strong algorithms.

  • But are there ways that we can improve our own cryptographic techniques

  • if we're trying to protect data that we are receiving,

  • data that we are sending, and so on?

  • And that leaves this idea of public-key cryptography,

  • or public- and private-key cryptography, or asymmetric encryption.

  • You'll hear these terms kind of used interchangeably.

  • Let's start by talking about public-key cryptography by way of an analogy.

  • We're going to go way back to arithmetic and algebra days here.

  • So imagine we have something like this.

  • We have 14 times 8 equals 112.

  • Multiplication we can think of as a function.

  • It is a function.

  • If 14 is our input and our function is times 8, the result is 112.

  • Now multiplication is not a hash function because it is reversible.

  • I can take that 112, multiply it by 1/8, or equivalently divide by 8,

  • and get back the original input.

  • So multiplication is a function, but it is not a hash function.

  • It is reversible because if we multiply any number x by some other number y,

  • we get a result z.

  • And we can undo that whole process by taking z, multiplying it by 1 over y,

  • or the reciprocal of y, and getting back the original x.

  • Reversible.

  • Goes in both directions.

  • Now let's take this function and kind of obscure it.

  • We know for ourselves that this function that I'm using is n times 8.

  • Whatever I pass in is going to be multiplied by 8.

  • But I don't tell you what that is.

  • I don't tell my friends what that is.

  • I just say, hey, if you want to send me a message,

  • just run it through this function.

  • So again, we're going to just use math as an example here.

  • If my message is 14, I might say, f of 14--

  • and again, this is getting back to algebra, maybe a little bit

  • back in the day--

  • f of 14 is 112.

  • That is my public key, you might think.

  • And you might say, having just gone through this whole example,

  • that, well, it's pretty easy to undo that.

  • If I know that 14 is the plain text and 112 is the cipher text,

  • I can probably figure out that your function is n times 8.

  • And so I've broken your encryption scheme.

  • I have figured out how to reverse your cryptography.

  • Well, it's true that n times 8 is certainly one function

  • that I could use to turn that plain text, 14 in this example,

  • into that cipher text, 112 in this example.

  • But there are other ways that I can do it.

  • My actual function could have been n times 10 minus 28.

  • So 14 times 10 is 140, minus 28 is 112.

  • And there are other contrived mathematical examples

  • that I could continue to do pretty much ad infinitum to define

  • ways to transform 14 into 112.

  • So just because you see that 112, that doesn't mean you

  • have figured out how to break my hash function.

  • You haven't figured out what my encryption technique is.

  • If all I say is, here's a black box that I would like you to feed an input

  • into, even if you see the output, you, or really more concernedly an adversary

  • who sees that output as well should not be able to, or cannot in this case,

  • undo it.

  • Because yes, I could have been using n times 8.

  • I could have been using this crazy thing involving the square of n.

  • And that's kind of the idea behind public-key cryptography.

  • I am going to publicize that I have a function that can be used,

  • but I'm not going to tell you what that function is,

  • and I'm certainly not going to tell you how to reverse it.

  • So public- and private-key cryptography are actually two hash functions

  • where the goal is to reverse them.

  • We kind of talked about this as hash functions

  • are supposed to be irreversible.

  • But the distinction here is that we are creating two functions, f and g, which

  • are intended to reverse one another.

  • So it's not that the function does the single function that is reversible,

  • it is that we have two functions that, working together, create a circuit.

  • If I take data and I run it through function f, I get some output.

  • If I run that output through function g, I get back the original data.

  • I have deciphered the information.

  • And the same thing works in reverse.

  • If I take some data and I run it through function g,

  • I get some hashed output that makes no sense.

  • And if I run that hashed output through function f,

  • I get back the original data once again.

  • Now the key is that-- pun intended-- the key is that one of these functions

  • is public and the other one is private.

  • One of them is available to everybody, and everybody uses

  • that function to send you messages.

  • If you want to send me a message using encryption,

  • using public and private key encryption, you take the message

  • and you use my public key to encrypt it, and you

  • send me the result, the hashed encrypted result.

  • And I use my private key to decrypt it.

  • And I am, ostensibly, the only person who

  • has my private key, even though I've broadcasted,

  • made my public key widely available.

  • Now the math that goes into this is well beyond the scope of a discussion

  • that we're going to have here today.

  • But basically, and most encryption, most cryptography

  • involves the use of prime numbers, particularly

  • very, very large prime numbers.

  • And you're looking for prime numbers that have a particular pattern.

  • And when I say "you're" looking for it, don't worry,

  • you don't have to do this yourself.

  • There are plenty of programs out there, RSA

  • being a very popular one, that can be used to generate

  • these public and private key pairs.

  • But the amazing thing is that it can generate these pairs very quickly,

  • but it's almost impossible to break or figure out

  • what the underlying functions, or even in this case

  • what the underlying two prime numbers are

  • that are the foundation for your own encryption strategy.

  • So it's pretty amazing that it's easy to define these functions

  • and almost impossible to reverse engineer them, so to speak.

  • So we start with a huge prime number, we find some other prime number that

  • has a property, a special property related to it,

  • and from those two numbers we generate two functions whose goal in life

  • is to undo whatever the first one does.

  • So f's job is to undo what g does, g's job is to undo what f does.

  • And this is called a public and private key pair.

  • So your public key is really some complicated hash function

  • that does work.

  • And that hash function is represented as a very long string

  • of numbers and letters.

  • It looks just like a hash digest.

  • But it's just a human representation, a readable representation

  • of a mathematical function.

  • And your private key is the same--

  • or your private key is also a representation of letters and numbers.

  • It's not exactly the same as your public key,

  • but it undoes the work that your private key does.

  • And again, these keys are generated using a program called RSA.

  • So let's take a look at exactly how we would

  • go about doing some asymmetric encryption using

  • public and private keys.

  • So here we have some original data.

  • It's a message perhaps that I want to send.

  • And I want to send it to you.

  • I want to send this message to you, but I don't

  • want to send it to you in the clear.

  • I don't want to, you know, it's sensitive information.

  • I don't want to send it via plain text.

  • And I don't want to use a generic hash function

  • because if I use a generic hash function, like SHA for example,

  • it's irreversible.

  • You will not be able to figure out what I tried to say.

  • So instead, I take this original data and I use your public key.

  • Your public key, again, is just a mathematical-- a very complex--

  • mathematical function.

  • So I take this data, I feed it into your public key, your public hash function,

  • and I get some garbled stuff out.

  • OK?

  • And this is what I send to you.

  • I send you this garbled stuff.

  • In order for you to figure out what the original message is,

  • you use your private key.

  • Not your public key-- your public key is what

  • I use to encipher the information-- but your private key, which

  • is known only to you, hypothetically.

  • It should not be distributed to others.

  • It undoes the work that your public key did.

  • And so if I give you the scrambled data and you

  • use your private key to try and decipher it,

  • you will get back that original data.

  • But here's the great thing.

  • No one else's private key will be able to do that.

  • If anybody intercepts that message other than you and they use their private key

  • or they use your public key again, they will not

  • be able to decipher the message that I sent to you.

  • And so public and private keys are very interesting because they

  • create these pairs.

  • They're these unique encryption schemes that are unique to two people,

  • or really even to one person.

  • If you were to send me a message back, you

  • would send me a message using my public key.

  • You would then send me whatever the encrypted sort of scrambled data

  • is for the message that you sent using my public key.

  • I would then use my private key, which is not

  • known to you or to, hypothetically, anyone

  • else to decipher what you sent me.

  • And I would get back the secret message, or the perhaps not-so-secret,

  • but sensitive message that you sent to me.

  • And so that's this idea of asymmetric encryption.

  • You can encrypt using someone's public key.

  • And anybody can do so.

  • And for that reason, you'll often find technically-minded people will

  • sometimes post their public key literally on the internet,

  • such that anybody who wants to send them a message using a secure channel

  • can do so.

  • And programmers as well.

  • So if I'm doing some work using a tool called GitHub, a popular service

  • available online for sharing and posting source code,

  • if I want to send something from my computer to GitHub's servers

  • in the cloud, I might authenticate using a public key and private key encryption

  • scheme so that they see that I'm using their public key to send them

  • information, they're decrypting it.

  • When they send information back to me, they're using my public key

  • and I use my private key to decrypt it.

  • It's actually part of--

  • it's part of a communication strategy used by technically-minded folks.

  • And you're not restricted to just having one public and private key.

  • For example, I have one public and private key

  • that I use for a secure email, I have one public and private key

  • that I would use for secure texting on my phone,

  • and I have one public and private key that I use for my GitHub repository.

  • So I have different sets and different combinations of these keys.

  • But the key is that-- the key, again, pun intended--

  • is that the decryption can only be done by someone who has the private key, not

  • the public key, because only those two functions are reciprocals

  • of one another.

  • They undo the work that the other did in the first place.

  • But interestingly enough, that's not the only thing

  • we can do with public and private keys.

  • So instead of just encryption, we also have this idea

  • of a digital signature, which is different than e-signature,

  • an e-signature just being the tracing of a pen typically

  • along some surface and just logging where all the pen strokes happen to be.

  • So we're talking about something much more complex than that.

  • We're talking about something cryptographically

  • based when we talk about digital signature.

  • It's kind of the opposite of encryption.

  • And using someone's digital signature, you

  • can verify the authenticity of a document and verify, more precisely,

  • the authenticity of the sender of a document.

  • And we're going to explain this in great detail in just a moment,

  • but the basic idea is they're signing the document using their private key.

  • You still don't see what the key is.

  • And because these public and private key pairs

  • are specific to an individual person, if you

  • were able to verify that that document could only

  • have been signed using someone's private key,

  • then you have quite a serious belief that that person

  • is the person who signed the document, who sent the document, and so on.

  • Digital signatures are 256 bits long pretty consistently,

  • which means there are 2 to the 256th power distinct digital signatures,

  • which makes the potential of a forgery effectively zero.

  • Again, I'm using this--

  • I'm trying to avoid saying never because computer scientists don't like never.

  • But effectively, there is no chance of a forgery.

  • Now the process for how one verifies a digital signature is quite--

  • there's quite a few steps involved.

  • And I have a diagram here that I sourced from online.

  • And what I'd like us to do now is walk through

  • this process to hopefully give you an understanding of how these work

  • and how you might be able to rely on digital signatures.

  • And states and different entities are recognizing digital signatures

  • as a valid way to sign documents, but it really helps

  • to have a good understanding of them such that you, as an attorney,

  • are comfortable with the fact that this does represent a specific individual.

  • So let's take a look at how this process works.

  • So we start with data.

  • Data in this case is any document.

  • Perhaps it's a scanned, signed version of some PDF with somebody's actual ink

  • signature.

  • But again, the whole thing is just scanned.

  • The next step is to use a hash function.

  • The hash function that we could use in this context could be anything.

  • It could be SHA-1.

  • It could be something very complex.

  • In general, the hash function that's going to be used here

  • is actually not a cryptographic hash function.

  • It's going to be something like MD5.

  • So something that anybody has access to.

  • And that's going to result in a hash, a set of zeros and ones.

  • In the case of MD5, it's going to be about 160 or so different characters.

  • Now where things get very interesting is we

  • take that hash, that set of zeros and ones,

  • and we encrypt it using the signer's private keys.

  • Remember, these functions are reciprocals of one another.

  • A public key can undo what the private key does,

  • and the private key can undo what the public key does.

  • Notice in this case we're still not sending anyone our private key.

  • We are just using our private key to encrypt something.

  • So we take this hash that we received from running our file through MD5,

  • we encrypt it using our private key, and we get some other result out of it.

  • This number that comes out of running the hash through our private key

  • is called the signature.

  • We then just couple that-- so when we send this off,

  • we send the signature plus the original document,

  • and that would be considered a digital signature.

  • So that's the signing part of the process.

  • That's where we go.

  • We start with a file.

  • We run that file through a generic hash function.

  • Not our public and private keys, something

  • that is generally pretty accessible.

  • We take that hash, we encrypt it using our private key

  • to get some other hash that looks similar, different zeros and ones,

  • but totally different pattern of zeros and ones.

  • We attach the original document and the digital signature when we send it off,

  • and that is considered a digitally signed document.

  • Now the real crux is how do you prove that I'm the person who

  • sent you this document, right?

  • If you want-- if you're receiving something

  • that has a digital signature, which is supposed

  • to be as good as any other kind of signature,

  • it's supposed to have legal effect.

  • How do we verify that that person who sent you the document

  • was actually the correct one?

  • So then we go to the verification step.

  • So we start, we've now received this digitally signed data.

  • This is the same as this digitally signed data here

  • that was sent by the sender.

  • We also received two pieces of information.

  • We received the document, the original document,

  • and we received the signature.

  • And recall, again, that the signature is what

  • happens when we take the hash of the document

  • and run it using our private key to get a result.

  • Now the interesting step here is remembering

  • that the public and private keys are reciprocals of one another.

  • So we can take this complicated signature hash

  • and we can use the public key, which, again, is publicly available.

  • Anybody should ostensibly have access to someone's public key, not

  • their private key.

  • And notice that the signer has never sent their private key.

  • They've only used it to encrypt some data,

  • but they never sent the private key.

  • The public key has always been available though.

  • We take the signature, we run it through the public key function,

  • and we get a hash.

  • We take the data, the document, and we run it through MD5,

  • the same hash function that the sender was supposed to use, and we get a hash.

  • And we're checking to make sure that these two

  • hashes are equal to one another.

  • If they are equal to one another, that means the signature is valid.

  • Let's talk about why that would be the case.

  • If we use the MD5 of this file, the generic hash of this file,

  • and we encrypt it using our private key, we get some result, OK?

  • But this is very easy to calculate.

  • It's MD5.

  • We're taking a basic document, we're running it through a publicly known,

  • well-defined hash function.

  • Anybody who has access to this document and a program on their computer

  • called MD5 can literally run this document through it

  • and get this number.

  • This is not the tricky part of this.

  • We then take this hash function, we encrypt it using our private key

  • to get some secret number.

  • The public key though will undo that.

  • Remember, the public and private keys are reciprocals of one another.

  • Whatever one does, the other one can undo.

  • And so only my public key will undo the work of my private key.

  • So if I take this value and I encrypt it using my private key,

  • and then I run this value through the public key,

  • I should get the original result again, the original MD5 hash.

  • And that's why we have to send the document as well, not

  • just the digital signature, the numbers that we

  • get by running it through our private key in the first place.

  • That way we have a way to validate that yes, this file has this checksum,

  • and the sender took that checksum, they ran it through their own private key,

  • and when I used their public key to undo it,

  • I get the same value, which is effectively proving, but is,

  • we'll term it as it's very, very, very, very

  • likely that this person who claimed to have sent the document

  • is, in fact, the person who sent that document.

  • And so that's what digital signatures can be used for.

  • It is a mathematical, cryptographic way to verify

  • the identity of the sender of a document or an individual.

  • Or in whatever context you might be using or receiving digital signatures,

  • it is purely a verification step that is based entirely in mathematics.

  • There's one other potentially interesting use

  • of digital signatures that's also quite buzzy right now,

  • and that's blockchain technology.

  • And what is the blockchain?

  • Digital signatures are really key to knowing how the blockchain works

  • and why it is trusted as a decentralized source of information for individuals.

  • So understanding digital signatures means

  • you are in a position to understand blockchain.

  • And I use here the term the blockchain, but it really is a blockchain.

  • There's no such thing as the one blockchain.

  • There are many different-- this is just an idea that is implemented.

  • Generally, we're hearing it in the context of a cryptocurrency,

  • but it does not need to be restricted to that, although cryptocurrencies are so

  • discussed in the media and have been dissected by so many researchers

  • that they provide an interesting vehicle, an interesting lens

  • through which to consider blockchain.

  • And so our example today is going to focus on Bitcoin.

  • It is the most well-documented of the cryptocurrencies.

  • It is the most well-documented implementation of the blockchain,

  • or among the most well-documented implementations.

  • But this is not specifically a lecture about Bitcoin.

  • We're just using Bitcoin as a lens through which to understand blockchain.

  • There's also an outside source that I strongly encourage.

  • This channel on YouTube provides interesting mathematical dissections

  • of topics, and they tackle blockchain and Bitcoin pretty extensively.

  • And this is an excellent supplementary resource

  • to consider if you're trying to dig into this

  • or understand it a little bit more, because in this video

  • I'm going to omit some of the more technical details for the sake

  • of, hopefully, broader understanding.

  • But if you want to dive into it more deeply,

  • this is a resource that I would recommend.

  • And I really like talking about Bitcoin in the context of blockchain

  • because it's actually how I kind of got started almost as an attorney.

  • When I was practicing, when I graduated from law school,

  • I decided to go out on my own and start my own firm.

  • I live in a small town and so a lot of my early work

  • was doing estate plans, wills and such for individuals in my town,

  • getting to know them.

  • But I had studied extensively technology-related law in law school

  • and I really wanted to use it.

  • And a few years into my practice, I had a friend

  • who needed an estate plan prepared, and he asked if he could pay me in Bitcoin.

  • And I had no idea what that meant.

  • I didn't really know anything about Bitcoin at the time.

  • And I looked it up and thought it sounded interesting,

  • and so I said sure.

  • So I learned how to set up an account.

  • And it's also worth mentioning at the outset,

  • as we're talking about cryptocurrency, that you

  • need to understand how Bitcoin works to use Bitcoin.

  • You don't need to understand how the federal banking

  • system works to use a bank.

  • And the same is true here with Bitcoin.

  • But I ended up accepting a Bitcoin payment

  • by creating what's called a Bitcoin wallet.

  • I immediately sold the Bitcoin that I received and turned it into cash, such

  • that I could use it for more generic purposes.

  • And what I decided to do was send out a press release saying,

  • oh, I accept Bitcoin, because it was something that was novel

  • and I hadn't really heard that much about it.

  • And this got the attention of my local paper and companies

  • in the area that were technically minded as well.

  • And so Bitcoin sort of provided this forum

  • to meet new clients that also allowed me to explore fields

  • of the law about which I am passionate.

  • So it's kind of an interesting segue to be able to share that with you now.

  • All right, so stepping away from Bitcoin again more broadly to blockchain.

  • What is the blockchain?

  • It's very similar to something you've already learned about,

  • which is a linked list.

  • So recall that a linked list is a set of nodes, each of which

  • have connections forward and backward to other nodes in the chain.

  • They are linked together.

  • And similarly, with a blockchain, all of the blocks are chained together.

  • It's basically the same terminology slightly modified.

  • So a linked list is a set of nodes, each of which is connected to the one prior

  • and the one after it.

  • We learned about linked lists as having generally three pieces of information

  • associated with them-- a previous pointer, which is basically

  • a reference to the prior node, or in this case, the prior block;

  • we have the next pointer, which is a reference to the next node

  • or the next block; and we had data.

  • And in this case, the data is actually two different things.

  • There's the real data.

  • And again, in the context of a cryptocurrency blockchain

  • we're going to be talking about a list of transactions,

  • a numbered list of transactions from person A to person b,

  • each of those transactions being digitally signed such

  • that you can verify that the person who logs that transaction

  • is actually the one who made that transaction.

  • And also, something called a proof of work.

  • And this proof of work is very interesting

  • because this is how Bitcoin ostensibly derives its authority.

  • There is no central controller of the Bitcoin currency,

  • and it is very decentralized.

  • And there needs to be some way for people

  • to agree as to what the true ledger is, or what the true set of transactions

  • that have happened are.

  • And the way that is done is by relying on something called the proof of work.

  • And we'll dive into that shortly as well.

  • So again, cryptocurrencies, that data is a ledger of transactions, each of which

  • is digitally signed using the digital signature

  • technique we've just discussed by the person who

  • made or initiated that transaction.

  • And that ledger is decentralized, which means that any time there's

  • ever a change, any time any transaction is recorded, in this case,

  • using Bitcoin, again, our lens through which to consider blockchain,

  • that message is broadcast out.

  • So if I make a transaction in Bitcoin, I pay you $10,

  • I'm going to announce to everyone else who has a Bitcoin wallet

  • or who is monitoring the blockchain, the list of transactions, hey,

  • please add the following transaction to this list, Doug pays you $10.

  • And that is announced to everybody, everybody records it in their ledger,

  • and then some stuff is going to start happening.

  • But here is a potential issue.

  • How do you know that the blockchain is legitimate?

  • How do you know that your copy of what is being said is the truth?

  • How do you know that your copy of the blockchain is accurate with respect

  • to all other transactions that have happened?

  • Everybody else has their own copy as well.

  • It's decentralized.

  • We all maintain, anybody who's using Bitcoin maintains

  • their own copy of the blockchain.

  • How do you defend against people modifying it?

  • That's a very interesting question.

  • The way that cryptocurrencies do it is to assume--

  • and this is defined in the Bitcoin paper--

  • the way the cryptocurrencies do it is to assume

  • that the chain that has the most computational work put into it

  • is the true chain.

  • This decision is completely arbitrary.

  • There's no reason why one needs to be vetted over the other.

  • But something had to be agreed upon by, collectively, users of Bitcoin

  • to say in the event of a dispute, between which person's chain is

  • the accurate de facto definitive list of transactions?

  • We're going to go with the one that has been verified the most times.

  • And again, this word verified is sort of a sketchy word.

  • There's nothing inherently about proof of work

  • or anything else that proves that a transaction has taken place in the way

  • that we normally think of this term verified.

  • Rather it is the collective standard by which we all agree to adhere,

  • that the person--

  • or that the blockchain that has the most proof of work in it is the list.

  • That is just something we must subscribe to as users

  • and consumers of blockchain.

  • Now how do we determine which blockchain has had the most computational work

  • into it, which copy of the blockchain has had the most computational work put

  • into it?

  • Well, this is proof of work.

  • So proof of work is how the correct blockchain of all the copies

  • that are decentralized is determined.

  • So recall how hashing works.

  • Hashing allows us to take any arbitrary data and run it through a hash function

  • and get an outcome.

  • And that outcome is going to be, let's say 256 bits, each of those bits being,

  • of course, 0 or 1.

  • Now there's a lot of different combinations there.

  • But some of them will be very unique.

  • And the way Bitcoin works, Bitcoin's blockchain works

  • is to prove a particular block.

  • We are asking people who are oftentimes called miners--

  • that's where this term comes from because they are mining. ,

  • Ultimately the reward for doing this proof of work is to receive Bitcoin

  • that are sort of generated out of thin air.

  • And so these people are termed miners.

  • But we are asking anyone who has a computer to hash the entire block.

  • So hash the entire list of transactions, the reference

  • to the previous block and the next block.

  • And remember, all of that is contained in a single node of this blockchain,

  • basically.

  • And we're looking for a highly unusual pattern.

  • We're looking for maybe the first 30 bits or the first 40 bits

  • to all be zeros.

  • That's really weird.

  • Like, that's a really difficult pattern to find.

  • And the only way to do it is to guess.

  • So you take this entire block, you attach a single piece of data

  • to the bottom of it, like 1, 2, 3.

  • You can just count in that way trying to guess.

  • And if you hash that entire thing together,

  • do you eventually find a block that, when hashed in this way,

  • produces this very, very unique pattern?

  • If so, you just say, here's the number that I attach.

  • So let's say I took the entire block and I hashed it with 12345

  • was the number, right?

  • It's very difficult to find a value that would

  • create this unique pattern of zeros and ones,

  • in particular, zeros, 30 zeros in a row.

  • But it's really, really easy to verify that someone has done it.

  • To verify that someone has done it, all you have to do is if they announce

  • the number that they used, 12345, as their proof of work--

  • and that's what the proof of work really is,

  • it's that number that they use to figure it out--

  • if they announce that and you hash the block with that number,

  • you can verify, yes, that pattern is actually 30 zeros in a row.

  • So I guess you have proven it.

  • Now this is, again, kind of arbitrary.

  • Like, this seems weird.

  • Why are you spending all your time trying

  • to figure out a specific pattern that exists somewhere?

  • That is a question that I cannot answer other than to say that it is

  • the standard by which people who have ascribed to the Bitcoin standard have

  • just agreed to be bound by.

  • The person who finds this number is probably the--

  • is proving the validity of all the transactions above it.

  • And this gets interesting when you think about somebody trying

  • to perpetrate a fraudulent transaction.

  • So imagine I'm trying to perpetrate a fraudulent transaction by initiating

  • a transaction that says, I'm going to pay you $100.

  • And I announce that to you, but I don't broadcast it

  • to everybody else who maintains the blocks, who are maintaining

  • their own copies of blockchains.

  • Which is interesting because you think that I have spent $100,

  • and as far as you're concerned I have spent $100 to you,

  • but no one else is aware of that.

  • So no one else thinks that I have spent $100.

  • They all think I am $100 wealthier than I actually am.

  • The problem then arises that I need to verify that block.

  • I need to verify that transaction.

  • So I append the transaction to my own copy of the blockchain

  • because I am the only person other than you--

  • the two of us maybe have these copies of the blockchain,

  • but everybody else, I didn't broadcast this transaction

  • so no one else knows about it.

  • In order for it to have a proof of work attached to it,

  • in order for it to be considered the valid chain,

  • I would need to prove that block.

  • I would need to find that secret number that when hashed with the entire block,

  • produces a pattern of 30 consecutive zero bits before anybody else does.

  • So that's a 1 in 2 over 2 to the 30th power chance

  • because I'm looking for a pattern of 30 consecutive zeros.

  • There's a 1 in 2 to the 30th power chance

  • that I'm going to find that pattern.

  • And I have to find that pattern before somebody else.

  • And in the meantime, other transactions are coming in on my ledger.

  • On my-- other people are broadcasting their transactions.

  • And I have to keep adding them to my ledger

  • and keep proving that work over and over and over,

  • all the while trying to stay ahead so that my fraudulent transaction is

  • considered ultimately the correct blockchain.

  • Now the odd-- you just can't beat the odds of that.

  • One malicious person trying to perpetrate a fraudulent transaction

  • using the blockchain cannot stay ahead.

  • They can't win the find the secret number

  • game over and over and over and over.

  • Eventually, some other chain, which contains valid transactions,

  • will win out over my attempted fraudulent chain.

  • And it will be disregarded.

  • Nobody will consider that to be a valid part of the chain anymore.

  • And so that's kind of how this works.

  • Again, it's arbitrary the way they decide to resolve or verify.

  • There's nothing about this process that proves

  • that person A sent person B money.

  • It's just the consensus that we have decided, well,

  • if people have gone through the effort to try and find these secret numbers,

  • and many different people are doing it, and this one chain is longer than

  • the others because it's been verified-- again, using this term verified--

  • it's been proven with work over and over and over, we're

  • just going to agree that that's the right one.

  • So again, it's kind of strange.

  • And I do, again, refer you to that video that I shared earlier

  • to get into some of the more technical details of this,

  • which I'm glossing over a little bit here in this discussion.

  • But proof of work is basically the collective consensus

  • of blockchain users, or in this case specifically,

  • of Bitcoin users, for which transactions they are going to consider valid.

  • Because changing any one-- and if you go back in time, as opposed

  • to trying to forward think I want to add a new fraudulent transaction,

  • if you try and go back in time to modify a transaction from the past,

  • say there was a transaction that was you pay me $10

  • and I maintain a copy of the blockchain, so I can go back in time

  • and modify that file, technically, I change it to you pay me $100, well,

  • because I've changed even the tiniest thing

  • and I'm hashing that block, that means that when I hash it

  • with that secret number, I'm no longer getting that secret pattern of 30

  • numbers, 30 zeros in a row.

  • And so that kind of calls that transaction into question.

  • It also, because each of those blocks contains a reference

  • to the next block and the previous block,

  • it also invalidates all of the other transactions in that blockchain.

  • And so because of this weird technique we're

  • doing of hashing blocks, hashing data, trying to look for specific patterns,

  • but realizing that any cryptographic hash function with the tiniest

  • change to the input creates a totally different output,

  • we actually are pretty well defended against people

  • who try and go back in time and make fraudulent transactions using

  • the blockchain.

  • So it's mathematical and it's quirky, but it does provide a clever way

  • to defend against that kind of thing, considering

  • we don't have a central authority to rely on

  • to adjudicate these kinds of disputes.

  • We are collectively, not trusting one another enough,

  • but agreeing to trust the mathematics of the blockchain in order

  • for it to succeed.

  • So as I mentioned, we can very easily verify the correctness

  • of someone's proof of work.

  • That proof of work is just the number that is hashed with the block

  • to produce the secret pattern of 30 zeros and then some other bits,

  • and so on.

  • The longer a chain gets, the more and more likely

  • it is that all the transactions in it are "verified."

  • Again, I keep putting air quotes around that word

  • because it doesn't mean in exactly the same way

  • that we might consider verified colloquially to mean.

  • It doesn't prove anything about the transaction itself,

  • just that we accept it as the standard.

  • We accept this as the de facto truth because of all the mathematics

  • that have been put into it.

  • So the longer a chain gets, the more likely

  • it is that it consists of only verified, legitimate transactions.

  • But that brings up a question of, what is a transaction?

  • A transaction is just an exchange between two people.

  • And if we start to really spread things out,

  • we can almost think about a transaction as a contract.

  • I offer you $10 for you to do something on my behalf,

  • and assuming that we're intending for me to actually give you these $10,

  • and you're intending to actually do something on my behalf,

  • and the thing that you're doing for me is not illegal,

  • we've basically formed a contract.

  • And so while Bitcoin can be used, the blockchain for Bitcoin

  • can be used to send money back and forth between people,

  • the data that goes into the data block of any blockchain is arbitrary.

  • And there's no reason why, instead of being a list of transactions,

  • that data couldn't be something much more significant than that.

  • There's no reason it couldn't be a digitally signed PDF

  • scan of a contract between two people.

  • There's no reason it can't be a message from me typed to you saying,

  • I will pay you $100 if you paint my house on Tuesday,

  • and you sending something back in that same chain saying, I will paint--

  • I accept your offer for this payment.

  • I accept your offer.

  • I will paint your house on Tuesday in exchange for $100.

  • We've just formed a contract with no middleman at all.

  • We are announcing our intentions.

  • It is being recorded publicly in everybody's version of the blockchain.

  • There is verified, again, verified in the sense

  • that we collectively term to be accurate rather than

  • proving that I definitely sent this although the digital signatures

  • associated with these transactions do, again,

  • suggest yes, I am the person who made this transaction because I digitally

  • signed it.

  • If I do the same thing with a contract, if I send you an offer

  • and you accept, and both of those items are in the chain,

  • we arguably have formed a contract.

  • And that is what the blockchain associated with the Ethereum technology

  • is actually more akin to.

  • So Bitcoin is kind of restricted in how it

  • approaches cryptocurrency and approaches transactions between people.

  • And Ethereum opens up a little bit more.

  • And there are other blockchain technologies and other services

  • that rely on the blockchain in order to do things far

  • beyond what a cryptocurrency could do.

  • But all these things are only possible because we rely on--

  • we rely so extensively on cryptography.

  • We use computers to send information securely, encrypt information.

  • And the mathematical unlikelihood of someone

  • being able to duplicate our work, or certainly

  • reverse engineer this encryption is what gives us

  • the confidence to make these transactions in the first place.

  • And so cryptography forms the basis of almost everything

  • that we do when we talk about security on a computer.

  • But ultimately, cryptography just relies on mathematics.

  • So the moral of the story is probably this.

  • You are probably not going to be implementing

  • your own version of the blockchain.

  • And really, you don't need to understand it completely in order to use it.

  • Like I said, you can use Bitcoin without knowing the mathematics of how Bitcoin

  • works, just like you can use a bank without knowing the minutia of how

  • the banking system works.

  • The point of the blockchain is to remove a central authority.

  • We don't rely on one person or one entity or one government

  • to determine what has happened, what the transactions are

  • like we do with a bank.

  • Your bank has a ledger of everybody's accounts.

  • With blockchain technology, we are decentralizing this and making it

  • so that everybody has access to all of the information at once,

  • and it is everybody's responsibility to keep that ledger accurate.

  • And because these ledgers rely so extensively on cryptography,

  • because this technology relies on cryptography,

  • we can use the power of cryptography, the fact

  • that things are very difficult to reverse engineer mathematically

  • to verify that yes, these are the things,

  • these are the things that have happened, these are the transactions that

  • have been logged, and everybody knows about it at the same time.

[MUSIC PLAYING]

Subtitles and vocabulary

Click the word to look it up Click the word to find further inforamtion about it