Placeholder Image

Subtitles section Play video

  • WOMAN: The following content is provided under a Creative

  • Commons license.

  • Your support will help MIT Open Courseware

  • continue to offer high quality educational resources for free.

  • To make a donation or to view additional materials

  • from hundreds of MIT courses, visit

  • mitopencourseware@ocw.mit.edu.

  • NEHA NARULA: OK, so let's get started.

  • OK?

  • So great.

  • We're here to talk about cryptocurrency

  • engineering and design.

  • I think the first question that comes up that's very obvious

  • is what is a cryptocurrency?

  • So this word was kind of invented 10 years ago when--

  • I don't know how many of you know the origin story of where

  • bitcoin came from, but basically a pseudonym on the internet

  • dropped a paper and some open source code

  • in a forum on an email list, and said, hey,

  • I have this idea for this thing called bitcoin.

  • It's kind of like electronic cash.

  • Here's how I think it could work,

  • and here is some code if you want to run it and become part

  • of this peer-to-peer network.

  • We don't know who this person is.

  • This person has basically virtually disappeared

  • from the internet and from the world.

  • But it's created something that has captured

  • so many people's imaginations and has sort of, depending

  • on how you measure it, created billions

  • and billions of dollars of economic value

  • and inspired a lot of people to think about how

  • to use this technology to solve a myriad of different problems,

  • not just electronic payments.

  • So cryptocurrencies and the technology behind them

  • are inspiring people to think about how to bank the unbanked,

  • add more auditability and traceability to our world,

  • get rid of trusted intermediaries and institutions

  • in certain situations, and basically solve

  • every problem, if you read about what blockchains

  • can do on the internet.

  • Now that's not exactly what this class is about.

  • This class is not going to be about applications.

  • This class is going to be about technology and infrastructure.

  • You're going to learn how to create a cryptocurrency, what

  • goes inside a cryptocurrency, what's important,

  • what are the techniques.

  • And what application you choose to apply that to down the line,

  • that's kind of up to you.

  • But we're not going to be doing digital identity or health care

  • records or something like that.

  • We're going to be talking about the technology.

  • So a big question is how are cryptocurrencies

  • different from regular currencies?

  • And another thing that I want to make really clear

  • is that the terms in this space are still being defined.

  • So you will hear people throw around

  • all sorts of terms-- cryptocurrency, blockchain,

  • consensus.

  • And these words kind of have floating, evolving meanings

  • right now.

  • Part of that is because bitcoin, the first cryptocurrency,

  • didn't come from academia, as far as we know.

  • It came from a community of enthusiasts on the internet.

  • And so it doesn't necessarily have the same basis and rigor

  • that we might expect from most of our academic fields

  • of study.

  • It's totally OK.

  • We're figuring it out as we go along.

  • And academia is really embracing this topic.

  • So if any of you are graduate students who

  • are looking for an area in which to do research,

  • I think basically, the number of papers

  • published on cryptocurrencies and blockchain technology

  • in respected academic venues is doubling every year.

  • So there's huge opportunity here.

  • So cryptocurrencies are not regular currencies.

  • They're not $1.00 or a pound or a euro,

  • what we normally think of as currency.

  • They're something different.

  • Bitcoin was sort of created out of nowhere.

  • And what does it mean to create a cryptocurrency?

  • Who says you can create a cryptocurrency?

  • What backs a cryptocurrency?

  • Why is it valuable?

  • Well, first, before we answer that question,

  • I just want to make it really clear what

  • this course is not about, OK?

  • We are not going to help you ICO.

  • If you are interested in ICO'ing, just go.

  • That's not what this class is going to be about.

  • We are not going to offer any trading advice.

  • We have zero opinions on whether you should buy bitcoin

  • now or sell or whatever, or zen cash,

  • or whatever all these things are.

  • So none of that.

  • Don't even ask us.

  • We're not interested.

  • And this class is not really going

  • to be about permissioned blockchains either.

  • Now you might not know what this term means yet,

  • and that's totally OK, but I just

  • want to make it clear that what we're talking about here

  • are cryptocurrencies.

  • They're open permission with systems in which there is

  • a token which has some value.

  • So that's what we're not going to do in the class.

  • So going back to--

  • and let me just pause there for a moment.

  • Let me pause and ask you if there are any questions so far

  • about what I've said.

  • Yeah.

  • AUDIENCE: Do they always have to have value?

  • NEHA NARULA: No, not at all.

  • And let's start to get into that.

  • So the question was do tokens always have to have value?

  • So I think, really, to understand

  • what are cryptocurrencies, what are tokens, what do they mean,

  • we have to talk about money.

  • And we have to talk about what money is and what it means.

  • So this is going to be very hand-wavy

  • and I'm sure not very satisfying to a real monetary economist.

  • But money developed-- there are a few different theories

  • about how money developed.

  • There is this thing called the coincidence of wants.

  • So maybe I have a sheep and Tadge has some wheat.

  • I am hungry and would like to make bread.

  • Tadge would really like to make a sweater.

  • And so we can barter, we can trade.

  • I have one set of goods that is useful to Tadge.

  • Tadge has another set of goods that are useful to me.

  • We can get together and make an exchange.

  • So that's fantastic.

  • Barter is incredibly important.

  • Barter has existed for a long time.

  • But what if Tadge doesn't have wheat, Tadge has vegetables,

  • and I don't want vegetables.

  • I want wheat.

  • But Tadge still wants the wool from the sheep.

  • How do we execute this trade?

  • We don't have a coincidence of wants.

  • We don't actually want the exact same thing from each other.

  • So some theories are that money evolved out of this problem.

  • And money can be represented in so many different ways.

  • Money, I think, was first created around 5000 BC,

  • so it's really, really, really old.

  • The things that represented money usually

  • had certain properties.

  • They were rare.

  • They were not easily reproducible.

  • People, at times, used things like shells or beads for money.

  • The first coins-- this is like a really interesting coin

  • that was developed.

  • Precious metals were often used for money.

  • And then eventually we sort of evolved

  • into what we think of as money now,

  • which is paper bills, currency.

  • Another theory of how money came about

  • is this idea of receipts, debt and credit.

  • So maybe I have a sheep, and I shear all of my sheep

  • and collect a lot of wool.

  • What I can do is I can store that wool somewhere.

  • And I can get a receipt from someone

  • from having stored that wool, and that receipt is of value.

  • It entitles the person who holds the receipt to the good that

  • is being stored.

  • And so another theory of money is

  • that money evolved out of these receipts,

  • trading these receipts back and forth.

  • Instead of taking all that wool with you,

  • you leave it in one place in a depository,

  • and the receipt acts as a bearer instrument.

  • Whoever owns it has access to the wool in the depository.

  • And so you can kind of see two different ideas

  • about what money is develop from this.

  • One is, well, it's a bead or a coin,

  • or something that I hold, something

  • physical that we've decided to assign value to

  • in and of itself.

  • And another idea is I'm going to use a trusted institution.

  • I'm going to deposit something with that institution,

  • and they are going to ensure the validity of that deposit

  • and manage who has access to that deposit.

  • So this doesn't really get at the question

  • that was originally asked, which is why do tokens have value.

  • But one thing I want to point out is--

  • well, a question I want to ask you guys, actually,

  • is why do these things have value?

  • Does anyone have any ideas?

  • Yes.

  • AUDIENCE: Because everyone agrees that they do.

  • NEHA NARULA: Because everyone agrees that they do.

  • Any other thoughts on why those things have value?

  • Yeah.

  • AUDIENCE: They're also backed by institutions

  • like the government.

  • NEHA NARULA: They're backed by institutions.

  • Say a little bit more about that.

  • What does that mean?

  • AUDIENCE: So the government's kind

  • of promising you to respect the value of that.

  • NEHA NARULA: OK.

  • The government's promising to respect the value of that.

  • Does anyone want to add to that or have another reason?

  • Yes.

  • And say your name.

  • I'm sorry, yeah.

  • AUDIENCE: Jared Thompson.

  • In the example of the dollar, the government

  • is willing to accept it as payment for taxes.

  • NEHA NARULA: Payment for taxes.

  • OK.

  • So that kind of connects the government thing.

  • AUDIENCE: Even if it had no value of any other sort,

  • it has value in that sense.

  • It's the last thing that holds up its value.

  • NEHA NARULA: OK, great.

  • Anybody else?

  • Yes.

  • AUDIENCE: I'm Paul.

  • I think those the three on the front of the dollar,

  • those have inherent value because they

  • might be more rare.

  • NEHA NARULA: They have value because they're rare.

  • OK.

  • Interesting.

  • All right.

  • So those are all really interesting ideas.

  • I think that those are all sort of properties

  • of what makes things valuable.

  • There are definitely things that are rare that are not valuable,

  • right?

  • I can think of some things that might be extraordinarily rare.

  • There's only one or two of them in the universe,

  • and you would have no interest in owning them whatsoever.

  • You wouldn't assign value to them.

  • Certainly it's really important that you can pay taxes

  • with this stuff because taxes is pretty much a requirement

  • of living in any country.

  • There are things that have value that you don't necessarily

  • use for taxes.

  • So that's a little confusing.

  • And then there's this idea that it's backed,

  • that it's backed by something.

  • And the dollar used to be backed by something.

  • And actually, if you look at $1.00,

  • I think it still says this, right?

  • It's backed by the full faith and credit of the United States

  • government.

  • TADGE DRYJA: They don't say that anymore.

  • NEHA NARULA: They don't say that anymore?

  • They used to say that.

  • But that's what a lot of people say about money.

  • It's backed by the full faith and credit of the United States

  • government.

  • What does that really mean?

  • I think what it all goes back to is

  • these things are valuable because we

  • think they're valuable.

  • We've all decided they're valuable.

  • And you know that if you have a $1.00 bill and you want to buy

  • something from someone, they're going to take it,

  • that you can make that exchange.

  • And the reason that they're going to take it

  • is because they know that someone else is

  • going to take it.

  • These things hold value because we think that they hold value.

  • It's a collective story that we all tell.

  • So I think once you look at money that way,

  • then when you start to look at tokens, which are essentially

  • digital representations of these things, things

  • that are rare and a little bit special, then

  • when you ask, well, why does this token have value,

  • because we think it has value.

  • So what makes a token inherently valuable?

  • The fact that we think it's valuable.

  • And a lot of different things can go into that.

  • Maybe we think it's valuable because it's very rare.

  • Or maybe we think it's valuable because someone's

  • promised that you can use it to pay

  • for storage, like with Dropbox.

  • Or maybe we think it's valuable for a completely different

  • reason, because we like the name,

  • or we like the people who are running the network.

  • But ultimately tokens are valuable.

  • These digital representations are

  • valuable because we think they're valuable.

  • Yes.

  • AUDIENCE: And also because they're a limited amount.

  • NEHA NARULA: Name.

  • AUDIENCE: [INAUDIBLE].

  • Because they're a limited amount.

  • NEHA NARULA: Well, so my argument

  • is that the fact that they're limited

  • is something that goes into our perception that

  • makes it valuable.

  • Great.

  • OK.

  • So now that we've learned a little bit about money,

  • talked a little bit about money, I

  • want to go into how payments work because ultimately, we're

  • going to get to cryptocurrencies.

  • And cryptocurrencies are electronic cash.

  • So here's the way that digital payments kind of work right

  • now.

  • You have an institution called a bank.

  • You have Alice and you have Bob, and Alice and Bob

  • have accounts at this bank.

  • And so the bank is keeping track of who owns what.

  • And these are these are records.

  • These might be digital records.

  • They might be paper records, whatever

  • the bank is using to keep track of who

  • has what in their account.

  • And so the way that I've set up this example right now,

  • Alice and Bob both have bank accounts.

  • Alice has $10.00 with the bank and Bob does not have any money

  • with the bank.

  • So let's say that Alice wants to pay Bob.

  • Let's say that Alice and Bob have gotten together.

  • Maybe they're in the same coffee shop.

  • And Alice wants to buy a sandwich from Bob.

  • And Bob says, OK, you need to pay me $1.00.

  • If you give me $1.00, then I'll give you the sandwich.

  • So how can Alice do this?

  • How can she transfer $1.00 to Bob?

  • Well, if she had a paper dollar, she could just do that.

  • But let's say that she doesn't have a paper dollar.

  • So Alice can ask the bank to make this transfer for her--or

  • $5.00.

  • So Alice sends a message to the bank

  • and authenticates with the bank to show the bank

  • that she is, in fact, Alice, but I'm not

  • going to go into the details on how that works.

  • And then the bank confirms that, makes the transfer in its

  • ledger, says Alice now has $5.00 and Bob now has $5.00.

  • Alice tells Bob, hey, I did this.

  • I talked to the bank.

  • Go check.

  • You can verify it for yourself.

  • Bob checks with the bank and sees, yes, in fact,

  • the bank is saying that he has $5.00 now,

  • whereas before he had zero.

  • And then Bob gives Alice the sandwich because he believes

  • that he now has $5.00.

  • And the bank sort of preserved the property

  • that money was not created out of nowhere, that the balance

  • was ultimately maintained.

  • So the bank is very important in this scenario.

  • The bank is critical.

  • This is how digital payments work.

  • Credit cards, Venmo, banks, kind of all sort

  • of based on the same idea, that there's some trusted

  • institution that is handling that payment for us

  • and that is keeping track of everything.

  • Now what are the pros and cons of this scenario?

  • Anyone want to throw a couple out?

  • Yeah.

  • AUDIENCE: The bank can get hacked

  • and people could move money around between the accounts.

  • NEHA NARULA: Right.

  • So we're putting a lot of trust in this bank.

  • And maybe should we trust the bank?

  • Banks fail sometimes.

  • Banks are hacked.

  • Banks have humans who are running them

  • who occasionally might want to change

  • those balances in their favor.

  • This has all happened.

  • Anything else?

  • Yeah.

  • And say your name.

  • AUDIENCE: Brittany.

  • If it's urgent, sometimes you might run into a delay

  • or it might take time with the process.

  • NEHA NARULA: Yeah.

  • Alice has to talk to the banks, and that's kind of annoying.

  • So there's that.

  • Anything else?

  • Yeah.

  • AUDIENCE: And if everyone can actually

  • withdraw at the same time, then the bank can actually

  • get money into the system.

  • NEHA NARULA: So OK.

  • So this is getting a little bit more advanced here.

  • What if everyone takes their balances out at the same time?

  • Well, we need to make sure that the bank actually

  • has that money, so to speak.

  • We're not going to be talking about that problem right now.

  • But very good problem.

  • So to kind of talk through some of the pros

  • and cons of this situation, one of the big pros, I think,

  • is, that even if Alice and Bob are not

  • in the same physical location, Alice can still pay Bob

  • if they can talk to the bank.

  • So it's pretty cool, and that's something

  • you can't do with dollar bills or with coins or with bars

  • of gold.

  • So having this trusted institution

  • that you can communicate with electronically

  • means that Alice and Bob could be halfway

  • around the world from each other and they can still

  • pay each other.

  • So that's pretty awesome, and that is definitely a property

  • that we want to have.

  • In terms of cons, I think we covered quite a few of them,

  • which is we're really putting this bank

  • kind of in the middle of everything here.

  • And there are a few different ways that can cause us trouble.

  • So the bank needs to be online during every transaction.

  • If the bank is offline, then how does Bob

  • know whether he got paid or not?

  • The bank could fail at some point in time, which

  • is kind of related to that.

  • The bank could simply decide that they

  • don't want to do this anymore and can block transactions.

  • And then privacy.

  • The bank has kind of insight into everyone

  • and their payments.

  • And this is incredibly sensitive information.

  • Payments are quite important.

  • And we're going to be talking about privacy

  • a lot in this class, during the second half of this class.

  • So just an example, a couple of visual examples of that.

  • The bank could just totally go away,

  • and then what happens to that ledger?

  • Who knows, right?

  • I mean, literally, it could just disappear.

  • Maybe it's paper and it gets burnt,

  • or maybe it's bits on a computer and it wasn't replicated.

  • The bank could decide that they don't

  • like Alice for some reason, and that they

  • don't feel like processing Alice's transactions.

  • This happens all the time in the real world.

  • So there have been designs for electronic cash

  • that work a little bit differently.

  • And we're going to kind of step up

  • to the design that came right before bitcoin,

  • and we're going to do that iteratively.

  • So let's talk about e-cash and how e-cash works.

  • So the way that e-cash works is Alice

  • tells the bank-- instead of saying,

  • hey, bank, do this transfer for me,

  • Alice says, hey, I would like a digital representation

  • of a coin.

  • Can you give me something that is digital

  • so I don't have to be in the same physical place as you,

  • and that I can use in such a way that I can prove to someone

  • else that I have this thing and that I haven't double spent it,

  • because that's the problem with digital representations

  • of coins.

  • A fundamental problem is that bits can be copied.

  • So whatever system you use to design your electronic cash,

  • you need to make sure that people can't just

  • copy coins and give what is the same coin to multiple people.

  • In the previous example, the bank

  • was making sure this happened.

  • The bank was maintaining balances

  • and debiting Alice's account and crediting Bob's account.

  • But if we want to think about something that doesn't involve

  • the bank, and we're starting to get there,

  • then we need to think about how to ensure that a coin can't be

  • what is known as double spent.

  • So Alice asks the bank for coin.

  • And maybe she has an account with a bank like before.

  • Or maybe she gives the bank teller actual physical money

  • in order to get one of these coins.

  • So the bank generates a unique number--

  • SN stands for serial number--

  • and decides that this is the digital representation

  • of the coin.

  • The bank then gives that coin to Alice in a way

  • that it's clear that the bank did this.

  • Usually this is done using a technique called

  • digital signatures.

  • We're going to get to that as class progresses,

  • but not right now.

  • Once Alice has this coin, then she can give it to Bob.

  • And Bob can take a look at this coin,

  • and hopefully there's enough going on with this coin

  • that Bob can be convinced that this is a real coin.

  • Alice didn't make it up out of nowhere.

  • She actually had the funds, so to speak, to give to Bob,

  • and that it hasn't been double spent.

  • And once Bob is convinced of that,

  • he can give Alice the sandwich.

  • Now in traditional e-cash, the way that this is done

  • is Bob actually goes back to the bank

  • and says, here's this coin.

  • Alice just gave me this coin.

  • Is this an OK coin?

  • But the fact of the matter is that the bank, in this case,

  • has a serial number and knows that it

  • gave that unique serial number to Alice,

  • and then Bob is showing up with a coin that

  • is that serial number.

  • And what the bank is doing here, in this example,

  • is the way that the bank checks to make sure

  • that this coin is correct is it looks at the serial number,

  • and it makes sure that it hasn't been spent before.

  • So the bank can link the coin between Alice and Bob,

  • which is unfortunate.

  • The also still sort of has to be online,

  • not to do the actual payment between Alice and Bob,

  • but in order for Bob to have confidence

  • that this coin is real.

  • And later on Bob can say, I would like $1.00 for this coin

  • that I've just given you, or something like that.

  • Or Bob can have an account with the bank

  • and can maintain a balance there.

  • So just to go through some of the pros and cons here,

  • OK, we've kind of done something where the bank's not

  • in the middle, except the bank is still really in the middle.

  • We're getting a step closer, but we're not there.

  • Alice can technically give Bob this electronic thing that

  • represents value, but Bob still needs

  • to talk to the bank to make sure it's real

  • and it hasn't been double spent.

  • And we still have this problem where the bank is the one who's

  • minting these things.

  • The bank can decide not to give Alice

  • a coin if it feels like it.

  • And we still have this privacy problem

  • because the secret number, the serial number that we

  • invent for the coin, can be linked across these payments.

  • So there's this notion of something

  • called Chaumian e-cash.

  • So David Chaumian is a cryptographer,

  • and he developed this system which

  • has slightly nicer properties than previous forms of e-cash.

  • So the idea here, which is really key,

  • is instead of the bank choosing the secret number,

  • Alice chooses the secret number.

  • And we have ways of generating random numbers

  • that we can be fairly sure are unique.

  • So we can let everybody generate their own random numbers.

  • So in Chaumian e-cash, Alice chooses the secret number

  • that represents a coin.

  • And then Alice blinds her message.

  • So Alice adds some randomness to the secret number such

  • that the bank doesn't know what that number actually is.

  • And we'll get into more detail about exactly what that means.

  • It's all in the paper that was assigned

  • reading for this class, so make sure

  • that you take a look at it.

  • So when the bank verifies that the secret number is a real

  • secret number and it's really a coin,

  • and Alice gave the bank $1.00 or something like that,

  • the bank does so on the blinded secret number.

  • And Alice actually has the ability

  • to remove that randomness, or that blinding, later

  • and end up with a valid signature on a secret number.

  • So Alice does the same thing that she did before.

  • She gives Bob a representation of that electronic coin.

  • And when Bob redeems it, note that the bank never

  • sees what the number is, so when Bob redeems it,

  • the bank has no way of linking the payment between Alice

  • and Bob.

  • So just to get into how this works visually,

  • Alice will talk to the bank, and Alice

  • will use a blinding factor on the secret number.

  • And so when Alice talked to the bank,

  • the bank doesn't actually see what the secret number is.

  • They can't decode it.

  • Again, Alice gives $1.00 or something like that to get this

  • coin from the bank.

  • And the bank signs this.

  • Alice can remove the blinding factor later.

  • And this is what the coin is.

  • The coin is a valid bank signature

  • on the secret number, and also the number itself,

  • which Alice can then send to Bob.

  • Bob can check and make sure that this is a valid signature

  • from the bank.

  • And if that's correct, then Bob can give Alice a sandwich.

  • In order to redeem this, Bob gives this coin to the bank.

  • The bank says, OK, I've never seen the secret number before,

  • and you have my signature on it.

  • So I'm going to assume that I went through this process

  • with somebody and signed something.

  • And now I'm going to record that secret number.

  • Once that happens, Bob can be sure

  • that this coin hasn't been spent before.

  • The bank keeps a running list of all the secret numbers

  • it's seen, and it makes sure that if it ever sees one again,

  • it can say no, this is not correct.

  • I should never see a secret number more than once.

  • Now, OK.

  • But know what about Alice could give one version of that

  • to Bob.

  • Alice could also give a version of that to Charlie.

  • And how are Charlie and Bob supposed

  • to know whose coin is correct?

  • Because remember, we wanted to try to get the bank out

  • of the way when doing this.

  • And so in Chaumian e-cash, the way

  • that this works is the bank actually

  • keeps a bit more information.

  • And the information that the bank is keeping

  • won't let the bank link these transactions together

  • unless Alice happens to give this to two people.

  • And so if Alice gives the same coin to two different people,

  • the bank will be able to detect it

  • and the bank will be able to know it was Alice.

  • And so this is kind of a motivator for Alice

  • not to do that.

  • So the idea being here is that the way that we get around

  • the fact that we don't know if a coin has been double spent

  • or not is we add punishment if the coin is double spent.

  • So Bob doesn't know for sure that this coin he receives

  • hasn't been double spent, but he does know that if it was,

  • someone's going to know it was Alice,

  • and they're going to punish her.

  • So this is a pretty clever scheme.

  • And this actually gets us around a lot of problems.

  • We have digital payments.

  • We can make the actual transfer without the bank in the middle.

  • We have some privacy now because the bank can't

  • link transactions together.

  • And we have this way of doing double spend detection.

  • We have a way of motivating people

  • not to double spend their coins, which means that you probably

  • don't have to check at the time you

  • receive a coin whether or not it's been double spent.

  • Of course, this still suffers from a really big problem,

  • which is that a bank can still decide that they just

  • don't want to do this with you.

  • They can just decide that they don't want

  • to play this game with you.

  • They don't want to issue coins.

  • Maybe they don't like you specifically.

  • Maybe they don't want to take your coins and exchange them.

  • So this scheme, Chaumian e-cash, solves quite a bit of problems

  • when it comes to how do we have electronic money

  • with some nice features, but it doesn't quite

  • get to all of them.

  • And so the real question in this class

  • is how do we do electronic money, really,

  • in a peer-to-peer way, where there's

  • no institution in the way.

  • There's no sort of entity that can say no.

  • TADGE DRYJA: So e-cash, the math is really interesting.

  • It kept relying on these banks and so it never quite

  • got off the ground.

  • So I'm willing to talk about somewhat more abstract and low

  • level primitives.

  • I'm not going to quite get into cash or tokens or transfers

  • or anything this lecture.

  • But I'm going to talk about the really basic primitives

  • that you need that we already sort of mentioned,

  • hash functions and signatures.

  • Signatures, obviously, we talked about a little bit,

  • what you need to be able to sign messages in order

  • to send these tokens around.

  • But first I'll talk about hash functions, which are basically

  • the most fundamental basic thing we use in these systems.

  • And I think if you've used computers,

  • or if you've programmed a little,

  • you probably have some familiarity

  • with hash functions.

  • They're simple, but they're actually extremely powerful.

  • The hash function is basically you

  • have some data, a bunch of bytes,

  • a bunch of ones and zeros.

  • You run it through a hash function

  • and you get an output that's also a bunch of ones and zeros.

  • Generally, the input data can be of any size.

  • You can hash something--

  • put in a megabyte, put in a gigabyte,

  • or put in a single byte, and generally the output

  • is of a fixed size.

  • So in the case of bitcoin, we use Sha-256.

  • The output size is 32 bytes long, or 256 bits long.

  • And this is used for lots of things in computers.

  • I guess the reason they call it a hash is because it's

  • like when you take the potatoes and chop them up

  • into little squares and grill them for breakfast,

  • it's sort of that idea, that we're taking this data.

  • And the data going in gets chopped up and smushed around

  • and then comes out into an output.

  • So this is not a sufficient different definition.

  • But I will say that you can sort of do everything

  • with hash functions.

  • There's some fun things that you can't do,

  • but you could make a cryptocurrency only using

  • a single hash function.

  • And I think people have, sort of for experimental reasons.

  • You limit the fun stuff you can do, but you can do signatures.

  • You can do encryption.

  • You can do all sorts of things like that.

  • OK.

  • So this is not a sufficient definition,

  • that there's any size input, a fixed size output,

  • and the output is random-looking.

  • That's sort of wishy-washy.

  • But what does random-looking mean?

  • It's not actually random.

  • If you put in the same input, everyone

  • will get the same output.

  • So if you say, OK, well, what's the hash of one,

  • you'll get some output.

  • And if someone else says, OK, what's the hash of one,

  • you'll get the same thing.

  • However, the output, while it is deterministic,

  • it's sort of high entropy in that

  • the output should have about as many as one bits as zero bits.

  • If you take the hash of one, it's

  • just going to look like a big random number.

  • And the hash of two will look like a completely unrelated

  • random number.

  • The outputs look like noise.

  • So if you've ever seen hash functions,

  • you can run it on your computer.

  • You say echo.

  • Hello, pipe Sha-256 sum, and you'll just

  • get some kind of crazy, random thing.

  • There doesn't seem to be any order to the outputs.

  • A little bit more well-defined.

  • We usually talk about the avalanche effect,

  • in that changing a single bit in the input

  • should change about half the bits of the output.

  • So even though you have extremely similar inputs,

  • they should be completely dissimilar outputs--

  • well, completely dissimilar, as in about half

  • the output changed.

  • If every bit changes, then it just

  • is the inverse of what you had, and so it's easily correlated.

  • But the avalanche effect is sort of how

  • hash functions are constructed, where generally they're

  • iterative rounds.

  • And so you say, OK, I'm going to swap these things

  • and multiply these things and shift these bits around such

  • that if any change in the beginning

  • will sort of propagate an avalanche, too,

  • so that all the output bits have been affected by it.

  • OK.

  • And a little bit more well-defined.

  • Generally, the hash functions are defined

  • by what they should not do.

  • So the three main things they should have--

  • preimage resistance, second preimage resistance,

  • which I'll sort of skip over, and collision resistance.

  • And we can define what these things are.

  • So a preimage is the thing that came before the output.

  • So it's sort of a math-y term.

  • But the idea is OK, if you know y, you can't find any x such

  • that the hash of x is equal to y.

  • So if I give you a hash output, and that's all I give you,

  • you should not be able to find an input that

  • leads to that output.

  • So if I just say, hey, here's a hash output.

  • It's 35021FF-- whatever, some long string,

  • you won't be able to figure out what

  • I used to put in to get that.

  • Of course, you can find it eventually.

  • For any given y, there's probably some x.

  • In fact, there's probably a lot of x's that

  • will lead to that y.

  • Since y is a fixed length and there's

  • two to the 256 possible y's, but there's

  • an infinite number of x's because x is not

  • bounded in length.

  • You can have a megabyte or a gigabyte or a terabyte size x.

  • So since there are sort of infinite numbers of x's,

  • and a fixed, though very large number of y's, as long

  • as it is a random mapping, there will

  • be lots of different x's that can lead to this y.

  • And so you should be able to find it.

  • It's just impractical.

  • It's like, yeah, you may be able to find it,

  • but it's going to take you two to the 256 tries

  • to find any specific y value.

  • And that's about 10 to the 78, which

  • is a number that's big enough that you can sort of round it

  • up to infinity.

  • Well, I mean, not quite, but big enough

  • that you're not going to be able to compute

  • that, the sun'll burnout and the universe'll die

  • and stuff like that.

  • So that's preimage resistance.

  • You can't go backwards.

  • Given the hash, you can't find what led to that.

  • OK.

  • Any questions about preimage resistance?

  • Seems reasonable?

  • It's a little interesting in that given y is

  • a little tricky, and that it's like, OK, well,

  • someone might know x in order for them to have computed y.

  • Or maybe it's just completely random,

  • and no one actually knows what the x is.

  • So there's a sort of loss of information

  • in the idea of a preimage stack.

  • OK.

  • Second preimage resistance.

  • This one's a little trickier and can get messy.

  • So I'll define it, but we won't go into it too much.

  • The idea is given x and y such that the hash of x

  • is equal to y, you can't find x prime where x prime is not

  • equal to x.

  • And the hash of x prime is equal to y.

  • So we're sort of giving you a preimage.

  • We're saying, hey, here's this number x and here's

  • this result y.

  • I bet you can't find another x that leads to it.

  • This one is actually poorly defined in the literature.

  • And so it's a little like, well, who

  • made x, and who gets to choose, and is it any x prime

  • and things like that.

  • So it's not actually that useful.

  • So we can just sort of gloss over that one,

  • just sort of mentioning it.

  • And then the other one that's very important is collision

  • resistance, where the idea is that nobody can find any x,z

  • pair such that x is not equal to z,

  • but the hash of x is equal to the hash of z.

  • And this one's a lot cleaner in that

  • there's no lack of information.

  • There's no secrets or anything.

  • It's just like, look, no one can find this.

  • And so it's really easy to disprove.

  • You can just say, hey, look, here's an x and here's a z.

  • Try hashing them.

  • Oh, shoot, the hashes are equal.

  • And it doesn't really matter how you got these

  • or who's doing it.

  • So that's a really nice, easy, clear property.

  • And again, you can find this eventually.

  • So if your output size is 256 bits long,

  • you'll be able to find two inputs that

  • map to the same output.

  • In fact, you do not need to try 256 times.

  • I'm not going to go into the details,

  • but you actually only have to try 128 times.

  • Sorry, two to the 128.

  • So you need to take the square root of the number of attempts

  • in order to find this collision because the intuitive reason

  • is, well, you just start trying things and keeping

  • track of all their hashes.

  • And there's what's called the birthday attack, which,

  • as you keep trying them, there's more possibilities.

  • The next thing you try, you can collide

  • with any of these things you've tried before.

  • And so you actually only have to do the square root.

  • And it's called the birthday attack

  • because there's the birthday paradox, which is not really

  • a paradox, but the idea is so in this room,

  • there's people that have the same birthday.

  • It's almost certain, which seems kind of weird

  • because the intuitive thing is, like, well, there's 365 days

  • a year.

  • Maybe once you get 160, 170 people in a room,

  • you're going to have two people with the same birthday.

  • But actually, it's like 22 or something--

  • anyway, that it becomes likely that people

  • have the same birthday.

  • So it's kind of counterintuitive,

  • and it applies in this case as well.

  • So to find a collision, you need the square root

  • of the output space.

  • But a hash function should not have collisions.

  • If you can find a collision, if any collision

  • exists for this hash function, you

  • can consider the hash function broken.

  • It's a little bit different than preimage resistance

  • because it's hard to definitively prove

  • that you've broken preimages.

  • That's something of an interactive process where

  • you say, hey, here's a y, and then someone comes back

  • with an x, and you're like, oh, OK,

  • you prove to me that you can find preimages.

  • But that's hard to tell to the rest of the world

  • because it was sort of interactive,

  • whereas collisions are very clear and non-interactive.

  • You can just say, hey, here's an x and here's a z.

  • Anyone can verify these.

  • Didn't really matter how you got it.

  • OK.

  • So some practical, how do these functions work.

  • Practically speaking, the collision resistance

  • is a harder property.

  • So there are many functions where the collision resistance

  • has been broken where the preimage resistance has not

  • been broken.

  • So examples are Sha-1 and MD5.

  • MD5's a fairly old one written by Ron Rivest over at--

  • well, I guess it wasn't at the Stata Center

  • because it was in the '80s.

  • But this was message digest 5.

  • I guess there were several before that.

  • And that is quite broken.

  • You shouldn't use it.

  • Its collision resistance is trivially broken.

  • You can find collisions in under a second on a modern computer.

  • Sha-1 happened later, in the late '90s, I think,

  • and NSA made it.

  • And there have been collisions found.

  • I think there's really only one collision that's

  • been found, basically, by a team at Google

  • and some Italian university last year.

  • And they spent a lot of computer time to find this collision.

  • But they did find it.

  • And then once you find one, it's sort of like, oh, yeah,

  • we really shouldn't use this anymore.

  • But in both of these cases, sha-1 and MD5,

  • there's no feasible preimage attack.

  • So given a hash output for either of these,

  • you can't find what the input was, or you

  • can't find a different input.

  • So generally, it's a lot easier to make a function

  • strong against preimages.

  • Collisions is sort of harder to deal with.

  • Also, practically speaking, how do these hash functions work?

  • It's a little bit of black magic.

  • There's no proofs that a hash function can even exist.

  • So if you could prove that there is a one-way function,

  • you get the Fields Medal, right?

  • It's like a million dollar prize.

  • So if you can prove it there is such

  • a thing as a hash function, you will be

  • a super famous mathematician.

  • We have no idea that this is even mathematically possible.

  • Or maybe the universe doesn't work this way.

  • It seems to, though.

  • It seems like there are these things that

  • work like hash functions, that work like one-way functions,

  • but we have no proof of that.

  • So even the most fundamental part that everything hinges on,

  • we don't even know if it exists.

  • And then this is sort of closely related,

  • if you're in the computer science-y stuff,

  • like p and mp--

  • anyway, so we don't know that these actually work.

  • And also, in practice, hash functions

  • are not nice math, cool things like elliptic curves

  • and RSA, prime numbers and stuff like that.

  • They're really, if you look at the code, it's sort of like,

  • well, I'm going to take these bytes

  • and I'm going to swap them.

  • And then I'm going to add these two numbers,

  • and then I'm going to rotate the bits over here,

  • and then I'm going to x over these things.

  • And then I'm going to do that 50 times.

  • And why 50?

  • Well, it seems like 50 is a good number.

  • It's not too slow.

  • No, really.

  • It's sort of black magic, Sha-256 uses 64 rounds.

  • Nice even number.

  • Different functions like Blake 2B uses 20 rounds.

  • But then there's also a version that uses

  • 12 rounds, which is faster.

  • And people think, well, it's still seems quite secure.

  • But if you want to be really secure,

  • use the 20-round variant.

  • If you want to be probably secure enough,

  • use the 12-round variant.

  • So there's no proofs.

  • There's heuristics and things like that, and best practices.

  • But this kind of cryptography is a little bit of black magic.

  • And it's not based on any cool mathematical number theory

  • stuff, either, the way that elliptic curve

  • cryptography or RSA stuff is.

  • So if you break RSA, you can say,

  • hey, I can now factor these composite numbers

  • very quickly, that's, in and of itself,

  • a cool mathematical discovery.

  • The breaking of Sha-1, there's not really

  • any cool math insight.

  • It was just like, yeah, we found this fairly specific,

  • weird path that we were able to break Sha-1

  • after a couple of years of computer.

  • So it's cool, and some people are super into it.

  • But it's something of a niche to actually build hash functions.

  • I would recommend not building your own hash function.

  • Yes.

  • AUDIENCE: I'm Wayne, and my question

  • is, is breaking a hash function literally just guess and check,

  • or is there more of a method to it?

  • TADGE DRYJA: So no.

  • If you say, hey, I found a collision

  • by doing two to the 128 attempts.

  • One, nobody's done two to the 128 attempts.

  • That's still seen as like beyond technology today.

  • But if that's how you break the function, that's not really

  • considered a break because that's sort of the definition,

  • is yeah, well, we know this is 256 bits long.

  • So to find a preimage, if you do two to the 256 attempts,

  • you'll find it.

  • So that's not considered a break.

  • A break is considered, hey, I found a preimage in two

  • to the 240 attempts.

  • Or I have a proof that you will be

  • able to find a preimage in two to the 240 attempts,

  • and here's how to do it.

  • And that's considered a break.

  • It's still impractical.

  • Two to the 240's still impossible in today's

  • technology.

  • But if you had a paper and people looked at it,

  • like, oh, yeah, that would work, you wouldn't be able to do it.

  • But that's still considered broken.

  • And so something like MD5, MD5 output size

  • was 16 bytes or 128 bits.

  • So collisions, even if it were strong,

  • it would still be too short today

  • that collisions would be able to be found in two to the 64

  • iterations, which is doable on today's computers.

  • If you run a bunch of stuff on AWS, you can do two to the 64

  • in a couple of days.

  • But that's the different definitions

  • of breaking the function.

  • Sort of fun.

  • Ethan Hellman, who's at BU and we

  • work with, he-- and we all broke the IOTA wrote their own hash

  • function, which is like some cryptocurrency.

  • And we found collisions in it.

  • And it was kind of fun.

  • But yeah, it was weird.

  • It wasn't like number theory.

  • It was just like, oh, well, I wrote this Python script

  • and we have this go script, and we tried this thing

  • and we got a collision.

  • So it was kind of fun.

  • So usages.

  • What do you use these hashes for?

  • There's lots of cool things you can use them for.

  • use them sort of as names or references,

  • where instead of naming a file, you can just

  • take the hash of a file.

  • And that is a good, compact representation so you can point

  • to what you're talking about .

  • So the hash of a file is a unique representation.

  • And if you change any bit in that file,

  • the hash will change.

  • And so you know that, OK, here's this way to point to a file.

  • You can also use it as sort of a reference or pointer

  • in different algorithms.

  • So you can say, anything you're using pointers for,

  • linked lists or maps and stuff like that, you can say,

  • well, I'm going to use a hash as a pointer

  • and then be able to sort through it that way.

  • So anytime you think of pointers and graph theory and stuff

  • like that in computer science, think, well,

  • could I use a hash function here instead of just

  • like regular memory pointer?

  • And in many cases, you can.

  • In some cases, you can't.

  • So you can't have cycles.

  • So the idea is you can't find preimages,

  • you won't be able to find a--

  • whereas you could make a cycle of pointers in a computer,

  • where A points to B, B points to C,

  • C points back to A. You shouldn't

  • be able to produce that with hash functions

  • because having that cycle means, OK, well, somehow you

  • found this preimage.

  • But in many cases, you can do this.

  • And another way to look at it is the hash is a commitment.

  • You can say, well, I'm not going to tell you what x is,

  • but I'll tell you what y is, and I can reveal x later.

  • And then, since everyone remembers y,

  • they can be sure that yeah, he's revealing the right thing.

  • There are no collisions in this function,

  • so we can be sure, if we're presented

  • with x, that this was the x that was committed to yesterday.

  • So I'll give a little example of that, of commit and reveal.

  • So you can commit to some kind of secret or something

  • you want to reveal later and reveal the preimage.

  • So here's my commitment.

  • This is an actual hash, Sha-256.

  • I just made it on my computer.

  • And there is a string.

  • There's an Ascii string that maps into this,

  • and it is a prediction about the weather,

  • but that's all I'll say.

  • And given that information and given this hash,

  • you probably can't find my prediction.

  • You can try to try all these different Ascii strings

  • about the weather today, but I'll reveal it.

  • So I think it won't snow Wednesday.

  • But I think it actually-- anyway,

  • and then I put this number in.

  • And so if you put this in your computer in Linux--

  • I think in Mac it's a slightly different command.

  • It's like Sha-2 or something.

  • But in Linux, this will work, and you can say,

  • I think it won't snow Wednesday.

  • And then I put some random numbers here

  • because if I had committed to just the phrase,

  • I think it won't snow Wednesday, you

  • might have been able to guess that.

  • You could say, well, he said it was about the weather.

  • I'm going to take all sorts of millions

  • of different strings related to days

  • and weather and common English words,

  • and I'm going to try hashing them

  • and see if I find a collision.

  • And you might be able to.

  • But I added this four bytes of randomness

  • at the end to make that difficult.

  • It doesn't really contribute to my commitment.

  • And you know this doesn't really mean anything.

  • But it makes it harder to guess what my input was

  • because I've already revealed that it's not

  • a fully random input.

  • So you might be able to guess things.

  • So I could say, hey, I'm going to make

  • a prediction about the weather, commit to it,

  • and then reveal my prediction tomorrow.

  • And we'll see if I was right.

  • This can be useful in the case where--

  • not the weather, but in other things--

  • if knowing my prediction could influence the actual events,

  • this would be a nice way to commit to what my prediction is

  • without everyone knowing what the prediction is

  • and then revealing it the next day.

  • Yes.

  • AUDIENCE: What are the use cases for double hashing,

  • like where you would hash that hash?

  • TADGE DRYJA: Hashing this again?

  • Well, so in bitcoin they hash everything twice.

  • Generally, you don't need to.

  • There's no explanation for why they do that in bitcoin.

  • You could.

  • But there are things you can construct

  • where you can, say, append some extra data

  • and then hash this again.

  • So you can say, here's my prediction for next week.

  • And this is the hash, and then hash it again.

  • So you can make chains of commitments

  • and then reveal iterations of it.

  • Actually, I had some slides where

  • you can sort of hash something again and again,

  • and start revealing it incrementally.

  • That might be useful.

  • I actually have stuff like that in software.

  • I've written where you want to reveal secrets.

  • But let's say I want to reveal secrets,

  • but I don't want everyone to have to store all of them.

  • So I can make a chain of hashes, commit to the last one,

  • and then as I reveal successive preimages,

  • you don't have to store all of them.

  • You can just store the latest preimage,

  • and you can reconstruct all the hashes from that.

  • Yes.

  • AUDIENCE: But is it computationally

  • difficult to run double hashes?

  • TADGE DRYJA: So to evaluate--

  • if you want to try this, it's imperceptible.

  • To perform one Sha-256 hash is, I

  • don't know, a billionth of a second or something.

  • You can generally do, like, 100 megabytes

  • to a gigabyte of hash output on a regular CPU.

  • NEHA NARULA: I think she's asking

  • does it make it harder to find a preimage if you hash twice,

  • and the answer's no.

  • TADGE DRYJA: The answer's sort of no.

  • It might.

  • So I don't know, chained MD5, can you still find collisions?

  • I'm not sure.

  • But generally the thinking is, if the hash function is broken,

  • and you can either find collisions or preimages, yeah,

  • maybe it gets a little harder by iterating it.

  • But you should just stop using it

  • and use something that's secure.

  • But yeah, it seems that finding preimages

  • would be harder since it's essentially adding more

  • rounds by hashing it twice.

  • And then there are some attacks, so it's fairly out there.

  • But it's called length extension attacks

  • due to how hash functions are constructed,

  • where if you do say, OK, I'm going to take the hash

  • and then take the hash of that, you

  • do prevent certain types of attacks that are fairly niche.

  • But a length extinction attack in

  • a Merkle-Damgard construction will be prevented by this.

  • So generally, no.

  • Generally, you don't need to do this.

  • But there are different constructions where you're

  • going to hash a bunch of times.

  • I don't have the slides here but, like a Merkle tree

  • is a binary tree of hashes where you're

  • taking the hashes of these things

  • and then hashing it again and again,

  • and that's a really useful data structure.

  • And a blockchain is essentially a chain of hashes.

  • And that's what we'll talk about next week.

  • But yeah.

  • OK.

  • So I'm going to go a little faster.

  • So that's an interesting use case where

  • you can commit and reveal.

  • And yeah, adding randomness so you can't guess the preimage.

  • This is called a hash-based message authentication

  • code where part of it is secret and part of it is not.

  • And this is getting towards a signature, where I've committed

  • to something, and then I reveal it,

  • and everyone knows, yeah, that must be what

  • he committed to the day before.

  • It's not quite a signature, but it's getting to that direction.

  • And so next I'm going to talk about signatures.

  • What is a signature?

  • It's useful, and it's a message signed by someone.

  • And so I'll define what a signature

  • is through the functions that it uses.

  • There's three functions will allow

  • you to create a signature scheme,

  • generate keys, sign, and verify.

  • And these different things.

  • Generate keys, you make a secret key and a public key.

  • And so the idea is there's some public key which is

  • your identity , and there's some secret key which you only

  • control.

  • And you use that to prove your identity

  • and prove that these messages are signed by you.

  • So yeah, you generate a key pair.

  • The holder of the secret key can sign a message.

  • And then anyone possessing a public key

  • can verify a message signature pair.

  • So I'll go into detail on these three functions.

  • And this applies generally.

  • So I'm going to talk about a hash-based signature in detail,

  • but there are many different signature schemes.

  • DSA, ElGamal, RSA signatures, elliptic curve signatures.

  • There's tons of different cool math systems that

  • allow these kinds of functions.

  • And I'll talk about in some ways,

  • this is one of the simplest ones.

  • So yeah, there's these three functions.

  • The first one is generate keys.

  • And it returns a private key public key pair.

  • And it generally doesn't take any arguments,

  • but it takes in randomness.

  • You need to flip coins.

  • You need to find random one and zero bits.

  • And it has to be long enough that no one else can

  • guess what your private key is.

  • So you have a private key, public key.

  • Public key is public.

  • You tell everyone.

  • Private key is more secret key.

  • Actually, I think in the code, I always say secret key.

  • It's usually better to say secret key because at least it

  • starts with a letter that's not p.

  • OK.

  • And then the signing function, where

  • you take your secret key and your message,

  • and it signs a message and returns a signature.

  • All these things are just strings of ones and zeros.

  • It's just a bunch of bytes.

  • Public key, a private key, a signature, a message.

  • These are all just bytes.

  • And then the verify function, which is the most complex.

  • A verify function takes a public key that you've seen,

  • a message, and a signature.

  • And it returns a Boolean whether this was valid or not.

  • So it returns a single bit.

  • If it's zero, it says, yeah, these two things

  • don't match up.

  • Maybe the message just changed, or maybe the signature

  • has changed, or maybe it's from a different public key

  • or something.

  • But if all three of these are correct,

  • and the signing function was the private key--

  • the secret key associated with this public key--

  • was signed to this message and produce this signature,

  • then it will return true.

  • And so you get into the math properties

  • of what does it mean to forge a signature,

  • and can they be forgeable computationally?

  • Eventually a lot of these things, since it's bits,

  • you could eventually guess the forgery.

  • But maybe that takes two to the 256 attempts or something.

  • OK.

  • So any questions about the basic structure of what

  • constitutes a signature scheme?

  • Mostly make sense?

  • And you can see how this is useful.

  • You can publish a public key and say, hey, I'm Tadge.

  • This is my public key.

  • And in fact, on my business card, I have a RSA public key.

  • And so if people get my business card and then I sign a message

  • and email it to them, they could be sure that, oh,

  • this is probably the same guy.

  • Nobody ever cares.

  • But it's useful for the stuff we were talking about

  • before with Chaumian cash, where Alice needs to authenticate

  • to the bank, and one way to do it is to sign a message

  • and say, hey, I'm Alice, give me a coin.

  • And then Alice can sign a message to Bob and so on.

  • So this is really useful as a basic building block for all

  • these kinds of messages.

  • So I'll talk in the last 14 minutes

  • about signatures from hashes.

  • This is doable.

  • Using just hash functions, you can construct a signatures key.

  • And in fact, that's the first problem set.

  • And you implement a signature system using only hashes.

  • And the hash function is already defined for you.

  • It's in the standard library.

  • It's just Sha-256, the same thing bitcoin uses.

  • And this is called Lamport signatures.

  • Leslie Lamport wrote about this late '70s.

  • I forget exactly when the paper came out.

  • But this was one of the earliest cryptographic signature

  • schemes.

  • And it's kind of cool.

  • And another fun thing is it's quantum resistant.

  • So if you know about quantum computers,

  • quantum computers kind of ruin all the fun

  • in terms of cryptography.

  • All the cool things we can do with cryptography-- not all,

  • but most of them get ruined by quantum.

  • Computers but hash functions are quite

  • resistant to quantum computers because they're not

  • based on any fun math.

  • They're based on this black magic

  • of just XORing and shifting numbers around.

  • That's a huge oversimplification.

  • But yeah, so those hash functions

  • are generally seen to be quantum-resistance.

  • So if you have a signature scheme that

  • only uses hash functions, well, it still

  • works, even if someone invents a quantum computer

  • and can break all these other things,

  • like RSA and elliptic curves.

  • So there's actually renewed interest

  • in these kinds of systems recently.

  • OK.

  • So how do you make a signature scene with just hash functions?

  • So how do you generate a key, in this case?

  • So a public key and a private key you want to generate.

  • So first we generate our private key.

  • Now these squares are 32 bytes each,

  • and you generate 256 of them on this row,

  • 256 of them on that row.

  • So you're generating 256 times two, or 512 32-byte blocks.

  • And these blocks are each 256 bits or 32 bytes.

  • So in total, that's what, 8K?

  • Eight kilobytes, I think.

  • Pretty big.

  • But anyway, you're saying, OK, here's my private key.

  • It's all completely random.

  • I just take slash dev slash urandom or whatever,

  • just flip coins 8,000 times, or however many this is total,

  • and generate all these different blocks

  • and store them on my hard drive and keep it secret.

  • Then I want to generate the public key.

  • So for each of these 32-byte blocks,

  • I take the hash of it, which will also be 32 bytes.

  • So there's now 512 hashes, 256 on this row, 256 on this row.

  • The green will be my public key.

  • And the gray one is my secret key.

  • So they all look the same.

  • They all look like just a bunch of random ones and zeros.

  • The gray ones actually are a bunch of random ones and zeros.

  • The green ones are actually hashes, though,

  • of all the gray ones.

  • And I publish the green ones.

  • Just to serialize it, I just put in a row.

  • I say, OK, here's this first 32-bit, second, third, fourth,

  • and then go to this row or whatever scheme you want.

  • So how is this useful?

  • Now everyone knows a bunch of hashes,

  • and I know a bunch of the preimages.

  • So now it's sort of this commit reveal thing, where

  • if I reveal to you this, you can verify that, oh,

  • yeah, that mapped to this one later on.

  • Any questions so far about this process?

  • Seems sort of useless but fairly straightforward.

  • OK.

  • Then I want to sign.

  • So first, to sign a message, I'm going

  • to take the hash of the message to sign.

  • And this is often done.

  • It's done in bitcoin.

  • It's done in most signature schemes, where I want

  • a fixed length number to sign.

  • It's annoying to have to say, well,

  • what if I want to sign a megabyte long file,

  • or what if I want to sign of 10-byte long string?

  • You want to standardize it.

  • So whatever I'm signing, it's always 256 bits long.

  • So if I want to just sign the message hi, first

  • I take the hash of the message hi, which in Sha-256, this

  • is the hash of hi.

  • And so I look at this as 256 bits, and I say,

  • OK, I'm going to pick the private key blocks to reveal

  • based on the bits here.

  • So the first bit here is a one, because it's an 8.

  • And so I'll reveal.

  • And I indicated before that there's

  • this zero row and this one row.

  • And now what that means is, well, the first bit

  • of my message to sign is a one.

  • So I'm going to reveal this gray square.

  • And the next bit, the next four bits, actually,

  • since it's an eight, are going to be zero.

  • So I'll reveal this and then I'll

  • reveal this, this, and this.

  • And I just made it up.

  • But yeah.

  • So for example, if I'm signing, and it starts with 01101110,

  • I reveal this preimage, this preimage, this preimage,

  • this preimage, these three, this one.

  • And so I reveal preimages based on the bit representation

  • of the message I'm trying to sign, and then

  • give everyone these.

  • So my signature will just be this sequence.

  • I can go in row order here.

  • Yeah, it's probably a lot easier.

  • So I go in sequence.

  • I say, OK, here's the first 32 bytes of my signature.

  • Here's the next, here's the next, here's the next.

  • And so my signature ends up being 256 blocks long,

  • each of which are 256 bits.

  • So it's like 8K.

  • The keys are 16K and this is 8K or something.

  • Fairly big but totally doable on a computer today.

  • Eight kilobytes is no big deal.

  • OK.

  • Now to verify, take the signature,

  • hash each block of the signature,

  • and see that it maps into that part of the public key.

  • So the people who are verifying the signature,

  • they have your public key.

  • They have all the green squares.

  • And now they have been given a signature, which

  • is these gray squares, and they say, OK,

  • well, let me hash this one.

  • Oh, it maps to that, so it maps to a zero.

  • Oh, this maps to a one, this maps to a one,

  • this maps to a zero.

  • And they can go through and say yeah,

  • this is a signature on that message.

  • In the case of Lamport signatures,

  • you can actually determine what the message is just

  • from the signature in the public key.

  • If you're given this and you're not

  • told whether it's a one or a zero, well, just compare.

  • Hash it and compare to these two green ones.

  • You'll be able to see.

  • And that's a useful signature because no one

  • can forge that because no one knows

  • these preimages except for the person who

  • holds the secret key.

  • So given your public key, I can't forge a signature

  • from you.

  • Once the signature is issued, I also can't forge a signature.

  • The only bit sequence I know is the one that you revealed.

  • And so I know part of your private key.

  • I know half of it.

  • But that half only lets me sign the message you just signed.

  • So I can't really do anything extra with this.

  • So this is a usable signature scheme.

  • I think I just showed it.

  • But any downsides that you can think of with this?

  • AUDIENCE: You can only sign one.

  • TADGE DRYJA: Yeah, you can only sign one.

  • Is that what you were--

  • AUDIENCE: You could also send the same message

  • on to someone else with different signatures.

  • TADGE DRYJA: Yeah, but signatures are sort of public.

  • So yes, you're saying that you can

  • sign a message once and give it to a bunch of people.

  • And that's sort of a feature, not a bug, I guess.

  • There are different signature schemes where you want,

  • I only want this signature to be valid to this person.

  • There's different ways to do that

  • with Diffie-Hellman key exchange and stuff.

  • But the signature scheme we've talked about here

  • with these three functions, the public key is really public,

  • and anyone can verify.

  • And that's something we want.

  • If you don't want that, there's other ways to do it.

  • But yeah, the big one is, wait, you can only sign once.

  • Once you generate a key pair, your private key,

  • your public key, and you tell everyone these green squares,

  • if you're try to sign again, you will reveal more pieces

  • of your private key.

  • So if I sign two different messages,

  • sometimes it's the same bit.

  • Sometimes it's different bits.

  • And now I start revealing more pieces of my private key.

  • And now people can start to forge signatures

  • because I can say, OK, well, the first bit,

  • I can sign anything on the first bit.

  • I'm still constrained here and here and here.

  • But in several locations, I can sign whichever bit I want.

  • And so the basic thing is, if there's one signature,

  • I can't forge anything.

  • If you give me two signatures, since it's generally random,

  • on average, half of the bits of the signature

  • will be constrained.

  • So in this case, if it's 256 bits long and you sign twice,

  • I probably still can't forge anything because 128 bits,

  • I have the freedom to pick either.

  • And the other 128 bits, I'm stuck with the one or the zero

  • and I don't get to choose.

  • So that means most messages I want to sign,

  • I won't be able to because if I tried two to the 128 attempts,

  • I'll be able to find a forged signature.

  • But that's a lot.

  • And so maybe you can sign twice.

  • But again, it's probabilistic.

  • You might get unlucky and reveal quite a bit

  • more than 128 bits, where you get both.

  • But on average-- and then once you have three signatures, OK,

  • now I've probably revealed 3/4 of the locations

  • you're going to have both the one and zero row.

  • And you can start-- and this starts

  • to be practical because in this case,

  • you'd need a 2 two the 64 attempts to forge a signature.

  • And that's doable on today's computers.

WOMAN: The following content is provided under a Creative

Subtitles and vocabulary

Click the word to look it up Click the word to find further inforamtion about it