Unicorn AI - Computerphile

  • In the previous video we were talking about transformers, this architecture that uses attention to give unprecedentedly good performance on language modelling tasks, and some other tasks as well.

  • We were looking at language modelling in preparation to make a video about GPT-2, which is this very giant language model that was recently, well, recently not released, actually, by OpenAI.

  • The way that they generated the data set for this is pretty cool.

  • To get enough text they went to Reddit, and they pulled every website that is linked to from Reddit.

  • Do we have any idea of how many? Lots. Literally everything that had more than three karma, I think, or maybe more than two karma, something like that.

  • Anything that somebody had thought to post, and that at least two or three other people had thought was good enough to upvote.

  • They scraped the text from that.

  • It's pretty much just a transformer; the architecture is not especially novel. They haven't made any amazing new discovery.

  • But what they realised was this:

  • With transformers, it seems like the more data you give them, the better they do, and the bigger you make them, the better they do.

  • Everything that we've built up until this point clearly hasn't hit the limits of what this can do. They thought: we're probably bottlenecked on data, and maybe on network size.

  • So what happens if we, like, turn that up to eleven? What happens if we just give this all the data and make a really big one?

  • It makes sense to talk about the acronym, right? So it's a Generative Pre-trained Transformer.

  • "Generative" is the same as in generative adversarial network: it generates outputs, it generates samples.

  • "Pre-trained" is this thing I was talking about: all of the different things you can use a language model for, right? You can do translation, you can try and resolve ambiguities, you can do summarisation, you can answer questions, you can use the probabilities for augmenting other systems.

  • So yeah, there's a bunch of different benchmarks for these different tasks that you might want your language model to do.

  • This is what we talked about in the grid worlds video: having these standardised problems with standardised metrics and standardised data sets, so that if you're comparing two different methods, you know that you're actually comparing apples to apples.

  • And this is very important, because it gives you numbers on these things, which is often quite difficult. Especially when you're generating samples of text, it's like: how plausible is this text, how realistic does it look, how do you put a number on that? It's kind of difficult. So there's all of these standardised metrics.

  • And the thing that people came to realise, which, I mean, I say that as though it's some amazing discovery, it's fairly obvious, is this: if you train your system in an unsupervised way on a large corpus of just general English text, and you then take that and train it further with the data from this benchmark or the data from that benchmark, you can fine-tune it.

  • So you start with something which has a decent understanding of how English works, more or less, and then you say: now I'm going to give you these samples for, like, question answering, or I'm going to build a system using that to go for this benchmark.

  • So it's pre-trained: you start with something that's a general-purpose language model, and then from that you fine-tune it to whichever actual benchmark or problem you're trying to solve.

  • And this can give you better performance than starting from nothing and training for each of the benchmarks from scratch.

  • Make sense?
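As a very rough illustration of that pre-train-then-fine-tune idea (just a sketch, assuming the Hugging Face transformers library and its small public GPT-2 checkpoint, not the code OpenAI actually used, and with made-up benchmark examples), fine-tuning is simply continuing training of the already-trained weights on the benchmark's own data:

```python
# Sketch: start from a general-purpose pre-trained language model...
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # already trained on general English text
model = GPT2LMHeadModel.from_pretrained("gpt2")

# ...then fine-tune it on (hypothetical) task-specific examples from some benchmark.
benchmark_texts = [
    "Q: What colour is the sky? A: Blue.",
    "Q: How many legs does a spider have? A: Eight.",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for text in benchmark_texts:
    batch = tokenizer(text, return_tensors="pt")
    # Same next-token prediction objective as pre-training, just on the benchmark data.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The only difference from training from scratch is the starting point: the weights already encode a decent model of English before they ever see the benchmark data.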

  • And so the point of the GPT-2 paper, the thing that makes it cool, is they said: okay, if we make a really huge one, what if we don't fine-tune it at all?

  • What if we just make a giant model and then just try and run it on the benchmarks without messing with it, without showing it any of the specialised data for that benchmark? Just the raw, general-purpose language model: how does that perform?

  • And it turns out: surprisingly well.
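Again just a sketch, assuming the Hugging Face transformers library and the small released GPT-2 checkpoint: "running it on a benchmark without messing with it" amounts to prompting the raw model. The GPT-2 paper, for instance, gets zero-shot summarisation just by appending "TL;DR:" to an article and letting the model continue:

```python
# Sketch: zero-shot use of the raw model, no fine-tuning at all.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

article = "..."                      # some news article text (placeholder)
prompt = article + "\nTL;DR:"        # the prompt alone tells the model what task to do

inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_k=50)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]))  # just the continuation
```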

  • So this is a very, very large data set for text: it's about 40 gigabytes, which actually doesn't sound like very much, but for text that's insane, right?

  • Somebody said that this was the size of Google's entire index of the Internet in '98. So yeah, it's a lot of text.

  • And they trained it on that, and they ended up with a 1.5-billion-parameter model, where a previous state-of-the-art system was 345 million parameters. This is 1.5 billion.

  • So they've just made the thing much, much bigger, and it performs really well. Some of the samples that they published quite captured the public imagination, you could say.

  • And now that we've talked a little about the problems that neural networks, or any language model really, have with long-term dependencies, we can realise just how impressive these samples are.

  • If you look at them uninitiated, you're like: yeah, that's pretty realistic, it seems to make sense, it's cool. But when you look at them knowing how language models work, the coherence, the consistency and the long-range dependencies are very impressive.

  • So we can look at this one that got everybody's attention, the unicorns one, right?

  • So they prompted it with: "In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English."

  • And from there, you go to your language model, GPT-2, and you say: given that we started with this, what's the next word? And what's the word after that? And so on.
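A minimal sketch of that loop (assuming the Hugging Face transformers library and the small publicly released GPT-2 checkpoint, rather than the full 1.5-billion-parameter model): feed in the prompt, sample a next token from the model's probability distribution, append it, and repeat.

```python
# Sketch: repeatedly ask "given everything so far, what's the next word?"
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = ("In a shocking finding, scientists discovered a herd of unicorns living in a "
          "remote, previously unexplored valley in the Andes Mountains. Even more "
          "surprising to the researchers was the fact that the unicorns spoke perfect English.")
ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(60):                           # one new token per iteration
        logits = model(ids).logits[0, -1]         # distribution over the next token,
        probs = torch.softmax(logits, dim=-1)     # conditioned on the whole text so far
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
```

Every new token is conditioned on the entire sequence generated so far, which is where attention across long distances comes in.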

  • So it goes on: "The scientist named the population, after their distinctive horn, Ovid's Unicorn. These four-horned, silver-white unicorns were previously unknown to science."

  • We do have a clue here, as a human being: four-horned unicorns doesn't quite make sense. But nonetheless, we're going: okay.

  • "Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved. Dr. Jorge Pérez", that's J-O-R-G-E, "an evolutionary biologist from the University of La Paz..."

  • This is impressive, because we've mentioned the Andes Mountains in our prompt, and so now it's saying: okay, this is clearly, you know, "in a shocking finding", this is a science press-release news article. It's seen enough of those, because it has every single one that was ever linked to from Reddit, right? So it knows how these go.

  • It knows: okay, third paragraph, this is when we talk about the scientist, we interview the scientist, right? Okay.

  • First word of the scientist paragraph: "Dr." Obviously, right, because now we're in the name of the scientist. What name are we going to give? It needs to be a name, and conditioning on the fact that we have the Andes Mountains, we're in South America, so the name probably should be Spanish, or maybe Portuguese. So we get Dr. Pérez here.

  • And then "evolutionary biologist" makes sense, because we're talking about animals.

  • "From the University of La Paz", again: this is the first sentence, and when you have that first clause that introduces the scientist, you always say where they're from. So we say "from the University of", and university names tend to be the name of a city. What's a city where we have the Andes Mountains? So we're going to Bolivia: La Paz. Perfect.

  • And the thing that's cool about this is that it's remembered all of these things from quite a long time ago, several sentences ago. Well, it hasn't remembered them; it's paid attention to them across that distance, which is impressive.

  • But also, this is encoding a bunch of understanding, a bunch of information about the real world. All it was given, all it knows, is statistical relationships between words, but the way that it comes out to us is that it knows where the Andes Mountains are, what kind of names people in that area have, what their cities are, what the universities are, all of those facts about the real world.

  • Because in order to have a really good language model, it turns out you have to kind of implicitly encode information about the world, because we use language to talk about the world, and knowing what's likely to come next requires actual real-world understanding.

  • And that's something that we see in some of the other things that they got it to do: you can see the real-world understanding coming through.

  • Let's keep going

  • "...from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans." Pérez, see, we're hanging on to him, yep, we're referring to him again, but now we've changed it to be just the surname, because that's the format that people use in news articles.

  • "Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow. Pérez and the others then ventured further into the valley."

  • Round about here in our article we should have a quote from the scientist, right? Quote: "By the time we reached the top of one peak, the water looked blue, with some crystals on top." And we're talking about this fountain, I guess, this natural fountain; we're referring back to the previous bit. It's like everything is relying on, contingent on, earlier parts of the text.

  • Then, with a paragraph snipped: "While examining these bizarre creatures, the scientists discovered that the creatures also spoke some fairly regular English." Now, when I read that, I'm like: okay, this is now unusually good.

  • Because that's the second sentence of the lead, right? We're six paragraphs in, and it knows: at this point I've covered the first sentence of the initial paragraph, now it's time to talk about the second sentence of the lead, "even more surprising to the researchers was the fact that they spoke English".

  • It completely ignored the speaking-English part until it got to the part of the news article where that comes in. You've gone six whole paragraphs. The idea of accurately remembering that the unicorns speak perfect English, that's very impressive to me.

  • And then it goes on and gets a little bit unhinged.

  • It starts talking about: "It's likely that the only way of knowing for sure if unicorns are indeed the descendants of a lost alien race is through DNA." That's Reddit, really. Well, it's not actually stuff on Reddit, it's stuff linked to from Reddit. But yeah, this is news articles, man.

  • "They seem to be able to communicate in English quite well, which I believe is a sign of evolution, or at least a change in social organization," said the scientist. That's his evolutionary biology there, right? Right, right. Yeah, we know he's an evolutionary biologist.

  • So the coherence of this text is really dependent on its ability to condition what it's generating on things that it generated a long time ago.

  • So yeah.

  • So it can generate really nice news articles, and it can generate all kinds of text, anything that is sufficiently well represented in the original data set.

  • So that's GPT-2: it's a really unusually powerful and, like, versatile language model that can do all of these different natural language processing tasks without actually being trained specifically on those tasks. And that's why it's impressive.

  • It's not that it's a brand new architecture or a brand new approach or whatever. It's just that when you make these things really huge and give them tremendously large amounts of data, the results are really impressive.

  • Anything sufficiently well represented in the original data set: it will write you Lord of the Rings fan fiction, it will write you cake recipes. There's all kinds of examples of different samples. Here's a recipe for some kind of peppermint chocolate cake, and it's got a bunch of different...
