

  • Hello, welcome to the 12 days of OpenAI.

  • We're going to try something that as far as we know, no tech company has done before, which is every day for the next 12, every weekday, we are going to launch or demo some new thing that we built.

  • And we think we've got some great stuff for you starting today.

  • We hope you'll really love it.

  • And you know, we'll try to make this fun and fast and not take too long, but it'll be a way to show you what we've been working on and a little holiday present from us.

  • So we'll jump right into this first day.

  • Today, we actually have two things to launch.

  • The first one is the full version of O1.

  • We have been very hard at work.

  • We've listened to your feedback.

  • You like O1 Preview, but you want it to be smarter and faster, be multimodal, be better at instruction following, and a bunch of other things.

  • So we've put a lot of work into this.

  • And for scientists, engineers, coders, we think they will really love this new model.

  • I'd like to quickly show you how it performs.

  • So you can see the jump from GPT-4o to O1 Preview across math, competition coding, and GPQA Diamond, and you can see that O1 is a pretty big step forward.

  • It's also much better in a lot of other ways, but raw intelligence is something that we care about.

  • Coding performance in particular is an area where people are using the model a lot.

  • So in just a minute, these guys will demo some things about O1.

  • They'll show you how it does at speed, how it does at really hard problems, how it does with multimodality.

  • But first, I want to talk just for a minute about the second thing we're launching today.

  • A lot of people are power users of ChatGPT at this point; they really use it a lot, and they want more compute than $20 a month can buy.

  • So we're launching a new tier, ChatGPT Pro.

  • And Pro has unlimited access to our models and also things like advanced voice mode.

  • It also has a new thing called O1 Pro Mode.

  • So O1 is the smartest model in the world now, except for O1 being used in Pro Mode.

  • And for the hardest problems that people have, O1 Pro Mode lets you do even a little bit better.

  • So you can see it on competition math.

  • You can see it on GPQA Diamond.

  • And these boosts may look small, but in complex workflows where you're really pushing the limits of these models, it's pretty significant.

  • I'll show you one more thing about the Pro Mode.

  • So one thing that people really have said they want is reliability.

  • And here, you can see how the reliability of an answer from Pro Mode compares to O1.

  • And this is an even stronger delta.
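The video doesn't define the reliability metric shown on screen. One common strict-consensus definition (this is a sketch of that general idea, not necessarily OpenAI's exact metric, and every name below is hypothetical) counts a problem as solved only if the model answers correctly on all k independent attempts:

```python
import random

def reliability(solve, problems, k=4, seed=0):
    """Fraction of problems answered correctly on all k independent attempts.

    `solve(problem, rng)` is a hypothetical stand-in for sampling one model
    answer; it returns True if that attempt is correct.
    """
    rng = random.Random(seed)
    solid = sum(all(solve(p, rng) for _ in range(k)) for p in problems)
    return solid / len(problems)

# Toy stand-in: each "problem" is just a per-attempt success probability.
problems = [0.95, 0.9, 0.6, 0.99]
solve = lambda p, rng: rng.random() < p

print(reliability(solve, problems))
```

Under this definition a model that is right 90% of the time per attempt only clears the 4-of-4 bar about 66% of the time, which is why a small per-attempt accuracy boost can show up as a much larger reliability delta.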

  • And again, for our Pro users, we've heard a lot about how much people want this.

  • ChatGPT Pro is $200 a month, launches today.

  • Over the course of these 12 days, we have some other things to add to it that we think you'll also really love.

  • But unlimited model use and this new O1 Pro Mode.

  • So I want to jump right in and we'll show some of those demos that we talked about.

  • And these are some of the guys that helped build O1 with many other people behind them on the team.

  • Thanks, Sam.

  • Hi, I'm Hyungwon.

  • I'm Jason.

  • And I'm Max.

  • We're all research scientists who worked on building O1.

  • O1 is really distinctive because it's the first model we've trained that thinks before it responds.

  • This means it gives much better, often more detailed, and more correct responses than other models you might have tried.

  • O1 is being rolled out today to all Plus and soon-to-be Pro subscribers on ChatGPT, replacing O1 Preview.

  • The O1 model is faster and smarter than the O1 Preview model, which we launched in September.

  • After the launch, many people asked about multimodal input, so we added that.

  • So now the O1 model live today is able to reason through both images and text jointly.

  • As Sam mentioned, today we're also going to launch a new tier of ChatGPT called ChatGPT Pro.

  • ChatGPT Pro offers unlimited access to our best models, like O1, GPT-4o, and Advanced Voice.

  • ChatGPT Pro also has a special way of using O1 called O1 Pro Mode.

  • With O1 Pro Mode, you can ask the model to use even more compute to think even harder on some of the most difficult problems.

  • We think the audience for ChatGPT Pro will be the power users of ChatGPT.

  • Those who are already pushing the models to the limits of their capabilities on tasks like math, programming, and writing.

  • It's been amazing to see how much people are pushing O1 Preview, how much people who do technical work all day get out of this, and we're really excited to let them push it further.

  • We also really think that O1 will be much better for everyday use cases, not necessarily just really hard math and programming problems.

  • In particular, one piece of feedback we received about O1 Preview constantly was that it was way too slow.

  • It would think for 10 seconds if you said hi to it, and we fixed that.

  • That was really annoying.

  • It was kind of funny, honestly.

  • It really thought, it cared.

  • And so we fixed that.

  • O1 will now think much more intelligently.

  • If you ask it a simple question, it'll respond really quickly.

  • And if you ask it a really hard question, it'll think for a really long time.

  • We ran a pretty detailed suite of human evaluations for this model, and what we found was that it made major mistakes about 34% less often than O1 Preview while thinking fully about 50% faster.

  • And we think this will be a really, really noticeable difference for all of you.

  • So I really enjoy just talking to these models.

  • I'm a big history buff, and I'll show you a really quick demo of, for example, the sort of question that I might ask one of these models.

  • So right here, on the left, I have O1.

  • On the right, I have O1 Preview, and I'm just asking it a really simple history question.

  • List the Roman emperors of the 2nd century.

  • Tell me about their dates, what they did.

  • Not hard, but, you know, GPT-4o actually gets this wrong a reasonable fraction of the time.

  • And so I've asked O1 this.

  • I've asked O1 Preview this.

  • I tested this offline a few times, and I found that O1, on average, responded about 60% faster than O1 Preview.

  • This could be a little bit variable, because right now we're in the process of swapping all our GPUs from O1 Preview to O1.

  • So actually, O1 thought for about 14 seconds.

  • O1 Preview, still going.

  • There's a lot of Roman emperors.

  • There's a lot of Roman emperors.

  • Yeah, 4o actually gets this wrong a lot of the time.

  • There are a lot of folks who ruled for, like, 6 days, 12 days, a month, and it sometimes forgets those.

  • Can you do them all from memory, including the 6-day people?

  • No.

  • Yep, so here we go.

  • O1 thought for about 14 seconds.

  • O1 Preview thought for about 33 seconds.

  • These should both be faster once we finish deploying, but we want this to go live right now.

  • Exactly.

  • So, yeah, we think you'll really enjoy talking to this model.

  • We found that it gave great responses.

  • It thought much faster.

  • It should just be a much better user experience for everyone.

  • So one other feature that we know people really wanted for everyday use cases, and that we've had requested a lot, is multimodal inputs and image understanding, and Hyungwon is going to talk about that now.

  • Yep.

  • To illustrate the multimodal input and reasoning, I created this toy problem with some hand-drawn diagrams and so on.

  • So here it is.

  • It's hard to see, so I already took a photo of this, and let's look at this photo on a laptop.

  • So once you upload the image into ChatGPT, you can click on it to see the zoomed-in version.

  • So this is a diagram of a data center system in space.

  • So maybe in the future we might want to train AI models in space.

  • I think we should do that, but the power number looks a little low.

  • One gigawatt. Okay.

  • But the general idea, I think.

  • Rookie numbers.

  • Yeah, rookie numbers.

  • Yeah.

  • So we have a sun right here taking power on this solar panel, and then there's a small data center here.

  • That's exactly what they look like.

  • Yeah.

  • GPU racks.

  • And then pump.

  • Nice pump here.

  • And one interesting thing about operation in space is that on Earth we can do air cooling, water cooling to cool down the GPUs, but in space there's nothing there, so we have to radiate this heat into the deep space.

  • And that's why we need this giant radiator cooling panel.

  • And this problem is about finding the lower bound estimate of the cooling panel area required to operate this one gigawatt data center.

  • Probably going to be very big.

  • Yeah.

  • Let's see how big it is.

  • Let's see.

  • So that's the problem.

  • I'm going to use this prompt, and yeah, this is essentially asking for that.

  • So let me hit go, and the model will think for a few seconds.

  • By the way, most people don't know.

  • I've been working with Hyungwon for a long time.

  • Hyungwon actually has a Ph.D. in thermodynamics, which is totally unrelated to AI, and he always jokes that he hasn't been able to use his Ph.D. work in his job until today.

  • So you can trust Hyungwon on this analysis.

  • Finally, finally.

  • Thanks for hyping me up.

  • Now I really have to get this right.

  • Okay, so the model finished thinking.

  • Only 10 seconds.

  • It's a simple problem.

  • So let's see how the model did it.

  • So power input.

  • So first of all, this one gigawatt, that was only written on the paper.

  • So the model was able to pick that up nicely, and then radiative heat transfer only.

  • That's the thing I mentioned.

  • So in space, nothing else, and then some simplifying choices.

  • And one critical thing is that I intentionally made this problem underspecified: the critical parameter, the temperature of the cooling panel, is missing.

  • I left it out so that we can test the model's ability to handle ambiguity and so on.

  • So the model was able to recognize that this is actually an unspecified but important parameter, and it picked the right range of temperature, about room temperature; with that, it continues the analysis, does a whole bunch of things, and finds the area, which is 2.42 million square meters.

  • Just to get a sense of how big this is, this is about 2% of the land area of San Francisco.
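Hyungwon's estimate can be sanity-checked with the Stefan-Boltzmann law. The emissivity value and the one-sided-panel assumption below are our guesses to make the arithmetic concrete, not details confirmed in the video:

```python
# Back-of-envelope check of the radiator estimate (a sketch; the exact
# assumptions in the demo -- emissivity, one- vs two-sided radiation --
# are assumed here, not stated on screen).
SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)

def radiator_area(power_w, temp_k, emissivity=0.9):
    """Lower-bound panel area needed to radiate `power_w` into deep space.

    Ignores solar and cosmic background heating and treats the panel as a
    one-sided gray body: P = emissivity * SIGMA * A * T^4, solved for A.
    """
    return power_w / (emissivity * SIGMA * temp_k**4)

area = radiator_area(1e9, 300.0)      # 1 GW at roughly room temperature
print(f"{area / 1e6:.2f} million m^2")  # close to the 2.42e6 m^2 in the demo
```

Note how sensitive the answer is to the panel temperature: since the radiated power scales as T^4, letting the panel run hotter shrinks the required area dramatically, which is why picking a sensible temperature was the crux of the problem.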

  • This is huge.

  • Not bad.

  • Not bad, yeah.

  • Oh, okay.

  • Yeah, so I guess this is reasonable.

  • I'll skip through the rest of the details, but I think the model did a great job making nice, consistent assumptions that make the required area as small as possible.

  • And so, yeah, so this is the demonstration of the multimodal reasoning.

  • And this is a simple problem, but O1 is actually very strong, and on standard benchmarks like MMMU and MathVista, O1 actually has state-of-the-art performance.

  • Now Jason will showcase the Pro mode.

  • Great.

  • So I want to give a short demo of ChatGPT O1 Pro mode.

  • People will find O1 Pro mode the most useful for, say, hard math, science, or programming problems.

  • So here I have a pretty challenging chemistry problem that O1 Preview usually gets incorrect, and so I will let the model start thinking.

  • One thing we've learned with these models is that for these very challenging problems, the model can think for up to a few minutes.

  • I think for this problem, the model usually thinks anywhere from one minute to up to three minutes.

  • And so we have to provide some entertainment for people while the model is thinking.

  • So I'll describe the problem a little bit, and then if the model is still thinking when I'm done, I've prepared a dad joke for us to fill the rest of the time.

  • I hope it thinks for a long time.

  • You can see the problem asks for a protein that fits a very specific set of criteria.

  • So there are six criteria, and the challenge is that each of them asks for pretty domain-specific chemistry knowledge that the model would have to recall.

  • And the other thing to know about this problem is that none of these criteria actually give away what the correct answer is.

  • So for any given criterion, there could be dozens of proteins that might fit it, and so the model has to think through all the candidates and then check if they fit all the criteria.
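The search Jason describes can be pictured as filtering candidates against the intersection of all criteria. Every name and property in this sketch is made up for illustration; the real criteria are chemistry-specific:

```python
# Toy illustration of candidate-vs-criteria search: any single criterion
# admits several candidates, but only one survives all of them.
candidates = {
    "protein_A": {"secreted", "disulfide_bonds", "retina_expressed"},
    "protein_B": {"membrane_bound", "disulfide_bonds"},
    "protein_C": {"secreted", "retina_expressed"},
}
criteria = {"secreted", "disulfide_bonds", "retina_expressed"}

# Keep only proteins whose property set contains every criterion.
matches = [name for name, props in candidates.items() if criteria <= props]
print(matches)  # only protein_A satisfies every criterion
```

The point of the problem design is exactly this structure: the intersection of the criteria is far more constraining than any criterion alone, so recalling facts per criterion isn't enough; the model has to hold all six constraints simultaneously.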

  • Okay, so you can see the model actually was faster this time, so it finished in 53 seconds.

  • You can click and see some of the thought process that the model went through to get the answer.

  • You can see it's thinking about different candidates like neuroligin initially.

  • And then it arrives at the correct answer, which is retinoschisin, which is great.

  • Okay, so to summarize, we saw from Max that O1 is smarter and faster than O1 preview.

  • We saw from Hyungwon that O1 can now reason over both text and images.

  • And then finally, we saw that with ChatGPT Pro mode, you can use O1 to reason about the hardest science and math problems.

  • Yep, there's more to come for the ChatGPT Pro tier.

  • We're working on even more compute-intensive offerings to power longer and bigger tasks for those who want to push the model even further.

  • And we're still working on adding tools to the O1 model, such as web browsing, file uploads, and things like that.

  • We're also hard at work to bring O1 to the API.

  • We're going to be adding some new features for developers, structured outputs, function calling, developer messages, and API image understanding, which we think you'll really enjoy.

  • We expect this to be a great model for developers and really unlock a whole new frontier of agentic things you guys can build.

  • We hope you love it as much as we do.

  • That was great.

  • Thank you guys so much.

  • Congratulations to you and the team on getting this done.

  • We really hope that you'll enjoy O1 and Pro tier.

  • We have a lot more stuff to come.

  • Tomorrow we'll be back with something great for developers, and we'll keep going from there.

  • Before we wrap up, can we hear your joke?

  • Yes.

  • So I made this joke this morning.

  • The joke is this.

  • So Santa was trying to get his large language model to do a math problem, and he was prompting it really hard, but it wasn't working.

  • How did he eventually fix it?

  • No idea.

  • He used reindeer enforcement learning.

  • Thank you very much.
