Building OpenAI o1


We're starting a series of new models with the new name o1, and this is to highlight the fact that you might feel different when you use o1 as compared to previous models such as GPT-4o.
So, as others will explain later, o1 is a reasoning model, so it will think more before answering your question.
We are releasing two models: o1-preview, which is to preview what's coming for o1, and o1-mini, which is a smaller and faster model that is trained with a similar framework as o1.
So, we hope you like our new naming scheme, o1.
So what is reasoning anyway?
So one way of thinking of reasoning is that there are times where we ask questions and we need answers immediately because they're simple questions.
For example, if you ask what's the capital of Italy, you know the answer is Rome and you don't really have to think about it much.
But if you wonder about a complex puzzle or you want to write a really good business plan, you want to write a novel, you probably want to think about it for a while.
And the more you think about it, the better the outcome.
So reasoning is the ability of turning thinking time into better outcomes, whatever the task you're doing.
It's been going on for a long time, but I think what's really cool about research is there's that aha moment.
There's that particular point in time where something surprising happens and things really click together.
Are there any times for you all when you had that aha moment?
There was a first moment when the model was hot off the press.
We started talking to the model and people were like, wow, this model is really great and started doing something like that.
And I think that there was a certain moment in our training process where we put more compute into RL than before and trained the first model generating coherent chains of thought.
And we saw, wow, this looks like something meaningfully different than before.
And I think for me, this is the moment.
I think related to that, when we think about training a model for reasoning, one thing that immediately jumps to mind is you could have humans write out their thought process and train on that.
One aha moment for me was when we saw that if you train the model using RL to generate and hone its own chain of thought, it can do even better than having humans write chains of thought for it.
And that was an aha moment that you could really scale this and explore models reasoning that way.
For a lot of the time that I've been here, we've been trying to make the models better at solving math problems, as an example.
And we've put a lot of work into this.
And we've come up with a lot of different methods.
But every time I would read these outputs from the models, I'd always be so frustrated that the model just would never seem to question what was wrong or when it was making mistakes or things like that.
But with one of these early o1 models, when we trained it and actually started talking to it, asking it these questions, and it was scoring higher on these math tests we were giving it, we could look at how it was reasoning.
And you could just see that it started to question itself and have really interesting reflection.
And that was a moment for me where I was like, wow, like, we've uncovered something different.
This is going to be something new.
And it was just like one of these coming together moments that was really powerful.
Thank you and congrats on releasing this.
