  • So, I want to talk about a prompt that GPT-4o really struggles with, but that our new model, o1-preview, can do pretty well.

  • And the prompt is simple: write a six-line poem about squirrels playing koalas at soccer that meets the following constraints.

  • In line two, the last word should end with "i".

  • In line three, the second word begins with "u".

  • In line five, the second-to-last word is "eucalyptus".

  • And in the final line, each word has two syllables.

  • So first we'll try with GPT-4o.

  • And we'll see that the answer from GPT-4o meets some of the constraints, but not all of them.

  • The reason it's hard for GPT-4o is that it has to get it correct on the first try.

  • It can't check that it meets the constraints and then revise the poem.

  • Now let's try the same prompt with o1-preview.

  • And we'll see that, unlike GPT-4o, o1-preview starts thinking before giving the final answer.

  • And you can view a summary of the thinking process of the model.

  • So first you can see it's starting to think about different words for rhyming.

  • Then you can see it wants to make sure the last word of line two ends with "i".

  • It thinks about words like alibi.

  • It's analyzing word endings, and it's thinking about words like ski.

  • Then it's piecing together phrases, but it thinks they don't quite fit.

  • It's thinking about phrases where the second word starts with U.

  • Then it's tweaking the words to fit the two-syllable rule for line six.

  • It's digging into various two-syllable word combinations.

  • Then it's checking whether the poem aligns with all the guidelines.

  • It's working through the poem to analyze the soccer aspect.

  • And now let's look at the final poem.

  • So in the second line, the last word, "safari", does end with "i".

  • In the third line, the second word, "unleash", does begin with "u".

  • In the second-to-last line, the second-to-last word is "eucalyptus".

  • And finally, in the final line, "under moonlight creature scatter", indeed, each word has two syllables.

  • So this is an example of a prompt where, because the model can generate candidates and do reasoning before giving the final answer, it's able to give a higher quality response.
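
The four constraints from the prompt can be written down as a small checker, which makes the "check, then revise" step concrete. This is only a minimal sketch under stated assumptions, not how either model works internally: the function names are hypothetical, the syllable count is a rough vowel-group heuristic that will miscount many English words, and the sample poem is invented for illustration. Only "safari", "unleash", and the last line come from the video.

```python
import re


def words(line):
    """Split a line into words, ignoring punctuation."""
    return re.findall(r"[A-Za-z']+", line)


def rough_syllables(word):
    """Rough heuristic (an assumption, not a real syllabifier):
    count vowel groups and drop a trailing silent 'e'."""
    w = word.lower()
    count = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def check_poem(poem):
    """Return a dict mapping each constraint to True or False."""
    lines = [ln for ln in poem.splitlines() if ln.strip()]
    result = {"six_lines": len(lines) == 6}
    if not result["six_lines"]:
        return result

    w2, w3, w5, w6 = (words(lines[i]) for i in (1, 2, 4, 5))
    result["line2_last_word_ends_with_i"] = (
        bool(w2) and w2[-1].lower().endswith("i")
    )
    result["line3_second_word_starts_with_u"] = (
        len(w3) >= 2 and w3[1].lower().startswith("u")
    )
    result["line5_second_to_last_is_eucalyptus"] = (
        len(w5) >= 2 and w5[-2].lower() == "eucalyptus"
    )
    result["line6_every_word_two_syllables"] = (
        bool(w6) and all(rough_syllables(w) == 2 for w in w6)
    )
    return result


# Invented sample poem; only the last line is quoted from the video.
SAMPLE_POEM = """\
Squirrels dribble past the trees
Koalas cheer on their safari
They unleash a mighty kick
The ball flies toward the goal
It lands beneath eucalyptus leaves
Under moonlight creature scatter
"""

if __name__ == "__main__":
    for name, ok in check_poem(SAMPLE_POEM).items():
        print(f"{name}: {'pass' if ok else 'FAIL'}")
```

A draft that fails any check could be revised and checked again, which is the kind of loop a single first-try answer does not get.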
