Yesterday, China released a state-of-the-art, free and open-source, chain-of-thought reasoning model with performance that rivals OpenAI's o1, which I'm stupidly paying $200 a month for right now.
You see, there are two types of people in the tech world right now.
In one camp, we have the pessimists, who think that AI is overhyped and plateaued with GPT-3.5.
In the other camp, we have the optimists, who think we're about to see the emergence of an artificial superintelligence that will propel humanity into Ray Kurzweil's technological singularity.
Nobody truly knows where things are going, but one thing to remember is that pessimists sound smart, while optimists make money.
But sometimes, it's hard to be an AI optimist because you need to trust hype jedis like Sam Altman and closed AI companies like OpenAI.
Well, luckily, on the same day that TikTok's ban was removed, China gave the world a gift in return in the form of DeepSeek R1.
And in today's video, you'll learn exactly how to use it like a senior prompt engineer.
It is January 21st, 2025, and you're watching The Code Report.
Yesterday, the course of history changed forever.
And no, I'm not talking about the return of the king, but rather the release of DeepSeek, which is an MIT-licensed chain-of-thought model that you can use freely and commercially to make money in your own applications.
This model came out while Sam Altman was busy at Trump's inauguration, which is a perfect time to use this new meme template, where Zuckerberg appears to detect a rack overflow in this artificial binary code owned by Jeff Bezos.
He's going to have some explaining to do with his wife, but Sam Altman also rained on the AI optimist parade recently when he said that the AI hype was out of control, and no, they have not achieved AGI internally.
And that's pretty obvious with how buggy ChatGPT is.
Like recently, a security researcher figured out how to get ChatGPT to DDoS websites for you.
All you have to do is provide it with a list of similar URLs that point to the same website, and it will crawl them all in parallel, which is something that no truly intelligent being would do.
That being said, the release of o1 a few months ago was another step forward in the AI race.
But it didn't take long for open source to catch up, and that's what we have with DeepSeek R1.
As you can see from its benchmarks, DeepSeek R1 is on par with OpenAI o1, and even exceeds it in some benchmarks like math and software engineering.
But let me remind you once again that you should never trust benchmarks.
Just recently, the company Epoch AI, which maintains a popular math benchmark, disclosed that it's been funded by OpenAI, which feels a bit like a conflict of interest.
I don't care about benchmarks anyway and just go off of vibes, so let's go ahead and try out DeepSeek R1 right now.
DeepSeek has a web-based UI, but you can also use the model on platforms like Hugging Face or download it locally with tools like Ollama.
And that's what I did for its 7-billion-parameter model, which weighs about 4.7 gigabytes.
However, if you want to use it in its full glory, it'll take over 400 gigabytes and some pretty heavy-duty hardware to run it with 671 billion parameters.
But if you want something that's on par with o1 Mini, you want to go with the 32-billion-parameter version.
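By the way, if you'd rather hit the local model from a script than from the terminal, here's a minimal sketch using the ollama Python client. It assumes you've run pip install ollama and already pulled the deepseek-r1:7b tag.

```python
# Minimal sketch: chat with a locally pulled DeepSeek R1 model through Ollama.
# Assumes `pip install ollama` and a prior `ollama pull deepseek-r1:7b`.
import ollama

response = ollama.chat(
    model="deepseek-r1:7b",  # swap in "deepseek-r1:32b" if your hardware allows
    messages=[{"role": "user", "content": "In one sentence, what is the MIT license?"}],
)

# The reply contains the model's <think>...</think> reasoning, then the answer.
print(response["message"]["content"])
```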
Now, one thing that makes DeepSeek different is that it doesn't use any supervised fine-tuning.
Instead, it uses direct reinforcement learning.
But what does that even mean?
Well, normally, with supervised fine-tuning, you show the model a bunch of examples and explain how to solve them step by step, then evaluate the answers with another model or a human.
But R1 doesn't do that. Instead, it pulls itself up by its own bootstraps using direct, or pure, reinforcement learning: you give the model a bunch of problems without showing it the solutions first, it tries a bunch of things on its own, and it reinforces whatever approach eventually finds the right solution, just like a real human with reasoning capabilities.
DeepSeek also released a paper that describes the reinforcement learning algorithm, which it calls Group Relative Policy Optimization, or GRPO.
It looks complicated, but basically, for each problem, the AI samples multiple answers, which are called outputs.
The answers are then grouped together and each one is given a reward score, so the AI learns to shift its approach toward the answers with higher scores.
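To make that concrete, here's a toy Python sketch of the group-relative scoring step. To be clear, this is just my illustration of the normalization idea, not DeepSeek's actual training code, and the reward numbers are made up.

```python
import numpy as np

def group_relative_advantages(rewards):
    """Score each sampled answer relative to its own group: answers above
    the group average get a positive advantage, the rest go negative."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)  # epsilon avoids divide-by-zero

# Toy example: 4 sampled answers to one math problem, scored with a simple
# rule-based reward (say, 1.0 if the final answer is correct, else 0.0).
rewards = [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards))  # -> [ 1. -1.  1. -1.]
```

The policy update then nudges the model toward the positively scored outputs, which is how it reinforces good reasoning without ever being shown a worked solution.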
That's pretty cool, and we can see the model's actual chain of thought if we go ahead and prompt it here with Ollama.
When prompting a chain-of-thought model like R1 or o1, you want to keep the prompt as concise and direct as possible, because unlike a general-purpose model like GPT-4, it's designed to do the step-by-step thinking on its own.
Like, if I ask it to solve a math problem, you'll notice that it first shows me all the thinking steps, and then after that thinking process is done, it'll show the actual solution.
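If you're calling it from code, you can split those two parts apart yourself. This sketch uses the same hypothetical setup as the earlier one, and it relies on R1 wrapping its reasoning in <think> tags, which is how the Ollama builds emit it.

```python
import re
import ollama

response = ollama.chat(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "A train covers 60 km in 45 minutes. What is its speed in km/h?"}],
)
text = response["message"]["content"]

# R1 emits its chain of thought inside <think>...</think>, then the final solution.
match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
thinking = match.group(1).strip() if match else ""
answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print("--- chain of thought ---\n" + thinking)
print("--- final answer ---\n" + answer)
```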
That's pretty cool, but you might be wondering when to use a chain-of-thought model instead of a regular large language model.
Well, basically, chain-of-thought models are much better when it comes to complex problem solving, things like advanced math problems, puzzles, or coding problems that require detailed planning.
But if you want to build the future with AI, you need to learn it from the ground up, and you can do that today for free thanks to this video's sponsor, Brilliant.
Their platform provides interactive hands-on lessons that demystify the complexity of deep learning.
With just a few minutes of effort each day, you can understand the math and computer science behind this seemingly magic technology.
I'd recommend starting with Python, then check out their full How Large Language Models Work course if you really want to look under the hood of ChatGPT.
Try everything Brilliant has to offer for free for 30 days by going to brilliant.org/fireship or use the QR code on screen.
This has been The Code Report, thanks for watching, and I will see you in the next one.