  • The AI race between China and the U.S., and what's at stake?

  • Okay so first of all China has a lot of disadvantages in competing with the U.S.

  • Number one is the fact that they don't get access to all the hardware that we have access to here.

  • So they're kind of working with lower-end GPUs than us.

  • It's almost like working with a hobbled version of the previous generation of GPUs.

  • And the fact that bigger models tend to be smarter naturally puts them at a disadvantage.

  • But the flip side of this is that necessity is the mother of invention.

  • Because they had to go figure out workarounds, they actually ended up building something a lot more efficient.

  • It's like saying, hey, look, you guys really have to build a top-notch model, but I'm not going to give you the resources.

  • I mean figure out something, right?

  • Unless it's mathematically provable that something is impossible to do, you can always try to come up with something more efficient. And that constraint is likely to make them come up with a more efficient solution than America.

  • And of course they've open sourced it so we can still adopt something like that here.

  • But that kind of talent they're building to do that will become an edge for them over time, right?

  • The leading open source model in America is Meta's Llama family.

  • It's really good.

  • It's kind of like a model that you can run on a computer.

  • But even though it got pretty close to GPT-4o and Sonnet at the time of its release, the model that was closest in quality was the giant 405B, not the 70B that you could run on your computer.

  • And so that was still not a small, cheap, fast, efficient open source model that rivaled the most powerful closed models from OpenAI and Anthropic.

  • Nothing from America.

  • Nothing from Mistral either.

  • And then these guys come out with a crazy model that's 10x cheaper in API pricing than GPT-4o and 15x cheaper than Sonnet, I believe.

  • Really fast, 60 tokens per second.

  • And it's pretty much equal or better on some benchmarks and worse on some others, but roughly in the ballpark of GPT-4o quality.

  • And they did it all with approximately just 2,048 H800 GPUs, which is roughly equivalent to somewhere around 1,500 H100 GPUs.

  • That's like 20 to 30x fewer than the number of GPUs that a model like GPT-4 is usually trained on.

  • And their total compute budget was roughly $5 million.
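As a rough sanity check on those numbers, here is a back-of-the-envelope calculation using only the figures quoted in this conversation (about 2,048 H800s for roughly 60 days) plus an assumed rate of about $2 per GPU-hour; the hourly rate is an illustrative assumption, not a quoted figure.

```python
# Back-of-the-envelope check of the ~$5 million figure quoted above.
# The $2/GPU-hour rate is an assumption for illustration, not a quoted number.
gpus = 2048                      # H800s, as stated in the conversation
days = 60                        # approximate training duration, also as stated
rate_usd_per_gpu_hour = 2.0      # assumed rental-style rate

gpu_hours = gpus * days * 24
cost = gpu_hours * rate_usd_per_gpu_hour
print(f"{gpu_hours:,} GPU-hours -> ${cost:,.0f}")
# 2,949,120 GPU-hours -> $5,898,240  (same ballpark as the ~$5 million quoted)
```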

  • They did it with so little money and built such an amazing model.

  • Gave it away for free.

  • Wrote a technical paper.

  • And it definitely makes us all question, like, okay, if we had the equivalent of DOGE for model training, this is an example of that, right?

  • Right.

  • Efficiency is what you're getting at.

  • So fraction of the price, fraction of the time, dumbed down GPUs essentially.

  • What was your surprise when you understood what they had done?

  • So my surprise was, when I actually went through the technical paper, the number of clever solutions they came up with.

  • First of all, they trained a mixture-of-experts model.
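For readers unfamiliar with the term, the sketch below is a minimal, purely illustrative mixture-of-experts layer, not DeepSeek's architecture: a small router sends each token to a couple of expert sub-networks, so only a fraction of the parameters run per token.

```python
# Toy mixture-of-experts layer: illustrative only, not DeepSeek's design.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=32, n_experts=4, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                            # x: (tokens, dim)
        scores = self.router(x).softmax(dim=-1)      # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):                  # run only the chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

print(TinyMoE()(torch.randn(6, 32)).shape)           # torch.Size([6, 32])
```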

  • It's not that easy to train.

  • The main reason people find it difficult to catch up with OpenAI, especially on the MoE architecture, is that there are a lot of irregular loss spikes.

  • The numerics are not stable.

  • So often you've got to restart training from an earlier checkpoint.

  • And a lot of infrastructure needs to be built for that.

  • And they came up with very clever solutions to balance that without adding additional hacks.
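The kind of infrastructure being described, detect an irregular loss spike, throw away the unstable update, and resume from the last good checkpoint, can be sketched roughly as follows. This is a generic illustration, not DeepSeek's actual recipe; step_fn, the spike threshold, and the checkpoint cadence are all placeholders.

```python
# Generic loss-spike recovery loop; illustrative only, not DeepSeek's recipe.
import copy

def train_with_spike_recovery(model_state, data_stream, step_fn,
                              spike_factor=3.0, window=100):
    """step_fn(state, batch) -> (new_state, loss); all names are placeholders."""
    recent_losses = []
    last_good = copy.deepcopy(model_state)
    for batch in data_stream:
        model_state, loss = step_fn(model_state, batch)
        avg = sum(recent_losses) / len(recent_losses) if recent_losses else None
        if avg is not None and loss > spike_factor * avg:
            # Loss spiked: discard the unstable update and roll back instead
            # of trying to train through the divergence.
            model_state = copy.deepcopy(last_good)
            continue
        recent_losses = (recent_losses + [loss])[-window:]
        last_good = copy.deepcopy(model_state)   # checkpoint every step here;
                                                 # real systems checkpoint far less often
    return model_state
```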

  • And they also figured out 8-bit training, at least for some of the numerics.

  • And they cleverly figured out which parts have to be in higher precision and which can be in lower precision.

  • And to my knowledge, I think floating-point-8 (FP8) training is not that well understood.

  • Most of the training in America is still run in higher precision.

  • And maybe OpenAI and some people are trying to explore that, but it's pretty difficult to get it right.
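The selective-precision idea, keep master weights, normalization, and accumulation in a high-precision format while running the heavy matrix multiplications in a low-precision one, looks roughly like the sketch below. DeepSeek-V3 reportedly uses FP8 for the matmuls; this sketch uses bfloat16 as a stand-in because true FP8 kernels require specialized hardware and library support.

```python
# Illustrative selective mixed precision: fp32 master weights, norms and
# residuals; low precision (bf16 here, FP8 in DeepSeek's case) for matmuls.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedPrecisionBlock(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)          # kept in high precision
        self.fc1 = nn.Linear(dim, hidden)      # fp32 master weights
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)                                        # fp32 norm
        lp = torch.bfloat16                                     # low-precision stand-in
        h = F.linear(h.to(lp), self.fc1.weight.to(lp), self.fc1.bias.to(lp))
        h = F.relu(h)
        h = F.linear(h, self.fc2.weight.to(lp), self.fc2.bias.to(lp))
        return x + h.float()                                    # accumulate residual in fp32

block = MixedPrecisionBlock(64, 256)
print(block(torch.randn(8, 64)).dtype)   # torch.float32
```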

  • So because necessity is the mother of invention, because they don't have that much memory or that many GPUs, they figured out a lot of numerical stability tricks that make their training work.

  • And they claimed in the paper that the majority of the training was stable, which means they can always rerun those training runs again on more data or better data.

  • And then it only trained for 60 days.

  • So it's pretty amazing.

  • It sounds like you were surprised.

  • So I was definitely surprised.

  • Usually the wisdom, or I would say the myth, is that Chinese are just good at copying.

  • So if we stop writing research papers in America, if we stop describing the details of our infrastructure, our architecture, and stop open sourcing, they're not going to be able to catch up.

  • But the reality is, some of the details in DeepSeek v3 are so good that I wouldn't be surprised if Meta took a look at it and incorporated some of that in Llama 4.

  • I wouldn't necessarily say copy.

  • It's all sharing science, engineering.

  • But the point is, it's changing.

  • It's not like China is copycat.

  • They're also innovating.

  • We don't know exactly the data that it was trained on, even though it's open source.

  • We know some of the ways and things it was trained on, but not everything.

  • And there's this idea that it was trained on public ChatGPT outputs, which would mean it just was copied.

  • But you're saying it goes beyond that.

  • There's real innovation.

  • Yeah, look, they've trained it on 14.8 trillion tokens.

  • The internet has so much ChatGPT output in it.

  • If you actually go to any LinkedIn post or X post now, most of the comments are written by AI.

  • You can just see it.

  • People are just trying to write.

  • Even within X, there's a Grok tweet enhancer.

  • Or in LinkedIn, there's an AI enhancer.

  • Or in Google Docs and Word, there are AI tools to rewrite your stuff.

  • So if you do something there and copy-paste it somewhere on the internet, it's naturally going to have some elements of ChatGPT-like text in the training data.

  • And there are a lot of people who don't even bother to strip away the "I'm a language model" part.

  • So they just paste it somewhere.

  • It's very difficult to control for this.

  • I think XAI has spoken about this too.

  • So I wouldn't disregard their technical accomplishment just because for some prompts, like "Who are you?" or "Which model are you?", it responds like ChatGPT.

  • It doesn't even matter in my opinion.

  • For a long time, we thought, I don't know if you agreed with us, China was behind in AI.

  • What does this do to that race?

  • Can we say that China is catching up or hasn't caught up?

  • I mean, if we say Meta is catching up to OpenAI and Anthropic, if you make that claim, then the same claim can be made for China catching up to America.

  • There are a lot of papers from China that have tried to replicate O1.

  • In fact, I saw more papers from China after the O1 announcement that tried to replicate it than from America.

  • And the amount of compute DeepSeek has access to is roughly similar to what PhD students in the US have access to.

  • This is not meant to criticize others, or even ourselves.

  • For Perplexity, we decided not to train models because we thought it's a very expensive thing.

  • And we thought there's no way to catch up with the rest.

  • Will you incorporate DeepSeq into Perplexity?

  • We already are beginning to use it.

  • I assume they have an API, and since it's open source, we can host it ourselves too.

  • And it's good to try to start using that because it actually allows us to do a lot of the things at lower cost.
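As a concrete illustration of "use it via an API or host it ourselves": many open-weight models are served behind an OpenAI-compatible chat-completions endpoint, for example by an inference server you run yourself. The URL, model name, and environment variable below are placeholders, not confirmed values; check the provider's documentation for the real ones.

```python
# Hypothetical client for an OpenAI-compatible /chat/completions endpoint.
# The endpoint URL, model name, and API_KEY variable are placeholders.
import os
import requests

def ask(question: str,
        base_url: str = "http://localhost:8000/v1",   # e.g. a self-hosted server
        model: str = "deepseek-chat") -> str:          # illustrative model name
    resp = requests.post(
        f"{base_url}/chat/completions",
        headers={"Authorization": f"Bearer {os.environ.get('API_KEY', '')}"},
        json={"model": model,
              "messages": [{"role": "user", "content": question}]},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# print(ask("Summarize the main idea of the DeepSeek-V3 technical report."))
```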

  • But what I'm kind of thinking is beyond that, it's just like, okay, these guys actually could train such a great model with a good team.

  • And there's no excuse anymore for companies in the US, including ourselves, to not try to do something like that.

  • You hear a lot in public from a lot of thought leaders in generative AI, on the research side, on the entrepreneurial side.

  • Elon Musk and others say that China can't catch up.

  • The stakes are too big, the geopolitical stakes.

  • Whoever dominates AI is going to dominate the economy, dominate the world.

  • It's been talked about in those massive terms.

  • Are you worried about what China proved it was able to do?

  • Firstly, I don't know if Elon ever said China can't catch up.

  • I'm not aware of that.

  • Just the threat of China.

  • He's only identified the threat of letting China win.

  • Sam Altman has said similar things.

  • We can't let China win.

  • I think you've got to decouple what someone like Sam says from what is in his self-interest.

  • My point is, whatever you did to not let them catch up didn't even matter.

  • They ended up catching up anyway.

  • Necessity is the mother of invention, like you said.

  • Exactly.

  • What's more dangerous than trying to do all the things to not let them catch up?

  • What's more dangerous is they have the best open source model, and all the American developers are building on that.

  • That's more dangerous, because then they get to own the mindshare, the ecosystem, the entire American AI ecosystem.

  • In general, it's known that once open source has caught up with or improved over closed-source software, all developers migrate to it.

  • It's historically known.

  • When Llama was being built and becoming more widely used, there was this question: should we trust Zuckerberg?

  • But now the question is, should we trust China?

  • Should we trust open source?

  • It's not about whether it's Zuckerberg or someone else.

  • Does it matter then if it's Chinese, if it's open source?

  • It doesn't matter in the sense that you still have full control.

  • You run it as your own weights on your own computer.

  • You are in charge of the model.

  • It's not a great look for our own talent to rely on software built by others.

  • There's always a point where open source can stop being open source too.

  • The licenses are very favorable today, but over time, they can always change the license.

  • It's important that we actually have people here in America building, and that's why Meta is so important.

  • I still think Meta will build a better model than DeepSeek v3 and open-source it, whatever they call it, Llama 4 or 3.something.

  • It doesn't matter.

  • What is more key is that we don't try to focus all our energy on banning them, stopping them, and just try to out-compete and win.

  • That's just the American way of doing things.

  • Just be better.

  • It feels like we hear a lot more about these Chinese companies who are developing in a similar way, a lot more efficiently, a lot more cost-effectively, right?

  • Again, it's hard to fake scarcity, right?

  • If you raise $10 billion and you decide to spend 80% of it on a compute cluster, it's hard for you to come up with the exact same solution that someone with $5 million would, and there's no need to berate those who are putting in more money.

  • They're trying to do it as fast as they can.

  • When we say open source, there are so many different versions.

  • Some people criticize Meta for not publishing everything, and even DeepSeek itself isn't totally transparent.

  • Sure.

  • You can go to the limits of open source and say, I should exactly be able to replicate your training run, but first of all, how many people even have the resources to do that?

  • I think the amount of detail they've shared in the technical report, actually Meta did that too, by the way.

  • Meta's Llama 3.3 technical report is incredibly detailed and great for science.

  • The amount of detailed data these people are sharing is already a lot more than what the other companies are doing right now.

  • When you think about how much it cost DeepSeek to do this, less than $6 million, think about what OpenAI has spent to develop GPT models.

  • What does that mean for the closed source model, ecosystem trajectory, momentum?

  • What does it mean for OpenAI?

  • It's very clear that we'll have an open-source version of GPT-4o, or something even better and much cheaper than that, completely open source, within this year.

  • Made by OpenAI?

  • Probably not.

  • Most likely not.

  • I don't think they care if it's not made by them.

  • I think they've already moved to a new paradigm called the O1 family of models.

  • Ilya Sutskever came out and said that pre-training is hitting a wall.

  • He didn't exactly use that word, but he clearly said the age of pre-training will end.

  • Many people have said that.

  • That doesn't mean scaling is a wall.

  • I think we're scaling on different dimensions now.

  • The amount of time the model spends thinking at test time, reinforcement learning, trying to make the model figure things out: if it doesn't know what to do for a new prompt, it'll go and reason, collect data, interact with the world, and use a bunch of tools.

  • I think that's where things are headed.
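A toy sketch of that "reason at test time, use tools, interact with the world" loop. model_call and the tool names are stand-ins for a real model API and real tools; the point is the control flow, not any particular system.

```python
# Toy reason-and-act loop; model_call and tools are placeholder callables.
def solve(prompt, model_call, tools, max_steps=5):
    scratchpad = prompt
    for _ in range(max_steps):
        step = model_call(scratchpad)             # model proposes the next action
        if step.get("final_answer"):              # model decides it is done
            return step["final_answer"]
        tool = tools[step["tool"]]                # e.g. "search", "calculator"
        observation = tool(step["input"])         # interact with the world
        scratchpad += f"\n{step['tool']}({step['input']}) -> {observation}"
    return scratchpad                             # give back the partial trace
```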

  • I feel like OpenAI is more focused on that right now.

  • Instead of just a bigger, better model, it's more reasoning capability.

  • But didn't you say that DeepSeek is likely to turn their attention to reasoning?

  • A hundred percent.

  • I think they will.

  • That's why I'm pretty excited about what they'll produce next.

  • I guess my question is, what's OpenAI's moat now?

  • I still think that no one else has produced a system similar to the O1 yet, exactly.

  • I know that there's debates about whether O1 is actually worth it.

  • Maybe for a few prompts it's really better.

  • But most of the time, it's not producing any differentiated output from Sonnet.

  • But at least the results they showed for O3 had competitive coding performance, almost at an AI-software-engineer level.

  • Isn't it just a matter of time before the internet is filled with reasoning data?

  • Again, it's possible.

  • Nobody knows yet.

  • So until it's done, it's still uncertain.

  • So maybe that uncertainty is their moat, that no one else has the same reasoning capability yet.

  • By the end of this year, will there be multiple players even in the reasoning arena?

  • I absolutely think so.

  • So are we seeing the commoditization of large language models?

  • I think we'll see a similar trajectory, just like how pre-training and post-training systems got commoditized.

  • This year there will be a lot more commoditization there.

  • I think the reasoning kind of models will go through a similar trajectory, where in the beginning, one or two players, they know how to do it.

  • But over time, like...

  • And who knows, right?

  • Because OpenAI could make another advancement to focus on.

  • But right now, reasoning is their moat.

  • But if advancements keep happening again and again and again, I think the meaning of the word advancement also loses some of its value, right?

  • Totally.

  • Even now, it's very difficult, right?

  • Because there's pre-training advancements, and then we've moved into a different thing.

  • Yeah.

  • So what is guaranteed to happen is whatever models exist today, that level of reasoning, that level of multimodal capability, in like 5 to 10x cheaper models, open source, all that's going to happen.

  • It's just a matter of time.

  • What is unclear is if something like a model that reasons at test time will be cheap enough that we can all just run it on our phones.

  • I think that's not clear to me yet.

  • It feels like so much of the landscape has changed with what DeepSeek was able to prove.

  • Could you call it China's ChatGPT moment?

  • Possible.

  • I think it certainly probably gave them a lot of confidence that we're not really behind.

  • No matter what you do to restrict our compute, we can always figure out some workarounds.

  • And yeah, I'm sure the team feels pumped about the results.

  • How does this change the investment landscape?

  • The hyperscalers that are spending tens of billions of dollars a year on CapEx, they just ramped it up huge.

  • And OpenAI and Anthropic are raising billions of dollars for GPUs, but essentially, what DeepSeek told us is that you don't need that.

  • You don't necessarily need that.

  • Yeah.

  • I mean, look, I think it's very clear that they're going to go even harder on reasoning because they understand that whatever they were building in the previous two years is getting extremely cheap.

  • Otherwise it doesn't make sense to justify raising that amount.

  • Is the spending proposition the same?

  • Do they need the same amount of high-end GPUs?

  • Or can you reason using the lower-end ones that DeepSeek is using?

  • Again, it's hard to say no until it's proven otherwise.

  • But I guess in the spirit of moving fast, you would want to use the high-end chips.

  • And you would want to move faster than your competitors.

  • I think the best talent still wants to work in the team that made it happen first.

  • There's always some glory to who did this actually, who's a real pioneer versus who's a fast follower, right?

  • That was kind of like Sam Altman's tweet.

  • It was kind of a veiled response to what DeepSeek has been able to do.

  • He kind of implied that they just copied and anyone can copy, right?

  • Yeah.

  • But then you can always say that everybody copies everybody in this field.

  • You can say Google did the transformer first.

  • It's not OpenAI.

  • And OpenAI just copied it.

  • Google built the first large language models.

  • They didn't prioritize it.

  • But OpenAI did it in a prioritized way.

  • So you can say all this in many ways.

  • It doesn't matter.

  • I remember asking you, why don't you want to build the model?

  • That's the glory.

  • And a year later, just one year later, you look very, very smart to not engage in that extremely expensive race that has become so competitive.

  • And you kind of have this lead now in what everyone wants to see now, which is like real world applications, killer applications of generative AI.

  • Talk a little bit about that decision and how that's guided you, and where you see Perplexity going from here.

  • Look, one year ago, I don't even think we had something like...

  • This was, what, the beginning of 2024, right?

  • I feel like we didn't even have something like Sonnet 3.5. We had GPT-4, I believe.

  • And nobody else was really able to catch up to it.

  • Yeah.

  • But there was no multimodal, nothing.

  • And my sense was like, OK, if people with way more resources and way more talent cannot catch up, it's very difficult to play that game.

  • So let's play a different game.

  • Anyway, people want to use these models.

  • And there's one use case of asking questions and getting accurate answers with sources, with real time information, accurate information.

  • There's still a lot of work to do there outside the model: making sure the product works reliably, scaling it up with usage, building custom UIs.

  • There's a lot of work to do.

  • And we focus on that.

  • And we would benefit from all the tailwinds of models getting better and better.

  • That's essentially what happened.

  • In fact, I would say Sonnet 3.5 made our product so good, in the sense that if you use Sonnet 3.5 as the model choice within Perplexity, it's very difficult to find a hallucination.

  • I'm not saying it's impossible, but it dramatically reduced the rate of hallucinations, which meant that the whole problem of question answering, asking a question, getting an answer, doing a fact check, doing research, asking about anything out there, because almost all the information is on the web, was such a big unlock.
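A minimal sketch of the "accurate answers with sources" pattern being described: retrieve a handful of fresh web snippets, then constrain the model to answer only from them and cite which snippet each claim came from. search and llm are placeholders for a real search backend and model API; this is the general pattern, not Perplexity's implementation.

```python
# Grounded question answering with citations; search() and llm() are placeholders.
def answer_with_sources(question, search, llm, k=5):
    snippets = search(question)[:k]                  # real-time web results
    context = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    prompt = (
        "Answer the question using ONLY the sources below. "
        "Cite sources like [1], [2] after each claim. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt), snippets                     # the answer plus its sources
```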

  • And that helped us grow 10x over the course of the year in terms of usage.

  • And you've made huge strides in terms of users.

  • And we hear on CNBC a lot.

  • Big investors who are huge fans.

  • Jensen Huang himself, right?

  • He mentioned it in his keynote the other night.

  • He's a pretty regular user.

  • He's not just saying it.

  • He's actually a pretty regular user.

  • So a year ago, we weren't even talking about monetization because you guys were just so new and you wanted to get yourselves out there and build some scale.

  • But now you are looking at things like that, increasingly an ad model, right?

  • Yeah, we're experimenting with it.

  • I know there's some controversy on why should we do ads, whether you can have a truthful answer engine despite having ads.

  • And in my opinion, we've been pretty proactively thoughtful about it, where we said, okay, as long as the answer is always accurate, unbiased, and not corrupted by someone's advertising budget, it's fine that you get to see some sponsored questions.

  • And even the answers to the sponsored questions are not influenced by them.

  • And questions are also not picked in a way where it's manipulative.

  • Sure, there are some things that the advertiser also wants, which is they want you to know about their brand.

  • And they want you to know the best parts of their brand.

  • Just like how if you're introducing yourself to someone, you want them to see the best parts of you, right?

  • So that's all there.

  • But you still don't have to click on a sponsored question.

  • You can ignore it.

  • And we are only charging them CPM right now.

  • So we ourselves are not even incentivized to make you click yet.
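For readers unfamiliar with the term, CPM means cost per thousand impressions, so revenue depends only on how many people see the sponsored question, not on whether they click, which is the incentive point being made. The numbers below are made up for illustration.

```python
# CPM = cost per 1,000 impressions: revenue scales with views, not clicks.
# The impression count and rate are illustrative, not real figures.
def cpm_revenue(impressions: int, cpm_rate_usd: float) -> float:
    return impressions / 1000 * cpm_rate_usd

print(cpm_revenue(2_000_000, 50.0))  # 2M sponsored-question views at a $50 CPM -> 100000.0
```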

  • So I think considering all this, we're actually trying to get it right long term instead of going the Google way of forcing you to click on links.

  • I remember when you were talking about the commoditization of models a year ago, and it was thought of as controversial.

  • But now it's not controversial.

  • It's kind of like that's happening.

  • You're keeping your eye on that.

  • It's smart.

  • By the way, we benefit a lot from model commoditization.

  • Except we also need to figure out something to offer to the paid users, like a more sophisticated research agent that can do multi-step reasoning, go and do 15 minutes worth of searching, and give you an analysis, an analyst type of answer.

  • All that's going to come.

  • All that's going to stay in the product.

  • Nothing changes there.

  • But there's a ton of questions every free user asks on a day-to-day basis that need quick, fast answers.

  • It shouldn't be slow.

  • And all that will be free, whether you like it or not.

  • It has to be free.

  • That's what people are used to.

  • And that means figuring out a way to make that free traffic also monetizable.

  • So you're not trying to change user habits.

  • But it's interesting, because you are kind of trying to teach new habits to advertisers.

  • They can't have everything that they have in a Google ten-blue-links search.

  • What's the response been from them so far?

  • Are they willing to accept some of the trade-offs?

  • Yeah.

  • That's why they're trying stuff.

  • Intuit is working with us.

  • And then there's many other brands.

  • All these people are working with us to test.

  • They're also excited about it.

  • Everyone knows that, whether they like it or not, 5 to 10 years from now, most people are going to be asking AIs most of the things, and not a traditional search engine.

  • Everybody understands that.

  • Everybody wants to be early adopters of the new platforms, new UX, and learn from it, and build things together.

  • They're not viewing it as like, OK, you guys go figure out everything else.

  • And then we'll come later.

  • I'm smiling, because it goes back perfectly to the point you made when you first sat down today, which is necessity is the mother of all invention, right?

  • And that's what advertisers are essentially looking at.

  • The field is changing.

  • We have to learn to adapt with it.

  • OK, Aravind, I took up so much of your time.

  • Thank you so much for taking the time.
