
  • Just last week, Google announced that they've put cuDF in the cloud to accelerate Pandas.

  • Pandas is the most popular data science library in the world.

  • Many of you in here probably already use Pandas.

  • It's used by 10 million data scientists in the world, downloaded 170 million times each month.

  • It is the Excel, the spreadsheet, of data scientists.

  • Well, with just one click, you can now use Pandas in Colab, which is Google's cloud data science platform, accelerated by cuDF.

  • The speed up is really incredible.

  • Let's take a look.
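
For readers who want to try it, here is a minimal sketch of what that one-click acceleration looks like in a notebook, assuming a Colab GPU runtime with the RAPIDS cuDF package available; the dataset name and columns are hypothetical:

```python
# Loading the cudf.pandas extension makes later `import pandas` calls
# GPU-accelerated, falling back to CPU pandas for unsupported operations.
%load_ext cudf.pandas

import pandas as pd

df = pd.read_parquet("transactions.parquet")  # hypothetical dataset
summary = df.groupby("merchant")["amount"].agg(["count", "mean", "sum"])
print(summary.sort_values("sum", ascending=False).head())
```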

  • That was a great demo, right?

  • Didn't take long.

  • This is Earth 2.

  • The idea that we would create a digital twin of the Earth, that we would go and simulate the Earth so that we could predict the future of our planet to better avert disasters or better understand the impact of climate change so that we can adapt better, so that we could change our habits now.

  • This digital twin of Earth is probably one of the most ambitious projects that the world's ever undertaken.

  • And we're taking large steps every single year.

  • And I'll show you results every single year.

  • But this year, we made some great breakthroughs.

  • Let's take a look.

  • On Monday, the storm will veer north again and approach Taiwan.

  • There are big uncertainties regarding its path.

  • Different paths will have different levels of impact on Taiwan.

  • Someday, in the near future, we will have continuous weather prediction at every square kilometer on the planet.

  • You will always know what the climate's going to be.

  • You will always know.

  • And this will run continuously because we've trained the AI.

  • And the AI requires so little energy.

  • In the late 1880s, Nikola Tesla invented an AC generator.

  • We invented an AI generator.

  • The AC generator generated electrons.

  • NVIDIA's AI generator generates tokens.

  • Both of these things have large market opportunities.

  • Tokens are completely fungible in almost every industry.

  • And that's why it's a new industrial revolution.

  • And now we have a new factory, a new computer.

  • And what we will run on top of this is a new type of software.

  • And we call it NIMs, NVIDIA Inference Microservices.

  • Now, what happens is the NIM runs inside this factory.

  • And this NIM is a pre-trained model.

  • It's an AI.

  • Well, this AI is, of course, quite complex in itself.

  • But the computing stack that runs AIs is insanely complex.

  • When you go and use ChatGPT, underneath it is a whole bunch of software.

  • Underneath that prompt is a ton of software.

  • And it's incredibly complex because the models are large, billions to trillions of parameters.

  • It doesn't run on just one computer.

  • It runs on multiple computers.

  • It has to distribute the workload across multiple GPUs: tensor parallelism, pipeline parallelism, data parallelism, expert parallelism, all kinds of parallelism, distributing the workload across multiple GPUs and processing it as fast as possible.
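
As a toy illustration of one of these schemes, here is a sketch of tensor parallelism with NumPy arrays standing in for GPUs; this is not NVIDIA's implementation, only the shape of the data movement:

```python
import numpy as np

n_devices = 4
x = np.random.randn(8, 512)     # one batch of activations
W = np.random.randn(512, 2048)  # a single layer's full weight matrix

# Shard the weights column-wise, one shard per "device".
shards = np.split(W, n_devices, axis=1)

# Each device multiplies the same input by its own shard, in parallel.
partials = [x @ w for w in shards]

# An all-gather step reassembles the full output across devices.
y = np.concatenate(partials, axis=1)

assert np.allclose(y, x @ W)  # identical to the unsharded computation
```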

  • Because if you are in a factory, if you run a factory, your throughput directly correlates to your revenues.

  • Your throughput directly correlates to quality of service.

  • And your throughput directly correlates to the number of people who can use your service.

  • We are now in a world where data center throughput utilization is vitally important.

  • It was important in the past, but not vitally important.

  • It was important in the past, but people didn't measure it.

  • Today, every parameter is measured.

  • Start time, up time, utilization, throughput, idle time, you name it.

  • Because it's a factory.

  • When something is a factory, its operations directly correlate to the financial performance of the company.
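
As a toy sketch of measuring those quantities, assuming nothing more than per-request timestamps (all numbers invented):

```python
# Hypothetical (start, end) timestamps in seconds for served requests.
requests = [(0.0, 1.2), (1.5, 2.5), (3.0, 3.8), (5.0, 6.5)]
window = 8.0  # wall-clock seconds observed

busy = sum(end - start for start, end in requests)  # assumes no overlap
throughput = len(requests) / window                 # requests per second
utilization = busy / window                         # fraction of time busy
idle = window - busy

print(f"throughput={throughput:.2f} req/s, "
      f"utilization={utilization:.0%}, idle={idle:.1f} s")
```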

  • And so we realized that this is incredibly complex for most companies to do.

  • So what we did was we created this AI in a box.

  • And in this container is an incredible amount of software.

  • Inside this container is CUDA, cuDNN, TensorRT, and Triton for inference services.

  • It is cloud native, so that you could autoscale in a Kubernetes environment.

  • It has management services and hooks, so that you can monitor your AIs.

  • It has common APIs, standard APIs, so that you could literally chat with this box.
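
A minimal sketch of "chatting with the box": deployed NIM containers expose an OpenAI-compatible HTTP endpoint. The host, port, and model name below are assumptions for a locally running container:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed local deployment
    json={
        "model": "meta/llama3-8b-instruct",       # assumed model name
        "messages": [{"role": "user", "content": "Summarize what a NIM is."}],
        "max_tokens": 128,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```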

  • We now have the ability to create large language models and pre-trained models of all kinds.

  • And we have all of these various versions, whether it's language-based, or vision-based, or imaging-based.

  • We have versions that are available for health care, digital biology.

  • We have versions that are digital humans that I'll talk to you about.

  • And the way you use this, just come to ai.nvidia.com.

  • And today, we just posted on Hugging Face the Llama 3 NIM, fully optimized.

  • It's available there for you to try.

  • And you can even take it with you.

  • It's available to you for free.
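
One way to try it, sketched here with the OpenAI Python client pointed at NVIDIA's hosted endpoint; the base URL and model name follow NVIDIA's published examples but should be treated as assumptions, and the key is a placeholder:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # NVIDIA API catalog
    api_key="YOUR_NVIDIA_API_KEY",                   # placeholder credential
)

completion = client.chat.completions.create(
    model="meta/llama3-70b-instruct",
    messages=[{"role": "user", "content": "Hello from Computex!"}],
    max_tokens=64,
)
print(completion.choices[0].message.content)
```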

  • And finally, AI models that reproduce lifelike appearances, enabling real-time path traced subsurface scattering to simulate the way light penetrates the skin, scatters, and exits at various points, giving skin its soft and translucent appearance.

  • NVIDIA ACE is a suite of digital human technologies packaged as easy-to-deploy, fully-optimized microservices, or NIMs.

  • Developers can integrate ACE NIMs into their existing frameworks, engines, and digital human experiences.

  • Nemotron SLM and LLM NIMs to understand our intent and orchestrate other models.

  • Riva Speech NIMs for interactive speech and translation.

  • Audio2Face and Gesture NIMs for facial and body animation.

  • And Omniverse RTX with DLSS for neural rendering of skin and hair.
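
To make the division of labor concrete, here is a purely hypothetical sketch of how one turn of such a digital-human loop might chain those microservices; none of these function names are real APIs, each merely stands in for a call to the corresponding NIM:

```python
def transcribe(audio):   # Riva Speech NIM: user speech -> text
    ...

def respond(text):       # Nemotron LLM NIM: intent -> reply text
    ...

def synthesize(text):    # Riva Speech NIM: reply text -> audio
    ...

def animate(audio):      # Audio2Face NIM: audio -> facial animation
    ...

def digital_human_turn(mic_audio):
    reply = respond(transcribe(mic_audio))
    speech = synthesize(reply)
    # The animation is then rendered by Omniverse RTX with DLSS.
    return speech, animate(speech)
```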

  • And so we equipped every single RTX GPU with Tensor Core processing.

  • And now we have 100 million GeForce RTX AI PCs in the world.

  • And we're shipping 200.

  • And at this Computex, we're featuring four new amazing laptops.

  • All of them are able to run AI.

  • Your future laptop, your future PC, will become an AI.

  • It'll be constantly helping you, assisting you in the background.

  • Ladies and gentlemen, this is Blackwell.

  • Blackwell is in production.

  • Incredible amounts of technology.

  • This is our production board.

  • This is the most complex, highest performance computer the world's ever made.

  • This is the Grace CPU.

  • And these, you can see, are the Blackwell dies, two of them connected together.

  • You see that?

  • It is the largest die, the largest chip the world makes.

  • And then we connect two of them together with a 10 terabyte per second link.

  • So this is a DGX Blackwell.

  • This is air-cooled and has eight of these GPUs inside.

  • Look at the size of the heat sinks on these GPUs.

  • About 15 kilowatts, 15,000 watts, and completely air-cooled.

  • This version supports x86, and it goes into the infrastructure that we've been shipping Hoppers into.

  • However, if you would like to have liquid cooling, we have a new system.

  • And this new system is based on this board, and we call it MGX for modular.

  • And this modular system, you won't be able to see this. Can you see this? You can? OK.

  • And so this is the MGX system, and here are the two Blackwell boards.

  • So this one node has four Blackwell chips.

  • These four Blackwell chips are liquid-cooled.

  • Nine of them, well, 72 of these GPUs, are then connected together with a new NVLink.

  • This is NVLink switch fifth generation.

  • And the NVLink switch is a technology miracle.

  • This is the most advanced switch the world's ever made.

  • The data rate is insane.

  • And these switches connect every single one of these Blackwells to each other so that we have one giant 72 GPU Blackwell.

  • Well, the benefit of this is that in one domain, one GPU domain, this now looks like one GPU.

  • This one GPU domain has 72 GPUs, versus the last generation's eight, so we increased it by nine times.

  • The amount of bandwidth we've increased by 18 times.

  • The AI flops we've increased by 45 times, and yet the amount of power is only 10 times.

  • This is 100 kilowatts, and that is 10 kilowatts.
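
A quick sanity check on those quoted ratios, using only the numbers from the talk itself:

```python
gpus_prev, gpus_now = 8, 72          # GPUs per NVLink domain
power_prev_kw, power_now_kw = 10, 100
flops_gain, bandwidth_gain = 45, 18  # quoted generational gains

print(gpus_now / gpus_prev)          # 9.0  (9x more GPUs)
print(power_now_kw / power_prev_kw)  # 10.0 (10x the power)
print(flops_gain / (power_now_kw / power_prev_kw))  # 4.5x AI FLOPS per watt
```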

  • This is one GPU.

  • Ladies and gentlemen, DGX GPU.

  • The back of this GPU is the NVLink spine.

  • The NVLink spine is 5,000 wires, two miles, and it's right here.

  • This is an NVLink spine, and it connects 72 GPUs to each other.

  • This is an electrical mechanical miracle.

  • The transceivers make it possible for us to drive the entire length in copper.

  • And as a result, this switch, the NVLink switch, driving the NVLink spine in copper makes it possible for us to save 20 kilowatts in one rack.

  • 20 kilowatts can now be used for processing.

  • Just an incredible achievement.

  • We have code names in our company, and we try to keep them very secret.

  • Oftentimes, most of the employees don't even know.

  • But our next generation platform is called Rubin.

  • The Rubin platform, I'm not going to spend much time on it.

  • I know what's going to happen.

  • You're going to take pictures of it, and you're going to go look at the fine print, and feel free to do that.

  • So we have the Rubin platform, and one year later, we'll have the Rubin Ultra platform.

  • All of these chips that I'm showing you here are all in full development, 100% of them.

  • And the rhythm is one year at the limits of technology, all 100% architecturally compatible.

  • So this is basically what NVIDIA is building.

  • A robotic factory is designed with three computers.

  • Train the AI on NVIDIA AI.

  • You have the robot running on the PLC systems for orchestrating the factories.

  • And then you, of course, simulate everything inside Omniverse.

  • Well, the robotic arm and the robotic AMRs are also built the same way, with three computer systems.

  • The difference is the two Omniverses will come together.

  • So they'll share one virtual space.

  • When they share one virtual space, that robotic arm will work inside the robotic factory.

  • And again, three computers, and we provide the computer, the acceleration layers, and pre-trained AI models.

  • Well, I think we have some robots that we'd like to welcome.

  • Here we go.

  • About my size.

  • And we have some friends to join us.

  • So the future of robotics is here, the next wave of AI.

  • And of course, Taiwan builds computers with keyboards.

  • You build computers for your pocket.

  • You build computers for data centers in the cloud.

  • In the future, you're going to build computers that walk and computers that roll around.

  • And so these are all just computers.

  • And as it turns out, the technology is very similar to the technology of building all of the other computers that you already build today.

  • So this is going to be a really extraordinary journey for us.
