Over the last year, everyone has been talking about: Generative AI. Generative AI. Generative AI. Generative AI. I'm like, "Wait, why am I doing this? I just wait for the AI to do it."

Driving the boom are AI chips, some no bigger than your palm, and demand for them has skyrocketed. We originally thought the total market for data center AI accelerators would be about $150 billion, and now we think it's going to be over $400 billion.

As AI gains popularity, some of the world's tech titans are racing to design chips that run better and faster. Here's how they work, and why tech companies are betting they're the future. This is "The Tech Behind AI Chips."

This is Amazon's chip lab in Austin, Texas, where the company designs AI chips to use in AWS's servers. Right out of manufacturing, we get something that is called the wafer. Ron Diamant is the chief architect of Inferentia and Trainium, the company's custom AI chips. These are the compute elements, the components that actually perform the computation.

Each of these rectangles, called dice, is a chip. Each die contains tens of billions of microscopic semiconductor devices called transistors that switch inputs and outputs. Think about one millionth of a centimeter; that's roughly the size of each one of these transistors. All chips use semiconductors like this. What makes AI chips different from CPUs, the kind of chip that powers your computer or phone, is how their processing cores are organized.

Say, for example, you want to generate a new image of a cat. CPUs have a smaller number of powerful cores, the units that make up the chip, which are good at doing a lot of different things. These cores process information sequentially, one calculation after another. So to create a brand-new image of a cat, a CPU would only produce a couple of pixels at a time. But an AI chip has more cores that run in parallel, so it can process hundreds or even thousands of those cat pixels all at once.
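The sequential-versus-parallel contrast above can be sketched in a few lines of Python. This is only a toy illustration, not how any real chip works: a plain loop stands in for a CPU core producing pixels one at a time, while a NumPy vectorized operation stands in for many small cores applying the same calculation to every pixel at once. The pixel formula itself is made up for the example.

```python
import numpy as np

def pixel_value(i):
    # Toy per-pixel calculation; any function applied independently
    # to each pixel would do.
    return (i * 37 + 11) % 256

def render_cpu_style(n):
    """A few powerful cores: one calculation after another."""
    return [pixel_value(i) for i in range(n)]

def render_ai_chip_style(n):
    """Many small cores: the same calculation applied to every pixel
    at once (NumPy's vectorized math stands in for the hardware)."""
    return ((np.arange(n) * 37 + 11) % 256).tolist()

# Both styles produce the same image; only the execution differs.
assert render_cpu_style(10_000) == render_ai_chip_style(10_000)
```

The vectorized version also runs far faster in practice, which is the same reason the parallel hardware wins for AI workloads.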
These cores are smaller and typically do less than CPU cores, but they are specially designed for running AI calculations. Those chips can't operate on their own, though. That compute die then gets integrated into a package, and that's what people typically think about when they think about the chip.

Amazon makes two different AI chips, named for the two essential functions of AI models: training and inference. Training is where an AI model is fed millions of examples of something, images of cats, for instance, to teach it what a cat is and what it looks like. Inference is when the model uses that training to actually generate an original image of a cat. Training is the most difficult part of this process. We typically train not on one chip, but rather on tens of thousands of chips. In contrast, inference is typically done on 1 to 16 chips.

Processing all of that information demands a lot of energy, which generates heat. We're able to use this device here to force a certain temperature onto the chip, and that's how we test that the chip is reliable at very low temperatures and very high temperatures. To help keep chips cool, they're attached to heat sinks, pieces of metal with vents that help dissipate heat.

Once they're packaged, the chips are integrated into servers for Amazon's AWS cloud. The training cards are mounted on this baseboard, eight of them in total, and they are interconnected at very high bandwidth and low latency. This allows the different training devices inside the server to work together on the same training job. So if you are interacting with an AI chatbot, your question will hit the CPUs, and the CPUs will move the data into the Inferentia2 devices, which will collectively perform a gigantic computation, basically running the AI model. They will respond to the CPU with the result, and the CPU will send the result back to you.
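Diamant's description of an inference request, where the CPU receives the question, fans the data out to accelerator devices that compute collectively, and sends the combined result back, can be sketched as a short simulation. Everything here is a stand-in, not AWS's actual software: `accelerator_compute` is a hypothetical placeholder for the model's real math, and threads play the role of the accelerator devices.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_DEVICES = 8  # e.g. eight accelerator cards on one baseboard

def accelerator_compute(shard):
    # Placeholder for the "gigantic computation" one device performs
    # on its slice of the work; here, just a sum of squares.
    return sum(x * x for x in shard)

def handle_request(data):
    """The CPU's role: split the input across devices, fan out the
    work, gather the partial results, and reply to the user."""
    shards = [data[i::NUM_DEVICES] for i in range(NUM_DEVICES)]
    with ThreadPoolExecutor(max_workers=NUM_DEVICES) as pool:
        partials = pool.map(accelerator_compute, shards)
    return sum(partials)  # combined result goes back to the user

# One "request": the devices collectively compute a sum of squares.
print(handle_request(list(range(100))))  # prints 328350
```

The same fan-out/gather pattern is why the high-bandwidth, low-latency interconnect between devices matters: partial results have to be exchanged and combined quickly for the devices to act as one big computer.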
Amazon's chips are just one type competing in this emerging market, which is currently dominated by the biggest chip designer, Nvidia. Nvidia is still a chip provider to all different types of customers who have to run different workloads. The next category of competitor is the major cloud providers. Microsoft, Amazon AWS, and Google are all designing their own chips because they can optimize their computing workloads for the software that runs on their cloud to get a performance edge, and they don't have to give Nvidia its very juicy profit margin on the sale of every chip.

But right now, generative AI is still a young technology. It's mostly used in consumer-facing products like chatbots and image generators. Still, experts say that hype cycles around technology can pay off in the end. While there might be something like a dot-com bubble for the current AI hype cycle, at the end of the dot-com bubble there was still the internet. And I think we're in a similar situation with generative AI.

The technology's rapid advance means that chips, and the software to use them, are going to have to keep up. Amazon says it uses a mixture of its own chips and Nvidia's chips to give customers multiple options. Microsoft says it's following a similar model. For those cloud providers, the question is how much of their computing workload for AI is going to be offered through Nvidia versus their own custom AI chips. And that's the battle playing out in corporate boardrooms all over the world.

Amazon released a new version of Trainium in November. Diamant says he doesn't see the AI boom slowing down anytime soon. We've been investing in machine learning and artificial intelligence for almost two decades now, and we're seeing a step-up in the pace of innovation and in the capabilities that these models enable. So our investment in AI chips is here to stay, with a significant step-up in capabilities from generation to generation.
Inside the Making of an AI Chip | WSJ Tech Behind (posted on 2024/01/06)