[MUSIC PLAYING] KEVIN SATZINGER: Thank you. My name is Kevin Satzinger, and I'm here today to share with you our latest results on Quantum Supremacy: Benchmarking the Sycamore Processor. The promise of quantum computing is that it can solve certain useful problems that are simply beyond the reach of classical computing. And most of those applications are here, in the useful error-corrected machine, which is still years of research in the future. But we also hope that in the near term, in this blue region, we will be able to find useful applications. But, of course, before a quantum computer can do something useful that is intractable for classical computing, it must first do anything at all that is intractable for a classical computer. And so a real question that has been on our minds in this whole industry for the last decade is, can we cross this line? And this is something that our group has been focusing on for the past several years. It's something that was given a name by John Preskill in 2012. He called it quantum supremacy: to perform tasks with controlled quantum systems going beyond what can be accomplished with ordinary digital computers. He went on to ask, is controlling large-scale quantum systems merely really, really hard, or is it ridiculously hard? Well, I am pleased to say that it is merely really, really hard. And we demonstrated this with the paper that we published last fall, where we showed crossing this line for the first time. And looking at this slide, I see there's a lot of words on there. So maybe a picture would be a better way to reflect this. And I want to emphasize that this is a huge team effort. And I'm very thankful to each and every person, each and every member of this team who contributed to this work. And I'm honored to be here today representing them. You can find our paper in Nature. It's open access. And also at this arXiv link, we have updated versions of our supplementary information.

The centerpiece of this paper was a new processor called Sycamore. And it's positioned here right at this boundary between classically simulatable and beyond classically simulatable. And that's what I'm going to be presenting about today: first, the march from graduate school in the upper left toward industrial technology that can take on a supercomputer, and then how that plays into this research direction toward a useful error-corrected machine. For the rest of this talk, I'll be following this outline. So we'll look first at the Sycamore processor. Second, how to calibrate and benchmark Sycamore. And finally, quantum supremacy itself.

So let's get started with Sycamore. And before we get to Sycamore, I want to flash back a few years to what the state of the art was in 2015. This is a chip from the Santa Barbara group. It's about a centimeter in size. And it has nine transmon qubits in a linear array. And what I'd like you to observe is that the control wiring takes up about half of the area of this chip. And when you look at this device, it is not at all obvious how we could scale it up to a two-dimensional array with 50-plus qubits. It's not a matter of copy/paste. We need to really re-engineer the system in order to make it scalable. And one of the key technologies that made it possible to make such a two-dimensional array was moving to a scalable flip-chip design, where we have two chips instead of one. So one chip, this top chip, will be solely responsible for a two-dimensional array of qubits.
And the other chip will take care of all the readout and all the control wiring. And by dividing the responsibilities like this, we'll be able to have a more scalable design. One of the key technologies that makes this possible is these indium bumps, which provide a superconducting interconnect and a mechanical connection between the two chips. This is a photograph showing a small processor prototype that demonstrated this flip-chip technology. So there are four chips in this photograph. At the bottom is a qubit chip that has a small 2 by 3 array of superconducting qubits in the center. And the rest of the chip is just covered in those indium bumps. In the center is a separate chip that's responsible for readout and control and interfacing with the outside world. And what we do is we take the qubit chip and flip it over on top of the control chip, align them, and press them together. And that completed assembly is what's at the top of the photograph, where we have two chips stacked together and ready to use. Now, this is how you can make a quantum processor. But one of the lessons that I want to share with you is that there's a lot of hardware infrastructure that goes into making one of these experiments actually work. So let me share with you a couple of highlights. One is packaging. This is basically anything that goes between the quantum processor and our dilution refrigerator. And in our case, we have the processor embedded in a circuit board with superconducting metal. And the circuit board interfaces with about 150 coaxial connectors around the perimeter. We also encase the processor in electromagnetic shielding to protect it from the outside world. We then take this package and bolt it to the bottom of a dilution refrigerator, which will cool it down to about 10 millikelvin and is also responsible for delivering signals through a couple hundred coaxial cables that go from room temperature down into the device. Another key piece of hardware infrastructure to get these experiments to work is our electronics. These are room-temperature electronics that generate the control signals for our processors. And we use custom scalable electronics. Here's an example of one of our boards. In the center is a field-programmable gate array that's responsible for driving eight digital-to-analog converters circled around it, which then drive these eight channels. These can output arbitrary waveforms from 0 to 400 megahertz. And we can also up-convert those with mixers to microwave frequencies, like around 6 gigahertz. This is just one card. We can have many of these cards in a crate, like depicted here. And then several of these crates in racks working together in concert to control one of our processors.

Now I'd like to turn to Sycamore itself. One of the key advances of Sycamore is its new tunable coupling architecture. This is a new feature where we're able to turn the qubit interactions on and off at will. So the qubits can be independent of each other most of the time. But then when we want them to interact and get entanglement, we can turn on the coupling for a brief period of time. This, it turns out, was immensely helpful to making the full system work and was really a key breakthrough in order to get this processor to perform. We did this with a scalable two-dimensional architecture, where we introduce an extra coupler transmon between each pair of qubits, where each qubit itself is also a transmon.
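As a rough aside on what that tunable coupler buys us, here is a minimal numerical sketch (nothing to do with our actual control stack; the coupling values, time scales, and factor-of-two conventions are purely illustrative) of two resonant qubits exchanging a single excitation under an adjustable coupling g. With g at a few megahertz the excitation swaps back and forth; with g tuned to zero, nothing happens and the qubits are effectively independent.

```python
import numpy as np

def swap_probability(g_mhz, t_ns):
    """Probability that an excitation starting in qubit A is found in qubit B
    after t nanoseconds, for two qubits on resonance with exchange coupling g.
    Single-excitation result: P = sin^2(2*pi*g*t). (Conventions for what
    'the coupling' means vary; this sketch only aims at the qualitative
    on/off behavior.)"""
    return np.sin(2 * np.pi * g_mhz * 1e-3 * t_ns) ** 2

t = np.linspace(0.0, 100.0, 401)        # time in nanoseconds

p_on = swap_probability(5.0, t)         # coupler "on": a few MHz of coupling
p_off = swap_probability(0.0, t)        # coupler "off": coupling tuned to zero

print("first full swap with coupling on at ~%.1f ns" % t[np.argmax(p_on)])
print("maximum swap probability with coupling off:", p_off.max())
```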
That tunable coupling architecture is depicted in this device schematic here, where we have 54 qubits and 88 couplers, one between each pair of neighboring qubits. I want to share with you a little bit of data showing how these couplers really work. So let's look at a simple experiment where we have two qubits next to each other at the same frequency. What we're going to do is excite one qubit and then have the two qubits interact for a period of time, the vertical axis, subject to a certain coupler bias, the horizontal axis. And let's look at the center first. So, in the center here, the coupler is at its maximum frequency. And so there is a few megahertz of coupling between the two qubits. And what happens is this photon leisurely swaps back and forth between the two qubits. That's what these oscillations are. But as we march to the right, as the coupler's frequency comes down, there is this divergence where there is no more swapping. This is where the coupling is turned off and the two qubits can act independently. This is a very valuable place. This is where we operate ordinarily. But sometimes you want the qubits to interact. And to do that, we'll push the coupler a little bit further to the right so that we have very strong coupling for a brief period of time between the two qubits. I'll end this section with this nice photograph of our Sycamore processor. And a nice symmetry that I could highlight is that the center chip here with the qubits is about a centimeter in size, which is the same size as the chip we looked at at the beginning of this section.

So now let's move on to calibration and benchmarking. Suppose that I handed you all of these electronics-- the fridge and cables, the packaging and the processor. It is not a trivial matter to turn all of that stuff into a quantum computer. And calibration is the process of learning how to execute quantum logic with our hardware so that we can go from all of our stuff to a system that can run an arbitrary quantum algorithm, in the same way that you could play music on a finely tuned piano. But this is not a trivial task. There are around 100 parameters for each qubit. We need to choose a frequency for every qubit and coupler. And there are strong interactions between those frequencies and biases. And then we need to tune up all of the gates and readout for each qubit and pair of qubits. Now, if you have just a few qubits, you can park a graduate student in front of a computer for a couple of days, and they'll be able to work it out. But if you have 50-plus qubits, you need a much more scalable solution. And in order to solve this problem, we encode our calibration routines into a graph. And this allows us to solve the problem using standard graph traversal algorithms. Pictured here is an example calibration sequence for two qubits. And this network is really a graph distilling decades of research from groups all around the world learning how to make these processors work. In this graph, each node represents a physics experiment, where we acquire data, and then the analysis that we use in order to figure out what the data says and decide what to do next. This is an example for two qubits, but there are literally thousands of these nodes when we want to calibrate Sycamore. So it's crucial that we use algorithms to work through this graph, to calibrate the device, and then maintain the calibrations. To give a flavor of how this works, we start on the left learning some device parameters.
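To make that graph idea a little more concrete, here is a minimal sketch, with entirely hypothetical node names, of how calibration experiments can be encoded as a dependency graph and then worked through with a standard traversal. The real graph has thousands of nodes and much richer decision logic, but the bookkeeping looks something like this.

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# Hypothetical calibration nodes: each maps to the experiments it depends on.
calibration_graph = {
    "resonator_spectroscopy": set(),
    "qubit_spectroscopy":     {"resonator_spectroscopy"},
    "rabi_oscillations":      {"qubit_spectroscopy"},
    "readout_clouds":         {"rabi_oscillations"},
    "single_qubit_gates":     {"rabi_oscillations", "readout_clouds"},
    "two_qubit_gates":        {"single_qubit_gates"},
}

def run_experiment(name):
    """Placeholder for acquiring data and running the analysis for one node."""
    print("calibrating:", name)

# Work through the graph so every experiment runs after its prerequisites.
for node in TopologicalSorter(calibration_graph).static_order():
    run_experiment(node)
```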
As we work to the right through that graph, we iterate back and forth between single-qubit gates and readout until eventually we get around to two-qubit gates at the very end. A key step in setting up our device is choosing a frequency for each of the qubits. And we're going to kind of follow a two-step program here. First, we're going to measure the qubit lifetime as a function of frequency. This is an example data set for one qubit from Sycamore. And then we're going to use that data and all the information we know about our device and how it performs in order to choose optimal frequencies for each qubit. This is a pretty rich subject, but let me just give you a flavor for how this proceeds. For example, each qubit is least sensitive to control noise at its maximum frequency. So we might want to park all of the qubits at their maximum frequencies, although there is some variation in those frequencies across the device. But then if we consider pulse distortions, for two-qubit gates it's actually nicer if the qubits are close to their neighbors so they don't have to move very far in frequency in order to interact. So this will kind of smooth out this shape here so that the qubits will be close to their neighbors. But you don't want to be too close, because there can be stray interactions, parasitic couplings between qubits and their neighbors and next-nearest neighbors. And this might suggest kind of a checkerboard-type pattern like we see here. And finally, we include the information from this qubit lifetime versus frequency data to find an optimum where we expect each qubit to have good performance. And solving this problem is not a trivial matter. It's very difficult for a human to solve it. But we have developed good heuristic algorithms that find us good solutions to this problem. Now, once we've set up all of our qubits at their desired frequencies, we need to actually carry out the calibrations. And I'm just going to share with you a couple of nice snapshots of what that calibration data looks like throughout the process. And an early crucial qubit experiment is Rabi oscillations, where we oscillate back and forth between the qubit zero state and one state. And this is very nice because it allows us to choose the initial amplitude for a pi pulse for our qubits. This is an example for one qubit. And here we plotted the data for all 53 qubits that we're measuring on this device. And we can see, although there is some diversity in these data sets, our algorithms can readily pick out this first oscillation that gives us one of our earliest calibrations. Another critical calibration is readout. So an example experiment here is readout clouds, where we prepare each qubit in the zero state and then measure the readout signal. That's the blue cloud. And then we do the same preparing the qubit in the one state using our pi pulse, and that's the red cloud. And we want separation between these two clouds in order to achieve single-shot readout of the qubit state. This is an example for one qubit and across the full device. I'll mention-- I'm not going to go deeper into readout, but this is quite a sophisticated subject. And managing to achieve good readout on all the qubits simultaneously is very challenging. And even just benchmarking the readout for 50 qubits at the same time is not a trivial task, either. The final calibration data I'll share with you is for calibrating two-qubit gates. And I'll talk a little more about two-qubit gates in a moment.
But basically, we want an iSwap between the two qubits, where we have a resonant exchange of the qubits' photons. So what we do is, for each pair, we tune the coupler amplitude so that we have the correct amount of coupling between the two qubits, about 20 megahertz in this case, so that they'll completely swap a single photon. And that's that first maximum there. And here we've plotted 86 of these, one for each pair of qubits across the device. Now, I said I wanted to talk a little bit more about two-qubit gates on Sycamore. Our basic idea is to use fast resonant gates, where we pulse the coupling on for around 10 to 20 nanoseconds so that the two qubits can interact. And for two transmon qubits, the natural interactions that will take place are an iSwap interaction and a conditional phase. And we can map the gates onto this two-dimensional space here with those two axes. And you can see some of your favorite familiar gates, like iSwap, the square root of iSwap, and CZ, on this plot. Now, for quantum supremacy, we did something a little unusual. We did what we would call an iSwap-like gate on all of our pairs. This is the data that I was just showing you. And we actually determine the unitary specifically for each pair of qubits. But fear not, although this is a little unorthodox, you can take two of those iSwap-like gates and compile them into a CNOT, which is a classic textbook gate, along with single-qubit rotations. I'll mention as well, as a part of his PhD thesis, Brooks Foxen actually filled in this whole space for a pair of qubits and was able to execute an arbitrary gate in this space. And you can read about that in his paper here.

Now, suppose we've tuned up all of our two-qubit gates and we want to benchmark them to figure out how well we're performing. There are many different techniques to do this, but I'm going to focus on one of those, which is cross-entropy benchmarking, because it closely relates to what we do in the quantum supremacy experiment. The first step is to choose a bunch of sequences that we're going to measure. And these are sequences of our two-qubit gate interleaved with randomly selected single-qubit gates, like depicted here. We'll then take each of these sequences and run them on our quantum processor, maybe a few thousand times each, in order to compute the experimental probabilities associated with each of these sequences. Now, on the other hand, we can also use a classical computer in order to basically run the same circuit and come up with the theoretical expected probabilities. And those depend on the unitary that we're trying to run. But say we have an idea of what unitary we mean to run with our two-qubit gate. So we have two probability distributions: one that we measured with our experiment, and one that is computed. And we can compare these two probability distributions to estimate the fidelity, which is a measure of how well we're executing the expected quantum gate. And this is where cross-entropy actually comes in, because that's the metric we use to compare the two probability distributions. There's something special here. We can have a feedback loop between these two sections in order to find the unitary model that maximizes this fidelity. And when we do that, we're finding the two-qubit unitary that most precisely expresses what exactly we're doing in our two-qubit gate. So we're basically learning what is the two-qubit gate that we're actually doing. This is very useful because it helps us identify coherent errors.
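To make that gate space a bit more concrete, the two-dimensional family of gates above is often written as a two-parameter, fSim-style unitary: a swap angle theta and a conditional phase phi. Below is a minimal NumPy sketch using one common sign convention (not necessarily the exact convention used on the hardware); in this picture, the feedback loop just described amounts to fitting parameters like these to the measured data.

```python
import numpy as np

def fsim(theta, phi):
    """Two-qubit 'fSim-like' unitary in the basis |00>, |01>, |10>, |11>:
    theta sets the amount of excitation swapping, phi the conditional phase.
    (One common sign/phase convention; conventions vary between papers.)"""
    return np.array([
        [1, 0,                    0,                    0],
        [0, np.cos(theta),       -1j * np.sin(theta),   0],
        [0, -1j * np.sin(theta),  np.cos(theta),        0],
        [0, 0,                    0,   np.exp(-1j * phi)],
    ])

iswap_like = fsim(np.pi / 2, 0.0)   # full swap of a single excitation
cz_like    = fsim(0.0, np.pi)       # no swapping, a pi conditional phase

print(np.round(iswap_like, 3))
print(np.round(cz_like, 3))
```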
To give an example of the kind of coherent error this can reveal, suppose you were trying to do a CZ gate and it turns out that the optimal unitary actually has a little bit of swapping in it. This is telling you where your errors are coming from, so that you can learn from that and make your calibrations better. In the case of quantum supremacy, we use this tool to identify the unitary that each qubit pair is actually running, and we're going to use that information when we do that experiment. There's another important detail here, which is simultaneous operations. And so, in addition to benchmarking each qubit one at a time or each pair one at a time, we also benchmark all of the qubits operating at the same time and all of the pairs operating simultaneously, as well. This kind of simultaneous operation is essential to near-term applications because, due to noise, we have limited algorithm depth. It's also a necessity for our long-term plans for quantum error correction. And the key question here is, does the processor work when we're doing operations simultaneously? There is one other detail that I want to highlight here. You can't actually run all of the pairs simultaneously, because each qubit participates in more than one pair. So we divvy them up into different layers that we can benchmark simultaneously.

So let's look at some benchmarking results from Sycamore. And I'll start with the single-qubit gates. And we're plotting the distributions of errors across the device for 53 qubits on the left. So if we do the single-qubit gates one at a time, they have an error of about 0.15%, which is pretty good. And if we do them all simultaneously, it only modestly increases to 0.16%. This is great because it suggests that there are not stray interactions between the qubits or crosstalk that would cause these gates to mess up. As we turn to the two-qubit gates, in the isolated case, we see an error of 0.36%, which is really outstanding. And in the simultaneous case, this modestly increases to about 0.6%, which we're really still quite proud of. And this increase in error could be attributed, for example, to stray interactions between pairs of qubits that are becoming slightly entangled with each other. We also have readout, which I won't delve into too deeply here, where we have a few percent error per qubit. I'll make a note that for these simultaneous two-qubit gate benchmarks, the unitaries of the two-qubit gates actually change slightly when we run them all at once. And we can measure that and take it into account in our experiments. Instead of plotting the distributions, we can also plot these single- and two-qubit gate errors on the device layout itself. And we were really excited when we got this to work and got to see this for the first time. These are the errors associated with the simultaneous operations for each qubit. And also each coupler is colored in with the simultaneous error for each two-qubit gate.

Now, this was basically looking at pairs one at a time, not having entanglement across the whole device. But an important thing that we want to do is to evaluate the performance of the full processor doing a realistic algorithm. And once you have 50-plus qubits, this technique of cross-entropy benchmarking we were using, where we were comparing probability distributions, is no longer tractable. And that's because of the simple fact that two to the 53 is about 10 to the 16 different amplitudes or probabilities that we would be trying to deal with. And it is simply not possible to resolve all of those probabilities.
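To put a rough number on that, here is the back-of-the-envelope arithmetic (nothing device-specific, just counting outcomes):

```python
n_qubits = 53
n_bitstrings = 2 ** n_qubits
print(f"{n_bitstrings:.3e} possible bit strings")   # ~9.007e+15, i.e. ~1e16

# Even storing a single 4-byte float per probability would take tens of petabytes...
print(f"{n_bitstrings * 4 / 1e15:.0f} PB just to store one float per outcome")

# ...and estimating each probability empirically would need many samples per
# outcome, vastly more than the roughly one million samples taken per circuit.
```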
So instead, we're going to reframe the question of cross-entropy benchmarking to, can we sample from the expected distribution? So we'll get samples, bit strings, and we'll compare them to the probabilities that we're expecting based on what we thought we were doing. In this case, we'll make some observations, like maybe a million or so bit strings for a circuit, and then we can take the bit strings that we actually observed and compute the expected probability, the theoretical probability that's associated with each one. And this allows us to estimate the fidelity using linear cross-entropy. That's what this equation is here, F_XEB = 2^n <P(x_i)> - 1, where P(x_i) is the ideal probability of the observed bit string x_i and the average is taken over the observed samples. This is the only equation in our paper, I think. And this fidelity, stated plainly, is basically how often we sample high-probability bit strings.

With that, let's move on to the final part of my presentation, which is quantum supremacy itself. So what we need, if we're trying to pit our quantum processor against a classical supercomputer, is a well-defined computational task. And the task that we chose is to sample the output of a pseudo-random quantum circuit. And the point here is that for a sufficiently large circuit, the classical computing cost of performing this task becomes prohibitively large. And to give you an idea of what these circuits look like, this is very similar to what we were doing before, but now we're getting entanglement across the whole device. So we have layers of randomly selected single-qubit gates and then layers of two-qubit gates. There is a bit of a problem here that you might be picking up on based on something that I said a couple of minutes ago. I said that we're going to observe a bunch of bit strings and then compute the ideal or expected theoretical probability associated with each one. So on the one hand, we want to beat classical computers. But on the other hand, this test seems to need them to check our fidelity, to see if our quantum computer is doing the right thing. And we can look at that schematically with this cartoon, where we plot fidelity as a function of the computational difficulty for our quantum processor. You can think of that like the number of qubits, for example. And as we add in more qubits, this fidelity goes down because there are more opportunities for errors to happen. But we can still check our results and see what the fidelity is using this technique. But at some point, we reach this boundary where we're no longer able to check our work. So this is a problem, but we have a solution to it. And that is to control the difficulty of the circuits that we're executing. The most natural and obvious way to do this is by changing the size of the circuit. So a smaller circuit will be much easier to run classically. But there are some more subtle techniques that allow us to run large circuits and still check our work. One of them is to remove a few gates. Instead of running every single gate in the circuit, we take a cut down the middle of the device and remove just a handful of gates from the very beginning of the computation. This decreases the entanglement between the two halves of the device and makes it much easier to simulate or compute with a classical computer. But it's still essentially the same for the quantum processor, because we've just removed a few gates from the beginning. Another even more subtle way to change this difficulty is by changing the order in which we execute the gates.
It turns out that if, instead of doing this hard sequence, we do this very subtly different sequence of gates, there is a slight simplification in the way that the entanglement propagates across the device that can be exploited in order to perform the classical computation more efficiently. So the idea is to use these circuits at the top as a proxy for our hardest circuits, which have all of the gates running in the difficult order. And what we'll be able to do is evaluate our performance with these circuits and then infer what our performance is on these, which are very similar from the point of view of the quantum processor.

Now let's look at what this data should look like. So before we run the experiment, let's make a model for what we expect to happen. And we'll make a very simple model here, where we say our fidelity at the end is going to be the product of the fidelity of each and every gate that we're going to execute, times the fidelity of the readout at the end. So this is using the numbers that we were looking at in the previous section on benchmarking in order to estimate how well we expect the device to do. On the left of this plot, we have a dozen qubits. We're running around 14 cycles. So there are a couple hundred operations that we're doing. And we expect the fidelity of the full circuit on the full system to be about 40%. And that's actually really good for hundreds of operations. As we move to the right, we add more and more qubits. So there are more opportunities for errors. And this fidelity will decrease. For example, at 50 qubits, it's at about 1%, which is still large enough that we can measure it reliably. So this is our model. This is what we expect to happen with a very simple model. Now let's look at some results from experimental data. And there are two data sets that I'm plotting here. One uses the simplified circuits, where we've removed a few gates from the very beginning of the circuit. And another uses the full circuit, where all the gates are present. But in this data, we're using the easy ordering of the gates, so that there's kind of a backdoor so that the classical computer can still run this computation. What we see is a remarkable agreement between all three of these. First of all, our prediction, our very simple model, matches the data extremely well. And this is great because it means that our very simple error model is able to reflect the system performance going from a dozen qubits up to more than 50 qubits. Another crucial lesson here is that the simplified circuits and full circuits are also consistent with each other. And this makes sense, because for the quantum processor, they're only slightly different. There are just a few gates missing for the simplified circuits, out of perhaps 1,000 or more. But it's great to see experimentally that these agree with each other. One last note here: although these are the easy circuits, once we get to 53 qubits, it still becomes pretty computationally intensive for the classical computer to check our work. For example, that last red data point took about five hours to verify with a million cores on a supercomputer.

But let's pull out all the stops now and use our hard circuits. And this is what that data looks like. You'll notice that there's something missing here. We no longer have the full circuits represented, because although we took that data, we cannot process it. So we have our model prediction, where, as we increase the number of cycles, how many operations we do, the fidelity decreases.
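As a quick aside, that simple product model is easy to write down yourself. The sketch below uses ballpark numbers in the spirit of the ones quoted earlier in the benchmarking section (the exact gate counts and error rates here are illustrative, not the precise values behind the published prediction); it lands at a fraction of a percent for the largest circuits, in the same range as the roughly 0.2% prediction mentioned in a moment.

```python
# Predicted circuit fidelity = product of (1 - error) over every gate and
# every qubit readout. Numbers below are illustrative ballpark values.
e_single  = 0.0016   # simultaneous single-qubit gate error (~0.16%)
e_two     = 0.006    # simultaneous two-qubit gate error   (~0.6%)
e_readout = 0.03     # per-qubit readout error ("a few percent")

n_single, n_two, n_qubits = 1000, 400, 53   # roughly the largest circuit

fidelity = ((1 - e_single) ** n_single
            * (1 - e_two) ** n_two
            * (1 - e_readout) ** n_qubits)

print(f"predicted fidelity ~ {fidelity:.2%}")   # a fraction of a percent
```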
And we have the data points from our simplified circuits. Each of those data points represents 10 circuits, a total of 10 million samples, and took our quantum processor about 200 seconds in order to execute. And I can summarize the last data point. It has 53 qubits, 20 cycles, over 1,000 single-qubit gates, and 400 two-qubit gates. We predict from our model a fidelity of 0.2%, which is remarkably consistent with what we observe for our simplified circuit. I'll mention as well that this data is all freely available online at this URL.

The last chapter in this story is the classical competition. And an important lesson here is that all known classical algorithms to perform this task require memory or runtime that is exponential in the number of qubits. In fact, they all require exponential runtime. And in order to test this and see how well we're performing compared to classical supercomputers, we actually ran benchmarks on several supercomputers like these pictured here. There are a couple of algorithms that I want to highlight that we considered. One is the Schrodinger algorithm. This is what you might come up with if you understand quantum mechanics and want to run a simulation of a quantum circuit. Basically, you put the full wave function into the memory of your computer and then do some linear algebra to do its unitary evolution. And this is pretty rough because it has a memory requirement that is exponential in the number of qubits. In this case, it would require about 65 petabytes in order to run this computation on 53 qubits, which exceeds the memory of any computer in the world. I'll also mention that the runtime is exponential in the number of qubits, although it is linear in the number of cycles. Another algorithm that we explored is the Schrodinger-Feynman algorithm. And this is a really nice algorithm because it allows us to trade off between the memory requirement, or space, and the runtime. And instead of having the full system in memory, we just have subsystems in memory at any given time. And basically what we do is we see how much memory we have access to and use as much of it as we can in order to make the runtime as short as we can. And in the case of solving this particular computational problem, we're using around a petabyte of memory, which is a huge amount but something that's totally achievable for a supercomputer. But this trades off with the runtime. We estimate that to solve the same problem that our Sycamore processor did in 200 seconds would take thousands of years. And this runtime is also exponential now in the number of cycles that we perform. And let me emphasize here that these numbers, these estimates, are based on real benchmarks that we performed using our quantum circuits, doing subsets of them on the Summit supercomputer.

So let me conclude now. We looked first at the Sycamore processor and the crucial hardware infrastructure that makes these experiments possible. We then reviewed how to calibrate and benchmark this processor and compared it to the world's fastest supercomputers to demonstrate quantum supremacy. I want to thank one more time this huge team that worked so hard in order to make these results possible. And I'm really honored to be a member of this team and to share these results with you today. Thank you.