[MUSIC PLAYING]
KEVIN SATZINGER: Thank you.
My name is Kevin Satzinger, and I'm here
today to share with you our latest
results on Quantum Supremacy: Benchmarking the Sycamore
Processor.
The promise of quantum computing is
that it can solve certain useful problems that
are simply beyond the reach of classical computing.
And most of those applications are here,
in the regime of a useful error-corrected machine,
which is still years of research in the future.
But we also hope that in the near term, in this blue region,
we will be able to find useful applications.
But, of course, before a quantum computer
can do something useful that is intractable for classical
computing, it must first do anything
at all that is intractable for a classical computer.
And so a real question that has been on our minds
in this whole industry for the last decade is,
can we cross this line?
And this is something that our group has been focusing on
for the past several years.
It's something that was given a name by John Preskill in 2012.
He called it quantum supremacy, to perform tasks
with controlled quantum systems going
beyond what can be accomplished with ordinary digital
computers.
He went on to ask, is controlling large-scale quantum
systems merely really, really hard
or is it ridiculously hard?
Well, I am pleased to say that it is merely really,
really hard.
And we demonstrated this with the paper
that we published last fall where we showed crossing
this line for the first time.
And looking at this slide, I see there's
a lot of words on there.
So maybe a picture would be a better way to reflect this.
And I want to emphasize that this is a huge team effort.
And I'm very thankful to each and every person, each
and every member of this team who contributed to this work.
And I'm honored to be here today representing them.
You can find our paper in Nature.
It's open access.
And also at this arXiv link, we
have updated versions of our supplementary information.
The centerpiece of this paper was a new processor
called Sycamore.
And it's positioned here right at this boundary
between classically simulatable and
beyond classically simulatable.
And that's what I'm going to be presenting about today.
First, the march from graduate school in the upper
left toward industrial technology that
can take on a supercomputer.
And then how that plays into this research
direction toward a useful error-corrected machine.
For the rest of this talk, I'll be following this outline.
So we'll look first at the Sycamore processor.
Second, how to calibrate and benchmark Sycamore.
And finally, quantum supremacy itself.
So let's get started with Sycamore.
And before we get to Sycamore, I want
to flash back a few years to what the state of the art
was in 2015.
This is a chip from the Santa Barbara group.
It's about a centimeter in size.
And it has nine transmon qubits in a linear array.
And what I'd like you to observe is
that the control wiring takes up about half
of the area of this chip.
And when you look at this device,
it is not at all obvious how we could scale it up
to a two-dimensional array with 50-plus qubits.
It's not a matter of copy/paste.
We need to really re-engineer the system in order
to make it scalable.
And one of the key technologies that
made it possible to make such a two-dimensional array
was moving to a scalable flip-chip design, where we
have two chips instead of one.
So one chip, this top chip, will be solely responsible
for a two-dimensional array of qubits.
And the other chip will take care of all the readout and all
the control wiring.
And by dividing the responsibilities like this,
we'll be able to have a more scalable design.
One of the key technologies that makes this possible
is these indium bumps, which provide a superconducting
interconnect and a mechanical connection between the two
chips.
This is a photograph showing a small processor
prototype that demonstrated this flip-chip technology.
So there are four chips in this photograph.
At the bottom is a qubit chip that has a small 2 by 3
array of superconducting qubits in the center.
And the rest of the chip is just covered in those indium bumps.
In the center is a separate chip that's
responsible for readout and control
and interfacing with the outside world.
And what we do is we take the qubit chip
and flip it over on top of the control chip,
align them, and press them together.
And that completed assembly is what's
at the top of the photograph, where we have two chips stacked
together and ready to use.
Now, this is how you can make a quantum processor.
But one of the lessons that I want
to share with you is that there's a lot of hardware
infrastructure that goes into making one of these experiments
actually work.
So let me share with you a couple of highlights.
One is packaging.
This is basically anything that goes
between the quantum processor and our dilution refrigerator.
And in our case, we have the processor
embedded in a circuit board with superconducting metal.
And the circuit board interfaces with about 150
coaxial connectors around the perimeter.
We also encase the processor in electromagnetic
shielding to protect it from the outside world.
We then take this package and bolt it
to the bottom of a dilution refrigerator, which
will cool it down to about 10 millikelvin
and also is responsible for delivering signals
through a couple hundred coaxial cables
that go from room temperature down into the device.
Another key piece of hardware infrastructure
to get these experiments to work is our electronics.
These are room temperature electronics
that generate the control signals for our processors.
And we use custom scalable electronics.
Here's an example of one of our boards.
In the center is a field-programmable gate
array that's responsible for driving eight
digital-to-analog converters arranged
around it, which then drive these eight channels.
These can output arbitrary waveforms from 0
to 400 megahertz.
And we can also up-convert those with mixers
to microwave frequencies, like around 6 gigahertz.
This is just one card.
We can have many of these cards in a crate, like depicted here.
And then several of these crates in racks working together
in concert to control one of our processors.
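To make that last step concrete, here is a toy numpy sketch of single-sideband up-conversion with an I/Q mixer. The 250 MHz baseband tone and 6 GHz local oscillator are illustrative numbers in the ranges just quoted, not the actual electronics.

```python
import numpy as np

# Toy single-sideband up-conversion (illustrative frequencies only).
fs = 16e9                       # simulation sample rate, Hz
t = np.arange(0, 1e-6, 1 / fs)  # 1 microsecond of signal
f_if = 0.25e9                   # baseband tone from the DAC (< 400 MHz)
f_lo = 6.0e9                    # local oscillator near qubit frequencies

# I*cos(lo) - Q*sin(lo) selects the upper sideband at f_lo + f_if.
i_wave = np.cos(2 * np.pi * f_if * t)
q_wave = np.sin(2 * np.pi * f_if * t)
rf = i_wave * np.cos(2 * np.pi * f_lo * t) - q_wave * np.sin(2 * np.pi * f_lo * t)

# The spectrum has a single peak at 6.25 GHz.
spectrum = np.abs(np.fft.rfft(rf))
freqs = np.fft.rfftfreq(len(rf), 1 / fs)
peak = freqs[np.argmax(spectrum)]
print(f"up-converted tone at {peak / 1e9:.2f} GHz")
```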
Now I'd like to turn to Sycamore itself.
One of the key advances of Sycamore
is its new tunable coupling architecture.
This is a new feature where we're
able to turn the qubit interactions on
and off at will.
So the qubits can be independent of each other most of the time.
But then when we want them to interact and get entanglement,
we can turn on the coupling for a brief period of time.
This, it turns out, was immensely
helpful to making the full system work
and was really a key breakthrough
in order to get this processor to perform.
We did this with a scalable two-dimensional architecture,
where we introduce an extra coupler
transmon between each pair of qubits, where each qubit itself
is also a transmon.
And this is depicted in this device schematic
here, where we have 54 qubits and 88 couplers,
one between each pair of qubits.
I want to share with you a little bit of data showing
how these couplers really work.
So let's look at a simple experiment
where we have two qubits next to each other
at the same frequency.
What we're going to do is excite one qubit
and then have the two qubits interact for a period of time,
the vertical axis, subject to a certain coupler
bias, the horizontal axis.
And let's look at the center first.
So, in the center here, the coupler
is at its maximum frequency.
And so there is a few megahertz of coupling between the two
qubits.
And what happens is this photon leisurely swaps back and forth
between the two qubits.
That's what these oscillations are.
But as we march to the right, as the coupler's frequency comes
down, there is this divergence where
there is no more swapping.
This is where the coupling is turned off
and the two qubits can act independently.
This is a very valuable place.
This is where we operate ordinarily.
But sometimes you want the qubits to interact.
And to do that, we'll pulse the coupler a little bit further
to the right so that we have very strong coupling
for a brief period of time between the two qubits.
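A toy two-level model captures this photon exchange: in the single-excitation subspace of two resonant qubits, the coupler just sets the exchange rate g, and the excitation oscillates as sin²(gt). The 20 MHz coupling below is an illustrative value in the range used for gates; the coupler's own dynamics are ignored.

```python
import numpy as np
from scipy.linalg import expm

# Two resonant qubits exchanging one photon (illustrative coupling).
g = 2 * np.pi * 20e6  # exchange coupling, rad/s

# Single-excitation subspace {|10>, |01>}; hbar = 1.
h = g * np.array([[0.0, 1.0], [1.0, 0.0]])

def p_swapped(t):
    """Probability the photon has moved to the second qubit after time t."""
    psi = expm(-1j * h * t) @ np.array([1.0, 0.0])  # start in |10>
    return abs(psi[1]) ** 2

# P(t) = sin^2(g t): a full swap takes pi/(2g), about 12.5 ns here.
t_swap = np.pi / (2 * g)
print(f"full swap after {t_swap * 1e9:.1f} ns, P = {p_swapped(t_swap):.3f}")
```

Turning the coupling "off" corresponds to g near zero, where this swap period diverges, matching the divergence in the data described above.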
I'll end this section with this nice photograph of our Sycamore
processor.
And a nice symmetry that I can highlight
is that the center chip here with the qubits
is about a centimeter in size, which
is the same size as the chip we looked
at at the beginning of this section.
So now let's move on to calibration and benchmarking.
Suppose that I handed you all of these electronics-- the fridge
and cables, the packaging and the processor.
It is not a trivial matter to turn all of that stuff
into a quantum computer.
And calibration is the process of learning
how to execute quantum logic with our hardware
so that we can go from all of our stuff
to a system that can run an arbitrary quantum
algorithm in the same way that you could play music
on a finely tuned piano.
But this is not a trivial task.
There are around 100 parameters for each qubit.
We need to choose a frequency for every qubit and coupler.
And there are strong interactions
between those frequencies and biases.
And then we need to tune up all of the gates
and readout for each qubit and pair of qubits.
Now, if you have just a few qubits,
you can park a graduate student in front
of a computer for a couple of days,
and they'll be able to work it out.
But if you have 50-plus qubits, you
need a much more scalable solution.
And in order to solve this problem,
we encode our calibration routines into a graph.
And this allows us to solve the problem using standard graph
traversal algorithms.
Pictured here is an example calibration sequence
for two qubits.
And this network is really a graph
distilling decades of research from groups
all around the world learning how
to make these processors work.
In this graph, each node represents
a physics experiment, where we acquired data, and then
the analysis that we use in order
to figure out what the data says and decide what to do next.
This is an example for two qubits,
but there are literally thousands of these nodes
when we want to calibrate Sycamore.
So it's crucial that we use algorithms
to work through this graph, to calibrate the device,
and then maintain the calibrations.
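The graph idea can be sketched with a standard topological sort: each calibration node lists its prerequisites, and a traversal algorithm produces a valid order to run them in. The node names and dependencies below are illustrative, not the actual calibration routines.

```python
from graphlib import TopologicalSorter

# Illustrative calibration dependency graph: node -> prerequisites.
deps = {
    "readout_coarse":    {"device_params"},
    "rabi":              {"device_params"},
    "pi_pulse":          {"rabi", "readout_coarse"},
    "readout_fine":      {"pi_pulse"},
    "single_qubit_gate": {"pi_pulse", "readout_fine"},
    "two_qubit_gate":    {"single_qubit_gate"},
}

# A standard graph-traversal algorithm decides the execution order.
order = list(TopologicalSorter(deps).static_order())
print(order)  # every node appears after all of its prerequisites
```

The same structure supports maintenance: when one calibration drifts, only its descendants in the graph need to be redone.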
To give a flavor of how this works,
we start on the left learning some device parameters.
And as we work to the right, we iterate back and forth
between single-qubit gates and readout
until eventually we get around to two-qubit gates
at the very end.
A key step in setting up our device
is choosing a frequency for each of the qubits.
And we're going to kind of follow a two-step program here.
First, we're going to measure the qubit lifetime
as a function of frequency.
This is an example data set for one qubit from Sycamore.
And then we're going to use that data and all the information
we know about our device and how it performs in order
to choose optimal frequencies for each qubit.
This is a pretty rich subject, but let me just give you
a flavor for how this proceeds.
For example, each qubit is least sensitive to control noise
at its maximum frequency.
So we might want to park all of the qubits
at their maximum frequencies, although there
is some variation in those frequencies across the device.
But then if we consider pulse distortions,
for two-qubit gates, it's actually nicer
if the qubits are close to their neighbors
so they don't have to move very far in frequency
in order to interact.
So this will kind of smooth out this shape
here so that the qubits will be close to their neighbors.
But you don't want to be too close because there
can be stray interactions, parasitic couplings
between qubits and their neighbors and next nearest
neighbors.
And this might suggest kind of a checkerboard type pattern
like we see here.
And finally, we include the information
from this qubit lifetime versus frequency data
to find an optimum where we expect each qubit to have
good performance.
And solving this problem is not a trivial matter.
It's very difficult for a human to solve it.
But we have developed good heuristic algorithms
that find us good solutions to this problem.
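A toy version of this trade-off can be written as a cost function over candidate frequency assignments. The weights, penalty shapes, and frequencies below are invented for illustration; the real optimizer works on the full two-dimensional device with measured lifetime data and many more terms.

```python
import numpy as np

# Hypothetical maximum frequencies for a 1-D chain of 8 qubits (GHz),
# with some fabrication spread. All numbers are made up.
f_max = np.array([6.02, 6.00, 6.03, 5.99, 6.01, 6.02, 5.98, 6.00])

def cost(f):
    noise = np.sum(f_max - f)                 # control noise: stay near f_max
    gaps = np.abs(np.diff(f))                 # neighbor detunings
    parasitic = np.sum(np.exp(-(gaps / 0.05) ** 2))  # too close: stray coupling
    distortion = np.sum(gaps)                 # too far: long frequency moves
    return 1.0 * noise + 5.0 * parasitic + 1.0 * distortion

all_at_max = f_max.copy()                          # everyone at the sweet spot
checkerboard = f_max - 0.1 * (np.arange(8) % 2)    # stagger alternate qubits
print(f"all at max: {cost(all_at_max):.2f}, staggered: {cost(checkerboard):.2f}")
```

Even in this toy, the staggered (checkerboard-like) assignment wins because the parasitic-coupling penalty dominates when every qubit sits at nearly the same frequency.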
Now, once we've set up all of our qubits
at their desired frequencies, we need to actually carry out
the calibrations.
And I'm just going to share with you
a couple of nice snapshots of what that calibration
data looks like throughout the process.
And an early crucial qubit experiment
is Rabi oscillations, where we oscillate back and forth
between the qubit zero state and one state.
And this is very nice because it allows
us to choose the initial amplitude for a pi
pulse for our qubits.
This is an example for one qubit.
And here we plotted the data for all 53 qubits
that we're measuring on this device.
And we can see, although there is some diversity in these data
sets, our algorithms can readily pick out
this first oscillation that gives us one of our earliest
calibrations.
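As a minimal sketch of that calibration step, here is synthetic Rabi data and the extraction of a pi-pulse amplitude from the first oscillation maximum. The "true" amplitude and noise level are made up, and a real calibration would fit the oscillation rather than just pick the peak.

```python
import numpy as np

# Synthetic Rabi data: P(|1>) vs drive amplitude (arbitrary units).
a_pi_true = 0.38  # hypothetical true pi-pulse amplitude
amps = np.linspace(0.0, 1.0, 201)
p1 = 0.5 * (1 - np.cos(np.pi * amps / a_pi_true))
p1 += 0.01 * np.random.default_rng(1).standard_normal(amps.size)  # readout noise

# Pick out the first oscillation maximum within a window that covers it.
window = np.searchsorted(amps, 0.6)
a_pi = amps[np.argmax(p1[:window])]
print(f"calibrated pi-pulse amplitude ~ {a_pi:.2f}")
```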
Another critical calibration is readout.
So an example experiment here is readout clouds,
where we prepare each qubit in the zero state
and then measure the readout signal.
That's the blue cloud.
And then we do the same preparing the qubit in the one
state using our pi pulse, and that's the red cloud.
And we want separation between these two clouds
in order to achieve single shot readout of the qubit state.
This is an example for one qubit and across the full device.
I'll mention-- I'm not going to go deeper into readout,
but this is quite a sophisticated subject.
And managing to achieve good readout on all the qubits
simultaneously is very challenging.
And even just benchmarking the readout
for 50 qubits at the same time is not a trivial task, either.
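To illustrate what "separation between the two clouds" buys you, here is a toy pair of IQ clouds with a midpoint threshold along the line joining their centers. Cloud positions and widths are invented, and real readout discrimination is considerably more sophisticated than this.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy readout clouds in the IQ plane (positions and widths made up).
n = 5000
blue = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(n, 2))  # qubit in |0>
red = rng.normal(loc=(2.0, 0.0), scale=0.5, size=(n, 2))   # |1> after a pi pulse

# Single-shot discrimination: project onto the line joining the cloud
# centers and threshold at the midpoint.
axis = red.mean(axis=0) - blue.mean(axis=0)
threshold = (red.mean(axis=0) + blue.mean(axis=0)) @ axis / 2
p0_correct = np.mean(blue @ axis < threshold)
p1_correct = np.mean(red @ axis > threshold)
fidelity = 1 - 0.5 * ((1 - p0_correct) + (1 - p1_correct))
print(f"readout fidelity ~ {fidelity:.3f}")
```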
The final calibration data I'll share with you
is for calibrating two-qubit gates.
And I'll talk a little more about two-qubit gates
in a moment.
But basically, we want an iSwap between the two qubits,
where we have a resonant exchange of the qubits'
photons.
So what we do is for each pair, we
tune the coupler amplitude so that we have the correct amount
of coupling between the two qubits, about 20 megahertz
in this case, so that they'll completely
swap a single photon.
And that's that first maximum there.
And here we've plotted 86 of these plots.
It's one for each pair of qubits across the device.
Now, I said I wanted to talk a little bit more
about two-qubit gates on Sycamore.
Our basic idea is to use fast resonant gates,
where we pulse the coupling on for around 10 to 20 nanoseconds
so that the two qubits can interact.
And for two-transmon qubits, the natural interactions
that will take place are an iSwap interaction
and a conditional phase.
And we can map the gates onto this two-dimensional space
here with those two axes.
And you can see some of your favorite familiar gates,
like iSwap, square-root iSwap, and CZ, on this plot.
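In one common parametrization (often written fSim, under one particular sign convention), this two-axis space is just a two-parameter unitary: a swap angle and a conditional phase. The familiar gates sit at special parameter values.

```python
import numpy as np

def fsim(theta, phi):
    """Two-qubit gate with swap angle theta and conditional phase phi
    (one common sign convention; basis order |00>, |01>, |10>, |11>)."""
    return np.array([
        [1, 0, 0, 0],
        [0, np.cos(theta), -1j * np.sin(theta), 0],
        [0, -1j * np.sin(theta), np.cos(theta), 0],
        [0, 0, 0, np.exp(-1j * phi)],
    ])

cz = fsim(0, np.pi)              # conditional phase only: diag(1, 1, 1, -1)
iswap_like = fsim(np.pi / 2, 0)  # full photon exchange, no conditional phase
print(np.round(cz.real))
```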
Now, for quantum supremacy, we did something a little unusual.
We did what we would call an iSwap-like gate
on all of our pairs.
This is the data that I was just showing you.
And we actually determine the unitary specifically
for each pair of qubits.
But fear not: although this is a little unorthodox,
you can take two of those iSwap-like gates
and compile them into a CNOT, which
is a classic textbook gate, along with single-qubit rotations.
I'll mention as well that, as a part of his PhD thesis,
Brooks Foxen actually filled in this whole space
for a pair of qubits and was able to execute an arbitrary
gate in this space.
And you can read about that in his paper here.
Now, suppose we've tuned up all of our two-qubit gates
and we want to benchmark them to figure out
how well we're performing.
There are many different techniques to do this,
but I'm going to focus on one of those, which is cross-entropy
benchmarking, because it closely relates
to what we do in the quantum supremacy experiment.
The first step is to choose a bunch of sequences
that we're going to measure.
And these are sequences of our two-qubit gate interleaved
with randomly selected single-qubit gates,
like depicted here.
We'll then take each of these sequences
and run them on our quantum processor
maybe a few thousand times each in order
to compute these experimental probabilities associated
with each of these sequences.
Now, on the other hand, we can also use a classical computer
in order to basically run the same circuit
and come up with the theoretical expected probabilities.
And those depend on the unitary that we're trying to run.
But say we have an idea of what unitary
we mean to run with our two-qubit gate.
So we have two probability distributions.
One that we measured with our experiment.
And one that is computed.
And we can compare these two probability distributions
to estimate the fidelity, which is
like how well are we executing the expected quantum gate.
And this is where cross-entropy actually
comes in because that's the metric we
use to compare the two probability distributions.
There's something special here.
We can have a feedback loop between these two sections
in order to find the unitary model that
maximizes this fidelity.
And when we do that, we're finding the two-qubit unitary
that most precisely expresses what exactly
we're doing in our two-qubit gate.
So we're basically learning what is the two-qubit gate
that we're actually doing.
This is very useful because it helps
us identify coherent errors.
For example, suppose you were trying to do a CZ gate
and it turns out that the optimal unitary actually has
a little bit of swapping in it.
This is telling you where your errors are coming from so
that you can learn from that and make your calibrations better.
In the case of quantum supremacy,
we use this tool to identify what
is the unitary that each qubit pair is actually running,
and we're going to use that information when
we do that experiment.
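Here is a toy version of that feedback loop, under heavy assumptions: two qubits, the only unknown is a single swap angle of an iSwap-like gate, and exact simulated probabilities stand in for measured samples. We scan candidate angles and pick the one that maximizes the linear cross-entropy fidelity, averaged over random circuits.

```python
import numpy as np

rng = np.random.default_rng(3)

def fsim_like(theta):
    # iSwap-like gate with swap angle theta (conditional phase omitted).
    c, s = np.cos(theta), -1j * np.sin(theta)
    return np.array([[1, 0, 0, 0], [0, c, s, 0], [0, s, c, 0], [0, 0, 0, 1]])

def random_su2():
    # Haar-ish random single-qubit unitary via QR of a Gaussian matrix.
    z = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def output_probs(theta, layers):
    # Alternate random single-qubit layers with the entangling gate.
    psi = np.zeros(4, complex)
    psi[0] = 1.0
    for u1, u2 in layers:
        psi = fsim_like(theta) @ np.kron(u1, u2) @ psi
    return np.abs(psi) ** 2

theta_true = 0.8              # the swap angle the "hardware" really runs
thetas = np.linspace(0.5, 1.1, 25)
f_xeb = np.zeros_like(thetas)
for _ in range(100):          # average over random circuits
    layers = [(random_su2(), random_su2()) for _ in range(12)]
    p_true = output_probs(theta_true, layers)
    f_xeb += [4 * p_true @ output_probs(th, layers) - 1 for th in thetas]
f_xeb /= 100

best = thetas[np.argmax(f_xeb)]
print(f"best-fit swap angle {best:.3f} (true {theta_true})")
```

The cross-entropy fidelity peaks when the model unitary matches what the "hardware" is actually doing, which is the learning step described above.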
There's another important detail here,
which is simultaneous operations.
And so, in addition to benchmarking each qubit one
at a time or each pair one at a time,
we also benchmark all of the qubits operating
at the same time and all of the pairs
operating simultaneously.
This kind of simultaneous operation
is essential to near-term applications
because noise limits our algorithm depth.
It's also a necessity for our long-term plans
for quantum error correction.
And the key question here is, does the processor
work when we're doing operations simultaneously?
There is one other detail that I want to highlight here.
You can't actually run all of the pairs
simultaneously because each qubit participates
in more than one pair.
So we divvy them up into different layers
that we can benchmark simultaneously.
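For a plain rectangular grid (Sycamore's actual layout and gate patterns differ), one simple way to form such layers is to split the couplers by orientation and parity, so no qubit appears twice within a layer:

```python
import itertools

# Divide the couplers of a rectangular qubit grid into four layers so
# that all gates within a layer can run simultaneously.
rows, cols = 4, 5
layers = {"A": [], "B": [], "C": [], "D": []}
for r, c in itertools.product(range(rows), range(cols)):
    if c + 1 < cols:  # horizontal coupler
        layers["A" if c % 2 == 0 else "B"].append(((r, c), (r, c + 1)))
    if r + 1 < rows:  # vertical coupler
        layers["C" if r % 2 == 0 else "D"].append(((r, c), (r + 1, c)))

for name, pairs in layers.items():
    qubits = [q for pair in pairs for q in pair]
    assert len(qubits) == len(set(qubits))  # no qubit used twice in a layer
    print(name, len(pairs), "parallel gates")
```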
So let's look at some benchmarking
results from Sycamore.
And I'll start with the single-qubit gates.
And we're plotting the distributions
of errors across the device for 53 qubits on the left.
So if we do the single-qubit gates one at a time,
they have an error of about 0.15%,
which is pretty good.
And if we do them all simultaneously,
it only modestly increases to 0.16%.
This is great because it suggests that there are not
stray interactions or crosstalk between qubits
that would cause these gates to degrade.
As we turn to the two-qubit gates, in the isolated case,
we see an error of 0.36%, which is really outstanding.
And in the simultaneous case, this modestly
increases to about 0.6%, which we're really
still quite proud of.
And this increase in error could be attributed, for example,
to stray interactions between pairs
of qubits that are becoming slightly
entangled with each other.
We also have readout, which I won't delve into too deeply
here, where we have a few percent error per qubit.
I'll make a note that for these simultaneous two-qubit gate
benchmarks, the unitaries of the two-qubit gates
actually change slightly when we run them all at once.
And we can measure that and take it
into account in our experiments.
Instead of plotting the distributions,
we can also plot these single- and two-qubit gate errors
on the device layout itself.
And we were really excited when we got this to work
and got to see this for the first time.
These are the errors associated with the simultaneous
operations for each qubit.
And also each coupler is colored in with the simultaneous error
for each two-qubit gate.
Now, this was basically looking at pairs one at a time,
not having entanglement across the whole device.
But an important thing that we want to do
is to evaluate the performance of the full processor doing
a realistic algorithm.
And once you have 50-plus qubits,
this technique of cross-entropy benchmarking we were using,
where we were comparing probability distributions,
is no longer tractable.
And that's because of the simple fact that two to the 53
is about 10 to the 16 different amplitudes
or probabilities that we would be trying to deal with.
And it is simply not possible to resolve
all of those probabilities.
So instead, we're going to reframe
the question of cross-entropy benchmarking to,
can we sample from the expected distribution?
So we'll get samples, bit strings,
and we'll compare them to the probabilities
that we're expecting based on what we thought we were doing.
In this case, we'll make some observations,
like maybe a million or so bit strings for a circuit,
and then we can take the bit strings that we actually
observed and compute the expected probability,
the theoretical probability that's
associated with each one.
And this allows us to estimate the fidelity using
linear cross-entropy.
That's what this equation is here.
This is the only equation in our paper, I think.
And this fidelity stated plainly is basically
how often we sample high-probability bit strings.
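That estimator can be sketched in a few lines: F = 2ⁿ·⟨p(xᵢ)⟩ − 1, where p is the ideal probability of each observed bitstring. In this toy, a random 10-qubit state stands in for the ideal circuit simulation, and two samplers play the roles of a perfect processor and a fully depolarized one.

```python
import numpy as np

rng = np.random.default_rng(4)

# A random 10-qubit "circuit output" stands in for the ideal simulation.
n = 10
d = 2 ** n
amps = rng.standard_normal(d) + 1j * rng.standard_normal(d)
p_ideal = np.abs(amps) ** 2
p_ideal /= p_ideal.sum()

def xeb_fidelity(samples):
    """Linear cross-entropy fidelity: F = 2^n * <p(x_i)> - 1."""
    return d * p_ideal[samples].mean() - 1

perfect = rng.choice(d, size=200_000, p=p_ideal)  # noiseless sampler
noise = rng.integers(0, d, size=200_000)          # fully depolarized sampler
print(xeb_fidelity(perfect), xeb_fidelity(noise))
```

A perfect sampler preferentially hits high-probability bitstrings, so F comes out near 1; uniform noise gives F near 0, matching the "how often we sample high-probability bit strings" reading above.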
With that, let's move on to the final part
of my presentation, which is quantum supremacy itself.
So what we need, if we're trying to pit our quantum processor
against a classical supercomputer,
is a well-defined computational task.
And the task that we chose is to sample
the output of a pseudo-random quantum circuit.
And the point here is that for a sufficiently large circuit,
the classical computing cost of performing this task
becomes prohibitively large.
And to give you an idea of what these circuits look like,
this is very similar to what we were doing before,
but now we're getting entanglement
across the whole device.
So we have layers of randomly selected single-qubit gates
and then layers of two-qubit gates.
There is a bit of a problem here that you
might be picking up on based on something
that I said a couple of minutes ago.
I said that we're going to observe a bunch of bit strings
and then compute the ideal or expected
theoretical probability associated with each one.
So on the one hand, we want to beat classical computers.
But on the other hand, this test seems
to need them to check our fidelity
to see if our quantum computer is doing the right thing.
And we can look at that schematically
with this cartoon, where we plot fidelity
as a function of the computational difficulty
for our quantum processor.
You can think of that like the number of qubits, for example.
And as we add in more qubits, this fidelity
goes down because there are more opportunities
for errors to happen.
But we can still check our results
and see what the fidelity is using this technique.
But at some point, we reach this boundary
where we're no longer able to check our work.
So this is a problem, but we have a solution to it.
And that is to control the difficulty of the circuits
that we're executing.
The most natural and obvious way to do this
is by changing the size of the circuit.
So a smaller circuit will be much easier to run classically.
But there are some more subtle techniques
that allow us to run large circuits
and still check our work.
One of them is to remove a few gates.
Instead of running every single gate in the circuit,
we take a cut down the middle of the device
and remove just a handful of gates
from the very beginning of the computation.
This decreases the entanglement between the two halves
of the device and makes it much easier to simulate or compute
with a classical computer.
But it's still essentially the same for a quantum processor
because we've just removed a handful of gates from the beginning.
Another even more subtle way to change this difficulty
is by changing the order in which we execute the gates.
It turns out if instead of doing this hard sequence,
we do this very subtly different sequence of gates,
there is a slight simplification in the way
that the entanglement propagates across the device that
can be exploited in order to perform
the classical computation more efficiently.
So the idea is to use these circuits
at the top as a proxy for our hardest circuits,
which have all of the gates running in the difficult order.
And what we'll be able to do is evaluate our performance
with these circuits and then infer
what our performance is on these, which
are very similar from the point of view
of the processor, the quantum processor.
Now let's look at what this data should look like.
So before we run the experiment, let's
make a model for what we expect to happen.
And we'll make a very simple model
here where we say our fidelity at the end
is going to be the product of the fidelity of each
and every gate that we're going to execute
times the fidelity of the readout at the end.
So this is using the numbers that we
were looking at in the previous section on benchmarking
in order to estimate how well we expect the device to do.
On the left of this plot, we have a dozen qubits.
We're running around 14 cycles.
So there are a couple hundred operations that we're doing.
And we expect the fidelity of the full circuit
on the full system to be about 40%.
And that's actually really good for hundreds of operations.
As we move to the right, we add more and more qubits.
So there are more opportunities for errors.
And this fidelity will decrease.
For example, at 50 qubits, it's at about 1%,
which is still large enough that we can measure it reliably.
So this is our model.
This is what we expect to happen with a very simple model.
Now let's look at some results from experimental data.
And there are two data sets that I'm plotting here.
One uses the simplified circuits,
where we've removed a few gates from the very beginning
of the circuit.
And another uses the full circuit, where
all the gates are present.
But in this data, we're using the easy ordering of the gates
so that there's kind of a backdoor
so that the classical computer can still run this computation.
What we see is a remarkable agreement between all three
of these.
First of all, our prediction, our very simple model,
matches the data extremely well.
And this is great because it means
that our very simple error model is able to reflect the system
performance going from a dozen qubits up to more
than 50 qubits.
Another crucial lesson here is that the simplified circuits
and full circuits are also consistent with each other.
And this makes sense because for the quantum processor,
they're only slightly different.
There's just a few gates missing for the simplified circuits
out of perhaps 1,000 or more.
But it's great to see experimentally that these
agree with each other.
One last note here, although these are the easy circuits,
once we get to 53 qubits, it still
becomes pretty computationally intensive
for the classical computer to check our work.
For example, that last red data point
took about five hours to verify with a million cores
on a supercomputer.
But let's pull out all the stops now and use our hard circuits.
And this is what that data looks like.
You'll notice that there's something missing here.
We no longer have the full circuits represented
because although we took that data, we cannot process it.
So we have our model prediction, where as we increase
the number of cycles, how many operations we do,
the fidelity decreases.
And we have the data points from our simplified circuits.
Each of those data points represents 10 circuits,
a total of 10 million samples, and took our quantum processor
about 200 seconds in order to execute.
And I can summarize the last data point.
It has 53 qubits, 20 cycles, over 1,000 single-qubit gates,
and 400 two-qubit gates.
We predict from our model a fidelity
of 0.2%, which is remarkably consistent with what we observe
for our simplified circuit.
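The back-of-the-envelope version of that prediction is just a product of per-operation fidelities, using the rough simultaneous error rates quoted earlier in this talk. The exact gate counts and the per-qubit readout error below are approximate stand-ins, so the result should only be read as order 0.2%.

```python
# Fidelity model: product of every gate fidelity times readout fidelity.
# Error rates and counts are approximate values quoted in this talk.
e1, e2, er = 0.0016, 0.0062, 0.038   # 1q gate, 2q gate, readout per qubit
n1, n2, nq = 1100, 400, 53           # rough counts for the 20-cycle circuit

fidelity = (1 - e1) ** n1 * (1 - e2) ** n2 * (1 - er) ** nq
print(f"predicted circuit fidelity ~ {fidelity * 100:.2f}%")  # ~0.2%
```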
I'll mention as well that this data is all freely available
online at this URL.
The last chapter in this story is the classical competition.
And an important lesson here is that all
known classical algorithms to perform this task
require memory or runtime that is
exponential in the number of qubits.
In fact, they all require exponential runtime.
And in order to test this and see how well we're performing
compared to classical supercomputers,
we actually ran benchmarks on several supercomputers
like these pictured here.
There are a couple of algorithms that I want to highlight
that we considered.
One is the Schrodinger algorithm.
This is what you might come up with if you understand quantum
mechanics and want to run a simulation of a quantum
circuit.
Basically, you put the full wave function
into the memory of your computer and then do some linear algebra
to do its unitary evolution.
And this is pretty rough because it
has a memory requirement that is exponential in the number
of qubits.
In this case, it would require about 65 petabytes
in order to run this computation on 53 qubits, which
exceeds the memory of any computer in the world.
I'll also mention that the runtime
is exponential in the number of qubits,
although it is linear in the number of cycles.
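The memory number is easy to reproduce, assuming one single-precision complex amplitude (8 bytes) per computational basis state:

```python
# Memory to hold a full 53-qubit state vector, one 8-byte
# single-precision complex amplitude per basis state.
n_qubits = 53
amplitudes = 2 ** n_qubits
bytes_needed = amplitudes * 8        # complex64
petabytes = bytes_needed / 2 ** 50   # pebibytes
print(f"{amplitudes:.2e} amplitudes -> {petabytes:.0f} PiB")
```

That is 64 PiB, consistent with the "about 65 petabytes" quoted above, and beyond the memory of any existing computer.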
Another algorithm that we explored
is the Schrodinger-Feynman algorithm.
And this is a really nice algorithm
because it allows us to trade off
between the memory requirement space and the runtime.
And instead of having the full system in memory,
we just have subsystems in memory at any given time.
And basically what we do is we see how much memory we
have access to and use as much of it
as we can in order to make the runtime as short as we can.
And in the case of solving this particular computational
problem, we're using around a petabyte
of memory, which is a huge amount but something that's
totally achievable for a supercomputer.
But this trades off with the runtime.
We estimate that to solve the same problem
that our Sycamore processor did in 200 seconds
would take thousands of years.
And this runtime is also exponential now
in the number of cycles that we perform.
And let me emphasize here that these numbers, these estimates
are based on real benchmarks that we performed
using our quantum circuits, doing subsets of them
on the Summit supercomputer.
So let me conclude now.
We looked first at the Sycamore processor
and the crucial hardware infrastructure that makes
these experiments possible.
We then reviewed how to calibrate and benchmark
this processor and compared it to the world's fastest
supercomputers to demonstrate quantum supremacy.
I want to thank one more time this huge team
that worked so hard in order to make these results possible.
And I'm really honored to be a member of this team
and to share these results with you today.
Thank you.