There's been some noise over the past week about a paper that's come out describing an exploit. The paper's called "Port Contention for Fun and Profit", and people have been referring to it as PortSmash. So what it does is: you've got OpenSSL running and using a private key, and you've got another program, which they call the spy program, which runs alongside it and is able to extract the private key from the OpenSSL program, even though it shouldn't be able to do that.
So I thought it would be interesting to have a little chat about the way it's exploiting the CPU. Again, like Spectre and Meltdown and quite a few of the exploits that have turned up over the past year, it's exploiting the fact that people have tried to make CPUs run faster and faster, and squeeze every last ounce of speed out of the CPU technology that's there. What this is specifically targeting is something that's put into most Intel CPUs and some AMD ones, which is hyper-threading.
So what is hyper-threading? Well, normally when we think about a computer system, we have a CPU in there, and originally that CPU would execute one single stream of instructions and process data with them. You could have two CPUs in there, in some multiprocessor system or a multi-core system depending on how you wire them up, and then you could have two separate streams of instructions being executed.
The way that those CPUs are designed, there are three stages that each instruction has to go through in the CPU (in a real CPU each of them is split into smaller stages, but we can think about it as three broad stages): we fetch the instruction from memory, then we decode it to work out what we actually want it to do, and then we execute it.
To make the CPU run as fast as possible, you end up with various execution units in your CPU which do various things. There might be an arithmetic and logic unit (ALU) which will do addition, subtraction and various logical operations. There might be bits that can load and store values from memory. There might be bits that can do various other sorts of calculations: multiplications, address calculations, floating-point operations, vector processing and so on. So you have lots of these execution units in your machine, and one of the things you got was a superscalar architecture, where you'd fetch two instructions and execute them at the same time, providing that they were using different parts: you could fetch a value from memory while adding a value onto another
register, as long as they're using separate registers, and so on. So the idea is, if we draw a picture, you've got some logic here which we'll call decode, and going into that you've got a stream of instructions coming from memory. You're feeding them in there, and this is actually breaking them up into a series of what we call micro-operations that do different things. So one x86 instruction may get broken up into multiple micro-operations: for example, to load a value from memory, add that value onto a value in a register, and store the result back into the same memory location is three operations, so it gets split into micro-operations which use different execution units. Some have to happen sequentially; some can be done in parallel, depending on what you're doing.
So we end up with a series of execution units. Let's say we've got an ALU, and we might have, say, a division unit in there. We might have another one with an ALU, which might also have some things to do vector-type stuff. We've got another one which has got another ALU and a multiplication unit on there. And there are various ports that these are connected to: you've got a sort of port one here, which connects to this set of units, and port two, say, here (this is a generalized version), which is connected to these units.
Q: Are these physical ports, like physical wires?
Erm, they'll be paths inside the CPU, the way that things are connected up. And this block is a sort of scheduler, which is getting the decoded micro-ops from this section and sending them to the right ports as they're available, and so on, to cause the right operations to happen in the best order to make most use of the system. You'd have a few more over here; say this one has got a load port,
and so on. So what you can do is start pulling in multiple instructions here, and as long as they're not depending on values that previous instructions have created and haven't completed yet, you can schedule them on different parts of the unit. So if you had one instruction which adds one onto EAX, you could put it onto this port; the next instruction is adding something onto EBX, so you could put it onto that port (they're registers within the CPU), and they could execute at the same time. But the problem you've got is that
sometimes you get a sequence of instructions which are sequential: you add one to a value in a register, then you multiply that register by two, and so on. You've got to execute them in order, and so you can't always make full use of your available execution units down here in the CPU.
So the idea, which appeared many, many years ago, sort of fell out of favor, was brought back with the Pentium 4 in the mid-2000s, and has existed since on various CPUs from both AMD and Intel, is hyper-threading. You say, OK, this is only a single core, but let's make it present itself as if it were two cores: two logical cores. We've got one physical core with one set of execution units, but we have it appear to the operating system as two logical cores, so the operating system can have two (as far as it's concerned) independent programs or threads
executing on there. So there'll be two streams of instructions executing, and we'd have another stream of instructions coming into the decode logic, and then the CPU's got a better chance of keeping things running at the same time, because it can either run an instruction from here, or, if it can't schedule that, it might schedule an instruction from the other stream of instructions. You may get some interesting things: for example, on this one that we've drawn, we've only got one multiplier and we've only got one load and store unit. If we have both of these trying to do a multiply, then one will have to wait for the other to complete, and the way that the CPU might do that is a sort of round-robin: on the first clock cycle this one gets the multiply, on the second clock cycle that one gets the multiply, and so on. So that's the basic idea behind hyper-threading: you've got two logical processors that are used by the operating system to schedule the jobs on your computer, but they're executed by one physical core on the CPU.
Q: So hyper-threading is different to multi-threading?
So multi-threading is the idea that you split your program, or your programs, into multiple threads of operation, and then they get scheduled by the operating system either onto different CPU cores, if you've got multiple ones, or onto one single core, by executing a bit of thread one, then a bit of thread two, then a bit of thread three; effectively like you could watch multiple programs on YouTube at once by chopping between the different programs and watching bits one after the other. It'd be quite garbled watching multiple Computerphile videos in that sort of way. So that's a way of doing things in software and programming.
Hyper-threading is a bit more hardware. So the idea is: OK, you've got these different threads of execution. If you've got multiple cores, multiple processing units, then you can schedule each of those threads onto each of the cores and have them executing at the same time, with a few limitations on access to memory and so on. With hyper-threading you say, OK, we'll have two threads of execution happening at the same time,
but we've actually only got one physical set of units to do it. So it's the hardware that's doing the scheduling, because it can do it at a finer grain than the operating system can. The operating system is still scheduling across those two logical cores, but the hardware can then say: well, actually, this one is trying to multiply and this one is trying to add, so I can run them at the same time; whereas if this one is trying to multiply and this one is trying to multiply, I need to sequence them. So it can actually do a finer-grained sort of threading operation and knit them together.
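The software side of that, ordinary multi-threading, can be sketched in a few lines: two threads of execution are created, and the operating system decides when and on which logical core each one runs (there's nothing hyper-threading specific here; the example names and counts are arbitrary):

```python
# Sketch: plain software multi-threading. The OS interleaves the two
# threads; whether they land on one core or two is the scheduler's choice.
import threading

counts = {"a": 0, "b": 0}

def worker(name, n):
    for _ in range(n):
        counts[name] += 1   # each thread only touches its own entry

t1 = threading.Thread(target=worker, args=("a", 100_000))
t2 = threading.Thread(target=worker, args=("b", 100_000))
t1.start(); t2.start()
t1.join(); t2.join()
print(counts)   # {'a': 100000, 'b': 100000}
```

Each thread writes only its own dictionary entry, so no locking is needed; the point is simply that two independent streams of work get interleaved by the operating system, which is the software analogue of the two instruction streams the hardware juggles.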
Q: So where does the problem come in, then?
So the problem comes in... let's say we've got a program where we want to find out some information about what it's doing, and let's say for this program here we want to know what sort of instructions it's executing. Well, what we could do, for example, if we wanted to find out whether it was executing multiply instructions: on the example we've got here, we've only got one multiply unit, so if this one is trying to execute multiply instructions and this one is trying to execute multiply instructions, then they're going to have to take turns to execute those multiply instructions. And if the one we're trying to spy on isn't executing multiply instructions, then
this one will be able to execute multiply instructions one after the other. So what the PortSmash paper's authors have done is write their program so that it executes certain types of instruction in a loop. They have a repetition of, let's say, 64 add instructions, to make use of all the ALUs on an Intel CPU (there are four of them that it can make use of, so four simultaneous adds should all execute at the same time if nothing else is running on that core), and it times how long they take to execute. It does that and gets an idea of how long they take, and then you run the same thing at the same time as the other program is running. If it takes more time to execute than it did on its own, then you know that program must also be executing some add instructions. So, by looking at which of these bits are being used by running instructions, you can find out what type of instructions are being executed on the other side.
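The measurement principle can be sketched at a much coarser grain than the real thing. The actual probe is hand-written assembly that saturates one specific execution port and reads the CPU timestamp counter; Python can't do that, but the idea of timing a fixed workload over and over and flagging samples that ran slower than the baseline looks like this (all the numbers are arbitrary choices for illustration, not the paper's):

```python
# Illustration only: time the same fixed "add" workload repeatedly.
# Samples that take much longer than the fastest one suggest something
# else competed for shared hardware during that interval. The real
# PortSmash probe does this per execution port, in assembly, with rdtsc.
import time

def probe(samples=500, work=5_000):
    timings = []
    for _ in range(samples):
        t0 = time.perf_counter_ns()
        x = 0
        for i in range(work):            # the fixed workload being timed
            x += i
        timings.append(time.perf_counter_ns() - t0)
    return timings

timings = probe()
baseline = min(timings)                   # best case: no contention
slow = sum(1 for t in timings if t > 2 * baseline)
print(f"baseline {baseline} ns; {slow}/{len(timings)} samples over 2x baseline")
```

Run alongside a busy process pinned to the sibling logical core, the slow-sample count rises; run alone, most samples sit near the baseline. That sequence of fast and slow samples is exactly the kind of trace the attack turns into a signal.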
Now, the reason why it's called PortSmash is because, although we've drawn this with one multiply unit, that unit is also on the same port as an ALU, for example, and these are all connected to one port of the scheduler within the CPU. So if we wanted to use the multiply bit of this CPU, then we have to go through port two, which means the ALU on port two can't be used at the same time; we can only use one of the things in this column. The same here: if we want to use a divide, we can't do any ALU processing or vector processing on that port. So we could run instructions that we know will tie up one of these specific ports, or will tie up a group of them, and then we can see whether the other program (providing we can get it scheduled onto the same physical core, which isn't impossible to do) is also trying to use parts of the system on that port. What the PortSmash example program does is cleverly use certain instructions which tie up a particular port on the CPU core,
to see whether that port is being used by the other program, and by measuring the time we can see whether that has happened. So we've got this side channel where we can get insight into what the other process is doing as a black box: we say, OK, it must be trying to execute this type of instruction because it's interfering with our use of this port, or it isn't interfering with our use of this port. So what they do is run this alongside OpenSSL doing its encryption of the task that it's been set, and measure what type of instructions it's trying to execute.
What they end up with is a series of timing sequences that shows how long things are taking at particular points: sometimes it'll be running at full speed, at some points it'll be running slower. That gives them what they call a noisy signal, and with some signal processing applied to it they can actually extract the private key that was being used by OpenSSL, purely from watching the timings. So what they've demonstrated is that by running a program they can gather enough information, because they can see what the other logical CPU is doing from what their own program is doing (i.e. if the other program is trying to multiply at the same time as they're trying to multiply, and there's only one multiply unit, it will slow both programs down, and you can detect that). They can start to work out what operations the other program must be doing, then work out what that would mean in terms of what that program is doing, and backtrack from that to extract information that ideally they shouldn't be able to access.
So the upshot of this is that one of the recommendations is that perhaps, in certain circumstances, you might want to turn off hyper-threading: either completely, so you go back to having four physical cores that only execute four separate threads, rather than four physical cores executing eight logical threads; or, at the very least, modify things so that the operating system has the ability to turn hyper-threading on and off on each processor core depending on what process is running on it. For some processes it doesn't matter, and extracting information from them wouldn't be that important; but for others, such as encryption programs, you really don't want this sort of side channel there.
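Recent Linux kernels do expose a runtime switch along these lines. A small sketch that reads the current state, assuming the standard `/sys/devices/system/cpu/smt/control` path (a Linux convention, absent on other systems, hence the fallback):

```python
# Sketch: check the kernel's SMT (hyper-threading) control state.
# Writing 'off' to this file as root disables SMT at runtime; here we
# only read it. On non-Linux systems the file won't exist.
def smt_state(path="/sys/devices/system/cpu/smt/control"):
    try:
        with open(path) as f:
            return f.read().strip()   # e.g. 'on', 'off', 'forceoff', 'notsupported'
    except OSError:
        return "unknown"              # not Linux, or kernel too old

print("SMT state:", smt_state())
```

This is the coarse, whole-machine version of the mitigation; per-core, per-process control of the kind suggested above would need scheduler support beyond a single sysfs knob.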
Q: Is this operating-system specific, or... what's the deal there, then?
It's not operating-system specific; it will be CPU specific. The example they've got is for the Intel Skylake and Kaby Lake CPU families. You could probably do something similar with other CPUs that implement hyper-threading; you would have to calibrate your system depending on that, but that's not a problem. It's not implementation specific; you just have to tailor it to the machine you're looking at.
Q: Is it a practical thing for hackers to do? Is it easy for them to do?
The example code is there: you can download it off GitHub and run the demo on a Linux machine. I don't have one with the right sort of CPU here to demo it, unfortunately. There is the potential to do this, but there are limitations on what you can do with it: you need to have your spy program running on the same physical core as the other program, otherwise you won't have full access to the information. I'm sure in the right circumstances you could use it to get information out, if it hasn't already been done.
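Getting scheduled alongside the victim is the hard part; pinning a process to a particular logical CPU is the easy part. On Linux it looks like this (`os.sched_setaffinity` is Linux-specific, hence the fallback; CPU 0 is just an example choice):

```python
# Sketch: pin the current process to one logical CPU. An attacker would
# pick the sibling of the logical CPU the victim is running on; actually
# landing next to the victim takes further scheduling tricks not shown here.
import os

def pin_to_cpu(cpu):
    try:
        os.sched_setaffinity(0, {cpu})         # 0 = this process
        return cpu in os.sched_getaffinity(0)  # confirm the mask took effect
    except (AttributeError, OSError):          # non-Linux, or CPU not present
        return False

print("pinned to CPU 0:", pin_to_cpu(0))
```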
So if we hit this... boom, it goes off and sets a few things up. The screen goes black, but if I switch back to my other one and type su again, it's logged me in as root, and of course...