There's been some noise over the past week about a paper that's come out and an exploit. The paper's called "Port Contention for Fun and Profit"; people have been referring to it as PortSmash. So what it does is: you've got OpenSSL running, and it's using a private key, and you've got another program, which they call the spy program, which runs alongside it and is able to extract the private key from the OpenSSL program, even though it shouldn't be able to do that. So I thought it was interesting to have a little chat about the way it's exploiting the CPU.
So again, like Spectre and Meltdown and quite a few of the exploits that have turned up over the past year, it's exploiting the fact that people have tried to make CPUs run faster and faster and squeeze every last ounce of speed out of the actual CPU technology that's there. And what this is specifically targeting is what's put into most Intel CPUs, and AMD's: hyper-threading.
So what is hyper-threading? Well, normally when we think about a computer system, we have a CPU in there, and originally that CPU would execute one single stream of instructions and process data with them. You could have two CPUs in there, so you've got a multiprocessor system or a multi-core system, depending on how you wire them up, and then you could have two separate streams of instructions being executed. And the way that those CPUs are designed is you have three stages that each instruction has to go through. In the CPU itself there are smaller stages, but we can think about this as three broad stages: we have to fetch the instruction from memory, then we decode it to work out what we actually want it to do, and then we execute it. And to make the CPU run as fast as possible, you end up with various execution units in your CPU which do various things. There might be an arithmetic and logic unit, which will do addition and subtraction and various logical operations.
There might be bits that can load and store values from memory. There might be bits that can do various other sorts of calculations: multiplications, address calculations, floating-point operations, vector processing and so on. So you have lots of these execution units in your machine, and one of the things you got was a superscalar architecture, where you'd fetch two instructions and execute them at the same time, providing that they were using different parts: you could fetch a value from memory while adding a value onto another register, as long as they're using separate registers and so on.
So the idea is, if we draw a picture, you've got some sort of logic here which we'll call decode, and you've got going into that a stream of instructions coming from memory. So you're feeding them in there, and this is actually breaking them up into a series of what we call micro-operations that do different things. So one x86 instruction may get broken up into multiple micro-operations: for example, to load a value from memory, add that value onto a value in a register, and store that result back into the same memory location, that's three operations, so it gets split into micro-operations which use different execution units. Some have to happen sequentially; some can be done in parallel, depending on what you're doing.
So we end up with a series of execution units. Let's say we've got an ALU, and we might have, say, a division unit in there. We might have another one with an ALU, and it might have some things to do vector-type stuff. We've got another one which has got another ALU and a multiplication unit on there. And there are various ports that these are connected to: so you've got port one here, which connects to this set of operations, and port two, we'll say, here (this is a generalized version), which is connected to these operations.
Q: Are these physical ports, like physical wires?
Erm, they'll be parts within the CPU, so it's the way that things are connected up... And this block is a scheduler, which is getting the decoded micro-ops from this section and sending them to the right ports as they're available, to cause the right operations to happen in the best order to make most use of the system. You'd have a few more over here, say this one has got a load port, and so on. So what you can do is start pulling in multiple instructions here, and as long as they're not depending on values that previous instructions have created and haven't completed yet, you can schedule them on different parts of the unit. So if you had one instruction which adds the value one on to EAX, you could put it on to this port; the next instruction is adding something onto EBX, so you could put it onto that port (they're registers within the CPU), and they could execute at the same time.
But the problem you've got is that sometimes you get a sequence of instructions which are sequential: you add one to a value in a register, then you multiply that register by two, and so on. You've got to execute them in order, and so you can't always make full use of your available execution units down here in the CPU. So the idea, which happened many, many years ago and fell out of favor, and then was brought back with the Pentium 4 in the early 2000s and has existed since on various CPUs from both AMD and Intel, is hyper-threading. You say: well, OK, this is only a single core, but let's make it present itself as if it was two cores. Two logical cores: we've got one physical core with one set of execution units, but we have it appear to the operating system as two logical cores, so the operating system can have two, as far as it's concerned, independent programs, threads, whatever, executing on there. And so there'll be two streams of instructions executing, and so we'd have another stream of instructions coming in to the
decode logic, and then the CPU's got a better chance of keeping things running at the same time, because it can run an instruction from here, but if it can't schedule that, it might be able to schedule an instruction from the other stream of instructions. You may get some interesting things. So for example, on this one that we've drawn, we've only got one multiplier; we've only got one load and store unit. If we have both of these trying to do a multiply, then one will have to wait for the other to complete, and the way that the CPU might do that is a sort of round-robin: on the first clock cycle this one gets the multiply, on the second clock cycle that one will get the multiply, and so on. So that's the basic idea behind hyper-threading: you've got two logical processors that are used by the operating system to schedule the jobs on your computer, but they're executed by one physical core on the CPU.
Q: So hyper-threading is different to multi-threading?
So multi-threading is the idea that you split your program or your programs into multiple threads of operation, and then they get scheduled by the operating system, either on to different CPU cores if you've got multiple ones, or onto one single core by executing a bit of thread one, then a bit of thread two, then a bit of thread three. Effectively, it's like you could watch multiple programs on YouTube at once by chopping between the different programs and watching bits one after the other. It'd be quite garbled watching multiple Computerphiles in that sort of way.
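The round-robin sharing of a single multiplier described above can be sketched as a toy model. This is only an illustration under stated assumptions, not how any real CPU scheduler works: the unit counts in `UNITS`, the in-order one-micro-op-per-stream-per-cycle rule, and the function name `run` are all made up for the example.

```python
from collections import deque

# Hypothetical toy core: how many execution units of each kind exist.
UNITS = {"alu": 2, "mul": 1, "load": 1}


def run(streams, units=None):
    """Return the cycles needed to issue every micro-op in every stream.

    Assumptions of this toy model (not real hardware behaviour):
    each stream issues in order, at most one micro-op per cycle,
    every micro-op occupies its unit for exactly one cycle, and
    priority rotates between streams each cycle (round-robin).
    """
    units = dict(UNITS if units is None else units)
    for stream in streams:
        for op in stream:
            if units.get(op, 0) < 1:
                raise ValueError(f"no execution unit for {op!r}")

    queues = [deque(s) for s in streams]
    cycles = 0
    first = 0  # index of the stream that gets first pick this cycle
    while any(queues):
        free = dict(units)  # all units become free again each cycle
        for k in range(len(queues)):
            q = queues[(first + k) % len(queues)]
            if q and free[q[0]] > 0:
                free[q[0]] -= 1
                q.popleft()
        first = (first + 1) % len(queues)
        cycles += 1
    return cycles


if __name__ == "__main__":
    print(run([["mul"] * 4]))                # one thread alone
    print(run([["mul"] * 4, ["alu"] * 4]))   # different units: no conflict
    print(run([["mul"] * 4, ["mul"] * 4]))   # both need the one multiplier
```

In this model, two streams that use different units finish as fast as one stream alone, while two streams that both want the single multiplier take turns and need twice as many cycles.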
So that's a way of doing things in software and programming; hyper-threading is a bit more hardware. So the idea is: OK, you've got these different threads of execution. If you've got multiple cores, multiple processing units, then you can schedule each of those threads onto each of the cores and have them executing at the same time, with a few limitations on access to memory and so on. With hyper-threading, you say: OK, we'll have the idea that we've got two threads of execution happening at the same time, but we've actually only got one physical set of units to do it. So it's the hardware that's doing the scheduling, because it can do it at a finer grain than the operating system can. The operating system is still scheduling across those two logical cores, but the hardware can then say: well, actually, this one is trying to multiply and this one is trying to add, so I can run them at the same time; whereas this one is trying to multiply and this one is trying to multiply, so I need to sequence them. So it can actually start to do a finer-grained sort of threading operation and knit them together.
Q: So where's the problem come in then?
So the problem comes in when, let's say, we've got a program where we want to find some information about what it's doing, and let's say, this program here, we want to know what sort of instructions it's executing. Well, what we could do, for example, if we wanted to find out if it was executing multiply instructions: on the example we've got here, we've only got one multiply unit. So if this is trying to execute multiply instructions and this is trying to execute multiply instructions, then they're going to have to take turns to execute those multiply instructions, one after the other. And if the one we're trying to find out about isn't executing multiply instructions, then this one will be able to execute its multiply instructions one after the other.
So what the PortSmash paper's authors have done is they've written a program that will execute certain types of instructions in a loop. So they have a repetition of about 64, let's say (there are various different ones), so say 64 add instructions, to make use of all the ALUs. On the Intel CPU there are four of them that it can make use of, so four continuous adds should all execute at the same time if nothing else was running on that CPU. And it times how long they take to execute. It does that and gets an idea of how long they take to execute, and then you run the same thing at the same time as the other program is running, and if it takes more time to execute than it did on its own, then you know that program must also be executing some add instructions. And so by looking at which of these bits are being used by running instructions, you can find out what type of instructions are being executed on the other side.
Now, the reason why it's called PortSmash is because we've drawn this as having one multiply unit, but that's also on the same port as an ALU, for example. These are all connected to one port of the scheduler within the CPU, and so if we wanted to, say, use the multiply bit of this CPU
then we have to run it on port 2, which means the ALU on port 2 can't be used as well: we can only use one of the things in this column. Same, for example, here: if we want to use the divide, we can't do any ALU processing or vector processing on that port. So we could run instructions that we know will tie up one of these specific ports, or will tie up a group of them, and then we can see whether the other program (providing we can get it scheduled onto the same physical execution unit, which isn't impossible to do) is also trying to use parts of the system on that port. What the PortSmash example program does is cleverly use certain instructions which tie up a particular port on the CPU core, to see whether that port is being used by the other program, and by measuring the time we can see whether that has been done. So we've got this side channel where we can get insight into what the other process is doing. As a black box, we say: OK, it must be trying to execute this type of instruction, because it's interfering with our use of this port, or it isn't interfering with our use of this port.
So what they do is run this alongside OpenSSL doing the encryption task that it's been set to do, and it can measure what type of instructions it's trying to execute. What it ends up with is a series of timing sequences that shows how long things are taking at particular points: at some points it'll be running at full speed, at some points it'll be running slower. And that gives what they call a noisy signal, which, with some signal processing they apply to it, they can use to actually extract the private key that was being used by OpenSSL, purely from watching the timings.
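The measure-a-baseline-then-compare idea described above can be sketched as a small simulation. It is a sketch under loud assumptions, not the paper's code and not a working attack: nothing here measures real time, the `PORT_OF` mapping is hypothetical, contention is modelled as strict alternation on a shared port, and the 1.5x threshold is an arbitrary choice. Real measurements are noisy and need the signal processing mentioned above.

```python
# Hypothetical mapping from instruction type to the port it ties up.
PORT_OF = {"div": 1, "mul": 2, "load": 3}


def probe_cycles(spy_op, victim_op, n_ops=64):
    """Simulated cycles the spy needs to issue n_ops of spy_op.

    victim_op is what the other logical core is issuing during the
    probe (None means it is idle). If both want the same port, this
    toy model lets them alternate cycle by cycle.
    """
    spy_port = PORT_OF[spy_op]
    victim_port = None if victim_op is None else PORT_OF[victim_op]
    cycles = 0
    issued = 0
    spy_turn = True
    while issued < n_ops:
        cycles += 1
        if victim_port == spy_port:
            if spy_turn:
                issued += 1
            spy_turn = not spy_turn  # the victim gets the next cycle
        else:
            issued += 1  # no contention: the spy issues every cycle
    return cycles


def recover_trace(victim_ops, spy_op="mul", n_ops=64):
    """Guess, per probe window, whether the victim used the spy's port."""
    baseline = probe_cycles(spy_op, None, n_ops)  # calibrate alone
    threshold = 1.5 * baseline  # arbitrary cut-off for this sketch
    return [probe_cycles(spy_op, op, n_ops) > threshold for op in victim_ops]


if __name__ == "__main__":
    secret_activity = ["mul", "load", "mul", "mul", "div"]
    print(recover_trace(secret_activity))
```

The spy never reads the victim's data; it only sees its own probes slow down in the windows where the victim is using the same port, which is the side channel.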
So what they've demonstrated is that by running a program they can monitor enough information, because they can see what the other CPU is doing by what their program is doing. I.e., if the other program is trying to multiply at the same time as they're trying to multiply, and there's only one multiply unit, it will slow both programs down, and you can detect that. They can start to work out what operations the other program must be doing, and then start to work out what that would mean in terms of what that program is doing, and backtrack from that to actually extract information that ideally they shouldn't be able to access.
So the upshot of this is that one of the recommendations is that perhaps in certain circumstances you might want to turn off hyper-threading, either completely, and just go back to having four physical cores that only execute four separate threads, rather than four physical cores executing eight logical threads; or at the very least modify things so that the operating system has the ability to turn hyper-threading on and off on each processor core, depending on what process is running on it. Because for some processes it doesn't matter, and extracting information from them wouldn't be that important, but for others, such as encryption programs, you really don't want this sort of side channel there.
Q: Is this operating system specific, or what's the deal there then?
It's not operating system specific; it will be CPU specific. So the example they've got is for the Intel Skylake and Kaby Lake CPU families. You could probably do something similar with other CPUs that implement hyper-threading. You would have to calibrate your system depending on that, but that's not a problem. It's not implementation specific; you just have to tailor it to the machine you're looking at.
Q: Is it a practical thing for hackers to do this? Is it easy for them to do?
The example code's there: you can download it off GitHub and run the demo on a Linux machine. I don't have one with the right sort of CPU here to demo it, unfortunately. There is potential to do this, but there are limitations on what you can do with it. You need to have your spy program running on the same physical core as the other program, otherwise you won't have access to the information. I'm sure in the right circumstances you could use it to get information out, if it hasn't already been done.
What's Behind Port Smash? - Computerphile (posted by 林宜悉 on 2020/03/27)