
  • In a previous video, we looked at how CPUs can use caches to speed up accesses to memory.

  • So, the CPU has to fetch things from memory; it might be a bit of data, it might be an instruction

  • And it goes through the cache to try and access it.

  • And the cache keeps a local copy in fast memory to try and speed up the accesses

  • But what we didn't talk about is: what does a CPU do with what it's fetched from memory?

  • What is it actually doing and how does it process it?

  • So the CPU is fetching values from memory.

  • We'll ignore the cache for now, because it doesn't matter if the CPU has a cache or not

  • it's still gonna do roughly the same things

  • And we're also gonna look at very old CPUs

  • the sort of things that are in 8-bit machines

  • purely because they're simpler to deal with

  • and simpler to see what's going on

  • The same ideas still apply to an ARM CPU today or an x86 chip

  • or whatever it is you got in your machine.

  • Modern CPUs use what's called the von Neumann architecture

  • and what this basically means is that you have a CPU

  • and you have a block of memory.

  • And that memory is connected to the CPU by two buses

  • Each is just a collection of several wires connecting the CPU and the memory

  • And again we're looking at old-fashioned machines. On a modern machine it gets a bit more complicated

  • But the idea, the principle, is the same.

  • So we have an address bus

  • and the idea is that the CPU can generate a number in here in binary

  • to access any particular value in here.

  • So we say that the first one is at address 0

  • and we're gonna use a 6502 as an example

  • We'll say that the last one is at address 65535 in decimal, or FFFF in hexadecimal

  • So we can generate any of these numbers on 16 bits of this address bus

  • to access any of the individual bytes in this memory

  • How do we get the data between the two? Well we have another bus

  • which is called the data bus, which connects the two together
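
As a rough sketch of that model (not any particular chip), here's 64 KiB of byte-addressable memory in Python, with a read that takes a 16-bit address, standing in for the address bus, and hands back a byte, standing in for the data bus. The function names are made up purely for illustration.

```python
# A toy model of the memory and the two buses described above.
memory = bytearray(65536)            # one byte for every address 0x0000..0xFFFF

def read_byte(address: int) -> int:
    """Put an address on the 16-bit address bus and return the byte
    the memory sends back on the data bus."""
    assert 0 <= address <= 0xFFFF    # only 16 address lines to play with
    return memory[address]

def write_byte(address: int, value: int) -> None:
    """Drive the data bus with a value and store it at the given address."""
    assert 0 <= address <= 0xFFFF
    memory[address] = value & 0xFF   # the data bus is only 8 bits wide
```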

  • Now the reason why this is a von Neumann machine

  • is because this memory can contain both the program

  • i.e. the bytes that make up the instructions that the CPU can execute

  • and the data

  • So the same block of memory contains some bytes

  • which contain program instructions

  • some bytes which contain data

  • And the CPU, if you wanted it to, could treat the program as data

  • or treat the data as program

  • Well, if you did that it would probably crash

  • So what we've got here is an old BBC Micro using a 6502 CPU

  • and we're gonna just write a very, very simple machine code program

  • that uses

  • well, all the operation does is print out the letter C, for Computerphile

  • So if you assemble it, we're using hexadecimal

  • we've started our program at 084C

  • So that's the address where our program is being created

  • And our program is very simple

  • It loads one of the CPU's registers

  • which is just basically a temporary data store that you can use

  • and this one is called the accumulator

  • with the ASCII code 67, which represents a capital C

  • and then it says: jump to the subroutine at this address

  • which will print out that particular character

  • And then we tell it we want to stop so we gotta return

  • from subroutine. And if we run this

  • and type in the address, so we're at ... 84C

  • then you'll see that it prints out the letter C

  • and then we get a prompt to carry on doing things

  • So our program, we write it in assembly language

  • which we can understand as humans

  • -ish: LDA: Load Accumulator, JSR: Jump to Subroutine

  • RTS: Return from Subroutine

  • You get the idea once you've done it a few times

  • And the computer converts this into a series of numbers, in binary

  • The CPU is working in binary but to make it easier to read we display it as hexadecimal

  • So our program becomes: A9, 43

  • 20 EE FF 60

  • That's the program we've written
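
Here's how those six bytes might sit in the toy memory model sketched above, starting at 0x084C. The comments show which mnemonic each byte or group of bytes came from; on a BBC Micro, $FFEE is the operating system's write-character routine, which is why the JSR points there.

```python
# The assembled program, loaded into the toy memory at address 0x084C.
PROGRAM_START = 0x084C
program = bytes([
    0xA9, 0x43,        # LDA #$43   ; load the accumulator with ASCII 67, 'C'
    0x20, 0xEE, 0xFF,  # JSR $FFEE  ; call the OS routine that prints a character
    0x60,              # RTS        ; return from subroutine, i.e. we're done
])
memory[PROGRAM_START:PROGRAM_START + len(program)] = program
```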

  • And the CPU, when it runs, needs to fetch those bytes from memory

  • into the CPU

  • Now, how does it do that?

  • To get the first byte we need to put the address: 084C on the address bus

  • and a bit later on, the memory will send back the byte that represents the instruction: A9

  • Now, how does the CPU know where to get these instructions from?

  • Well, it's quite simple. Inside the CPU

  • there is a register, which we call the program counter, or PC on a 6502

  • or on something like an x86 machine, where it's known as the instruction pointer.

  • And all that does is store the address of the next instruction to execute

  • So when we were starting up here, it would have 084C in it

  • That's the address of the instruction we want to execute

  • So when the CPU wants to fetch the instruction it's gonna execute

  • It puts that address on the address bus

  • and the memory then sends the instruction back to the CPU

  • So the first thing the CPU is gonna do to run our program

  • is to fetch the instruction

  • and the way it does that is by putting the address from

  • the program counter onto the address bus

  • and then fetching the actual instruction

  • So the memory provides it, but the CPU then reads that in

  • on its input on the data bus
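
In the same toy model, that fetch step might look like this; the program counter is just a variable holding the next address to read.

```python
pc = PROGRAM_START                   # program counter: address of the next byte to fetch

def fetch_byte() -> int:
    """Put the program counter on the address bus, read the byte that
    comes back on the data bus, and move the program counter on."""
    global pc
    value = read_byte(pc)
    pc = (pc + 1) & 0xFFFF           # wrap around at the top of the address space
    return value
```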

  • Now it needs to fetch the whole instruction that the CPU is gonna execute

  • and on the example we saw there it was relatively straightforward

  • because the instruction was only a byte long

  • Not all CPUs are that simple

  • Some CPUs will vary the instruction length, so this hardware can actually be quite complicated

  • so it needs to work out how long the instruction is

  • So it could be as short as one byte

  • it could be as long on some CPUs as 15 bytes

  • and you sometimes don't know how long it's gonna be until you've read a few of the bytes

  • Or this hardware can be relatively trivial

  • So an ARM CPU makes it very, very simple: it says all instructions are 32 bits long

  • So the Archimedes over there can fetch the instruction very, very simply

  • 32 bits

  • On something like an x86, it can be any length up to 15 bytes or so

  • and so this becomes more complicated, you have to sort of work out

  • what it is until you've got it
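
One way to picture the difference: with fixed-length instructions the fetch logic always reads the same number of bytes, whereas with variable-length ones it has to look at what it has read so far to decide how much more to fetch. This sketch fakes that with a little length table covering only the three opcodes in our example program; it is not how real decode hardware is built.

```python
# Lengths of the only opcodes our example uses (a real 6502 has many more).
INSTRUCTION_LENGTH = {
    0xA9: 2,   # LDA #imm : opcode plus one operand byte
    0x20: 3,   # JSR addr : opcode plus a two-byte address
    0x60: 1,   # RTS      : opcode only
}

def fetch_instruction() -> list[int]:
    """Read the opcode first, then read however many more bytes it needs."""
    opcode = fetch_byte()
    length = INSTRUCTION_LENGTH[opcode]              # only known once we've seen the opcode
    return [opcode] + [fetch_byte() for _ in range(length - 1)]
```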

  • But we fetch the instruction

  • So in the example we've got, we've got A9 here

  • So we now need to work out what A9 does

  • Well, we need to decode it into what we want the CPU to actually do

  • So we need to have another bit of our CPU's hardware

  • which we're dedicating to decoding the instruction

  • So we have a part of the CPU which is fetching it

  • and part of the CPU which is then decoding it

  • So it gets A9: the A9 comes into the decode logic

  • And it says: Well okay, that's a load instruction.

  • So I need to fetch a value from memory

  • which was the 43

  • the ASCII code for the capital letter C that we saw earlier

  • So we need to fetch something else from memory

  • We need to access memory again, and we need to work out what address

  • that's gonna be.

  • We also then need to, once we've got that value,

  • update the right register to store that value

  • So we've gotta do things in sequence.

  • So part of the Decode logic is to take the single instruction byte,

  • or however many bytes it is,

  • and work out the sequence we need to drive the other bits of the CPU through

  • And so that also means that we have another bit of the CPU

  • which is the actual bit that does things,

  • which is gonna be all the logic which actually executes instructions

  • So we start off by fetching it

  • and then once we've fetched it we can start decoding it

  • and then we can execute it

  • And the decode logic is responsible for saying:

  • Put out the address of where in memory you want to load the value from

  • and then store it, once it's been loaded into the CPU

  • So you're doing things in order:

  • We have to fetch it first

  • and we can't decode it until we've fetched it

  • and we can't execute things until we've decoded it
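
Putting the three steps together, here's a minimal fetch-decode-execute loop for just those three instructions, building on the sketches above. It's a toy, not a real 6502: there are no flags, the return stack is a Python list rather than page one of memory, and the JSR to $FFEE is faked by printing the accumulator as a character instead of running the OS routine.

```python
accumulator = 0
return_stack = []                                    # simplified stand-in for the 6502's stack

def step() -> bool:
    """One trip through fetch, decode, execute. Returns False once the
    program hits its final RTS with nothing left to return to."""
    global accumulator, pc
    opcode, *operands = fetch_instruction()          # fetch the whole instruction
    if opcode == 0xA9:                               # decode: LDA #immediate
        accumulator = operands[0]                    # execute: load the accumulator
    elif opcode == 0x20:                             # decode: JSR absolute
        target = operands[0] | (operands[1] << 8)    # address bytes arrive low byte first
        if target == 0xFFEE:                         # pretend OS print routine
            print(chr(accumulator), end="")          # execute: write the character out
        else:
            return_stack.append(pc)                  # execute: remember where to come back to
            pc = target
    elif opcode == 0x60:                             # decode: RTS
        if not return_stack:
            return False                             # nothing to return to: stop
        pc = return_stack.pop()
    return True

pc = PROGRAM_START
while step():                                        # runs LDA, JSR, RTS in turn
    pass                                             # prints: C
```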

  • So, at any one time, we'll probably find on a simple CPU

  • that quite a few of the bits of the CPU wouldn't actually be doing anything

  • So, while we're fetching the value from memory

  • to work out how we're gonna decode it

  • the decode and the execute logic aren't doing anything

  • They're just sitting there, waiting for their turn

  • And then, when we decode it, it's not fetching anything

  • and it's not executing anything

  • So we're sort of moving through these different states one after the other

  • And that takes different amounts of time

  • If we're fetching 15 bytes it's gonna take longer than if we're fetching one

  • decoding it might well be shorter

  • than if we're fetching something from memory, 'cos this is all inside the CPU

  • And the execution depends on what's actually happening

  • So your CPU will work like this: It will go through each phase,

  • then once it's done that, it'll start on the next clock tick

  • all CPUs are synchronized to a clock,

  • which just keeps things moving in sequence

  • and you can build a CPU that way. Something like the 6502 worked like that

  • But, as we said, lots of the bits of the CPU aren't actually doing anything at any one time

  • which is a bit wasteful of the resources

  • So is there another way you can do this?

  • And the answer is yes! You can do what's called

  • a sort of pipelined model of a CPU

  • So what you do here is, you still have the same 3 bits of the CPU

  • But you say: Okay, so we gotta fetch (f)

  • instruction one

  • In the next bit of time, I'm gonna start decoding this one

  • So, I'm gonna start decoding instruction one

  • But I'm gonna say: I'm not using the fetch logic here,

  • so I'm gonna have this start to get things ready

  • and, start to do things ahead of schedule

  • I'm also at the same time gonna fetch instruction 2

  • So now I'm doing two things, two bits of my CPU are in use at the same time

  • I'm fetching the next instruction, while decoding the first one

  • And once we've done decoding, I can start executing the first instruction

  • So I execute that

  • But at the same time, I can start decoding instruction 2

  • and hopefully, I can start fetching instruction 3

  • So what? It is still taking the same amount of time to execute that first instruction

  • So the beauty is when it comes to executing instruction two

  • it completes exactly one cycle after the first one

  • rather than having to wait for it to go through the fetch and decode and execute cycles

  • we can just execute it as soon as we've finished instruction one

  • So each instruction still takes the same amount of time

  • it's gonna take, say, three clock cycles to go through the CPU

  • but because we've sort of pipelined it together

  • they actually appear to execute one after another

  • each one completing one clock cycle after the previous one

  • And we could do this again So we could start decoding

  • instruction 3 here

  • at the same time as we're executing instruction two
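
A rough way to visualise that overlap is to print which instruction each stage is holding on each clock tick. This is just a diagram generator, assuming every stage takes exactly one tick and nothing ever stalls; it's not a model of real pipeline hardware.

```python
def pipeline_timeline(num_instructions: int) -> None:
    """Print which instruction (numbered from 1) each stage holds on each tick,
    assuming every stage takes one tick and there are never any stalls."""
    stages = ("fetch", "decode", "execute")
    for tick in range(num_instructions + len(stages) - 1):
        cells = []
        for depth, stage in enumerate(stages):
            instr = tick - depth + 1                 # which instruction is this deep in the pipe
            cells.append(f"{stage}=i{instr}" if 1 <= instr <= num_instructions else f"{stage}=--")
        print(f"tick {tick + 1}: " + "  ".join(cells))

pipeline_timeline(3)
# tick 1: fetch=i1  decode=--  execute=--
# tick 2: fetch=i2  decode=i1  execute=--
# tick 3: fetch=i3  decode=i2  execute=i1
# tick 4: fetch=--  decode=i3  execute=i2
# tick 5: fetch=--  decode=--  execute=i3
```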

  • Now there can be problems

  • This works for some instructions, but say this instruction

  • said "store this value in memory"

  • Now you've got a problem

  • You've only got one address bus and one data bus

  • so you can only access or store one thing in memory at a time

  • You can't execute a store instruction and fetch a value from memory at the same time

  • So you wouldn't be able to fetch it until the next clock cycle

  • So we fetch instruction four there

  • while executing instruction three

  • But we can't decode anything here

  • So in this clock cycle, we can decode instruction four

  • and fetch instruction five

  • but we can't execute anything

  • We've got what's called a "bubble" in our pipeline,

  • or a pipeline stall

  • because at this point, the design of the CPU doesn't let us

  • fetch an instruction

  • and execute an instruction at the same time

  • it's one of what are called "pipeline hazards"

  • that you can get when designing a pipeline CPU

  • because the design of the CPU doesn't let you

  • do the things you need to do

  • at the same time. So you have to

  • delay things, which means that you get a bubble

  • So, you can't quite get up to one instruction per cycle

  • efficiency

  • But you can certainly get closer

  • than you could if you just had everything

  • to do one instruction at a time.
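
To see where the bubble comes from, here's a small extension of that timeline idea: each instruction is tagged with whether executing it needs the memory bus, and whenever the execute stage has the bus, the fetch stage has to sit out that tick, so a gap enters the pipeline and drifts along behind it. This is a deliberately crude model of the single-bus limitation, not of any real CPU's hazard logic, and the instruction names are made up.

```python
from collections import namedtuple

Instr = namedtuple("Instr", ["name", "uses_memory_in_execute"])

def run_pipeline(program: list) -> None:
    """Simulate a three-stage pipeline that shares one memory bus.
    A store in the execute stage claims the bus, so nothing can be
    fetched that tick and a bubble ripples through behind it."""
    fetch = decode = execute = None                  # what each stage holds right now
    next_to_fetch = 0
    tick = 0
    while True:
        # On each clock tick, whatever a stage finished moves on to the next stage.
        execute, decode, fetch = decode, fetch, None
        if next_to_fetch >= len(program) and not (decode or execute):
            break                                    # the pipeline has drained: we're done
        tick += 1
        # The execute stage gets first claim on the single address/data bus.
        bus_busy = execute is not None and execute.uses_memory_in_execute
        if not bus_busy and next_to_fetch < len(program):
            fetch = program[next_to_fetch]           # bus free, so fetch the next instruction
            next_to_fetch += 1
        name = lambda slot: slot.name if slot else "--"
        print(f"tick {tick}: fetch={name(fetch)}  decode={name(decode)}  execute={name(execute)}")

run_pipeline([
    Instr("i1", False),
    Instr("i2-store", True),     # its execute step needs the memory bus
    Instr("i3", False),
    Instr("i4", False),
])
# tick 1: fetch=i1  decode=--  execute=--
# tick 2: fetch=i2-store  decode=i1  execute=--
# tick 3: fetch=i3  decode=i2-store  execute=i1
# tick 4: fetch=--  decode=i3  execute=i2-store     (fetch stalled by the store)
# tick 5: fetch=i4  decode=--  execute=i3           (the bubble is now in decode)
# tick 6: fetch=--  decode=i4  execute=--           (the bubble reaches execute)
# tick 7: fetch=--  decode=--  execute=i4
```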
