In a previous video, we looked at how CPUs can use caches to speed up accesses to memory.
So, the CPU has to fetch things from memory; it might be a bit of data, it might be an instruction
And it goes through the cache to try and access it.
And the cache keeps a local copy in fast memory to try and speed up the accesses
But what we didn't talk about is: What does a CPU do with what it's fetched from memory
what is it actually doing and how does it process it?
So the CPU is fetching values from memory.
We'll ignore the cache for now, because it doesn't matter if the CPU has a cache or not
it's still gonna do roughly the same things
And we're also gonna look at very old CPUs
the sort of things that are in 8-bit machines
purely because they're simpler to deal with
and simpler to see what's going on
The same idea is still applied to an ARM CPU today or an x86 chip
or whatever it is you've got in your machine.
Modern CPUs use what's called the von Neumann architecture
and what this basically means is that you have a CPU
and you have a block of memory.
And that memory is connected to the CPU by two buses
Each is just a collection of several wires that are connecting
And again we're looking at old-fashioned machines. On a modern machine it gets a bit more complicated
But the idea, the principle, is the same.
So we have an address bus
and the idea is that the CPU can generate a number in here in binary
to access any particular value in here.
So we say that the first one is at address 0
and we're gonna use a 6502 as an example
We'll say that the last one is at address 65535 in decimal, or FFFF in hexadecimal
So we can generate any of these numbers on 16 bits of this address bus
to access any of the individual bytes in this memory
How do we get the data between the two? Well we have another bus
which is called the data bus, which connects the two together
Now the reason why this is a von Neumann machine
is because this memory can contain both the program
i.e. the bytes that make up the instructions that the CPU can execute
and the data
So the same block of memory contains some bytes
which contain program instructions
some bytes which contain data
And the CPU if you wanted to could treat the program as data
or treat the data as program
Well if you do that then it would probably crash
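To make the picture concrete, here is a minimal Python sketch of that arrangement, assuming a 6502-style machine with a 16-bit address bus and an 8-bit data bus. The names `memory`, `read` and `write` are made up for illustration; the point is only that one flat block of bytes holds both program and data, and nothing in the memory itself says which is which.

```python
# One flat block of memory shared by program and data (von Neumann style).
MEMORY_SIZE = 0x10000  # 65536 bytes: addresses 0x0000 to 0xFFFF

memory = bytearray(MEMORY_SIZE)


def read(address):
    """Put a 16-bit address on the 'address bus', get a byte back on the 'data bus'."""
    return memory[address & 0xFFFF]


def write(address, value):
    """Store one byte (0 to 255) at a 16-bit address."""
    memory[address & 0xFFFF] = value & 0xFF


write(0x0000, 0xA9)  # this byte happens to be a 6502 opcode...
write(0xFFFF, 67)    # ...and this one happens to be the ASCII code for "C"

print(hex(read(0x0000)), read(0xFFFF))  # 0xa9 67
```

Whether a byte is treated as an instruction or as data depends only on how the CPU arrives at it.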
So what we've got here is an old BBC Micro using a 6502 CPU
and we're gonna just write a very, very simple machine code program
that uses,
well, the operating system just to print out the letter C for Computerphile
So if you assemble it, we're using hexadecimal
we've started our program at 084C
So that's the address where our program is being created
And our program is very simple
It loads one of the CPU's registers
which is just basically a temporary data store that you can use
and this one is called the accumulator
with the ASCII code 67 which represents a capital C
and then it says: jump to the subroutine at this address
which will print out that particular character
And then we tell it we want to stop so we gotta return
from subroutine. And if we run this
and type in the address, so we're at ... 84C
then you'll see that it prints out the letter C
and then we get a prompt to carry on doing things
So our program, we write it in assembly language
which we can understand as humans
-ish. LDA: Load Accumulator, JSR: Jump to Subroutine,
RTS: Return from Subroutine
You get the idea once you've done it a few times
And the computer converts this into a series of numbers, in binary
The CPU is working in binary but to make it easier to read we display it as hexadecimal
So our program becomes: A9, 43
20 EE FF 60
That's the program we've written
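The translation from assembly to those six bytes can be sketched by hand in Python. The three opcode values are the standard 6502 ones; the function name `assemble_print_char` is hypothetical, and the routine address FFEE is simply read back from the bytes EE FF above, since the 6502 stores addresses low byte first.

```python
# Opcodes for the three 6502 instructions used in the example program.
LDA_IMMEDIATE = 0xA9  # load the accumulator with the byte that follows
JSR_ABSOLUTE = 0x20   # jump to the subroutine at the 16-bit address that follows
RTS = 0x60            # return from subroutine


def assemble_print_char(char, routine_address):
    """Hand-assemble: LDA #char / JSR routine_address / RTS."""
    low = routine_address & 0xFF          # the 6502 stores the low byte first
    high = (routine_address >> 8) & 0xFF
    return bytes([LDA_IMMEDIATE, ord(char), JSR_ABSOLUTE, low, high, RTS])


program = assemble_print_char("C", 0xFFEE)
print(program.hex(" ").upper())  # A9 43 20 EE FF 60
```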
And the CPU, when it runs it needs to fetch those bytes from memory
into the CPU
Now, how does it do that?
To get the first byte we need to put the address: 084C on the address bus
and a bit later on, the memory will send back the byte that represents the instruction: A9
Now, how does the CPU know where to get these instructions from?
Well, it's quite simple. Inside the CPU
there is a register, which we call the program counter, or PC, on a 6502
On something like an x86 machine it's known as the instruction pointer.
And all that does is store the address of the next instruction to execute
So when we were starting up here, it would have 084C in it
That's the address of the instruction we want to execute
So when the CPU wants to fetch the instruction it's gonna execute
It puts that address on the address bus
and the memory then sends the instruction back to the CPU
So the first thing the CPU is gonna do to run our program
is to fetch the instruction
and the way it does that is by putting the address from
the program counter onto the address bus
and then fetching the actual instruction
So the memory provides it, but the CPU then reads that in
on its input on the data bus
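As a rough sketch of that fetch step, assuming the example program has been placed at 084C: the `Cpu` class and its `fetch` method are illustrative names, not a model of the real 6502 internals. The program counter supplies the address, the memory supplies the byte, and the counter moves on to the next address.

```python
memory = bytearray(0x10000)
# The example program, placed at the address where it was assembled.
memory[0x084C:0x084C + 6] = bytes([0xA9, 0x43, 0x20, 0xEE, 0xFF, 0x60])


class Cpu:
    def __init__(self, start):
        self.pc = start  # program counter: address of the next byte to fetch

    def fetch(self):
        address_bus = self.pc             # the CPU drives the address bus from the PC
        data_bus = memory[address_bus]    # the memory answers on the data bus
        self.pc = (self.pc + 1) & 0xFFFF  # point at the following byte
        return data_bus


cpu = Cpu(0x084C)
first = cpu.fetch()
print(hex(first), hex(cpu.pc))  # 0xa9 0x84d
```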
Now it needs to fetch the whole instruction that the CPU is gonna execute
and on the example we saw there it was relatively straightforward
because the instruction was only a byte long
Not all CPUs are that simple
Some CPUs will vary these things, so this hardware can actually be quite complicated
so it needs to work out how long the instruction is
So it could be as short as one byte
it could be as long on some CPUs as 15 bytes
and you sometimes don't know how long it's gonna be until you've read a few of the bytes
So this hardware can be relatively trivial
So an ARM CPU makes it very, very simple. It says: all instructions are 32 bits long
So the Archimedes over there can fetch the instruction very, very simply
32 bits
On something like an x86, it can be any length up to 15 bytes or so
and so this becomes more complicated, you have to sort of work out
what it is until you've got it
But we fetch the instruction
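Here is a small sketch of the "work out how long it is" step, using only the three 6502 opcodes from the example program. On the 6502 the first byte (the opcode) is enough to tell how many operand bytes follow; the `LENGTHS` table and `split_instructions` are hypothetical names and cover just these three opcodes.

```python
# Total instruction length in bytes, keyed by opcode (example opcodes only).
LENGTHS = {
    0xA9: 2,  # LDA #immediate: opcode + one operand byte
    0x20: 3,  # JSR absolute: opcode + two address bytes
    0x60: 1,  # RTS: opcode only
}


def split_instructions(code):
    """Cut a byte string into instructions by reading each opcode's length."""
    instructions = []
    position = 0
    while position < len(code):
        length = LENGTHS[code[position]]
        instructions.append(code[position:position + length])
        position += length
    return instructions


program = bytes([0xA9, 0x43, 0x20, 0xEE, 0xFF, 0x60])
for instruction in split_instructions(program):
    print(instruction.hex(" "))
```

A fixed-length design would not need the table at all: it could just step through the code four bytes at a time.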
So in the example we've got, we've got A9 here
So we now need to work out what A9 does
Well, we need to decode it into what we want the CPU to actually do
So we need to have another bit of our CPU's hardware
which we're dedicating to decoding the instruction
So we have a part of the CPU which is fetching it
and part of the CPU which is then decoding it
So it gets A9 into it: So the A9 comes into the decode
And it says: Well okay, that's a load instruction.
So I need to fetch a value from memory
which was the 43
the ASCII code for the capital letter C that we saw earlier
So we need to fetch something else from memory
We need to access memory again, and we need to work out what address
that's gonna be.
We also then need to, once we've got that value,
update the right register to store that value
So we've gotta do things in sequence.
So part of the Decode logic is to take the single instruction byte,
or however long it is,
and work out what's the sequence that we need to drive the other bits of the CPU to do
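One way to picture the decode logic is as a lookup from the instruction byte to an ordered list of steps for the rest of the CPU. The sketch below is only that picture: `DECODE_TABLE` and the wording of the steps are invented for illustration and are not how the 6502 is actually wired.

```python
# Hypothetical decode table: opcode -> (mnemonic, ordered steps to carry out).
DECODE_TABLE = {
    0xA9: ("LDA #", [
        "put the operand address on the address bus",
        "read the operand byte from the data bus",
        "copy the operand into the accumulator",
    ]),
    0x60: ("RTS", [
        "read the return address back from the stack",
        "copy the return address into the program counter",
    ]),
}


def decode(opcode):
    """Return the mnemonic and the sequence of steps for a known opcode."""
    return DECODE_TABLE[opcode]


mnemonic, steps = decode(0xA9)
print(mnemonic)
for number, step in enumerate(steps, start=1):
    print(number, step)
```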
And so that also means that we have another bit of the CPU
which is the actual bit that does things,
which is gonna be all the logic which actually executes instructions
So we start off by fetching it
and then once we've fetched it we can start decoding it
and then we can execute it
And the decode logic is responsible for saying:
Put out the address that you want to load the value from in memory
and then store it, once it's been loaded into the CPU
So you're doing things in order:
We have to fetch it first
and we can't decode it until we've fetched it
and we can't execute things until we've decoded it
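Putting the three phases together, a very simplified fetch, decode, execute loop for the example program might look like this. It is a sketch under stated assumptions: it models only the three opcodes used here, it treats the routine at FFEE as something that prints the accumulator as a character, and its return stack is a plain Python list rather than the real 6502 stack.

```python
OS_PRINT_CHAR = 0xFFEE  # assumed address of the character-printing routine


def run(memory, start):
    """Run LDA # / JSR / RTS code until the outermost RTS; return the printed text."""
    pc = start
    accumulator = 0
    return_stack = []
    printed = []
    while True:
        opcode = memory[pc]                       # fetch
        if opcode == 0xA9:                        # decode: LDA immediate
            accumulator = memory[pc + 1]          # execute: load the operand
            pc += 2
        elif opcode == 0x20:                      # decode: JSR absolute
            target = memory[pc + 1] | (memory[pc + 2] << 8)
            if target == OS_PRINT_CHAR:
                printed.append(chr(accumulator))  # stand-in for the ROM routine
                pc += 3
            else:
                return_stack.append(pc + 3)
                pc = target
        elif opcode == 0x60:                      # decode: RTS
            if not return_stack:
                return "".join(printed)           # back to the prompt
            pc = return_stack.pop()
        else:
            raise ValueError(f"opcode {opcode:02X} is not modelled in this sketch")


memory = bytearray(0x10000)
memory[0x084C:0x0852] = bytes([0xA9, 0x43, 0x20, 0xEE, 0xFF, 0x60])
print(run(memory, 0x084C))  # C
```

Each pass round the loop does the phases strictly one after another, which is the simple, non-pipelined behaviour.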
So, at any one time, we'll probably find on a simple CPU
that quite a few of the bits of the CPU wouldn't actually be doing anything
So, while we're fetching the value from memory
to work out how we're gonna decode it
the decode and the execute logic aren't doing anything
They're just sitting there, waiting for their turn
And then, when we decode it, it's not fetching anything
and it's not executing anything
So we're sort of moving through these different states one after the other
And that takes different amounts of time
If we're fetching 15 bytes it's gonna take longer than if we're fetching one
decoding it might well be shorter
than if we're fetching something from memory, 'cos this is all inside the CPU
And the execution depends on what's actually happening
So your CPU will work like this: It will go through each phase,
then once it's done that, it'll start on the next clock tick
all CPUs are synchronized to a clock,
which just keeps things moving in sequence
and you can build a CPU. Something like the 6502 worked like that
But, as we said, lots of the CPU isn't actually doing anything at any time
which is a bit wasteful of the resources
So is there another way you can do this?
And the answer is yes! You can do what's called
a sort of pipelined model of a CPU
So what you do here is, you still have the same 3 bits of the CPU
But you say: Okay, so we gotta fetch (f)
instruction one
In the next bit of time, I'm gonna start decoding this one
So, I'm gonna start decoding instruction one
But I'm gonna say: I'm not using the fetch logic here,
so I'm gonna have this start to get things ready
and, start to do things ahead of schedule
I'm also at the same time gonna fetch instruction 2
So now I'm doing two things, two bits of my CPU in use at the same time
I'm fetching the next instruction, while decoding the first one
And once we've done decoding, I can start executing the first instruction
So I execute that
But at the same time, I can start decoding instruction 2
and hopefully, I can start fetching instruction 3
So what? It is still taking the same amount of time to execute that first instruction
So the beauty is when it comes to executing instruction two
it completes exactly one cycle after the other
rather than having to wait for it to go through the fetch and decode and execute cycles
we can just execute it as soon as we've finished instruction one
So each instruction still takes the same amount of time
it's gonna take, say, three clock cycles to go through the CPU
but because we've sort of pipelined it together
they actually appear to execute one after each other
so it appears to execute one clock cycle after each other
And we could do this again. So we could start decoding
instruction 3 here
at the same time as we're executing instruction two
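The timing can be checked with a small sketch, assuming an ideal three-stage pipeline where every stage takes exactly one clock cycle and nothing ever gets in the way. The function names are made up for illustration, and the cycle formulas assume at least one instruction.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_schedule(count):
    """Return {cycle: {stage: instruction number}} for an ideal pipeline."""
    schedule = {}
    for instruction in range(1, count + 1):
        for offset, stage in enumerate(STAGES):
            # Instruction n is fetched in cycle n, decoded in n+1, executed in n+2.
            schedule.setdefault(instruction + offset, {})[stage] = instruction
    return schedule


def cycles_unpipelined(count):
    """Each instruction goes through all the stages before the next one starts."""
    return count * len(STAGES)


def cycles_pipelined(count):
    """The first instruction fills the pipeline, then one finishes every cycle."""
    return count + len(STAGES) - 1


schedule = pipeline_schedule(3)
for cycle in sorted(schedule):
    print(cycle, schedule[cycle])
print(cycles_unpipelined(3), cycles_pipelined(3))  # 9 5
```

Each instruction still spends three cycles in the CPU, but three instructions finish in five cycles rather than nine.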
Now there can be problems
This works for some instructions, but say this instruction
said "store this value in memory"
Now you've got a problem
You've only got one address bus and one data bus
so you can only access or store one thing in memory at a time
You can't execute a store instruction and fetch a value from memory
So you wouldn't be able to fetch it until the next clock cycle
So we fetch instruction four there
while executing instruction three
But we can't decode anything here
So in this clock cycle, we can decode instruction four
and fetch instruction five
but we can't execute anything
We've got what's called a "bubble" in our pipeline,
or pipeline stall
because at this point, the design of the CPU doesn't let us
fetch an instruction
and execute an instruction at the same time
It's one of what are called "pipeline hazards"
that you can get when designing a pipelined CPU
because the design of the CPU doesn't let you
do the things you need to do
at the same time. So you have to
delay things, which means that you get a bubble
So, you can't quite get up to one instruction per cycle
efficiency
But you can certainly get closer
than you could if you just had everything
to do one instruction at a time.
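The bubble can be reproduced with a small simulation. This is a sketch under simplifying assumptions: three stages of one cycle each, a single shared bus, and only two made-up kinds of instruction, "alu" (no memory access while executing) and "store" (uses the bus while executing, so nothing can be fetched in that cycle). The name `run_pipeline` is hypothetical.

```python
def run_pipeline(program):
    """Simulate a 3-stage pipeline with one shared memory bus.

    program is a list of "alu" or "store" strings. Returns one
    (fetching, decoding, executing) tuple per clock cycle, using
    1-based instruction numbers and None for an idle stage.
    """
    timeline = []
    next_to_fetch = 0
    fetching = decoding = executing = None
    finished = 0
    while finished < len(program):
        # Every instruction moves one stage along on each clock tick.
        executing, decoding = decoding, fetching
        # A store in the execute stage owns the bus for this cycle.
        bus_busy = executing is not None and program[executing - 1] == "store"
        if not bus_busy and next_to_fetch < len(program):
            next_to_fetch += 1
            fetching = next_to_fetch
        else:
            fetching = None  # nothing fetched: this gap becomes the bubble
        if executing is not None:
            finished += 1
        timeline.append((fetching, decoding, executing))
    return timeline


timeline = run_pipeline(["alu", "store", "alu", "alu", "alu"])
for cycle, stages in enumerate(timeline, start=1):
    print(cycle, stages)
```

With instruction two as the store, the fetch of instruction four is pushed back a cycle, so one cycle has nothing to decode and the next has nothing to execute. The five instructions take eight cycles instead of the ideal seven.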