
  • During the mid-1960s, a revolution in miniaturization was kick-started.

  • The idea of packing dozens of semiconductor-based transistors

  • onto a single silicon chip spawned the integrated circuit.

  • It laid the groundwork for a complete paradigm shift in how

  • modern society would evolve.

  • In less than a decade, this marvel of electronic engineering

  • and materials science would usher in an era of advancement

  • incomparable to anything else in human history.

  • In March of 1971, the commercial launch of a new

  • semiconductor product set

  • the stage for this new era.

  • Composed of a then-incredible 2,300 transistors, the Intel 4004 central

  • processing unit, or CPU, was released.

  • Initially created as a custom solution for the Japanese company Busicom Corp.

  • for use in the Busicom 141-PF calculator, it was released later

  • that year to the general public.

  • With prophetic irony, the marketing material for

  • the chip touted the slogan

  • “Announcing a new era in integrated electronics”.

  • But what made the Intel 4004 so groundbreaking?

  • Take a calculator and solve any simple arithmetic operation,

  • let's say 22 divided by 7.

  • What we just did was issue the computer an instruction.

  • Instructions are elementary operations, such as math

  • commands that a CPU executes.

  • Every computer program ever made, from web browsers, to apps, to video

  • games, is composed of millions of these instructions.

  • The 4004 was capable of executing between 46,250 and

  • 92,500 instructions per second.

  • For comparison, ENIAC, the first electronic computer, built just 25

  • years earlier, could only execute 5,000 instructions a second.

  • But what made the 4004 so powerful wasn't just its 1,800%

  • increase in processing power: it consumed only 1 watt of

  • electricity, was about ¾” long, and cost $5 to produce in today's money.

  • This was miles ahead of ENIAC's cost of $5.5 million

  • in today's money, 180 kW power consumption, and 27-ton weight.

  • Fast forward to September 2017, the launch date of

  • the Intel Core i9-7980XE.

  • This CPU is capable of performing over 80 billion instructions

  • a second, a 900,000-fold increase in processing power.

  • What did it take to get here?

  • In this 2-part series we explore the engineering and behind-the-scenes

  • technology that paved the way for that simple 16-pin chip to evolve

  • into the powerhouse of CPUs today.

  • This is the evolution of processing power.

  • HOW A CPU STORES DATA

  • In order to understand how a CPU derives its processing power, let's

  • examine what a CPU actually does and how it interfaces with data.

  • In digital electronics everything is represented by the binary “bit”.

  • It's an elemental representation of two possible states.

  • A bit can represent a zero or one, true or false, up or down, on or

  • off, or any other bi-state value.

  • In a CPU, a “bit” is physically transmitted as voltage levels.

  • If we combine multiple “bits” together in a group, we can now represent

  • more combinations of discrete states.

  • For example, if we combine eight bits together we form

  • what's known as a byte.

  • A byte can represent 256 different states and can be

  • used to represent numbers.

  • In the case of a byte, any number between 0 and 255 can be expressed.

  • But in a CPU, how we choose to represent data

  • is completely malleable.

  • That same byte can also represent a number between -128 and 127.

  • Other expressions of that byte may be colors or levels of sound.
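
(An illustrative C sketch of this malleability, not from the video: the same eight-bit pattern can be read as an unsigned value between 0 and 255 or, assuming the usual two's complement representation, as a signed value between -128 and 127.)

```c
#include <stdio.h>

int main(void) {
    unsigned char raw = 0xFF;   /* one byte with all eight bits set */

    /* The same bit pattern, interpreted two different ways. */
    printf("as unsigned: %d\n", raw);               /* 255 */
    printf("as signed:   %d\n", (signed char)raw);  /* -1 on two's complement machines */
    return 0;
}
```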

  • When we combine multiple bytes together, we create

  • what's known as a word.

  • Words are expressed in their bit capacity.

  • A 32-bit word contains 32 bits.

  • A 64-bit word contains 64 bits and so on.

  • When a processor is created, the native word size it operates on

  • forms the core of its architecture.

  • The original Intel 4004 processor operated on a 4-bit word.

  • This means data moving through the CPU transits in

  • chunks of four bits at a time.

  • Modern CPUs are typically 64-bit; however, 32-bit processors

  • are still quite common.

  • By making use of larger word sizes we can represent more discrete states

  • and consequently larger numbers.

  • A 32-bit word, for example, can represent up to 4.2

  • billion different states.
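
(The arithmetic behind those counts is just powers of two; a quick illustrative check in C:)

```c
#include <stdio.h>

int main(void) {
    /* An n-bit word can represent 2^n distinct states. */
    printf("8 bits:  %llu states\n", 1ULL << 8);   /* 256 */
    printf("32 bits: %llu states\n", 1ULL << 32);  /* 4,294,967,296 (~4.2 billion) */
    return 0;
}
```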

  • Of all the forms data can take inside of a CPU, the most important

  • one is that of an instruction.

  • Instructions are unique bits of data that are decoded and

  • executed by the CPU as operations.

  • An example of a common instruction would be to add two word

  • values together or move a word of data from one location in

  • memory to another location.

  • The entire list of instructions a CPU supports is called

  • its instruction set.

  • Each instruction's binary representation, its machine

  • code, is typically assigned a human-readable representation

  • known as assembly language.

  • If we look at the instruction set of most CPUs, they all tend to

  • focus on performing math or logical operations on data, testing

  • conditions, or moving data from one location to another in memory.

  • For all intents and purposes, we can think of a CPU as an

  • instruction processing machine.

  • They operate by looping through three basic steps:

  • fetch, decode, and execute.

  • As CPU designs evolve these three steps become dramatically more complicated

  • and technologies are implemented that extend this core model of operation.

  • But in order to fully appreciate these advances, let's first explore

  • the mechanics of basic CPU operation.

  • Known today as the “classic Reduced Instruction Set Computer

  • or RISC pipeline”, this paradigm formed the basis for the first CPU

  • designs, such as the Intel 4004.

  • In the fetch phase, the CPU loads the instruction it will

  • be executing into itself.

  • A CPU can be thought of as existing in an information bubble.

  • It pulls instructions and data from outside of itself, performs operations

  • within its own internal environment, and then returns data back.

  • This data is typically stored in memory external to the CPU called

  • Random Access Memory, or RAM.

  • Software instructions and data are loaded into RAM from

  • more permanent sources such as hard drives and flash memory.

  • But at one point in history magnetic tape, punch cards, and

  • even flip switches were used.

  • When a CPU loads a word of data it does so by requesting the

  • contents of a location in RAM.

  • This is called the data's address.

  • The amount of data a CPU can address at one time is determined

  • by its address capacity.

  • A 4-bit address, for example, can only directly address 16 locations of data.

  • Mechanisms exist for addressing more data than the CPU's address capacity,

  • but let's ignore these for now.
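
(An illustrative sketch in C, not the 4004's actual memory system: addressing can be modeled as indexing into an array, where a 4-bit address selects one of 2^4 = 16 words.)

```c
#include <stdint.h>
#include <stdio.h>

/* A 4-bit address selects one of 2^4 = 16 memory words. */
#define ADDRESS_BITS 4
#define RAM_WORDS (1u << ADDRESS_BITS)

static uint8_t ram[RAM_WORDS];   /* the addressable memory */

/* "Requesting the contents of a location" is modeled as array indexing. */
static uint8_t read_word(uint8_t address) {
    return ram[address & (RAM_WORDS - 1)];  /* mask keeps the address in range */
}

int main(void) {
    ram[7] = 42;   /* pretend something stored a value at address 7 */
    printf("ram[7] = %d\n", read_word(7));
    return 0;
}
```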

  • The mechanism by which data moves back and forth to RAM is called a bus.

  • A bus can be thought of as a multi-lane highway between

  • the CPU and RAM in which each bit of data has its own lane.

  • But we also need to transmit the location of the data we're requesting,

  • so a second highway must be added to accommodate both the size of

  • the data word and the address word.

  • These are called the data bus and address bus respectively.

  • In practice these data and address lines are physical electrical

  • connections between the CPU and RAM and often look exactly like a

  • superhighway on a circuit board.

  • When a CPU makes a request for RAM access, a memory control

  • region of the CPU loads the address bus with the memory word

  • address it wishes to access.

  • It then triggers a control line that signals a memory read request.

  • Upon receiving this request the RAM fills the data bus with the contents

  • of the requested memory location.

  • The CPU now sees this data on the bus.

  • Writing data to RAM works in a similar manner, with the CPU

  • posting to the data bus instead.

  • When the RAM receives a “write” signal, the contents of the data

  • bus are written to the RAM location pointed to by the address bus.
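
(A toy model of that read/write handshake in C; the bus variables and helper names here are illustrative stand-ins, not real hardware interfaces. In hardware the buses are parallel electrical lines.)

```c
#include <stdint.h>
#include <stdio.h>

/* Toy handshake model: the "buses" are shared variables. */
static uint8_t address_bus;
static uint8_t data_bus;
static uint8_t ram[16];

/* RAM side: what happens when the read/write control lines are triggered. */
static void on_read_signal(void)  { data_bus = ram[address_bus]; }
static void on_write_signal(void) { ram[address_bus] = data_bus; }

int main(void) {
    /* CPU writes 99 to address 3: load both buses, raise "write". */
    address_bus = 3;
    data_bus = 99;
    on_write_signal();

    /* CPU reads address 3 back: load the address bus, raise "read". */
    address_bus = 3;
    on_read_signal();
    printf("read back: %d\n", data_bus);  /* 99 */
    return 0;
}
```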

  • The address of the memory location to fetch is stored in the CPU,

  • in a mechanism called a register.

  • A register is a high-speed internal memory word that is used as a

  • “notepad” by CPU operations.

  • It's typically used as a temporary data store for instructions

  • but can also be assigned to vital CPU functions, such as

  • keeping track of the current address being accessed in RAM.

  • Because they are designed directly into the CPU's hardware, most

  • CPUs only have a handful of registers.

  • Their word size is generally coupled to the CPU's native architecture.

  • Once a word of memory is read into the CPU, the register that stores

  • the address of that word, known as the program counter, is incremented.

  • On the next fetch, it retrieves the next instruction in sequence.
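
(A sketch of that fetch step in C; the 16-word RAM and the function name are made up for illustration.)

```c
#include <stdint.h>

/* Illustrative fetch step: the program counter holds the address of the
 * next instruction; each fetch reads that word and then increments it. */
static uint8_t ram[16];   /* a tiny 16-word memory, matching a 4-bit address */
static uint8_t pc = 0;    /* program counter register */

static uint8_t fetch(void) {
    uint8_t instruction = ram[pc];  /* request the word at the current address */
    pc = (pc + 1) % 16;             /* advance so the next fetch is sequential */
    return instruction;
}

int main(void) {
    return fetch();  /* fetches ram[0] and leaves pc at 1 */
}
```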

  • Accessing data from RAM is typically the bottleneck of a CPU's operation.

  • This is due to the need to interface with components

  • physically distant from the CPU.

  • On older CPUs this doesn't present much of a problem, but as they

  • get faster, the latency of memory access becomes a critical issue.

  • The mechanism of how this is handled is key to the advancement

  • of processor performance and will be examined in part 2 of this

  • series as we introduce caching.

  • Once an instruction is fetched, the decode phase begins.

  • In classic RISC architecture, one word of memory forms

  • a complete instruction.

  • This changes to more elaborate methods as CPUs evolve to complex

  • instruction set architectures, which will be introduced

  • in part 2 of this series.

  • When an instruction is decoded, the word is broken down into

  • two parts known as bitfields.

  • These are called an opcode and an operand.

  • An opcode is a unique series of bits that represents a specific

  • function within the CPU.

  • Opcodes generally instruct the CPU to move data to a register, move

  • data between a register and memory, perform math or logic functions

  • on registers, or branch.

  • Branching occurs when an instruction causes a change in

  • the program counter's address.

  • This causes the next fetch to occur at a new location in memory as opposed

  • to the next sequential address.

  • When this “jump” to a new program location is guaranteed, it's

  • called an unconditional branch.

  • In other cases a test can be done to determine if a “jump” should occur.

  • This is known as a conditional branch.

  • The tests that trigger these conditions are usually mathematical,

  • such as whether a register or memory location is less than

  • or greater than a number, or whether it is zero or non-zero.

  • Branching allows programs to make decisions and is

  • crucial to the power of a CPU.
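
(A conditional branch can be sketched in C as nothing more than an optional assignment to the program counter; the names and addresses below are illustrative.)

```c
#include <stdint.h>
#include <stdio.h>

static uint8_t pc = 4;   /* program counter, currently at address 4 */

/* Conditional branch: replace the program counter only if the test passes. */
static void branch_if_zero(uint8_t target, uint8_t tested_value) {
    if (tested_value == 0)
        pc = target;   /* the next fetch happens at 'target' instead */
}

int main(void) {
    branch_if_zero(9, 1);      /* test fails: execution stays sequential */
    printf("pc = %d\n", pc);   /* 4 */
    branch_if_zero(9, 0);      /* test passes: the "jump" occurs */
    printf("pc = %d\n", pc);   /* 9 */
    return 0;
}
```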

  • Opcodes sometimes require data to perform their operation on.

  • This part of an instruction is called an operand.

  • Operands are bits piggybacked onto an instruction to be used as data.

  • Let's say we wanted to add 5 to a register.

  • The binary representation of the number 5 would be embedded in the

  • instruction and extracted by the decoder for the addition operation.

  • When an instruction has an embedded constant of data within it,

  • it's known as an immediate value.
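
(A sketch of that decode step in C, using a made-up 8-bit instruction format rather than the 4004's real encoding: the opcode sits in the high four bits and an immediate operand in the low four.)

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Hypothetical format: high 4 bits = opcode, low 4 bits = immediate. */
    uint8_t instruction = 0x35;            /* pretend opcode 3 means "add immediate" */
    uint8_t opcode  = instruction >> 4;    /* which operation to perform */
    uint8_t operand = instruction & 0x0F;  /* the embedded constant: 5 */
    printf("opcode=%d operand=%d\n", opcode, operand);
    return 0;
}
```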

  • In some instructions the operand does not specify the value

  • itself, but contains an address to a location in memory to be accessed.

  • This is common in opcodes that request a memory word

  • to be loaded into a register.

  • This is known as addressing, and can get far more

  • complicated in modern CPUs.

  • Addressing can result in a performance penalty because of the

  • need to “leave” the CPU, but this is mitigated as CPU designs advance.

  • Once we have our opcode and operand, the opcode is matched by means of a

  • table and a combination of circuitry, where a control unit then configures

  • various operational sections of the CPU to perform the operation.

  • In some modern CPUs the decode phase isn't hardwired, and can be programmed.

  • This allows for changing how instructions are decoded and how the

  • CPU is configured for execution.

  • In the execution phase the now-configured CPU is triggered.

  • This may occur in a single step or a series of steps

  • depending on the opcode.

  • One of the most commonly used sections of a CPU in execution is

  • the Arithmetic Logic Unit or ALU.

  • This block of circuitry is designed to take in two operands and

  • perform either basic arithmetic or bitwise logical operations on them.

  • The results are then output along with the respective mathematical

  • flags, such as a carry, an overflow, or a zero result.

  • The output of the ALU is then sent to either a register or a location

  • in memory based on the opcode.
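
(An illustrative ALU in C, a simplified stand-in for the real circuitry: it takes two operands, adds them, and reports carry and zero flags.)

```c
#include <stdint.h>
#include <stdio.h>

/* Simplified ALU: two operands in, a result and status flags out. */
struct alu_out {
    uint8_t result;
    int carry;   /* set when the sum doesn't fit in 8 bits */
    int zero;    /* set when the result is zero */
};

static struct alu_out alu_add(uint8_t a, uint8_t b) {
    uint16_t wide = (uint16_t)a + b;   /* add in a wider type to expose the carry */
    struct alu_out out;
    out.result = (uint8_t)wide;
    out.carry  = wide > 0xFF;
    out.zero   = (out.result == 0);
    return out;
}

int main(void) {
    struct alu_out out = alu_add(200, 100);  /* 300 doesn't fit in one byte */
    printf("result=%d carry=%d zero=%d\n", out.result, out.carry, out.zero);
    return 0;
}
```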

  • Let's say an instruction calls for adding 10 to a register and

  • placing the result in that register.

  • The control unit of the CPU will load the immediate value

  • of the instruction into the ALU,

  • load the value of the register into the ALU and connect the

  • ALU output to the register.

  • On the execute trigger the addition is done and the output

  • loaded into the register.

  • In effect, software distills down to a loop of configuring

  • groups of circuits to interact with each other within a CPU.
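
(Putting the three phases together: a complete toy fetch-decode-execute loop in C, using the same made-up encoding as the earlier sketches. The opcodes and program are purely illustrative.)

```c
#include <stdint.h>
#include <stdio.h>

enum { OP_ADD = 1, OP_HALT = 0xF };   /* hypothetical opcodes */

int main(void) {
    uint8_t ram[16] = { 0x15, 0x13, 0xF0 };   /* program: add 5; add 3; halt */
    uint8_t pc = 0;                           /* program counter */
    uint8_t acc = 0;                          /* an accumulator register */

    for (;;) {
        uint8_t instruction = ram[pc++];      /* fetch (and advance the counter) */
        uint8_t opcode  = instruction >> 4;   /* decode: split the bitfields */
        uint8_t operand = instruction & 0x0F;
        if (opcode == OP_ADD)                 /* execute: configure and trigger */
            acc += operand;
        else if (opcode == OP_HALT)
            break;
    }
    printf("accumulator = %d\n", acc);        /* 5 + 3 = 8 */
    return 0;
}
```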

  • In a CPU these 3 phases of operation loop continuously, working their

  • way through the instructions of the computer program loaded in memory.

  • Gluing this looping machine together is a clock.

  • A clock is a repeating pulse used to synchronize a CPU's internal

  • mechanics and its interface with external components.

  • CPU clock rate is measured by the number of pulses per second, or Hertz.

  • The Intel 4004 ran at 740 kHz or 740,000 pulses a second.

  • Modern CPUs can touch clock rates approaching 5GHz, or

  • 5 billion pulses a second.

  • On simpler CPUs a single clock pulse triggers the advance of the

  • fetch, decode, and execute stages.

  • As CPUs get more sophisticated these stages can take several

  • clock cycles to complete.

  • Optimizing these stages and their use of clock cycles is key to

  • increasing processing power and will be discussed in part 2 of this series.

  • The throughput of a CPU, the number of instructions that can be executed

  • per second, determines how “fast” it is.

  • By increasing the clock rate, we can make a processor go

  • through its stages faster.
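
(This relationship also appears to explain the 4004 figures quoted earlier: its shorter instructions took 8 clock cycles and its longer ones 16, so at 740 kHz the throughput lands exactly in the 46,250 to 92,500 range. A quick illustrative check in C:)

```c
#include <stdio.h>

int main(void) {
    /* Throughput = clock rate / clock cycles per instruction. */
    long clock_hz = 740000;                                /* the 4004's clock */
    printf("short instructions: %ld/s\n", clock_hz / 8);   /* 92,500 */
    printf("long instructions:  %ld/s\n", clock_hz / 16);  /* 46,250 */
    return 0;
}
```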

  • However, as we get faster, we encounter a new problem.

  • The period between clock cycles has to allow for enough time

  • for every possible instruction combination to execute.

  • If a new clock pulse happens before an instruction cycle

  • completes, results become unpredictable and the program fails.

  • Furthermore, increasing clock rates has the side effect of increasing

  • power dissipation and a buildup of heat in the CPU, causing a

  • degradation of circuit performance.

  • The battle to run CPUs faster and more efficiently has

  • dominated their entire existence.

  • In the next part of this series we'll explore the expansion of CPU designs

  • from that simple 2,300 transistor device of the 1970s, through the

  • microcomputing boom of the 1980s, and onward to the multi-million transistor

  • designs of the 90s and early 2000s.

  • We'll introduce the rise of pipelining technology, caching,

  • the move to larger-bit CISC architectures, and charge forward

  • to multi-GHz clock rates.
