During the mid-1960s, a revolution in miniaturization was kick-started. The idea of packing dozens of semiconductor-based transistors onto a single silicon chip spawned the integrated circuit. It laid the groundwork for a complete paradigm shift in how modern society would evolve. In less than a decade, this marvel of electronic engineering and materials science would usher in an era of advancement incomparable to anything else in human history.

In March of 1971, the commercial launch of a new semiconductor product set the stage for this new era. Composed of a then-incredible 2,300 transistors, the Intel 4004 central processing unit, or CPU, was released. Initially created as a custom solution for the Japanese company Busicom Corp. for use in the Busicom 141-PF calculator, it was released later that year to the general public. With prophetic irony, the marketing material for the chip touted the slogan “Announcing a new era in integrated electronics”.

But what made the Intel 4004 so groundbreaking? Take a calculator and solve any simple arithmetic operation, let's say 22 divided by 7. What we just did was issue a computer an instruction. Instructions are elemental operations, such as math commands, that a CPU executes. Every computer program ever made, from web browsers, to apps, to video games, is composed of millions of these instructions.

The 4004 was capable of executing between 46,250 and 92,500 instructions per second. For comparison, ENIAC, the first electronic computer, built just 25 years earlier, could only execute 5,000 instructions a second. But what made the 4004 so powerful wasn't just its roughly 1,800% increase in processing power: it consumed only 1 watt of electricity, was about ¾” long, and cost $5 to produce in today's money. This was miles ahead of ENIAC's cost of $5.5 million in today's money, 180 kW power consumption, and 27-ton weight. Fast forward to September 2017, the launch date of the Intel Core i9-7980XE.
This CPU is capable of performing over 80 billion instructions a second, a 900,000-fold increase in processing power. What did it take to get here? In this two-part series we explore the engineering and behind-the-scenes technology that paved the way for that simple 16-pin chip to evolve into the powerhouse of CPUs today. This is the evolution of processing power.

HOW A CPU STORES DATA

In order to understand how a CPU derives its processing power, let's examine what a CPU actually does and how it interfaces with data. In digital electronics, everything is represented by the binary “bit”. It's an elemental representation of two possible states. A bit can represent a zero or one, true or false, up or down, on or off, or any other bi-state value. In a CPU, a bit is physically transmitted as voltage levels.

If we combine multiple bits together in a group, we can represent more combinations of discrete states. For example, if we combine eight bits together we form what's known as a byte. A byte can represent 256 different states and can be used to represent numbers. In the case of a byte, any number between 0 and 255 can be expressed. But in a CPU, how we choose to represent data is completely malleable. That same byte can also represent a number between -128 and 127. Other expressions of that byte may be colors or levels of sound.

When we combine multiple bytes together, we create what's known as a word. Words are expressed in their bit capacity: a 32-bit word contains 32 bits, a 64-bit word contains 64 bits, and so on. When a processor is created, the native word size it operates on forms the core of its architecture. The original Intel 4004 processor operated on a 4-bit word. This means data moving through the CPU transits in chunks of four bits at a time. Modern CPUs are typically 64-bit; however, 32-bit processors are still quite common. By making use of larger word sizes we can represent more discrete states and consequently larger numbers.
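The ranges described above can be checked directly. Here is a minimal sketch in Python showing the same 8-bit pattern interpreted as an unsigned value (0 to 255) and as a two's-complement signed value (-128 to 127); the helper names are illustrative, not from any real API:

```python
def as_unsigned(bits: int) -> int:
    """Interpret an 8-bit pattern as an unsigned value (0..255)."""
    return bits & 0xFF

def as_signed(bits: int) -> int:
    """Interpret the same 8-bit pattern as two's complement (-128..127)."""
    value = bits & 0xFF
    return value - 256 if value >= 128 else value

# The same byte, two different meanings:
print(as_unsigned(0b11111111))  # 255
print(as_signed(0b11111111))    # -1
print(as_signed(0b10000000))    # -128

# A byte has 2**8 = 256 states; a 32-bit word has 2**32, about 4.29 billion.
print(2 ** 8)    # 256
print(2 ** 32)   # 4294967296
```

The same bit pattern carries no meaning on its own; the interpretation chosen by the program determines whether it is a count, a signed quantity, a color, or a sound level.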
A 32-bit word, for example, can represent up to 4.2 billion different states.

Of all the forms data can take inside of a CPU, the most important one is that of an instruction. Instructions are unique bits of data that are decoded and executed by the CPU as operations. An example of a common instruction would be to add two word values together, or to move a word of data from one location in memory to another. The entire list of instructions a CPU supports is called its instruction set. Each instruction's binary representation, its machine code, is typically assigned a human-readable representation known as an assembly language. If we look at the instruction sets of most CPUs, they all tend to focus on performing math or logical operations on data, testing conditions, or moving data from one location in memory to another.

For all intents and purposes, we can think of a CPU as an instruction processing machine. It operates by looping through three basic steps: fetch, decode, and execute. As CPU designs evolve, these three steps become dramatically more complicated, and technologies are implemented that extend this core model of operation. But in order to fully appreciate these advances, let's first explore the mechanics of basic CPU operation. Known today as the “classic Reduced Instruction Set Computer, or RISC, pipeline”, this paradigm formed the basis for the first CPU designs, such as the Intel 4004.

In the fetch phase, the CPU loads the instruction it will be executing into itself. A CPU can be thought of as existing in an information bubble. It pulls instructions and data from outside of itself, performs operations within its own internal environment, and then returns data back. This data is typically stored in memory external to the CPU called Random Access Memory, or RAM. Software instructions and data are loaded into RAM from more permanent sources such as hard drives and flash memory.
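The pairing of machine code with assembly mnemonics described above can be sketched loosely as a lookup table. The opcodes and mnemonics below are invented for illustration and do not match the 4004 or any real instruction set:

```python
# Hypothetical 4-bit opcodes mapped to assembly mnemonics.
MNEMONICS = {
    0b0001: "ADD",   # math: add two values
    0b0010: "MOV",   # move data between locations
    0b0011: "CMP",   # test a condition
    0b0100: "JMP",   # branch to a new address
}

def disassemble(opcode: int) -> str:
    """Return the human-readable mnemonic for a machine-code opcode."""
    return MNEMONICS.get(opcode, "???")

print(disassemble(0b0001))  # ADD
print(disassemble(0b0100))  # JMP
```

An assembler performs the reverse mapping, turning mnemonics written by a programmer into the binary opcodes the CPU actually decodes.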
But at one point in history, magnetic tape, punch cards, and even flip switches were used. When a CPU loads a word of data, it does so by requesting the contents of a location in RAM. This is called the data's address. The amount of data a CPU can address at one time is determined by its address capacity. A 4-bit address, for example, can only directly address 16 locations of data. Mechanisms exist for addressing more data than the CPU's address capacity, but let's ignore these for now.

The mechanism by which data moves back and forth to RAM is called a bus. A bus can be thought of as a multi-lane highway between the CPU and RAM in which each bit of data has its own lane. But we also need to transmit the location of the data we're requesting, so a second highway must be added to accommodate both the size of the data word and the address word. These are called the data bus and the address bus, respectively. In practice, these data and address lines are physical electrical connections between the CPU and RAM, and often look exactly like a superhighway on a circuit board.

When a CPU makes a request for RAM access, a memory control region of the CPU loads the address bus with the memory word address it wishes to access. It then triggers a control line that signals a memory read request. Upon receiving this request, the RAM fills the data bus with the contents of the requested memory location. The CPU now sees this data on the bus. Writing data to RAM works in a similar manner, with the CPU posting to the data bus instead. When the RAM receives a “write” signal, the contents of the data bus are written to the RAM location pointed to by the address bus.

The address of the memory location to fetch is stored in the CPU in a mechanism called a register. A register is a high-speed internal memory word that is used as a “notepad” by CPU operations.
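The read/write handshake above can be sketched as follows. This is a toy model, not real hardware behavior: RAM is a plain list, the bus widths and helper names are assumptions, and a 4-bit address bus limits us to 16 locations, just as described:

```python
ADDRESS_BITS = 4
RAM_SIZE = 2 ** ADDRESS_BITS   # a 4-bit address reaches only 16 locations

ram = [0] * RAM_SIZE           # toy model of external RAM

def memory_read(address_bus: int) -> int:
    """On a read signal, RAM places the addressed word on the data bus."""
    return ram[address_bus & (RAM_SIZE - 1)]

def memory_write(address_bus: int, data_bus: int) -> None:
    """On a write signal, RAM stores the data bus contents at the address."""
    ram[address_bus & (RAM_SIZE - 1)] = data_bus

# CPU posts an address and data, signals a write, then reads it back.
memory_write(0b1010, 42)
print(memory_read(0b1010))  # 42
print(RAM_SIZE)             # 16
```

Masking the address with `RAM_SIZE - 1` mimics the fact that a real address bus physically cannot carry more bits than it has lanes.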
It's typically used as a temporary data store for instructions, but can also be assigned to vital CPU functions, such as keeping track of the current address being accessed in RAM. Because they are designed innately into the CPU's hardware, most CPUs only have a handful of registers. Their word size is generally coupled to the CPU's native architecture. Once a word of memory is read into the CPU, the register that stores the address of that word, known as the program counter, is incremented. On the next fetch, it retrieves the next instruction in sequence.

Accessing data from RAM is typically the bottleneck of a CPU's operation. This is due to the need to interface with components physically distant from the CPU. On older CPUs this didn't present much of a problem, but as CPUs get faster, the latency of memory access becomes a critical issue. How this is handled is key to the advancement of processor performance and will be examined in part 2 of this series when we introduce caching.

Once an instruction is fetched, the decode phase begins. In the classic RISC architecture, one word of memory forms a complete instruction. This changes to more elaborate methods as CPUs evolve toward complex instruction set architectures, which will be introduced in part 2 of this series. When an instruction is decoded, the word is broken down into two parts known as bitfields. These are called the opcode and the operand. An opcode is a unique series of bits that represents a specific function within the CPU. Opcodes generally instruct the CPU to move data to a register, move data between a register and memory, perform math or logic functions on registers, or branch.

Branching occurs when an instruction causes a change in the program counter's address. This causes the next fetch to occur at a new location in memory, as opposed to the next sequential address. When this “jump” to a new program location is guaranteed, it's called an unconditional branch.
In other cases, a test can be done to determine if a “jump” should occur. This is known as a conditional branch. The tests that trigger these conditions are usually mathematical, such as whether a register or memory location is less than or greater than a number, or whether it is zero or non-zero. Branching allows programs to make decisions and is crucial to the power of a CPU.

An opcode sometimes requires data to perform its operation on. This part of an instruction is called the operand. Operands are bits piggybacked onto an instruction to be used as data. Let's say we wanted to add 5 to a register. The binary representation of the number 5 would be embedded in the instruction and extracted by the decoder for the addition operation. When an instruction has a constant of data embedded within it, it's known as an immediate value. In some instructions, the operand does not specify the value itself, but contains an address of a location in memory to be accessed. This is common in opcodes that request a memory word to be loaded into a register. This is known as addressing, and it can get far more complicated in modern CPUs. Addressing can result in a performance penalty because of the need to “leave” the CPU, but this is mitigated as CPU designs advance.

Once we have our opcode and operand, the opcode is matched by means of a table and a combination of circuitry, where a control unit then configures various operational sections of the CPU to perform the operation. In some modern CPUs the decode phase isn't hardwired and can be programmed. This allows changes in how instructions are decoded and how the CPU is configured for execution.

In the execution phase, the now-configured CPU is triggered. This may occur in a single step or a series of steps, depending on the opcode. One of the most commonly used sections of a CPU in execution is the Arithmetic Logic Unit, or ALU.
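Splitting an instruction word into its opcode and operand bitfields is just shifting and masking. Here is a sketch assuming an invented 8-bit format, with the high four bits as the opcode and the low four bits as an immediate operand; real encodings vary widely:

```python
def decode(instruction: int) -> tuple[int, int]:
    """Split an 8-bit instruction word into its two bitfields."""
    opcode = (instruction >> 4) & 0xF   # high four bits: the operation
    operand = instruction & 0xF         # low four bits: immediate data
    return opcode, operand

# 0b0001_0101: a hypothetical "add immediate 5" instruction.
opcode, operand = decode(0b00010101)
print(opcode)   # 1
print(operand)  # 5
```

The number 5 rides along inside the instruction itself, which is exactly what makes it an immediate value rather than data fetched from a separate memory location.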
This block of circuitry is designed to take in two operands and perform either basic arithmetic or bitwise logical operations on them. The results are then output along with respective mathematical flags, such as a carry, an overflow, or a zero result. The output of the ALU is then sent to either a register or a location in memory, based on the opcode. Let's say an instruction calls for adding 10 to a register and placing the result in that register. The control unit of the CPU will load the immediate value of the instruction into the ALU, load the value of the register into the ALU, and connect the ALU output to the register. On the execute trigger, the addition is done and the output is loaded into the register. In effect, software distills down to a loop of configuring groups of circuits to interact with each other within a CPU.

In a CPU, these three phases of operation loop continuously, working their way through the instructions of the computer program loaded in memory. Gluing this looping machine together is a clock. A clock is a repeating pulse used to synchronize a CPU's internal mechanics and its interface with external components. CPU clock rate is measured by the number of pulses per second, or hertz. The Intel 4004 ran at 740 kHz, or 740,000 pulses a second. Modern CPUs can touch clock rates approaching 5 GHz, or 5 billion pulses a second.

On simpler CPUs, a single clock pulse triggers the advance of the fetch, decode, and execute stages. As CPUs get more sophisticated, these stages can take several clock cycles to complete. Optimizing these stages and their use of clock cycles is key to increasing processing power and will be discussed in part 2 of this series.

The throughput of a CPU, the number of instructions that can be executed per second, determines how “fast” it is. By increasing the clock rate, we can make a processor go through its stages faster. However, as we get faster we encounter a new problem.
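Tying the three phases together, the continuous fetch-decode-execute loop can be sketched as a toy simulator. Everything here is invented for illustration: the three-opcode instruction set, the single accumulator register, and the 8-bit encoding (high nibble opcode, low nibble immediate) are assumptions, not any real CPU's design:

```python
# Toy ISA: 8-bit instructions; high nibble = opcode, low nibble = immediate.
HALT = 0x0   # stop execution
ADDI = 0x1   # add the immediate value to the accumulator (via the "ALU")
JMP  = 0x2   # unconditional branch to the address in the low nibble

def run(program: list[int]) -> int:
    """Loop fetch, decode, execute over a program held in toy RAM."""
    pc = 0    # program counter register
    acc = 0   # accumulator register
    while True:
        instruction = program[pc]          # fetch: read the word at the PC
        opcode = (instruction >> 4) & 0xF  # decode: split the bitfields
        operand = instruction & 0xF
        pc += 1                            # PC advances to the next word
        if opcode == ADDI:                 # execute: ALU addition
            acc = (acc + operand) & 0xFF
        elif opcode == JMP:                # execute: branch rewrites the PC
            pc = operand
        elif opcode == HALT:
            return acc

# ADDI 5, ADDI 3, HALT -> accumulator ends at 8.
print(run([0x15, 0x13, 0x00]))  # 8
```

Even in this toy form, the key structure survives: the program counter drives the fetch, decoding is pure bitfield extraction, and a branch is nothing more than an instruction overwriting the program counter.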
The period between clock cycles has to allow enough time for every possible instruction combination to execute. If a new clock pulse arrives before an instruction cycle completes, results become unpredictable and the program fails. Furthermore, increasing clock rates has the side effect of increasing power dissipation and a buildup of heat in the CPU, causing a degradation of circuitry performance. The battle to run CPUs faster and more efficiently has dominated their entire existence.

In the next part of this series, we'll explore the expansion of CPU designs from that simple 2,300-transistor device of the 1970s, through the microcomputing boom of the 1980s, and onward to the multi-million-transistor designs of the '90s and early 2000s. We'll introduce the rise of pipelining technology, caching, the move to larger-bit-width CISC architectures, and charge forward to multi-GHz clock rates.
The Evolution Of CPU Processing Power Part 1: The Mechanics Of A CPU
Posted by joey on 2021/07/03