The following content is provided under a Creative
Commons license.
Your support will help MIT OpenCourseWare
continue to offer high-quality educational resources for free.
To make a donation or to view additional materials
from hundreds of MIT courses, visit MIT OpenCourseWare
at ocw.mit.edu.
JULIAN SHUN: Good afternoon, everyone.
So today, we have TB Schardl here.
He's going to give us the lecture on C to assembly.
So TB's a research scientist here at MIT
working with Charles Leiserson.
He also taught this class with me last year,
and he got one of the best ratings ever for this class.
So I'm really looking forward to his lecture.
TAO SCHARDL: All right, great.
So thank you for the introduction, Julian.
So I hear you just submitted the beta for project 1.
Hopefully, that went pretty well.
How many of you slept in the last 24 hours?
OK, good.
All right, so it went pretty well.
That sounds great.
Yeah, so today, we're going to be talking about C to assembly.
And this is really a continuation
from the topic of last lecture, where
you saw computer architecture, if I understand correctly.
Is that right?
You looked at computer architecture, x86-64 assembly,
that sort of thing.
So how many of you walked away from that lecture thinking, oh
yeah, x86-64 assembly, this is easy?
This is totally intuitive.
Everything makes perfect sense.
There's no weirdness going on here whatsoever.
How many of you walked away not thinking that?
Thinking that perhaps this is a little bit strange,
this whole assembly language.
Yeah, I'm really in the latter camp. x86 is
kind of a strange beast.
There are things in there that make no sense.
A quad word has 8 bytes.
P stands for integer, that sort of thing.
So when we move on to the topic of seeing how C code gets
translated into assembly, we're translating
into something that's already pretty complicated.
And the translation itself isn't going
to be that straightforward.
So we're going to have to find a way to work through that.
And I'll outline the strategy that we'll
be using in the start of this presentation.
But first, let's quickly review.
Why do we care about looking at assembly?
You should have seen this slide from the last lecture.
But essentially, assembly is a more precise representation
of the program than the C code itself.
And if you look at the assembly, that
can reveal details about the program that are not obvious
when you just look at the C code directly.
There are implicit things going on in the C code,
such as type casts or the usage of registers
versus memory on the machine.
And those can have performance implications.
So it's valuable to take a look at the assembly code directly.
It can also reveal what the compiler did or did not
do when it tried to optimize the program.
For example, you may have written a division operation
or a multiply operation.
But somehow, the compiler figured out
that it didn't really need to do a divide
or multiply to implement that operation.
It could implement it more quickly using simpler, faster
operations, like addition and subtraction or shift.
And you would be able to see that
from looking at the assembly.
Bugs can also arise only at a low level.
For example, there may be a bug in the program that only
creates unexpected behavior when you optimize the code at -O3.
So that means, when you're debugging at -O0
or -O1, you wouldn't see any unusual behaviors.
But when you crank up the optimization level,
suddenly, things start to fall apart.
Because the C code itself didn't change,
it can be hard to spot those bugs.
Looking at the assembly can help out in that regard.
And when worst comes to worst, if you really
want to make your code fast, it is
possible to modify the assembly code by hand.
One of my favorite uses of looking at the assembly,
though, is actually reverse engineering.
If you can read the assembly for some code,
you can actually decipher what that program does,
even when you only have access to the binary of that program,
which is kind of a cool thing.
It takes some practice to read assembly at that level.
It's a trick that some of us in Professor Leiserson's research
group have used in the past to, say,
figure out what Intel's Math Kernel Library
is doing to multiply matrices.
Now, as I mentioned before, at the end of last lecture,
you saw some computer architecture.
And you saw the basics of x86-64 assembly,
including all the stuff, like the instructions,
the registers, the various data types, memory addressing modes,
the RFLAGS register with those condition codes,
and that sort of thing.
And today, we want to talk about how C code gets implemented
in that assembly language.
OK, well, if we consider how C code becomes assembly
and what that process actually looks like,
we know that there is a compiler involved.
And the compiler is a pretty sophisticated piece
of software.
And, frankly, the compiler has a lot of work
to do in order to translate a C program into assembly.
For example, it has to choose what assembly instructions are
going to be used to implement those C operations.
It has to implement C conditionals and loops--
those if-then-elses and those for and while loops--
into jumps and branches.
It has to choose registers and memory locations
to store all of the data in the program.
It may have to move data among the registers and the memory
locations in order to satisfy various data dependencies.
It has to coordinate all the function calls that
happen when subroutine A calls B and calls C, and then returns,
and so on and so forth.
And on top of that, these days, we
expect our compiler to try really
hard to make that code fast.
So that's a lot of work that the compiler has to do.
And as a result, if we take a look
at the assembly for any arbitrary piece of C code,
the mapping from that C code to the assembly
is not exactly obvious, which makes
it hard to execute this particular lecture and hard to,
in general, read the binary or the assembly for some program
and figure out what's really going on.
So what we're going to do today to understand this translation
process is we're going to take a look at how
that compiler actually reasons about translating
C code into assembly.
Now this is not a compiler class.
6.172 is not a class you take if you want to learn
how to build a compiler.
And you're not going to need to know everything
about a compiler to follow today's lecture.
But what we will see is just a little bit about
how the compiler understands a program
and, later on, how the compiler can translate that program
into assembly code.
Now when a compiler compiles a program,
it does so through a sequence of stages, which
are illustrated on this slide.
Starting from the C code, it first pre-processes that code,
dealing with all the macros.
And that produces a pre-process source.
Then the compiler will translate that source code
into an intermediate representation.
For the Clang compiler that you're using,
that intermediate representation is called LLVM IR.
LLVM being the name of the underlying compiler,
and IR being the creative name for the intermediate
representation.
That LLVM IR is really a sort of pseudo-assembly.
It's kind of like assembly, but as we'll see,
it's actually a lot simpler than x86-64 assembly.
And that's why we'll use it to understand this translation
process.
Now it turns out that the compiler
does a whole lot of work on that intermediate representation.
We're not going to worry about that today.
We'll just skip to the end of this pipeline
when the compiler translates LLVM IR into assembly code.
Now the nice thing about taking a look at the LLVM IR
is that, if you're curious, you can actually
follow along with the compiler.
It is possible to ask clang to compile your code
and give you the LLVM IR rather than the assembly.
And the flags to do that are somewhat familiar.
Rather than passing the -S flag, which, hopefully, you've
already seen-- that will translate C code directly
into assembly--
if you pass -S -emit-llvm,
that will produce the LLVM IR.
You can also ask clang to translate LLVM IR itself
directly into assembly code, and that process
is pretty straightforward.
You just use the -S flag once again.
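As a concrete sketch, those invocations might look like this-- the file names here are just illustrative:

    clang -S fib.c -o fib.s               # compile C source directly to x86-64 assembly
    clang -S -emit-llvm fib.c -o fib.ll   # compile C source to LLVM IR
    clang -S fib.ll -o fib.s              # compile LLVM IR to x86-64 assembly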
So this is the outline of today's lecture.
First, we're going to start with a simple primer on LLVM IR.
I know that LLVM IR sounds like another language.
Oh, gosh, we have to learn another language.
But don't worry.
This primer, I would say, is simpler than the x86-64 primer.
Based on the slides, for x86-64, that primer
was 20-some slides long.
This primer is six slides, so maybe a little over a quarter.
Then we'll take a look at how the various constructs in the C
programming language get translated into LLVM IR,
including straight line code, C functions, conditionals--
in other words, if-then-else--
and loops.
And we'll conclude that section with just a brief mention
of LLVM IR attributes.
And finally, we'll take a look at how LLVM IR gets
translated into assembly.
And for that, we'll have to focus
on what's called the Linux x86-64 calling convention.
And we'll conclude with a case study, where
we see how this whole process works on a very simple code
to compute Fibonacci numbers.
Any questions so far?
All right, let's get started.
Brief primer on LLVM IR--
so I've shown this in smaller font on some previous slides,
but here is a snippet of LLVM IR code.
In particular, this is one function
within an LLVM IR file.
And just from looking at this code,
we can see a couple of the basic components of LLVM IR.
In LLVM IR, we have functions.
That's how code is organized into these chunks--
chunks called functions.
And within each function, the operations of the function
are encoded within instructions.
And each instruction shows up, at least on this slide,
on a separate line.
Those functions operate on what are called LLVM IR registers.
These are kind of like the variables.
And each of those variables has some associated type.
So the types are actually explicit within the IR.
And we'll take a look at the types in more
detail in a couple of slides.
So based on that high-level overview,
we can do a little bit of a comparison between LLVM
IR and assembly language.
The first thing that we see is that it looks kind
of similar to assembly, right?
It still has a simple instruction format.
There is some destination operand, which
we are calling a register.
And then there is an equal sign and then an op code, be it add,
or call, or what have you, and then
some list of source operands.
That's roughly what each instruction looks like.
We can also see, it'll turn out,
that the LLVM IR code adopts a similar structure
to the assembly code itself.
And control flow, once again, is implemented
using conditional branches, as well as unconditional branches.
But one thing that we'll notice is that LLVM IR
is simpler than assembly.
It has a much smaller instruction set.
And unlike assembly language, LLVM IR
supports an infinite number of registers.
If you can name it, it's a register.
So in that sense, LLVM's notion of registers
is a lot closer to C's notion of variables.
And when you read LLVM IR, and you see those registers,
you should just think about C variables.
There's no implicit RFLAGS register, and there are
no implicit condition codes going on.
Everything is pretty explicit in the LLVM IR.
There's no explicit stack pointer or frame pointer.
There's a type system that's explicit in the IR itself.
And it's C like in nature, and there
are C-like functions for organizing the code overall.
So let's take a look at each of these components,
starting with LLVM IR registers.
This is basically LLVM's name for a variable.
All of the data in LLVM IR is stored in these variables,
which are called registers.
And the syntax is a percent symbol followed by a name.
So %0, %1, %2, that sort of thing.
And as I mentioned before, LLVM IR registers
are a lot like C variables.
LLVM supports an infinite number of these things,
and each distinct register is just distinguished by its name.
So %0 is different from %1, because they have different
names.
Register names are also local to each LLVM IR function.
And in this regard, they're also similar to C variables.
If you wrote a C program with two functions, A and B,
and each function had a local variable apple,
those are two different apples.
The apple in A is not the same thing as the apple in B.
Similarly, if you had two different LLVM IR functions,
and they both defined some register %5,
those are two different variables.
They're not automatically aliased.
So here's an example of an LLVM IR snippet.
And what we've done here is just highlighted
all of the registers.
Some of them are being assigned, because they're
on the left-hand side of an equal symbol.
And some of them are being used as arguments when they show up
on the right-hand side.
There is one catch, which we'll see later on, namely
that the syntax for LLVM registers
ends up being hijacked when LLVM needs to refer
to different basic blocks.
We haven't defined basic blocks yet.
We'll see what that's all about in just a couple of slides.
Everyone good so far?
So LLVM IR code is organized into instructions,
and the syntax for these instructions
is pretty straightforward.
We have a register name on the left-hand side,
then an equal symbol, and then an op code,
followed by an operand list.
For example, the top highlighted instruction
has register %6 equal to an add of some arguments.
And we'll see a little bit more about those arguments later.
That's the syntax for when an instruction actually
returns some value.
So addition returns the sum of the two operands.
Other instructions don't return a value, per se,
not a value that you'd store in a local register.
And so the syntax for those instructions
is just an op code followed by a list of operands.
For example, the return instruction
that you'd find at the end of a function
doesn't assign a value to any register.
And of course, the operands can be either registers,
or constants, or, as we'll see later on,
they can identify basic blocks within the function.
The LLVM IR instruction set is smaller than that of x86.
x86 contains hundreds of instructions
when you start counting up all the vector instructions.
And LLVM IR is far more modest in that regard.
There's some instructions for data movements,
including stack allocation, reading memory, writing memory,
converting between types.
Yeah, that's pretty much it.
There are some instructions for doing arithmetic or logic,
including integer arithmetic, floating-point arithmetic,
Boolean logic, binary logic, or address calculations.
And then there are a couple of instructions
to do control flow.
There are unconditional branches or jumps,
conditional branches or jumps, subroutines--
that's call or return--
and then there's this magical phi function,
which we'll see more of later on in these slides.
Finally, as I mentioned before, everything in LLVM IR
is explicitly typed.
It's a strongly-typed language in that sense.
And the type system looks something like this.
For integers, whenever there's a variable of an integer type,
you'll see an i followed by some number.
And that number defines the number of bits in that integer.
So if you see a variable of type i64,
that means it's a 64-bit integer.
If you see a variable of type i1,
that would be a 1-bit integer or, in other words,
a Boolean value.
There are also floating-point types,
such as double and float.
There are pointer types, formed when you follow an integer
or floating-point type with a star, much like in C.
You can have arrays.
And that uses a square bracket notation,
where, within the square brackets,
you'll have some number and then times and then some other type.
Maybe it's a primitive type, like an integer
or a floating-point.
Maybe it's something more complicated.
You can have structs within LLVM IR.
And that uses squiggly brackets with types
enumerated on the inside.
You can have vector types, which uses angle brackets
and otherwise adopts a similar syntax to the array type.
Finally, you can occasionally see a variable,
which looks like an ordinary register,
except that its type is label.
And that actually refers to a basic block.
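To summarize, here is a small cheat sheet of that type syntax-- the particular types are just illustrative examples:

    i64              ; a 64-bit integer
    i1               ; a 1-bit integer, in other words, a Boolean
    double           ; a 64-bit floating-point value
    i32*             ; a pointer to a 32-bit integer
    [7 x i32]        ; an array of seven 32-bit integers
    { i64, double }  ; a struct containing an i64 and a double
    <4 x float>      ; a vector of four floats
    label            ; a reference to a basic block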
Those are the basic components of LLVM IR.
Any questions so far?
Everything clear?
Everything unclear?
STUDENT: What's the basic [INAUDIBLE]?
TAO SCHARDL: That should be unclear,
and we'll talk about it.
Yeah?
STUDENT: Is the vector notation there
for the vectorization that's done, like the special register
is used?
TAO SCHARDL: Is the vector notation used for the vector
registers?
In a sense, yes.
The vector operations within LLVM
don't look like SSE or AVX, per se.
They look more like ordinary operations,
except those ordinary operations work on a vector type.
So that's how the vector operations show up in LLVM IR.
That make some sense?
Cool.
Anything else?
OK, that's the whole primer.
That's pretty much all of the language
that you're going to need to know,
at least for this slide deck.
We'll cover some of the details as we go along.
Let's start translating C code into LLVM IR.
Is that good?
All right, let's start with pretty much the simplest
thing we can--
straight line C code.
What do I mean by straight line C code?
I mean that this is a blob of C code
that contains no conditionals or loops.
So it's just a whole sequence of operations.
And that sequence of operations in C code
turns into a sequence of operations in LLVM IR.
So in this example here, we have foo of n minus 1
plus bar of n minus 2.
That is a sequence of operations.
And it turns into the LLVM IR on the right.
We can see how that happens.
There are a couple rules of thumb
when reading straight line C code
and interpreting it in the IR.
Arguments to any operation are evaluated
before the operation itself.
So what do I mean by that?
Well, in this case, we need to evaluate n minus 1
before we pass the results to foo.
And what we see in the LLVM IR is
that we have an addition operation that
computes n minus 1.
And then the result of that--
stored into register 4-- gets passed to the call
instruction on the next line, which
calls out to function foo.
Sound good?
Similarly, we need to evaluate n minus 2
before passing its results to the function bar.
And we see that sequence of instructions
showing up next in the LLVM IR.
And now, we actually need the return value-- oh, yeah?
Question?
STUDENT: What is NSW?
TAO SCHARDL: NSW?
Essentially, that is an attribute,
which we'll talk about later.
These are things that decorate the instructions, as well
as the types, within LLVM IR, basically,
as the compiler figures stuff out.
So it helps the compiler along with analysis and optimization.
Good?
So for the last operation here, we
had to evaluate both foo and bar and get their return values
before we could add them together.
And so the very last operation in this sequence
is the addition.
That just takes those return values and computes their sum.
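As a hedged sketch-- assuming n lives in register %0, with the other register numbers merely illustrative-- that whole sequence might look like this:

    %4 = add nsw i64 %0, -1       ; evaluate n - 1
    %5 = call i64 @foo(i64 %4)    ; call foo(n - 1)
    %6 = add nsw i64 %0, -2       ; evaluate n - 2
    %7 = call i64 @bar(i64 %6)    ; call bar(n - 2)
    %8 = add nsw i64 %7, %5       ; add the two return values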
Now all of that used primitive types, in particular, integers.
But it's possible that your code uses aggregate types.
By aggregate types, I mean arrays or structs,
that sort of thing.
And aggregate types are harder to store within registers,
typically speaking.
And so they're typically stored within memory.
As a result, if you want to access something
within an aggregate type, if you want to read some elements out
of an array, that involves performing a memory access
or, more precisely, computing some address into memory,
and then loading or storing that address.
So here, for example, we have an array A of seven integers.
And we're going to access A sub x.
In LLVM IR, that turns into two instructions--
this getelementptr followed by a load.
And in the getelementptr case, this computes an address
into memory and stores the resulting
address into a register, in this case, register 5.
The next instruction, the load, takes
the address stored in register 5 and simply loads
that particular memory address, storing
the result into another register, in this case, 6.
Pretty simple.
When reading the getelementptr instruction,
the basic syntax involves a pointer
into memory followed by a sequence of indices.
And all that getelementptr really
does is it computes an address by taking that pointer
and then adding on that sequence of indices.
So in this case, we have a getelementptr instruction,
which takes the address in register 2--
that's a pointer into memory--
and then adds onto it two indices.
One is the literal value 0, and the other
is the value stored in register 4.
So that just computes the address,
starting at 2, plus 0, plus whatever was in register 4.
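Putting those two instructions together, a sketch of that array access-- with the array pointer in register %2 and the index x in register %4, as just described:

    %5 = getelementptr inbounds [7 x i32], [7 x i32]* %2, i64 0, i64 %4  ; compute the address of A[x]
    %6 = load i32, i32* %5, align 4                                      ; load the value at that address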
That's all for straight line code.
Good so far?
Feel free to interrupt if you have questions.
Cool.
Functions-- let's talk about C functions.
So when there's a function in your C code,
generally speaking, you'll have a function within the LLVM code
as well.
And similarly, when there's a return statement in the C code,
you'll end up with a return statement in the LLVM IR.
So here, we have just the bare bones C
code for this fib routine.
That corresponds to this fib function within LLVM IR.
And the function declaration itself
looks pretty similar to what you would get in ordinary C.
The return statement is also similar.
It may take an argument, if you're returning
some value to the caller.
In this case, for the fib routine,
we're going to return a 64-bit integer.
And so we see that this return statement returns
the 64-bit integer stored in register 0, a lot like in C.
Functions can have parameters.
And when you have a C function with a list of parameters,
basically, in LLVM IR, you're going
to end up with a similar looking function
with the exact same list of parameters translated
into LLVM IR.
So here, we have this C code for the mm_base routine.
And we have the corresponding LLVM IR
for an mm_base function.
And what we see is we have a pointer to a double
as the first parameter, followed by a 32-bit integer,
followed by another pointer to a double,
followed by another 32-bit integer,
followed by another pointer to a double,
and another 32-bit integer, and another 32-bit integer.
One implicit thing within LLVM IR: if you're looking
at a function declaration or definition,
the parameters are automatically named %0, %1, %2,
and so on and so forth.
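So a sketch of that declaration, with the parameters in the order just described-- the attributes that appear on the actual slide are omitted here:

    define void @mm_base(double* %0, i32 %1, double* %2, i32 %3,
                         double* %4, i32 %5, i32 %6) {
      ...
    }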
There's one unfortunate thing about LLVM IR.
The registers are a lot like C variables,
but unfortunately, that implies that when
you're reading LLVM IR, it's a lot like reading
the code from your teammate, who always insists on naming things
with nondescript, single-letter variable names.
Also, that teammate doesn't comment his code, or her code,
or their code.
OK, so basic blocks--
when we look at the code within a function,
that code gets partitioned into chunks,
which are called basic blocks.
A basic block is a sequence of instructions--
in other words, a blob of straight line code--
where control can only enter from the first instruction
in that block.
And it can only leave from the last instruction in that block.
So here we have the C code for this routine fib.c.
We're going to see a lot of this routine fib.c, by the way.
And we have the corresponding LLVM IR.
And what we have in the C code, what the C code is
telling us is that if n is less than 2,
you want to do one thing.
Otherwise, you want to do some complicated computation
and then return that result.
And if we think about that,
we've got this branch in our control flow.
And what we'll end up with are three different blocks
within the LLVM IR.
So we end up with one block, which
does the computation "is n less than 2?"
And then we end up with another block that says, well,
in one case, just go ahead and return something, in this case,
the input to the function.
In the other case, do some complicated calculations,
some straight line code, and then return that result.
Now when we partition the code of a function
into these basic blocks, we actually
have connections between the basic blocks
based on how control can move between the basic blocks.
These control flow instructions, in particular, the branch
instructions, as we'll see, induce edges
among these basic blocks.
Whenever there's a branch instruction, it can specify
that control can leave this basic block
and go to that other basic block,
or that other basic block, or maybe one or the other,
depending on how the result of some computation unfolded.
And so for the fib function that we saw before,
we had those three basic blocks.
And based on whether or not n was less than 2, either
we would execute the simple return statement,
or we would execute the blob of straight line
code shown on the left.
So those are basic blocks and functions.
Everyone still good so far?
Any questions?
Clear as mud?
Let's talk about conditionals.
You've already seen one of these conditionals.
That's given rise to these basic blocks and these control flow
edges.
So let's tease that apart a little bit further.
When we have a C conditional-- in other words,
an if-then-else statement or a switch statement,
for that matter--
that gets translated into a conditional branch
instruction, or BR, in the LLVM IR representation.
So what we saw before is that we have this if n less than 2
and this basic block with two outgoing edges.
If we take a really close look at that first basic block,
we can tease it apart and see what each operation does.
So first, in order to do this conditional operation,
we need to compute whether or not n is less than 2.
We need to do a comparison between n
and the literal value 2.
That comparison operation turns into an icmp instruction
within the LLVM IR, an integer comparison in the LLVM IR.
The result of that comparison then
gets passed to a conditional branch as one of its arguments,
and the conditional branch specifies a couple of things
beyond that one argument.
In particular, that conditional branch takes that 1-bit
integer-- that Boolean result--
as well as labels of two different basic blocks.
So that Boolean value is called the predicate.
And that's, in this case, a result of that comparison
from before.
And then the two basic blocks say
where to go if the predicate is true
or where to go if the predicate is false.
The first label is the destination when it's true,
and the second label is the destination when it's false--
pretty straightforward.
And if we decide to map this onto our control flow
graph, which we were looking at before,
we can identify the two branches coming out
of our first basic block as either the true branch
or the false branch based on whether or not
you follow that edge when the predicate is true
or you follow it when the predicate is false.
Sound good?
That should be straightforward.
Let me know if it's not.
Let me know if it's confusing.
Now it's also possible that you can
have an unconditional branch in LLVM IR.
You can just have a branch instruction with one operand,
and that one operand specifies a basic block.
There's no predicate.
There is no true or false.
It's just the one basic block.
And what that instruction says is, when you get here,
now, go to that other basic block.
This might seem kind of silly, right?
Why would we even need to jump to another basic block?
Why not just merge this code with the code
in the subsequent basic block?
Any thoughts?
STUDENT: For instance, in this case,
other things might jump in.
TAO SCHARDL: Correct.
Other things might go to that basic block.
And in general, when we look at the structure
that we get for any particular conditional in C,
we end up with this sort of diamond shape.
And in order to implement that diamond shape,
we need these unconditional branches.
So there's a good reason for them to be around.
And here, we just have an example
of a slightly more complicated conditional
that creates this diamond shape in our control flow graph.
So let's tease this piece of code apart.
In the first block, we're going to evaluate some predicate--
and in this case, our predicate is x bitwise-AND 1.
And what we see in the first basic block
is that we compute the bitwise AND, store that result,
and do a comparison between that result and the value 1.
That gives us a Boolean value, which is stored in register 3.
And we branch conditionally on whether 3 is true or false.
In the case that it's true, we'll branch to block 4.
And in block 4, that contains the code for the consequence,
the then clause of the if, then, else.
And in that block, we just call the function foo.
And then we need to leave the conditional,
so we'll just branch unconditionally.
The alternative-- if x bitwise-AND 1 is 0, if it's false--
then we will execute the function bar,
but then also need to leave the conditional.
And so we see in block 5, following
the false branch, that we call bar
and then just branch to block 6.
And finally, in block 6, we return the result.
So we end up with this diamond pattern whenever we
have a conditional, in general.
We may delete certain basic blocks
if the conditional in the code is particularly simple.
But in general, it's going to be this kind
of diamond-looking thing.
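Here is a minimal sketch of that diamond in LLVM IR, assuming x is a 64-bit integer in register %0 and eliding what the merge block does to return its result:

    %2 = and i64 %0, 1               ; compute x & 1
    %3 = icmp eq i64 %2, 1           ; is the predicate true?
    br i1 %3, label %4, label %5     ; true branch to block 4, false branch to block 5

    4:                               ; the then-clause
      call void @foo()
      br label %6                    ; unconditional branch to the merge point

    5:                               ; the else-clause
      call void @bar()
      br label %6

    6:                               ; the merge point, where the result is returned
      ...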
Everyone good so far?
One last C construct-- loops.
Unfortunately, this is the most complicated C construct
when it comes to the LLVM IR.
But things haven't been too bad so far.
So yeah, let's walk into this with some confidence.
So the simple part is that what we will see
is the C code for a loop translates
into LLVM IR that, in the control flow graph
representation, is a loop.
So a loop in C is literally a loop
in this graph representation, which is kind of nice.
But to figure out what's really going on with these loops,
let's first tease apart the components of a C loop.
Because we have a couple of different pieces
in an arbitrary C loop.
We have a loop body, which is what's
executed on each iteration.
And then we have some loop control,
which manages all of the iterations of that loop.
So in this case, we have a simple C loop,
which multiplies each element of an input
vector x by some scalar a and stores the result into y.
That body gets translated into a blob of straight line code.
I won't step through all of the straight line code just now.
There's plenty of it, and you'll be
able to see the slides after this lecture.
But that blob of straight line code
corresponds to a loop body.
And the rest of the code in the LLVM IR snippet
corresponds to the loop control.
So we have the initial assignment
of the induction variable,
the comparison with the end of the loop,
and the increment operation at the end.
All of that gets encoded in the stuff highlighted in yellow,
that loop control part.
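In C terms, the loop being described looks roughly like this-- the function name and types are illustrative, not the exact code from the slide:

    #include <stdint.h>

    void scale(int64_t n, double a, const double *restrict x,
               double *restrict y) {
      for (int64_t i = 0; i < n; ++i)  // loop control: initialization, condition, increment
        y[i] = a * x[i];               // loop body: one iteration's work
    }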
Now if we take a look at this code,
there's one odd piece that we haven't really understood yet,
and it's this phi instruction at the beginning.
The phi instruction is weird, and it arises pretty commonly
when you're dealing with loops.
It basically is there to solve a problem
with LLVM's representation of the code.
So before we describe the phi instruction,
let's actually take a look at the problem
that this phi instruction tries to solve.
So let's first tease apart the loop to reveal the problem.
The C loop produces this looping pattern
in the control flow graph, literally, an edge that
goes back to the beginning.
If we look at the different basic blocks we have,
we have one block at the beginning, which
initializes the induction variable and sees
if there are any iterations of the loop that need to be run.
If there aren't any iterations, then it'll
branch directly to the end of the loop.
It will just skip the loop entirely.
No need to try to execute any of that code.
And in this case, it will simply return.
And then inside the loop block, we
have these two incoming edges-- one
from the entry point of the loop, where i has just
been set to zero, and another where we're repeating the loop,
where we've decided there's one more iteration to execute.
And we're going to go back from the end of the loop
to the beginning.
And that back edge is what creates the loop structure
in the control flow graph.
Make sense?
I at least see one nod over there.
So that's encouraging.
OK, so if we take a look at the loop control,
there are a couple of components to that loop control.
There's the initialization of the induction variable.
There is the condition, and there's the increment.
Condition says when do you exit.
Increment updates the value of the induction variable.
And we can translate each of these components
from the C code for the loop control
into the LLVM IR code for that loop.
So the increment, we would expect
to see some sort of addition where we add 1 to some register
somewhere.
And lo and behold, there is an add operation.
So we'll call that the increment.
For the condition, we expect some comparison operation
and a conditional branch based on that comparison.
Look at that.
Right after the increment, there's
a compare and a conditional branch
that we'll either take us back to the beginning of the loop
or out of the loop entirely.
And we do see that there is some form of initialization.
The initial value of this induction variable is 0.
And we do see a 0 among this loop control code.
It's kind of squirreled away in that weird notation there.
And that weird notation is sitting next
to the phi instruction.
What's not so clear here is where exactly
is the induction variable.
We had this single variable i in our C code.
And what we're looking at in the LLVM IR
are a whole bunch of different registers.
We have a register that stores what
we're claiming to be i plus 1, then
we do this comparison and branch thing.
And then we have this phi instruction
that takes 0 or the result of the increment.
Where did i actually go?
So the problem here is that i is really
represented across all of those instructions.
And that happens because the value of the induction variable
changes as you execute the loop.
The value of i is different on iteration 0 versus iteration 1
versus iteration 2 versus iteration 3
and so on and so forth.
i is changing as you execute the loop.
And there's this funny invariant.
Yeah, so if we try to map that induction variable to the LLVM
IR, it kind of maps to all of these locations.
It maps to various uses in the loop body.
It maps, roughly speaking, to the return value of this phi
instruction, even though we're not sure what that's all about.
But we can tell it maps to that, because we're going
to increment that later on.
And we're going to use that in a comparison.
So it kind of maps all over the place.
And because it changes values with the increment operation,
we're going to encounter--
so why does it change registers?
Well, we have this property in LLVM
that each instruction defines the value
of a register, at most, once.
So for any particular register with LLVM,
we can identify a unique place in the code
of the function that defines that register value.
This invariant is called the static single assignment
invariant.
And it seems a little bit weird, but it turns out
to be an extremely powerful invariant within the compiler.
It assists with a lot of the compiler analysis.
And it also can help with reading the LLVM
IR if you expect it.
So this is a nice invariant, but it
poses a problem when we're dealing
with induction variables, which change as the loop unfolds.
And so what happens when control flow merges at the entry
point of a loop, for example?
How do we define what the induction
variable is at that location?
Because it could either be 0, if this
is the first time through the loop, or whatever you last
incremented.
And the solution to that problem is the phi instruction.
The phi instruction defines a register that says,
depending on how you get to this location in the code,
this register will have one of several different values.
And the phi instruction simply lists
what the value of that register will be,
depending on which basic block you came from.
So in this particular code, the phi instruction says,
if you came from block 6, which was the entry
point of the loop, where you initially checked if there were
any loop iterations to perform, if you come from that block,
then this register 9 is going to adopt the value 0.
If, however, you followed the back edge of the loop,
then the register is going to adopt the value,
in this case, 14.
And 14, lo and behold, is the result
of the incremental operation.
And so this phi instruction says,
either you're going to start from zero,
or you're going to be i plus 1.
Just to note, the phi instruction
is not a real instruction.
It's really a solution to a problem within LLVM.
And when you translate this code into assembly,
the phi instruction isn't going to map
to any particular assembly instruction.
It's really a representational trick.
Does that make some sense?
Any questions about that?
Yeah?
STUDENT: Why is it called phi?
TAO SCHARDL: Why is it called phi?
That's a great question.
I actually don't know why they chose the name phi.
I don't think they had a particular affinity
for the Golden Ratio, but I'm not
sure what the rationale was.
I don't know if anyone else knows.
Yeah?
Google knows all, sort of.
Yeah, so adopt the value 0 from block 6 or 14 from block 8.
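Putting the pieces together, a hedged sketch of that loop control-- the register and block numbers are illustrative, chosen to match the discussion above:

    8:                                      ; the loop block
      %9 = phi i64 [ 0, %6 ], [ %14, %8 ]  ; i is 0 coming from entry block %6, or i + 1 along the back edge
      ...                                   ; the loop body, which uses %9 as i
      %14 = add nuw nsw i64 %9, 1          ; the increment: i + 1
      %15 = icmp eq i64 %14, %1            ; the condition: have we reached the loop bound?
      br i1 %15, label %16, label %8       ; either exit the loop or take the back edge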
So that's all of the basic components
of C translated into LLVM IR.
The last thing I want to leave you
with in this section on LLVM IR is a discussion
of these attributes.
And we already saw one of these attributes before.
It was this NSW thing attached to the add instruction.
In general, these LLVM IR constructs
might be decorated with these extra words and keywords.
And those are the keywords I'm referring to as attributes.
Those attributes convey a variety of information.
So in this case, what we have here is C code
that performs this memory calculation,
which you might have seen from our previous lecture.
And what we see in the corresponding LLVM IR
is that there's some extra stuff tacked onto that load
instruction where you load memory.
One of those pieces of extra information is this align 4.
And what that align 4 attribute says
is it describes the alignment of that read from memory.
And so if subsequent stages of the compiler
can employ that information, if they can optimize
reads that are 4-byte aligned, then this attribute will say,
this is a load that you can go ahead and optimize.
There are a bunch of places where
attributes might come from.
Some of them are derived directly from the source code.
If you write a function that takes
a parameter marked as const, or marked as restrict, then
in the LLVM IR, you might see that the corresponding function
parameter is marked as noalias, because the restrict keyword
said this pointer can never alias, or the const keyword says,
you're only ever going to read from this pointer.
So this pointer is going to be marked read-only.
So in that case, the source code itself-- the C code--
was the source of the information
for those attributes.
There are some other attributes that occur simply
because the compiler is smart, and it
does some clever analysis.
So in this case, the LLVM IR has a load operation
that's 8-byte aligned.
It was really analysis that figured out the alignment
of that load operation.
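As a sketch, those two flavors of attribute might show up like this-- the function and register names are illustrative:

    define void @mm_base(double* noalias %0, ...)  ; noalias derived from a restrict parameter in the C source
    ...
    %7 = load double, double* %6, align 8          ; align 8 derived by the compiler's own analysis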
Good so far?
Cool.
So let's summarize this part of the discussion with what
we've seen about LLVM IR.
LLVM IR is similar to assembly, but a lot simpler
in many, many ways.
All of the computed values are stored in registers.
And, really, when you're reading LLVM IR,
you can think of those registers a lot
like ordinary C variables.
LLVM IR is a little bit funny in that
it adopts a static single assignment paradigm--
this invariant-- where each register name, each variable
is written by, at most, one instruction within the LLVM IR
code.
So if you're ever curious where %14 is defined within this
function, just do a search for where %14 is on the left-hand
side of an equals, and there you go.
We can model a function in LLVM IR
as a control flow graph, whose nodes
correspond to basic blocks--
these blobs of straight line code--
and whose edges denote control flow among those basic blocks.
And compared to C, LLVM IR is pretty similar,
except that all of these operations are explicit.
The types are explicit everywhere.
The integer sizes are all apparent.
You don't have to remember that int really
means a 32-bit integer, and that you
need an int64_t to get a 64-bit integer, or a long, or anything.
It's just i and then a bit width.
There are no implicit operations at the LLVM IR level.
All the typecasts are explicit.
In some sense, LLVM IR is like assembly
if assembly were more like C.
And that's doubly a statement that would not
have made sense 40 minutes ago.
All right, so you've seen how to translate C code into LLVM IR.
There's one last step.
We want to translate the LLVM IR into assembly.
And it turns out that structurally speaking,
LLVM IR is very similar to assembly.
We can, more or less, map each line of LLVM IR
to some sequence of lines in the final assembly code.
But there is some additional complexity.
The compiler isn't done with its work
yet when it's compiling C to LLVM IR to assembly.
There are three main tasks that the compiler still
has to perform in order to generate x86-64.
First, it has to select the actual x86 assembly
instructions that are going to implement these various LLVM IR
operations.
It has to decide which general purpose registers are going
to hold different values and which values
need to be squirreled away into memory,
because it just has no other choice.
And it has to coordinate all of the function calls.
And it's not just the function calls
within this particular source file.
It's also function calls between that source file,
and other source files that you're compiling,
and binary libraries that are just sitting on the system
that the compiler never really gets to touch.
It has to coordinate all of those calls.
That's a bit complicated.
That is going to be the reason for a lot of the remaining
complexity.
And that's what brings our discussion to the Linux
x86-64 calling convention.
This isn't a very fun convention.
Don't worry.
But nevertheless, it's useful.
So to talk about this convention,
let's first take a look at how a program gets laid out
in memory when you run it.
So when a program executes, virtual memory
gets organized into a whole bunch of different chunks,
which are called segments.
There's a segment that corresponds to the stack that's
actually located near the top of virtual memory,
and it grows downwards.
The stack grows down.
Remember this.
There is a heap segment, which grows upwards
from a middle location in memory.
And those two dynamically-allocated segments
live at the top of the virtual address space.
There are then two additional segments--
the bss segment for uninitialized data
and the data segment for initialized data.
And finally, at the bottom of virtual address space,
there's a text segment.
And that just stores the code of the program itself.
Now when you read assembly code directly,
you'll see that the assembly code
contains more than just some labels and some instructions.
In fact, it's decorated with a whole bunch of other stuff.
And these are called assembler directives,
and these directives operate on different sections
of the assembly code.
Some of those directives refer to the various segments
of virtual memory.
And those segment directives are used
to organize the content of the assembly file.
For example, the .text directive identifies some chunk
of the assembly, which is really code and should be located
in the text segment when the program is run.
The .bss directive identifies stuff that lives
in the bss segment.
The .data directive identifies stuff in the data segment,
and so on and so forth.
There are also various storage directives
that will store content of some variety
directly into the current segment-- whatever was last
identified by a segment directive.
So if, at some point, there is a directive "x: .space 20",
that .space directive says,
allocate some amount of memory.
And in this case, it says, allocate 20 bytes of memory.
And we're going to label that location x.
The .long directive says, store a constant long integer value--
in this case, 172--
in this example, at location y.
The .asciz directive similarly stores a string
at that particular location.
So here, we're storing the string 6.172 at location z.
There is an .align directive that aligns
the next content in the assembly file to an 8-byte boundary.
There are also additional directives for the linker to obey,
and those are the scope and linkage directives.
For example, you might see .globl in front of a label.
And that signals to the linker that that particular symbol
should be visible to the other files that the linker touches.
In this case, .globl fib makes fib visible to the other object
files, and that allows those other object files to call
or refer to this fib location.
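Putting several of those directives together, here is a sketch of how they might appear in an assembly file, using the example values just described:

        .globl fib          # make fib visible to other object files
        .text               # the following content is code
    fib:
        ...
        .data               # the following content is initialized data
        .align 8            # align the next content to an 8-byte boundary
    x:  .space 20           # reserve 20 bytes at label x
    y:  .long 172           # store the constant long integer 172 at label y
    z:  .asciz "6.172"      # store the string "6.172" at label z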
Now, let's turn our attention to the segment
at the top, the stack segment.
This segment is used to store data and memory in order
to manage function calls and returns.
That's a nice high-level description, but what exactly
ends up in the stack segment?
Why do we need a stack?
What data will end up going there?
Can anyone tell me?
STUDENT: Local variables in function?
TAO SCHARDL: Local variables in function.
Anything else?
You already answered once.
I may call on you again.
Go ahead.
STUDENT: Function arguments?
TAO SCHARDL: Sorry?
STUDENT: Function arguments?
TAO SCHARDL: Function arguments-- very good.
Anything else?
I thought I saw a hand over here, but--
STUDENT: The return address?
TAO SCHARDL: The return address.
Anything else?
Yeah?
There's one other important thing
that gets stored on stack.
Yeah?
STUDENT: The return value?
TAO SCHARDL: The return value--
actually, that one's interesting.
It might be stored on the stack, but it might not
be stored on the stack.
Good guess, though.
Yeah?
STUDENT: Intermediate results?
TAO SCHARDL: Intermediate results,
in a manner of speaking, yes.
There are more intermediate results
than meets the eye when it comes to assembly
or comparing it to C. But in particular,
by intermediate results, let's say, register state.
There are only so many registers on the machine.
And sometimes, that's not enough.
And so the function may want to squirrel away
some data that's in registers and stash it somewhere
in order to read it back later.
The stack is a very natural place to do it.
That's the dedicated place to do it.
So yeah, that's pretty much all the content
of what ends up on the call stack as the program executes.
Now, here's the thing.
There are a whole bunch of functions in the program.
Some of them may have been defined in the source file
that you're compiling right now.
Some of them might be defined in other source files.
Some of them might be defined in libraries
that were compiled by someone else,
possibly using a different compiler, with different flags,
under different parameters, presumably,
for this architecture-- at least, one hopes.
But those libraries are completely out of your control.
And now, we have this problem.
All those object files might define these functions.
And those functions want to call each other, regardless
of where those functions are necessarily defined.
And so somehow, we need to coordinate all those function
calls and make sure that if one function wants
to use these registers, and this other function
wants to use the same registers, those functions
aren't going to interfere with each other.
Or if they both want to read stack memory,
they're not going to clobber each other's stacks.
So how do we deal with this coordination problem?
At a high level, what's the high-level strategy
we're going to adopt to deal with this coordination problem?
STUDENT: Put the values of the registers on the stack
before you go into the function.
TAO SCHARDL: That will be part of it.
But for the higher level strategy--
so that's a component of this higher level strategy.
Yeah?
Go ahead.
STUDENT: Calling convention?
TAO SCHARDL: Calling convention.
You remembered the title of this section of the talk.
Great.
We're going to make sure that every single function,
regardless of where it's defined, they all abide
by the same calling convention.
So it's a standard that all the functions
will obey in order to make sure they all play nicely together.
So let's unpack the Linux x86-64 calling convention.
Well, not the whole thing, because it's actually
pretty complicated, but at least enough to understand
the basics of what's going on.
So a high level, this calling convention organizes the stack
segment into frames, such that each function instantiation--
each time you call a function--
that instantiation gets a single frame all to itself.
And to manage all those stack frames,
the calling convention is going to use these two pointers-- rbp
and rsp, which you should've seen last time.
rbp, the base pointer, will point to the top
of the current stack frame.
rsp will point to the bottom of the current stack frame.
And remember, the stack grows down.
Now when the code executes call-and-return instructions,
those instructions are going to operate
on the stack, these various stack pointers, as well
as the instruction pointer, rip, in order
to manage the return address of each function.
In particular, when a call instruction gets executed,
in x86, that call instruction will
push the current value of rip onto the stack,
and that will be the return address.
And then the call instruction will jump to its operand.
Its operand being the address of some function in the program
memory, or, at least, one hopes.
Perhaps there was buffer overflow corruption
of some kind, and your program is in dire straits.
But presumably, it's the address of a function.
The return instruction complements the call,
and it's going to undo the operations of that call
instruction.
It'll pop the return address off the stack
and put that into rip.
And that will cause the execution
to return to the caller and resume execution
from the statement right after the original call.
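In assembly terms, that call-and-return dance looks something like this minimal sketch-- AT&T syntax, with an illustrative label:

        call  foo      # push the return address (the current rip) onto the stack, then jump to foo
        ...            # execution resumes here after foo returns
    foo:
        ...
        ret            # pop the return address off the stack into rip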
So that's the high level of how the stack gets managed
as well as the return address.
How about, how do we maintain registers
across all those calls?
Well, there's a bit of a problem.
Because we might have two different functions
that want to use the same registers.
Some of this might be review, by the way, from 6.004.
If you have questions, just let me know.
So we have this problem, where we have two different
functions-- function A, which might call another function
B-- and those two functions might want to use the same registers.
So who's responsible for making sure
that if function B operates on the same registers as A,
that when B is done, A doesn't end up
with corrupted state in its registers?
Well, there are two different strategies
that could be adopted.
One is to have the caller save off the register
state before invoking a call.
But that has some downsides.
The caller might waste work, saying, well,
I have to save all of this register state in case
the function I'm calling wants to use those registers.
If the called function doesn't use those registers,
that was a bunch of wasted work.
So on the other side, you might say, well,
let's just have the callee save all that registered state.
But that could waste work if the callee
is going to save off register state that the caller wasn't
using.
So if the callee says, well, I want
to use all these registers.
I don't know what the calling function used,
so I'm just going to push everything on the stack, that
could be a lot of wasted work.
So what does the x86 calling convention
do, if you had to guess?
Yeah?
STUDENT: [INAUDIBLE]
TAO SCHARDL: That's exactly right.
It does a little bit of both.
It specifies some of the registers
as being callee-saved registers, and the rest of the registers
are caller-saved registers.
And so the caller will be responsible for saving
some stuff.
The callee will be responsible for saving other stuff.
And if either of those functions doesn't
need one of those registers, then it can avoid wasted work.
In x86-64, in this calling convention,
it turns out that the rbx, rbp, and r12 through r15 registers
are all callee saved, and the rest of the registers
are caller saved.
In particular, the C linkage defined
by this calling convention for all the registers
looks something like this.
And that identifies lots of stuff.
It identifies a register for storing the return value,
registers for storing a bunch of the arguments,
caller-save registers, callee-saved registers,
a register just for linking.
I don't expect you to memorize this in 12 seconds.
And I think on any quiz-- well, I
won't say what the course staff will do on quizzes this year.
STUDENT: [INAUDIBLE] everyone.
TAO SCHARDL: Yeah, OK, well, there you go.
So you'll have these slides later.
You can practice memorizing them.
Not shown on this slide,
there are a couple other registers
that are used for passing function arguments and return
values.
And, in particular, whenever you're passing floating point
stuff around, the xmm registers 0 through 7
are used to deal with those floating point values.
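For reference, here is a condensed sketch of that linkage, per the System V ABI:

    rax                         - return value
    rdi, rsi, rdx, rcx, r8, r9  - the first six integer or pointer arguments
    rbx, rbp, r12 through r15   - callee-saved registers
    rsp                         - stack pointer
    xmm0 through xmm7           - floating-point arguments (xmm0 also returns floating-point values)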
Cool.
So we have strategies for maintaining the stack.
We have strategies for maintaining register states.
But we still have the situation where
functions may want to use overlapping
parts of stack memory.
And so we need to coordinate how all those functions are going
to use the stack memory itself.
This is a bit hard to describe.
The cleanest way I know to describe it is just
to work through an example.
So here's the setup.
Let's imagine that we have some function A that
has called a function B. And we're
in the midst of executing function B,
and now, function B is about to call some other function C.
As we mentioned before, B has a frame all to itself.
And that frame contains a whole bunch of stuff.
It contains arguments that A passed to B.
It contains a return address.
It contains a base pointer.
It contains some local variables.
And because B is about to call C,
it's also going to contain some data for arguments
that B will pass to C.
So that's our setup.
We have one function ready to call another.
Let's take a look at how this stack memory
is organized first.
So at the top, we have what's called a linkage block.
And in this linkage block, this is the region
of stack memory, where function B will
access non-register arguments from its caller, function A.
It will access these by indexing off
of the base pointer, rbp, using positive offsets.
Again, the stack grows down.
B will also have a block of stack space
after the linkage block, and return address, and base
pointer.
It will have a region of its frame for local variables,
and it can access those local variables
by indexing off of rbp in the negative direction.
Stack grows down.
If you don't have anything else, stack grows down.
Now B is about to call a function C,
and we want to see how all of this unfolds.
So before calling C, B is going to place non-register arguments
for C on to a reserved linkage block in its own stack memory
below its local variables.
And it will access those by indexing rbp
with negative offsets.
So those will be the arguments that B
passes to its callee, C. And then
what's going to happen?
Then B is going to call C. And as we saw before,
the call instruction saves off the return address
onto the stack, and then it branches
control to the entry point of function C.
When the function C starts, it's going
to execute what's called the function prologue.
And the function prologue consists of a couple of steps.
First, it's going to save off the base
pointer for B's stack frame.
So it'll just squirrel away the value of rbp onto the stack.
Then it's going to set rbp equal to rsp,
because we're now entering a brand new frame
for the invocation of C.
And then C can go ahead and allocate the space
that it needs on the stack.
This will be space that C needs for its own local variables,
as well as space that C will use for any linkage blocks
that it creates for the things that it calls.
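In assembly, that function prologue conventionally looks something like this sketch-- AT&T syntax, with an illustrative frame size N:

        push  %rbp          # save the caller's base pointer on the stack
        mov   %rsp, %rbp    # the new frame begins where the stack currently ends
        sub   $N, %rsp      # allocate stack space for locals and linkage blocks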
Now there is one common optimization
that the compiler will attempt to perform.
If a function never needs to perform stack allocations,
except to handle these function calls--
in other words, if the difference between rbp and rsp
is a compile time constant, then the compiler
might go ahead and just get rid of rbp
and do all of the indexing based off the stack pointer rsp.
And the reason it'll do that is because, if it
could get one more general purpose register out
of rbp-- well, now, rbp is general purpose.
And it has one extra register to use
to do all of its calculations.
Reading from a register takes some time.
Reading from even the L1 cache takes significantly longer--
I think, four times that amount.
And so this is a common optimization
that the compiler will want to perform.
Now, it turns out that there's a lot more
to the calling convention than just
what's shown on these slides.
We're not going to go through that today.
If you'd like to have more details,
there's a nice document-- the System V ABI--
that describes the whole calling convention.
Any questions so far?
All right, so let's wrap all this up with a final case
study, and let's take a look at how all these components fit
together when we're translating a simple C
function to compute Fibonacci numbers
all the way down to assembly.
And as we've been describing this whole time,
we're going to take this in two steps.
Let's describe our starting point, fib.c.
This should be basically no surprise to you at this point.
This is a C function fib, which computes the nth Fibonacci
number in one of the worst computational ways possible,
it turns out.
But it computes the nth Fibonacci number
f of n recursively using the formula f of n
is equal to n when n is either 0 or 1.
Or it computes f of n minus 1 and f of n minus 2
and takes their sum.
This is an exponential time algorithm
to compute Fibonacci numbers.
I would say, don't run this at home,
except, invariably, you'll run this at home.
There are much faster algorithms to compute Fibonacci numbers.
But this is good enough for a didactic example.
We're not really worried about how fast can we
compute fib today.
Now the C code fib.c is even simpler
than the recurrence implies.
We're not even going to bother checking that the input value
n is some non-negative value.
What we're going to do is say, look, if n is less than 2,
go ahead and return that value of n.
Otherwise, do the recursive thing.
We've already seen this code a couple of times.
Everyone good so far?
Any questions on these three lines?
Great.
All right, so let's translate fib.c into fib.ll.
We've seen a lot of these pieces in lectures so far.
And here, we've just rewritten fib.c a little bit
to make drawing all the lines a little bit simpler.
So here, we have the C code for fib.c.
The corresponding LLVM IR looks like this.
And as we could guess from looking at the code for fib.c,
we have this conditional and then
two different things that might occur based on
whether or not n is less than 2.
And so we end up with three basic blocks within the LLVM
IR.
The first basic block checks whether n is less than 2
and then branches based on that result.
And we've seen how all that works previously.
If n happens to be less than 2, then the consequent--
the true case of that branch--
ends up showing up at the end.
And all it does is it returns the input value,
which is stored in register 0.
Otherwise, it's going to do some straight line
code to compute fib of n minus 1 and fib of n minus 2.
It will take those return values, add them together,
return that result. That's the nth Fibonacci number.
So that gets us from C code to LLVM IR.
Questions about that?
All right, fib n minus 1, fib n minus 2, add them, return it.
We're good.
OK, so one last step.
We want to compile LLVM IR all the way down to assembly.
As I alluded to before, roughly speaking,
the structure of the LLVM IR resembles the structure
of the assembly code.
There's just extra stuff in the assembly code.
And so we're going to translate the LLVM IR, more or less,
line by line into the assembly code
and see where that extra stuff shows up.
So at the beginning, we have a function.
We were defining a function fib.
And in the assembly code, we make
sure that fib is a globally accessible function using
some assembler directives--the .globl fib directive.
We do an alignment to make sure that function
lies in a nice location in the instruction memory,
and then we declare the symbol fib, which just defines where
this function lives in memory.
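Those first few lines look something like the
following--the exact alignment value will vary:

        .globl  fib      # make fib visible outside this file
        .p2align 4       # align fib's entry point in instruction memory
    fib:                 # the symbol marking where fib lives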
All right, let's take a look at this assembly.
The next thing that we see here are
these two instructions--a pushq of rbp
and a movq of rsp into rbp.
Who can tell me what these do?
Yes?
STUDENT: Push the base [INAUDIBLE] on the stack,
then [INAUDIBLE].
TAO SCHARDL: Cool.
Does that sound like a familiar thing we described earlier
in this lecture?
STUDENT: The calling convention?
TAO SCHARDL: Yep, it's part of the calling convention.
This is part of the function prologue.
Save off rbp, and then set rbp equal to rsp.
So we already have a couple extra instructions
that weren't in the LLVM IR, but must be in the assembly
in order to coordinate everyone.
OK, so now, we have these two instructions.
We're now going to push a couple more registers onto the stack.
So why does the assembly do this?
Any guesses?
Yeah?
STUDENT: Callee-saved registers?
TAO SCHARDL: Callee-saved registers--
yes, callee-saved registers.
The fib routine, we're guessing, will
want to use r14 and rbx during this calculation.
And so if there are interesting values in those registers,
save them off onto the stack.
Presumably, we'll restore them later.
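So the pushes--and, jumping ahead, the matching pops in
reverse order--look like this:

        pushq %r14    # save callee-saved registers fib will use
        pushq %rbx
        ...           # fib's body
        popq  %rbx    # restore them, in reverse order, before returning
        popq  %r14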
Then we have this move instruction for rdi into rbx.
This requires a little bit more arcane knowledge,
but any guesses as to what this is for?
STUDENT: rdi is probably the argument to the function.
TAO SCHARDL: rdi is the argument to the function.
Exactly.
That's the arcane knowledge.
So this is implicit in the assembly, which
is why you have to memorize that huge chart of GPR
C-linkage nonsense.
But all this operation does is it takes whatever
that argument was, and it squirrels it away into the rbx
register for some purpose that we'll find out about soon.
Then we have this instruction, and this corresponds
to the highlighted instruction on the left,
in case that gives any hints.
What does this instruction do?
STUDENT: [INAUDIBLE].
TAO SCHARDL: Sorry.
STUDENT: It calculates whether n is small [INAUDIBLE].
TAO SCHARDL: Correct.
It evaluates the predicate.
It's just going to do a comparison
between the value of n and the literal value 2.
So if you recall from last lecture, the result of a comparison
will set some bits in the implicit EFLAGS register,
or RFLAGS register.
And based on the setting of those bits,
the various conditional jumps that occur next in the code
will have varying behavior.
So in case the comparison comes out false--if n is,
in fact, greater than or equal to 2--
then the next instruction, jge, will jump to the label
LBB0_1.
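Reconstructed, that pair of instructions reads
something like:

        cmpq $2, %rdi    # compare n against 2, setting bits in RFLAGS
        jge  LBB0_1      # if n >= 2 (signed), jump to the false side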
You can tell already that reading assembly is super-fun.
Now that's a conditional jump.
And it's possible that the setting of bits in RFLAGS
doesn't evaluate true for that condition code.
And so it's possible that the code will just fall through
past this jge instruction and, instead, execute
these operations.
And these operations correspond to the true side of the LLVM IR
branch operation.
When n is less than 2, this will move n into rax,
and then jump to the label LBB0_3.
Any guesses as to why it moves n into rax?
Yeah?
STUDENT: That's the return value.
TAO SCHARDL: That's a return value-- exactly.
If it can return a value through registers,
it will return it through rax.
Very good.
So now, we see this label LBB0_1.
That's the label, as we saw before,
for the false side of the LLVM branch.
And the first thing in that label is this operation--
leaq of minus 1 of rbx into rdi.
Any guesses as to what that's for?
The corresponding LLVM IR is highlighted on the left,
by the way.
The lea instruction means load effective address.
All lea does is an address calculation.
But something that compilers really like to do
is exploit the lea instruction to do simple integer arithmetic
as long as that integer arithmetic fits with the things
that lea can actually compute.
And so all this instruction is doing
is adding negative 1 to rbx.
And rbx, as we recall, stored the input value of n.
And it will store the result into rdi.
That's all that this instruction does.
So it computes n minus 1 and stores it into rdi.
How about this instruction?
This one should be easier.
STUDENT: For the previous one, how did you get [INAUDIBLE]?
I'm familiar with [INAUDIBLE] because [INAUDIBLE].
But is there no add immediate instruction in x86?
TAO SCHARDL: Is there no add immediate instruction?
So you can do an add instruction in x86
and specify an immediate value.
The advantage of this instruction
is that you can specify a different destination operand.
That's why compilers like to use it.
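A quick sketch of the difference:

        addq $-1, %rdi         # add with an immediate: rdi = rdi - 1, clobbering rdi
        leaq -1(%rbx), %rdi    # lea: rdi = rbx - 1, and rbx is left untouched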
More arcane knowledge.
I don't blame you if this kind of thing
turns you off from reading x86.
It certainly turns me off from reading x86.
So this instruction should be a little bit easier.
Any guess as to what it does?
Feel free to shout it out, because we're
running a little short on time.
STUDENT: Calls a function.
TAO SCHARDL: Calls a function.
What function?
STUDENT: Call fib.
TAO SCHARDL: Call fib, exactly.
Great.
Then we have this move operation,
which moves rax into r14.
Any guess as to why we do this?
Say it.
STUDENT: Get the result of the call.
TAO SCHARDL: Get the result of the call.
So rax is going to store the return value of that call.
And we're just going to squirrel it away into r14.
Question?
STUDENT: [INAUDIBLE]
TAO SCHARDL: Sorry.
STUDENT: It stores [INAUDIBLE]?
TAO SCHARDL: It'll actually store the whole return value
from the previous call.
STUDENT: [INAUDIBLE]
TAO SCHARDL: It's part of that result. This
will be a component in computing the return
value for this call of fib.
You're exactly right.
But we need to save off this result,
because we're going to do, as we see, another call to fib.
And that's going to clobber rax.
Make sense?
Cool.
So rax stores the result of the function.
Save it into r14.
Great.
Since we're running short of time,
anyone want to tell me really quickly what
these instructions do?
Just a wild guess if you had to.
STUDENT: n minus 2.
TAO SCHARDL: n minus 2.
Compute n minus 2 by this addition operation.
Stash it into rdi.
And then you call fib on n minus 2.
And that will return the result into rax, as we saw before.
So now, we do this operation.
Add r14 into rax.
And this does what?
STUDENT: Adds our last function return to what
we stored off from this one.
TAO SCHARDL: Exactly.
So rax stores the result of the last function call.
We add r14 into it--r14 being where we stashed
the result of fib of n minus 1.
Cool.
Then we have a label for the true side of the branch.
This is the last pop quiz question I'll ask.
Pop quiz-- God, I didn't even intend that one.
Why do we do these pop operations?
In the front.
STUDENT: To restore the registers before exiting the stack frame?
TAO SCHARDL: Restore the registers
before exiting the stack frame-- exactly.
In calling convention terms, that's
called the function epilogue.
And then finally, we return.
So that is how we get from C to assembly.
This is just a summary slide of everything we covered today.
We took the trip from C to assembly via LLVM IR.
And we saw how we can represent things in a control flow graph
as basic blocks connected by control flow edges.
And then there's additional complexity
when you get to the actual assembly, mostly to deal
with the calling convention.
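To put the whole case study in one place, here is a
reconstruction of the fib assembly from the walkthrough.
The instructions and registers are the ones we discussed,
but any real compiler's output will differ in the details:

            .globl  fib
            .p2align 4
    fib:                            # n arrives in rdi
            pushq   %rbp            # prologue: save the caller's base pointer
            movq    %rsp, %rbp
            pushq   %r14            # save the callee-saved registers we'll use
            pushq   %rbx
            movq    %rdi, %rbx      # squirrel n away in rbx
            cmpq    $2, %rdi        # is n less than 2?
            jge     LBB0_1          # no: go do the recursive case
            movq    %rbx, %rax      # yes: return n itself...
            jmp     LBB0_3          # ...via the epilogue
    LBB0_1:
            leaq    -1(%rbx), %rdi  # compute n - 1
            callq   fib             # rax = fib(n - 1)
            movq    %rax, %r14      # stash it before the next call clobbers rax
            leaq    -2(%rbx), %rdi  # compute n - 2
            callq   fib             # rax = fib(n - 2)
            addq    %r14, %rax      # rax = fib(n - 1) + fib(n - 2)
    LBB0_3:
            popq    %rbx            # epilogue: restore callee-saved registers
            popq    %r14
            popq    %rbp            # restore the caller's base pointer
            retq                    # return with the result in rax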
That's all I have for you today.
Thanks for your time.