In a previous video, I've talked about linear systems of equations, and I sort of brushed
aside the discussion of actually computing solutions to these systems.
And while it's true that number-crunching is something we typically leave to the computers,
digging into some of these computational methods is a good litmus test for whether or not you
actually understand what's going on, since this is really where the rubber meets the
road.
Here I want to describe the geometry behind a certain method for computing solutions to
these systems, known as Cramer's rule.
The relevant background needed here is an understanding of determinants, dot products,
and of linear systems of equations, so be sure to watch the relevant videos on those
topics if you're unfamiliar or rusty.
But first!
I should say up front that Cramer's rule is not the best way to compute solutions
to linear systems of equations.
Gaussian elimination, for example, will always be faster.
So why learn it?
Think of this as a sort of cultural excursion; it's a helpful exercise in deepening your
knowledge of the theory of these systems.
Wrapping your mind around this concept will help consolidate ideas from linear algebra,
like the determinant and linear systems, by seeing how they relate to each other.
Also, from a purely artistic standpoint, the ultimate result is just really pretty to think
about, much more so than Gaussian elimination.
Alright, so the setup here will be some linear system of equations, say with two unknowns,
x and y, and two equations.
In principle, everything we're talking about will work for systems with a larger number of
unknowns, and the same number of equations.
But for simplicity, a smaller example is nicer to hold in our heads.
So as I talked about in a previous video, you can think of this setup geometrically
as a certain known matrix transforming an unknown vector, [x; y], where you know what
the output is going to be, in this case [-4; -2].
Remember, the columns of this matrix tell you how the matrix acts as a transform, each
one telling you where the basis vectors of the input space land.
So this is a sort of puzzle: what input [x; y] is going to give you this
output [-4; -2]?
Remember, the type of answer you get here can depend on
whether or not the transformation squishes all of space into a lower dimension.
That is, if it has zero determinant.
In that case, either none of the inputs land on our given output or there are a whole bunch
of inputs landing on that output.
But for this video we'll limit our view to the case of a non-zero determinant, meaning
the output of this transformation still spans the full n-dimensional space it started in;
every input lands on one and only one output and every output has one and only one input.
One way to think about our puzzle is that we know the given output vector is some linear
combination of the columns of the matrix; x*(the vector where i-hat lands) + y*(the
vector where j-hat lands), but we wish to compute what exactly x and y are.
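
To make the "linear combination of the columns" reading concrete, here is a minimal Python sketch. The transcript never spells out the matrix entries, so the matrix and the input (3, 2) below are assumptions, picked to agree with the arithmetic quoted in the sanity check later on; they are not the [-4; -2] example.

```python
# Hypothetical 2x2 example: these entries are an assumption, not read off the video.
col_i = (2, 0)    # where i-hat lands (first column of the matrix)
col_j = (-1, 1)   # where j-hat lands (second column of the matrix)

def apply(x, y):
    """Return x*(where i-hat lands) + y*(where j-hat lands)."""
    return (x * col_i[0] + y * col_j[0],
            x * col_i[1] + y * col_j[1])

# The "mystery" input is revealed here only so we can see the output it produces.
output = apply(3, 2)
print(output)  # (4, 2)
```

Solving the system means running this backwards: given only `col_i`, `col_j` and `output`, recover the 3 and the 2.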
As a first pass, let me show an idea that is wrong, but in the right direction.
The x-coordinate of this mystery input vector is what you get by taking its dot product
with the first basis vector, [1; 0].
Likewise, the y-coordinate is what you get by dotting it with the second basis vector,
[0; 1].
So maybe you hope that after the transformation, the dot products of the transformed version
of the mystery vector with the transformed versions of the basis vectors will also be
these coordinates x and y.
That'd be fantastic because we know the transformed versions of each of these vectors.
There's just one problem with this: it's not at all true!
For most linear transformations, the dot product before and after the transformation will be
very different.
For example, you could have two vectors generally pointing in the same direction, with a positive
dot product, which get pulled away from each other during the transformation, in such a
way that they then have a negative dot product.
Likewise, if things start off perpendicular, with dot product zero, like the two basis
vectors, there's no guarantee that they will stay perpendicular after the transformation,
preserving that zero dot product.
In the example we were looking at, dot products certainly aren't preserved.
They tend to get bigger since most vectors are getting stretched.
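
Here is a small Python sketch of that failure. The matrix and the input vector are hypothetical, chosen only for illustration.

```python
def dot(u, v):
    """Dot product of two 2D vectors."""
    return u[0] * v[0] + u[1] * v[1]

def transform(matrix, v):
    """Apply a 2x2 matrix, given as a tuple of rows, to a 2D vector."""
    (a, b), (c, d) = matrix
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

A = ((2, -1), (0, 1))          # hypothetical matrix (an assumption)
i_hat, j_hat = (1, 0), (0, 1)
v = (3, 2)                     # hypothetical input vector

# The basis vectors start perpendicular, but do not stay that way.
before = dot(i_hat, j_hat)                             # 0
after = dot(transform(A, i_hat), transform(A, j_hat))  # (2, 0) . (-1, 1) = -2

# The hoped-for shortcut gives the wrong x-coordinate.
true_x = dot(v, i_hat)                                 # 3
naive_x = dot(transform(A, v), transform(A, i_hat))    # (4, 2) . (2, 0) = 8
```

So the zero dot product becomes -2, and the naive guess for x comes out as 8 rather than 3.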
In fact, transformations which do preserve dot products are special enough to have their
own name: orthonormal transformations (more commonly called orthogonal transformations).
These are the ones which leave all the basis vectors perpendicular to each other with unit
lengths.
You often think of these as rotation matrices.
They correspond to rigid motion, with no stretching, squishing or morphing.
Solving a linear system with an orthonormal matrix is very easy: Since dot products are
preserved, taking the dot product between the output vector and all the columns of your
matrix will be the same as taking the dot products between the input vector and all
the basis vectors, which is the same as finding the coordinates of the input vector.
So, in that very special case, x would be the dot product of the first column with the
output vector, and y would be the dot product of the second column with the output vector.
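
As a quick illustration of that special case, here is a sketch using a quarter-turn rotation. The matrix and the input (3, 2) are assumptions, picked because the entries are exact integers.

```python
def dot(u, v):
    """Dot product of two 2D vectors."""
    return u[0] * v[0] + u[1] * v[1]

# Quarter-turn counterclockwise rotation, written as its two columns.
col_1 = (0, 1)    # where i-hat lands
col_2 = (-1, 0)   # where j-hat lands

# Build the output for a known input (3, 2) so the recovery can be checked:
# output = 3*col_1 + 2*col_2
output = (3 * col_1[0] + 2 * col_2[0],
          3 * col_1[1] + 2 * col_2[1])   # (-2, 3)

# Because dot products are preserved, dotting the output with each column
# recovers the coordinates of the input.
x = dot(col_1, output)   # 3
y = dot(col_2, output)   # 2
```

This only works because the columns are perpendicular unit vectors; with the stretching matrices from earlier, the same recipe would give the wrong numbers.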
Now, even though this idea breaks down for most linear systems, it points us in the direction
of something to look for: Is there an alternate geometric understanding for the coordinates
of our input vector which remains unchanged after the transformation?
If your mind has been mulling over determinants, you might think of this clever idea: Take
the parallelogram defined by the first basis vector, i-hat, and the mystery input vector
[x; y].
The area of this parallelogram is its base, 1, times the height perpendicular to that
base, which is the y-coordinate of our input vector.
So, the area of this parallelogram is sort of a screwy roundabout way to describe the
vector's y-coordinate; it's a wacky way to talk about coordinates, but run with me.
Actually, to be more accurate, you should think of the signed area of this parallelogram,
in the sense described by the determinant video.
That way, a vector with negative y-coordinate would correspond to a negative area for this
parallelogram.
Symmetrically, if you look at the parallelogram spanned by the vector
and the second basis vector, j-hat, its area will be the x-coordinate of the vector.
Again, it's a strange way to represent the x-coordinate, but you'll see what it buys
us in a moment.
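
In code, the "coordinates as signed areas" idea might be sketched like this. The helper name `det2` and the sample vectors are assumptions made for illustration.

```python
def det2(u, v):
    """Signed area of the parallelogram spanned by u, then v.

    This is the determinant of the 2x2 matrix whose columns are u and v,
    so swapping the two vectors flips the sign.
    """
    return u[0] * v[1] - u[1] * v[0]

i_hat, j_hat = (1, 0), (0, 1)
v = (3, 2)    # hypothetical input vector
w = (3, -2)   # same vector reflected below the x-axis

y_as_area = det2(i_hat, v)    # 1*2 - 0*3 = 2, the y-coordinate of v
x_as_area = det2(v, j_hat)    # 3*1 - 2*0 = 3, the x-coordinate of v
negative_y = det2(i_hat, w)   # -2: signed area tracks the negative coordinate
```

Notice the ordering: the mystery vector takes the slot of the basis vector whose coordinate you want, and the other basis vector stays where it was.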
Here's what this would look like in three dimensions: Ordinarily the way you might think of one
of a vector's coordinates, say its z-coordinate, would be to take its dot product with the
third standard basis vector, k-hat.
But instead, consider the parallelepiped it creates with the other two basis vectors,
i-hat and j-hat.
If you think of the square with area 1 spanned by i-hat and j-hat as the base of this guy,
its volume is the same as its height, which is the third coordinate of our vector.
Likewise, the wacky way to think about any other coordinate of this vector is to form
the parallelepiped between this vector and all the basis vectors other than the one you're
looking for, and get its volume.
Or, rather, we should talk about the signed volume of these parallelepipeds, in the sense
described in the determinant video, where the order in which you list the three vectors
matters and you're using the right-hand rule.
That way negative coordinates still make sense.
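
A rough Python sketch of the three-dimensional version follows. The helper names and the sample vector are assumptions; the signed volume is computed as a triple product, which equals the determinant of the matrix whose columns are the three vectors in that order.

```python
def cross(b, c):
    """Cross product of two 3D vectors."""
    return (b[1] * c[2] - b[2] * c[1],
            b[2] * c[0] - b[0] * c[2],
            b[0] * c[1] - b[1] * c[0])

def det3(a, b, c):
    """Signed volume of the parallelepiped spanned by a, b, c (in that order)."""
    bc = cross(b, c)
    return a[0] * bc[0] + a[1] * bc[1] + a[2] * bc[2]

i_hat, j_hat, k_hat = (1, 0, 0), (0, 1, 0), (0, 0, 1)
v = (5, -7, 4)   # hypothetical vector with one negative coordinate

# Put v in the slot of the basis vector whose coordinate you want,
# and leave the other two basis vectors in their usual order.
x = det3(v, j_hat, k_hat)   # 5
y = det3(i_hat, v, k_hat)   # -7
z = det3(i_hat, j_hat, v)   # 4
```

Keeping the order fixed is what makes the sign come out right, so the negative y-coordinate shows up as a negative volume.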
Okay, so why think of coordinates as areas and volumes like this?
As you apply some matrix transformation, the areas of the parallelograms don't stay the
same, they may get scaled up or down.
But(!), and this is a key idea of determinants, all these areas get scaled by the same amount.
Namely, the determinant of our transformation matrix.
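
You can check this scaling claim numerically. In the sketch below, the matrix and the two pairs of vectors are hypothetical, chosen only to illustrate the point.

```python
def det2(u, v):
    """Signed area of the parallelogram spanned by u, then v."""
    return u[0] * v[1] - u[1] * v[0]

def transform(matrix, v):
    """Apply a 2x2 matrix, given as a tuple of rows, to a 2D vector."""
    (a, b), (c, d) = matrix
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

A = ((2, -1), (0, 1))                       # hypothetical matrix
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # 2*1 - (-1)*0 = 2

pairs = [((1, 0), (3, 2)), ((1, 1), (-2, 5))]
for u, v in pairs:
    area_before = det2(u, v)
    area_after = det2(transform(A, u), transform(A, v))
    # Every parallelogram's signed area is scaled by the same factor, det_A.
    assert area_after == det_A * area_before
```

The first pair goes from area 2 to area 4, the second from 7 to 14; both are scaled by the determinant, 2.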
For example, if you look at the parallelogram spanned by the vector where your first basis
vector lands, which is the first column of the matrix, and the transformed version of
[x; y], what is its area?
Well, this is the transformed version of that parallelogram we were looking at earlier,
whose area was the y-coordinate of the mystery input vector.
So its area will be the determinant of the transformation multiplied by that value.
So, the y-coordinate of our mystery input vector is the area of this parallelogram,
spanned by the first column of the matrix and the output vector, divided by the determinant
of the full transformation.
And how do you get this area?
Well, we know the coordinates for where the mystery input vector lands, that's the whole
point of a linear system of equations.
So, create a matrix whose first column is the same as that of our matrix, and whose
second column is the output vector, and take its determinant.
So look at that; just using data from the output of the transformation, namely the columns
of the matrix and the coordinates of our output vector, we can recover the y-coordinate of
our mystery input vector.
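
Written out as one chain of equalities, the argument for the y-coordinate looks roughly like this. The symbols are assumptions introduced here, not notation from the video: A is the matrix, a_1 its first column, v = [x; y] the mystery input, and b = Av the known output, with det(A) nonzero.

```latex
\det\begin{bmatrix} \mathbf{a}_1 & \mathbf{b} \end{bmatrix}
  = \det\begin{bmatrix} A\hat{\imath} & A\mathbf{v} \end{bmatrix}
  = \det(A)\,\det\begin{bmatrix} \hat{\imath} & \mathbf{v} \end{bmatrix}
  = \det(A)\, y
\quad\Longrightarrow\quad
y = \frac{\det\begin{bmatrix} \mathbf{a}_1 & \mathbf{b} \end{bmatrix}}{\det(A)}
```

The middle step is the "all areas get scaled by the determinant" fact, and the last step uses the earlier observation that the parallelogram spanned by i-hat and v has signed area y.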
Likewise, the same idea can get you the x-coordinate.
Look at that parallelogram we defined earlier which encodes the x-coordinate of the mystery
input vector, spanned by the input vector and j-hat.
The transformed version of this guy is spanned by the output vector and the second column
of the matrix, and its area will have been multiplied by the determinant of the matrix.
So the x-coordinate of our mystery input vector is this area divided by the determinant of
the transformation.
Symmetric to what we did before, you can compute the area of that output parallelogram by creating
a new matrix whose first column is the output vector, and whose second column is the same
as that of the original matrix.
So again, just using data from the output space, the numbers we see in our original
linear system, we can recover the x-coordinate of our mystery input vector.
This formula for finding the solutions to a linear system of equations is known as Cramer's
rule.
Just to sanity check ourselves, plug in the numbers here.
The determinant of that top altered matrix is 4+2, which is 6, and the bottom determinant
is 2, so the x-coordinate should be 3.
And indeed, looking back at that input vector we started with, its x-coordinate is 3.
Likewise, Cramer's rule suggests the y-coordinate should be 4/2, or 2, and that is indeed the
y-coordinate of the input vector we started with here.
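
The same sanity check can be run in a few lines of Python. The transcript does not state the matrix entries, so the matrix [[2, -1], [0, 1]] and output [4, 2] below are an assumption, chosen because they reproduce the arithmetic quoted above (4+2 = 6, a determinant of 2, and 4/2). The function name `cramer_2x2` is likewise just illustrative.

```python
def det2(a, b, c, d):
    """Determinant of the 2x2 matrix [[a, b], [c, d]]."""
    return a * d - b * c

def cramer_2x2(matrix, output):
    """Solve a 2x2 linear system with Cramer's rule (nonzero determinant only)."""
    (a, b), (c, d) = matrix
    p, q = output
    det = det2(a, b, c, d)
    if det == 0:
        raise ValueError("Cramer's rule needs a non-zero determinant")
    x = det2(p, b, q, d) / det   # first column replaced by the output vector
    y = det2(a, p, c, q) / det   # second column replaced by the output vector
    return x, y

# Assumed example, consistent with the numbers quoted in the transcript.
x, y = cramer_2x2(((2, -1), (0, 1)), (4, 2))
print(x, y)  # 3.0 2.0
```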
The case with three dimensions is similar, and I highly recommend you pause to think
it through yourself.
Here, I'll give you a little momentum.
We have this known transformation, given by a 3x3 matrix, and a known output vector, given
by the right side of our linear system, and we want to know what input vector lands on
this output vector.
If you think of, say, the z-coordinate of the input vector as the volume of this parallelepiped
spanned by i-hat, j-hat, and the mystery input vector, what happens to the volume of this
parallelepiped after the transformation?
How can you compute that new volume?
Really, pause and take a moment to think through the details of generalizing this to higher
dimensions; finding an expression for each coordinate of the solution to larger linear
systems.
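
Once you've worked it out yourself, one way to check your answer is a short sketch like the one below. Everything in it is an assumption for illustration: the function names, the small 3x3 example, and the choice of cofactor expansion for the determinant, which is fine for tiny matrices but far too slow for large ones.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # Minor: drop the first row and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def cramer(matrix, output):
    """Solve an n x n system: coordinate k is det(column k replaced) / det(matrix)."""
    d = det(matrix)
    if d == 0:
        raise ValueError("Cramer's rule needs a non-zero determinant")
    solution = []
    for k in range(len(matrix)):
        # Replace column k of the matrix with the output vector.
        altered = [row[:k] + [output[i]] + row[k + 1:]
                   for i, row in enumerate(matrix)]
        solution.append(det(altered) / d)
    return solution

# Hypothetical 3x3 system whose solution is (1, 2, 3).
A = [[1, 2, 0],
     [0, 1, 1],
     [1, 0, 1]]
b = [5, 5, 4]
print(cramer(A, b))  # [1.0, 2.0, 3.0]
```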
Thinking through more general cases and convincing yourself that it works is where all the learning
will happen, much more so than listening to some dude on YouTube walk through the reasoning
again.