[MUSIC PLAYING]
SPEAKER: So today we're going to have our first of a few discussions
about cybersecurity, and later on we're going
to talk a little bit about cybersecurity in the context of the internet
and some of the challenges that it brings up there.
But today we're going to focus mostly on cybersecurity issues related
to your machine, your computer without necessarily
being connected to the internet.
Before we do, we need to understand a little bit more
about our machine's infrastructure, its hardware.
And the biggest question to ask at the outset
is, when we talk about the system's memory, what do we mean by that?
That term kind of gets thrown around and it means a couple of different things,
potentially.
It might mean your system's RAM or random access
memory, which is a rough measure of how many things your computer
can work on at once.
And we can also talk about hard drive space
as another example of system memory.
Hard drive space is usually just free storage, basically.
How much room do we have to literally store files on our machine?
How much memory does your computer have?
Maybe you do or maybe you don't know.
If you take a look at your system information
or look up the computer that you bought on the internet,
you might find that if we're quoting memory in terms of RAM,
that your device might have as low as 512 megabytes of RAM, which
is about half of a gigabyte.
And that's not very much. Most machines have much more than that now
unless you have a low powered Chromebook,
for example, that you use for travel.
Memory on the RAM scale might go as high as 32 gigabytes of RAM,
which is quite a bit more than that.
That's generally for really high end computers.
Computers, in particular, that process a lot of graphics.
So sometimes computers that are specifically dedicated for gaming
might have that much RAM.
But typically the range is somewhere between four and 16 gigabytes nowadays.
When we're talking about hard drive space, that number is quite a bit
bigger.
So the typical hard drive nowadays might be as low as 128 gigabytes,
if the drive is a solid state drive, versus a hard disk drive.
We won't go into too much detail about the distinction between those two
things, other than right now to say those are just two
different ways to store data long term.
So that might be the low end.
The high end is probably somewhere around two terabytes of information.
One terabyte is 1000 gigabytes, give or take.
So two terabytes would be about 2000, give or take, gigabytes.
So quite a bit.
Maybe even as high as four terabytes.
That's quite a bit of storage.
That's enough to store several hundred HD, high quality films.
But there's much more to memory than just RAM and hard disk space.
There's actually kind of a hierarchy of memory that exists within your machine.
Most of these numbers, though, aren't usually quoted
in the specs of a device.
So there's RAM, random access memory, and then
there's a series of caches, each of which gets successively smaller.
So they're going to be quite a bit smaller than the four gigs,
say, of RAM that your device has.
But they're also a little bit faster, and the reason these things get faster,
these caches get faster, is they are getting closer and closer
to the computer's processor, which is really the only part of the device that
is able to manipulate information.
It's the only part that can process information.
So the memory that we're feeding to that processor
needs to get faster and faster, such that it
can continue to swap things in and out.
So we have the RAM, maybe an L3 cache, a Level 3 cache, Level 2, Level 1,
and then finally CPU memory, which is the processor memory itself.
Plus some small bits of memory called registers,
which serve as the final sort of pass of information from RAM
or this hierarchy of memory into the CPU.
But again, every file on your machine lives somewhere permanently
on a disk drive.
And there are, again, two different kinds of disk drives.
We have solid state drives and hard disk drives.
We should treat them as effectively identical
for purposes of our discussion today.
Solid state drives tend to behave a bit differently than hard disk
drives, and they tend to be less exposed to some of the vulnerabilities
that hard disk drives present, which we're
going to talk about a little bit later in today's lecture.
But in general, when we talk about hard disks or storage space
for the rest of today's lecture, we're going
to be mostly focusing on hard disk drives.
They're also just much more prevalent still.
Solid state drives are coming into their own and appearing in devices
more and more frequently, but hard disk drives
are still far and away more prevalent within devices
that exist now.
They are just storage space, though. We can't do anything
with data that is stored on disk.
We have to first move it to RAM and then have
it sort of go up and down that chain of RAM, the different caches to the CPU,
in order to actually manipulate the data.
Once we're done manipulating it, and maybe we're
turning our computer off for the evening,
then all of the data that is in RAM will be stored back into the hard disk space
so that we're able to access it at another time.
One thing to keep in mind as we begin this discussion of memory,
though, is that memory is really just an array.
And we've talked about arrays already, where each cell of that array
basically is one byte wide.
And recall that one byte is eight bits.
We may have anywhere between 512 megabytes of memory,
so about 512 million of those little one byte
sized cells, maybe as high as four, eight, 16, and so on gigabytes.
And we have quite a few of those items in our array.
But it really is just an array, which means
we can jump to different addresses.
It has the same properties as any other random access
array that we've already discussed.
Different types of data take up different amounts of memory
on our systems.
So let's think about a very low level programming language like C,
and this is just an example.
Different programming languages may store different types of data
using different amounts of space.
But if we look to just the most base level of data
and think about the smallest individual pieces into which we can break it,
we may be able to store an integer, for example, in four bytes,
which means we have exactly 32 bits worth of space to store an integer.
Characters will take up one byte, so we have only eight bits worth of memory
required to store a single character.
So capital or lowercase letters, digits, punctuation marks, and so on.
Not a huge variety of options there.
Floats, you may recall, are real numbers,
numbers that have decimal points in them.
Doubles are, as well.
They're double precision floating point values.
Floats and doubles take up four and eight bytes, respectively.
So basically the idea here is different types of data
will take up different amounts of space and then
we eventually can construct these things into pixels, and images, and films,
each of which will also take up different amounts of space and memory
if we are manipulating or working with that data.
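To make those sizes concrete, here is a minimal sketch in Python rather than C. It assumes the "standard" sizes that Python's struct module uses, which happen to match the sizes quoted above. A real C compiler on an unusual machine is allowed to pick different sizes, so treat this as an illustration and not a guarantee. The name SIZES is just a label chosen for this example.

```python
import struct

# Standard (platform-independent) sizes from Python's struct module.
# They match the sizes quoted in the lecture for a typical C setup,
# but the C language itself permits other sizes on other machines.
SIZES = {
    "char": struct.calcsize("<c"),
    "int": struct.calcsize("<i"),
    "float": struct.calcsize("<f"),
    "double": struct.calcsize("<d"),
}

for name, size in SIZES.items():
    # Each byte is eight bits, so multiply to get the bit count.
    print(f"{name}: {size} byte(s) = {size * 8} bits")
```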
So again, let's think of memory as a big array of individual byte-sized cells.
Because it is an array, that means we have random accessibility.
We can say, I want to go to memory address x and see what is there.
I want to go to memory address y and change what is there.
We have the ability to do that.
We don't have to iterate through step by step by step in order to make changes.
If we did, the processor would be quite a bit slower, having to perform
what we might term a linear search as we try to iterate through memory
to find the one byte we're looking for.
It's very helpful to be able to jump to a particular byte.
And that means that every location in memory must have an address.
We must have a way to refer to that individual byte
in order to randomly access it.
We can't just look at this grid of cells and say, I want to go to this one
and sort of, you know, imagine a particular spot.
We need to say, I want to go to exactly this memory address.
OK?
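One way to picture this is with a small sketch in Python. It assumes a pretend memory of only 16 one-byte cells, and the names memory and linear_search are made up for this illustration. The first two operations jump straight to an address, while the function shows the slower, step by step alternative.

```python
# A tiny stand-in for RAM: 16 one-byte cells, all initially zero.
memory = bytearray(16)

# Random access: jump straight to an address and change what is there.
memory[0x0A] = 42

# Random access: jump straight to an address and see what is there.
value = memory[0x0A]


def linear_search(mem, target):
    """Walk through memory cell by cell and return the first address
    whose contents equal target, or None if no cell matches."""
    for address, cell in enumerate(mem):
        if cell == target:
            return address
    return None


print(value)
print(linear_search(memory, 42))
```

The direct lookup takes one step no matter how large the array is, while the search may have to visit every cell.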
So the fact that memory cells have an address
is what comes into play when you think about this idea
of a 32-bit system or a 64-bit system, and this
may be a term that you've heard before.
It refers to the ability to process an address.
So for example, a 32-bit computer, a 32-bit system,
can process memory addresses up to 32 bits in length.
Which means it understands memory address zero through memory address
right up to four billion, a little over four billion.
But it doesn't understand memory past that.
Now interestingly, this doesn't mean that a 32-bit system
is limited to four gigabytes of RAM.
There are some software tricks that we can pull using something called virtual
memory, which we're not going to get into in any more depth than to refer
to it as virtual memory today, that allow you to use more than four
gigabytes of RAM on a 32-bit system by doing-- sort of, you know,
pretending that things live somewhere where they don't.
But when you talk about a 64-bit system, that
means we have many more memory cells that we
can refer to without running into our sort of artificial limit of how high we
can count.
Now granted, there are no memory banks out there
that cover every memory address a 64-bit number can name.
That's somewhere around 18 quintillion addresses.
It's a very, very large number and we don't yet
have the storage capacity to store that much data on our machines.
But theoretically, it is possible that with a 64-bit system
we could have very, very large amounts of RAM and again, the more RAM we have,
generally the more quickly our computer is
going to operate because there's more space for it to store information.
It doesn't have to keep sending stuff back to the hard drive
when the RAM is full because there's so much information
being processed at once.
More of it is available in that quicker, more accessible bit of memory.
So recall that with each bit, remember a bit can only take on one of two states.
Zero or one, off or on.
Or you can think about it in terms of electricity, which is how RAM actually
works, as being unpowered or powered.
That again means that, with 32 bits, we have
two to the 32nd power possible memory addresses.
So about four billion memory addresses.
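The arithmetic here is easy to check with a short sketch. It assumes one address per byte, as described above, and the function name address_count is simply a label for this example.

```python
def address_count(bits):
    """Number of distinct addresses that fit in the given number of bits."""
    return 2 ** bits


addresses_32 = address_count(32)
addresses_64 = address_count(64)

# With one byte per address, divide by 2**30 to express the total in gigabytes.
gigabytes_32 = addresses_32 // 2 ** 30

print(addresses_32)
print(addresses_64)
print(gigabytes_32)
```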
Now it is sometimes the case that programmers, and subsequently,
those who may need to read their code, may need a way
to refer to specific memory addresses.
But a memory address like this, written out in full
as a string of zeros and ones,
which is exactly how we would refer to an address in memory,
is rather cumbersome.
No programmer wants to talk to another programmer and no programmer wants
to talk to an advisor by saying the code that lives at 00101 and so on.
That's just not-- that doesn't make any sense.
That's just not how we would talk and it would take forever just
to say the name of the memory before you even get
to the point of what is in that memory.
And so rather than using binary notation to refer to a memory address,
computer scientists will oftentimes use something called hexadecimal notation.
Hexadecimal means 16: hexa, six, and decimal, 10.
And so this is the base 16 number system.
It's a different number system than the decimal system, base 10,
that we have used since childhood to count and understand
place values of numbers and so on.
What's convenient about hexadecimal being
base 16 versus binary being base two is that four binary digits or four bits
can be represented using a single digit, often called a hex digit.
So for every group of four binary digits that we have,
we can represent that more succinctly using just one hexadecimal digit.
And because there are four bits, that means
we have two to the fourth, or 16 different combinations.
So we can account for every single possible on off
combination of all of the four bits in that cluster using a single hex digit.
So we might instead refer to this memory address looking like this.
And there are some letter characters in there,
and that's because in order to represent a single digit in hexadecimal,
we need to be able to count higher than nine
without using two digits, as we are confined to in decimal.
In order to represent the number 10, we need
a one and zero, a one being in the tens place and a zero in the ones place.
But in hexadecimal, we need 16 possible digits
to represent all of the 16 possible values at any given place value.
So here's an example of something that a programmer might see.
This is using a tool called GDB, which is
a debugging tool that is used to debug or root out problems in some low level
code.
And all we're seeing here is a bunch of memory addresses.
So I've highlighted them here in yellow.
We don't need to worry too much about the context around this, what these all
refer to.
But basically, these things on the left, EAX, ECX and so on are registers.
Those are things that are very close to the CPU.
And they are storing the memory address of something else.
And so all these things on the left here are just memory addresses,
and the things on the right are translations of those memory addresses
in some cases into decimal numbers that make
more sense to us having used the base 10 or decimal system for quite some time.
So we can map all of the different possible values in hexadecimal
to their binary equivalents as well as to decimal numbers
that we're familiar with.
So again, here we have all of the possible combinations of four bits
or zeros and ones showing you what they translate to in decimal,
recalling that for every set of four bits here we see, the one on the right
is the ones place, the one to its left is the twos place.
Then we have the fours place and the eights place.
Because again, our base is two.
Every place value is a power of two as opposed to a power of 10
like we would in decimal.
And then its hexadecimal equivalent.
So again, for every single one of those combinations,
we have one distinct way to represent it using a single hex digit.
And sometimes you'll see the hex digits for 10
through 15, which are a through f, presented in capital letters.
I like to present them in capital letters,
but sometimes you see them in lowercase letters as well.
The case is immaterial.
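For anyone reading without the slide, the table being described can be regenerated with a short Python sketch. The variable name rows is an assumption of this example, and capital letters are used for the hex digits only as a matter of taste.

```python
# Build one row per four-bit pattern: (binary, decimal, hexadecimal).
rows = []
for value in range(16):
    binary = format(value, "04b")   # four binary digits, zero padded
    hex_digit = format(value, "X")  # one hex digit, capital letters
    rows.append((binary, value, hex_digit))

for binary, decimal, hex_digit in rows:
    print(binary, decimal, "0x" + hex_digit)
```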
And this zero x at the beginning of it, I should mention that as well.
Zero x means absolutely nothing.
It is purely a note for us as human beings
when we are seeing something like this that we should interpret it
as hexadecimal numbers as opposed to as decimal, for example.
Because we could have a valid hexadecimal string that is--
I'm going to use the zero x here just for a second--
0x, five, zero.
If we saw that without a 0x in front of it,
we might read that as 50, which would not actually be accurate, because 0x,
five, zero is actually 80 in decimal notation.
So that 0x is really just a guide for us as human beings
to say, OK, what I'm about to read here is a hexadecimal number.
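The same ambiguity shows up in code. As a small sketch, Python's built-in int function can read the digits five, zero in either base, and Python also accepts the 0x prefix in source code. The variable names are just labels for this example.

```python
# The same two characters, read in two different bases.
read_as_decimal = int("50", 10)
read_as_hex = int("50", 16)

# With base 16, int() also accepts the 0x prefix on the string.
with_prefix = int("0x50", 16)

# In source code, the 0x prefix tells Python the literal is hexadecimal.
literal = 0x50

print(read_as_decimal, read_as_hex, with_prefix, literal)
```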
Let's just do a quick exercise where we translate
some binary into hexadecimal and then subsequently into decimal as well.
And so here, we have eight bits, each of which again is a zero or a one,
and our goal is to translate this into ultimately decimal,
but let's start by translating it into hexadecimal.
The first step, counting from right to left,
is to split these into groups of four.
It so happens that we have eight bits here,
and so this splits pretty cleanly into two groups of four.
But if we, for example, had seven bits, like if this wasn't here,
we would start by having one zero one zero,
and then whatever we had left over, we would just
pad with extra zeros at the front so we always had a cluster of four bits
at a time to work with.
Each of these maps directly to a single hexadecimal digit.
And sometimes you may be able to just quickly do this in your head,
or you can jump back to the table that we had here
to see when I see this particular pattern,
I want to plug in this hexadecimal digit.
And so if we do that here, we see that the one on the left, 0010,
this is in binary again.
A zero in the ones place, a one in the twos place,
and nothing else, which means we have one times two.
And so this would be a two.
And 1010, well, that's a one in the eights place and a one
in the twos place, which is 10.
But in hexadecimal, we would represent that as a, because again,
we need to confine this idea of 10 to a single place value.
We can't have two digits to represent it using hexadecimal notation.
And so this binary value, 00101010, is 0x--
again, human convention to prepend a 0x in front of anything
that is a hexadecimal number--
0x2a.
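The grouping procedure just described can be sketched in Python. This is one possible way to write it, not the only one, and the function name binary_to_hex is made up for this example. It pads on the left with zeros, splits into groups of four, and maps each group to one hex digit.

```python
def binary_to_hex(bits):
    """Convert a string of 0s and 1s to hexadecimal, four bits at a time."""
    # Round the length up to a multiple of four and pad with leading zeros.
    padded_length = -(-len(bits) // 4) * 4
    padded = bits.zfill(padded_length)

    digits = []
    for start in range(0, len(padded), 4):
        group = padded[start:start + 4]            # one cluster of four bits
        digits.append(format(int(group, 2), "x"))  # its single hex digit
    return "0x" + "".join(digits)


print(binary_to_hex("00101010"))
print(binary_to_hex("0101010"))  # seven bits: padded to 00101010 first
```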
Now, how do we translate this to decimal?
Well, it may help to think about how we translate this or understand
this number, 123.
When we see it, one two three just written out,
we are really doing something like this in our head where we're saying,
there's a one in the one hundreds place, there's a two in the tens place,
and there's a three in the ones place.
And we've just over time internalized that
and have been able to very quickly understand
that the number I'm talking about here is 123.
Well, another way to think about these labels here,
one hundreds place, tens place, and ones place, might be to say,
we have the 10 squareds place or the 10 to the second powers place,
the 10 to the first powers place, and the ten to the zero powers place.
Any number to the zero power is always one,
and so this is really the ones place, the tens place, and the hundreds place.
With hexadecimal, we don't have 10 as the base of the exponent here.
Instead, we have 16 as the base of the exponent.
But the rules are the same.
We have a 16 to the zero place which is one.
We have 16 to the first power or 16s place,
and we have a 16 squared or 256s place.
In our example number here, we didn't go that high.
We had 0x2a.
We only had two digits, which means we really
only needed these two place values, the 16 to the zero power and the 16
to the one power.
Now, we just translate this in exactly the same way that we would intuitively
do it when we're counting in decimal or reading a decimal number.
This is zero times 16 squared plus two times 16
to the first power plus a times one, or 16 to the zero power.
Two times 16 is 32, and a, which again is hexadecimal's way of representing
10, 10 times one is ten, so what we're really saying
is that we have 32 plus 10.
And so to translate this hexadecimal number, 0x2a, into decimal,
we end up with 42, because 42 is 32 plus 10.
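That place value calculation can be written out as a short Python sketch. The function name hex_to_decimal and the HEX_DIGITS string are assumptions of this example. Python's built-in int function does the same job, but spelling it out shows each digit being multiplied by its power of 16.

```python
HEX_DIGITS = "0123456789abcdef"


def hex_to_decimal(text):
    """Convert a hexadecimal string such as '0x2a' to a decimal integer
    by summing digit * 16**position, starting from the rightmost digit."""
    digits = text.lower()
    if digits.startswith("0x"):
        digits = digits[2:]  # the prefix is only a note for human readers

    total = 0
    for power, digit in enumerate(reversed(digits)):
        total += HEX_DIGITS.index(digit) * 16 ** power
    return total


print(hex_to_decimal("0x2a"))
```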
So hopefully, that gives you a bit of a better understanding
of what these cryptic number strings that you might have seen before mean.
And if you're working with programmers or you're ever analyzing source code
and you see references like this, hopefully this
gives you a better understanding of what they mean
and what they likely refer to on the system
and how that might affect things.
Let's talk a little bit more about how memory actually
works, now that we know how to access individual parts of it.
With the exception of hard disk space-- so again,
the permanent storage space on your device--
memory on your computer is termed volatile,
which means two different things.
One, that the memory is constantly changing.
Things are cycling in and out of it.
It's very dynamic in terms of the values that are being stored there,
again because the RAM is sort of this holding ground for everything that's
going to eventually need to go to the processor,
and things are getting swapped in and out pretty frequently.
But the other really key detail about volatile memory
is that it requires power.
If it is unpowered, if there is not electricity literally
flowing to the RAM at any given time, that is a problem
and that memory will no longer work.
In fact, after some amount of time, a pretty small amount of time
like 30 seconds to a minute perhaps, without power, the electrical charge
which is used to maintain each of those individual cells of memory--
remember, a little bit of electricity being one,
and the absence of electricity being zero
is how the computer can store this idea of zeros and ones
on a physical manifestation thereof.
Without power, that electrical charge eventually dissipates.
It does not just stay.
It goes away.
And the state is eventually lost such that, unpowered for about a minute
or so, all the data in RAM has effectively turned into zeros.
It has become completely unpowered.
Now obviously, that would be very bad if our entire system
relied on this technology.
But it's only RAM and the caches between RAM and the processor that rely on this.
Processing can only happen in the processor.
This probably makes a little bit of sense.
And again, recall that a 32-bit processor
can understand 32-bit addresses.
That also means that it only has 32 bits of space in which to do anything.
So it only can work with four bytes of information at a time.
And maybe if you have a computer that has multiple cores,
maybe you've heard that term before, multicore processors,
you might have a few of these processors that can do four bytes at a time.
But either way, we're still talking about a very, very small amount
of information, maybe four to 16 or 32 bytes.
That's not very much at all when you consider
that a basic document perhaps using Microsoft Word
will contain enough metadata to be about 15,000 bytes before you even
type a single character into it.
So a lot of metadata there, and the size of even empty files
adds up pretty quickly.
Because the processor, any given processor, can only process
32 bits worth of information at a time, we need to move data to it frequently.
And that's what the caches are for, and that's
why each one needs to be faster and be able to get information
to the processor pretty quickly.
Because even though the processor can only
process four bytes or 32 bits worth of information at any given time,
it can do two to three billion operations per second,
and that's what a gigahertz measures: a billion cycles per second.
And in terms of when a processor's speed is quoted,
it's sometimes said it's like 2.4 gigahertz or 2.6 gigahertz or so on.
That means that the computer can do 2.4 to 2.6 billion things per second.
So again, 32 bits, not a lot of information at any instant,
but there's a lot of those instants within a second.
It can do two to three billion things per second, each one of those things
operating on exactly four bytes at a time, 32 bits at a time,
on a 32-bit processor, as opposed to a 64-bit processor which
can process a little bit more data.
Let's take a look now at what we term
on your computer the motherboard, or sort of the control
center for everything that your computer does,
and highlight some of the different pieces of where
things live on your physical device.
So right here are some slots for RAM, so these are
basically sticks that get plugged in.
A RAM stick is just a green chip.
It looks similar to the motherboard.
They're usually green.
They have some gold connector pins at the bottom of them,
and they plug into the motherboard.
And information can then be stored there and flow to and from when
needed by the processor and so on.
So that's where these go.
This particular motherboard is from a computer that's
about 15 years old.
For example, I don't think most of us have
floppy drive connectors on our computers anymore, but this one still does.
Here is where the CPU would live, so this
is where the actual processor goes.
And that processor again can only do 32 or 64 bits worth of information
at any given time.
And on top of the CPU, it's not pictured here, but typically on top of the CPU
there's a giant fan, literally like mounted or screwed right above it.
And again, that's because the computer is doing two to three billion things
a second, so it gets quite hot.
And to prevent a CPU meltdown or a core meltdown,
you want to make sure to have air constantly flowing
across the top of the device as well as a heat
sink to pull all the heat away from the CPU such that it doesn't overheat,
which would create quite a big problem and eventually might
result in computer breakage if left to overheat
for a prolonged period of time.
Over here is a graphics processor.
Graphics processors are really just CPUs that
are specialized to do certain operations that make interpreting graphics
on your monitor much easier.
The math for those is usually a bit more complicated,
and so modern devices may have both a CPU and a GPU, a Graphics Processing
Unit, as opposed to relying on just the CPU to handle
all of those different things.
And it similarly would have a heat sink and a fan mounted with it as well.
And then over here at the top, it's pretty small.
There are things called SATA connectors.
SATA connectors are what you might use to connect hard drives to your machine
so that you can extend the storage capacity of the device.
But all of these things might live on your computer,
and also all of these things in shrunk down form will live on your laptop
and even in your mobile phone.
This basic idea exists just in smaller and smaller scales
with all of the parts being similarly scaled down.
So again, CPU memory, what actually lives in the CPU as well
as the registers, those really fast things right around the CPU memory,
is the fastest memory on your machine.
But there's the least of it.
And the reason for this is that it's very, very expensive.
It is the most expensive stuff in your computer.
That is basically the price that you are paying
when you buy the computer is for that processor
and the materials that are used to allow electricity
to conduct through it very quickly really
determines the cost of the device.
So there's the least amount of it, but it is the most important memory
on your machine.
The caches, one two and three, are each successively slower than CPU memory
but also successively cheaper.
So your L1 cache is going to be a little bit slower than your CPU,
but there will be a little bit more of it.
And your L1 cache will be a little bit larger than the CPU space
that you have, but it'll be a little bit cheaper.
The L2 cache may be a little bit larger than the L1 cache
but a little bit cheaper.
Again, this is really just referring to the materials that are
used to make the memory operational.
RAM is slower but cheaper.
RAM typically used to be the most expensive
or be considered the driving cost.
If you had more RAM in your computer, that made it more powerful.
That was the cost driver.
This is becoming less and less the case.
It's still more expensive than hard disk space, which
is effectively free at this point.
It's really just how much stuff we can literally
fit into the container for the hard disk itself, which is just pure storage.
But RAM is slower memory than any of the caches,
but you're able to have more of it because it is less expensive.
So that's memory.
But in terms of hard disk space, that does not work in the same way
that RAM and the other volatile memories work,
and hard disk space is non-volatile.
Information in the hard disk is not changed terribly often,
only when we're certain that we're done working with it in RAM.
And the data there is also persistent, and that's
because it does not rely on electricity to store state.
And we're talking again specifically now about hard disk drives.
Solid state drives behave a little bit differently.
They use microchips that do some different things.
But we're talking about hard disk space, HDDs, traditional hard disks.
Each cell of a hard disk is instead controlled by magnetism,
so data is stored magnetically.
If there is a--
we'll just say for purposes of this discussion
here that if the magnetism is in a down position, so south for example,
it's oriented south, that would be zero.
That's a way to represent zero.
And any magnet that is in the up position
is one, so we can have these two states, the polarity pointing down or south
and the polarity pointing up or north, to represent zero and one as
opposed to using unpowered versus powered to represent zero and one,
respectively, in a RAM or volatile memory situation.
Because these magnets, though, don't require power
in order to work long term, that means that when the computer shuts off
and they become unpowered, the data remains.
And this is a really good thing, right?
Because if every time we shut off our computer
we lost literally all of the files we'd ever saved on it,
that would not be very effective.
We would lose a lot of the utility that we rely on computers for.
And so the way that hard disks work is specifically designed such that memory
can persist after the computer is shut off.
But again, that memory cannot be processed directly in the hard disk.
We have to move it to the processor eventually.
So if our system detects that we need a chunk of memory
from the hard disk, that's all going to be moved from the hard disk
to RAM using something called a bus.
Much like a bus is used to move human beings from one place
to another in large quantities, a bus is used
to move data from one part of your machine to another in large quantities.
And in fact, if you ever see a SATA connection from a hard drive to the
motherboard, using one of the SATA connectors we saw a moment ago on the slide,
there's usually a long, thin strip that connects them together.
That strip also forms part of the bus that
is used to transfer data from the hard drive
to the RAM in fairly large quantities.
In general, when we're working on a program,
the data for that program including the code that actually is running
is moved from hard disk to RAM.
And it stays in RAM, assuming there's no space constraint that
forces it to have to leave which sometimes can happen if you're
running a lot of programs at once.
You may notice your computer slows down quite a lot.
That's because the computer is going to have
to keep swapping things in and out of RAM
in order to process multiple things.
That's why you don't want to leave several hundred tabs open,
for example in your browser, or have 20 or 30 programs running
at once on your computer if you can avoid it,
because it's going to slow down and require things
to be swapped in and out of RAM such that it can
be moved to the processor quite a bit.
That's really going to slow things down.
While the program is running or being used by the computer,
everything will stay in RAM.
All the data will keep being manipulated there,
and then ultimately when we close the program
or once we otherwise indicate we haven't used it for some time
and the computer realizes it needs that space for something else, all
of those bits and bytes that have been manipulated in RAM
will be sort of picked up and moved back on the bus back to a hard disk
where they will be resaved with the new state, such that any changes that you
make in a program will ultimately be saved back to hard disk,
but only once the program is completely done being used by the computer
and it realizes it can free up that information
and save it for long term storage.
Hard drives, though, are not unbreakable.
They have a lot of moving pieces.
A typical hard disk drive consists of several platters,
some thin metal circles spinning around a central axis very rapidly,
typically somewhere around 5,400 to 7,200 revolutions per minute.
So very, very quickly, with a magnetic read
write arm that extends out across the surface of the disk, basically.
And each one of the little rings that gets
formed as you do this, as the read write arm moves in and out,
it can access different sectors on the disk,
and those different sectors are the things that
get zeroed and oned over time.
So it is possible for hard drives to fail.
There's usually a couple ways that this happens.
If the read write arm jams, because it is on some sort of track that
moves in and out, if it jams without collapsing,
your hard drive will just stop working, basically,
because you can't read or write information anymore using that arm.
But it is also possible for the hard disk arm to break and fall.
That arm spins just above the top of these disks, and if it crashes into it,
you'll hear that sound.
That'll be a very unique and interesting sound to hear.
Suffice it to say, your hard drive at that point
is destroyed, because the collapse will crash everything,
and these things are spinning very, very quickly,
and so they're going to shred themselves from the inside.
And you will no longer be able to get any data off of that drive.
But if it's just the arm that gets stuck moving in and out but it doesn't fall
down, you will still be able to recover data from that hard drive,
and we'll talk about that shortly.
Because a hard drive failure does not mean that the data is unrecoverable
if the hard drive hasn't literally suffered this catastrophic shredding
sort of thing that happens.
That's going to render it unusable.
But if it's just the arm that gets stuck, it is still usable.
So what happens when we actually delete something on our machine?
It turns out that overwriting hard disk space
is actually a very, very time consuming and what
we might consider computationally expensive operation for the machine.
You could think about it as it has to pull all of the data from the hard disk
into RAM, change all of those bytes to delete what was there before,
and then put all of that data back.
The computer for some large files, say you
want to delete a video file like a movie, that
might be several gigabytes, so several billion bytes worth of data
that we have to delete.
The computer does not want to incur that sort of cost.
Deleting a file if it actually had to do it that way would be very, very slow.
It would compromise any other program that you had running on your machine.
And so that's not how computers actually delete information.
Rather, they just forget where the data live.
It turns out there's also something called a file table that
exists on your machine that is basically the home
address of the first byte of every single file
that you have on your machine.
And when you delete a file typically in your computer,
it just forgets where it lives.
The bytes that made it up are still there.
The zeros and ones that comprise that file don't go anywhere.
They may eventually be overwritten by some other file that
happens to be stored in that same spot, because the computer now
thinks it's open because it forgot that you live there.
And even then, this only happens when you empty your recycle bin or trash
if you're using a Mac.
If you just put something in the recycle bin,
that's not actually deleting it in any meaningful way at all.
It hides the icon.
You can't really click on that icon anymore,
but you haven't deleted that file, and you probably
know this because you can restore things from the recycle bin.
But even when you empty the recycle bin or empty the trash on your machine,
you're still not actually deleting anything in the sense
that you might be thinking is how we delete things.
Instead, your computer's just forgetting what was there before.
But those bits and bytes that comprise those files that you have deleted
are still there, and that creates a couple of really interesting security
implications.
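As a minimal sketch of that idea, the toy model below treats the disk as a
flat run of bytes and keeps a separate table of where each file lives. The
names ToyDisk, write_file, read_file, and delete_file are hypothetical,
made up for illustration; a real file system is far more involved.

```python
class ToyDisk:
    """A toy disk: raw bytes plus a table recording where each file lives."""

    def __init__(self, size):
        self.blocks = bytearray(size)   # the raw zeros and ones
        self.table = {}                 # file name -> (start offset, length)
        self.next_free = 0              # naive allocator: always append

    def write_file(self, name, data):
        start = self.next_free
        self.blocks[start:start + len(data)] = data
        self.table[name] = (start, len(data))
        self.next_free = start + len(data)

    def read_file(self, name):
        start, length = self.table[name]
        return bytes(self.blocks[start:start + length])

    def delete_file(self, name):
        # "Deleting" only forgets the address; the bytes are not touched.
        del self.table[name]


disk = ToyDisk(64)
disk.write_file("memo.txt", b"privileged memo")
disk.delete_file("memo.txt")

# The file is gone from the table, but its bytes still sit on the disk.
leftover = bytes(disk.blocks[:15])
```

After delete_file runs, the table has no entry for memo.txt, yet leftover
still holds the original text, which is the situation described above.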
So files that get deleted aren't really deleted,
which means that we can recover the information from them if we need to.
How exactly might we do that?
Well, there's definitely some tools out there that can be used to do this.
And again, this requires that the hard drive was not
physically destroyed in some way by the collapse of the read write arm.
But we can literally just connect the hard drive to something and have
a specialized tool that reads over all of those individual sectors
on the disk-- and this is a very slow operation for sure--
read over all of the individual sectors on that disk and just
say, well, this is a zero and this is a one and this is a zero
and this is a one until we end up with this huge file that
is all the zeros and ones that comprised what was originally
the state of that hard drive.
And we usually refer to this file that gets created,
this clone of the hard drive, as a forensic image.
It's really just a huge file that is a complete replication
of the bit by bit content as well as any metadata that
might be associated with it that can be then created
and read on a different computer so that even though the hard drive this was
plugged into, maybe the computer got destroyed,
where we can make a copy of it and read it on a different machine instead.
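A rough sketch of that copying step might look like the following. It is
only an illustration under stated assumptions: make_image and SECTOR_SIZE
are hypothetical names, the source is treated as an ordinary readable path,
and the SHA-256 check is an addition that the discussion above does not
mention. Real forensic tools do a good deal more than this.

```python
import hashlib

SECTOR_SIZE = 512  # assumed sector size in bytes; many drives use 4096


def make_image(source_path, image_path, sector_size=SECTOR_SIZE):
    """Copy source to image one sector at a time; return (bytes copied, SHA-256)."""
    digest = hashlib.sha256()
    copied = 0
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            sector = src.read(sector_size)
            if not sector:
                break  # reached the end of the source
            dst.write(sector)
            digest.update(sector)
            copied += len(sector)
    return copied, digest.hexdigest()
```

The returned hash gives a way to confirm later that a copy of the image
still matches what was read from the original drive.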
So we go from this to how do people pick out what those files were?
Again, computers only understand zeros and ones
and at the end of the day, all of the stuff that
is stored in your hard drive, all those files,
anything that was stored in RAM when it was powered,
is still just zeros and ones.
They don't have icons like we see on our desktop.
They don't mean anything intuitively.
So how do we figure out what those files are?
Well, it turns out that many of them have what is called a signature
or a magic number associated with them.
A magic number is just a way to refer to the first few bytes of a file
where many file types, for example, PDFs, most image files, most music file
types and so on, happen to start in a particular way.
This isn't a way that we ever see when we open one of these files.
But in the metadata at the beginning of those files,
there's usually a sequence of bytes that represent
a signature in effect of saying, the file that I'm about to open is a PDF,
and you can generally rely on that because these first four bytes
or whatever are these values.
Now again, it's four to eight bytes, which
means there are two to the 32 to two to the 64 possibilities for what
these first bits are.
That's a lot of different combinations.
And so if we see a magic number randomly appear in some forensic image
or on some hard drive, the odds are pretty
good that if we see that pattern, we know that that pattern generally
refers to a file of that type, that what we have found
is the beginning of a file of exactly that type.
And we can start to interpret it in that way
maybe and maybe be able to reconstruct something from it.
So for example, it turns out that most PDFs have in their metadata--
and we never really see this--
the characters percent PDF at the beginning of them.
And that translates into this sequence of bits using the ASCII
table that we've talked about before, and we
don't need to get into a lot of detail, and it translates
into these hexadecimal values.
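For anyone who wants to check that translation, here is a small sketch
that converts the four characters into their hexadecimal and binary forms.
The variable names are just for illustration.

```python
# The four characters that begin most PDF files, encoded as ASCII bytes.
signature = "%PDF".encode("ascii")

# Each byte written as two hexadecimal digits and as eight bits.
hex_values = [format(byte, "02x") for byte in signature]
bit_strings = [format(byte, "08b") for byte in signature]

print(hex_values)   # ['25', '50', '44', '46']
print(bit_strings)  # ['00100101', '01010000', '01000100', '01000110']
```

Four bytes at eight bits each gives the 32-bit pattern.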
And so generally, if we happen to encounter exactly this pattern of 32
bits, which we should only expect to see at the beginning of a PDF
or otherwise once every one in two to the 32nd times--
like it's pretty uncommon to see exactly this pattern
and we're looking for exactly that pattern.
If we see those bits, generally what we can do
is start to interpret the rest of this file as a PDF
until we encounter some signature that we've reached the end of that.
Whether that's a whole bunch of zeros or whether that's a signature
that is again perhaps the start of another PDF.
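A minimal sketch of that search, assuming the forensic image has already
been read into memory as bytes, could look like this. The function name
carve_pdfs is hypothetical, and using the marker %%EOF as the end signal is
an assumption of this sketch: PDF files normally finish with it, but it is
not something described above. Anything this returns is only a candidate
that still has to be opened and checked.

```python
PDF_START = b"%PDF"    # the signature discussed above
PDF_END = b"%%EOF"     # assumed end marker; PDF files normally finish with it


def carve_pdfs(image):
    """Return (offset, candidate bytes) for each %PDF ... %%EOF span found."""
    found = []
    position = 0
    while True:
        start = image.find(PDF_START, position)
        if start == -1:
            break  # no more signatures in the image
        end = image.find(PDF_END, start)
        if end == -1:
            break  # a start with no end marker: nothing more to carve
        stop = end + len(PDF_END)
        found.append((start, image[start:stop]))
        position = stop
    return found


# A made-up image: filler zeros with two PDF-looking spans inside it.
image = b"\x00" * 10 + b"%PDF-1.4 body %%EOF" + b"\x00" * 5 + b"%PDF-1.7 x %%EOF"
candidates = carve_pdfs(image)
```

Here candidates holds two entries, the first starting at offset 10.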
Now, of course it's possible that you'll end up with a false positive.
For example, anybody who's examining these slides
at some point in the future-- say that my hard drive crashed
and I happen to literally have the characters percent
PDF typed on to this slide.
If you were to forensically recover my hard drive and analyze it
and you found this PowerPoint file that is where I'm presenting the slides from
and you saw literally the characters percent PDF in it as zeros and ones,
you might mistakenly think, this happens to be a PDF
and start to interpret from this point forward,
this yellow point forward as a PDF.
But it wouldn't work.
And that's OK.
You might get a false positive sometimes,
and then you just kind of disregard it and you keep looking.
You look for a different type of file.
You look for a different file signature and so on.
But it can happen that you have a false positive like this
in situations where you're trying to sort it out,
because you have no other context clues.
All you have are the bits and the information
that you know about file signatures.
OK, so we have this empty trash or empty recycle bin icon or menu
option on our computers.
But now we know it doesn't actually empty the trash at all.
So how do we actually delete files from our hard drives
as opposed to just having our hard drives forget
or our systems forget where on the hard drive that file lived?
We probably want to do that at some point, get rid of the data
on our machines.
How exactly can we go about that?
Well, there's actually relatively few ways to actually delete this data.
The first of which we've already kind of discussed,
which is physically destroying the hard drive.
There are services out there that will shred your hard drives for you.
If your read write arm breaks in a catastrophic way,
your read write arm will shred the device for you itself.
That's one way to ensure that your data is protected or deleted
is to make it absolutely impossible to recover information
from it by physical destruction.
You can use a tool called a degausser. A degausser is really
just a very strong magnet that you hold over the device for a period of time.
It will also usually cause some sort of physical damage,
because it's also going to mess up some of the metal that
is inside the machine that is not storing data
but is just structural metal.
So usually a degausser will not only wipe out information
by setting all of the bits, flipping the polarity of all the bits from south
to north or something like that, but it will also
usually cause some sort of mechanical wear just based
on the strength of that magnet.
But then we have this thing Secure Empty Trash.
We saw this in the menu a second ago.
What do you think Secure Empty Trash might do?
Well, one thing that you might think is that it
would overwrite the data with random bits, and you would be correct.
That's what Secure Empty Trash does.
So instead of just deleting information from the hard drive
by forgetting where it lives, instead we actually go to that spot.
And instead of writing all zeros or all ones,
we just write random bits over it.
But it turns out that this is actually not good enough
to delete information on a single pass.
But a single pass is actually what Secure Empty Trash does.
It only makes one pass through, randomly setting each bit of that file
to a one or a zero.
But it turns out, and the physics of this is a little bit beyond me,
but it turns out that when the polarity of a magnet on a hard drive
is flipped from zero to one, there's actually sort of this lingering halo
effect that it leaves behind so that you can tell that this bit is a one now,
but it used to be a zero.
And that effect lingers for a little while.
But if you keep changing it multiple times over and over,
eventually that effect gets lost.
So you can tell what bits--
imagine every bit was a one after you make one pass through it.
All of those things that were ones before, their polarity didn't flip.
There's no halo effect.
But everything that used to be zero and is now a one
has this slight signature left behind that says, this used to be a zero.
And a good forensic analyst is able to take a look at that.
As it reads, it can read the polarity of the magnet
and see that it's slightly not exactly zero and not exactly one and say, OK.
Well this bit probably used to be the opposite.
And so even making one random pass across a hard drive
is not enough to definitely securely erase the data on it.
You actually have to make multiple passes. Seven passes
is considered to be the industry standard
to make sure that enough randomness has affected each of the individual magnets
such that you can't tell what was there before.
So to truly securely erase the hard drive and preserve it in a state where
you can actually use it, you need to use--
and there are software tools that do this--
a tool that will overwrite the drive randomly
multiple times to eliminate any of that lingering halo effect.
But Secure Empty Trash does not do that.
It only makes a single pass over the drive.
So enough to cover it up for undiscerning
eyes, but experts who study this and work with this kind of data
regularly might still be able to figure out what the original data was
if just a single pass is made.
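To make the multiple-pass idea concrete, here is a hedged sketch that
overwrites one file in place with random bytes several times. The names
overwrite_file and CHUNK are hypothetical, and the default of seven passes
simply mirrors the number mentioned above. One important assumption: the
sketch only asks the operating system to write over the same file, and on
solid state drives and some file systems the new bytes may land somewhere
else, so this is not a guarantee that the old data is gone.

```python
import os

CHUNK = 1024 * 1024  # write random data one megabyte at a time


def overwrite_file(path, passes=7):
    """Overwrite a file's contents in place with random bytes, `passes` times."""
    length = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = length
            while remaining > 0:
                step = min(CHUNK, remaining)
                handle.write(os.urandom(step))
                remaining -= step
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push this pass to the device
```

Calling overwrite_file("old_notes.txt") on a hypothetical file would leave
a file of the same size filled with random bytes, which could then be
deleted in the usual way.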
So why is this important?
Well, there's two reasons.
One, as attorneys, we want to make sure that we are doing everything
we can to protect our clients' data.
And also as we're working with those who may be less technically inclined, it's
important for us as part of our competent representation of clients
to inform them about what we can about the technology implications of some
of the things they do from a legal perspective.
And so if you're working in a large firm environment or as an in-house counsel,
it's probably not going to fall to you as an attorney
to develop some sort of protocol for establishing the best
practices for working with client data.
But it is really useful to understand what these protocols are
and how you might be able to contribute to a conversation
about making these protocols more robust.
Here are some basic strategies that you can use as an attorney
to protect your own client data but also to advise clients
so that they can protect their data for their clients and so on.
So the first one is quite easy, and that is to encrypt your hard drive.
So we talked about encryption previously,
but you can also encrypt your own hard drive such
that when your computer turns on, you need to enter a password.
It's again similar to this public private key idea
that we've previously discussed.
You need to type in this password in order for your entire hard drive
to be unencrypted such that you can then read the data on it.
Most operating systems now provide tools that
are built into the operating system itself so that you can do this.
So there's really no excuse not to do it.
It is a very easy, straightforward and simple way
to take a pretty strong step at protecting the data on your machine
easily.
Again, this usually requires a password.
Typically it'll be after you turn your computer on before the operating
system itself loads, the operating system being one of the few things that
is not encrypted such that it can then open the files
and unencrypt everything and so on.
But it will not proceed past the operating system load point
until that password is provided.
But do be careful, because some of these systems,
particularly the more advanced ones, after a certain number
of incorrect guesses will begin to securely wipe your hard drive using
multiple passes of zeros and ones.
And so if you think there's a danger that you might forget your master
password so to speak for this hard drive encryption,
you might want to keep something somewhere to remind you.
I wouldn't recommend like sticking a sticky note on the monitor
or anything like that, but have some sort of way
to remember that password in the event that you might forget it,
because you might lose data if you guess wrong too many times depending
on which hard drive encryption tool you are using.
Another relatively easy thing to do is to avoid
using insecure wireless networks.
These are generally not as common anymore.
Most people have wireless networks that require a password,
and usually wireless networks that require a password will then
have encryption for that individual making the connection
on the system on the network.
But unsecured networks do provide opportunities
for those listening using tools that are called packet sniffers, which
are literally just listening and gathering
data on all of the packets of information
that are being transmitted over the internet in the vicinity
of the unsecured wireless network.
And so you might see-- this is a screenshot of a tool called Wireshark,
and it's a little blurry.
There's not a lot of relevant information here.
But on an unsecured network, it is possible to read
all of the bytes and bits that are flowing through,
translate them into their ASCII equivalents,
and realize that this person is providing a username
and password and an action logging in.
And so anybody who is able to then take this information and see what IP
address it came from-- and we'll talk about IP addresses shortly as well--
or where it was going to might be able to use
that data to log in as that person, which would definitely not
be a good thing at all.
One way to get around this if you find yourself in a situation
where you need to connect to the internet to do work
or for whatever reason you need to be connected to the internet
even if you're not sure about the quality of the network
is to rely on private or work provided VPN services.
VPN is a virtual private network, and it provides a way
to connect to a trusted encrypted network, have that network act as you,
effectively for providing encryption services for your web traffic
even if you're not sure that your traffic itself is encrypted.
So VPNs are available at most businesses or also available online.
Relatively inexpensively, you can buy tools
that would allow you to make use of a virtual private network.
Password managers.
Password managers are great.
Honestly, I can tell you that I don't know most of the passwords
that I use on a daily basis because I rely on a password manager.
There are several services out there--
LastPass, 1Password, and others.
Basically, the idea is the tool will generate passwords for you.
You only have to remember the master password, the one
password that you can use to unlock everything
to open the password manager itself.
And then once you're logged into the password manager,
you just direct it to log in on your behalf to different services.
You usually tell it this is the URL I'd like you to go to,
this is the username to use, and then the secretly generated password
that you don't generally know is stored in the password manager itself.
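As a rough illustration of the generating step, the sketch below builds a
random password with Python's standard secrets module. This is not how any
particular password manager works; generate_password, ALPHABET, and the
default length of 16 are assumptions chosen for the example.

```python
import secrets
import string

# Letters, digits, and punctuation: 94 possible characters in total.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=16):
    """Return a random password drawn from letters, digits, and symbols."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


password = generate_password()
```

The secrets module is meant for security-sensitive randomness, which is
why the sketch uses it rather than the general-purpose random module.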
Some of these tools are local to your machine.
More often than not, they are starting to migrate
to be cloud based services, which does introduce another interesting question
of do you trust your data to be stored on the cloud as opposed
to being stored on your device?
And that's really a question that you should
consider when you're thinking about using one of these tools.
Most of these tools also have an excellent secondary effect,
which is that they often provide two factor authentication support.
And two factor authentication is something
that we will talk about shortly as well, but it is usually something
that you know, like a password or something
that the password manager knows, and something you have like your cell
phone, for example, that might be getting a text message with a code
that you're supposed to enter as well.
And the idea is that an adversary who is trying to hack into your account
may know your password but won't have your phone,
or may have your phone because they took it but won't know your password.
And so these two factors are designed to preempt basic hacking attempts.
But as I mentioned, these tools are great,
but you should be skeptical of them, particularly
if they are cloud based, because it is possible for bad things to happen.
So for example, not too long ago, a few million users
of the password manager Blur had information that was leaked online.
None of this information was actually their passwords.
It was more customer related information, sort of ancillary
this is their email address and some other stuff.
But it hits a little close to home.
And so again, always be skeptical when thinking
about your own data and your clients' data.
But these tools are generally more good than bad.
But again, the decision of whether to use these tools really
does ultimately fall to you having done research into them,
seeing whether or not they make sense for you,
whether you want to take advantage of the advantages that they offer.
If you're not going to use a password manager,
you should at least be sure to use complex passwords
and certainly make sure to avoid using the same password for multiple services
unless it's like a throw away password that you use on things
that you don't care about.
But you want to definitely avoid using the same password
on important services.
So like your Gmail account or any client log in related information
that you have, or anything banking.
You want to use different passwords for all of those things.
Passwords that have less than eight characters
or less than or equal to eight characters,
you should effectively consider have been broken and hacked already.
Those are not secure.
Computers are definitely powerful enough nowadays that it can be brute forced
in a relatively short amount of time.
We're still talking maybe days here for an eight character password,
but that is not that much of an effort.
Passwords should be at least 12 characters now for sure.
You should definitely have a mix of uppercase, lowercase letters, numbers,
symbols, anything like that.
But anything that is less than or equal to eight characters
should definitely be considered to be effectively hacked already.
And if it hasn't been hacked already, certainly it
is capable of being hacked very easily by anybody who
wants to put in the effort to do so.
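The arithmetic behind that claim can be sketched in a few lines. The
guessing speed here is an assumption picked purely for illustration, as is
the count of 94 typeable characters; days_to_search is a hypothetical
helper, and real attacks vary widely with the hardware and with how the
passwords were stored.

```python
ALPHABET_SIZE = 94            # upper, lower, digits, and symbols on a US keyboard
GUESSES_PER_SECOND = 10**10   # assumed attacker speed; purely illustrative


def days_to_search(length, alphabet_size=ALPHABET_SIZE, rate=GUESSES_PER_SECOND):
    """Days needed to try every password of exactly `length` characters."""
    combinations = alphabet_size ** length
    return combinations / rate / 86_400  # 86,400 seconds in a day


eight = days_to_search(8)
twelve = days_to_search(12)
```

Under these assumptions, eight characters works out to roughly a week of
guessing, while each extra character multiplies the work by 94, so twelve
characters takes tens of millions of times longer.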
You should also change your passwords as frequently as you can.
For example, I have a bank that requires me to change my password every 90 days
in order to continue to use their online banking services.
And on the one hand, yes, you may find that kind of annoying.
But on the other hand, it's good to keep things changing so that you're never
having a password get stale and potentially then leaving it vulnerable,
especially if it's the password that you may have
used on multiple services in the past.
It's a good thing to keep in mind, especially
if you don't have that many passwords that you
need to maintain to change them as frequently as you're able to.
Creating backups.
Creating backups of information is really
important, because sometimes things will go wrong that you don't expect,
like maybe your hard drive will suffer some sort
of catastrophic mechanical failure and you wouldn't otherwise have a way
to get that information back.
So periodically backing your data up protects you
in the event of hardware failure or in the event
of some sort of ransomware attack where an adversary breaks
into your network, your office's network for example,
and doesn't take any data away but encrypts it using their own public
and private key such that there's no way for you
to read that information until you usually pay them
some ransom, which is usually money or something like that
or bitcoin or the like.
So you should back your data up pretty regularly.
You can back it up in the cloud using cloud based document storage services.
You can also just back it up on paper in certain situations as well.
But definitely back it up to non network connected machines,
so a computer that you have that is never connected to the internet
and is primarily used just for its hard drive space, basically.
Or to flash drives or CD-ROMs if you're still using that technology.
Just have some offline way to access important data in the event
that something goes really, really wrong.
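A backup is only useful if the copy actually matches the original, so
here is a small sketch of copying a file and then checking it. The names
sha256_of and back_up are hypothetical, and comparing SHA-256 hashes is
one reasonable way to verify a copy, not something prescribed above.

```python
import hashlib
import shutil


def sha256_of(path):
    """Return the SHA-256 hash of a file, reading it in small chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destination):
    """Copy source to destination and confirm the two files match."""
    shutil.copy2(source, destination)
    return sha256_of(source) == sha256_of(destination)
```

The destination could be a path on a flash drive or another offline disk;
back_up returns True only when the copy is byte-for-byte identical.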
Also, have an archival plan for data.
You don't need to keep data around forever.
We oftentimes think that because we're living in this digital age
that everything we do persists forever and needs to persist forever
and is tracked.
But that's not entirely true, particularly
if we are proactive in doing our part to archive or delete data
when we no longer need it.
Particularly when you're considering client data,
it is important to develop a consistent plan for when
you are done working with that data.
So for example, it may be the case that in your firm after three years of no
longer having any matters related to that client,
it is just your office's policy to delete that client's data.
And that might mean transferring other data that
might be on a shared disk with them off of it
and literally going through the process of either destroying the drive
or doing the multiple passes over the drive using zeros and ones randomly
just to obscure that data, because having that policy of not keeping
things forever generally protects you and protects your clients if that data is
no longer needed.
Also, make talking about data security a priority.
I know it's not exactly the buzziest conversation
to have around the water cooler, but a lot of people
are not as thoughtful about technology as you may be taking this course.
And it may be a shock to them to realize that when
they delete a file on their machine, it doesn't actually do anything,
basically.
It just forgets that information, but that information still lives on.
You don't have to be a tech expert to educate others.
Particularly as someone who's coming into it with maybe a bit more of a leg
up in understanding technology, speaking to individuals
who may not know anything about what this technology is you
can really do yourself and your colleagues and your clients a service
by making this part of a typical conversation.
Share your knowledge with others in your office and in your field.
And finally, think about establishing a compliance protocol.
A lot of these things that I've just described
are very, very easy to set up at the outset.
It is not difficult to say, I'm going to change all my passwords,
and I'm going to use this password manager,
and I'm going to write this policy for deleting information and archiving
information periodically.
The problem is that it becomes over time something that we forget to do.
And having regular periods of having someone
designated to make sure that these policies are being followed
is really important, as we'll see shortly when we talk about some
of the ABA ethical requirements for lawyers dealing with technology.
You want to make sure that if you establish some of these ground rules
for working with data, that you continue to follow these rules as you work
with this data for the months and years and so on going forward
as opposed to just doing it once and forgetting about it.
Because technology is not static.
It's going to continue to advance, and we need
to stay ahead of that as attorneys.
It's part of our obligation to really understand this technology,
stay current with any changes, and adapt and change our policies accordingly
so that we're always staying as close to the cutting edge as we possibly can.
I really encourage you to volunteer with the compliance team.
You may have a compliance team, particularly
if you are at a large office or in-house counsel
setting, who is tasked with developing these technological policies.
And even if you don't feel like you want to advise on new avenues to pursue
or new policies to initiate, you still should be part of that conversation.
You do bring something valuable to the conversation just having the knowledge
that you have from a course like this and should be part of this conversation
so that you can contribute to it more in the future as well.
I'd like to conclude our discussion today about security
by drawing your attention to two really important ABA ethical decisions
that relate to lawyers and technology and what
lawyers should do in the event of a data breach at their office.
And let's start by taking a look at formal opinion 477R which
was released by the ABA in May of 2017.
This opinion deals with attorneys' obligations with respect to technical
know how.
So it is now considered part of competent representation
for an attorney to be considerate of the technological implications of what
they do in their office.
What does it mean to store documents?
What does it mean to secure communications with clients?
It is incumbent upon us as lawyers to stay abreast of these developments
and really be informed about them and inform
our clients about the ramifications of some
of these new technological advancements.
It also formalizes the requirement of offices and firms
to have a compliance protocol.
What do you do when you receive client data?
Now, this opinion came out in 2017.
It replaced something from 1999, which at the time
the previous ABA opinion stated that all communications, including
unsecured unencrypted email, were generally
considered quote unquote secured.
Obviously, I think we can agree that is not the case anymore
and certainly the ABA agrees that is not the case anymore.
That's because we've transitioned from a time when a lot of lawyerly work
was done not using the internet, not using emails.
It was done using fax and paper and so on.
And now we've transitioned to a mostly electronic way
of providing legal services to our clients,
and so our technological rules of our self-governing ethics
need to evolve to account for that.
It also brings up a very interesting question which is something just
to think about going forward or discuss with others in your group of how
do you reconcile a situation where you have a client who doesn't want
to use secured communications or doesn't want
to secure their data in working with you?
How does that square with your job or your requirement
as an attorney to ethically abide by this opinion
and be mindful and guard clients against technological mistakes?
Is it possible to provide competent representation to a client
if they are unwilling to adhere to your firm's compliance protocol?
It's a really interesting question that I don't have an answer to
but provokes an interesting discussion about what does it
mean for us to have client intake and work with clients,
and what happens when the client's wishes run
against our ethical obligations?
That's not a novel question to lawyers.
That presents itself in different ways, but via technology,
do we have yet another way we might have to consider this dilemma?
Subsequent to 477R, a year and a half later in October of 2018,
the ABA issued formal opinion 483, which kind of
is the natural follow on to 477R, which deals with what
happens if a lawyer's information is breached?
If there is a data breach at the firm and client data is compromised,
what do you have to do?
One important thing to think about here is that this opinion formalizes
the notion that has sort of long been held in technological circles
that there are two kinds of businesses that exist--
ones that have been hacked, and ones that will be.
Not ones that might be or not ones that could be.
And perhaps even these are ones that have been and they don't know it yet.
But it's just such a part of life nowadays
that businesses either have been hacked or will be hacked,
and that is the mindset that you should have
when you are thinking about protecting client data, bringing in consultants,
and hiring people to do their best work to defend your clients' data.
Now, it turns out that law firms tend to be excellent targets for hackers,
and the reason for that is that they have a lot of very valuable data.
And unfortunately, the history is such that it is not always as well protected
by law firms as it might have been by the clients themselves,
because we as lawyers have not been as equipped
to have a conversation about technology and how that technology might
affect our representation of clients.
The opinion describes a bunch of different cyber episodes, so to speak,
that might comprise a data breach, which would rise to the level of needing
to report to a client.
These include things such as ransomware attacks,
as we've discussed a little bit earlier today,
systems attacks that might break or somehow damage
the infrastructure of the firm or workplace,
as well as exfiltrations, which are probably the worst
kind of breach, which is someone hacks into your system
and is able to remove data such that you may not even have a copy of that data
anymore, and that's why having backups is so important, but removes
that data from your servers, for example, to the adversary's servers.
There is no ethical violation in being hacked.
It's really important to make that very clear.
The ethical violation occurs when reasonable efforts
are not made to protect that data.
If we as attorneys are making reasonable efforts to protect our clients' data
and we still get hacked, we have not necessarily done anything wrong
as long as we were doing our best to prevent that
from happening in the first place and, once we
detect that it has happened, we make every reasonable effort to stop
an ongoing attack from continuing.
This also introduces a very interesting question
of what to do with former client data that has been hacked,
and that's why it's really important to establish
some sort of archival or deletion plan for working with that data.
The ABA proposes a couple of different ways
to resolve how to deal with informing a former client about information related
to a hack.
But one of the most important things to draw from this opinion, I would say,
is that a discussion about data retention needs to be
part of your firm's intake process
for dealing with new clients.
Who owns what has always sort of been part of the conversation.
Generally, as we know, we return client data to them
when we are done working with it.
How does this work in a digital context?
It is really important for your intake plan
at your firm to handle what happens to digital versions of client data
when the representation has concluded because the matter has concluded.
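
To make that a little more concrete, here is a minimal sketch in Python of what a retention rule for concluded matters might look like. Everything in it is an assumption for illustration only: the `ClosedMatter` record, the `retention_action` function, and especially the one-year and seven-year thresholds are hypothetical and do not come from the ABA opinions. The real periods would come from your jurisdiction's rules and from what you agreed with the client at intake.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical thresholds, chosen only for illustration. Real retention
# periods depend on your jurisdiction and the client agreement.
ARCHIVE_AFTER_DAYS = 365
DELETE_AFTER_DAYS = 7 * 365


@dataclass
class ClosedMatter:
    client: str
    closed_on: date            # date the representation concluded
    returned_to_client: bool   # has the client received a copy of their files?


def retention_action(matter: ClosedMatter, today: date) -> str:
    """Return "keep", "archive", or "delete" for a concluded matter."""
    age_days = (today - matter.closed_on).days
    # Only delete once the client already has their own copy of the data.
    if age_days >= DELETE_AFTER_DAYS and matter.returned_to_client:
        return "delete"
    if age_days >= ARCHIVE_AFTER_DAYS:
        return "archive"
    return "keep"


if __name__ == "__main__":
    matter = ClosedMatter("Example Client", date(2015, 1, 1), True)
    print(retention_action(matter, date(2024, 1, 1)))
```

The point of writing the rule down, whether in code or in a plain policy document, is that the firm decides in advance what happens to former client data, rather than discovering after a breach that it was still sitting on a server.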
Speaking of concluded, that is going to wrap up our discussion today
on security.
This will be the first of our two discussions
generally at length about security in the legal context.
But hopefully you've come away from today
with a better understanding of how your system works, what memory is,
and why, when we delete things on our hard drives,
they don't actually get deleted and what some of the ramifications
of that might be.
And hopefully you also have come away from this
with an understanding of what to do going forward in establishing
best practices for working with client data
to stay within the ethical guidelines proposed by the ABA,
and just to generally have a more technical conversation with clients
about your representation of them and what happens to their data
when that representation has concluded.