[MUSIC PLAYING] SPEAKER: So today we're going to have our first of a few discussions about cybersecurity, and later on we're going to talk a little bit about cybersecurity in the context of the internet and some of the challenges that it brings up there. But today we're going to focus mostly on cybersecurity issues related to your machine, your computer without necessarily being connected to the internet. Before we do, we need to understand a little bit more about our machine's infrastructure, its hardware. And the biggest question to ask at the outset is, when we talk about the system's memory, what do we mean by that? That term kind of gets thrown around and it means a couple of different things, potentially. It might mean your system's RAM or random access memory, which is a rough translation of how much computing power it has, how many things it can do. And we can also talk about hard drive space as another example of system memory. Hard drive space is usually just free storage, basically. How much room do we have to literally store files on our machine? How much memory does your computer have? Maybe you do or maybe you don't know. If you take a look at your system information or look up the computer that you bought on the internet, you might find that if we're quoting memory in terms of RAM, that your device might have as low as 512 megabytes of RAM, which is about half of a gigabyte. And that's not very much, most machines have much more than that now unless you have a low powered Chromebook, for example, that you use for travel. Memory on the RAM scale might go as high as 32 gigabytes of RAM, which is quite a bit more than that. That's generally for really high end computers. Computers, in particular, that process a lot of graphics. So sometimes computers that are specifically dedicated for gaming might have that much RAM. But typically the range is somewhere between four and 16 nowadays. 
When we're talking about hard drive space, that number is quite a bit bigger. So the typical hard drive nowadays might be as low as 128 gigabytes, if the drive is a solid state drive, versus a hard disk drive. We won't go into too much detail about the distinction between those two things, other than right now to say those are just two different ways to store data long term. So that might be the low end. The high end is probably somewhere on two terabytes of information. One terabyte is 1000 gigabytes, give or take. So two terabytes would be about 2000, give or take, gigabytes. So quite a bit. Maybe even as high as four terabytes. That's quite a bit of storage information. That's enough to store several hundred HD, high quality films. But there's much more to memory than just RAM and hard disk space. There's actually kind of a hierarchy of memory that exists within your machine. Most of these numbers, though, aren't usually quoted in the specs of a device. So there's RAM, random access memory, and then there's a series of caches, each of which gets successively smaller. So they're going to be quite a bit smaller than the four gigs, say, of RAM that your device has. But they're also a little bit faster, and the reason these things get faster, these caches get faster, is they are getting closer and closer to the computer's processor, which is really the only part of the device that is able to manipulate information. It's the only part that can process information. So the memory that we're feeding to that processor needs to get faster and faster, such that it can continue to swap things in and out. So we have the RAM, maybe an L3 cache, a Level 3 cache, Level 2, Level 1, and then finally CPU memory, which is the processor memory itself. Plus some small bits of memory called registers, which are used to be the final sort of pass of information from RAM or this hierarchy of memory into the CPU. 
But again, every file on your machine lives somewhere permanently on a disk drive. And there are, again, two different kinds of disk drives. We have solid state drives and hard disk drives. We should treat them as effectively identical for purposes of our discussion today. They-- solid state drives tend to behave a bit differently than hard disk drives, they tend to be a bit more secure than some of the vulnerabilities that hard disk drives present, which we're going to talk about a little bit later in today's lecture. But in general, when we talk about hard disks or storage space for the rest of today's lecture, we're going to be mostly focusing on hard disk drives. They're also just much more prevalent still. Solid state drives are coming into their own and becoming more and more frequent as they appear in devices, but hard disk drives are still far and away more and more prevalent within devices that exist now. They are just storage space, though, we can't do anything with data that is stored on disk. We have to first move it to RAM and then have it sort of go up and down that chain of RAM, the different caches to the CPU, in order to actually manipulate the data. Once we're done manipulating it, and maybe we're turning our computer off for the evening, then all of the data that is in RAM will be stored back into the hard disk space so that we're able to access it at another time. One thing to keep in mind as we begin this discussion of memory, though, is that memory is really just an array. And we've talked about arrays already, where each cell of that array basically is one byte wide. And recall that one byte is eight bits. We may have anywhere between 512 megabytes of memory, so about 512 million of those little one byte sized cells, maybe as high as four, 8, 16, and so on gigabytes. And we have quite a few of those items in our array. But it really is just an array, which means we can jump to different addresses. 
It has the same properties as any other random access array that we've already discussed. Different types of data take up different amounts of memory on our systems. So think about a very low level programming language like C, and this is just an example. Different programming languages may store different types of data using different amounts of space. But if we look to just the most base level of data and think about the smallest individual pieces into which we can break it, we may be able to store an integer, for example, in four bytes, which means we have exactly 32 bits worth of space to store an integer. Characters will take up one byte, so we have only eight bits worth of memory required to store a single character. So capital or lowercase letters, digits, punctuation marks, and so on. Not a huge variety of options there. Floats, you may recall, are real numbers, numbers that have decimal points in them. Doubles are, as well; they're double precision floating point values. Floats and doubles take up four and eight bytes, respectively. So basically the idea here is different types of data will take up different amounts of space and then we eventually can construct these things into pixels, and images, and films, each of which will also take up different amounts of space and memory if we are manipulating or working with that data. So again, let's think of memory as a big array of individual byte-sized cells. Because it is an array, that means we have random access. We can say, I want to go to memory address x and see what is there. I want to go to memory address y and change what is there. We have the ability to do that. We don't have to iterate through step by step by step in order to make changes. If we did, the processor would be quite a bit slower, having to perform what we might term a linear search as we try to iterate through memory to find the one byte we're looking for. It's very helpful to be able to jump to a particular byte. 
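To make those sizes and the "memory is just an array" idea concrete, here is a minimal sketch in Python. It uses the standard ctypes module to ask how many bytes a few C types occupy on whatever machine runs it; the figures quoted above are typical but not guaranteed on every platform. The 16-cell `memory` array and the address 9 are made up purely for illustration.

```python
import ctypes

# Sizes, in bytes, of a few C data types on the machine running this script.
# int = 4, char = 1, float = 4, double = 8 is typical, but the C language
# itself does not promise these exact numbers everywhere.
sizes = {
    "char": ctypes.sizeof(ctypes.c_char),
    "int": ctypes.sizeof(ctypes.c_int),
    "float": ctypes.sizeof(ctypes.c_float),
    "double": ctypes.sizeof(ctypes.c_double),
}
for name, size in sizes.items():
    print(f"{name}: {size} byte(s) = {size * 8} bits")

# A toy "memory": an array of 16 one-byte cells, all starting at zero.
memory = bytearray(16)

# Random access: jump straight to address 9 and change what is there,
# without stepping through addresses 0 to 8 first.
memory[9] = 0x41
value_at_9 = memory[9]
print(f"address 9 now holds {value_at_9}")
```

The point of the second half is only that reading or writing cell 9 takes one step, no matter how large the array is.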
And that means that every location in memory must have an address. We must have a way to refer to that individual byte in order to randomly access it. We can't just look at this grid of cells and say, I want to go to this one and sort of, you know, imagine particular spot. We need to say, I want to go to exactly this memory address. OK? So s-- the fact that memory cells have an address is what comes into play when you think about this idea of a 32-bit system or a 64-bit system, and this may be a term that you've heard before. It refers to the ability to process an address. So for example, a 32-bit computer, a 32-bit system, can process memory addresses up to 32 bits in length. Which means it understands memory address zero through memory address right up to four billion, a little over four billion. But it doesn't understand memory past that. Now interestingly, this doesn't mean that a 32-bit system is limited to four gigabytes of RAM. There are some software tricks that we can pull using something called virtual memory, which we're not going to get into in any more depth than to refer to it as virtual memory today, that allow you to use more than four gigabytes of RAM on a 32-bit system by doing-- sort of, you know, pretending that things live somewhere where they don't. But when you talk about a 64-bit system, that means we have many more memory cells that we can refer to without running into our sort of artificial limit of how high we can count. Now granted, there are no memory banks out there that have all of the memory addresses from zero to 64 bits worth of memory. That's somewhere in the quintillion or higher. It's a very, very large number and we don't yet have the storage capacity to store that much data on our machines. 
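The arithmetic behind those limits can be checked in a few lines. This is only a sketch of the counting argument; the helper name `address_count` is invented for this example, and it assumes one address per byte, as described above.

```python
def address_count(bits):
    """Number of distinct addresses that fit in an address of this many bits."""
    return 2 ** bits

GIB = 2 ** 30  # bytes in one gigabyte, counting in powers of two

addresses_32 = address_count(32)
addresses_64 = address_count(64)

# With one byte per address, a 32-bit address reaches exactly 4 gigabytes.
print(addresses_32, "addresses =", addresses_32 // GIB, "gigabytes")

# A 64-bit address reaches about 18 quintillion bytes.
print(addresses_64, "addresses")
```

So the 64-bit range is about four billion times larger than the 32-bit one, which is why no real machine comes close to filling it.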
But theoretically, it is possible that with a 64-bit system we could have very, very large amounts of RAM and again, the more RAM we have, generally the more quickly our computer is going to operate because there's more space for it to store information. It doesn't have to keep sending stuff back to the hard drive when the RAM is full because there's so much information being processed at once. More of it is available in that quicker, more accessible bit of memory. So recall that a bit can only take on one of two states: zero or one, off or on. Or you can think about it in terms of electricity, which is how RAM actually works, as being unpowered or powered. That again means that with 32 bits we have two to the 32nd power possible memory addresses. So about four billion memory addresses. Now it is sometimes the case that programmers, and subsequently, those who may need to read their code, may need a way to refer to specific memory addresses. But a memory address like this, which is nothing but zeros and ones, is exactly how we would refer to an address in memory, and it is rather cumbersome. No programmer wants to talk to another programmer, and no programmer wants to talk to an advisor, by saying the code that lives at 00101 and so on. That doesn't make any sense. That's just not how we would talk and it would take forever just to say the name of the memory before you even get to the point of what is in that memory. And so rather than using binary notation to refer to a memory address, computer scientists will oftentimes use something called hexadecimal notation. Hexadecimal means 16: hexa and decimal, six and 10. And so this is the base 16 number system. It's a different number system than the decimal system, base 10, that we have used since childhood to count and understand place values of numbers and so on. 
What's convenient about hexadecimal being base 16 versus binary being base two is that four binary digits or four bits can be represented using a single what is often called a hex digit. So for every group of four binary digits that we have, we can represent that more succinctly using just one hexadecimal digit. And because there are four bits, that means we have two to the fourth, or 16 different combinations. So we can account for every single possible on off combination of all of the four bits in that cluster using a single hex digit. So we might instead refer to this memory address looking like this. And there are some letter characters in there, and that's because in order to represent a single digit in hexadecimal, we need to be able to count higher than nine without using two digits, as we are confined to in decimal. In decimal, in order to represent the number 10, we need a one and zero, a one being in the tens place and a zero in the ones place. But in hexadecimal, we need 16 possible digits to represent all of the 16 possible values at any given place value. So here's an example of something that a programmer might see. This is using a tool called GDB, which is a debugging tool that is used to debug or root out problems in some low level code. And all we're seeing here is a bunch of memory addresses. So I've highlighted them here in yellow. We don't need to worry too much about the context around this, what these all refer to. But basically, these things on the left, EAX, ECX and so on are registers. Those are things that are very close to the CPU. And they are storing the memory address of something else. And so all these things on the left here are just memory addresses, and the things on the right are translations of those memory addresses in some cases into decimal numbers that make more sense to us having used the base 10 or decimal system for quite some time. 
So we can map all of the different possible values in hexadecimal to their binary equivalents as well as to decimal numbers that we're familiar with. So again, here we have all of the possible combinations of four bits or zeros and ones showing you what they translate to in decimal, recalling that for every set of four bits here we see, the one on the right is the ones place, the one to its left is the twos place. Then we have the fours place and the eights place. Because again, our base is two. Every place value is a power of two as opposed to a power of 10 like we would have in decimal. And then its hexadecimal equivalent. So again, for every single one of those combinations, we have one distinct way to represent it using a single hex digit. And sometimes you'll see the hex digits for 10 through 15, which are a through f, presented in capital letters. I like to present them in capital letters, but sometimes you see them in lowercase letters as well. That is immaterial to it. And this zero x at the beginning of it, I should mention that as well. Zero x means absolutely nothing. It is purely a note for us as human beings when we are seeing something like this that we should interpret it as hexadecimal numbers as opposed to as decimal, for example. Because we could have a valid hexadecimal string that is, and I'm going to use the zero x here just for a second, 0x, five, zero. If we saw that without a 0x in front of it, we might read it as 50, which would not actually be accurate, because 0x, five, zero is actually 80 in decimal notation. So that 0x is really just a guide for us as human beings to say, OK, what I'm about to read here is a hexadecimal number. Let's just do a quick exercise where we translate some binary into hexadecimal and then subsequently into decimal as well. 
And so here, we have eight bits, each of which again is a zero or a one, and our goal is to translate this into ultimately decimal, but let's start by translating it into hexadecimal. The first step is this: counting from right to left, we want to split these into groups of four. It so happens that we have eight bits here, and so this splits pretty cleanly into two groups of four. But if we, for example, had seven bits, like if this wasn't here, we would start by having one zero one zero, and then whatever we had left over, we would just pad with extra zeros at the front so we always had a cluster of four bits at a time to work with. Each of these maps directly to a single hexadecimal digit. And sometimes you may be able to just quickly do this in your head, or you can jump back to the table that we had here to see when I see this particular pattern, I want to plug in this hexadecimal digit. And so if we do that here, we see that the one on the left, 0010, this is in binary again. A zero in the ones place, a one in the twos place, and nothing else, which means we have one times two. And so this would be a two. And 1010, well, that's a one in the eights place and a one in the twos place, which is 10. But in hexadecimal, we would represent that as a, because again, we need to confine this idea of 10 to a single place value. We can't have two digits to represent it using hexadecimal notation. And so this binary value, 00101010, is 0x2a, again following the human convention of prepending a 0x to anything that is a hexadecimal number. Now, how do we translate this to decimal? Well, it may help to think about how we translate this or understand this number, 123. When we see it, one two three just written out, we are really doing something like this in our head where we're saying, there's a one in the one hundreds place, there's a two in the tens place, and there's a three in the ones place. 
And we've just over time internalized that and have been able to very quickly understand that the number I'm talking about here is 123. Well, another way to think about these labels here, one hundreds place, tens place, and ones place, might be to say, we have the 10 squareds place or the 10 to the second powers place, the 10 to the first powers place, and the ten to the zero powers place. Any number to the zero power is always one, and so this is really the ones place, the tens place, and the hundreds place. With hexadecimal, we don't have 10 as the base of the exponent here. Instead, we have 16 as the base of the exponent. But the rules are the same. We have a 16 to the zero place which is one. We have 16 to the first power or 16s place, and we have a 16 squared or 256s place. In our example number here, we didn't go that high. We had 0x2a. We only had two digits, which means we really only needed these two place values, the 16 to the zero power and the 16 to the one power. Now, we just translate this in exactly the same way that we would intuitively do it in when we're counting in decimal or reading a decimal number. This is zero times 16 squared plus two times 16 to the first power plus a times one, or 16 to the zero power. Two times 16 is 32, and a, which again is hexadecimal's way of representing 10, 10 times one is ten, so what we're really saying is that we have 32 plus 10. And so to translate this hexadecimal number, 0x2a, into decimal, we end up with 42, because 42 is 32 plus 10. So hopefully, that gives you a bit of a better understanding of what these cryptic number strings that you might have seen before mean. And if you're working with programmers or you're ever analyzing source code and you see references like this, hopefully this gives you a better understanding of what they mean and what they likely refer to on the system and how that might affect things. 
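The exercise we just did by hand can be written out as a short Python sketch. The function names `binary_to_hex` and `hex_to_decimal` are made up for this example, and Python's built-in `int` and `hex` already do the same job; the point here is only to mirror the steps: pad to groups of four bits, map each group to one hex digit, then multiply each digit by its power of 16.

```python
HEX_DIGITS = "0123456789abcdef"

def binary_to_hex(bits):
    """Turn a string of zeros and ones into a 0x-prefixed hexadecimal string."""
    # Pad with zeros at the front so the length is a multiple of four.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    # Split into clusters of four bits, left to right.
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    # Each four-bit cluster maps to exactly one hex digit.
    return "0x" + "".join(HEX_DIGITS[int(group, 2)] for group in groups)

def hex_to_decimal(text):
    """Turn a hexadecimal string, with or without 0x, into a decimal integer."""
    digits = text.lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    total = 0
    # The rightmost digit is the 16-to-the-zero place, the next is 16 to the first, and so on.
    for power, digit in enumerate(reversed(digits)):
        total += HEX_DIGITS.index(digit) * 16 ** power
    return total

print(binary_to_hex("00101010"))  # 0x2a
print(hex_to_decimal("0x2a"))     # 42
print(hex_to_decimal("0x50"))     # 80
```

A seven-bit input such as 0101010 gets one zero padded onto the front and comes out as the same 0x2a.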
Let's talk a little bit more about the function, how memory actually works now that we know how to access individual parts of it. With the exception of hard disk space, so again, the permanent storage space on your device, memory on your computer is termed volatile, which means two different things. One, that the memory is constantly changing. Things are cycling in and out of it. It's very dynamic in terms of the values that are being stored there, again because the RAM is sort of this holding ground for everything that's going to eventually need to go to the processor, and things are getting swapped in and out pretty frequently. But the other really key detail about volatile memory is that it requires power. If it is unpowered, if there is not electricity literally flowing to the RAM at any given time, that is a problem and that memory will no longer work. Remember, an electrical charge is used to maintain each of those individual cells of memory. A little bit of electricity being one, and the absence of electricity being zero, is how the computer can store this idea of zeros and ones in a physical form. Without power, after a pretty small amount of time, like 30 seconds to a minute perhaps, that electrical charge dissipates. It does not just stay. It goes away. And the state is eventually lost such that unpowered for about a minute or so, all the data in RAM has effectively turned into zeros. It has become completely unpowered. Now obviously, that would be very bad if our entire system relied on this technology. But it's only RAM and the caches from RAM going forward that rely on this. Processing can only happen in the processor. This probably makes a little bit of sense. And again, recall that a 32-bit processor can understand 32-bit addresses. That also means that it only has 32 bits of space in which to do anything. 
So it only can work with four bytes of information at a time. And maybe if you have a computer that has multiple cores, maybe you've heard that term before, multicore processors, you might have a few of these processors that can do four bytes at a time. But either way, we're still talking about a very, very small amount of information, maybe four to 16 or 32 bytes. That's not very much at all when you consider that a basic document perhaps using Microsoft Word will contain enough metadata to be about 15,000 bytes before you even type a single character into it. So a lot of metadata there, and even empty files get pretty big pretty quickly. Because any given processor can only process 32 bits worth of information at a time, we need to move data to it frequently. And that's what the caches are for, and that's why each one needs to be faster and be able to get information to the processor pretty quickly. Because even though the processor can only process four bytes or 32 bits worth of information at any given time, it can do two to three billion operations per second, so that's what a gigahertz is. And in terms of when a processor's speed is quoted, it's sometimes said it's like 2.4 gigahertz or 2.6 gigahertz or so on. That means that the computer can do 2.4 to 2.6 billion things per second. So again, 32 bits, not a lot of information at any instant, but there's a lot of those instants within a second. It can do two to three billion things per second, each one of those things operating on exactly four bytes at a time, 32 bits at a time, on a 32-bit processor, as opposed to a 64-bit processor which can process a little bit more data. Let's take a look now at what we term on your computer the motherboard, or sort of the control processor for everything that your computer does, and highlight some of the different pieces of where things live on your physical device. 
So right here are some slots for RAM, so these are basically sticks that get plugged in. A RAM stick is just a green chip. It looks similar to the motherboard. They're usually green. They have some gold connector pins at the bottom of them, and they plug into the motherboard. And information can then be stored there and flow to and from when needed by the processor and so on. So that's where these go. This particular motherboard is from a computer that's about 15 years old. For example, I don't think most of us have floppy drive connectors on our computers anymore, but this one still does. Here is where the CPU would live, so this is where the actual processor goes. And that processor again can only do 32 or 64 bits worth of information at any given time. And on top of the CPU, it's not pictured here, but typically on top of the CPU there's a giant fan, literally like mounted or screwed right above it. And again, that's because the computer is doing two to three billion things a second, so it gets quite hot. And to prevent a CPU meltdown or a core meltdown, you want to make sure to have air constantly flowing across the top of the device as well as a heat sink to pull all the heat away from the CPU such that it doesn't overheat, which would create quite a big problem and eventually might result in computer breakage if left to overheat for a prolonged period of time. Over here is a graphics processor. Graphics processors are really just CPUs that are specialized to do certain operations that make interpreting graphics on your monitor much easier. The math for those is usually a bit more complicated, and so modern devices may have both a CPU and a GPU, a Graphics Processing Unit, as opposed to relying on just the CPU to handle all of those different things. And it similarly would have a heat sink and a fan mounted with it as well. And then over here at the top, it's pretty small. There are things called SATA connectors. 
SATA connectors are what you might use to connect hard drives to your machine so that you can extend the storage capacity of the device. But all of these things might live on your computer, and also all of these things in shrunk down form will live on your laptop and even in your mobile phone. This basic idea exists just in smaller and smaller scales with all of the parts being similarly scaled down. So again, CPU memory, what actually lives in the CPU as well as the registers, those really fast things right around the CPU memory, is the fastest memory on your machine. But there's the least of it. And the reason for this is that it's very, very expensive. It is the most expensive stuff in your computer. That is basically the price that you are paying when you buy the computer is for that processor and the materials that are used to allow electricity to conduct through it very quickly really determines the cost of the device. So there's the least amount of it, but it is the most important memory on your machine. The caches, one two and three, are each successively slower than CPU memory but also successively cheaper. So your l1 cache is going to be a little bit slower than your CPU, but there will be a little bit more of it. And your l1 cache will be a little bit larger than the CPU space that you have, but it'll be a little bit cheaper. The l2 cache may be a little bit larger than the l1 cache but a little bit cheaper. Again, this is really just referring to the materials that are used to make the memory operational. RAM is slower but cheaper. RAM typically used to be the most expensive or be considered the driving cost. If you had more RAM in your computer, that made it more powerful. That was the cost driver. This is becoming less and less the case. It's still more expensive than hard disk space, which is effectively free at this point. It's really just how much stuff we can literally fit into the container for the hard disk itself, which is just pure storage. 
But RAM is slower memory than any of the caches, but you're able to have more of it because it is less expensive. So that's memory. But in terms of hard disk space, that does not work in the same way that RAM and the other volatile memories work, and hard disk space is non-volatile. Information in the hard disk is not changed terribly often, only when we're certain that we're done working with it in RAM. And the data there is also persistent, and that's because it does not rely on electricity to store state. Instead, and we're talking again specifically now about hard disk drive, solid state drives behave a little bit differently. They use microchips that do some different things. But we're talking about hard disk space, HDDs, traditional hard disks. Each cell of a hard disk is instead controlled by magnetism, so data is stored magnetically. If there is a-- we'll just say for purposes of this discussion here that if the magnetism is in a down position, so south for example, it's oriented south, that would be zero. That's a way to represent zero. And any magnet that is in the up position is one, so we can have these flip states of the polarity is pointing up or north and the polarity is pointing down or south to represent zero and one as opposed to using powered versus unpowered to represent one and zero, respectively in a RAM or volatile memory situation. Because these magnets, though, don't require power in order to work long term, that means that when the computer shuts off and they become unpowered, the data remains. And this is a really good thing, right? Because if every time we shut off our computer we lost literally all of the files we'd ever saved on it, that would not be very effective. We would lose a lot of the utility that we rely on computers for. And so the way that hard disks work is specifically designed such that memory can persist after the computer is shut off. But again, that memory can not be processed directly in the hard disk. 
We have to move it to the processor eventually. So if our system detects that we need a chunk of memory from the hard disk, that's all going to be moved from the hard disk to RAM using something called a bus. Much like a bus is used to move human beings from one place to another in large quantities, a bus is used to move data from one part of your machine to another in large quantities. And in fact, if you ever see a SATA connection from a hard drive to RAM using one of the SATA connectors we saw a moment ago on the slide, there's usually a long, thin strip that connects them together. That strip also forms part of the bus that is used to transfer data from the hard drive to the RAM in fairly large quantities. In general, when we're working on a program, the data for that program including the code that actually is running is moved from hard disk to RAM. And it stays in RAM, assuming there's no space constraint that forces it to have to leave which sometimes can happen if you're running a lot of programs at once. You may notice your computer slows down quite a lot. That's because the computer is going to have to keep swapping things in and out of RAM in order to process multiple things. That's why you don't want to leave several hundred tabs open, for example in your browser, or have 20 or 30 programs running at once on your computer if you can avoid it, because it's going to slow down and require things to be swapped in and out of RAM such that it can be moved to the processor quite a bit. That's really going to slow things down. While the program is running or being used by the computer, everything will stay in RAM. 
All the data will keep being manipulated there, and then ultimately when we close the program or once we otherwise indicate we haven't used it for some time and the computer realizes it needs that space for something else, all of those bits and bytes that have been manipulated in RAM will be sort of picked up and moved back on the bus back to a hard disk where they will be resaved with the new state, such that any changes that you make in a program will ultimately be saved back to hard disk, but only once the program is completely done being used by the computer and it realizes it can free up that information and save it for long term storage. Hard drives, though, are not unbreakable. They have a lot of moving pieces. A typical hard disk drive consists of several platters, some thin metal circles spinning around a central axis very rapidly, about 4,000 to 5,000 revolutions per minute. So very, very quickly, with a magnetic read write arm that extends out across the disk, basically. And each one of the little rings that gets formed as you do this, as the read write arm moves in and out, lets it access different sectors on the disk, and those different sectors are the things that get zeroed and oned over time. So it is possible for hard drives to fail. There's usually a couple ways that this happens. If the read write arm jams, because it is on some sort of track that moves in and out, if it jams without collapsing, your hard drive will just stop working, basically, because you can't read or write information anymore using that arm. But it is also possible for the hard disk arm to break and fall. That arm sits just above the top of these spinning disks, and if it crashes into them, you'll hear that sound. That'll be a very unique and interesting sound to hear. 
Suffice it to say, your hard drive at that point is destroyed, because the collapse will crash everything, and these things are spinning very, very quickly, and so they're going to shred themselves from the inside. And you will no longer be able to get any data off of that drive. But if it's just the arm that gets stuck moving in and out but it doesn't fall down, you will still be able to recover data from that hard drive, and we'll talk about that shortly. Because a hard drive failure does not mean that the data is unrecoverable if the hard drive hasn't literally suffered this catastrophic shredding sort of thing that happens. That's going to render it unusable. But if it's just the arm that gets stuck, it is still usable. So what happens when we actually delete something on our machine? It turns out that overwriting hard disk space is actually a very, very time consuming and what we might consider computationally expensive operation for the machine. You could think about it as it has to pull all of the data from the hard disk into RAM, change all of those bytes to delete what was there before, and then put all of that data back. The computer for some large files, say you want to delete a video file like a movie, that might be several gigabytes, so several billion bytes worth of data that we have to delete. The computer does not want to incur that sort of cost. Deleting a file if it actually had to do it that way would be very, very slow. It would compromise any other program that you had running on your machine. And so that's not how computers actually delete information. Rather, they just forget where the data live. It turns out there's also something called a page file that exists on your machine that is basically the home address of the first byte of every single file that you have on your machine. And when you delete a file typically in your computer, it just forgets where it lives. The bytes that made it up are still there. 
The zeros and ones that comprise that file don't go anywhere. They may eventually be overwritten by some other file that happens to be stored in that same spot, because the computer now thinks it's open because it forgot that you live there. And even then, this only happens when you empty your recycle bin or trash if you're using a Mac. If you just put something in the recycle bin, that's not actually deleting it in any meaningful way at all. It hides the icon. You can't really click on that icon anymore, but you haven't deleted that file, and you probably know this because you can restore things from the recycle bin. But even when you empty the recycle bin or empty the trash on your machine, you're still not actually deleting anything in the sense that you might be thinking is how we delete things. Instead, your computer's just forgetting what was there before. But those bits and bytes that comprise those files that you have deleted are still there, and that creates a couple of really interesting security implications. So files that get deleted aren't really deleted, which means that we can recover the information from them if we need to. How exactly might we do that? Well, there's definitely some tools out there that can be used to do this. And again, this requires that the hard drive was not physically destroyed in some way by the collapse of the read write arm. But we can literally just connect the hard drive to something and have a specialized tool that reads over all of those individual sectors on the disk-- and this is a very slow operation for sure-- read over all of the individual sectors on that disk and just say, well, this is a zero and this is a one and this is a zero and this is a one until we end up with this huge file that is all the zeros and ones that comprised what was originally the state of that hard drive. And we usually refer to this file that gets created, this clone of the hard drive, as a forensic image.
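To make that concrete, here is a minimal sketch in Python of the idea that deleting only forgets where a file lives. Everything in it is hypothetical and made up for illustration: the `ToyDisk` class, its `file_table` dictionary, and the 16-byte sector size are assumptions, not how any real file system is laid out.

```python
SECTOR_SIZE = 16  # bytes per sector in this toy model (an assumption for illustration)


class ToyDisk:
    """A pretend hard drive: raw bytes plus a table of where each file starts."""

    def __init__(self, num_sectors):
        self.sectors = bytearray(num_sectors * SECTOR_SIZE)
        self.file_table = {}  # file name -> (start_sector, length_in_bytes)
        self.next_free = 0    # next unused sector

    def write_file(self, name, data):
        start = self.next_free
        offset = start * SECTOR_SIZE
        if offset + len(data) > len(self.sectors):
            raise ValueError("toy disk is full")
        self.sectors[offset:offset + len(data)] = data
        # Round up to whole sectors so the next file starts on a boundary.
        self.next_free += -(-len(data) // SECTOR_SIZE)
        self.file_table[name] = (start, len(data))

    def delete_file(self, name):
        # "Deleting" only forgets the address; the bytes are left untouched.
        del self.file_table[name]

    def forensic_image(self):
        # Read every sector in order, whether or not any file claims it.
        image = bytearray()
        for i in range(len(self.sectors) // SECTOR_SIZE):
            image += self.sectors[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]
        return bytes(image)


disk = ToyDisk(num_sectors=8)
disk.write_file("memo.txt", b"privileged client memo")
disk.delete_file("memo.txt")
image = disk.forensic_image()
print("memo.txt" in disk.file_table)        # the table has forgotten the file
print(b"privileged client memo" in image)   # but the bytes are still on the disk
```

After the delete, the table no longer knows about `memo.txt`, yet the sector-by-sector read still turns up its contents. What `forensic_image` returns here stands in for the forensic image.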
It's really just a huge file that is a complete replication of the bit by bit content as well as any metadata that might be associated with it that can be then created and read on a different computer, so that even if the computer the hard drive was plugged into got destroyed, we can make a copy of it and read it on a different machine instead. So we go from this to how do people pick out what those files were? Again, computers only understand zeros and ones and at the end of the day, all of the stuff that is stored in your hard drive, all those files, anything that was stored in RAM when it was powered, is still just zeros and ones. They don't have icons like we see on our desktop. They don't mean anything intuitively. So how do we figure out what those files are? Well, it turns out that many of them have what is called a signature or a magic number associated with them. A magic number is just a way to refer to the first few bytes of a file where many file types, for example, PDFs, most image files, most music file types and so on, happen to start in a particular way. This isn't a way that we ever see when we open one of these files. But in the metadata at the beginning of those files, there's usually a sequence of bytes that represent a signature in effect of saying, the file that I'm about to open is a PDF, and you can generally rely on that because these first four bytes or whatever are these values. Now again, it's four to eight bytes, which means there are two to the 32 to two to the 64 possibilities for what these first bits are. That's a lot of different combinations. And so if we see a magic number randomly appear in some forensic image or on some hard drive, the odds are pretty good that if we see that pattern, we know that that pattern generally refers to a file of that type, that what we have found is the beginning of a file of exactly that type.
And we can start to interpret it in that way maybe and maybe be able to reconstruct something from it. So for example, it turns out that most PDFs have in their metadata-- and we never really see this-- the characters percent PDF at the beginning of them. And that translates into this sequence of bits using the ASCII table that we've talked about before, and we don't need to get into a lot of detail, and it translates into these hexadecimal values. And so generally, if we happen to encounter exactly this pattern of 32 bits, which we should only expect to see at the beginning of a PDF or otherwise by chance about once in every two to the 32nd random sequences of 32 bits-- like it's pretty uncommon to see exactly this pattern and we're looking for exactly that pattern. If we see those bits, generally what we can do is start to interpret the rest of this file as a PDF until we encounter some signature that we've reached the end of that. Whether that's a whole bunch of zeros or whether that's a signature that is again perhaps the start of another PDF. Now, of course it's possible that you'll end up with a false positive. For example, anybody who's examining these slides at some point in the future-- say that my hard drive crashed and I happen to literally have the characters percent PDF typed on to this slide. If you were to forensically recover my hard drive and analyze it and you found this PowerPoint file that is where I'm presenting the slides from and you saw literally the characters percent PDF in it as zeros and ones, you might mistakenly think, this happens to be a PDF and start to interpret from this point forward, this yellow point forward as a PDF. But it wouldn't work. And that's OK. You might get a false positive sometimes, and then you just kind of disregard it and you keep looking. You look for a different type of file. You look for a different file signature and so on.
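A small sketch of that search, in Python, might look like the following. The function name `find_signature` and the sample bytes are made up for illustration; the only fact taken from the discussion above is that a PDF begins with the four ASCII characters percent PDF, which are the bytes 25 50 44 46 in hexadecimal.

```python
PDF_MAGIC = b"%PDF"  # the four signature bytes: 0x25 0x50 0x44 0x46


def find_signature(image, magic=PDF_MAGIC):
    """Return every offset in the image where the signature bytes appear."""
    offsets = []
    pos = image.find(magic)
    while pos != -1:
        offsets.append(pos)
        pos = image.find(magic, pos + 1)
    return offsets


# A made-up "forensic image": empty space, the start of something that looks
# like a PDF, more empty space, then ordinary text that happens to contain
# the same four characters.
header = b"\x00" * 10
real_pdf = b"%PDF-1.7 fake body"
gap = b"\x00" * 5
slide = b"slide text: %PDF is a signature"
sample = header + real_pdf + gap + slide

hits = find_signature(sample)
print(PDF_MAGIC.hex())  # 25504446
print(hits)
```

The scan reports two hits. The first is the start of the PDF-like data; the second lands in the middle of the slide text, and nothing in the bits themselves says which one is real. That second hit is exactly the kind of false positive just described.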
But it can happen that you have a false positive like this in situations where you're trying to sort it out, because you have no other context clues. All you have are the bits and the information that you know about file signatures. OK, so we have this empty trash or empty recycle bin icon or menu option on our computers. But now we know it doesn't actually empty the trash at all. So how do we actually delete files from our hard drives as opposed to just having our hard drives forget or our systems forget where on the hard drive that file lived? We probably want to do that at some point, get rid of the data on our machines. How exactly can we go about that? Well, there's actually relatively few ways to actually delete this data. The first of which we've already kind of discussed, which is physically destroying the hard drive. There are services out there that will shred your hard drives for you. If your read write arm breaks in a catastrophic way, your read write arm will shred the device for you itself. That's one way to ensure that your data is protected or deleted is to make it absolutely impossible to recover information from it by physical destruction. You can use a tool called a degausser. A degausser is really just a very strong magnet that you hold over the device for a period of time. It will also usually cause some sort of physical damage, because it's also going to mess up some of the metal that is inside the machine that is not storing data but is just structural metal. So usually a degausser will not only wipe out information by setting all of the bits, flipping the polarity of all the bits from south to north or something like that, but it will also usually cause some sort of mechanical wear just based on the strength of that magnet. But then we have this thing Secure Empty Trash. We saw this in the menu a second ago. What do you think Secure Empty Trash might do?
Well, one thing that you might think is that it would overwrite the data with random bits, and you would be correct. That's what Secure Empty Trash does. So instead of just deleting information from the hard drive by forgetting where it lives, instead we actually go to that spot. And instead of writing all zeros or all ones, we just write random bits over it. But it turns out that this is actually not good enough to delete information on a single pass. But a single pass is actually what Secure Empty Trash does. It only makes one pass through, randomly setting each bit of that file to a one or a zero. But it turns out, and the physics of this is a little bit beyond me, but it turns out that when the polarity of a magnet on a hard drive is flipped from zero to one, there's actually sort of this lingering halo effect that it leaves behind so that you can tell that this bit is a one now, but it used to be a zero. And that effect lingers for a little while. But if you keep changing it multiple times over and over, eventually that effect gets lost. So you can tell what bits-- imagine every bit was a one after you make one pass through it. All of those things that were ones before, their polarity didn't flip. There's no halo effect. But everything that used to be zero and is now a one has this slight signature left behind that says, this used to be a zero. And a good forensic analyst is able to take a look at that. As it reads, it can read the polarity of the magnet and see that it's slightly not exactly zero and not exactly one and say, OK. Well this bit probably used to be the opposite. And so even making one random pass across a hard drive is not enough to definitely securely erase the data on it. You actually have to make it's considered to be seven passes is the industry standard to make sure that enough randomness has affected each of the individual magnets such that you can't tell what was there before. 
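As a rough sketch of what a multiple-pass overwrite looks like in Python, consider the following. The function name `overwrite_file` and the chunk size are assumptions for illustration, and the default of seven passes simply mirrors the figure quoted above. This is not a vetted secure-erase tool: on solid state drives and on some file systems, writing to a file "in place" may not land on the same physical spot, so the old data can survive.

```python
import os

CHUNK = 1024 * 1024  # write random data one megabyte at a time (arbitrary choice)


def overwrite_file(path, passes=7):
    """Overwrite a file's contents with random bytes, several times over."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:  # open for update without truncating
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(CHUNK, remaining)
                f.write(os.urandom(n))  # random bits, not all zeros or all ones
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the operating system to push this pass to disk
    return size
```

A single call such as `overwrite_file("old_client_notes.txt")`, where the file name is hypothetical, would make seven passes over that one file. Only after the overwrite would you then delete the file in the usual way.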
So to truly securely erase the hard drive and preserve it in a state where you can actually use it, you need to use-- and there are software tools that do this-- a tool that will overwrite the drive randomly multiple times to eliminate any of that lingering halo effect. But Secure Empty Trash does not do that. It only makes a single pass over the drive. So enough to cover it up for undiscerning eyes, but experts who study this and work with this kind of data regularly might still be able to figure out what the original data was if just a single pass is made. So why is this important? Well, there's two reasons. One, as attorneys, we want to make sure that we are doing everything we can to protect our clients' data. And also as we're working with those who may be less technically inclined, it's important for us as part of our competent representation of clients to inform them as best we can about the technology implications of some of the things they do from a legal perspective. And so if you're working in a large firm environment or as an in-house counsel, it's probably not going to fall to you as an attorney to develop some sort of protocol for establishing the best practices for working with client data. But it is really useful to understand what these protocols are and how you might be able to contribute to a conversation about making these protocols more robust. Here are some basic strategies that you can use as an attorney to protect your own client data but also to advise clients so that they can protect their data for their clients and so on. So the first one is quite easy, and that is to encrypt your hard drive. So we talked about encryption previously, but you can also encrypt your own hard drive such that when your computer turns on, you need to enter a password. It's again similar to this public private key idea that we've previously discussed.
You need to type in this password in order for your entire hard drive to be unencrypted such that you can then read the data on it. Most operating systems now provide tools that are built into the operating system itself so that you can do this. So there's really no excuse not to do it. It is a very easy, straightforward and simple way to take a pretty strong step at protecting the data on your machine easily. Again, this usually requires a password. Typically it'll be after you turn your computer on before the operating system itself loads, the operating system being one of the few things that is not encrypted such that it can then open the files and unencrypt everything and so on. But it will not proceed past the operating system load point until that password is provided. But do be careful, because some of these systems, particularly the more advanced ones, after a certain number of incorrect guesses will begin to securely wipe your hard drive using multiple passes of zeros and ones. And so if you think there's a danger that you might forget your master password so to speak for this hard drive encryption, you might want to keep something somewhere to remind you. I wouldn't recommend like sticking a sticky note on the monitor or anything like that, but have some sort of way to remember that password in the event that you might forget it, because you might lose data if you guess wrong too many times depending on which hard drive encryption tool you are using. Another relatively easy thing to do is to avoid using insecure wireless networks. These are generally not as common anymore. Most people have wireless networks that require a password, and usually wireless networks that require a password will then have encryption for that individual making the connection on the system on the network. 
But unsecured networks do provide opportunities for those listening using tools that are called packet sniffers, which are literally just listening and gathering data on all of the packets of information that are being transmitted over the internet in the vicinity of the unsecured wireless network. And so you might see-- this is a screenshot of a tool called Wireshark, and it's a little blurry. There's not a lot of relevant information here. But on an unsecured network, it is possible to read all of the bytes and bits that are flowing through, translate them into their ASCII equivalents, and realize that this person is providing a username and password and an action logging in. And so anybody who is able to then take this information and see what IP address it came from-- and we'll talk about IP addresses shortly as well-- or where it was going to might be able to use that data to log in as that person, which would definitely not be a good thing at all. One way to get around this if you find yourself in a situation where you need to connect to the internet to do work or for whatever reason you need to be connected to the internet even if you're not sure about the quality of the network is to rely on private or work provided VPN services. VPN is a virtual private network, and it provides a way to connect to a trusted encrypted network, have that network act as you, effectively providing encryption services for your web traffic even if you're not sure that your traffic itself is encrypted. So VPNs are available at most businesses or also available online. Relatively inexpensively, you can buy tools that would allow you to make use of a virtual private network. Password managers. Password managers are great. Honestly, I can tell you that I don't know most of the passwords that I use on a daily basis because I rely on a password manager. There are several services out there-- LastPass, 1Password, and others.
Basically, the idea is the tool will generate passwords for you. You only have to remember the master password, the one password that you can use to unlock everything to open the password manager itself. And then once you're logged into the password manager, you just direct it to log in on your behalf to different services. You usually tell it this is the URL I'd like you to go to, this is the username to use, and then the secretly generated password that you don't generally know is stored in the password manager itself. Some of these tools are local to your machine. More often than not, they are starting to migrate to be cloud based services, which does introduce another interesting question of do you trust your data to be stored on the cloud as opposed to being stored on your device? And that's really a question that you should consider when you're thinking about using one of these tools. Most of these tools also have an excellent secondary effect, which is that they often provide two factor authentication support. And two factor authentication is something that we will talk about shortly as well, but it is usually something that you know, like a password or something that the password manager knows, and something you have like your cell phone, for example, that might be getting a text message with a code that you're supposed to enter as well. And the idea is that an adversary who is trying to hack into your account may know your password but won't have your phone, or may have your phone because they took it but won't know your password. And so these two factors are designed to preempt basic hacking attempts. But as I mentioned, these tools are great, but you should be skeptical of them, particularly if they are cloud based, because it is possible for bad things to happen. So for example, not too long ago, a few million users of the password manager Blur had information that was leaked online. None of this information was actually their passwords.
It was more customer related information, sort of ancillary this is their email address and some other stuff. But it hits a little close to home. And so again, always be skeptical when thinking about your own data and your clients' data. But these tools are generally more good than bad. But again, the decision of whether to use these tools really does ultimately fall to you having done research into them, seeing whether or not they make sense for you, whether you want to take advantage of the advantages that they offer. If you're not going to use a password manager, you should at least be sure to use complex passwords and certainly make sure to avoid using the same password for multiple services unless it's like a throw away password that you use on things that you don't care about. But you want to definitely avoid using the same password on important services. So like your Gmail account or any client log in related information that you have, or anything banking. You want to use different passwords for all of those things. Passwords that have less than eight characters or less than or equal to eight characters, you should effectively consider to have been broken and hacked already. Those are not secure. Computers are definitely powerful enough nowadays that they can be brute forced in a relatively short amount of time. We're still talking maybe days here for an eight character password, but that is not that much of an effort. Passwords should be at least 12 characters now for sure. You should definitely have a mix of uppercase, lowercase letters, numbers, symbols, anything like that. But anything that is less than or equal to eight characters should definitely be considered to be effectively hacked already. And if it hasn't been hacked already, certainly it is capable of being hacked very easily by anybody who wants to put in the effort to do so. You should also change your passwords as frequently as you can.
For example, I have a bank that requires me to change my password every 90 days in order to continue to use their online banking services. And on the one hand, yes, you may find that kind of annoying. But on the other hand, it's good to keep things changing so that you're never having a password get stale and potentially then leaving it vulnerable, especially if it's the password that you may have used on multiple services in the past. It's a good thing to keep in mind, especially if you don't have that many passwords that you need to maintain to change them as frequently as you're able to. Creating backups. Creating backups of information is really important, because sometimes things will go wrong that you don't expect, like maybe your hard drive will suffer some sort of catastrophic mechanical failure and you wouldn't otherwise have a way to get that information back. So periodically backing your data up protects you in the event of hardware failure or in the event of some sort of ransomware attack where an adversary breaks into your network, your office's network for example, and doesn't take any data away but encrypts it using their own public and private key such that there's no way for you to read that information until you usually pay them some ransom, which is usually money or something like that or bitcoin or the like. So you should back your data up pretty regularly. You can back it up in the cloud using cloud based document storage services. You can also just back it up on paper in certain situations as well. But definitely back it up to non network connected machines, so a computer that you have that is never connected to the internet and is primarily used just for its hard drive space, basically. Or to flash drives or CD ROMS if you're still using that technology. Just have some offline way to access important data in the event that something goes really, really wrong. Also, have an archival plan for data. You don't need to keep data around forever. 
We oftentimes think that because we're living in this digital age that everything we do persists forever and needs to persist forever and is tracked. But that's not entirely true, particularly if we are proactive in doing our part to archive or delete data when we no longer need it. Particularly when you're considering client data, it is important to develop a consistent plan for when you are done working with that data. So for example, it may be the case that in your firm after three years of no longer having any matters related to that client, it is just your office's policy to delete that client's data. And that might mean transferring other data that might be on a shared disk with them off of it and literally going through the process of either destroying the drive or doing the multiple passes over the drive using zeros and ones randomly just to obscure that data, because having that policy of not keeping things forever generally protects you and protects your clients if that data is no longer needed. Also, make talking about data security a priority. I know it's not exactly the buzziest conversation to have around the water cooler, but a lot of people are not as thoughtful about technology as you may be, taking this course. And it may be a shock to them to realize that when they delete a file on their machine, it doesn't actually do anything, basically. It just forgets that information, but that information still lives on. You don't have to be a tech expert to educate others. Particularly as someone who's coming into it with maybe a bit more of a leg up in understanding technology, speaking to individuals who may not know anything about what this technology is, you can really do yourself and your colleagues and your clients a service by making this part of a typical conversation. Share your knowledge with others in your office and in your field. And finally, think about establishing a compliance protocol.
A lot of these things that I've just described are very, very easy to set up at the outset. It is not difficult to say, I'm going to change all my passwords, and I'm going to use this password manager, and I'm going to write this policy for deleting information and archiving information periodically. The problem is that it becomes over time something that we forget to do. And having regular periods of having someone designated to make sure that these policies are being followed is really important, as we'll see shortly when we talk about some of the ABA ethical requirements for lawyers dealing with technology. You want to make sure that if you establish some of these ground rules for working with data, that you continue to follow these rules as you work with this data for the months and years and so on going forward as opposed to just doing it once and forgetting about it. Because technology is not static. It's going to continue to advance, and we need to stay ahead of that as attorneys. It's part of our obligation to really understand this technology, stay current with any changes, and adapt and change our policies accordingly so that we're always staying as close to the cutting edge as we possibly can. I really encourage you to volunteer with the compliance team. You may have a compliance team, particularly if you are at a large office or in-house counsel setting, who is tasked with developing these technological policies. And even if you don't feel like you want to advise on new avenues to pursue or new policies to initiate, you still should be part of that conversation. You do bring something valuable to the conversation just having the knowledge that you have from a course like this and should be part of this conversation so that you can contribute to it more in the future as well. 
I'd like to conclude our discussion today about security by drawing your attention to two really important ABA ethical decisions that relate to lawyers and technology and what lawyers should do in the event of a data breach at their office. And let's start by taking a look at formal opinion 477R which was released by the ABA in May of 2017. This opinion deals with attorneys' obligations with respect to technical know how. So it is now considered part of competent representation for an attorney to be considerate of the technological implications of what they do in their office. What does it mean to store documents? What does it mean to secure communications with clients? It is incumbent upon us as lawyers to stay abreast of these developments and really be informed about them and inform our clients about the ramifications of some of these new technological advancements. It also formalizes the requirement of offices and firms to have a compliance protocol. What do you do when you receive client data? Now, this opinion came out in 2017. It replaced something from 1999, which at the time the previous ABA opinion stated that all communications, including unsecured unencrypted email, were generally considered quote unquote secured. Obviously, I think we can agree that is not the case anymore and certainly the ABA agrees that is not the case anymore. That's because we've transitioned from a time when a lot of lawyerly work was done not using the internet, not using emails. It was done using fax and paper and so on. And now we've transitioned to a mostly electronic way of providing legal services to our clients, and so our technological rules of our self-governing ethics need to evolve to account for that. 
It also brings up a very interesting question which is something just to think about going forward or discuss with others in your group of how do you reconcile a situation where you have a client who doesn't want to use secured communications or doesn't want to secure their data in working with you? How does that square with your job or your requirement as an attorney to ethically abide by this opinion and be mindful and guard clients against technological mistakes? Is it possible to provide competent representation to a client if they are unwilling to adhere to your firm's compliance protocol? It's a really interesting question that I don't have an answer to but provokes an interesting discussion about what does it mean for us to have client intake and work with clients, and what happens when the client's wishes run against our ethical obligations? That's not a novel question to lawyers. That presents itself in different ways, but via technology, do we have yet another way we might have to consider this dilemma? Subsequent to 477R, a year and a half later in October of 2018, the ABA issued formal opinion 483, which kind of is the natural follow on to 477R, which deals with what happens if a lawyer's information is breached? If there is a data breach at the firm and client data is compromised, what do you have to do? One important thing to think about here is that this opinion formalizes the notion that has sort of long been held in technological circles that there are two kinds of businesses that exist-- ones that have been hacked, and ones that will be. Not ones that might be or not ones that could be. And perhaps even these are ones that have been and they don't know it yet.
But it's just such a part of life nowadays that businesses either have been hacked or will be hacked, and that is the mindset that you should have when you are thinking about protecting client data, bringing in consultants, and hiring people to do their best work to defend your clients' data. Now, it turns out that law firms tend to be excellent targets for hackers, and the reason for that is that they have a lot of very valuable data. And unfortunately, the history is such that it is not always as well protected by law firms as it might have been by the clients themselves, because we as lawyers have not been as equipped to have a conversation about technology and how that technology might affect our representation of clients. The opinion describes a bunch of different cyber episodes, so to speak, that might comprise a data breach, which would rise to the level of needing to report to a client. These include things such as ransomware attacks, as we've discussed a little bit earlier today, systems attacks that might break or somehow damage the infrastructure of the firm or workplace, as well as exfiltrations, which are probably the worst kind of breach, which is someone hacks into your system and is able to remove data such that you may not even have a copy of that data anymore, and that's why having backups is so important, but removes that data from your servers, for example, to the adversary's servers. There is no ethical violation in being hacked. It's really important to make that very clear. The ethical violation occurs when reasonable efforts are not made to protect that data.
If we as attorneys are making reasonable efforts to protect our clients' data and we still get hacked, we have not necessarily done anything wrong, as long as we were doing our best to prevent that from happening in the first place and, once we detect that it has happened, we make every reasonable effort to stop an ongoing attack from continuing. This also introduces a very interesting question of what to do with former client data that has been hacked, and that's why it's really important to establish some sort of archival or deletion plan for working with that data. The ABA proposes a couple of different ways to resolve how to deal with informing a former client about information related to a hack. But one of the most important things to draw from this opinion, I would say, is that a discussion about data retention needs to be part of your firm's intake process or your intake process for dealing with new clients. Who owns what has always sort of been part of the conversation. Generally, as we know, we return client data to them when we are done working with it. How does this work in a digital context? It is really important for your intake plan at your firm to handle what happens to digital versions of client data when the representation has concluded because the matter has concluded. Speaking of concluded, that is going to wrap up our discussion today on security. This will be the first of our two discussions generally at length about security in the legal context. But hopefully you've come away from today with a better understanding of how your system works, what memory is, and why when we delete things on our hard drives, they don't actually get deleted and what some of the ramifications might be for that.
And hopefully you also have come away from this with an understanding of what to do going forward to establish best practices for working with client data and stay within the ethical guidelines proposed by the ABA, and of how to have a more technical conversation with clients about your representation of them and what happens to their data when that representation has concluded.
CS50 for Lawyers 2019 - Cybersecurity. Posted by 林宜悉 on 2020/03/28.