We were talking a few weeks ago about how we can add additional processors into a computer to do specialist tasks. One of the things we talked about was floating-point processors. Now, these days they're built directly onto your CPU, but you can still get some CPUs -- some of the variants of the ARM CPU -- and certainly if you go back in history, some of the other CPUs you could get, that didn't have a floating-point unit. They could do maths, but they could only do integer maths: they could add 1 and 2, but they couldn't add 1.5 and 2.5. Well, you "could", but you had to write the software to handle the fractional part of the number, do the maths, and stick it back together in the right order. So I was wondering: what difference in speed would a floating-point unit actually make?

As I said, most computers we have these days have floating-point units of some sort built in, so I went back to one of my old ones and decided to write a program to test it. I wrote a very simple program which draws a 3D spinning cube. The program is relatively straightforward: it's got a series of eight points, which it stores a representation of; it does a series of matrix transformations on them to get them into screen coordinates; and then we draw the lines. I did this using floating-point maths, and with the program running here we can see it's reasonably quick: it takes about 0.2 of a second to calculate where all the screen coordinates need to be for each frame. It sometimes varies, but in general that's what it takes.

So I then went off onto a popular auction site beginning with the letter E and ordered myself a floating-point chip for the Falcon, inserted it into the machine, and recompiled the program, this time to use the floating-point unit. The original version is doing floating-point maths -- it's using the fractions -- but it's all being done in software, as machine code instructions for the 68030 chip in there, to calculate all those different floating-point numbers. The version compiled to actually use the floating-point unit runs about 4.5 times faster: it takes 0.045 seconds to do exactly the same calculations. This is exactly the same source code; I just recompiled it using GCC to produce a version that used the floating-point unit, and you can actually see that the graphics are slightly smoother and the time is much less.

The fact that we can speed things up by doing them in hardware perhaps isn't that surprising: there are lots of tasks where you could either implement something in software or implement it in hardware, and if you implement it in hardware it's often faster, as we saw here (there's a little sketch of that kind of timing test below). But it's actually worth thinking about what's involved in adding up floating-point numbers. Tom did a good video, right back at the beginning of Computerphile, looking at how floating-point numbers are represented, at a sort of high level -- it'll say, well, it's 0.9999999, and then after a while you'll stop. But when you get down to the level of the computer actually having to deal with them, it's quite interesting to see how they're stored, and how manipulating them -- even writing software to do something simple like adding two numbers together -- actually ends up being quite an involved task compared to adding together two binary numbers.
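A minimal sketch of that kind of timing test, assuming a plain C toolchain -- this is not the actual cube source from the video, which isn't shown; the vertex data, rotation angles, and screen scaling here are made up for illustration. It rotates the eight corners of a cube with float maths and reports the time per frame of transforms; on an m68k machine you could compile it once with GCC's -msoft-float and once with -m68881 and compare the two timings.

    #include <stdio.h>
    #include <math.h>
    #include <time.h>

    /* The eight corners of a unit cube, as in the demo described above. */
    static float cube[8][3] = {
        {-1,-1,-1}, { 1,-1,-1}, { 1, 1,-1}, {-1, 1,-1},
        {-1,-1, 1}, { 1,-1, 1}, { 1, 1, 1}, {-1, 1, 1},
    };

    int main(void)
    {
        const int frames = 1000;
        float screen[8][2];
        clock_t start = clock();

        for (int f = 0; f < frames; f++) {
            float a = f * 0.01f;                 /* rotation angle this frame */
            float c = cosf(a), s = sinf(a);
            for (int i = 0; i < 8; i++) {
                /* Rotate about the Y axis... */
                float x =  cube[i][0] * c + cube[i][2] * s;
                float z = -cube[i][0] * s + cube[i][2] * c;
                float y =  cube[i][1];
                /* ...then a simple perspective projection to screen space. */
                float d = 1.0f / (z + 4.0f);
                screen[i][0] = 160.0f + 100.0f * x * d;
                screen[i][1] = 100.0f + 100.0f * y * d;
            }
        }

        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        printf("%d frames of transforms in %f s (%f s per frame)\n",
               frames, secs, secs / frames);
        return (int)screen[0][0] & 1;   /* use the result so it isn't optimised away */
    }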
So let's look at how we add numbers together. Let's take a number -- and to save time I've printed out the bits. Let's say 42, because everyone uses that one. So we've got here the number 42, which is 101010, and then we fill the rest of the bits with zeros. That's bit 0 over on the right-hand side, then bit 1, 2, 3, and these of course go with the powers of 2 -- 2 to the 0 is 1, 2 to the 1 is 2, then 4, 8 -- just like we have powers of 10 when we do decimal numbers.

So let's add together 42 and 23. I've got another binary number here, 23, with the same bits, and we'll just basically do addition. 0 plus 1? (Sean: "Is 1.") Good, yeah. 1 plus 1 is 0, and we have to carry the 1. 0 plus 1 plus 1... okay, yeah. 1 plus 0 plus 1... so, yeah, we've added it up: this number's 42, this number's 23, 42 plus 23 is 65, and we've produced 65 in binary as the result. So adding up in binary is a relatively straightforward thing: we take each pair of bits from the right, add them together, produce a sum bit, and on occasion we also produce a carry bit, which we then add in on the next column, just like we do when we do decimal arithmetic.

And you can build systems that represent decimals -- or "bicimals" I guess they'd be called, fractional numbers -- using this. You can use a system, which is quite common -- it was used in Adobe Acrobat, and it was used on iOS for 3D graphics at one point -- called fixed-point numbers, where you say that, of say 32 bits, the top 16 bits are going to represent the integer part and the bottom 16 bits are going to represent the fractional part (there's a small sketch of this scheme below). The basic way to think about that is that you multiply every number by 65,536, which shifts everything along, and then when you want to produce the final result you divide it all by 65,536.

Now, the problem with fixed-point numbers is that they have a fixed scale -- "fixed" is the key part of the name. For example, if we use 32-bit fixed-point numbers split into 16 bits and 16 bits, that's great: we can go up to 65,000 or so on the integer part, but if we need to get to 70,000, we can't store it. Likewise, we can go down to 1/65,536, but if we need to go to 1/131,072 -- that sort of thing -- we can't, because we don't have that resolution. On occasion we need the bits down here to represent very small quantities, and on occasion we want them to represent very large quantities. For something like 3D graphics, or graphics in general, fixed-point numbers can work well; for general-purpose things, they don't work that well.

So what people tend to do is use floating-point numbers, which is, as Tom said, like writing things in scientific notation. Rather than writing 1024, we write it as 1.024 times 10 to the 3. We can do the same in binary: rather than writing 10101, we can write 1.0101 times 2 -- this time, rather than 10 -- to the (1, 2, 3, 4) 4. So we can write it as 1.0101 times 2 to the 4. What floating-point numbers do is say: okay, rather than representing numbers using a fixed number of bits for each part, we're going to represent them, effectively, in scientific notation -- as a sort of number that we're then going to multiply by 2-to-the-something to shift the point to the right place.
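A minimal sketch of that 16.16 fixed-point scheme, assuming plain C -- the type name fix16 and the helper functions are made up for illustration, not taken from Acrobat or iOS:

    #include <stdio.h>
    #include <stdint.h>

    typedef int32_t fix16;           /* 16 integer bits, 16 fraction bits */

    #define FIX_ONE (1 << 16)        /* 1.0 in 16.16 form, i.e. 65,536 */

    static fix16 fix_from_double(double d) { return (fix16)(d * FIX_ONE); }
    static double fix_to_double(fix16 f)   { return (double)f / FIX_ONE; }

    /* Addition works exactly like integer addition: the point stays put. */
    static fix16 fix_add(fix16 a, fix16 b) { return a + b; }

    /* Multiplying two scaled values multiplies the scale factors too,
     * so shift the 64-bit product back down by 16 bits. */
    static fix16 fix_mul(fix16 a, fix16 b)
    {
        return (fix16)(((int64_t)a * b) >> 16);
    }

    int main(void)
    {
        fix16 a = fix_from_double(1.5);
        fix16 b = fix_from_double(2.5);
        printf("1.5 + 2.5 = %f\n", fix_to_double(fix_add(a, b)));  /* 4.0  */
        printf("1.5 * 2.5 = %f\n", fix_to_double(fix_mul(a, b)));  /* 3.75 */
        return 0;
    }

The nice property of the scheme is that addition and subtraction are just ordinary integer operations; only multiplication and division need the extra shift.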
To make things absolutely clear, I'm going to use decimal here to write the exponent, so I'll cheat and write it as 2 to the 4 -- although of course written in binary it would be 10 to the 100. So I guess the question that remains is: how do we represent this in a computer? We've got this notation, which we can write nicely on a piece of paper, to represent a binary number multiplied by a power of 2, but how do we represent that in the computer? What we need to do is take this and find an encoding which represents it as a series of bits that the computer can then deal with.

We're going to look at 32-bit floating-point numbers, mainly because the number of digits I have to fill in becomes relatively smaller to deal with than if we did 64-bit. We could have done 16-bit-sized things too, but they all use the same scheme; it's just that the way they break it down -- how many bits are assigned to each section -- changes slightly.

So we've got our 32 bits and we need to represent this number in there, and we start off by splitting it up into a few different things. The first bit, the most significant bit in the number -- the one on the left over here -- is the sign bit, and that says whether it's a positive number, in which case it will be 0, or a negative number, in which case it will be 1. So unlike two's complement, which David's looked at in the past -- two's complement is equivalent to the one's complement with one added to it -- the sign here is represented purely as a bit: 0 means positive, 1 means negative. We just have one bit representing that.

They then say we're going to have eight bits which we're going to use to represent the exponent -- that is, which power of 2 -- which gives us 255 or so different powers of two we can use; we'll come back to how that's represented in a second. And then the rest is used to represent what's referred to as the mantissa: the remaining 23 bits of the 32 are used to represent the remaining bits of the number, which is then going to be multiplied by 2 to the power of the 8-bit exponent.

Now, they said that every single possible floating-point number you're going to write down has a 1 as its most significant digit -- except 0, and they treat 0 as a special case: to represent 0 they just set all the bits to zero. So since we know this leading digit is going to be 1, we don't need to encode it. It's always going to be 1, so these 23 bits here are actually the bits that come after the 1: the number is 1-point-something, and these are the bits that come after the point. We sort of don't encode that bit, because we know it's there.

One way to think about floating-point numbers is as a sort of lossy compression mechanism for real numbers -- fractional numbers -- because we're taking a number which has some representation and we're compressing it into these bits, and we lose some information. We'll see that in a second when we run a little demo: it can't represent all numbers, and it's surprising sometimes which numbers it can't represent and which it can. So we can then start writing numbers in this form, and to simplify things I've printed out a form like this.
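Those three fields can be pulled out of a real float's bits. A minimal sketch, assuming the usual IEEE 754 single-precision layout on the host:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    static void dissect(float f)
    {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);          /* reinterpret the float's bits */

        uint32_t sign     =  bits >> 31;         /* 1 bit                        */
        uint32_t exponent = (bits >> 23) & 0xFF; /* 8 bits, biased by 127        */
        uint32_t mantissa =  bits & 0x7FFFFF;    /* 23 bits after the hidden 1.  */

        printf("%12f  sign=%u  exponent=%3u (2^%d)  mantissa=0x%06X\n",
               f, sign, exponent, (int)exponent - 127, mantissa);
    }

    int main(void)
    {
        dissect(1.0f);    /* sign=0 exponent=127 (2^0)  mantissa=0        */
        dissect(-42.0f);  /* sign=1 exponent=132 (2^5)  mantissa=0x280000 */
        dissect(0.0f);    /* the all-zeros special case                   */
        return 0;
    }

The -42 line shows the hidden-bit trick: 42 is 1.01010 times 2 to the 5, and only the 01010 after the point is stored.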
So if you want to write out the number 1, it's 1.0000... times 2 to the power of 0 -- it's 1.00000000000000000000000 times 2 to the 0, which is 1 times 1, which is 1 -- and of course the sign bit, because it's positive, will be 0. So we can start assigning these things to the different bits: we put a 0 there because it's positive, the mantissa is all 0s so we just fill it up with zeros, and that leaves us with these 8 bits here, where we've got to represent 2 to the power of 0.

Now, they could have decided to just put 0 in there, but then the number 1 would have exactly the same bit pattern as the number 0, and they decided that's potentially not a good idea. So what they actually do is take the power -- which will go from -127 through to 127 -- and add 127 to it. Our exponent here, our power of 2, is 0, and 0 plus 127 is obviously 127, so we encode 127 into these remaining bits: 01111111. So to encode the number 1, we encode it into the binary representation: 0 for the sign bit, 01111111 -- that's 127 -- for the exponent, and then, because we know the leading 1 is already encoded, the rest becomes 0.

This is a lossy system. We can encode some numbers, but we're only encoding 24 significant bits -- where those bits sit within the number changes with the exponent, but we're still only encoding 24 significant bits. So let's just write a program that takes the number 16,777,215 -- an integer -- and adds one to it, and we'll do this in a loop: we'll add one to the result, add one again, and print out the values. So we'd think we'd get 16,777,216, then 16,777,217, and we'll do this with both an integer variable -- a 32-bit integer -- and a 32-bit float.

So, I've got the program written here on the computer. We set up a float y, we set up the variable i to be 16,777,215 -- you can check the bits in binary there -- and we set y to equal i, so they both start off with the same value. We're then going to print them out, as the decimal and the floating-point values, and I'm also going to print out the hexadecimal representations of the bits, so we can see what's going on. We're then going to add 1 to the value of y and add 1 to the value of i -- we're going to increment them both.

So let's run this program. Compiles with no mistakes -- that's always a good sign -- and let's run it. We get 16777215 and we get 16777215.000000, what we'd expect. Add one: 16,777,216, and the same there. So now we add one to it again, and for the integer value we get 16,777,217, but the float still says 16777216.000000. That's not right. Okay, so that's not right -- what's going on there?

Well, if we think about how we represent this, let's think about the number 16,777,216. That number is 1 times 2 to the 24 -- and I sort of tricked it by generating it this way at the beginning. It's a 1 with lots of zeros after it, times 2 to the 24, and we have only 23 bits to represent this part in here. If we want to add on an extra bit, we would need 24 bits here, and we've only got 23: we can't do it, so we can't represent it. If we added 2 each time, it would work fine. So actually, as we get to larger numbers, we still have the same number of significant bits -- of significant digits -- but we can't store certain values as well as we can with integers.
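The source of the demo isn't shown line by line in the video, so this is a minimal reconstruction from the description above -- the variable names i and y come from the narration; the loop structure and the hex printing details are assumptions:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        int32_t i = 16777215;   /* 2^24 - 1 */
        float   y = i;

        for (int step = 0; step < 3; step++) {
            uint32_t bits;
            memcpy(&bits, &y, sizeof bits);   /* the float's raw bit pattern */
            printf("i = %d   y = %f   y bits = 0x%08X\n", i, y, bits);
            i = i + 1;
            y = y + 1.0f;
        }
        return 0;
    }

    /* Output:
     * i = 16777215   y = 16777215.000000   y bits = 0x4B7FFFFF
     * i = 16777216   y = 16777216.000000   y bits = 0x4B800000
     * i = 16777217   y = 16777216.000000   y bits = 0x4B800000
     *                    ^ the float is stuck: 16,777,217 needs 24 bits
     *                      after the leading 1, and there are only 23
     */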
So it's a lossy compression system, basically. We can store a large range of values -- anything from minus 2 to the power of 127 through to 2 to the power of 127 -- or we can go very, very low and have numbers as small as 2 to the minus 127. But we only have a certain precision. If we deal with very, very large numbers, we've still only got 23 bits of precision, and if we deal with very small numbers, we've still only got 23 bits' worth of precision -- which is fine, we can cope with that, because often when you're dealing with big numbers you're not worried about the small, fiddly decimal places beyond your first few significant figures. If you're measuring how far it is from the Earth to Alpha Centauri in millimetres, plus or minus a few tenths of a millimetre isn't going to make much difference -- that sort of thing. So it's a compression, but it's a lossy system.

The rest of this video's going to mean writing zeros 23 times -- maybe I should have done 16-bit numbers.
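To make that precision trade-off concrete, here's a minimal sketch assuming a C99 toolchain (link with -lm): nextafterf() from <math.h> returns the adjacent representable float, so the difference shows the gap between neighbouring floats at each magnitude.

    #include <stdio.h>
    #include <math.h>

    /* Print the gap between x and the next representable float above it. */
    static void gap(float x)
    {
        float next = nextafterf(x, INFINITY);
        printf("around %15g the floats step by %g\n", x, next - x);
    }

    int main(void)
    {
        gap(0.001f);        /* tiny numbers: tiny steps                    */
        gap(1.0f);          /* step is 2^-23, about 1.19e-07               */
        gap(16777216.0f);   /* at 2^24 the step is 2: odd numbers vanish   */
        gap(1.0e20f);       /* huge numbers: steps in the trillions        */
        return 0;
    }

The step size doubles every time the number doubles, because the 23 mantissa bits carry the same number of significant figures wherever the point floats to.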