We're going to talk about going from Galois fields to Reed-Solomon codes.
We must be mad, Sean. Really, I mean, so many of you have said: "Just do Reed-Solomon,
y'know. You've done Hamming codes. They [Reed-Solomon] can't be that much more complex?" Oh yes they can!
The stuff we did earlier - and we've done a fair bit on Hamming codes - will, if you
remember, basically correct ... put right a single error. That all
happened in the 1960s. And in two of the videos out there, which we'll put the
links out for you, I did a thing called Multi-Dimensional Error Correction where
I had two bits of information (which was the state of the weather in San Francisco). So
there were two bits for the state of the weather - I think it was '11' for "sunny",
or something like that. But in order to protect those bits against
1-bit damage, I had to add no fewer than 3 extra bits. I had to make it into
a 5-bit code. But it was OK, because checking whether it had been damaged
was easy. It was a simple parity check. It was effectively saying: "Look, here are the
3 parity-check bits at the end." I've made the rule that overall it must be
even parity. I know that one of them says '1', another one says '0', but this [third] bit I
don't know. But it's got to be even parity overall, so if I've only got a single
'1' and a '0', I need another '1' to make that grouping be even parity.
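Just to make that concrete, here's a minimal Python sketch of the even-parity rule (illustrative only - it doesn't reproduce the exact bit groupings of the 5-bit weather code):

```python
# Even parity: the check bit is whatever makes the group sum to an even number.

def even_parity_bit(bits):
    """Return the bit that gives the whole group even parity."""
    return sum(bits) % 2

group = [1, 0]                    # the known bits: a '1' and a '0'
missing = even_parity_bit(group)  # must be another '1'
assert sum(group + [missing]) % 2 == 0
print(missing)                    # -> 1
```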
Now that's the easy error correction. How did it advance from that? What changed?
Well, we're talking about 10 years hence. I mean, Richard Hamming was very definitely
late '50s / early '60s. The next phase of this - the "big jump forward"
that we would want to do - was by Reed and Solomon, 10 years later. It was late '60s / early '70s.
Oh, and big surprise: were Reed and
Solomon at Bell Labs?! No! For a change. Everybody else seemed to be at Bell Labs,
[but] Reed and Solomon were, I think, at MIT's Lincoln Laboratory. But there
was a realization that if you wanted more powerful error correction there
were several things you could do, but the more they looked into it, the more they
found themselves having to learn about Pure Mathematics concepts called Galois
Fields. And this is finite field arithmetic, where you've got to be able
to find a multiplicative and an additive inverse. We've tried to prepare you for
this by doing ISBN book codes, which is a very simple manifestation of those two
things. In our earlier videos we thought of a code word as being a string of bits.
Some of those are information bits, and interleaved with them - or very often
parked at the right-hand end - are parity-check bits. What we're now going
to try and do is instead of just having one great long string of bits, linearly,
we're going to try and make it be almost two-dimensional. Think of every one of
these positions in your code-word now as not being a single bit but a column
of bits. Let's say it's an 8-bit column. If you're doing something like Reed-Solomon
correction in a CD context, y'know, how big is the bucket size of
these columns? I think it's something like 40-odd bits, but then even that
couldn't cope. They had, like, two layers of Reed-Solomon encoding - one backing up
the other. So, if you like, instead of filling these things up linearly - going
along like that, and then at the very far end you put a few parity-check bits - what
we now do is declare that every one of these positions isn't a *bit* position,
it's a *symbol* position. But a symbol can be multiple bits, OK? For the sake of
argument let's say it's an 8-bit symbol, a byte. The way they get filled up is: the
bit stream comes in and it fills up a column of 8, and then it fills up the next
column of 8, and the next column of 8. So it's almost like we've got a 2-dimensional
array of bits - symbols in that direction, but every symbol composed of 'n' bits as it goes along.
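As a rough Python sketch of that filling-up process (the stream contents here are made up, and the 8-bit symbol size is just the example from above):

```python
# Pack an incoming bit stream into 8-bit symbol "columns".

def bits_to_symbols(bits, symbol_size=8):
    """Group a flat bit stream into symbols of symbol_size bits each."""
    return [bits[i:i + symbol_size] for i in range(0, len(bits), symbol_size)]

stream = [1, 0, 1, 1, 0, 0, 1, 0,   # fills the first column of 8
          0, 1, 1, 1, 1, 0, 0, 1]   # then the next column of 8
for column in bits_to_symbols(stream):
    print(column)
```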
What's the advantage of doing that? Well, you can see one advantage straight away,
when you think about it. Hamming codes, for example, the old way, tended to presume you've got
the occasional error, now and then, wide apart. What this is anticipating - if you
can fill up symbol positions - is that you might get 'burst errors'. Yeah, you might
get - here we go - bits coming off a CD, trying to play your music.
It's encoded music. Now, if they're filling up a bucket in a column,
in some sense, and then moving on to the next one,
there is just a chance that a localized scratch will get all its 'bit-clobbering'
over and done with within two symbols, shall we say. So, we know that we can
devise codes that can detect and potentially correct a certain number of
errors; but if we can make it so that those errors are not [just] bit errors within
the symbol, but the symbol itself ... something's wrong with it! Right?
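A quick sketch of why that helps, assuming a made-up block of sixteen 8-bit symbols and a contiguous 9-bit 'scratch':

```python
# A 9-bit burst flips bits 21..29, which all land inside just two
# adjacent 8-bit symbols - so at the symbol level only 2 symbols are bad.

data = bytearray(range(16))           # 16 clean symbols: 0, 1, 2, ...
burst_start, burst_len = 21, 9
for bit in range(burst_start, burst_start + burst_len):
    data[bit // 8] ^= 1 << (bit % 8)  # flip one bit within its symbol

damaged = [i for i, b in enumerate(data) if b != i]
print(damaged)                        # -> [2, 3]: the whole burst cost us 2 symbols
```

A code that can repair any 2 damaged symbols therefore shrugs off a 9-bit burst that, viewed as scattered bit errors, would overwhelm a simple single-error Hamming scheme.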
You then might stand a chance if you've got, again, parity-check *symbols* at
the far right-hand end - not [just] parity-check bits. If they've got enough information
in them, you might be able to say: "Something went wrong. I got a burst error
there, a scratch on that CD. Can I put it right?" Yes, but it's not going to be
simple-minded parity checking; it's gonna be serious hardcore stuff. Because those
check symbols at the end will normally be arranged so that, if the information
is correct and nothing's been damaged, they'll be zeros. If something gets
damaged, the first thing you know about it is that the parity-check symbols at
the end are nonzero. You're getting [perhaps] 3, 5, 12, 15. What does that mean?
Well, the answer is: by using lots and lots of detective work. By the way,
those symbols that you put on the far right-hand end revel in the name of
"syndrome", which I think is a wonderful word. And my first thought was: what on
earth are pure mathematicians, or communications engineers, doing with
syndromes? They're medical, aren't they?! Well, I looked it up in the dictionary and, as far
as I can make out, it all makes sense. If you have a certain syndrome it means you
are exhibiting symptoms of an underlying problem.
So the grouping of symptoms that's caused by an underlying problem is often
called a syndrome. [There's] something like Guillain-Barré syndrome, isn't there? I don't know what
it is, but no doubt you'll all put me right on that. But you can see it makes
sense. The signal that something has gone wrong is that you get all of these
information bytes - or even bigger columns, multiple bytes, whatever - but right at the
end is now a "checksum from hell". You have got a syndrome - a set of remainders, if
you like - that are not zeros. Given only that information, how can you work
backwards and find out which of these columns got hit, and where in the column
it got hit? And the answer is: by using Galois Field theory over finite fields and
doing lots and lots of long divisions and additions.
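To show the flavour of that detective work without the Galois-field machinery, here's a Hamming-style syndrome over single bits (illustrative only - real Reed-Solomon syndromes are symbol-valued and need the finite-field arithmetic described below):

```python
# Hamming(7,4)-style syndrome: XOR together the 1-based positions of the
# set bits. Zero means "no symptoms"; nonzero points at the damaged position.

def syndrome(word_bits):
    s = 0
    for position, bit in enumerate(word_bits, start=1):
        if bit:
            s ^= position
    return s

codeword = [1, 1, 1, 0, 0, 0, 0]   # a valid codeword: syndrome is 0
print(syndrome(codeword))          # -> 0: nothing damaged

codeword[4] ^= 1                   # clobber position 5
print(syndrome(codeword))          # -> 5: the syndrome names the culprit
```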
So, the bottom line is that, for this to work - and particularly if you're using, as
of course we now are, powers of two - Galois said: "I can liberate you from it having
to be primes all the time; I can do powers of primes. And for all of you future
computer scientists that I'm not even aware of, you'll love this, because your
beloved 2 comes into the real world. Where you say '4', I say: don't think of it
as 2 x 2, think of it as 2 ^ 2." And so he said: "Yeah, I can do powers of
any prime, including two." But what he didn't say is: here are the rules
if you light on 2 and want to use my methods. Here's rule number 1:
everything must be done modulo 2. Not modulo your bigger number - like 256 -
that's different. Modulo 2. So, for addition, what do we know
about addition modulo 2, from Bletchley Park, Sean? What do the
mathematical logicians call 'addition modulo 2'? >> Sean: This is exclusive OR.
>> DFB: It's Exclusive OR! So, no sweat for computer scientists. All our additions of these
numbers, represented in bytes or whatever, are going to be done
with Exclusive OR.
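In Python terms, with ordinary integers standing in for bytes:

```python
# Addition modulo 2, bit position by bit position, is exclusive OR:
# there are no carries, and every element is its own additive inverse.

a, b = 0b10110100, 0b01100110
print(bin(a ^ b))          # -> 0b11010010: each bit pair added mod 2
assert (a ^ b) ^ b == a    # adding b twice cancels it out
```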
Worse still - as we found out in the earlier ISBN example - you've got to be able
to find multiplicative inverses. And that's going to lead us into doing long
divisions modulo 2 in a Galois Finite Field. Now that is a bit hair-raising, but not
too terrifying. But if you're prepared, in the end, to do all of that work, then fine:
you can use your 'syndrome' and analyze it to tell you where the errors went.
That's a lot of equation solving.
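Here's a hedged sketch of that multiply-and-invert machinery in GF(2^8), assuming the reducing polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) - a common choice for byte-sized Reed-Solomon codes, though any particular codec may use a different one:

```python
# GF(2^8): bytes are polynomials over GF(2); add with XOR, multiply
# carry-lessly, and reduce modulo the field polynomial.

def gf_mul(a, b, poly=0x11D):
    result = 0
    while b:
        if b & 1:
            result ^= a        # add (XOR) this shifted copy of a
        a <<= 1
        if a & 0x100:          # overflowed 8 bits: reduce by the polynomial
            a ^= poly
        b >>= 1
    return result

def gf_inverse(a):
    """a^254, since a^255 = 1 for every nonzero a in GF(2^8)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

x = 0x53
assert gf_mul(x, gf_inverse(x)) == 1   # multiplicative inverse found
```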
For those of you into these things, it's like solving lots of simultaneous equations, saying [for example]:
"The only logical solution to this damage is that it absolutely must be column 3 and
column 13. That's where the damage has come." Does that syndrome always work back
to giving you precisely one answer? Yes - it's just magic the way that this
error correction works. But it is complex, because the brute-force way to do it is
to set up a system of simultaneous equations; and - some of you will know this -
to solve lots of simultaneous equations you use 'matrix inversion'. That is computationally
a very heavy undertaking. And so you're looking all the time for simplifications
because, if you don't, you're sitting there, like Sean and I were in the early
'80s, thinking: "...this error correction [for CDs] is in real time! How is this thing
solving sets of matrix equations in real time to make sure I can listen to this
CD without hearing the scratches? Where's our supercomputer, free with every CD - a
Cray X-MP [in those days] - to solve your syndrome equations?" Nope! So what's the answer? You shake
your head and think: it's those hardware types! It's custom
hardware! Yes, custom hardware can fairly easily attain supercomputer capabilities
so long as it's in a tightly defined field. And it turns out, thank heavens,
that error correction in Reed-Solomon schemes lends itself very nicely indeed
to things that computer engineers love, like "shift registers" [and pipelining] and all sorts of
hardware 'specials'.
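To show why shift registers fit so naturally, here's a minimal sketch of long division modulo 2 done as a linear feedback shift register, over single bits for brevity (a systematic Reed-Solomon encoder runs the same structure over 8-bit symbols, with Galois-field multipliers on the feedback taps; the message and divisor here are made up):

```python
# LFSR division: the register holds the running remainder; each incoming
# bit causes a shift, and the feedback XORs in the divisor's lower taps.

def lfsr_check_bits(message_bits, divisor_bits):
    """Remainder of (message << degree) by the divisor, modulo 2 -
    i.e. the check bits a systematic encoder would append."""
    register = [0] * (len(divisor_bits) - 1)
    for bit in message_bits:
        feedback = register[0] ^ bit
        register = register[1:] + [0]
        if feedback:
            for i in range(len(register)):
                register[i] ^= divisor_bits[i + 1]
    return register

# Message 1011001 divided by x^3 + x + 1 (bits 1,0,1,1):
print(lfsr_check_bits([1, 0, 1, 1, 0, 0, 1], [1, 0, 1, 1]))  # -> [0, 1, 1]
```

One register, one XOR per tap, one clock per bit: exactly the kind of tightly defined job where custom hardware happily outruns a general-purpose supercomputer.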