On "Computerphile" we just love provocative and mysterious titles and
carrying on from the last time we spoke let's say this is going to be a chat
about what came to be called the UNCOL problem. "Universal Computer [Oriented] Language" I think it stands for.
It was more specific than just any old computer language. It
was :" ... is there a unique intermediate language which would suit everybody?"
You know, not as high as C even and not quite down at the absolute binary level but
more like a sort of absolutely Universal Assembler - a pseudo-assembler. It's not
really implemented in hardware on any machine, but it's one that we can
all work with and every compiler in the whole world - all they would have to do is
produce the UNCOL language, if we can only agree what it is, and then every
system could talk to every other system at this agreed low level. Well, as you can
imagine, it doesn't work like that. It very soon became obvious that, yes, this
business of putting a level in there and saying: "We'll all compile to intermediate
code", is fine but when you start looking at what facilities it should have what
facilities it shouldn't have yo're up against the fact that computer hardware
designers like to do things their own way. I mean, numbers of registers? Might be
16, might be fewer that's no big deal Some of them have arranged those registers
[as pointers into] in a formal or informal stack. Others don't. Should we always assume that we
have stack capabilities? And I think somebody pointed out to me - I think it was Ron
Knott, who originally created these notes - he said: "The thing in the end that
kills you is that they all do input/output differently; there's almost
no agreement about how you do I/O". So, fairly soon, the idea of finding one
unique intermediate language had to be forgotten about.
But the idea of different intermediate languages at different levels of
sophistication really did gain traction in the 1980s. We mentioned that Steve
Bourne had his Z-code as part of his Algol 68 project, way back in the 1970s.
A little bit later on - in the early 90s - and many of you will
know this one better: James Gosling developed the language Java, in which
he decided that pointers were dangerous and should be hidden (but therein lies
another story). But the big thing that James made a feature of was to say:
I want my Java systems to compile down to what he called 'bytecode'. In other words
it was a sort of pseudo-assembler with, really, single-byte opcodes - like A
and Z, and whatever. And, yes, bytecode became flavour of the month: we'd all go down
to bytecode. But then what do you do? Well, you've got choices. You could
write an interpreter for bytecode, which will be easy to change, though it will be a
little slow and a bit big. Or, if you really care passionately about having the ultimate
super-fast and efficient binary, you can always compile the bytecode down to native code and get it
smaller and all that. So you had options.
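To make the interpreter option concrete, here is a minimal sketch in C of the classic fetch-decode-execute loop. The opcodes here are made up purely for illustration - this is not Java's actual instruction set - but each instruction fits in a single byte, just as in the JVM:

```c
#include <stdio.h>

/* A made-up bytecode, purely for illustration -- not the JVM's real
   instruction set. Each opcode is a single byte. */
enum { OP_HALT = 0x00, OP_PUSH = 0x01, OP_ADD = 0x02, OP_PRINT = 0x03 };

int main(void) {
    /* Program: push 2, push 3, add them, print the result, halt. */
    unsigned char code[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    int stack[64], sp = 0;                 /* a small operand stack */

    for (int pc = 0; ; ) {                 /* fetch-decode-execute loop */
        switch (code[pc++]) {
        case OP_PUSH:  stack[sp++] = code[pc++];         break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case OP_PRINT: printf("%d\n", stack[sp - 1]);    break;
        case OP_HALT:  return 0;
        }
    }
}
```

The appeal is exactly as described: retargeting means rewriting this one loop rather than a native code generator for every machine, at the price of a dispatch overhead on every instruction.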
But the idea was that, yes, you would have
an intermediate code. Even so, it's not one-size-fits-all. It was
ideal for what James wanted to do, but its extensibility to be a universal
panacea? Not so. You see, let me give you another good example of why some people
might want to move the semantic barrier a bit higher. I mean, bytecode is fairly
low-level. What about if we move it up so we're getting more airy-fairy?
Heaven knows, we might encounter Haskell way up there somewhere!
The classic example, of course, is the development of C++ and, as many of you
know, as its name implies, it goes beyond C. It adds classes and all sorts of
other features to C. And the idea from Bjarne Stroustrup,
the inventor, was that to get something going, in the first instance at least, he
would, of course, do the obvious thing. His compiler would compile C++ down to
C. And then you could just put it through an ordinary C compiler, for the back end.
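Just to illustrate the flavour of that translation - this is a hypothetical sketch, not cfront's actual output - a C++ class with a member function can be lowered into a plain C struct plus a free function that takes an explicit 'this' pointer:

```c
/* Hypothetical lowering of the C++ class

       class Counter {
           int n;
       public:
           void bump() { ++n; }
       };

   into plain C, in the spirit of (but not identical to) what an
   early C++-to-C translator might emit. */

struct Counter { int n; };

/* The member function becomes an ordinary C function whose first
   parameter is the object it operates on. */
void Counter__bump(struct Counter *this_ptr) { ++this_ptr->n; }
```

Everything the front end knows about classes is boiled away into constructs an ordinary 1980s C compiler already understood - which is precisely why C could serve as the intermediate language.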
So, you see, his 'UNCOL' is at a much higher level of sophistication than
the pseudo-assembler, bytecode level. And you might say: "Oh well, that's great;
I mean, it obviously suits C++ to do that". Yes, it did, but there are big problems
with this approach. Once you broadcast the fact that
your "Mark I" C++ compiler is actually producing C under the hood, you will have
the devil's own job stopping benevolent hackers, who think they can
generate better C code than Bjarne Stroustrup can, from getting in behind the
scenes and messing about with the way he does classes, for example. So, I suppose
what I'm saying more generally about this, is that very often you will have a
very good solution for a language system to establish a bridge-head, and to get
something working. But, in the longer run, you might want a more direct version
that isn't as prone to hacker intrusion, gross abuse or just things going a bit
wrong because of the nature of the intermediate language being so rich and
having a mind of its own. Now, you might say: "That can't be an issue, can it?" Yes, it can,
because of this whole question of the 'level' of your intermediate code: if this thing gets
me there, why do I need to go direct? Let me give you another example.
Not C++ this time. Another well-known example for many of you is PDF. It's been
so well-established, for so long now - since the early 1990s -
that many of you using it will not know that in the early days it came off the
back of Adobe's very successful language called PostScript. And PostScript was
there as, you know, the universal graphics language.
It drove laser printers; it drove whatever. It was a wonderful achievement. In order to
get a PDF, the way you did it originally was you compiled your PostScript
with an Adobe-provided utility called Distiller. But the problem was that, in many
ways, although it was very graphically sophisticated, it was also Turing complete.
You could do anything in it, given enough memory, and, indeed, I often thought: "Well, in the next
program I write in PostScript, before I do any typesetting as ordered by the
customer, I will get my program to solve an Ackermann function first." Can you
imagine the delay: "I'm sorry, I'm going to compute ackermann(3,1) before I turn my
attention to doing your miserable little piece of typesetting." But in principle
you could have done that - as long as it didn't run out of memory. But, you know,
I'm just saying this to make the silly point that that's perfectly do-able!
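For anyone who wants to see what that mischief amounts to, here is the standard two-argument Ackermann function - sketched in C here rather than PostScript - the textbook example of a computable function that outgrows anything primitive recursive. The modest-looking ackermann(3,1) comes out at 13, but the arguments don't have to grow much before the running time becomes absurd:

```c
#include <stdio.h>

/* The classic two-argument Ackermann function. It is total and
   computable, but it grows faster than any primitive-recursive
   function -- hence its charm as a time-wasting prelude. */
unsigned long ackermann(unsigned long m, unsigned long n) {
    if (m == 0) return n + 1;
    if (n == 0) return ackermann(m - 1, 1);
    return ackermann(m - 1, ackermann(m, n - 1));
}

int main(void) {
    /* The call from the anecdote: prints 13 almost instantly.
       Try ackermann(4, 2) only if you have a very long time to wait. */
    printf("%lu\n", ackermann(3, 1));
    return 0;
}
```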
You sometimes found that your PDF, produced by compiling PostScript with Distiller, was
yards bigger than the input. Not very often, but sometimes. So, there again, you
see, in order to stop abuse and to point the way to the future, Adobe
very quickly said: "What we must do, for those that don't know about PostScript, or
have no need for it, is to give a direct route to PDF". And they called it PDFWriter,
back in the early days. And then, of course, people, not wanting to be
beholden to Adobe, quite rightly said: "Fabulous! What we need to do is to
replicate something of what Distiller does. We'll write utilities with names
like 'ps2pdf', which you'll typically find in PostScript offerings on Linux,
and all this kind of stuff". But it makes the point that, very often, that
directness of approach gives you a good result
and stops people messing about under the hood and doing things which are
ridiculous and expensive. You just say: "No, from now on it's much quicker
to go direct; we know how to do it. Let's do it; let's keep it clean". So that is, I
guess, a feature. Still, I keep reading stories of people using
intermediate codes for compiling programming languages who suddenly say:
"Well, 20 years down the line, we think intermediate codes are bad. It's far
better to do it direct, in some other way". And all you can say, out of this, is that
every time you get into porting software you learn something about the
pros and cons of having an intermediate representation. Or do you jump over it
and go direct? There is no universal right answer. The more you look at the
current scene in programming language implementation, the more you
realize that a huge number of the offerings out there might look to you
like straightforward point-to-point compilers, you know: " ... I'm running on ...
whatever I'm running on at the moment ... I'm running on an ARM chip. It's all self-hosted on the
ARM chip. It compiles ARM code for further use on further ARM chips. It
doesn't do anything else!" Not true. If you look under the hood of gcc - of course,
Stallman and the GNU effort did such a wonderful job in creating for us a new
version of 'cc' - when you look in there at the possible back ends for different
architectures, you realize it's really a cross-compilation system. You can compile
from anything to anything. Now other people, other than the GNU effort, got
there eventually and realized the same thing. I mean, Microsoft - around 2000, I
think - actually had the nerve to develop something that they
called "Intermediate Language". I don't know whether Microsoft did try and
actually trademark the phrase "Intermediate Language", but it's part of
the same mindset. And it's not just them. It's also Apple: Steve Jobs
always used to have this attitude: "It may have existed before, but it was done by a bunch of
no-hopers. And until we discovered it, packaged it, marketed it and put it up for you,
you might as well think that it never existed!" And that was Jobs through and through.
But maybe all big computer companies have a little bit of this inside them:
" ... it didn't really exist, in a usable way, until *we* discovered it".