On "Computerphile" we just love provocative and mysterious titles, and carrying on from the last time we spoke, this is going to be a chat about what came to be called the UNCOL problem. "Universal Computer [Oriented] Language", I think it stands for. It was more specific than just any old computer language. It was: " ... is there a unique intermediate language which would suit everybody?" You know, not as high as C even, and not quite down at the absolute binary level, but more like a sort of absolutely universal assembler - a pseudo-assembler. It's not really hardware implemented on any machine, but it's one that we can all work with, and every compiler in the whole world - all they would have to do is produce the UNCOL language, if we can only agree what it is, and then every system could talk to every other system at this agreed low level.

Well, as you can imagine, it doesn't work like that. It very soon became obvious that, yes, this business of putting a level in there and saying: "We'll all compile to intermediate code" is fine, but when you start looking at what facilities it should have and what facilities it shouldn't have, you're up against the fact that computer hardware designers like to do things their own way. I mean, numbers of registers? Might be 16, might be fewer - that's no big deal. Some of them have arranged those registers as pointers into a formal or informal stack. Others don't. Should we always assume that we have stack capabilities? And somebody pointed out to me - I think it was Ron, who originally created these notes - he said: "The thing in the end that kills you is that they all do input-output differently; there's almost no agreement about how you do I/O". So, fairly soon, the idea of finding one unique intermediate language had to be forgotten about.

But the idea of different intermediate languages, at different levels of sophistication, really did gain traction. We mentioned that Steve Bourne had his Z-code as part of his Algol 68 project, way back in the 1970s. A little bit later on, in the 1990s - many of you will know this one better - James Gosling developed the language Java, in which he decided that pointers were dangerous and should be hidden (but therein lies another story). But the big thing that James made a feature of was to say: I want my Java systems to compile down to what he called 'bytecode'. In other words, it was a sort of pseudo-assembler with, really, single-byte op codes, like A and Z and whatever. And, yes, bytecode became flavour of the month - we all go down to bytecode - but then what do you do? Well, you've got choices. You could write an interpreter for bytecode, which will be easy to change. It will be a little slow, a bit big. Or, if you really care passionately about having the ultimate super-fast and efficient binary, you can always compile the bytecode; get it smaller and all that. So you had options. But the idea was that, yes, you would have an intermediate code. Even so, it's not a one-size-fits-all. It was ideal for what James wanted to do, but its extensibility to be a universal panacea? Not so. You see, let me give you another good example of why some people might want to move the semantic barrier a bit higher. I mean, bytecode is fairly low-level. What about if we move it up, so we're getting more airy-fairy? Heaven knows, we might encounter Haskell way up there somewhere!
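To make the interpreter option mentioned above a little more concrete, here is a minimal sketch of the kind of fetch-decode-execute loop a bytecode interpreter might use. The opcode set and the tiny stack machine are invented for illustration; they are not Java's actual bytecode instructions.

```c
#include <stdio.h>

/* Hypothetical single-byte opcodes for a tiny stack machine
   (illustrative only -- not the real Java bytecode set). */
enum { OP_HALT = 0x00, OP_PUSH = 0x01, OP_ADD = 0x02, OP_MUL = 0x03, OP_PRINT = 0x04 };

static void run(const unsigned char *code)
{
    int stack[64];
    int sp = 0;                        /* stack pointer */
    for (size_t pc = 0; ; ) {          /* fetch-decode-execute loop */
        switch (code[pc++]) {
        case OP_PUSH:  stack[sp++] = code[pc++];        break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:   sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_PRINT: printf("%d\n", stack[sp - 1]);    break;
        case OP_HALT:  return;
        }
    }
}

int main(void)
{
    /* Computes (2 + 3) * 4 and prints 20. */
    const unsigned char program[] = {
        OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_PRINT, OP_HALT
    };
    run(program);
    return 0;
}
```

A compiler for the same bytecode would instead translate sequences like this into native machine instructions once, rather than decoding them afresh on every run - which is the speed-versus-flexibility trade-off being described.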
The classic example, of course, is the development of C++ and, as many of you know, as its name implies, it goes beyond C. It adds classes and all sorts of other features to C. And the idea from Bjarne Stroustrup, the inventor, was that to get something going, in the first instance at least, he would, of course, do the obvious thing. His compiler would compile C++ down to C. And then you could put it through an ordinary C compiler, for the back end. So, you see, his 'UNCOL' is at a much higher level of sophistication than pseudo-assembler-type bytecode. And you might say: "Oh well, that's great - it obviously suits C++ to do that". Yes, it did, but there are big problems with this approach. Once you broadcast the fact that your "Mark I" C++ compiler is actually producing C under the hood, you will have the devil's own job stopping benevolent hackers - who think they can generate better C code than Bjarne Stroustrup can - from getting in behind the scenes and messing about with the way he does classes, for example.

So, I suppose what I'm saying, more generally, is that very often you will have a very good solution for a language system to establish a bridgehead and to get something working. But, in the longer run, you might want a more direct version that isn't as prone to hacker intrusion, gross abuse, or just things going a bit wrong because of the nature of the intermediate language being so rich and having a mind of its own. Now, you might say: "That can't be an issue, can it?" Yes, it can, because of this whole question of the 'level' of your intermediate code: "This thing gets me there, so why do I need to go direct?"

Let me give you another example - not C++ this time. Another well-known example, for many of you, is PDF. It's been so well-established for so long now, since the early 1990s, that many of you using it will not know that in the early days it came off the back of Adobe's very successful language called PostScript. And PostScript was there as, you know, the universal graphics language. It drove laser printers, it drove whatever. It was a wonderful achievement. In order to get a PDF, the way you did it originally was you compiled your PostScript with an Adobe-provided utility called Distiller. But the problem was that, although it was very graphically sophisticated, PostScript was also Turing complete. You could do anything in it [given enough memory] and, indeed, I often thought: "Well, in the next program I write in PostScript, before I do any typesetting as ordered by the customer, I will get my program to compute Ackermann's function first". Can you imagine the delay: "I'm sorry, I'm going to compute ackermann(3,1) before I turn my attention to doing your miserable little piece of typesetting". But in principle you could have done that - as long as it didn't run out of memory. I'm just saying this to make the silly point that that's perfectly do-able! You sometimes found that your PDF, produced by compiling PostScript with Distiller, was yards bigger than the input. Not very often, but sometimes. So, there again, you see, in order to stop abuse and to point the way to the future, Adobe very quickly said: "What we must do, for those that don't know about PostScript and have no need for it, is to give a direct route to PDF". And they called it PDFWriter, back in the early days. And then, of course, people, not wanting to be beholden to Adobe, quite rightly said: "Fabulous! What we need to do is to replicate something of what Distiller does".
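As an aside on the Turing-completeness point: the function being joked about is Ackermann's function, which is tiny to define but explosively expensive to compute. Here is a sketch of the same computation in C rather than PostScript:

```c
#include <stdio.h>

/* Ackermann's function: a few lines to define, yet its running time
   and result grow explosively as the arguments increase. */
static unsigned long ackermann(unsigned long m, unsigned long n)
{
    if (m == 0) return n + 1;
    if (n == 0) return ackermann(m - 1, 1);
    return ackermann(m - 1, ackermann(m, n - 1));
}

int main(void)
{
    printf("ackermann(3,1) = %lu\n", ackermann(3, 1));  /* prints 13 */
    return 0;
}
```

ackermann(3,1) is, in fact, only 13, but push the arguments up a little - ackermann(4,2) already has 19,729 digits - and the delay before your "miserable little piece of typesetting" becomes astronomical.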
And so we got utilities with names like 'ps2pdf', which you'll typically find alongside the PostScript offerings on Linux and all this kind of stuff. But it makes the point that, very often, that directness of approach gives you a good result and stops people messing about under the hood and doing things which are ridiculous and expensive. You start saying: "No, from now on it's much quicker to go direct; we know how to do it. Let's do it; let's keep it clean". So that is, I guess, a feature. Still, I keep reading stories of people using intermediate codes for compiling programming languages who suddenly say: "Well, 20 years down the line we think intermediate codes are bad. It's far better to do it direct, in some other way". And all you can say, out of this, is that every time you get into porting software you learn something about the pros and cons of having an intermediate representation. Or do you jump over it and go direct? There is no universal right answer.

The more you look at the scene at the moment in programming language implementation, the more you realize that a huge number of the offerings out there might look to you like straightforward point-to-point compilers, you know: " ... I'm running on ... whatever I'm running on at the moment ... I'm running on an ARM chip. It's all self-hosted on the ARM chip. It compiles ARM code for further use on further ARM chips. It doesn't do anything else!" Not true. If you look under the hood of gcc - of course, Stallman and the GNU effort did such a wonderful job in creating for us a new version of 'cc' - when you look in there at the possible back ends for different architectures, you realize it's really a cross-compilation system. You can compile from anything to anything. Now, other people, other than the GNU effort, got there eventually and realized the same thing. I mean, I think Microsoft, rather later on, actually had the nerve to develop something that I think they called "Intermediate Language". I don't know whether Microsoft did try and actually copyright the phrase "Intermediate Language", but it's part of the same mindset. And it's not just them. It's also Apple: Steve Jobs always used to have this attitude of: "It may have existed [before], but it was done by a bunch of no-hopers. And until we discovered it, packaged it, marketed it and put it up for you, you might as well think that it never existed!" And that was Jobs through and through. But maybe all big computer companies have a little bit of this inside them: " ... it didn't really exist, in a usable way, until *we* discovered it".
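Finally, to go back to the C++-to-C route discussed earlier: here is a very rough sketch of the kind of lowering a Cfront-style translator performs, with a class becoming a struct and each member function becoming an ordinary C function that takes an explicit 'this' pointer. The class, the names and the generated code here are hypothetical illustrations, not Stroustrup's actual output.

```c
#include <stdio.h>

/* Hypothetical C++ source:
 *
 *   class Counter {
 *       int n;
 *   public:
 *       void bump()        { n = n + 1; }
 *       int  value() const { return n; }
 *   };
 *
 * One plausible plain-C lowering: the class data becomes a struct, and
 * each member function becomes a free function taking 'this' explicitly.
 */
struct Counter { int n; };

static void Counter_bump(struct Counter *this_ptr)        { this_ptr->n = this_ptr->n + 1; }
static int  Counter_value(const struct Counter *this_ptr) { return this_ptr->n; }

int main(void)
{
    struct Counter c = { 0 };
    Counter_bump(&c);
    Counter_bump(&c);
    printf("%d\n", Counter_value(&c));   /* prints 2 */
    return 0;
}
```

Generated C like this is exactly what a "benevolent hacker" could read and tinker with behind the compiler's back, which is the risk of publishing such a high-level intermediate form.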