This paper was written while the author was with YOURDON Inc.
This document has been made available by courtesy of the author.
It has been re-typed by Phil Budne, who is to blame for any transcription errors.
by
P.J. Plauger
A language is described that was implemented on a PDP-11 computer for writing system-level code for the PDP-11 family of minicomputers. The Little Implementation Language LIL offers a number of features that facilitate writing structured, high-level code with no sacrifice in efficiency over assembly language. The discussion ends with a harsh evaluation of its future usefulness.
There is a broad gap between the fanciest macro assemblers and the simplest system implementation languages (SILs), a gap that needs filling. No matter how elaborate an assembler is, it is still just an assembler, with all the limitations on productivity attendant on writing code at too detailed (and unstructured) a level. SILs, on the other hand, seem to impose a layer of insulation between programmer and machine that causes more trouble than benefit in an environment where program size and execution time are very important, as in mini- and microcomputers. What is desired, then, is a language that lets one talk about the matters that concern systems level programmers, but at a level that permits economical expression and encourages the well structured coding style we now know to be important.
It is no surprise that such languages are not common. People who think in terms of high-level languages have traditionally been problem-oriented; people who write systems have been reluctant to yield control of code generation to a language processor. It is only recently that, thanks to the success of some of the earlier SILs, the use of compilers has earned some grudging respect from systems writers. The cobbler's children are always the last to get shoes.
This paper describes a compiler, for the PDP-11 family of computers, that lies in this region. It can be considered a narrative assembler, in the sense that the user knows rather precisely what code is generated, but can specify it with an economy of notation. It can also be considered a little implementation language, because it deals with multi-line, nested control structures rather than line-at-a-time code, and because it uses data type attributes to help direct the generation of instructions.
The LIL language was defined in Backus-Naur Form, which was used by a compiler-compiler to produce a parser. The parser was combined with semantic routines written in the SIL called C (developed by Dennis Ritchie) running under the UNIX operating system. Significant portions of the LIL compiler were subsequently rewritten in LIL to prove its effectiveness, and the resulting language was used to help write a small PDP-11/10 operating system at Bell Labs. The compiler is one-pass and produces object code that may be freely intermixed with that of C or the UNIX assembler.
In its final form LIL probably most closely resembles Wirth's PL/360, in that it is a structured, machine dependent language with data types. Its control flow syntax, however, is modeled closely after that of C, for the simple reason that C was successful and there was no good excuse for introducing a new notation. A number of other ideas were carried over from an earlier (unstructured) language written by A. G. Fraser and P. D. Jensen for the GTE Tempo I.
What makes LIL unique, and uniquely suited for minicomputer programming, is the way its various design features work together to do what is wanted without also doing many things unwanted. But to understand the language as a whole, it is necessary first to understand its parts. Here are its principal concepts, taken separately:
The data types of the language correspond rather directly to the addressing modes of the PDP-11. (For another machine, a different set of modes is defined.) Thus there are the base types register (R), memory reference (M), and immediate (I), and the derived types indexed (I[R] or M[R]), autoincrement ([R]++), and autodecrement (--[R]). The latter two modes support pushdown stacks and array walking in the PDP-11. Any of the above can further be made indirect ([I[R]], [M], [[R]], etc.), or of type byte.
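For illustration only (the operand names buf and ptr, and the use of R5 as a stack pointer, are invented here rather than taken from the paper), the derived modes might appear in expressions such as
R0=buf[R1];   # indexed: buf subscripted by the contents of R1
R2=[R0]++;    # autoincrement: fetch through R0, then step R0 past the word
R2->--[R5];   # autodecrement: push R2 onto a stack grown downward through R5
R3=[ptr];     # indirect through the memory word ptr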
Arithmetic expressions describe the data manipulation code to be generated. Thus
R3+X;
generates an instruction that adds X to R3. This need not be an add instruction: if X is known at compile time to have the value +1, an inc is generated; if X is -1, a dec is used. Similarly,
R3=X;
generates a clr if X is zero, rather than a mov. Such decisions ensure that the best instruction is used whenever there is a choice, thereby encouraging the use of parameters in place of flags with ``magic'' values like zero and one. It means that the user needs to know only a handful of operators instead of a page full of opcodes, a service that would be even more valuable on more irregular machines such as the IBM 360.
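As a rough sketch (the mnemonics shown are simply the ones a PDP-11 programmer would expect; they are not quoted from the compiler), the choices look like
R3+X;   # add X,r3 when X is an ordinary memory operand
R3+1;   # inc r3
R3=0;   # clr r3 rather than mov $0,r3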
The compiler will not, however, accept an arbitrary combination of operands with an operator. While it is not a rigorous rule, it is generally true that at most one instruction will ever be generated for each infix operator. That way the user is assured that his code will not puff up unmanageably in translation, so he can remain aware of the limitations of the machine and still have a uniform notation for describing what is possible. (The ``at most'' part of the rule leaves the door open for selective optimization.)
An interesting rule is applied to multiple-operator expressions. One cannot say
R3:=X+Y+Z;
since that implies that the right-hand side is computed someplace else, then moved into R3. At the lowest level there is no ``someplace else.'' So all operators, including the move operators = and ->, are given equal weight; an expression is evaluated strictly left to right, using the leftmost operand repeatedly. Thus
R3=X+Y+Z;
is the same as
R3=X;R3+Y;R3+Z;
and long sequences of code involving a single accumulator can be written naturally on one line:
R0=X+Y[R1=i**1] + (R4=Z) -> W;
Note the use of subexpressions inside brackets and parentheses to alter the left to right flow long enough to set up other operands. This line is equivalent to
R0=X;R1=i;R1**1;   # shift R1 left one
R0+Y[R1];R4=Z;
R0+R4;R0->W;       # same as W=R0
Conditional expressions determine the conditional branches to be generated in control flow statements. These can be relational
if (R0<0)R2=-R2; # negate R2 if<0
or preset conditions
if (carry){msp+1;carries+1;}
or expressions
if (i<MAXI && control>0) control+1;
The conditional operators && and || bind in the usual manner but guarantee strict left to right testing with no unnecessary (or unwanted) evaluation. Thus
while(0<=R0 && R0<=MAXI && A[R0]~=target) R0+1;
will never reference A[R0]
if R0 is out of range.
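As an illustration (a plausible rendering, assuming MAXI is a manifest constant; this is not output quoted from the compiler), the statement if (i<MAXI && control>0) control+1; might come out as
cmp i,$MAXI
bge 1f
tst control
ble 1f
inc control
1:
with each failing test branching around the controlled statement, and with no part of the condition evaluated past the first failure.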
The control flow statements each control a single statement, but as can be seen above that ``statement'' may be an arbitrary sequence in braces. if...else is provided, as well as loops of all flavors (tests at top, bottom, or in the middle). There is a goto, but labels are introduced only in conjunction with groups:
linecount{0;} doread{R0=READCODE;SYSREAD;RTS;}
This encourages one to think through the scope of code headed by a label, whether it be a data area, a subroutine, or even a block of code jumped to. Note, by the way, how constants are merely specified in line, so that special instructions need no special treatment.
One final feature worth particular attention is the compile time expression. Enclosing an expression in double quotes instructs the compiler to do the arithmetic right away instead of generating run-time code. Thus
R0="A-2"[R1];
generates
mov A-2(R1),R0
and not a subtract instruction in addition to the move. Conditional expressions can even be used at compile time to conditionally generate code:
if ("MACHINE<35") R0**1**1**1; else R0**3;
Only one of the two lines is actually compiled, and no branches are generated.
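To make the effect concrete (the generated instructions below are what one would expect for the two cases; they are not shown in the paper), the two arms amount to either
asl r0
asl r0
asl r0
on the smaller processors, or the single extended-instruction-set shift
ash $3,r0
on the larger ones. The compile time test decides which sequence is assembled, and no run-time test remains.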
LIL makes a limited attempt to optimize the code produced, principally in conjunction with tests and branches, since that is the only code that it tries seriously to control. It will pick the best of a choice of instructions or addressing modes, but it would never change the three single shifts in the last example into one triple shift. It assumes in general that the user knows what he is doing and does not want to be protected from himself.
Some optimization is performed just to encourage more uniform notation and to avoid the need for extra operators. So
if (a>0)
must generate a tst but
if (a+b>0)
need not, since the condition code is known to be set by the immediately preceding add. And, by the way,
if (>)
is legal and assumes the condition code is properly set.
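For instance (again a plausible rendering rather than quoted compiler output), if (a+b>0) x+1; can come out as
add b,a
ble 1f
inc x
1:
with no tst, because the add has already set the condition codes.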
Conditional branches in the PDP-11 can only be specified to a 512-word region around the current location; beyond that a conditional branch (of the opposite sense) over a two-word jump must be used. LIL maintains a 256-word ``delay line'' for the emitted code to detect as many short branches as possible. (It does not catch them all.) Beyond that, it strives to produce the most economical set of branches for a given test and eliminates most of the branches to branches that occur in structured programming. Often the code produced is shorter than a hand coder would dare produce, because of those savings.
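Concretely (this is the standard PDP-11 idiom, not code quoted from the compiler), a test that would like to emit
bgt target
but finds target beyond branch range emits instead
ble 1f
jmp target
1:
and the delay line exists so that the short form can still be chosen whenever the target turns out to be near enough.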
Optimization, then, is basically of the ``peephole'' type only. This seems to be all that is necessary or desirable at the systems level.
The use of LIL makes for very readable code that is much easier to write, maintain and modify than assembly language. Despite its considerable power in some dimensions it is indeed a little language -- the conventions can be summarized in a few coherent rules and the compiler itself comprises less than a thousand lines of code. More important, it requires no runtime support and permits the user to write code at a high level with no sacrifice in space or speed. For many minicomputer and microcomputer software development projects it appears to be well suited.
LIL is, however, a failure.
Its stiffest competition at Bell Labs is the language C, which is higher level and machine independent. Every time it looked like C was too expensive to use for a particular project, LIL was considered. But almost every time, it proved easier (and more rewarding) to improve C, or its runtime support, or the hardware, than to invest time in yet another language.
LIL is superior to assembly code, to be sure, but there are few remaining crannies where C has not penetrated. Since C compiles to assembly language, why not just use the assembler to handle the few hundred lines of intractable code? It has to be around anyway.
Why not indeed? A machine independent language is always superior -- even for writing machine dependent code (it's easier to find trained programmers) -- so long as the overhead can be endured. It is clear now that writing straightforward code and then measuring it is the formula for the best end product. At worst there will be 5-15 per cent overhead, which is seldom critical. Once system writers become mature enough to recognize this basic truth, they gravitate naturally toward machine independent SILs.
If there is a role at all for languages like LIL, it will be in microcomputers and very small minicomputer configurations, not so much because the need is more legitimate as because the desirability of a SIL such as C will be more slowly recognized there. Otherwise it looks like the little implementation language is an idea whose time has come -- and gone.