This document has been made available by courtesy of Bell Labs/Lucent Technologies, and the author.
It has been re-typed by Phil Budne, who is to blame for any transcription errors.
Date: June 19, 1974
Bell Labs Technical Memo: TM-74-1352-8
Author: P.J. Plauger
LIL is a Little Implementation Language for PDP-11 computers, suitable for writing system level code or in any situation where assembly-language coding is traditionally called for. A LIL compiler is available for use under the UNIX operating system. The object code produced is compatible with, and may be freely intermixed with, that produced by the UNIX assembler, Fortran, or C compiler.
This document is a reference manual for the LIL language. A knowledge of machine level coding on the PDP-11 is assumed, and some knowledge of UNIX operating procedures is required to use the compiler. A tutorial introduction to programming in LIL is provided in TM 74-1352-6.
LIL is a Little Implementation Language for the PDP-11 family of computers. It is little in the important sense that there are very few constructs and combining rules, and fewer exceptions. It is an implementation language, which means that it is possible to express any of the code needed to implement an operating system, including I/O drivers, interrupt routines, and transfer tables. Nevertheless it remains a moderately high-level language, with the improved readability and reliability that one expects from avoiding assembly-language level coding.
This document is a reference manual for LIL. It provides a comprehensive, and reasonably precise, description of the language, but makes no attempt to introduce concepts in a tutorial order.
The LIL compiler is invoked by
lc args
where args are source files if they end in .l, flags if they begin with -, and object files otherwise. If more than one source file is specified, or the -c flag is present, the object code for each file whose name ends in .l is written on a file with a similar name, only ending in .o.
Unless -c (compile only) or -P (preprocess only) is present, all object files are presented to the loader for binding, along with the standard C libraries. Profiling may be invoked with -p (see cc (I), prof (I), ld (I), monitor (III) in the UNIX manual).
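The argument rules above can be sketched in Python. This is only an illustration of the stated rules, not the lc driver itself; the names classify_args and object_name are hypothetical, and testing the .l suffix before the leading - is an assumption, since the text does not say which rule wins for a name such as -x.l.

```python
def classify_args(args):
    """Sort lc-style arguments into sources, flags, and object files.

    Hypothetical helper: names ending in .l are sources, names beginning
    with - are flags, and everything else is an object file.
    """
    sources, flags, objects = [], [], []
    for arg in args:
        if arg.endswith(".l"):      # assumption: suffix rule is checked first
            sources.append(arg)
        elif arg.startswith("-"):
            flags.append(arg)
        else:
            objects.append(arg)
    return {"sources": sources, "flags": flags, "objects": objects}


def object_name(source):
    """Name of the object file for a source: same name, only ending in .o."""
    return source[:-2] + ".o"
```

For example, classify_args(["-c", "a.l", "b.o"]) reports a.l as a source, -c as a flag, and b.o as an object file.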
If the first character of a LIL source file is a # a string-oriented preprocessor is invoked. This scans the source for lines beginning with a # and recognizes the following non-empty command lines:
The -P option on the invocation line may be used to cause preprocessor output for each file whose name ends in .l to be written on a file with a similar name, only ending in .i. PL/I style comments are permitted only on preprocessor command lines.
The source file is treated as a series of tokens, possibly separated by white space, which consists of one or more blanks, tabs, newlines, and/or comments. Comments are arbitrary strings beginning with a % and ending with the next newline. Except for its important function of separating tokens, white space is ignored, and so should be used freely to emphasize the logical structure of the code.
Tokens are one of the following strings of characters:
escape | becomes (octal) | ASCII name |
\0 | 000 | NUL |
\a | 006 | ACK |
\e | 004 | EOT |
\n | 012 | NL |
\p | 033 | PRE (ESC) |
\r | 015 | CR |
\t | 011 | TAB |
All other characters following \ are mapped into themselves (including \, newline and single quote).
Other ASCII sequences not shown here will produce a diagnostic message and be skipped.
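The escape table can be expressed as a small lookup, sketched here in Python. The helper decode_string is hypothetical and models only the mapping of characters after a backslash; it does not model the diagnostic mentioned above, and dropping a lone trailing backslash is an assumption.

```python
# Octal codes from the table above, keyed by the character after the backslash.
ESCAPES = {
    "0": 0o000,  # NUL
    "a": 0o006,  # ACK
    "e": 0o004,  # EOT
    "n": 0o012,  # NL
    "p": 0o033,  # PRE (ESC)
    "r": 0o015,  # CR
    "t": 0o011,  # TAB
}


def decode_string(body):
    """Return the byte values for the body of a character string.

    Hypothetical helper: `body` is the text between the quotes. A character
    following a backslash is looked up in ESCAPES; any other escaped
    character maps to itself.
    """
    out = []
    chars = iter(body)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, None)
            if nxt is None:
                break  # assumption: a trailing lone backslash is dropped
            out.append(ESCAPES.get(nxt, ord(nxt)))
        else:
            out.append(ord(ch))
    return out
```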
Certain identifiers are predefined as keywords, i.e. tokens with special syntactic meaning. These identifiers are:
break byte continue do else extern goto if jsr local mem reg rts sizeof sys while word
Keywords may not be redefined in local regions, unlike other identifiers.
It is possible, however, to remove an identifier from the keyword class, once and for all, by means of the escape \. Any occurrence of a keyword immediately preceded by a \ is treated as a reference to a hitherto undefined identifier of that name. This feature should only be used when absolutely necessary, as when interfacing to a foreign language routine, and then only near the end of a LIL source file.
With this understanding, the term identifier will be used hereafter to mean any non-keyword, or a keyword that has been properly escaped.
Identifiers, octal numbers, decimal numbers, and character strings are the simplest forms of locatives, which are used to specify the location of operands for generating machine instructions, or which are the actual operands of compile-time operations. A locative has a value, which may be defined or undefined, absolute or relocatable with respect to some bias. Each locative also has a type, which corresponds closely to a valid PDP-11 addressing mode or specifies a condition, a size in bytes, and the optional attributes external and byte. Octal numbers, decimal numbers, and short character strings, for instance, are by default type immediate, and have values which are defined and absolute.
Unary operators are provided to alter type and other attributes. In addition, a number of predefined identifiers are provided from which type can be inherited. The basic types are:
More elaborate types are built from these primitives, or from expressions whose result is register, memory, or immediate. Square brackets are used to indicate indexing or indirection; in conjunction with additional unary operators the remaining types are specified:
The final type which locatives may assume is condition, which specifies what state of the condition code must obtain for a given test to succeed. Such locatives are always defined absolute, and have mystical values which closely resemble the appropriate op codes for conditional branches. Conditions usually arise from relational expressions or tests and seldom need to be dealt with explicitly.
LIL begins each compilation with a number of predefined identifiers which are often of use in writing code.
All eight registers may be referenced by their conventional names:
r0 r1 r2 r3 r4 r5 sp pc
The symbol dot `.' is updated at the end of each statement to point to the next location at which code is to be generated. Similarly, setting dot to a new value causes loading to be diverted to the place specified. The location counters for the bss, text, and data sections (see ld(I) in the UNIX manual) are named .bss, .text and .data, respectively. In addition, the absolute location counter .abs is provided for convenience, to remember the latest absolute value dot may have assumed. All of these locatives are of type memory.
Conditions corresponding to testing just one of the condition bits may be specified using the predefined identifiers:
carry minus oflow zero
These correspond to the C, N, V, and Z bits, respectively. Unconditional tests may be written with
true false
in an obvious fashion.
All of these predefined identifiers may be superseded by local definitions, making the original version unreachable but still not interfering with their maintenance by the compiler. There is one additional predefined identifier, however, which serves as a flag; its latest edition is always consulted by LIL. This is .temp, which is used to specify whether or not the top element of the stack [sp] is to be used to hold an argument on function calls. If the standard function call notation is used, therefore, this must be considered a reserved identifier at all times.
Data manipulation instructions are generated by writing expressions involving locatives and binary infix operators:
r0 = x;        % causes r0 to be loaded from x
r0 + 1;        % causes r0 to be incremented
r0 = x + 1;    % load then increment r0
The interpretation of the last statement differs markedly from that used in most languages. All infix operators are of equal precedence in LIL, and evaluation proceeds strictly from left to right. The left operand is retained for use with each subsequent operator, permitting the shorthands shown in the example. (The third line is equivalent to the first two.)
Parentheses may be used to modify this strictly left to right order of evaluation:
r0 = (x + 1); % increment x then move into r0
The rule is: evaluate the left operand, performing any calculations inside parentheses or brackets, then do the same for the right operand, then perform the operation specified between the two operands. The leftmost locative inside parentheses becomes the locative to use as operand.
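The evaluation rule can be modeled with a few lines of Python. This is a sketch under stated assumptions, not the compiler's algorithm: the function operations is hypothetical, it handles only binary infix operators and parentheses (no brackets or unary operators), and it expects tokens other than parentheses to be separated by white space.

```python
def operations(source):
    """List (operator, left, right) steps in the order they are performed.

    Hypothetical model of the left-to-right rule: every operator has equal
    precedence, and the left operand is retained for each later operator.
    """
    tokens = source.replace("(", " ( ").replace(")", " ) ").split()
    steps = []

    def operand(pos):
        # A parenthesized group is evaluated first; its leftmost locative
        # then stands for the whole group.
        if tokens[pos] == "(":
            loc, pos = expression(pos + 1)
            return loc, pos + 1  # skip the closing parenthesis
        return tokens[pos], pos + 1

    def expression(pos):
        left, pos = operand(pos)
        while pos < len(tokens) and tokens[pos] != ")":
            op = tokens[pos]
            right, pos = operand(pos + 1)
            steps.append((op, left, right))  # left operand is retained
        return left, pos

    expression(0)
    return steps
```

Under this model, "r0 = x + 1" yields the steps (=, r0, x) then (+, r0, 1), while "r0 = (x + 1)" yields (+, x, 1) then (=, r0, x), matching the two readings described in the text.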
All unary operators bind tighter than binary operators, as does the specification of indexing or indirect addressing with square brackets. Individual unary and binary operators will be described in a later section.
To perform arithmetic with symbols without generating code, and to define identifiers, one writes what look like normal expressions, surrounded by double quotes. Thus
"x = r0";        % defines x
"r0 + 1" = x;    % loads x into r1
Compile-time expressions are actually evaluated in a completely different context than run-time expressions, using the values of the locatives themselves as operands. Most operators have compile-time definitions analogous to their meaning at run time. Unless it contains an explicit assignment operator (= or ->), a compile-time expression is computed without changing the value of any identifiers.
Compile-time binary operators will also be tabulated later on.
An expression, terminated by a semicolon, is one of the simplest statements permitted in LIL. An immediate or memory locative standing alone, followed by a semicolon, is treated as a request to generate one word of code whose value is the value of the locative. Character strings of any length may be used in this context to generate a series of words; the last is padded with a null byte if the number of bytes is odd. It is also permissible to write just a semicolon to specify a null statement.
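The padding rule for character strings can be illustrated with a short Python sketch. The function pack_words is hypothetical; placing the first byte of each pair in the low-order half of the word is an assumption based on PDP-11 byte addressing, not something the text states.

```python
def pack_words(byte_values):
    """Pack bytes into 16-bit words, padding with a null byte if odd.

    Assumption: the first byte of each pair goes in the low-order half of
    the word, following PDP-11 byte addressing.
    """
    data = list(byte_values)
    if len(data) % 2:
        data.append(0)  # pad the last word with a null byte
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]
```

A three-byte string therefore occupies two words, the second holding the third character and a null byte.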
The simplest conditional statement is:
if (test) statement;
test may be any locative, or a conditional expression containing parentheses, the prefix not operator ~, the infix and operator &&, and/or the infix or operator ||. Conditional expressions are evaluated left to right, following the same rules as for normal expressions, except that the right operand of && will not be evaluated when the left operand is false. Moreover, && binds tighter than || for the sake of determining the truth value of the entire test. If a locative is not a condition, it is replaced by that condition obtained by testing if the locative is nonzero.
The not operator, of course, reverses the sense of any condition. Care must be taken, however, to ensure that its operand is unmistakably a condition, for ~ applied to an immediate value ones-complements its value.
The if causes the controlled statement to be skipped if the test is not met, otherwise the statement is obeyed. To specify an alternative action when the test fails, write
if (test) statement; else statement;
The else part is skipped over if the first statement is executed. The first statement may be another if, in which case there is a possible ambiguity in pairing subsequent else clauses. The ambiguity is arbitrarily removed by binding each else clause to the innermost `unelsed' if.
Loops are specified by
while (test) statement;
if the test should occur at the top, or by
do statement; while (test) statement;
if the test should occur somewhere in the middle of the loop or at the end (second statement is null). Both forms exit when the test fails and contain an implicit branch back to the beginning from the end of the form.
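The control flow of the second loop form, with its test in the middle, can be modeled in Python as follows. The function run_do_while and its three callable parameters are hypothetical stand-ins for the two statements and the test; this is a sketch of the described behavior, not generated code.

```python
def run_do_while(first, test, second):
    """Model the form: do first; while (test) second;

    first and second are callables standing in for the two statements, and
    test is a callable returning a truth value.
    """
    while True:
        first()
        if not test():
            break      # the form is left when the test fails
        second()       # then control branches back to the beginning
```

With a null second statement, the same model reduces to a loop whose test occurs at the end.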
Conditional and loop statements can be made much more powerful by having them control groups. A group is one or more statements in succession surrounded by braces:
if (r0 < 0)        % if r0 is negative
    {r0 =- r0;     % negate r0
    r1 + 1; }      % and count
No semicolon is used after a group, nor is the semicolon left off the last statement in the group. Now the if will skip two statements if the test fails. Groups may, of course, contain other groups to any depth of nesting.
A second form is the labeled group:
label{ statement; statement; ...; }
where label is an identifier. In addition to grouping statements for control purposes, this form also delimits a local region, in which identifiers may be redefined without clashing with other usage, and causes the label to be defined as type memory with the value of the location counter at the start of the group (although the definition actually occurs at the close of the group).
Two control statements work in conjunction with loops or labeled groups. The forms
break; continue;
cause control to be transferred out of the innermost loop or labeled group, or back to its top, respectively. And the forms
break label; continue label;
perform similarly, except that all containing loops or groups whose labels do not match the label in the statement are ignored. Instead, control is transferred out of, or to the top of, the innermost containing labeled group whose label matches.
A C compatible function call is provided:
fun();        % no arguments
fun(arg, arg, arg, ...);
where arg is any locative or expression except a condition. Any argument expressions are evaluated left to right, then the arguments are moved onto the stack right to left. The function is called with a
jsr pc,*$fun
instruction and the arguments are popped off the stack.
  The latest edition of the flag variable .temp is consulted on each call with arguments to determine whether to use the top element of the stack [sp] to hold the rightmost argument. The flag is initially zero, indicating that [sp] is not to be used.
  Each function call becomes a locative of type register and value zero.
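The calling sequence described above can be laid out step by step in Python. This is a sketch only: the function call_sequence is hypothetical, and its steps are descriptive strings rather than real PDP-11 instructions, because the text does not give the exact instructions used to move or pop arguments.

```python
def call_sequence(fun, args, temp=0):
    """List the abstract steps for the call fun(args...).

    Hypothetical model: `temp` stands for the current value of .temp.
    """
    steps = ["evaluate " + a for a in args]       # left to right
    for i, a in enumerate(reversed(args)):        # moved right to left
        if i == 0 and temp:
            steps.append("store " + a + " at [sp]")  # reuse top of stack
        else:
            steps.append("push " + a)
    steps.append("jsr pc,*$" + fun)
    if args:
        steps.append("pop arguments")
    return steps
```

With .temp nonzero, only the rightmost argument is treated differently: it is placed in the existing top element of the stack rather than pushed.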
  Control can be transferred to an arbitrary destination by
goto loc;
where loc is any locative that may be used with a jmp instruction (a br will be generated instead, wherever possible). It should not be necessary to use goto except in unusual circumstances.
LIL begins loading in the text section, but permits code to be generated also in the data section, by redefining dot, provided no attempt is made to back up over generated code.
". = .data";    % switch to data area
x{-1; }
y{0; 0; 0; }
". = .text";    % switch back
By the rules for labeled groups, however, such a temporary diversion may also be written as
". = .data"{ x{-1; } y{0; 0; 0; } }
provided x and y have been previously referenced or declared.
Dot can be set to any absolute value, or to point anywhere in the bss section, so long as no attempt is made to generate code there.
UNIX treats all undefined external references with nonzero size as labeled common blocks, and refuses to satisfy such references with text symbols. LIL will set the size attribute to zero on any identifier used in a function call. Nonstandard entries which are not defined in the current file, however, should be declared by
"extern entry()";
Unary operators have the same meaning at compile-time as at run-time, for none of them generate code (except =). Instead, they modify references to symbols or expressions, by changing their type, value, or other attributes. For convenience the following symbols are defined:
The unary operators are:
The type of x is left unchanged in the following:
x and y. At most one of the two may be relocatable or undefined.
x and y. If y is relocatable or undefined, it must have the same bias as x.
If x and y are not both byte or both word, then one of them must be a non-byte register or immediate. The instruction generated will then be byte mode if either operand is byte. The following is a metalinguistic description of the decisions made by LIL in producing code for each operator.
x>=y, x>y, x==y, x<y, x<=y, x~=y    [compare signed]
x> >=y, x> >y, x< <y, x< <=y        [compare unsigned]
    if (y==0 && c set on x) do nothing
    else if (y==0) tst(b) x
    else if (x==0) tst(b) y
    else cmp(b) x,y
x=y, y->x
    if (y==0) clr(b) x
    else if (x=={carry oflow zero minus} && y=={true false}) setx or secx
    else if (y==minus) sxt x
    else mov(b) y,x
x=-y
    if (y==0) clr(b) x
    else if (x==y) neg(b) x
x=~y
    if (x=={carry oflow zero minus} && y=={true false}) secx or setx
    else if (x==y) com(b) x
x+y
    if (y==1) inc(b) x
    else if (y==carry) adc(b) x
    else add y, x
x-y
    if (y==1) dec(b) x
    else if (y==carry) sbc(b) x
    else sub y, x
x|y
    bis(b) y,x
x&n
    bic(b) $!n,x
x&~y
    bic(b) y, x
x~~r
    xor r,x
r*y
    mul y,r
r/y
    div y,r
x?y
    if (y==0) tst(b) x
    else cmp(b) x,y
x?&y
    bit(b) x,y
x<>n
    if (n==1) rol(b) x
    else if (n== -1) ror(b) x
x<*>n
    if (n==8) swab x
    else if (x odd reg) ashc n,x
x**n
    if (n==1) asl(b) x
    else if (n== -1) asr(b) x
    else if (x a register) ash n,x
r***n
    ashc n,r
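As an illustration of how such decisions read in executable form, the entries for x+y and x-y might be sketched in Python as below. The function select_add_sub is hypothetical; it takes operands as plain strings and compares y against the literal text "1" and "carry", which simplifies the value and type tests the compiler would actually make.

```python
def select_add_sub(op, x, y, byte=False):
    """Choose the instruction for x+y or x-y following the entries above.

    Hypothetical helper: x and y are operand strings, and `byte` says
    whether byte mode was chosen for the operation.
    """
    b = "b" if byte else ""
    short, with_carry, general = {
        "+": ("inc", "adc", "add"),
        "-": ("dec", "sbc", "sub"),
    }[op]
    if y == "1":
        return f"{short}{b} {x}"
    if y == "carry":
        return f"{with_carry}{b} {x}"
    # The entries list add and sub without a (b) suffix.
    return f"{general} {y},{x}"
```

For instance, adding 1 to r0 selects an increment, while adding a general operand x selects add with x as the source.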
Both the sizeof and ||| operators are also defined at run-time and have the same effect as at compile-time. They do not directly cause code to be generated.
The compiler is aware of what is happening to the condition code most of the time. It knows, for instance, that swab and function calls do not leave the code in a state implied by the language, and so will generate a tst if the result of either is to be used in a test. The PDP-11 is somewhat whimsical about the setting of the carry bit, however, and LIL makes no real attempt to second guess the machine. (It makes a difference whether you add 1 or 2 to something, for instance.) It is always a good idea to be very careful when testing for special conditions.
If the test in any conditional or loop statement is unconditionally true or false, no code is compiled for the test. If the test is false, in fact, no code is compiled for the statement controlled by the test (the branch back to the top of a loop is also omitted). In the if-else form, one and only one of the two statements is compiled. Compile-time parameters, in conjunction with compile-time relational expressions, can thus be used to cause selective generation of code.
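The effect on an if statement can be modeled in Python. This is a minimal sketch, assuming a hypothetical function compile_if in which the test is either True or False (known at compile time) or a string naming a run-time test; the returned list merely names the pieces for which code would be compiled.

```python
def compile_if(test, then_part, else_part=None):
    """Return the pieces compiled for an if statement.

    Hypothetical model: `test` is True or False when the condition is known
    at compile time, or a string naming a run-time test.
    """
    if test is True:
        return [then_part]                 # no test, no else part
    if test is False:
        return [else_part] if else_part is not None else []
    pieces = ["test " + test, then_part]   # run-time test: compile everything
    if else_part is not None:
        pieces.append(else_part)
    return pieces
```

A compile-time parameter used as the test thus selects exactly one branch, and an unconditionally false test with no else part compiles to nothing.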
LIL descends from an implementation language for the GTE TEMPO-I, written by P.D. Jensen and A.G. Fraser. Compile-time notation and the rigorous left-to-right expression evaluation are carryovers. The current syntax is heavily influenced by that of the language C, designed by D.M. Ritchie; the preprocessor and invocation control are a direct steal.
All compilation decisions are made in one pass of a simple syntax-directed translator, written by the author. Control tables for the translator were produced by Steve Johnson's compiler-compiler YACC, working from a 50-production grammar. Doug Bayer wrote the last pass, which assembles code and tables into standard UNIX object format.
P.J. Plauger
MH-1352-PJP