LET'S BUILD A COMPILER!
By
Jack W. Crenshaw, Ph.D.
27 August 1989
Part XIII: PROCEDURES
*****************************************************************
* *
* COPYRIGHT NOTICE *
* *
* Copyright (C) 1989 Jack W. Crenshaw. All rights reserved. *
* *
*****************************************************************
INTRODUCTION
At last we get to the good part!
At this point we've studied almost all the basic features of
compilers and parsing. We have learned how to translate
arithmetic expressions, Boolean expressions, control constructs,
data declarations, and I/O statements. We have defined a
language, TINY 1.3, that embodies all these features, and we have
written a rudimentary compiler that can translate them. By
adding some file I/O we could indeed have a working compiler that
could produce executable object files from programs written in
TINY. With such a compiler, we could write simple programs that
could read integer data, perform calculations with it, and output
the results.
That's nice, but what we have is still only a toy language. We
can't read or write even a single character of text, and we still
don't have procedures.
It's the features to be discussed in the next couple of
installments that separate the men from the toys, so to speak.
"Real" languages have more than one data type, and they support
procedure calls. More than any others, it's these two features
that give a language much of its character and personality. Once
we have provided for them, our languages, TINY and its
successors, will cease to become toys and will take on the
character of real languages, suitable for serious programming
jobs.
For several installments now, I've been promising you sessions on
these two important subjects. Each time, other issues came up
that required me to digress and deal with them. Finally, we've
been able to put all those issues to rest and can get on with the
mainstream of things. In this installment, I'll cover
procedures. Next time, we'll talk about the basic data types.
ONE LAST DIGRESSION
This has been an extraordinarily difficult installment for me to
write. The reason has nothing to do with the subject itself ...
I've known what I wanted to say for some time, and in fact I
presented most of this at Software Development '89, back in
February. It has more to do with the approach. Let me explain.
When I first began this series, I told you that we would use
several "tricks" to make things easy, and to let us learn the
concepts without getting too bogged down in the details. Among
these tricks was the idea of looking at individual pieces of a
compiler at a time, i.e. performing experiments using the Cradle
as a base. When we studied expressions, for example, we dealt
with only that part of compiler theory. When we studied control
structures, we wrote a different program, still based on the
Cradle, to do that part. We only incorporated these concepts into
a complete language fairly recently. These techniques have served
us very well indeed, and led us to the development of a compiler
for TINY version 1.3.
When I first began this session, I tried to build upon what we
had already done, and just add the new features to the existing
compiler. That turned out to be a little awkward and tricky ...
much too much to suit me.
I finally figured out why. In this series of experiments, I had
abandoned the very useful techniques that had allowed us to get
here, and without meaning to I had switched over into a new
method of working, that involved incremental changes to the full
TINY compiler.
You need to understand that what we are doing here is a little
unique. There have been a number of articles, such as the Small
C articles by Cain and Hendrix, that presented finished compilers
for one language or another. This is different. In this series
of tutorials, you are watching me design and implement both a
language and a compiler, in real time.
In the experiments that I've been doing in preparation for this
article, I was trying to inject the changes into the TINY
compiler in such a way that, at every step, we still had a real,
working compiler. In other words, I was attempting an
incremental enhancement of the language and its compiler, while
at the same time explaining to you what I was doing.
That's a tough act to pull off! I finally realized that it was
dumb to try. Having gotten this far using the idea of small
experiments based on single-character tokens and simple,
special-purpose programs, I had abandoned them in favor of
working with the full compiler. It wasn't working.
So we're going to go back to our roots, so to speak. In this
installment and the next, I'll be using single-character tokens
again as we study the concepts of procedures, unfettered by the
other baggage that we have accumulated in the previous sessions.
As a matter of fact, I won't even attempt, at the end of this
session, to merge the constructs into the TINY compiler. We'll
save that for later.
After all this time, you don't need more buildup than that, so
let's waste no more time and dive right in.
THE BASICS
All modern CPU's provide direct support for procedure calls, and
the 68000 is no exception. For the 68000, the call is a BSR
(PC-relative version) or JSR, and the return is RTS. All we have
to do is to arrange for the compiler to issue these commands at
the proper place.
Actually, there are really THREE things we have to address. One
of them is the call/return mechanism. The second is the
mechanism for DEFINING the procedure in the first place. And,
finally, there is the issue of passing parameters to the called
procedure. None of these things are really very difficult, and
we can of course borrow heavily on what people have done in other
languages ... there's no need to reinvent the wheel here. Of the
three issues, that of parameter passing will occupy most of our
attention, simply because there are so many options available.
A BASIS FOR EXPERIMENTS
As always, we will need some software to serve as a basis for
what we are doing. We don't need the full TINY compiler, but we
do need enough of a program so that some of the other constructs
are present. Specifically, we need at least to be able to handle
statements of some sort, and data declarations.
The program shown below is that basis. It's a vestigial form of
TINY, with single-character tokens. It has data declarations,
but only in their simplest form ... no lists or initializers. It
has assignment statements, but only of the kind
<ident> = <ident>
In other words, the only legal expression is a single variable
name. There are no control constructs ... the only legal
statement is the assignment.
Most of the program is just the standard Cradle routines. I've
shown the whole thing here, just to make sure we're all starting
from the same point:
{--------------------------------------------------------------}
program Calls;
{--------------------------------------------------------------}
{ Constant Declarations }
const TAB = ^I;
CR = ^M;
LF = ^J;
{--------------------------------------------------------------}
{ Variable Declarations }
var Look: char; { Lookahead Character }
var ST: Array['A'..'Z'] of char;
{--------------------------------------------------------------}
{ Read New Character From Input Stream }
procedure GetChar;
begin
Read(Look);
end;
{--------------------------------------------------------------}
{ Report an Error }
procedure Error(s: string);
begin
WriteLn;
WriteLn(^G, 'Error: ', s, '.');
end;
{--------------------------------------------------------------}
{ Report Error and Halt }
procedure Abort(s: string);
begin
Error(s);
Halt;
end;
{--------------------------------------------------------------}
{ Report What Was Expected }
procedure Expected(s: string);
begin
Abort(s + ' Expected');
end;
{--------------------------------------------------------------}
{ Report an Undefined Identifier }
procedure Undefined(n: string);
begin
Abort('Undefined Identifier ' + n);
end;
{--------------------------------------------------------------}
{ Report an Duplicate Identifier }
procedure Duplicate(n: string);
begin
Abort('Duplicate Identifier ' + n);
end;
{--------------------------------------------------------------}
{ Get Type of Symbol }
function TypeOf(n: char): char;
begin
TypeOf := ST[n];
end;
{--------------------------------------------------------------}
{ Look for Symbol in Table }
function InTable(n: char): Boolean;
begin
InTable := ST[n] <> ' ';
end;
{--------------------------------------------------------------}
{ Add a New Symbol to Table }
procedure AddEntry(Name, T: char);
begin
if Intable(Name) then Duplicate(Name);
ST[Name] := T;
end;
{--------------------------------------------------------------}
{ Check an Entry to Make Sure It's a Variable }
procedure CheckVar(Name: char);
begin
if not InTable(Name) then Undefined(Name);
if TypeOf(Name) <> 'v' then Abort(Name + ' is not a
variable');
end;
{--------------------------------------------------------------}
{ Recognize an Alpha Character }
function IsAlpha(c: char): boolean;
begin
IsAlpha := upcase(c) in ['A'..'Z'];
end;
{--------------------------------------------------------------}
{ Recognize a Decimal Digit }
function IsDigit(c: char): boolean;
begin
IsDigit := c in ['0'..'9'];
end;
{--------------------------------------------------------------}
{ Recognize an AlphaNumeric Character }
function IsAlNum(c: char): boolean;
begin
IsAlNum := IsAlpha(c) or IsDigit(c);
end;
{--------------------------------------------------------------}
{ Recognize an Addop }
function IsAddop(c: char): boolean;
begin
IsAddop := c in ['+', '-'];
end;
{--------------------------------------------------------------}
{ Recognize a Mulop }
function IsMulop(c: char): boolean;
begin
IsMulop := c in ['*', '/'];
end;
{--------------------------------------------------------------}
{ Recognize a Boolean Orop }
function IsOrop(c: char): boolean;
begin
IsOrop := c in ['|', '~'];
end;
{--------------------------------------------------------------}
{ Recognize a Relop }
function IsRelop(c: char): boolean;
begin
IsRelop := c in ['=', '#', '<', '>'];
end;
{--------------------------------------------------------------}
{ Recognize White Space }
function IsWhite(c: char): boolean;
begin
IsWhite := c in [' ', TAB];
end;
{--------------------------------------------------------------}
{ Skip Over Leading White Space }
procedure SkipWhite;
begin
while IsWhite(Look) do
GetChar;
end;
{--------------------------------------------------------------}
{ Skip Over an End-of-Line }
procedure Fin;
begin
if Look = CR then begin
GetChar;
if Look = LF then
GetChar;
end;
end;
{--------------------------------------------------------------}
{ Match a Specific Input Character }
procedure Match(x: char);
begin
if Look = x then GetChar
else Expected('''' + x + '''');
SkipWhite;
end;
{--------------------------------------------------------------}
{ Get an Identifier }
function GetName: char;
begin
if not IsAlpha(Look) then Expected('Name');
GetName := UpCase(Look);
GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Get a Number }
function GetNum: char;
begin
if not IsDigit(Look) then Expected('Integer');
GetNum := Look;
GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Output a String with Tab }
procedure Emit(s: string);
begin
Write(TAB, s);
end;
{--------------------------------------------------------------}
{ Output a String with Tab and CRLF }
procedure EmitLn(s: string);
begin
Emit(s);
WriteLn;
end;
{--------------------------------------------------------------}
{ Post a Label To Output }
procedure PostLabel(L: string);
begin
WriteLn(L, ':');
end;
{--------------------------------------------------------------}
{ Load a Variable to the Primary Register }
procedure LoadVar(Name: char);
begin
CheckVar(Name);
EmitLn('MOVE ' + Name + '(PC),D0');
end;
{--------------------------------------------------------------}
{ Store the Primary Register }
procedure StoreVar(Name: char);
begin
CheckVar(Name);
EmitLn('LEA ' + Name + '(PC),A0');
EmitLn('MOVE D0,(A0)')
end;
{--------------------------------------------------------------}
{ Initialize }
procedure Init;
var i: char;
begin
GetChar;
SkipWhite;
for i := 'A' to 'Z' do
ST[i] := ' ';
end;
{--------------------------------------------------------------}
{ Parse and Translate an Expression }
{ Vestigial Version }
procedure Expression;
begin
LoadVar(GetName);
end;
{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }
procedure Assignment;
var Name: char;
begin
Name := GetName;
Match('=');
Expression;
StoreVar(Name);
end;
{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }
procedure DoBlock;
begin
while not(Look in ['e']) do begin
Assignment;
Fin;
end;
end;
{--------------------------------------------------------------}
{ Parse and Translate a Begin-Block }
procedure BeginBlock;
begin
Match('b');
Fin;
DoBlock;
Match('e');
Fin;
end;
{--------------------------------------------------------------}
{ Allocate Storage for a Variable }
procedure Alloc(N: char);
begin
if InTable(N) then Duplicate(N);
ST[N] := 'v';
WriteLn(N, ':', TAB, 'DC 0');
end;
{--------------------------------------------------------------}
{ Parse and Translate a Data Declaration }
procedure Decl;
var Name: char;
begin
Match('v');
Alloc(GetName);
end;
{--------------------------------------------------------------}
{ Parse and Translate Global Declarations }
procedure TopDecls;
begin
while Look <> 'b' do begin
case Look of
'v': Decl;
else Abort('Unrecognized Keyword ' + Look);
end;
Fin;
end;
end;
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
TopDecls;
BeginBlock;
end.
{--------------------------------------------------------------}
Note that we DO have a symbol table, and there is logic to check
a variable name to make sure it's a legal one. It's also worth
noting that I have included the code you've seen before to
provide for white space and newlines. Finally, note that the
main program is delimited, as usual, by BEGIN-END brackets.
Once you've copied the program to Turbo, the first step is to
compile it and make sure it works. Give it a few declarations,
and then a begin-block. Try something like:
va (for VAR A)
vb (for VAR B)
vc (for VAR C)
b (for BEGIN)
a=b
b=c
e. (for END.)
As usual, you should also make some deliberate errors, and verify
that the program catches them correctly.
DECLARING A PROCEDURE
If you're satisfied that our little program works, then it's time
to deal with the procedures. Since we haven't talked about
parameters yet, we'll begin by considering only procedures that
have no parameter lists.
As a start, let's consider a simple program with a procedure, and
think about the code we'd like to see generated for it:
PROGRAM FOO;
.
.
PROCEDURE BAR; BAR:
BEGIN .
. .
. .
END; RTS
BEGIN { MAIN PROGRAM } MAIN:
. .
. .
FOO; BSR BAR
. .
. .
END. END MAIN
Here I've shown the high-order language constructs on the left,
and the desired assembler code on the right. The first thing to
notice is that we certainly don't have much code to generate
here! For the great bulk of both the procedure and the main
program, our existing constructs take care of the code to be
generated.
The key to dealing with the body of the procedure is to recognize
that although a procedure may be quite long, declaring it is
really no different than declaring a variable. It's just one
more kind of declaration. We can write the BNF:
<declaration> ::= <data decl> | <procedure>
This means that it should be easy to modify TopDecl to deal with
procedures. What about the syntax of a procedure? Well, here's
a suggested syntax, which is essentially that of Pascal:
<procedure> ::= PROCEDURE <ident> <begin-block>
There is practically no code generation required, other than that
generated within the begin-block. We need only emit a label at
the beginning of the procedure, and an RTS at the end.
Here's the required code:
{--------------------------------------------------------------}
{ Parse and Translate a Procedure Declaration }
procedure DoProc;
var N: char;
begin
Match('p');
N := GetName;
Fin;
if InTable(N) then Duplicate(N);
ST[N] := 'p';
PostLabel(N);
BeginBlock;
Return;
end;
{--------------------------------------------------------------}
Note that I've added a new code generation routine, Return, which
merely emits an RTS instruction. The creation of that routine is
"left as an exercise for the student."
To finish this version, add the following line within the Case
statement in DoBlock:
'p': DoProc;
I should mention that this structure for declarations, and the
BNF that drives it, differs from standard Pascal. In the Jensen
& Wirth definition of Pascal, variable declarations, in fact ALL
kinds of declarations, must appear in a specific sequence, i.e.
labels, constants, types, variables, procedures, and main
program. To follow such a scheme, we should separate the two
declarations, and have code in the main program something like
DoVars;
DoProcs;
DoMain;
However, most implementations of Pascal, including Turbo, don't
require that order and let you freely mix up the various
declarations, as long as you still don't try to refer to
something before it's declared. Although it may be more
aesthetically pleasing to declare all the global variables at the
top of the program, it certainly doesn't do any HARM to allow
them to be sprinkled around. In fact, it may do some GOOD, in
the sense that it gives you the opportunity to do a little
rudimentary information hiding. Variables that should be
accessed only by the main program, for example, can be declared
just before it and will thus be inaccessible by the procedures.
OK, try this new version out. Note that we can declare as many
procedures as we choose (as long as we don't run out of single-
character names!), and the labels and RTS's all come out in the
right places.
It's worth noting here that I do _NOT_ allow for nested
procedures. In TINY, all procedures must be declared at the
global level, the same as in C. There has been quite a
discussion about this point in the Computer Language Forum of
CompuServe. It turns out that there is a significant penalty in
complexity that must be paid for the luxury of nested procedures.
What's more, this penalty gets paid at RUN TIME, because extra
code must be added and executed every time a procedure is called.
I also happen to believe that nesting is not a good idea, simply
on the grounds that I have seen too many abuses of the feature.
Before going on to the next step, it's also worth noting that the
"main program" as it stands is incomplete, since it doesn't have
the label and END statement. Let's fix that little oversight:
{--------------------------------------------------------------}
{ Parse and Translate a Main Program }
procedure DoMain;
begin
Match('b');
Fin;
Prolog;
DoBlock;
Epilog;
end;
{--------------------------------------------------------------}