Year

1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010

Month

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

Index

February 1998

OT: Following in the footsteps of JCL Alex Oren
- OT: Following in the footsteps of JCL Nathan Yospe
- OT: Following in the footsteps of JCL Richard Woolcock
- OT: Following in the footsteps of JCL Chris Gray
- - OT: Following in the footsteps of JCL coder@ibm.net
- OT: Following in the footsteps of JCL Marc Eyrignoux
Ada? Andrew C.M. McClintock
- Ada? Cynbe ru Taren
- Ada? Jon Leonard
Monthly FAQ posting Koster, Raph
- Monthly FAQ posting Adam Wiggins
Monthly FAQ posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Chris Gray
- Monthly FAQ Posting Ling
- - Monthly FAQ Posting J C Lawrence
  - - Monthly FAQ Posting J C Lawrence
  - Monthly FAQ Posting Alex Oren
  - Monthly FAQ Posting Greg Miller
- Monthly FAQ Posting s001gmu@nova.wright.edu
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling Lo
- Monthly FAQ Posting Koster, Raph
- Monthly FAQ Posting Ling Lo
- - Monthly FAQ Posting Greg Miller
  - - Monthly FAQ Posting Marian Griffith
    - - Monthly FAQ Posting Greg Miller
- Monthly FAQ Posting Ling Lo
Databases Shawn Halpenny
- Databases coder@ibm.net
- Databases T. Alexander Popiel
- - Databases Greg Miller
  - - Databases Sellers, Michael
    - - Databases Greg Miller
- Databases Dream Weaver at home
- Databases Laurent Bossavit
- - Databases Kevin Littlejohn
  - - Databases J C Lawrence
OT: This is a test coder@ibm.net
- OT: This is a test Alex Oren
Clients and things [Was: OT: This is a test] Matt Chatterley
- Clients and things [Was: OT: This is a test] coder@ibm.net
- - Clients and things [Was: OT: This is a test] Matt Chatterley
MUD Development Digest Dr. Cat
DBs and Events Greg Munt
- DBs and Events Nathan Yospe
- - DBs and Events Greg Munt
  - - DBs and Events Nathan Yospe
    - - DBs and Events Felix A. Croes
      - DBs and Events Jon A. Lambert
      - DBs and Events coder@ibm.net
    - DBs and Events s001gmu@nova.wright.edu
- DBs and Events Jon A. Lambert
- DBs and Events coder@ibm.net
(subject missing) Ben Greear
META: Unsubscribed users dur to bounces coder@ibm.net
- META: Unsubscribed users dur to bounces Adam Wiggins
Source Code Release Greg Munt
- Source Code Release Ben Greear
- - Source Code Release Greg Munt
  - - Source Code Release Ben Greear
    - - Source Code Release Greg Munt
- Source Code Release Richard Woolcock
- - Source Code Release Ben Greear
  - Source Code Release Chris Gray
  - Source Code Release Greg Munt
  - - Source Code Release coder@ibm.net
    - Source Code Release Richard Woolcock
- Source Code Release Stephen Zepp
- Source Code Release Jon A. Lambert
- - Source Code Release Greg Munt
  - - Source Code Release Jon A. Lambert
    - - Source Code Release Greg Munt
      - Source Code Release Travis Casey
        Source Code Release Jon A. Lambert
    - Source Code Release Jon A. Lambert
[RESEARCH]MUD articles archive (fwd) Greg Munt
Socket programming (Was: The impact of the web on muds) Jon Leonard
- Socket programming (was: The impact of the web on muds) Vadim Tkachenko
- - Socket programming (was: The impact of the web on muds) Richard Woolcock
byte-code anyone? Chris Gray
- byte-code anyone? Jon Leonard
- - byte-code anyone? Chris Gray
  - - byte-code anyone? Jon Leonard
    - - byte-code anyone? Chris Gray
user-centered design (was Clients) Mike Sellers
OT: Linux g++ Greg Munt
- OT: Linux g++ Ben Greear
- - OT: Linux g++ coder@ibm.net
  - - OT: Linux g++ Shawn Halpenny
- OT: Linux g++ Chris Gray
OT: Clients Vadim Tkachenko
- OT: Clients Adam Wiggins
- OT: Clients coder@ibm.net
META: OS wars coder@ibm.net
- META: OS wars Mike Sellers
Clients Stephen Zepp
- Clients Ling
- Clients Travis Casey
- - Clients Katrina McClelan
  - - Clients Nathan F Yospe
    - - Clients Chris Gray
    - Clients coder@ibm.net
    - - Clients Katrina McClelan
      - Clients Travis Casey
        Clients Chris Gray
        Clients Vadim Tkachenko
        Clients J C Lawrence
- Clients Greg Munt
- - Clients Stephen Zepp
  - Clients Matt Chatterley
Moore's Law sucks (was: 3D graphics) Brandon J. Rickman
- Moore's Law sucks (was: 3D graphics) Adam Wiggins
- - Moore's Law sucks (was: 3D graphics) Chris Gray
  - Moore's Law sucks (was: 3D graphics) coder@ibm.net
  - Moore's Law sucks (was: 3D graphics) Brandon J. Rickman
- Moore's Law sucks (was: 3D graphics) Mike Sellers
- Moore's Law sucks (was: 3D graphics) Chris Gray
- Moore's Law sucks (was: 3D graphics) Ben Greear
- - Moore's Law sucks (was: 3D graphics) Ling
  - Moore's Law sucks (was: 3D graphics) Brandon J. Rickman
- Moore's Law sucks (was: 3D graphics) coder@ibm.net
- Moore's Law sucks (was: 3D graphics) Alex Oren
Version Control (was: DBs and Events) Vadim Tkachenko
- Version Control (was: DBs and Events) coder@ibm.net
- - Version Control (was: DBs and Events) Vadim Tkachenko
  - - Version Control (was: DBs and Events) coder@ibm.net
    - Version Control (was: DBs and Events) coder@ibm.net
    - - Version Control (was: DBs and Events) Jon A. Lambert
  - Version Control (was: DBs and Events) s001gmu@nova.wright.edu
  - Version Control (was: DBs and Events) Raph & Kristen Koster
  - - Version Control (was: DBs and Events) coder@ibm.net
    - - Version Control (was: DBs and Events) Felix A. Croes
Net protocols for MUDing (was: Moore's Law sucks) Jon Leonard
- Net protocols for MUDing (was: Moore's Law sucks) coder@ibm.net
- - Net protocols for MUDing (was: Moore's Law sucks) s001gmu@nova.wright.edu
  - Net protocols for MUDing (was: Moore's Law sucks) Vadim Tkachenko
  - Net protocols for MUDing (was: Moore's Law sucks) Chris Gray
  - - Net protocols for MUDing (was: Moore's Law sucks) Caliban Tiresias Darklock
    - - Net protocols for MUDing (was: Moore's Law sucks) J C Lawrence
    - Net protocols for MUDing (was: Moore's Law sucks) J C Lawrence
  - Net protocols for MUDing (was: Moore's Law sucks) Adam Wiggins
  - - Net protocols for MUDing (was: Moore's Law sucks) Chris Gray
    - - Net protocols for MUDing (was: Moore's Law sucks) Vadim Tkachenko
      - Net protocols for MUDing (was: Moore's Law sucks) Chris Gray
        Net protocols for MUDing (was: Moore's Law sucks) Vadim Tkachenko
        Net protocols for MUDing (was: Moore's Law sucks) Chris Gray
        Net protocols for MUDing (was: Moore's Law sucks) Adam Wiggins
        Net protocols for MUDing (was: Moore's Law sucks) Chris Gray
        Net protocols for MUDing (was: Moore's Law sucks) J C Lawrence
        Net protocols for MUDing (was: Moore's Law sucks) Chris Gray
        Net protocols for MUDing (was: Moore's Law sucks) Joel Dillon
        Net protocols for MUDing (was: Moore's Law sucks) Chris Gray
        Net protocols for MUDing (was: Moore's Law sucks) Adam Wiggins
        Net protocols for MUDing (was: Moore's Law sucks) Ben Greear
        Net protocols for MUDing (was: Moore's Law sucks) Adam Wiggins
    - Net protocols for MUDing (was: Moore's Law sucks) s001gmu@nova.wright.edu
- Net protocols for MUDing (was: Moore's Law sucks) Jon A. Lambert
- Net protocols for MUDing (was: Moore's Law sucks) J C Lawrence
VEIL (was: Clients) Brandon Gillespie
LDMs (large dynamic maps) was Unique items Mike Sellers
Describing the environment Stephen Zepp
- Describing the environment Richard Woolcock
Back on the list Niklas Elmqvist
- Back on the list Chris Gray
- Back on the list coder@ibm.net
Unique items The Eternal City
- Unique items coder@ibm.net
Position sorting Adam Wiggins
- Position sorting coder@ibm.net
Unique items coder@ibm.net
- Unique items Nathan F Yospe
BOOK: Myer's Silverlock coder@ibm.net
- BOOK: Myer's Silverlock Chris Gray
- BOOK: Myer's Silverlock Adam Wiggins
Dynamic Loading of Modules (was: Back on the list) Niklas Elmqvist
- Dynamic Loading of Modules (was: Back on the list) Vadim Tkachenko
- Dynamic Loading of Modules (was: Back on the list) J C Lawrence
- Dynamic Loading of Modules (was: Back on the list Jon A. Lambert
Net protocols for MUDing s001gmu@nova.wright.edu
- Net protocols for MUDing Stephen Zepp
- - Net protocols for MUDing Chris Gray
  - Net protocols for MUDing Adam Wiggins
  - Net protocols for MUDing J C Lawrence
  - - Net protocols for MUDing Shawn Halpenny
    - - Net protocols for MUDing J C Lawrence
    - Net protocols for MUDing Chris Gray
Dynamic Loading of Modules Niklas Elmqvist
Senses (was: The MLI Project) s001gmu@nova.wright.edu
bar-time (was The MLI Project) Mike Sellers
3D engines for MUDs (was: The MLI Project) Niklas Elmqvist
- 3D engines for MUDs (was: The MLI Project) J C Lawrence
- - 3D engines for MUDs (was: The MLI Project) Michael Hohensee
  - - 3D engines for MUDs (was: The MLI Project) Miroslav Silovic
    - - 3D engines for MUDs (was: The MLI Project) Michael Hohensee
Why not compile java into object code? Ben Greear
- Why not compile java into object code? Cynbe ru Taren
- Why not compile java into object code? Caliban Tiresias Darklock
- - Why not compile java into object code? Nathan F Yospe
  - Why not compile java into object code? Niklas Elmqvist
  - Why not compile java into object code? Ben Greear
- Why not compile java into object code? Jon A. Lambert
- Why not compile java into object code? Travis Casey
- Why not compile java into object code? Chris Gray
Tutorial: Comments on Hand-crafting a compiler Jon A. Lambert
Tutorial: Let's build a Compiler! - Part I: Introduction Jon A. Lambert
Tutorial: Let's build a Compiler! - Part II: Expression Parsing Jon A. Lambert
Tutorial: Let's build a Compiler! - Part I: Introductio Chris Gray
- Tutorial: Let's build a Compiler! - Part I: Introductio s001gmu@nova.wright.edu
- - Tutorial: Let's build a Compiler! - Part I: Introductio coder@ibm.net
MUD Development Digest Dr. Cat
- MUD Development Digest Koster, Raph
- MUD Development Digest Mike Sellers
Tutorial: Let's build a Compiler! - Part III: More Expressions Jon A. Lambert
Tutorial: Let's build a Compiler! - Part IV: Interpreters Jon A. Lambert
Tutorial: Let's build a Compiler! - Part V: Control Constructs Jon A. Lambert
Tutorial: Let's build a Compiler! - Part VI: Boolean Expressions Jon A. Lambert
Tutorial: Let's build a Compiler! - Comments Jon A. Lambert
Tutorial: Let's build a Compiler! - Part VII: Lexical Scanning Jon A. Lambert
Tutorial: Let's build a Compiler! - Part VIII: A Little Philosophy Jon A. Lambert
Tutorial: Let's build a Compiler! - Part IX: A Top View Jon A. Lambert
Tutorial: Let's build a Compiler! - Part X: Introducing TINY Jon A. Lambert

LET'S BUILD A COMPILER!

By

Jack W. Crenshaw, Ph.D.

21 May 1989

Part X: INTRODUCING "TINY"

*****************************************************************
* *
* COPYRIGHT NOTICE *
* *
* Copyright (C) 1989 Jack W. Crenshaw. All rights reserved. *
* *
*****************************************************************

INTRODUCTION

In the last installment, I showed you the general idea for the
top-down development of a compiler. I gave you the first few
steps of the process for compilers for Pascal and C, but I
stopped far short of pushing it through to completion. The
reason was simple: if we're going to produce a real, functional
compiler for any language, I'd rather do it for KISS, the
language that I've been defining in this tutorial series.

In this installment, we're going to do just that, for a subset of
KISS which I've chosen to call TINY.

The process will be essentially that outlined in Installment IX,
except for one notable difference. In that installment, I
suggested that you begin with a full BNF description of the
language. That's fine for something like Pascal or C, for which
the language definition is firm. In the case of TINY, however,
we don't yet have a full description ... we seem to be defining
the language as we go. That's OK. In fact, it's preferable,
since we can tailor the language slightly as we go, to keep the
parsing easy.

So in the development that follows, we'll actually be doing a
top-down development of BOTH the language and its compiler. The
BNF description will grow along with the compiler.

In this process, there will be a number of decisions to be made,
each of which will influence the BNF and therefore the nature of
the language. At each decision point I'll try to remember to
explain the decision and the rationale behind my choice. That
way, if you happen to hold a different opinion and would prefer a
different option, you can choose it instead. You now have the
background to do that. I guess the important thing to note is
that nothing we do here is cast in concrete. When YOU'RE
designing YOUR language, you should feel free to do it YOUR way.

Many of you may be asking at this point: Why bother starting over
from scratch? We had a working subset of KISS as the outcome of
Installment VII (lexical scanning). Why not just extend it as
needed? The answer is threefold. First of all, I have been
making a number of changes to further simplify the program ...
changes like encapsulating the code generation procedures, so
that we can convert to a different target machine more easily.
Second, I want you to see how the development can indeed be done
from the top down as outlined in the last installment. Finally,
we both need the practice. Each time I go through this exercise,
I get a little better at it, and you will, also.

GETTING STARTED

Many years ago there were languages called Tiny BASIC, Tiny
Pascal, and Tiny C, each of which was a subset of its parent full
language. Tiny BASIC, for example, had only single-character
variable names and global variables. It supported only a single
data type. Sound familiar? At this point we have almost all the
tools we need to build a compiler like that.

Yet a language called Tiny-anything still carries some baggage
inherited from its parent language. I've often wondered if this
is a good idea. Granted, a language based upon some parent
language will have the advantage of familiarity, but there may
also be some peculiar syntax carried over from the parent that
may tend to add unnecessary complexity to the compiler. (Nowhere
is this more true than in Small C.)

I've wondered just how small and simple a compiler could be made
and still be useful, if it were designed from the outset to be
both easy to use and to parse. Let's find out. This language
will just be called "TINY," period. It's a subset of KISS, which
I also haven't fully defined, so that at least makes us
consistent (!). I suppose you could call it TINY KISS. But that
opens up a whole can of worms involving cuter and cuter (and
perhaps more risque) names, so let's just stick with TINY.

The main limitations of TINY will be because of the things we
haven't yet covered, such as data types. Like its cousins Tiny C
and Tiny BASIC, TINY will have only one data type, the 16-bit
integer. The first version we develop will also have no
procedure calls and will use single-character variable names,
although as you will see we can remove these restrictions without
much effort.

The language I have in mind will share some of the good features
of Pascal, C, and Ada. Taking a lesson from the comparison of
the Pascal and C compilers in the previous installment, though,
TINY will have a decided Pascal flavor. Wherever feasible, a
language structure will be bracketed by keywords or symbols, so
that the parser will know where it's going without having to
guess.

One other ground rule: As we go, I'd like to keep the compiler
producing real, executable code. Even though it may not DO much
at the beginning, it will at least do it correctly.

Finally, I'll use a couple of Pascal restrictions that make
sense: All data and procedures must be declared before they are
used. That makes good sense, even though for now the only data
type we'll use is a word. This rule in turn means that the only
reasonable place to put the executable code for the main program
is at the end of the listing.

The top-level definition will be similar to Pascal:

<program> ::= PROGRAM <top-level decl> <main> '.'

Already, we've reached a decision point. My first thought was to
make the main block optional. It doesn't seem to make sense to
write a "program" with no main program, but it does make sense if
we're allowing for multiple modules, linked together. As a
matter of fact, I intend to allow for this in KISS. But then we
begin to open up a can of worms that I'd rather leave closed for
now. For example, the term "PROGRAM" really becomes a misnomer.
The MODULE of Modula-2 or the Unit of Turbo Pascal would be more
appropriate. Second, what about scope rules? We'd need a
convention for dealing with name visibility across modules.
Better for now to just keep it simple and ignore the idea
altogether.

There's also a decision in choosing to require the main program
to be last. I toyed with the idea of making its position
optional, as in C. The nature of SK*DOS, the OS I'm compiling
for, make this very easy to do. But this doesn't really make
much sense in view of the Pascal-like requirement that all data
and procedures be declared before they're referenced. Since the
main program can only call procedures that have already been
declared, the only position that makes sense is at the end, a la
Pascal.

Given the BNF above, let's write a parser that just recognizes
the brackets:

{--------------------------------------------------------------}
{ Parse and Translate a Program }

procedure Prog;
begin
Match('p');
Header;
Prolog;
Match('.');
Epilog;
end;
{--------------------------------------------------------------}

The procedure Header just emits the startup code required by the
assembler:

{--------------------------------------------------------------}
{ Write Header Info }

procedure Header;
begin
WriteLn('WARMST', TAB, 'EQU $A01E');
end;
{--------------------------------------------------------------}

The procedures Prolog and Epilog emit the code for identifying
the main program, and for returning to the OS:

{--------------------------------------------------------------}
{ Write the Prolog }

procedure Prolog;
begin
PostLabel('MAIN');
end;

{--------------------------------------------------------------}
{ Write the Epilog }

procedure Epilog;
begin
EmitLn('DC WARMST');
EmitLn('END MAIN');
end;
{--------------------------------------------------------------}

The main program just calls Prog, and then looks for a clean
ending:

{--------------------------------------------------------------}
{ Main Program }

begin
Init;
Prog;
if Look <> CR then Abort('Unexpected data after ''.''');
end.
{--------------------------------------------------------------}

At this point, TINY will accept only one input "program," the
null program:

PROGRAM . (or 'p.' in our shorthand.)

Note, though, that the compiler DOES generate correct code for
this program. It will run, and do what you'd expect the null
program to do, that is, nothing but return gracefully to the OS.

As a matter of interest, one of my favorite compiler benchmarks
is to compile, link, and execute the null program in whatever
language is involved. You can learn a lot about the
implementation by measuring the overhead in time required to
compile what should be a trivial case. It's also interesting to
measure the amount of code produced. In many compilers, the code
can be fairly large, because they always include the whole run-
time library whether they need it or not. Early versions of
Turbo Pascal produced a 12K object file for this case. VAX C
generates 50K!

The smallest null programs I've seen are those produced by
Modula-2 compilers, and they run about 200-800 bytes.

In the case of TINY, we HAVE no run-time library as yet, so the
object code is indeed tiny: two bytes. That's got to be a
record, and it's likely to remain one since it is the minimum
size required by the OS.

The next step is to process the code for the main program. I'll
use the Pascal BEGIN-block:

<main> ::= BEGIN <block> END

Here, again, we have made a decision. We could have chosen to
require a "PROCEDURE MAIN" sort of declaration, similar to C. I
must admit that this is not a bad idea at all ... I don't
particularly like the Pascal approach since I tend to have
trouble locating the main program in a Pascal listing. But the
alternative is a little awkward, too, since you have to deal with
the error condition where the user omits the main program or
misspells its name. Here I'm taking the easy way out.

Another solution to the "where is the main program" problem might
be to require a name for the program, and then bracket the main
by

BEGIN <name>
END <name>

similar to the convention of Modula 2. This adds a bit of
"syntactic sugar" to the language. Things like this are easy to
add or change to your liking, if the language is your own design.

To parse this definition of a main block, change procedure Prog
to read:

{--------------------------------------------------------------}
{ Parse and Translate a Program }

procedure Prog;
begin
Match('p');
Header;
Main;
Match('.');
end;
{--------------------------------------------------------------}

and add the new procedure:

{--------------------------------------------------------------}
{ Parse and Translate a Main Program }

procedure Main;
begin
Match('b');
Prolog;
Match('e');
Epilog;
end;
{--------------------------------------------------------------}

Now, the only legal program is:

PROGRAM BEGIN END . (or 'pbe.')

Aren't we making progress??? Well, as usual it gets better. You
might try some deliberate errors here, like omitting the 'b' or
the 'e', and see what happens. As always, the compiler should
flag all illegal inputs.

DECLARATIONS

The obvious next step is to decide what we mean by a declaration.
My intent here is to have two kinds of declarations: variables
and procedures/functions. At the top level, only global
declarations are allowed, just as in C.

For now, there can only be variable declarations, identified by
the keyword VAR (abbreviated 'v'):

<top-level decls> ::= ( <data declaration> )*

<data declaration> ::= VAR <var-list>

Note that since there is only one variable type, there is no need
to declare the type. Later on, for full KISS, we can easily add
a type description.

The procedure Prog becomes:

{--------------------------------------------------------------}
{ Parse and Translate a Program }

procedure Prog;
begin
Match('p');
Header;
TopDecls;
Main;
Match('.');
end;
{--------------------------------------------------------------}

Now, add the two new procedures:

{--------------------------------------------------------------}
{ Process a Data Declaration }

procedure Decl;
begin
Match('v');
GetChar;
end;

{--------------------------------------------------------------}
{ Parse and Translate Global Declarations }

procedure TopDecls;
begin
while Look <> 'b' do
case Look of
'v': Decl;
else Abort('Unrecognized Keyword ''' + Look + '''');
end;
end;
{--------------------------------------------------------------}

Note that at this point, Decl is just a stub. It generates no
code, and it doesn't process a list ... every variable must occur
in a separate VAR statement.

OK, now we can have any number of data declarations, each
starting with a 'v' for VAR, before the BEGIN-block. Try a few
cases and see what happens.

DECLARATIONS AND SYMBOLS

That looks pretty good, but we're still only generating the null
program for output. A real compiler would issue assembler
directives to allocate storage for the variables. It's about
time we actually produced some code.

With a little extra code, that's an easy thing to do from
procedure Decl. Modify it as follows:

{--------------------------------------------------------------}
{ Parse and Translate a Data Declaration }

procedure Decl;
var Name: char;
begin
Match('v');
Alloc(GetName);
end;
{--------------------------------------------------------------}

The procedure Alloc just issues a command to the assembler to
allocate storage:

{--------------------------------------------------------------}
{ Allocate Storage for a Variable }

procedure Alloc(N: char);
begin
WriteLn(N, ':', TAB, 'DC 0');
end;
{--------------------------------------------------------------}

Give this one a whirl. Try an input that declares some
variables, such as:

pvxvyvzbe.

See how the storage is allocated? Simple, huh? Note also that
the entry point, "MAIN," comes out in the right place.

For the record, a "real" compiler would also have a symbol table
to record the variables being used. Normally, the symbol table
is necessary to record the type of each variable. But since in
this case all variables have the same type, we don't need a
symbol table for that reason. As it turns out, we're going to
find a symbol necessary even without different types, but let's
postpone that need until it arises.

Of course, we haven't really parsed the correct syntax for a data
declaration, since it involves a variable list. Our version only
permits a single variable. That's easy to fix, too.

The BNF for <var-list> is

<var-list> ::= <ident> (, <ident>)*

Adding this syntax to Decl gives this new version:

{--------------------------------------------------------------}
{ Parse and Translate a Data Declaration }

procedure Decl;
var Name: char;
begin
Match('v');
Alloc(GetName);
while Look = ',' do begin
GetChar;
Alloc(GetName);
end;
end;
{--------------------------------------------------------------}

OK, now compile this code and give it a try. Try a number of
lines of VAR declarations, try a list of several variables on one
line, and try combinations of the two. Does it work?

INITIALIZERS

As long as we're dealing with data declarations, one thing that's
always bothered me about Pascal is that it doesn't allow
initializing data items in the declaration. That feature is
admittedly sort of a frill, and it may be out of place in a
language that purports to be a minimal language. But it's also
SO easy to add that it seems a shame not to do so. The BNF
becomes:

<var-list> ::= <var> ( <var> )*

<var> ::= <ident> [ = <integer> ]

Change Alloc as follows:

{--------------------------------------------------------------}
{ Allocate Storage for a Variable }

procedure Alloc(N: char);
begin
Write(N, ':', TAB, 'DC ');
if Look = '=' then begin
Match('=');
WriteLn(GetNum);
end
else
WriteLn('0');
end;
{--------------------------------------------------------------}

There you are: an initializer with six added lines of Pascal.

OK, try this version of TINY and verify that you can, indeed,
give the variables initial values.

By golly, this thing is starting to look real! Of course, it
still doesn't DO anything, but it looks good, doesn't it?

Before leaving this section, I should point out that we've used
two versions of function GetNum. One, the earlier one, returns a
character value, a single digit. The other accepts a multi-digit
integer and returns an integer value. Either one will work here,
since WriteLn will handle either type. But there's no reason to
limit ourselves to single-digit values here, so the correct
version to use is the one that returns an integer. Here it is:

{--------------------------------------------------------------}
{ Get a Number }

function GetNum: integer;
var Val: integer;
begin
Val := 0;
if not IsDigit(Look) then Expected('Integer');
while IsDigit(Look) do begin
Val := 10 * Val + Ord(Look) - Ord('0');
GetChar;
end;
GetNum := Val;
end;
{--------------------------------------------------------------}

As a matter of fact, strictly speaking we should allow for
expressions in the data field of the initializer, or at the very
least for negative values. For now, let's just allow for
negative values by changing the code for Alloc as follows:

{--------------------------------------------------------------}
{ Allocate Storage for a Variable }

procedure Alloc(N: char);
begin
if InTable(N) then Abort('Duplicate Variable Name ' + N);
ST[N] := 'v';
Write(N, ':', TAB, 'DC ');
if Look = '=' then begin
Match('=');
If Look = '-' then begin
Write(Look);
Match('-');
end;
WriteLn(GetNum);
end
else
WriteLn('0');
end;
{--------------------------------------------------------------}

Now you should be able to initialize variables with negative
and/or multi-digit values.

THE SYMBOL TABLE

There's one problem with the compiler as it stands so far: it
doesn't do anything to record a variable when we declare it. So
the compiler is perfectly content to allocate storage for several
variables with the same name. You can easily verify this with an
input like

pvavavabe.

Here we've declared the variable A three times. As you can see,
the compiler will cheerfully accept that, and generate three
identical labels. Not good.

Later on, when we start referencing variables, the compiler will
also let us reference variables that don't exist. The assembler
will catch both of these error conditions, but it doesn't seem
friendly at all to pass such errors along to the assembler. The
compiler should catch such things at the source language level.

So even though we don't need a symbol table to record data types,
we ought to install one just to check for these two conditions.
Since at this point we are still restricted to single-character
variable names, the symbol table can be trivial. To provide for
it, first add the following declaration at the beginning of your
program:

var ST: array['A'..'Z'] of char;

and insert the following function:

{--------------------------------------------------------------}
{ Look for Symbol in Table }

function InTable(n: char): Boolean;
begin
InTable := ST[n] <> ' ';
end;
{--------------------------------------------------------------}

We also need to initialize the table to all blanks. The
following lines in Init will do the job:

var i: char;
begin
for i := 'A' to 'Z' do
ST[i] := ' ';
...

Finally, insert the following two lines at the beginning of
Alloc:

if InTable(N) then Abort('Duplicate Variable Name ' + N);
ST[N] := 'v';

That should do it. The compiler will now catch duplicate
declarations. Later, we can also use InTable when generating
references to the variables.

EXECUTABLE STATEMENTS

At this point, we can generate a null program that has some data
variables declared and possibly initialized. But so far we
haven't arranged to generate the first line of executable code.

Believe it or not, though, we almost have a usable language!
What's missing is the executable code that must go into the main
program. But that code is just assignment statements and control
statements ... all stuff we have done before. So it shouldn't
take us long to provide for them, as well.

The BNF definition given earlier for the main program included a
statement block, which we have so far ignored:

<main> ::= BEGIN <block> END

For now, we can just consider a block to be a series of
assignment statements:

<block> ::= (Assignment)*

Let's start things off by adding a parser for the block. We'll
begin with a stub for the assignment statement:

{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }

procedure Assignment;
begin
GetChar;
end;

{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }

procedure Block;
begin
while Look <> 'e' do
Assignment;
end;
{--------------------------------------------------------------}

Modify procedure Main to call Block as shown below:

{--------------------------------------------------------------}
{ Parse and Translate a Main Program }

procedure Main;
begin
Match('b');
Prolog;
Block;
Match('e');
Epilog;
end;
{--------------------------------------------------------------}

This version still won't generate any code for the "assignment
statements" ... all it does is to eat characters until it sees
the 'e' for 'END.' But it sets the stage for what is to follow.

The next step, of course, is to flesh out the code for an
assignment statement. This is something we've done many times
before, so I won't belabor it. This time, though, I'd like to
deal with the code generation a little differently. Up till now,
we've always just inserted the Emits that generate output code in
line with the parsing routines. A little unstructured, perhaps,
but it seemed the most straightforward approach, and made it easy
to see what kind of code would be emitted for each construct.

However, I realize that most of you are using an 80x86 computer,
so the 68000 code generated is of little use to you. Several of
you have asked me if the CPU-dependent code couldn't be collected
into one spot where it would be easier to retarget to another
CPU. The answer, of course, is yes.

To accomplish this, insert the following "code generation"
routines:

{---------------------------------------------------------------}
{ Clear the Primary Register }

procedure Clear;
begin
EmitLn('CLR D0');
end;

{---------------------------------------------------------------}
{ Negate the Primary Register }

procedure Negate;
begin
EmitLn('NEG D0');
end;

{---------------------------------------------------------------}
{ Load a Constant Value to Primary Register }

procedure LoadConst(n: integer);
begin
Emit('MOVE #');
WriteLn(n, ',D0');
end;

{---------------------------------------------------------------}
{ Load a Variable to Primary Register }

procedure LoadVar(Name: char);
begin
if not InTable(Name) then Undefined(Name);
EmitLn('MOVE ' + Name + '(PC),D0');
end;

{---------------------------------------------------------------}
{ Push Primary onto Stack }

procedure Push;
begin
EmitLn('MOVE D0,-(SP)');
end;

{---------------------------------------------------------------}
{ Add Top of Stack to Primary }

procedure PopAdd;
begin
EmitLn('ADD (SP)+,D0');
end;

{---------------------------------------------------------------}
{ Subtract Primary from Top of Stack }

procedure PopSub;
begin
EmitLn('SUB (SP)+,D0');
EmitLn('NEG D0');
end;

{---------------------------------------------------------------}
{ Multiply Top of Stack by Primary }

procedure PopMul;
begin
EmitLn('MULS (SP)+,D0');
end;

{---------------------------------------------------------------}
{ Divide Top of Stack by Primary }

procedure PopDiv;
begin
EmitLn('MOVE (SP)+,D7');
EmitLn('EXT.L D7');
EmitLn('DIVS D0,D7');
EmitLn('MOVE D7,D0');
end;

{---------------------------------------------------------------}
{ Store Primary to Variable }

procedure Store(Name: char);
begin
if not InTable(Name) then Undefined(Name);
EmitLn('LEA ' + Name + '(PC),A0');
EmitLn('MOVE D0,(A0)')
end;
{---------------------------------------------------------------}

The nice part of this approach, of course, is that we can
retarget the compiler to a new CPU simply by rewriting these
"code generator" procedures. In addition, we will find later
that we can improve the code quality by tweaking these routines a
bit, without having to modify the compiler proper.

Note that both LoadVar and Store check the symbol table to make
sure that the variable is defined. The error handler Undefined
simply calls Abort:

{--------------------------------------------------------------}
{ Report an Undefined Identifier }

procedure Undefined(n: string);
begin
Abort('Undefined Identifier ' + n);
end;
{--------------------------------------------------------------}

OK, we are now finally ready to begin processing executable code.
We'll do that by replacing the stub version of procedure
Assignment.

We've been down this road many times before, so this should all
be familiar to you. In fact, except for the changes associated
with the code generation, we could just copy the procedures from
Part VII. Since we are making some changes, I won't just copy
them, but we will go a little faster than usual.

The BNF for the assignment statement is:

<assignment> ::= <ident> = <expression>

<expression> ::= <first term> ( <addop> <term> )*

<first term> ::= <first factor> <rest>

<term> ::= <factor> <rest>

<rest> ::= ( <mulop> <factor> )*

<first factor> ::= [ <addop> ] <factor>

<factor> ::= <var> | <number> | ( <expression> )

This version of the BNF is also a bit different than we've used
before ... yet another "variation on the theme of an expression."
This particular version has what I consider to be the best
treatment of the unary minus. As you'll see later, it lets us
handle negative constant values efficiently. It's worth
mentioning here that we have often seen the advantages of
"tweaking" the BNF as we go, to help make the language easy to
parse. What you're looking at here is a bit different: we've
tweaked the BNF to make the CODE GENERATION more efficient!
That's a first for this series.

Anyhow, the following code implements the BNF:

{---------------------------------------------------------------}
{ Parse and Translate a Math Factor }

procedure Expression; Forward;

procedure Factor;
begin
if Look = '(' then begin
Match('(');
Expression;
Match(')');
end
else if IsAlpha(Look) then
LoadVar(GetName)
else
LoadConst(GetNum);
end;

{--------------------------------------------------------------}
{ Parse and Translate a Negative Factor }

procedure NegFactor;
begin
Match('-');
if IsDigit(Look) then
LoadConst(-GetNum)
else begin
Factor;
Negate;
end;
end;

{--------------------------------------------------------------}
{ Parse and Translate a Leading Factor }

procedure FirstFactor;
begin
case Look of
'+': begin
Match('+');
Factor;
end;
'-': NegFactor;
else Factor;
end;
end;

{--------------------------------------------------------------}
{ Recognize and Translate a Multiply }

procedure Multiply;
begin
Match('*');
Factor;
PopMul;
end;

{-------------------------------------------------------------}
{ Recognize and Translate a Divide }

procedure Divide;
begin
Match('/');
Factor;
PopDiv;
end;

{---------------------------------------------------------------}
{ Common Code Used by Term and FirstTerm }

procedure Term1;
begin
while IsMulop(Look) do begin
Push;
case Look of
'*': Multiply;
'/': Divide;
end;
end;
end;

{---------------------------------------------------------------}
{ Parse and Translate a Math Term }

procedure Term;
begin
Factor;
Term1;
end;

{---------------------------------------------------------------}
{ Parse and Translate a Leading Term }

procedure FirstTerm;
begin
FirstFactor;
Term1;
end;

{--------------------------------------------------------------}
{ Recognize and Translate an Add }

procedure Add;
begin
Match('+');
Term;
PopAdd;
end;

{-------------------------------------------------------------}
{ Recognize and Translate a Subtract }

procedure Subtract;
begin
Match('-');
Term;
PopSub;
end;

{---------------------------------------------------------------}
{ Parse and Translate an Expression }

procedure Expression;
begin
FirstTerm;
while IsAddop(Look) do begin
Push;
case Look of
'+': Add;
'-': Subtract;
end;
end;
end;

{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }

procedure Assignment;
var Name: char;
begin
Name := GetName;
Match('=');
Expression;
Store(Name);
end;
{--------------------------------------------------------------}

OK, if you've got all this code inserted, then compile it and
check it out. You should be seeing reasonable-looking code,
representing a complete program that will assemble and execute.
We have a compiler!

BOOLEANS

The next step should also be familiar to you. We must add
Boolean expressions and relational operations. Again, since
we've already dealt with them more than once, I won't elaborate
much on them, except where they are different from what we've
done before. Again, we won't just copy from other files because
I've changed a few things just a bit. Most of the changes just
involve encapsulating the machine-dependent parts as we did for
the arithmetic operations. I've also modified procedure
NotFactor somewhat, to parallel the structure of FirstFactor.
Finally, I corrected an error in the object code for the
relational operators: The Scc instruction I used only sets the
low 8 bits of D0. We want all 16 bits set for a logical true, so
I've added an instruction to sign-extend the low byte.

To begin, we're going to need some more recognizers:

{--------------------------------------------------------------}
{ Recognize a Boolean Orop }

function IsOrop(c: char): boolean;
begin
IsOrop := c in ['|', '~'];
end;

{--------------------------------------------------------------}
{ Recognize a Relop }

function IsRelop(c: char): boolean;
begin
IsRelop := c in ['=', '#', '<', '>'];
end;
{--------------------------------------------------------------}

Also, we're going to need some more code generation routines:

{---------------------------------------------------------------}
{ Complement the Primary Register }

procedure NotIt;
begin
EmitLn('NOT D0');
end;
{---------------------------------------------------------------}
Compilers: Toy available for ftp Chris Gray

MUD-Dev Archive

Year

Month

Index

February 1998