Year

1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010

Month

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

Index

February 1998

OT: Following in the footsteps of JCL Alex Oren
- OT: Following in the footsteps of JCL Nathan Yospe
- OT: Following in the footsteps of JCL Richard Woolcock
- OT: Following in the footsteps of JCL Chris Gray
- - OT: Following in the footsteps of JCL coder@ibm.net
- OT: Following in the footsteps of JCL Marc Eyrignoux
Ada? Andrew C.M. McClintock
- Ada? Cynbe ru Taren
- Ada? Jon Leonard
Monthly FAQ posting Koster, Raph
- Monthly FAQ posting Adam Wiggins
Monthly FAQ posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Chris Gray
- Monthly FAQ Posting Ling
- - Monthly FAQ Posting J C Lawrence
  - - Monthly FAQ Posting J C Lawrence
  - Monthly FAQ Posting Alex Oren
  - Monthly FAQ Posting Greg Miller
- Monthly FAQ Posting s001gmu@nova.wright.edu
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling
- Monthly FAQ Posting Ling Lo
- Monthly FAQ Posting Koster, Raph
- Monthly FAQ Posting Ling Lo
- - Monthly FAQ Posting Greg Miller
  - - Monthly FAQ Posting Marian Griffith
    - - Monthly FAQ Posting Greg Miller
- Monthly FAQ Posting Ling Lo
Databases Shawn Halpenny
- Databases coder@ibm.net
- Databases T. Alexander Popiel
- - Databases Greg Miller
  - - Databases Sellers, Michael
    - - Databases Greg Miller
- Databases Dream Weaver at home
- Databases Laurent Bossavit
- - Databases Kevin Littlejohn
  - - Databases J C Lawrence
OT: This is a test coder@ibm.net
- OT: This is a test Alex Oren
Clients and things [Was: OT: This is a test] Matt Chatterley
- Clients and things [Was: OT: This is a test] coder@ibm.net
- - Clients and things [Was: OT: This is a test] Matt Chatterley
MUD Development Digest Dr. Cat
DBs and Events Greg Munt
- DBs and Events Nathan Yospe
- - DBs and Events Greg Munt
  - - DBs and Events Nathan Yospe
    - - DBs and Events Felix A. Croes
      - DBs and Events Jon A. Lambert
      - DBs and Events coder@ibm.net
    - DBs and Events s001gmu@nova.wright.edu
- DBs and Events Jon A. Lambert
- DBs and Events coder@ibm.net
(subject missing) Ben Greear
META: Unsubscribed users dur to bounces coder@ibm.net
- META: Unsubscribed users dur to bounces Adam Wiggins
Source Code Release Greg Munt
- Source Code Release Ben Greear
- - Source Code Release Greg Munt
  - - Source Code Release Ben Greear
    - - Source Code Release Greg Munt
- Source Code Release Richard Woolcock
- - Source Code Release Ben Greear
  - Source Code Release Chris Gray
  - Source Code Release Greg Munt
  - - Source Code Release coder@ibm.net
    - Source Code Release Richard Woolcock
- Source Code Release Stephen Zepp
- Source Code Release Jon A. Lambert
- - Source Code Release Greg Munt
  - - Source Code Release Jon A. Lambert
    - - Source Code Release Greg Munt
      - Source Code Release Travis Casey
        Source Code Release Jon A. Lambert
    - Source Code Release Jon A. Lambert
[RESEARCH]MUD articles archive (fwd) Greg Munt
Socket programming (Was: The impact of the web on muds) Jon Leonard
- Socket programming (was: The impact of the web on muds) Vadim Tkachenko
- - Socket programming (was: The impact of the web on muds) Richard Woolcock
byte-code anyone? Chris Gray
- byte-code anyone? Jon Leonard
- - byte-code anyone? Chris Gray
  - - byte-code anyone? Jon Leonard
    - - byte-code anyone? Chris Gray
user-centered design (was Clients) Mike Sellers
OT: Linux g++ Greg Munt
- OT: Linux g++ Ben Greear
- - OT: Linux g++ coder@ibm.net
  - - OT: Linux g++ Shawn Halpenny
- OT: Linux g++ Chris Gray
OT: Clients Vadim Tkachenko
- OT: Clients Adam Wiggins
- OT: Clients coder@ibm.net
META: OS wars coder@ibm.net
- META: OS wars Mike Sellers
Clients Stephen Zepp
- Clients Ling
- Clients Travis Casey
- - Clients Katrina McClelan
  - - Clients Nathan F Yospe
    - - Clients Chris Gray
    - Clients coder@ibm.net
    - - Clients Katrina McClelan
      - Clients Travis Casey
        Clients Chris Gray
        Clients Vadim Tkachenko
        Clients J C Lawrence
- Clients Greg Munt
- - Clients Stephen Zepp
  - Clients Matt Chatterley
Moore's Law sucks (was: 3D graphics) Brandon J. Rickman
- Moore's Law sucks (was: 3D graphics) Adam Wiggins
- - Moore's Law sucks (was: 3D graphics) Chris Gray
  - Moore's Law sucks (was: 3D graphics) coder@ibm.net
  - Moore's Law sucks (was: 3D graphics) Brandon J. Rickman
- Moore's Law sucks (was: 3D graphics) Mike Sellers
- Moore's Law sucks (was: 3D graphics) Chris Gray
- Moore's Law sucks (was: 3D graphics) Ben Greear
- - Moore's Law sucks (was: 3D graphics) Ling
  - Moore's Law sucks (was: 3D graphics) Brandon J. Rickman
- Moore's Law sucks (was: 3D graphics) coder@ibm.net
- Moore's Law sucks (was: 3D graphics) Alex Oren
Version Control (was: DBs and Events) Vadim Tkachenko
- Version Control (was: DBs and Events) coder@ibm.net
- - Version Control (was: DBs and Events) Vadim Tkachenko
  - - Version Control (was: DBs and Events) coder@ibm.net
    - Version Control (was: DBs and Events) coder@ibm.net
    - - Version Control (was: DBs and Events) Jon A. Lambert
  - Version Control (was: DBs and Events) s001gmu@nova.wright.edu
  - Version Control (was: DBs and Events) Raph & Kristen Koster
  - - Version Control (was: DBs and Events) coder@ibm.net
    - - Version Control (was: DBs and Events) Felix A. Croes
Net protocols for MUDing (was: Moore's Law sucks) Jon Leonard
- Net protocols for MUDing (was: Moore's Law sucks) coder@ibm.net
- - Net protocols for MUDing (was: Moore's Law sucks) s001gmu@nova.wright.edu
  - Net protocols for MUDing (was: Moore's Law sucks) Vadim Tkachenko
  - Net protocols for MUDing (was: Moore's Law sucks) Chris Gray
  - - Net protocols for MUDing (was: Moore's Law sucks) Caliban Tiresias Darklock
    - - Net protocols for MUDing (was: Moore's Law sucks) J C Lawrence
    - Net protocols for MUDing (was: Moore's Law sucks) J C Lawrence
  - Net protocols for MUDing (was: Moore's Law sucks) Adam Wiggins
  - - Net protocols for MUDing (was: Moore's Law sucks) Chris Gray
    - - Net protocols for MUDing (was: Moore's Law sucks) Vadim Tkachenko
      - Net protocols for MUDing (was: Moore's Law sucks) Chris Gray
        Net protocols for MUDing (was: Moore's Law sucks) Vadim Tkachenko
        Net protocols for MUDing (was: Moore's Law sucks) Chris Gray
        Net protocols for MUDing (was: Moore's Law sucks) Adam Wiggins
        Net protocols for MUDing (was: Moore's Law sucks) Chris Gray
        Net protocols for MUDing (was: Moore's Law sucks) J C Lawrence
        Net protocols for MUDing (was: Moore's Law sucks) Chris Gray
        Net protocols for MUDing (was: Moore's Law sucks) Joel Dillon
        Net protocols for MUDing (was: Moore's Law sucks) Chris Gray
        Net protocols for MUDing (was: Moore's Law sucks) Adam Wiggins
        Net protocols for MUDing (was: Moore's Law sucks) Ben Greear
        Net protocols for MUDing (was: Moore's Law sucks) Adam Wiggins
    - Net protocols for MUDing (was: Moore's Law sucks) s001gmu@nova.wright.edu
- Net protocols for MUDing (was: Moore's Law sucks) Jon A. Lambert
- Net protocols for MUDing (was: Moore's Law sucks) J C Lawrence
VEIL (was: Clients) Brandon Gillespie
LDMs (large dynamic maps) was Unique items Mike Sellers
Describing the environment Stephen Zepp
- Describing the environment Richard Woolcock
Back on the list Niklas Elmqvist
- Back on the list Chris Gray
- Back on the list coder@ibm.net
Unique items The Eternal City
- Unique items coder@ibm.net
Position sorting Adam Wiggins
- Position sorting coder@ibm.net
Unique items coder@ibm.net
- Unique items Nathan F Yospe
BOOK: Myer's Silverlock coder@ibm.net
- BOOK: Myer's Silverlock Chris Gray
- BOOK: Myer's Silverlock Adam Wiggins
Dynamic Loading of Modules (was: Back on the list) Niklas Elmqvist
- Dynamic Loading of Modules (was: Back on the list) Vadim Tkachenko
- Dynamic Loading of Modules (was: Back on the list) J C Lawrence
- Dynamic Loading of Modules (was: Back on the list Jon A. Lambert
Net protocols for MUDing s001gmu@nova.wright.edu
- Net protocols for MUDing Stephen Zepp
- - Net protocols for MUDing Chris Gray
  - Net protocols for MUDing Adam Wiggins
  - Net protocols for MUDing J C Lawrence
  - - Net protocols for MUDing Shawn Halpenny
    - - Net protocols for MUDing J C Lawrence
    - Net protocols for MUDing Chris Gray
Dynamic Loading of Modules Niklas Elmqvist
Senses (was: The MLI Project) s001gmu@nova.wright.edu
bar-time (was The MLI Project) Mike Sellers
3D engines for MUDs (was: The MLI Project) Niklas Elmqvist
- 3D engines for MUDs (was: The MLI Project) J C Lawrence
- - 3D engines for MUDs (was: The MLI Project) Michael Hohensee
  - - 3D engines for MUDs (was: The MLI Project) Miroslav Silovic
    - - 3D engines for MUDs (was: The MLI Project) Michael Hohensee
Why not compile java into object code? Ben Greear
- Why not compile java into object code? Cynbe ru Taren
- Why not compile java into object code? Caliban Tiresias Darklock
- - Why not compile java into object code? Nathan F Yospe
  - Why not compile java into object code? Niklas Elmqvist
  - Why not compile java into object code? Ben Greear
- Why not compile java into object code? Jon A. Lambert
- Why not compile java into object code? Travis Casey
- Why not compile java into object code? Chris Gray
Tutorial: Comments on Hand-crafting a compiler Jon A. Lambert
Tutorial: Let's build a Compiler! - Part I: Introduction Jon A. Lambert
Tutorial: Let's build a Compiler! - Part II: Expression Parsing Jon A. Lambert
Tutorial: Let's build a Compiler! - Part I: Introductio Chris Gray
- Tutorial: Let's build a Compiler! - Part I: Introductio s001gmu@nova.wright.edu
- - Tutorial: Let's build a Compiler! - Part I: Introductio coder@ibm.net
MUD Development Digest Dr. Cat
- MUD Development Digest Koster, Raph
- MUD Development Digest Mike Sellers
Tutorial: Let's build a Compiler! - Part III: More Expressions Jon A. Lambert

LET'S BUILD A COMPILER!

By

Jack W. Crenshaw, Ph.D.

4 Aug 1988

Part III: MORE EXPRESSIONS

*****************************************************************
* *
* COPYRIGHT NOTICE *
* *
* Copyright (C) 1988 Jack W. Crenshaw. All rights reserved. *
* *
*****************************************************************

INTRODUCTION

In the last installment, we examined the techniques used to parse
and translate a general math expression. We ended up with a
simple parser that could handle arbitrarily complex expressions,
with two restrictions:

o No variables were allowed, only numeric factors

o The numeric factors were limited to single digits

In this installment, we'll get rid of those restrictions. We'll
also extend what we've done to include assignment statements
function calls and. Remember, though, that the second
restriction was mainly self-imposed ... a choice of convenience
on our part, to make life easier and to let us concentrate on the
fundamental concepts. As you'll see in a bit, it's an easy
restriction to get rid of, so don't get too hung up about it.
We'll use the trick when it serves us to do so, confident that we
can discard it when we're ready to.

VARIABLES

Most expressions that we see in practice involve variables, such
as

b * b + 4 * a * c

No parser is much good without being able to deal with them.
Fortunately, it's also quite easy to do.

Remember that in our parser as it currently stands, there are two
kinds of factors allowed: integer constants and expressions
within parentheses. In BNF notation,

<factor> ::= <number> | (<expression>)

The '|' stands for "or", meaning of course that either form is a
legal form for a factor. Remember, too, that we had no trouble
knowing which was which ... the lookahead character is a left
paren '(' in one case, and a digit in the other.

It probably won't come as too much of a surprise that a variable
is just another kind of factor. So we extend the BNF above to
read:

<factor> ::= <number> | (<expression>) | <variable>

Again, there is no ambiguity: if the lookahead character is a
letter, we have a variable; if a digit, we have a number. Back
when we translated the number, we just issued code to load the
number, as immediate data, into D0. Now we do the same, only we
load a variable.

A minor complication in the code generation arises from the fact
that most 68000 operating systems, including the SK*DOS that I'm
using, require the code to be written in "position-independent"
form, which basically means that everything is PC-relative. The
format for a load in this language is

MOVE X(PC),D0

where X is, of course, the variable name. Armed with that, let's
modify the current version of Factor to read:

{---------------------------------------------------------------}
{ Parse and Translate a Math Factor }

procedure Expression; Forward;

procedure Factor;
begin
if Look = '(' then begin
Match('(');
Expression;
Match(')');
end
else if IsAlpha(Look) then
EmitLn('MOVE ' + GetName + '(PC),D0')
else
EmitLn('MOVE #' + GetNum + ',D0');
end;
{--------------------------------------------------------------}

I've remarked before how easy it is to add extensions to the
parser, because of the way it's structured. You can see that
this still holds true here. This time it cost us all of two
extra lines of code. Notice, too, how the if-else-else structure
exactly parallels the BNF syntax equation.

OK, compile and test this new version of the parser. That didn't
hurt too badly, did it?

FUNCTIONS

There is only one other common kind of factor supported by most
languages: the function call. It's really too early for us to
deal with functions well, because we haven't yet addressed the
issue of parameter passing. What's more, a "real" language would
include a mechanism to support more than one type, one of which
should be a function type. We haven't gotten there yet, either.
But I'd still like to deal with functions now for a couple of
reasons. First, it lets us finally wrap up the parser in
something very close to its final form, and second, it brings up
a new issue which is very much worth talking about.

Up till now, we've been able to write what is called a
"predictive parser." That means that at any point, we can know
by looking at the current lookahead character exactly what to do
next. That isn't the case when we add functions. Every language
has some naming rules for what constitutes a legal identifier.
For the present, ours is simply that it is one of the letters
'a'..'z'. The problem is that a variable name and a function
name obey the same rules. So how can we tell which is which?
One way is to require that they each be declared before they are
used. Pascal takes that approach. The other is that we might
require a function to be followed by a (possibly empty) parameter
list. That's the rule used in C.

Since we don't yet have a mechanism for declaring types, let's
use the C rule for now. Since we also don't have a mechanism to
deal with parameters, we can only handle empty lists, so our
function calls will have the form

x() .

Since we're not dealing with parameter lists yet, there is
nothing to do but to call the function, so we need only to issue
a BSR (call) instead of a MOVE.

Now that there are two possibilities for the "If IsAlpha" branch
of the test in Factor, let's treat them in a separate procedure.
Modify Factor to read:

{---------------------------------------------------------------}
{ Parse and Translate a Math Factor }

procedure Expression; Forward;

procedure Factor;
begin
if Look = '(' then begin
Match('(');
Expression;
Match(')');
end
else if IsAlpha(Look) then
Ident
else
EmitLn('MOVE #' + GetNum + ',D0');
end;
{--------------------------------------------------------------}

and insert before it the new procedure

{---------------------------------------------------------------}
{ Parse and Translate an Identifier }

procedure Ident;
var Name: char;
begin
Name := GetName;
if Look = '(' then begin
Match('(');
Match(')');
EmitLn('BSR ' + Name);
end
else
EmitLn('MOVE ' + Name + '(PC),D0')
end;
{---------------------------------------------------------------}

OK, compile and test this version. Does it parse all legal
expressions? Does it correctly flag badly formed ones?

The important thing to notice is that even though we no longer
have a predictive parser, there is little or no complication
added with the recursive descent approach that we're using. At
the point where Factor finds an identifier (letter), it doesn't
know whether it's a variable name or a function name, nor does it
really care. It simply passes it on to Ident and leaves it up to
that procedure to figure it out. Ident, in turn, simply tucks
away the identifier and then reads one more character to decide
which kind of identifier it's dealing with.

Keep this approach in mind. It's a very powerful concept, and it
should be used whenever you encounter an ambiguous situation
requiring further lookahead. Even if you had to look several
tokens ahead, the principle would still work.

MORE ON ERROR HANDLING

As long as we're talking philosophy, there's another important
issue to point out: error handling. Notice that although the
parser correctly rejects (almost) every malformed expression we
can throw at it, with a meaningful error message, we haven't
really had to do much work to make that happen. In fact, in the
whole parser per se (from Ident through Expression) there are
only two calls to the error routine, Expected. Even those aren't
necessary ... if you'll look again in Term and Expression, you'll
see that those statements can't be reached. I put them in early
on as a bit of insurance, but they're no longer needed. Why
don't you delete them now?

So how did we get this nice error handling virtually for free?
It's simply that I've carefully avoided reading a character
directly using GetChar. Instead, I've relied on the error
handling in GetName, GetNum, and Match to do all the error
checking for me. Astute readers will notice that some of the
calls to Match (for example, the ones in Add and Subtract) are
also unnecessary ... we already know what the character is by the
time we get there ... but it maintains a certain symmetry to
leave them in, and the general rule to always use Match instead
of GetChar is a good one.

I mentioned an "almost" above. There is a case where our error
handling leaves a bit to be desired. So far we haven't told our
parser what and end-of-line looks like, or what to do with
embedded white space. So a space character (or any other
character not part of the recognized character set) simply causes
the parser to terminate, ignoring the unrecognized characters.

It could be argued that this is reasonable behavior at this
point. In a "real" compiler, there is usually another statement
following the one we're working on, so any characters not treated
as part of our expression will either be used for or rejected as
part of the next one.

But it's also a very easy thing to fix up, even if it's only
temporary. All we have to do is assert that the expression
should end with an end-of-line , i.e., a carriage return.

To see what I'm talking about, try the input line

1+2 <space> 3+4

See how the space was treated as a terminator? Now, to make the
compiler properly flag this, add the line

if Look <> CR then Expected('Newline');

in the main program, just after the call to Expression. That
catches anything left over in the input stream. Don't forget to
define CR in the const statement:

CR = ^M;

As usual, recompile the program and verify that it does what it's
supposed to.

ASSIGNMENT STATEMENTS

OK, at this point we have a parser that works very nicely. I'd
like to point out that we got it using only 88 lines of
executable code, not counting what was in the cradle. The
compiled object file is a whopping 4752 bytes. Not bad,
considering we weren't trying very hard to save either source
code or object size. We just stuck to the KISS principle.

Of course, parsing an expression is not much good without having
something to do with it afterwards. Expressions USUALLY (but not
always) appear in assignment statements, in the form

<Ident> = <Expression>

We're only a breath away from being able to parse an assignment
statement, so let's take that last step. Just after procedure
Expression, add the following new procedure:

{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }

procedure Assignment;
var Name: char;
begin
Name := GetName;
Match('=');
Expression;
EmitLn('LEA ' + Name + '(PC),A0');
EmitLn('MOVE D0,(A0)')
end;
{--------------------------------------------------------------}

Note again that the code exactly parallels the BNF. And notice
further that the error checking was painless, handled by GetName
and Match.

The reason for the two lines of assembler has to do with a
peculiarity in the 68000, which requires this kind of construct
for PC-relative code.

Now change the call to Expression, in the main program, to one to
Assignment. That's all there is to it.

Son of a gun! We are actually compiling assignment statements.
If those were the only kind of statements in a language, all we'd
have to do is put this in a loop and we'd have a full-fledged
compiler!

Well, of course they're not the only kind. There are also little
items like control statements (IFs and loops), procedures,
declarations, etc. But cheer up. The arithmetic expressions
that we've been dealing with are among the most challenging in a
language. Compared to what we've already done, control
statements will be easy. I'll be covering them in the fifth
installment. And the other statements will all fall in line, as
long as we remember to KISS.

MULTI-CHARACTER TOKENS

Throughout this series, I've been carefully restricting
everything we do to single-character tokens, all the while
assuring you that it wouldn't be difficult to extend to multi-
character ones. I don't know if you believed me or not ... I
wouldn't really blame you if you were a bit skeptical. I'll
continue to use that approach in the sessions which follow,
because it helps keep complexity away. But I'd like to back up
those assurances, and wrap up this portion of the parser, by
showing you just how easy that extension really is. In the
process, we'll also provide for embedded white space. Before you
make the next few changes, though, save the current version of
the parser away under another name. I have some more uses for it
in the next installment, and we'll be working with the single-
character version.

Most compilers separate out the handling of the input stream into
a separate module called the lexical scanner. The idea is that
the scanner deals with all the character-by-character input, and
returns the separate units (tokens) of the stream. There may
come a time when we'll want to do something like that, too, but
for now there is no need. We can handle the multi-character
tokens that we need by very slight and very local modifications
to GetName and GetNum.

The usual definition of an identifier is that the first character
must be a letter, but the rest can be alphanumeric (letters or
numbers). To deal with this, we need one other recognizer
function

{--------------------------------------------------------------}
{ Recognize an Alphanumeric }

function IsAlNum(c: char): boolean;
begin
IsAlNum := IsAlpha(c) or IsDigit(c);
end;
{--------------------------------------------------------------}

Add this function to your parser. I put mine just after IsDigit.
While you're at it, might as well include it as a permanent
member of Cradle, too.

Now, we need to modify function GetName to return a string
instead of a character:

{--------------------------------------------------------------}
{ Get an Identifier }

function GetName: string;
var Token: string;
begin
Token := '';
if not IsAlpha(Look) then Expected('Name');
while IsAlNum(Look) do begin
Token := Token + UpCase(Look);
GetChar;
end;
GetName := Token;
end;
{--------------------------------------------------------------}

Similarly, modify GetNum to read:

{--------------------------------------------------------------}
{ Get a Number }

function GetNum: string;
var Value: string;
begin
Value := '';
if not IsDigit(Look) then Expected('Integer');
while IsDigit(Look) do begin
Value := Value + Look;
GetChar;
end;
GetNum := Value;
end;
{--------------------------------------------------------------}

Amazingly enough, that is virtually all the changes required to
the parser! The local variable Name in procedures Ident and
Assignment was originally declared as "char", and must now be
declared string[8]. (Clearly, we could make the string length
longer if we chose, but most assemblers limit the length anyhow.)
Make this change, and then recompile and test. _NOW_ do you
believe that it's a simple change?

WHITE SPACE

Before we leave this parser for awhile, let's address the issue
of white space. As it stands now, the parser will barf (or
simply terminate) on a single space character embedded anywhere
in the input stream. That's pretty unfriendly behavior. So
let's "productionize" the thing a bit by eliminating this last
restriction.

The key to easy handling of white space is to come up with a
simple rule for how the parser should treat the input stream, and
to enforce that rule everywhere. Up till now, because white
space wasn't permitted, we've been able to assume that after each
parsing action, the lookahead character Look contains the next
meaningful character, so we could test it immediately. Our
design was based upon this principle.

It still sounds like a good rule to me, so that's the one we'll
use. This means that every routine that advances the input
stream must skip over white space, and leave the next non-white
character in Look. Fortunately, because we've been careful to
use GetName, GetNum, and Match for most of our input processing,
it is only those three routines (plus Init) that we need to
modify.

Not surprisingly, we start with yet another new recognizer
routine:

{--------------------------------------------------------------}
{ Recognize White Space }

function IsWhite(c: char): boolean;
begin
IsWhite := c in [' ', TAB];
end;
{--------------------------------------------------------------}

We also need a routine that will eat white-space characters,
until it finds a non-white one:

{--------------------------------------------------------------}
{ Skip Over Leading White Space }

procedure SkipWhite;
begin
while IsWhite(Look) do
GetChar;
end;
{--------------------------------------------------------------}

Now, add calls to SkipWhite to Match, GetName, and GetNum as
shown below:

{--------------------------------------------------------------}
{ Match a Specific Input Character }

procedure Match(x: char);
begin
if Look <> x then Expected('''' + x + '''')
else begin
GetChar;
SkipWhite;
end;
end;

{--------------------------------------------------------------}
{ Get an Identifier }

function GetName: string;
var Token: string;
begin
Token := '';
if not IsAlpha(Look) then Expected('Name');
while IsAlNum(Look) do begin
Token := Token + UpCase(Look);
GetChar;
end;
GetName := Token;
SkipWhite;
end;

{--------------------------------------------------------------}
{ Get a Number }

function GetNum: string;
var Value: string;
begin
Value := '';
if not IsDigit(Look) then Expected('Integer');
while IsDigit(Look) do begin
Value := Value + Look;
GetChar;
end;
GetNum := Value;
SkipWhite;
end;
{--------------------------------------------------------------}

(Note that I rearranged Match a bit, without changing the
functionality.)

Finally, we need to skip over leading blanks where we "prime the
pump" in Init:

{--------------------------------------------------------------}
{ Initialize }

procedure Init;
begin
GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}

Make these changes and recompile the program. You will find that
you will have to move Match below SkipWhite, to avoid an error
message from the Pascal compiler. Test the program as always to
make sure it works properly.

Since we've made quite a few changes during this session, I'm
reproducing the entire parser below:

{--------------------------------------------------------------}
program parse;

{--------------------------------------------------------------}
{ Constant Declarations }

const TAB = ^I;
CR = ^M;

{--------------------------------------------------------------}
{ Variable Declarations }

var Look: char; { Lookahead Character }

{--------------------------------------------------------------}
{ Read New Character From Input Stream }

procedure GetChar;
begin
Read(Look);
end;

{--------------------------------------------------------------}
{ Report an Error }

procedure Error(s: string);
begin
WriteLn;
WriteLn(^G, 'Error: ', s, '.');
end;

{--------------------------------------------------------------}
{ Report Error and Halt }

procedure Abort(s: string);
begin
Error(s);
Halt;
end;

{--------------------------------------------------------------}
{ Report What Was Expected }

procedure Expected(s: string);
begin
Abort(s + ' Expected');
end;

{--------------------------------------------------------------}
{ Recognize an Alpha Character }

function IsAlpha(c: char): boolean;
begin
IsAlpha := UpCase(c) in ['A'..'Z'];
end;

{--------------------------------------------------------------}
{ Recognize a Decimal Digit }

function IsDigit(c: char): boolean;
begin
IsDigit := c in ['0'..'9'];
end;

{--------------------------------------------------------------}
{ Recognize an Alphanumeric }

function IsAlNum(c: char): boolean;
begin
IsAlNum := IsAlpha(c) or IsDigit(c);
end;

{--------------------------------------------------------------}
{ Recognize an Addop }

function IsAddop(c: char): boolean;
begin
IsAddop := c in ['+', '-'];
end;

{--------------------------------------------------------------}
{ Recognize White Space }

function IsWhite(c: char): boolean;
begin
IsWhite := c in [' ', TAB];
end;

{--------------------------------------------------------------}
{ Skip Over Leading White Space }

procedure SkipWhite;
begin
while IsWhite(Look) do
GetChar;
end;

{--------------------------------------------------------------}
{ Match a Specific Input Character }

procedure Match(x: char);
begin
if Look <> x then Expected('''' + x + '''')
else begin
GetChar;
SkipWhite;
end;
end;

{--------------------------------------------------------------}
{ Get an Identifier }

function GetName: string;
var Token: string;
begin
Token := '';
if not IsAlpha(Look) then Expected('Name');
while IsAlNum(Look) do begin
Token := Token + UpCase(Look);
GetChar;
end;
GetName := Token;
SkipWhite;
end;

{--------------------------------------------------------------}
{ Get a Number }

function GetNum: string;
var Value: string;
begin
Value := '';
if not IsDigit(Look) then Expected('Integer');
while IsDigit(Look) do begin
Value := Value + Look;
GetChar;
end;
GetNum := Value;
SkipWhite;
end;

{--------------------------------------------------------------}
{ Output a String with Tab }

procedure Emit(s: string);
begin
Write(TAB, s);
end;

{--------------------------------------------------------------}
{ Output a String with Tab and CRLF }

procedure EmitLn(s: string);
begin
Emit(s);
WriteLn;
end;

{---------------------------------------------------------------}
{ Parse and Translate a Identifier }

procedure Ident;
var Name: string[8];
begin
Name:= GetName;
if Look = '(' then begin
Match('(');
Match(')');
EmitLn('BSR ' + Name);
end
else
EmitLn('MOVE ' + Name + '(PC),D0');
end;

{---------------------------------------------------------------}
{ Parse and Translate a Math Factor }

procedure Expression; Forward;

procedure Factor;
begin
if Look = '(' then begin
Match('(');
Expression;
Match(')');
end
else if IsAlpha(Look) then
Ident
else
EmitLn('MOVE #' + GetNum + ',D0');
end;

{--------------------------------------------------------------}
{ Recognize and Translate a Multiply }

procedure Multiply;
begin
Match('*');
Factor;
EmitLn('MULS (SP)+,D0');
end;

{-------------------------------------------------------------}
{ Recognize and Translate a Divide }

procedure Divide;
begin
Match('/');
Factor;
EmitLn('MOVE (SP)+,D1');
EmitLn('EXS.L D0');
EmitLn('DIVS D1,D0');
end;

{---------------------------------------------------------------}
{ Parse and Translate a Math Term }

procedure Term;
begin
Factor;
while Look in ['*', '/'] do begin
EmitLn('MOVE D0,-(SP)');
case Look of
'*': Multiply;
'/': Divide;
end;
end;
end;

{--------------------------------------------------------------}
{ Recognize and Translate an Add }

procedure Add;
begin
Match('+');
Term;
EmitLn('ADD (SP)+,D0');
end;

{-------------------------------------------------------------}
{ Recognize and Translate a Subtract }

procedure Subtract;
begin
Match('-');
Term;
EmitLn('SUB (SP)+,D0');
EmitLn('NEG D0');
end;

{---------------------------------------------------------------}
{ Parse and Translate an Expression }

procedure Expression;
begin
if IsAddop(Look) then
EmitLn('CLR D0')
else
Term;
while IsAddop(Look) do begin
EmitLn('MOVE D0,-(SP)');
case Look of
'+': Add;
'-': Subtract;
end;
end;
end;

{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }

procedure Assignment;
var Name: string[8];
begin
Name := GetName;
Match('=');
Expression;
EmitLn('LEA ' + Name + '(PC),A0');
EmitLn('MOVE D0,(A0)')
end;

{--------------------------------------------------------------}
{ Initialize }

procedure Init;
begin
GetChar;
SkipWhite;
end;

{--------------------------------------------------------------}
{ Main Program }

begin
Init;
Assignment;
If Look <> CR then Expected('NewLine');
end.
{--------------------------------------------------------------}

Now the parser is complete. It's got every feature we can put in
a one-line "compiler." Tuck it away in a safe place. Next time
we'll move on to a new subject, but we'll still be talking about
expressions for quite awhile. Next installment, I plan to talk a
bit about interpreters as opposed to compilers, and show you how
the structure of the parser changes a bit as we change what sort
of action has to be taken. The information we pick up there will
serve us in good stead later on, even if you have no interest in
interpreters. See you next time.

*****************************************************************
* *
* COPYRIGHT NOTICE *
* *
* Copyright (C) 1988 Jack W. Crenshaw. All rights reserved. *
* *
*****************************************************************

--
--/*\ Jon A. Lambert - TychoMUD Internet:jlsysinc@ix.netcom.com /*\--
--/*\ Mud Server Developer's Page <http://www.netcom.com/~jlsysinc> /*\--
--/*\ "Everything that deceives may be said to enchant" - Plato /*\--
Tutorial: Let's build a Compiler! - Part IV: Interpreters Jon A. Lambert
Tutorial: Let's build a Compiler! - Part V: Control Constructs Jon A. Lambert
Tutorial: Let's build a Compiler! - Part VI: Boolean Expressions Jon A. Lambert
Tutorial: Let's build a Compiler! - Comments Jon A. Lambert
Tutorial: Let's build a Compiler! - Part VII: Lexical Scanning Jon A. Lambert
Tutorial: Let's build a Compiler! - Part VIII: A Little Philosophy Jon A. Lambert
Tutorial: Let's build a Compiler! - Part IX: A Top View Jon A. Lambert
Tutorial: Let's build a Compiler! - Part X: Introducing TINY Jon A. Lambert
Compilers: Toy available for ftp Chris Gray

MUD-Dev Archive

Year

Month

Index

February 1998