Chris Gray wrote:
>[Jon A. Lambert:]
>
>[Oh dear, I'm running on again. Sorry about that!]
>
No problem. Me too.
>> I've noticed that you can obtain lots of little optimization tweaks by just using
>> "lazy evaluation". That is instead of emitting an opcode as soon as it is
>> identified, pass that opcode along as an integer to the next routine in the
>> parse tree as you begin to evaluate the next token.
[snip]
>
>Normally, you just do the special-case check at the place where you have
>all of the needed information. In this case, it would be in the processing
>for the '+' operation. I don't have an 'add1' opcode, so this particular
>example doesn't work for me, but here is a more generic one, which is
>a minor kind of "compile-time expression evaluation":
>
> case ex_negate:
> case ex_fNegate:
> ex = ex->ex_v.ex_expressionPtr->ex_right;
> if (ex->ex_kind == ex_int) {
> bcGenConst(- ex->ex_v.ex_integer);
> } else {
> bcComp(ex);
> bcWriteOp(bc_neg);
> }
> break;
>
>If the operand for the unary '-' operator is a constant, then just generate
>the negative of that constant, rather than generating the positive
>value and then a 'neg' byte-code instruction.
>
>Some compilers do a lot of special cases like this. Others have a lot
>of "tree rewriting rules". They work through the built-up parse trees,
>applying various rewriting rules wherever their patterns match. The result
>is still a valid parse tree, but it has been optimized with a bunch of
>special cases.
>
Not to belabor the point, since I think we're in agreement although coming from
different angles. The interesting thing about Crenshaw's toy compiler is that it's a
single phase compiler. The AST is the current state of the compiler's run-time
stack and code generation is done as it parser executes. Admittedly this
limits your optimizing options since the AST disappears when the parsing is
complete. It is rather elegantly simple though.
I thought I'd include another example clipped from the Crenshaw tutorial for the
benefit of anyone else reading. He also discusses the unary negate optimization
you mention above elsewhere.
---cut---
There is a concept in compiler theory called "lazy" translation.
The idea is that you typically don't just emit code at every
action. In fact, at the extreme you don't emit anything at all,
until you absolutely have to. To accomplish this, the actions
associated with the parsing routines typically don't just emit
code. Sometimes they do, but often they simply return in-
formation back to the caller. Armed with such information, the
caller can then make a better choice of what to do.
For example, given the statement
x = x + 3 - 2 - (5 - 4) ,
our compiler will dutifully spit out a stream of 18 instructions
to load each parameter into registers, perform the arithmetic,
and store the result. A lazier evaluation would recognize that
the arithmetic involving constants can be evaluated at compile
time, and would reduce the expression to
x = x + 0 .
An even lazier evaluation would then be smart enough to figure
out that this is equivalent to
x = x ,
which calls for no action at all. We could reduce 18 in-
structions to zero!
---end cut---
As you can see Crenshaw is in conflict with your .sig ---
He designs in inefficiency in and removes it later... ;-)
>> Another possible optimization is for the parser to generate code for
>> a virtual machine that uses a combination of register and stack instructions.
>> The parser can be made smart enough to know to use registers.
>
>I thought about that a bit for my system. If you step back a bit and look
>at what a modern CPU is actually doing, you find that the register set is
>very much like a stack - they will both end up in the processor's data
>cache. The key thing to look at is how many native machine instructions
>it takes to execute the sequence you want to look at. With a register
>set, you are going to have to extract a register number from the byte-code
>stream, and use that to index the array of registers in the virtual machine.
>That takes native machine instructions, perhaps more than are needed to
>do stack operations, given that many real CPU's have post-increment and
>pre-decrement addressing modes that the native compiler can generate.
>
Hmm, I was thinking in terms of how many instructions my VM will be traversing
rather than how the hardware is going to handle those instructions.
Now I could implement my VM class as a 3 register machine using the members:
int a, x, y;
// instead of
int registers[3];
Much of this does depends on how your C/C++ compiler generates and
optimizes the code you write anyway. In Builder, accessing an array with
a constant subscript incurs no more overhead than any other automatic
variable.
From the bytecode angle the register is an intrinsic part of the opcode, that is
it is determined at compilation time. Now push() and pop() operations
in the VM are going to be to checking for a stack overflow or underflow, no?
There not just going to be doing a *sp++ = value; or return --*sp;
So a register specific bytecode like "lda_int" simply calls:
lda_int(int value)
{
a = value;
}
push(int value)
{
if (sp == stacktop)
throw(ErrorStackFault); // Or whatever you do...
*sp++ = value;
}
And quite possibly there are more savings in using an "add_a_x" register opcode
as opposed to a stack-based "add" opcode.
add_a_x()
{
push(a + x);
}
add()
{
push(pop()+pop());
}
Certainly you can inline all these VM functions for more fun.
There may also be possible optimizations by keeping loop counters
accessible in registers rather than on the stack.
I don't know. It may be something worth revisiting.
Besides is there anything more swell than a VM that looks like a 6502? :-)
I guess this might lead to the question of whether it is better to have more
or less opcodes for your VM's instruction set. I will note that the Java VM
has quite a boatload of special case opcodes, it even distinguishes between
pushing an int and pushing a constant one or zero. So is it better to accomplish
something by generating a longer series of simple bytecode operations, or
implement a lot of special purpose bytecodes?
>> And finally, there are optimizations that you just can't do with a RD parser.
>> You can pass the generated byte-code into a post processor that rescans
>> it for patterns that can be simplified. Call it a byte-code optimizer, or
>> peephole optimizer. The only problem with this last bit is that pretty printing
>> or reverse disassembly to source becomes rather difficult. I've noticed a
>> lot of mud servers do not do this and seek to support reverse code generation
>> instead of storing the source code somewhere. Perhaps they are missing
>> out on some major optimization possibilities.
>
>Those can readily be done in conjunction with recursive descent parsing.
>Let's make sure we understand the context here. Recursive descent parsing
>normally only means the parsing of the input token sequence into complete
>statements and functions understood by the language parser as a whole.
>That parser can be tied into direct code generation which emits byte-codes,
>or can produce a "parse-tree" data structure.
>
>In the latter case, a recursive walk over that tree will produce a byte-code
>stream. That recursive walk is *not* known as recursive descent parsing.
>I don't think I've seen the term "recursive descent" used for things other
>than parsing, just to avoid the possible confusions.
>
>In any case, optimizations can be done regardless of what parsing or
>code-generation techniques are used. In my MUD, I build a parse tree,
>because that is the form that non-byte-code-compiled code runs from,
>directly. In my Draco compiler for 8080 and later for MC68000, the recursive
>descent parser directly emitted native machine code, which was filtered
>through a pretty good peephole optimizer. Peephole optimizers, and special
>cases like we've been discussing, can make for pretty reasonable generated
>code. Nothing like gcc and commercial compilers which do things like
>register colouring and full program restructuring, of course!
>
Yes, you are correct. What I mean to say is that you can't normally do those
peephole operations in the parsing phase. You have two phases then,
the first generates the tree (recursive descent parser), the second walks it
(code generation). It is interesting though that there are a lot of possible
configurations here.
For example, Eric Harth's Smart compiler does a 4 phase compilation:
1) Lexical Analysis - reads the source and produces a stream of tokens. (scanner)
2) Syntax Analysis - reads the token stream and produces an AST (RDP)
3) Semantic Analysis - reads the AST and does type checking, assigns scope
and builds symbolic tables.
4) Code Generation - reads the AST and symbolic tables and outputs bytecode.
I don't think speed of compilation is really an issue at least in the context of
a mud server's internal programming language. Typically in an LP, Moo, Cold,
Muck, etc. you are compiling rather small pieces of method code or single objects
so you can add as many phases or passes to the compilation as you need.
Of course online compilation should be multitasked and handled just like any
other mud event especially if you have a user programmable setup.
>Either I'm not understanding what you are getting at, or you are
>misinterpreting what I was getting at. I was not referring to evaluation
>at compile time (e.g. my example above of the compile-time versus
>run-time evaluation of the unary negation operator).
Other than me tripping over terminology again, I still don't agree with the
following:
--->Byte-code execution really only makes sense, I believe, for a strongly
typed language. If run-time checks and conversions are needed, they will
likely cost far more time than the basic interpretation of the code, and
so going to byte-code instead of staying with something more direct, could
be a waste of time, in that byte-code won't speed execution up much.
I agree that overhead is higher, I still think it's less than direct interpretation.
And there are other reasons where byte-code makes a lot of sense. It's
a common target for multiple use languages and functional complexity is
split between parsing and execution rather than having a monolithic
interpreter. There are also a whole hosts of structures built and checked
during compilation to bytecode that are avoided altogether in the execution of
that code, like classes symbol tables, object symbol tables, local symbol tables,
method signature checking, etc. Perhaps your statement holds true for a
strictly procedural language, I don't think it holds true for an OO mud language.
>With a non-strongly-typed language (assuming no smart compiler that can
>deduce the types), either the byte-code emitted is considerably more
>complex, or the C code for the byte-code engine is considerably more
>complex. Either way, it runs slower, perhaps significantly slower.
I do weak-typing and use a very simple byte-code. I only use one
byte-code for the add operation. And the byte-code engine is pretty
simple too:
void add_op() throw
{
Var rval;
Var lval;
Var retval;
rval = pop_op();
lval = pop_op();
retval = lval + rval;
push_op(retval);
}
It looks like what one would do with a strongly-typed system. All the
complexity and overhead is hiding under the covers of C++.
Var& Var::operator=(const Var& value);
friend Var operator+(const Var& v1,const Var& v2) throw(TypeError);
You'll find all the switches, if-else chains, and conversion exceptions in
those functions. Yes it's overhead but it is not complex. Maintenance,
debugging and readability are very good. At least I think so.
--
--* Jon A. Lambert - TychoMUD Email: jlsysinc@nospam.ix.netcom.com *--
--* Mud Server Developer's Page <
http://jlsysinc.home.netcom.com> *--
--* "No Free man shall ever be debarred the use of arms." Thomas Jefferson *--