Chris Gray wrote:
>[Jon A. Lambert:]
>
>> Hmm, I was thinking in terms of how many instructions my VM will be traversing
>> rather than how the hardware is going to handle those instructions.
>> Now I could implement my VM class as a 3 register machine using the
>> members:
>
>Ah, OK. That makes sense. If your register count is small, that works fine.
>Now reminded, my 8080 emulator on x86 simply used the one-byte opcode as
>a direct index into a 256 entry dispatch table.
>
I'm familiar with the Z-80, though I'm not sure how closely related it was to the
8080. I know the Z-80 had over 400 instructions (if you count the addressing modes),
a much, much larger set than the 6502's.
In particular, I recall the 'RST n' opcode, which is pretty much the way I
implement built-in function/library calls in my bytecode. It allows
me to add new built-in functions without recompiling old bytecode, as long
as I don't change 'n', which is the index into a definition table. BTW, this
technique might have been useful in the DevMUD project for dynamically
loading and registering new function libraries (DLLs).
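The mechanics are roughly like this (a stripped-down sketch, not my actual
code; the table layout, opcode value, and function names here are invented):

// Sketch of an RST-style builtin dispatch; everything here is illustrative.
#include <cstdio>
#include <vector>

typedef int (*BuiltinFn)(int arg);          // every builtin shares one signature

static std::vector<BuiltinFn> builtins;     // 'n' in the bytecode indexes this table

static int bi_random(int arg) { return arg % 6 + 1; }   // stand-ins for real
static int bi_strlen(int arg) { return arg + 1; }        // library functions

int main()
{
    // Registration only ever appends, so existing indices never move and
    // old bytecode that says "call builtin 0" keeps working.
    builtins.push_back(bi_random);   // n = 0
    builtins.push_back(bi_strlen);   // n = 1

    unsigned char code[] = { 0xC7 /* CALLB, say */, 1 /* n */ };
    unsigned n = code[1];
    std::printf("builtin %u returned %d\n", n, builtins[n](41));
    return 0;
}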
>> From the bytecode angle the register is an intrinsic part of the opcode, that is,
>> it is determined at compilation time. Now push() and pop() operations
>> in the VM are going to be checking for a stack overflow or underflow, no?
>
>Actually, no. If you don't allow users to directly generate byte-code, and
>you are careful with your compiler, there is never any need to check for
>stack underflow. (Well, my interpreter checks the SP after a 'return'
>instruction to see if it should return from this level of execution.)
>Stack overflow can be handled by a combination of a few checks at run-time
>(e.g. at function entry), and compilation checks. This is the exact code for
>integer addition from my sources:
>
>    CASE(bc_add) {
>        ULONG_T ul;
>
>        ul = *sp++;
>        *sp += ul;
>        BREAK;
>    }
>
>Macros 'CASE' and 'BREAK' allow the code to compile either using the
>gcc dynamic goto stuff, or not. Note: use lots of small scopes and very
>local variables like this, so that the compiler can do maximum re-use of
>the limited X86 register set.
>
Good point. I could probably remove my check for underflow! I'm dubious
about removing the stack overflow check since I allow infinite recursion,
theoretically at least. My VM will time out and cancel the task long before any
ridiculously huge stack size is reached.
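Besides, the overflow check need only amount to one compare at function
entry, something like this (a simplified sketch; the names and stack layout
are hypothetical):

// Simplified sketch of a per-call overflow check; all names are hypothetical.
#include <stdexcept>

struct VMStack {
    int* sp;        // next free slot
    int* limit;     // one past the end of the allocated stack

    // Called once at function entry with the frame size recorded by the compiler.
    void enter_frame(int frame_size) {
        if (sp + frame_size > limit)
            throw std::runtime_error("stack overflow");  // or flag the task for cancellation
        sp += frame_size;
    }
};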
I'm not familiar with how the gcc dynamic goto stuff works. I've included
my Borland output near the end of this post. Usually I do include locals
at the very top of routines, for personal readability. The only time I embed
locals in block scope is with the C++ for-loop counters and those that have
a high construction cost (classes). I know that it is not necessarily optimal.
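From what little I can gather, the trick is gcc's labels-as-values extension,
where the CASE/BREAK macros expand to direct gotos through a table of label
addresses. My guess at the shape of it (not Chris's actual macros; a toy
sketch that only builds with gcc):

#include <stdio.h>

enum { bc_add = 0, bc_ret = 1 };            /* opcode values double as table indices */

#define CASE(op)  lbl_##op:
#define BREAK     goto *dispatch[*pc++]     /* jump straight to the next handler */

static long run(unsigned char* pc, long* sp)
{
    /* The table must live in the same function as the labels it points at. */
    static void* dispatch[] = { &&lbl_bc_add, &&lbl_bc_ret };

    BREAK;                                  /* dispatch the first opcode */

    CASE(bc_add) {                          /* same body as the bc_add snippet above */
        long ul = *sp++;
        *sp += ul;
        BREAK;
    }
    CASE(bc_ret)
        return *sp;
}

int main(void)
{
    long stack[8] = { 2, 40 };              /* two operands already pushed */
    unsigned char code[] = { bc_add, bc_ret };
    printf("%ld\n", run(code, stack));      /* prints 42 */
    return 0;
}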
>> I guess this might lead to the question of whether it is better to have more
>> or fewer opcodes for your VM's instruction set. I will note that the Java VM
>> has quite a boatload of special-case opcodes; it even distinguishes between
>> pushing an int and pushing a constant one or zero. So is it better to accomplish
>> something by generating a longer series of simple bytecode operations, or to
>> implement a lot of special-purpose bytecodes?
>
>I say go for the special-case ones. I have some like 'pshz', 'psh1', etc.
>They are very easy to do and use, and can have a noticeable speedup. I'm
>real tempted to go add 'add1' and 'sub1', as in our earlier discussion,
>to see what effect they have. Should take about 15 minutes.
I tend to agree. I would note that my weak typing of variables prevents
my use of specialized bytecodes except in the case of constants. :(
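So about all I get are dedicated constant pushes along these lines (a sketch;
the opcode names, make_num(), and the stack layout are invented for illustration):

// Sketch of the only specialization weak typing leaves room for: constant pushes.
enum PushOp { op_pushi, op_push0, op_push1 };

struct Value { int num; };                       // stand-in for a tagged variant type
static Value make_num(int n) { Value v; v.num = n; return v; }

static void do_push(PushOp op, Value*& sp, const unsigned char*& pc)
{
    switch (op) {
    case op_push0: *--sp = make_num(0); break;   // nothing to fetch or decode
    case op_push1: *--sp = make_num(1); break;
    case op_pushi:                               // general case: one-byte operand follows
        *--sp = make_num(*pc++);
        break;
    }
}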
>> I don't think speed of compilation is really an issue at least in the context of
>> a mud server's internal programming language. Typically in an LP, Moo, Cold,
>> Muck, etc. you are compiling rather small pieces of method code or single objects
>> so you can add as many phases or passes to the compilation as you need.
>> Of course online compilation should be multitasked and handled just like any
>> other mud event especially if you have a user programmable setup.
>
>Well, when you have a system that works well, you tend to use it a lot.
>I've gotten into the habit of changing one line of scenario source and then
>recompiling all 20,000 lines, rather than making the change online! Too
>many times I've forgotten to make the change in the master sources.
>
Actually I have been doing this also, but primarily because I'm still making
low-level changes in the way the bytecode is generated that require it.
Much of my test mudlib is really just a massive test script hitting all the
language features.
> My point, however,
>is that if the overheads of non-static typing are significant, then going
>to byte-code could be rather pointless, since it might gain only a few
>percent speedup. With strong typing, there is virtually no run-time
>overhead for type-checking sorts of things, so going from direct
>interpretation to byte-code is almost certainly a big win. This is just
>a case of one of the general rules concerning speeding programs up: fix
>the big things first, then the little things.
>
Sure, but there are other considerations beyond a small(?) performance
gain that make it "sensible". Design goals, for instance, like providing a
common target for multiple languages: implementing and debugging a new
parser is half the cost of a new interpreter, since the execution piece is
reusable. Multi-tasking is another. It's a good deal easier to interrupt a
running VM externally and save its state than an executing interpreter,
although I must admit using threads might simplify that.
Or perhaps the idea of safely unwinding the native stack in an interpreter in
order to flag an error and get back to square one just frightens me, and I find
it easier to debug something after it has been decomposed into the
simplest of functions. For some reason I associate interpreters with
monolithic code.
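In a direct interpreter I'd end up needing something like setjmp/longjmp (or a
C++ exception) at the top level just to flag an error and abandon all the native
frames underneath, which is exactly the part that makes me nervous. Something
like this sketch (eval_node() and the error value are invented):

/* Sketch of the native-stack unwinding a direct interpreter needs on error. */
#include <setjmp.h>
#include <stdio.h>

static jmp_buf top_level;

static void eval_node(int depth)
{
    if (depth > 3)
        longjmp(top_level, 1 /* E_TYPE, say */);   /* abandon every frame below */
    eval_node(depth + 1);                          /* recursive descent over the tree */
}

int main(void)
{
    if (setjmp(top_level) == 0)
        eval_node(0);                              /* normal execution */
    else
        puts("error flagged, back at square one");
    return 0;
}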
>> It looks like what one would do with a strongly-typed system. All the
>> complexity and overhead is hiding under the covers of C++.
>>
>> Var& Var::operator=(const Var& value);
>> friend Var operator+(const Var& v1,const Var& v2) throw(TypeError);
>>
>> You'll find all the switches, if-else chains, and conversion exceptions in
>> those functions. Yes it's overhead but it is not complex. Maintenance,
>> debugging and readability are very good. At least I think so.
>
>Any idea how much overhead? Just guessing, comparing my snippet of source
>against what you are showing here, I'd guess a factor of 25 or more. That
>is actually *more* than I had thought.
Ok, I'll pull back the covers. And your guess is pretty accurate. ;)
TMVar operator+(const TMVar& v1, const TMVar& v2) throw(Error)
{
    if ((v1.type != v2.type) && (v1.type != LIST)) {
        throw (E_TYPE);
    }
    if ((v1.type == ARRAY) || (v1.type == ERROR) || (v1.type == OBJECT)) {
        throw (E_TYPE);
    }
    TMVar ret(v1.type);
    switch (v1.type) {
    case NUM:
        *((int*)ret.v) = *((int*)v1.v) + *((int*)v2.v);
        break;
    case LIST:
        new(ret.v) TMList(*(TMList*)v1.v);
        ((TMList*)ret.v)->rSetAdd(v2);
        break;
    case STRING:
        new(ret.v) TMString(*(TMString*)v1.v);
        *((TMString*)ret.v) += *((TMString*)v2.v);
        break;
    }
    return ret;
}
And here's the native code. I have register optimizations on, but I turned
off constructor inlining so it could be followed. Pardon the line breaks.
c:\tychomud\tmvar.cpp, 454: TMVar operator+(const TMVar& v1,const TMVar& v2) throw(Error)
004039ED 55 push ebp
004039EE 8BEC mov ebp,esp
004039F0 83C4C0 add esp,-0x40
004039F3 53 push ebx
004039F4 56 push esi
004039F5 8B7510 mov esi,[ebp+0x10]
004039F8 8B5D0C mov ebx,[ebp+0x0c]
004039FB B8FC034100 mov eax,0x004103fc
00403A00 E8BFA30000 call __InitExceptBlock(void *)
c:\tychomud\tmvar.cpp, 459: if ((v1.type != v2.type) && (v1.type != LIST)) {
00403A05 8B5304 mov edx,[ebx+0x04]
00403A08 3B5604 cmp edx,[esi+0x04]
00403A0B 742A jz 0x403A37
00403A0D 837B0403 cmp dword ptr [ebx+0x04],0x03
00403A11 7424 jz 0x403A37
c:\tychomud\tmvar.cpp, 460: throw (E_TYPE);
00403A13 6A00 push 0x00
00403A15 6A00 push 0x00
00403A17 6A00 push 0x00
00403A19 6A00 push 0x00
00403A1B 6A00 push 0x00
00403A1D 6A00 push 0x00
00403A1F C745C401000000 mov [ebp-0x3c],0x00000001
00403A26 8D4DC4 lea ecx,[ebp-0x3c]
00403A29 51 push ecx
00403A2A 6820434000 push 0x00404320
00403A2F E821A90000 call cw3220._ThrowException(void*,void*,void*,void*,unsigned int,unsigned int,unsigned int,unsigned char*)
00403A34 83C420 add esp,0x20
c:\tychomud\tmvar.cpp, 462: if ((v1.type == ARRAY) || (v1.type == ERROR) || (v1.type == OBJECT)) {
00403A37 837B0404 cmp dword ptr [ebx+0x04],0x04
00403A3B 740C jz 0x403A49
00403A3D 837B0405 cmp dword ptr [ebx+0x04],0x05
00403A41 7406 jz 0x403A49
00403A43 837B0402 cmp dword ptr [ebx+0x04],0x02
00403A47 7524 jnz 0x403A6D
c:\tychomud\tmvar.cpp, 463: throw (E_TYPE);
00403A49 6A00 push 0x00
00403A4B 6A00 push 0x00
00403A4D 6A00 push 0x00
00403A4F 6A00 push 0x00
00403A51 6A00 push 0x00
00403A53 6A00 push 0x00
00403A55 C745C001000000 mov [ebp-0x40],0x00000001
00403A5C 8D45C0 lea eax,[ebp-0x40]
00403A5F 50 push eax
00403A60 6820434000 push 0x00404320
00403A65 E8EBA80000 call cw3220._ThrowException(void*,void*,void*,void*,unsigned int,unsigned int,unsigned int,unsigned char*)
00403A6A 83C420 add esp,0x20
c:\tychomud\tmvar.cpp, 466: TMVar ret(v1.type);
00403A6D 66C745D80800 mov word ptr [ebp-0x28],0x0008
00403A73 FF7304 push [ebx+0x04]
00403A76 8D55F0 lea edx,[ebp-0x10]
00403A79 52 push edx
00403A7A E841EDFFFF call TMVar::TMVar(Type_spec)
00403A7F 83C408 add esp,0x08
c:\tychomud\tmvar.cpp, 468: switch (v1.type) {
00403A82 8B4B04 mov ecx,[ebx+0x04]
00403A85 83E901 sub ecx,0x01
00403A88 720C jb 0x403A96
00403A8A 746F jz 0x403AFB
00403A8C 83E902 sub ecx,0x02
00403A8F 7413 jz 0x403AA4
00403A91 E9A1000000 jmp 0x403B37
c:\tychomud\tmvar.cpp, 470: *((int*)ret.v) = *((int*)v1.v) + *((int*)v2.v);
00403A96 8B4308 mov eax,[ebx+0x08]
00403A99 034608 add eax,[esi+0x08]
00403A9C 8945F8 mov [ebp-0x08],eax
c:\tychomud\tmvar.cpp, 471: break;
00403A9F E993000000 jmp 0x403B37
c:\tychomud\tmvar.cpp, 476: new(ret.v) TMList((*(TMList*)v1.v));
00403AA4 8D55F8 lea edx,[ebp-0x08]
00403AA7 8955EC mov [ebp-0x14],edx
00403AAA 85D2 test edx,edx
00403AAC 7422 jz 0x403AD0
00403AAE 66C745D82000 mov word ptr [ebp-0x28],0x0020
00403AB4 83C308 add ebx,0x08
00403AB7 53 push ebx
00403AB8 FF75EC push [ebp-0x14]
00403ABB E885090000 call TMList::TMList(const TMList &)
00403AC0 83C408 add esp,0x08
00403AC3 A1EC4B4100 mov eax,[__DestructorCountPtr]
00403AC8 FF08 dec [eax]
00403ACA 66C745D81400 mov word ptr [ebp-0x28],0x0014
c:\tychomud\tmvar.cpp, 477: ((TMList*)ret.v)->rSetAdd(v2);
00403AD0 83C4F0 add esp,-0x10
00403AD3 66C745D83800 mov word ptr [ebp-0x28],0x0038
00403AD9 56 push esi
00403ADA 8D542404 lea edx,[esp+0x04]
00403ADE 52 push edx
00403ADF E8D6F1FFFF call TMVar::TMVar(const TMVar &)
00403AE4 83C408 add esp,0x08
00403AE7 8D4DF8 lea ecx,[ebp-0x08]
00403AEA 51 push ecx
00403AEB 66C745D82C00 mov word ptr [ebp-0x28],0x002c
00403AF1 E8070D0000 call TMList::rSetAdd(TMVar)
00403AF6 83C414 add esp,0x14
c:\tychomud\tmvar.cpp, 478: break;
00403AF9 EB3C jmp 0x403B37
c:\tychomud\tmvar.cpp, 480: new(ret.v) TMString(*((TMString*)v1.v));
00403AFB 8D45F8 lea eax,[ebp-0x08]
00403AFE 8945E8 mov [ebp-0x18],eax
00403B01 85C0 test eax,eax
00403B03 7422 jz 0x403B27
00403B05 66C745D85000 mov word ptr [ebp-0x28],0x0050
00403B0B 83C308 add ebx,0x08
00403B0E 53 push ebx
00403B0F FF75E8 push [ebp-0x18]
00403B12 E87E220000 call TMString::TMString(const TMString &)
00403B17 83C408 add esp,0x08
00403B1A A1EC4B4100 mov eax,[__DestructorCountPtr]
00403B1F FF08 dec [eax]
00403B21 66C745D84400 mov word ptr [ebp-0x28],0x0044
c:\tychomud\tmvar.cpp, 481: *((TMString*)ret.v) += *((TMString*)v2.v);
00403B27 83C608 add esi,0x08
00403B2A 56 push esi
00403B2B 8D45F8 lea eax,[ebp-0x08]
00403B2E 50 push eax
00403B2F E8EC270000 call TMString::operator +=(const TMString &)
00403B34 83C408 add esp,0x08
c:\tychomud\tmvar.cpp, 484: return ret;
00403B37 66C745D85C00 mov word ptr [ebp-0x28],0x005c
00403B3D 8D55F0 lea edx,[ebp-0x10]
00403B40 52 push edx
00403B41 FF7508 push [ebp+0x08]
00403B44 E871F1FFFF call TMVar::TMVar(const TMVar &)
00403B49 83C408 add esp,0x08
00403B4C 8B4508 mov eax,[ebp+0x08]
00403B4F 66C745D86800 mov word ptr [ebp-0x28],0x0068
00403B55 50 push eax
00403B56 6A02 push 0x02
00403B58 8D55F0 lea edx,[ebp-0x10]
00403B5B 52 push edx
00403B5C E896F2FFFF call TMVar::~TMVar()
00403B61 83C408 add esp,0x08
00403B64 58 pop eax
00403B65 66C745D85C00 mov word ptr [ebp-0x28],0x005c
00403B6B FF45E4 inc [ebp-0x1c]
00403B6E 8B55C8 mov edx,[ebp-0x38]
00403B71 64891500000000 mov fs:[0x00000000],edx
c:\tychomud\tmvar.cpp, 485: }
00403B78 5E pop esi
00403B79 5B pop ebx
00403B7A 8BE5 mov esp,ebp
00403B7C 5D pop ebp
00403B7D C3 ret
So to add two integers, there's 50-some-odd instructions (gasp!) that are
executed, and that's assuming I have inlined the constructors.
Now if I were strongly typed, I wouldn't be checking for exceptions or
providing a wrapper for variables, so I'd have maybe 10-13 instructions
for my add_op(), assuming a standard function call. So it's about 5 times
slower. Inlining add_op() would of course reduce that to maybe 3-4
instructions and be 20 times faster! Any thoughts? Crummy code? ;)
>And heaven help you if you end
>up with some temporary copies of variables because of problems with
>reference parameters! I have practically zero experience with C++, so
>the presence of all those '&'s would make me *very* nervous about that.
>
Passing by reference is the same as passing by pointer as far as the
machine code generated by the compiler is concerned. Of course, as a general rule,
I only use const reference parameters, for safety. BTW, there is an excellent
shareware tool for BC++, Builder, and Delphi called MemProof which
performs many of the same functions as Purify does. It even finds memory
leaks in Windows API functions. I'd still be crashing and burning without it. :)
--
--* Jon A. Lambert - TychoMUD Email: jlsysinc@nospam.ix.netcom.com *--
--* Mud Server Developer's Page <http://jlsysinc.home.netcom.com> *--
--* "No Free man shall ever be debarred the use of arms." Thomas Jefferson *--