On Fri, 01 Jan 1999 12:39:46 -0800
Greg Connor <gconnor@nekodojo.org> wrote:
> I am cross-posting this to Mud-Dev...
Ditto.
> Anyway, the context is "trying to develop an API for a generic
> database module". Earlier I had suggested working on a generic
> API rather than specific code, so that there could later be
> multiple interchangeable modules that all use the same
> interface... meaning more flexibility for the end user. There may
> eventually come a time when a devmud-derived server offers both a
> "free" database and a "high-end" plug in to a high-performance
> third party DB.
> In this note:
> - discussion about how to pass values between client and db
>   modules (long)
> - discussion about indexing and sort/traverse/search, why you
>   might need it
> Thanks. Though as I thought about it later, I started to toss
> around other ideas... specifically the idea that data should be
> passed in as a large block, like a struct. This is great for dbm
> type databases, but other table/field-oriented db's will need to
> break things up into individual fields, especially if you want
> indexing (more on that below).
> So, I'm now working on an alternate proposal, that would break
> records up into fields that the caller defines ahead of time. You
> could use the interface for storing a single binary, even the
> in-memory representation of a struct, but in order to properly
> sort and traverse/search the data, the database needs to either:
> - learn about the structure of the data to know which bytes to
>   index,
> - callback to a client function to deconstruct it, or
> - just have the client hand in pre-separated fields in the first
>   place :)
> The revised proposal would have groups of functions to add new
> fields to a database, to get and set individual fields, or get/set
> multiple fields at a time (call them "requests" maybe)... this
> implies that we have a way to call a function and hand it multiple
> values, or get back multiple values; meaning that a simple struct
> would not serve the purpose.
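(For concreteness, the sort of field-oriented interface being
proposed might look like the following C++ sketch. Every name and
signature here is hypothetical, illustration only, not a proposed
DevMUD API:)

    #include <stddef.h>   /* size_t */

    /* The caller declares fields up front; values then travel as
       tagged (field, bytes, length) triples rather than as one
       opaque struct, so the DB can index individual fields. */
    typedef unsigned long FieldID;
    typedef unsigned long RecordID;

    enum FieldType { FT_INT, FT_STRING, FT_BLOB };

    struct FieldValue {              /* one (field, value) pair */
        FieldID     field;
        const void *data;            /* caller-owned bytes */
        size_t      len;
    };

    class Database {
    public:
        virtual ~Database() {}
        /* declare a field before first use; returns its handle */
        virtual FieldID add_field(const char *name, FieldType type,
                                  bool indexed) = 0;
        /* a multi-field "request": get/set n fields in one call */
        virtual bool get(RecordID rec, FieldValue *vals, size_t n) = 0;
        virtual bool set(RecordID rec, const FieldValue *vals,
                         size_t n) = 0;
    };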
I'm really not sure how to answer this one in less than several
thousand words. You are staring at Pandora's box.
There are in essence two approaches to data in data bases:
1) Objects are opaque.
2) Objects are compounds or aggregates of known structure.
#1 is well known and obvious. The simple summary is that the
database is utterly dumb and knows nothing about the data it is
storing. It just stores "blobs" and allows you to access those
opaque blobs via some key or index mechanism. dbm and all its
derivatives, as well as all the Tiny-* clan servers, MOO, COOL,
Cold, etc, all derive from this.
#2 is where RDBMS'es, SQL, and the rest of that horde enter. It
says that the data comprising objects (or records) is not only
known, but can be usefully indexed, accessed, or otherwise
manipulated in intelligent fashions.
In the general business case this is all well and good. The data
falls into well defined patterns that are known in advance and can
be accommodated in an elegant manner. Unfortunately these
characteristics are not commonly shared with MUDs.
Assuming only object oriented MUD servers (well, object inheritance
really), loosely speaking, MUDs fall into two categories:
a) Designs which have pre-defined and well known object
hierarchies.
b) Designs which allow end-user defined object hierarchies.
For an RDBMS #a is a simple case. You build tables, one table per
base object type, one column per object attribute or method, one row
per object. As your total set of object types is well known and
defined, your total number of tables is finite, documented, and can
be explicitly programmed against.
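For instance (a sketch only; "WeaponRow" and its columns are
invented for illustration):

    typedef unsigned long ObjectID;

    /* With a well known hierarchy each base type maps onto one
       fixed table: columns are decided in advance, one row per
       object. */
    struct WeaponRow {        /* the "weapons" table */
        ObjectID id;          /* primary key */
        int      damage;      /* one column per attribute */
        int      weight;
    };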
Interestingly, Diku and Aber are moderately good examples of #a.
#b is a bitch. This isn't just things like Cool or ColdX which
allow the object hierarchies or inheritance to be edited at runtime
with the results instantly reflected in the world -- it's *ANYTHING*
that allows something __other__ than the core server to define the
object hierarchies. (Think of the difference in hierarchies and
base design assumptions in the various LP base MUDlibs.) The key is
that the server does not define the object hierarchy.
With #b what we have now is genericism across the board. We don't
and can't know in advance what any of our base classes are going to
look like for the final representation. Even if we do mandate a few
starting base classes, what gets built on top of those can be
anything. Ergo our previously neat tables with one column/field per
method are now disjoint as we have no way of predicting how many
methods an arbitrary object is going to have, and what the impact
and significance of those attributes will be.
Classical OO DB's are not designed for this case (see the archives
for some interesting URLs on the area posted by Lambert). The OO
DBMS'es are variations on the pre-canned phenomenon: you pre-define
the hierarchy and then execute from there, having mandated that
nothing will ever change.
The result, if you're going to go for the flexible deal is that you
have to work at very high abstraction levels. You don't and can't
know what you are working on, so you have to be able to handle
(mostly) anything, and figure the rest out as you go. Your code
has to be self-intelligent, figuring out your hierarchy and other
relativistic structures at runtime and then interpreting their
significance, possibly with the help of embedded hints in the
user-defined structure.
ie To access your database you need an interpreter which can
figure out what the mess means, how it all really relates, and
from there what actually is an object, etc etc etc.
Translation: Your DB is opaque and you now have a translation layer
between your DB and your code that tells your code what the opaque
DB records really mean etc. Doesn't this sound familiar? Somehow
we've gone back to the very spot we started with: opaque DB's with
the code, not the DB, interpreting the contents of the
objects/records.
Aaaaargh.
Further, taking this approach objects start devolving into
structural relationships of their components instead of the more
typical _behavioural_ definition of an object. As such an object
becomes a (potentially) nested collection of various data types:
attributes, methods, etc (table for inheritance, table for methods,
table for attributes, etc).
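In rough terms the shredded object looks something like this (a
sketch; all names are invented):

    #include <stddef.h>

    typedef unsigned long ObjectID;
    struct Blob { const void *data; size_t len; };  /* opaque bytes */

    /* The object is no longer one value but rows scattered across
       several tables, each row tying a fragment back to its owner. */
    struct InheritsRow  { ObjectID child; ObjectID parent; int order; };
    struct MethodRow    { ObjectID owner; const char *name; Blob code; };
    struct AttributeRow { ObjectID owner; const char *name; Blob value; };

    /* Resolving one method call now means walking InheritsRow up
       the tree and probing MethodRow at every level: one query per
       step instead of one lookup per object. */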
This devolution of objects into over-muscled structures destroys
much of the value of objects (their internal opacity), and adds an
incredible overhead in the number of table queries which have to
occur for even simple resolutions. Performance suffers, badly.
I did some early work playing about in this area (see archives for
things I did with SQL). It was not pretty. I'll admit to being
largely SQL ignorant. It still shouldn't have been *that* ugly.
A large part of the problem (for me) is that I explicitly want an
undefined (at the server level) object inheritance hierarchy. The
server natively supports multiple inheritance and allows multiple
roots. Further, the object hierarchy is editable at runtime. None
of these characteristics created the problem, it was there already.
They merely exacerbated it.
I required an OO DBMS which was capable of operating on arbitrary
object hierarchies and definitions, and thus adding definitional and
relativistic value to the objects it contained. Nobody, and this
includes the DB research establishments, appears to have done that
yet.
There is hope however, if a wan sort of hope.
If you look at the DB services offered by things like ColdX and co,
in essence they are (very poorly featured) OO DBMS'es which allow
and support arbitrary object hierarchies etc. However what you have
to look at is the ColdX interface, not the underlying opaque DB
interface. In this view ColdX itself is an OODBMS with an interface
language called ColdC. The opaque RDBMS underlying it is merely a
storage technique and really has no relevance to the final interface.
<<If this message is disjoint I apologise. This system PANICed
twice while I was composing it, and various other RL emergencies
intervened.>>
> This leads into the discussion of how to pass values back and
> forth...
There is a more subtle question:
Are MUD-world objects a function of the DB or external to the DB?
Translation:
As far as the DB is concerned, is it responsible for the contents,
correctness and relevance of object contents, or is it responsible
only for the objects themselves?
> Just in terms of how to pass a "block" of data around, some
> alternatives come to mind.
My approach:
The DB is responsible for all objects and their storage. The DB
hands out read-only pointers to copies of objects it maintains in
its cache.
Translation: The DB only hands out references to read-only
objects.
The pointers are further trapped (magic of C++) such that any
attempt to write via those pointers causes the object to be
cloned/copied into a new block and the value of the pointer changed
to point to the new writable object.
Translation: The pointer is really a handle that the DB
interactively interprets at call-time into a memory pointer. The
pointer is wrapped however so that the caller cannot (simply)
devolve a reference into a real memory pointer, but as far as code
flow is concerned the reference behaves in all ways exactly like a
normal memory pointer.
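In skeleton form (a sketch only: Handle, db_resolve() and db_clone()
are hypothetical, and since plain C++ cannot silently tell a read
from a write through a single operator->, this version makes the
read/write split explicit):

    typedef unsigned long Handle;
    struct Object;                        /* opaque to the caller */

    /* hypothetical DB entry points */
    const Object *db_resolve(Handle h);   /* shared read-only copy */
    Handle        db_clone(Handle h);     /* private writable copy */
    Object       *db_resolve_rw(Handle h);

    class ObjectRef {
    public:
        explicit ObjectRef(Handle h) : h_(h), writable_(false) {}

        const Object *read() const {      /* no copy made */
            return db_resolve(h_);
        }
        Object *write() {                 /* first write clones */
            if (!writable_) {
                h_ = db_clone(h_);        /* handle now names the clone */
                writable_ = true;
            }
            return db_resolve_rw(h_);
        }
    private:
        Handle h_;
        bool   writable_;
    };

The point is that caller code never holds a raw memory pointer
across calls; the DB stays free to move or flush the underlying
block at any time.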
This all of course ties in with my non-locking model. As the DB
owns all allocations relating to objects, upon thread disconnection
from the DB all related allocations are reaped, handled etc as part
of the normal tear-down procedure.
> - DB uses "request cookies" and read/write/copy semantics.
>   Instead of just a pointer, the client gets a "cookie" - either a
>   number that the DB can track back to a specific request, or a
>   pointer-plus-ID-number.
>   If item is still loaded in cache, read/write/copy can just do
>   memory operations. If item has been unloaded/flushed, and the
>   cookie doesn't "map to" what's in memory, DB either returns an
>   error result, or forces the item to reload (but doesn't read or
>   write a buffer that's already freed).
>   If client wants to pass stuff by reference, it will have to
>   request it be copied to local storage. That is, to enforce this
>   kind of "electric fence" you can't just give a pointer to a
>   block; you have to make a copy for each client... this is an
>   efficiency penalty (usually an extra in-memory copy).
>   "Cookies" could be similar to what someone (JC Lawrence?)
>   proposed for object-ID, like an ID/creation time pair.
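For what it's worth, the generation-stamp variant of that cookie
might be sketched like so (hypothetical names throughout):

    /* A cookie is a cache slot plus a generation stamp.  If the
       slot has been reused since the cookie was issued the stamps
       disagree, and the DB can error or reload rather than touch
       freed memory. */
    struct Cookie {
        unsigned slot;         /* index into the DB cache */
        unsigned generation;   /* bumped each time the slot is reused */
    };

    struct CacheEntry {
        unsigned generation;
        void    *data;         /* null once flushed */
    };

    extern CacheEntry db_cache[];   /* owned by the DB module */

    void *db_deref(Cookie c) {
        CacheEntry &e = db_cache[c.slot];
        if (e.generation != c.generation || e.data == 0)
            return 0;          /* stale: reload or error, never a
                                  freed buffer */
        return e.data;
    }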
And what are your locking/logical_correctness semantics?
> In order to make the db re-entrant and thread safe, the DB would
> have to make sure that its routines use proper semantics, but if
> this is all implemented within the DB module, at least there is a
> *chance* of making things multi-thread aware :) Whereas if you
> hand a simple pointer to an external routine, you may never be
> able to get complete control of thread issues.
This is one of the bits I like about my approach. Only the kernel
of the server really needs to be aware of the process model for
events and data. Clients, in the form of user-written code need not
know the slightest detail about that, or for that matter about
persistence. It all happens automatically under the covers, always.
Guaranteed logical correctness, with a guaranteed minimum of
performance.
> But even in this case you probably want an "index" - this time,
> it's an index by "location". Your mud code may or may not
> maintain a constant list of "things that are nearby" - chances are
> it doesn't. Also, maybe it makes sense to just look at the room
> you're in for its "contents" - but ultimately it boils down to
> "which five records, out of thousands, have the same 'location'
> value as my player?"
I go for the performance versus storage scheme there. My data has a
horrible normalisation factor as objects contain lists not only of
what they contain, but what immediately contains them. There are
also spatially bound objects which establish "ranges" about the
locations of other objects (eg everything "close" to Bubba), which
allow for easy processing of that data set (see prior threads on
neighborhoods for a partial reference).
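Structurally that trade looks like (illustrative only):

    #include <vector>

    typedef unsigned long ObjectID;

    /* The containment relation is stored from both ends
       (denormalised): "what is in here?" and "what immediately
       contains me?" are both direct lookups, at the price of
       keeping the two lists consistent. */
    struct Object {
        ObjectID              location;   /* immediate container */
        std::vector<ObjectID> contents;   /* reciprocal list */
        /* ...attributes, methods, spatial ranges, etc. */
    };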
> "Contents" and "location" are reciprocal, but are also
> many-to-one.
Yup. I have somewhere just under a score of different types of
containment.
> You usually don't store the ID's of all the "contents" in the
> database record with the "room" or "container" - usually you will
> optimize this lookup by stashing a First and Next (so you can
> "walk the list" of contents) or by using an index on "location" as
> a key and searching (also so you can "walk the list" of records
> meeting your criteria).
I use small cells so that the lists are (hopefully) short. I also
don't descend containment levels for the lists.
> There are a fair amount of "many to one" relationships
> (parent/child, container/location, owner/property). Some db's I
> have seen use a kind of "linked list" for each relationship, where
> each item knows its "parent", but the parent only knows one
> "child" and each child has a pointer to the next, and the next,
> etc. This is really optimized for "walking a list" but these
> lists can become inconsistent due to bugs/crashes/etc... if the db
> is inconsistent, you get items located in a room that aren't part
> of its contents list, and other "impossible" conditions.
I have parents list their children in ObjectID order, and children
list their parents in inheritance order (order-sensitive MI). This
allows for most corruptions to be detected. I deliberately did not
use node-stored list structures for this as I did not want the
correctness vulnerable to the accidental deletion or corruption of a
middle node in the list.
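The check itself is then trivial (a sketch):

    #include <stddef.h>
    #include <vector>

    typedef unsigned long ObjectID;

    /* Children are stored in ObjectID order, so a corrupted or
       half-written list shows up as an ordering violation on a
       single linear scan. */
    bool children_intact(const std::vector<ObjectID> &kids) {
        for (size_t i = 1; i < kids.size(); ++i)
            if (kids[i - 1] >= kids[i])   /* out of order or dup */
                return false;
        return true;
    }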
Yes, my objects end up kinda big.
--
J C Lawrence Internet: claw@kanga.nu
(Contractor) Internet: coder@kanga.nu
---------(*) Internet: claw@under.engr.sgi.com
...Honorary Member of Clan McFud -- Teamer's Avenging Monolith...