[MUD-Dev] [Tech] MUDs, MORPGs, and Object Persistence

brian price brianleeprice at hotmail.com
Wed May 16 14:18:11 CEST 2001


>From: Daniel.Harman at barclayscapital.com
>>From: brian price [mailto:brianleeprice at hotmail.com]
>>>From: Daniel.Harman at barclayscapital.com
>>>>From: brian price [mailto:brianleeprice at hotmail.com]

>>>> Checkpoints can be restored simply by starting with the last full
>>>> backup and applying (in order) the saved changes since that
>>>> backup occurred (can be done offline).

>>> I disagree with you on a lot of points here, but I'd start here. I
>>> think transactions are important in a MUD. It's the best way to
>>> prevent duplicates through synchronisation problems (which is how

>> You can only get dupes or item loss in a system that does not
>> capture coherent snapshots of the persistence state during safe
>> periods.  Consider: A hands item to B; both objects A and B (and in
>> some cases the item object itself) are dirtied (persistent state
>> changed).  If you create a checkpoint prior to the exchange, any
>> rollback results in the restoration of the state prior to the
>> exchange.  If you create a checkpoint after the exchange, any
>> rollback results in the restoration of the state after the exchange
>> occurred. You only run into problems when you try creating a
>> checkpoint in the midst of an exchange - simply designing the
>> server system so this condition cannot occur solves the problem.

> Unless I've misunderstood, by creating checkpoints and rollbacks,
> you've just manually implemented transactions. Or do we have a
> different understanding of what they are?

Rollback was probably not the best term to apply in combination with
checkpoints; a better one would be restore.  Checkpoints allow for a
very coarse-grained, primitive sort of transaction capability and
would typically not be under the control of the game server's code.
It is actually an incremental backup strategy rather than a true db
transaction.  Normally one would only restore from the backup if the
server crashed or the working db had been corrupted in some manner;
either case would be under supervisor control.
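
The restore procedure - last full backup plus the ordered incremental
change sets - can be sketched as follows (the file layout and names
here are assumptions for illustration, using Python for brevity):

```python
import pickle

def restore_checkpoint(full_backup_path, change_log_paths):
    """Rebuild the object store from the last full backup plus the
    incremental change sets, applied oldest-first (offline operation)."""
    with open(full_backup_path, "rb") as f:
        store = pickle.load(f)            # {object_id: object_data}
    for path in change_log_paths:         # ordered: oldest change set first
        with open(path, "rb") as f:
            changes = pickle.load(f)      # {object_id: new_data or None}
        for obj_id, data in changes.items():
            if data is None:
                store.pop(obj_id, None)   # object was deleted
            else:
                store[obj_id] = data      # object was created or updated
    return store
```

Because each change set is applied in order, the result is the state
as of the last saved change set, never a mid-exchange snapshot.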

>>> You previously said that infrequent reads were required. Thus I
>>> don't see how the performance of an RDBMS is going to impact your
>>> proposed solution.  Writes are generally fairly fast, it's the
>>> queries that are slow.

An important point I missed earlier in regard to reads - even though
their frequency is postulated to be relatively low, when a read does
occur it should have minimal latency in order to avoid stalls.  The
number of required reads matters because of the seek time incurred
before each read.  Since seek time dominates read time, the latency
for an object fetch is linear in the number of reads required.  The
persistent object store has the minimum average latency possible,
while an RDBMS will often have a latency 2x or more that of the
persistent object store, based purely upon the number of indices and
tables that must be accessed for a given object.
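
To make the seek-time argument concrete, a back-of-the-envelope
sketch (the disk figures below are illustrative assumptions, not
measurements):

```python
# Illustrative circa-2001 disk figures: seek dominates, transfer is small.
seek_ms = 10.0       # average seek + rotational latency per read
transfer_ms = 0.5    # time to read one small record once positioned

def fetch_latency(num_reads):
    """Fetch latency is linear in the number of reads required."""
    return num_reads * (seek_ms + transfer_ms)

object_store = fetch_latency(1)  # single index-guided read: 10.5 ms
rdbms = fetch_latency(3)         # e.g. base table + two joins:  31.5 ms
```

With these assumptions, each extra table or index touched per object
adds a full seek, which is where the 2x-or-more figure comes from.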

>> Writes are going to be slower in an RDBMS than a OODB/Persistent
>> Object Store due to the large number of indices and tables
>> typically required in an RDB model that is used to represent an
>> object model.

> That's definitely an assumption too. What kind of writes are we
> talking about? If it's replacing a record, then finding the location
> in the flat file will be slow unless you have some kind of indexing
> yourself. These sweeping generalisations probably don't serve anyone
> :)

All along, as stated in my original post, I've been speaking of a
fairly specific case.  In regard to this point, let me restate the
general object model specification: the object model will likely
contain a large number of classes and some portions of the model will
use deep inheritance trees, possibly including multiple inheritance.

Representing such a model using a RDBMS would require many tables and
many indices, up to or exceeding N tables and N indices where N is
equal to the number of classes in the object model.

Representing any model with an OODB-like persistent object store can
be done with one index and one table.  In rdb terms, this table
consists of two fields: ObjectID and ObjectData.  The table is
indexed by the ObjectID field.
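
A minimal sketch of that single-table store in Python (`shelve` here
stands in for the indexed file; the IDs and object data are invented
for illustration):

```python
import os
import shelve
import tempfile

# One "table" of (ObjectID, ObjectData) pairs, indexed by ObjectID.
# Any object, regardless of class or inheritance depth, serializes
# into the ObjectData blob - the schema never changes as the object
# model grows.
db_path = os.path.join(tempfile.mkdtemp(), "world.db")
with shelve.open(db_path) as store:
    store["obj:1001"] = {"class": "Warrior", "hp": 37,
                         "inventory": ["obj:2002"]}
    store["obj:2002"] = {"class": "Sword", "effectiveness": 12}
    fetched = store["obj:1001"]  # one index lookup, one record read
```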

Updating a record in the persistent object store requires at most 3
read and 3 write operations.  Updating a record in the RDBMS case
would take at least that many for a very simple object, but typically
would require 2x to 3x that number of file i/o operations in order to
update all the separate tables and associated indices used to
represent an object of a class that uses inheritance.

>>> By not going for an RDBMS you have made any type of reporting
>>> functionality many times more difficult to implement. If you have
>>> a large game, then I would imagine functionality to measure how
>>> many warriors have weapons of greater than 'x' effectiveness is
>>> something you might want to find out infrequently enough to make
>>> writing a bespoke tool a pain, but frequently enough that having
>>> sql is a feature. The same for economy reports and such like. With
>>> a bespoke object store, any type of data-mining is just hideous.

>> In previous discussions elsewhere, the maintenance issue has been
>> raised time and time again as a reason for choosing RDBMS over
>> OODB/Persistent object store.  There is an alternative for OODB
>> that is in many ways far more powerful: embedded script/interpreted
>> language engines.  For C++ server implementations, Java or Python
>> are natural choices.

> So you are saying that writing your own data store access language
> is better? To implement the functionality and concise expressiveness
> of SQL is non-trivial, and a project probably as complex as the mud
> itself. If you aren't going to give that functionality, then you
> have an inferior solution.  Going back to the flat file system I
> worked on, it did indeed have an advanced suite of programmatic data
> access functionality. Compared to writing a query in SQL, it was
> still a dog.

Not simply a data store access language - this is where RDB-minded
folks invariably miss the point... THERE IS ONLY ONE OBJECT MODEL.
The system does not require a special purpose language to access data
in a secondary datastore specific object model.  Since most modern
MUD/MORPG servers will include scripting capability anyhow, you get
reporting functionality for *FREE*.

In any case, embedding Java or Python into a C++ server core is not
that difficult a task, and you definitely are not writing your own
data access language.  You are exposing your server's objects to the
embedded engine so that the classes are available from the script
language.  This is far simpler than what you have described.
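
For instance, with the object model exposed to an embedded Python
engine, the "warriors with weapons of greater than 'x' effectiveness"
report quoted above becomes a few lines of script (all class and
attribute names below are invented for illustration):

```python
# Hypothetical in-memory object model, as the C++ server core might
# expose it to the embedded script engine.
class Weapon:
    def __init__(self, effectiveness):
        self.effectiveness = effectiveness

class Warrior:
    def __init__(self, name, weapons):
        self.name = name
        self.weapons = weapons

def well_armed_warriors(world, threshold):
    """Report: warriors carrying any weapon above the given effectiveness."""
    return [w.name for w in world
            if isinstance(w, Warrior)
            and any(wp.effectiveness > threshold for wp in w.weapons)]

world = [Warrior("Conan", [Weapon(15)]), Warrior("Bob", [Weapon(3)])]
```

Since the script queries the one live object model directly, no
translation layer or query language stands between the report and
the data.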

>>> Anyway, a well tuned and designed database can be remarkably fast.

>> Performance issues boil down to the number of file i/o operations
>> required per equivalent action.  In the case considered in the
>> original message, the OODB/Persistent object store approach results
>> in far fewer file i/o operations than the equivalent RDBMS solution
>> for the actions required.

> Well we've already realised that we are talking different ideals
> here. When you said you had an ideal method of handling these
> things, I think most of us assumed you were talking about a large
> scalable system. If you are just writing a single PC text mud then
> frankly it probably doesn't matter much how you implement this
> stuff.

As far as scalability, the persistent object store approach may well
be scalable.  The use of OODB is not a bar to scalability and with the
addition of the capability to synchronize checkpoints across multiple
machines, the persistent object store approach would likely scale
quite nicely.  The primary problem to address would be the mechanism
for exchanging objects between machines but there are numerous
possible solutions to that problem which do not pose any bar to the
use of an OODB/persistent object store approach.

> As to the amount of file i/o, that's not something I feel you can
> judge as any decent RDBMS has highly evolved and effective caching
> (assuming you understand how to design a DB). You would probably get
> less disk i/o with an RDBMS.

The amount of file i/o can be easily analyzed for specific cases.
General cases are more difficult, but statistical analysis can yield
very good hints.  As to caching, since the stated conditions in the
original post require that as many objects as possible reside in
memory - excessive db index/row caching is undesirable (at least in
the non-distributed case).

>>> and a persist method to get it to write itself to the db. None of
>>> these are a great deal of work. If you were to go towards Java or
>>> C#, you could make this even more trivial with the object
>>> reflection.

>> It seems you're speaking here of translating to/from the rdb model
>> in each object's persistence interface.  For simple systems this
>> would work fine, but with a large number of classes I'd think both
>> implementation and maintenance would become a task of herculean
>> proportions.

> That's why I mentioned reflection, it does all the work for you. With
> it, you can iterate through all the properties on an object, and in
> C# objects can even be instructed to convert themselves into XML
> (Java I don't know about...).  Do you not have to programmatically
> define how each object is persisted using your system? (as you would
> using the pattern I described)

While you could manually write the simple load/store methods required
for persistent object stores, it is fairly easy to generate them
programmatically using an external preprocessor.  Using a preprocessor,
you need only flag the persistent data members in the class
declaration with a token signifying that they are persistent.
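
A Python analogue of the preprocessor's output may make this concrete
(a sketch with invented names; in the C++ case an external tool would
build the field list by scanning the class declaration for the marker
token and emit the load/store methods):

```python
class GameObject:
    # Only members listed here are persisted; the external preprocessor
    # would build this tuple from the members flagged with the token.
    _persist_fields = ("hp", "position")

    def __init__(self):
        self.hp = 100
        self.position = (0, 0)
        self.render_cache = None  # not flagged: transient, never stored

    def store(self):
        """Generated store method: snapshot only the flagged members."""
        return {name: getattr(self, name) for name in self._persist_fields}

    def load(self, data):
        """Generated load method: restore only the flagged members."""
        for name, value in data.items():
            setattr(self, name, value)
```

The per-class methods thus cost nothing to maintain by hand: adding a
persistent member means adding one flag, not editing load/store code.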

>>>>  I've heard all the arguments against OODBMS over the years and
>>>>  all the arguments for RDBMS, and in this case at least, *none*
>>>>  of them hold any water.

>>> I disagree. I think an RDBMS with a bespoke in-memory cache would
>>> be the optimal solution.

>> The very need for a separate cache makes such a solution
>> non-optimal in the stated case.

> Surely your system needs a cache too, otherwise both will be disk
> i/o bound for every object access. Of course you get a free cache
> with an RDBMS (but assuming a distributed system its not local,
> which is why I specified a custom local one).

Pardon me, I assumed the 'free' RDBMS cache (being almost totally
superfluous with a persistent object store) would impact available
local server memory.  In the distributed case this may not be a
factor.

Ideally in the postulated system, the entire world would be loaded
into RAM, thus an object cache is not required.  (Alternatively one
might say that it has an object cache containing the entire world.)
However, in a practical sense, especially with a distributed system,
only the locally active part of the world would be in RAM and one
might consider that portion to be in the cache.

>>> What about failover? A proper RDBMS will facilitate this. I get ill
>>> thinking about having to write one of these for some kind of
>>> bespoke flat file object store.

>> Failover is a non-issue because the persistent store is tightly
>> integrated with and local to the MUD/MORPG server.  Even in the
>> distributed case, depending upon system design, it may not be
>> necessary.

> If you are working on the basis that you are using one PC sure. From
> what I understand of commercial MMORPGs, disk failure is fairly
> common due to the load on them, so I'm working from the assumption
> of a raid array anyway.  This doesn't change the fact however that a
> database cluster associated with a raid array isn't more robust and
> if I were to write a MUD, even a small one, I'd design it to be
> scalable.

Scalability is a non-issue here as well.  Suppose that both checkpoint
synchronization and object migration features are added to a single
machine persistent object store implementation, then simply transmit
the checkpoints to a central store.  Assume, of course, that RAID is
used at all levels and scales.  Now, adding fail-over to the central
checkpoint/db backup store may be worthwhile - but this changes
nothing in the persistent store implementation.

>>> It's interesting, because I have worked on two versions of a
>>> large(ish) scale distributed fat-client system, one where we used
>>> Sybase, and another where we did use a bespoke flat file system
>>> with in memory cache for 'performance' reasons. The flat file
>>> system, whilst initially fast, was in fact more trouble than it was
>>> worth for the following reasons:

>> I do not believe the application spheres are congruent.  We can
>> compare apples and oranges all day in re RDB/ODB.

> I feel that they are which is why I used the example :)

> In the end, I think the basis of the argument is that both Derek and
> I are talking about large scale systems, and that you probably
> aren't. Having said that even on a smaller system, I think a lot of
> the arguments still hold.

I think the basis of the argument is that both you and Derek assume
that rdbms is a good fit for the problem domain and I do not.  The
further I examine the distributed system case, the more I come to
believe that the system size is irrelevant to this issue.

There is also the common misconception that OODB is only applicable
to smaller systems and does not scale well; however, as illustrated
by the DOORS system used by the US Navy for Requirements Management
in the Aegis Reengineering project, this is definitely not the case.

> A lot of people seem to make a lot of assumptions about database
> performance without actually benchmarking, I'm not asserting that
> you have, but I'm just refuting a lot of the arguments these people
> put forward, as most are unfounded.

I agree; in general you really have to analyze and test on a case by
case basis, since different database technologies have different
strengths and weaknesses.

Brian Price
-=have compiler, will travel=-
_______________________________________________
MUD-Dev mailing list
MUD-Dev at kanga.nu
https://www.kanga.nu/lists/listinfo/mud-dev
