[MUD-Dev] Re: TECH: Distributed Muds

J C Lawrence claw at 2wire.com
Tue Apr 24 22:18:28 CEST 2001


On Wed, 25 Apr 2001 00:05:02 -0400 (EDT) 
Nick Walker <nick at geekspace.com> wrote:

> The way that the TinyMUD based servers do this (they have a global
> object list) is that they have a global 'struct object *db'.  This
> struct is allocated to the size of the 'struct object' times the
> number of objects in the db.  One accesses an object with
> 'db[1234]', for example, to access object #1234.  

Translation: a constant-time lookup that depends on the OS virtual
memory system to suitably handle caching of the process working
set.  It's a good approach where the working set size (in terms of
system pages) is comparatively small.  Once your working set grows
to a significant number of VM pages, or, pessimally, exceeds the
available pool, life is no longer Good: you're now stuck in
swap/paging hell.
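
For concreteness, a minimal sketch of the scheme Nick describes
(field names are illustrative; TinyMUD's actual struct carries
rather more):

  #include <stdlib.h>

  struct object {
      char *name;
      int   owner;
      int   location;
      /* ... the rest of the object's fields ... */
  };

  static struct object *db;   /* one contiguous allocation      */
  static size_t db_size;      /* number of objects in the db    */

  void db_init(size_t count)
  {
      db_size = count;
      db = calloc(db_size, sizeof(struct object));
  }

  /* Constant-time lookup: the object number is a direct array
     index, so the real cost of a cold reference is the OS paging
     in the containing VM page. */
  struct object *db_get(size_t objnum)
  {
      return (objnum < db_size) ? &db[objnum] : NULL;
  }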

This is one of the reasons that inserting a small caching layer atop
your object storage layer can be so beneficial.  It can provide
reasonable scaling/performance degradation as the working set size
(depending on your time definition) meets and exceeds the available
VM pool.  Cf. how Linux handles the filesystem cache.
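
Hand-waving the storage layer's actual interface (load_object()
and store_object() below are hypothetical stand-ins), a small
direct-mapped cache in front of a disk store might look like:

  #include <stdlib.h>

  #define CACHE_SLOTS 4096

  struct object;                       /* as above */

  extern struct object *load_object(long objnum);      /* disk read  */
  extern void store_object(long objnum,
                           struct object *obj);        /* disk write */

  struct cache_slot {
      long objnum;                     /* -1 while the slot is empty */
      struct object *obj;
      int dirty;
  };

  static struct cache_slot cache[CACHE_SLOTS];

  void cache_init(void)
  {
      int i;
      for (i = 0; i < CACHE_SLOTS; i++)
          cache[i].objnum = -1;
  }

  struct object *cache_get(long objnum)
  {
      struct cache_slot *slot = &cache[objnum % CACHE_SLOTS];

      if (slot->objnum == objnum)      /* hit: no disk, no paging   */
          return slot->obj;

      if (slot->objnum >= 0) {         /* evict the previous tenant */
          if (slot->dirty)
              store_object(slot->objnum, slot->obj);
          free(slot->obj);
      }

      slot->objnum = objnum;           /* miss: fault the object in */
      slot->obj    = load_object(objnum);
      slot->dirty  = 0;
      return slot->obj;
  }

The win is that the hot set stays pinned in a bounded region of
memory rather than scattered across however many VM pages the full
db occupies.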

>> The only time I would think that a pure linear traversal would be
>> necessary is if you had to touch all items, e.g. for storage to
>> disk.

> Brian is correct, in the case of Tiny derivatives -- setting aside
> the matter of searches.

There are a couple of interesting variables:

  Rate of object creation/deletion.
  Indexing/access methods needed for objects (just ObjectID or are
    other methods needed as well?)

Balanced tree structures are interesting as long as the
creation/deletion rate is not too high (or the tree is small), as
tree rebalancing starts to become a significant overhead.  Hashes,
hash trees (a top-level hash with a second-level hash for when
buckets get too big), or even simple skip lists (think of an
(un)balanced tree structure where each node has between 1 and N
branches, where N may be fairly large) can be interesting if your
insert/remove rate is higher.  They get more interesting still if
you have multiple access methods, as you can often mux the various
index structures to save space and keep the d-cache happier.
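
By way of illustration, the lookup side of such a two-level hash
might look like this (sizes, split threshold, and hash function
are arbitrary picks):

  #define TOP_BUCKETS 1024
  #define SUB_BUCKETS 64

  struct entry {
      long key;
      void *value;
      struct entry *next;
  };

  struct bucket {
      struct entry *chain;    /* used until the bucket splits     */
      struct bucket *sub;     /* SUB_BUCKETS-wide second level,
                                 built when the chain grows past
                                 some threshold                   */
  };

  static struct bucket table[TOP_BUCKETS];

  void *ht_find(long key)
  {
      struct bucket *b = &table[key % TOP_BUCKETS];
      struct entry *e;

      if (b->sub)             /* descend into the second level    */
          b = &b->sub[(key / TOP_BUCKETS) % SUB_BUCKETS];

      for (e = b->chain; e; e = e->next)
          if (e->key == key)
              return e->value;
      return NULL;
  }

Inserts just grow the chain until it passes the threshold, at
which point the bucket sprouts its sub-table and redistributes, so
the rebalance cost stays local to one bucket rather than hitting
the whole structure.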

  I just did a little test case for route load balancing, finding
  the lowest-latency server for a client (think Akamai-type stuff).
  Muxing the indexes in precisely this way, and thus saving d-cache
  flushes, more than doubled system performance.  Quite scary.

For Murkle I have two indexes: ObjectID and the inheritance tree
(parentage).

I use a fairly simple LRU cache for the ObjectID index to the
objects in the working set, with the internal structure of the
cache being a multi-layer skip list setup (I think it's three
layers, with N < 50) where inserts and deletes do a single pass up
the tree, adding or deleting nodes in the skip lists to keep the
glom groups under N.  It's a crude balancing algorithm, but it
works surprisingly well over time (the tree is never perfectly
balanced, but it's also never pessimal).
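
In rough outline, and emphatically my sketch of the idea rather
than the Murkle source, each layer looks something like:

  #include <stdlib.h>

  #define GLOM_MAX 50   /* N: maximum children per glom group */

  struct skip_node {
      long min_key;                /* smallest key under this node */
      int  nchildren;
      struct skip_node *next;      /* sibling at the same layer    */
      struct skip_node *child;     /* first node one layer down    */
      void *leaf;                  /* the object itself at layer 0
                                      (leaf handling elided here)  */
  };

  /* Split an over-full node in two.  The insert path calls this
     at most once per layer on its single pass up the tree, which
     is all the (crude) rebalancing that ever happens. */
  static void split(struct skip_node *n)
  {
      struct skip_node *half = malloc(sizeof *half);
      struct skip_node *c = n->child;
      int i;

      for (i = 1; i < n->nchildren / 2; i++)  /* walk to midpoint */
          c = c->next;

      half->child     = c->next;             /* second half of kids */
      half->min_key   = half->child->min_key;
      half->nchildren = n->nchildren - n->nchildren / 2;
      half->next      = n->next;
      half->leaf      = NULL;

      c->next      = NULL;                   /* sever the chain     */
      n->nchildren = n->nchildren / 2;
      n->next      = half;                   /* link sibling in     */
  }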

As the objects have their own internal parent list, and parents
have their own direct-descendant list, the rest is just more cross
pointers to maintain at cache insert/flush time.  To handle the
N-level parental problem (child of child and parent of parent), I
opportunistically maintain a set of Nth-level inheritance lists
(i.e. if I build one I keep it, but I don't build one unless I
need it, and I delete rather than maintain them).
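
If that's opaque, the bookkeeping amounts to something like this
(names illustrative):

  struct obj;

  struct obj_list {
      struct obj *o;
      struct obj_list *next;
  };

  struct obj {
      long id;
      struct obj_list *parents;     /* direct parents              */
      struct obj_list *children;    /* direct descendants          */
      struct obj_list *ancestors;   /* Nth-level list: built
                                       lazily, kept once built,
                                       and deleted (not updated)
                                       when it would go stale      */
  };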

--
J C Lawrence                                       claw at kanga.nu
---------(*)                          http://www.kanga.nu/~claw/
--=| A man is as sane as he is dangerous to his environment |=--
_______________________________________________
MUD-Dev mailing list
MUD-Dev at kanga.nu
https://www.kanga.nu/lists/listinfo/mud-dev


