[MUD-Dev] Architecture

Peter Peter
Mon Jun 30 22:00:27 CEST 2003


ceo <ceo at grexengine.com> wrote:

>     1 - How much each stage of processing adds to the mean RTT
>     (round trip time). Is the RTT heading towards taking a long
>     time? (if so, you need to cut stages, or find a way of
>     offering alternative paths through the system, so that the
>     average path length shortens).

1/ If you mean networking latency as a whole (not only raw network
latency), the first part is added by the physical connection used and
is fixed at some minimal value.

As for the protocol, I guess that UDP as the underlying protocol
should do just fine.

Then there is packet size - packetisation. The packets should of
course be compressed with something like zlib; the bigger the
packets, the better the bandwidth utilisation, but the higher the
latency. I guess this has to be tuned on an example configuration
(and with orientation pre-calculation).
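
Just to illustrate the trade-off, a rough sketch (the tick length,
message format and numbers are all my assumptions, not part of the
design): batching more updates into a single packet before
compressing improves the compression ratio, but the oldest update in
the batch waits longer before it is sent.

import zlib
import json

# Hypothetical per-tick entity updates; the format is an assumption,
# not the real wire protocol.
updates = [{"id": i, "x": i * 0.25, "y": i * 0.5, "hp": 100} for i in range(200)]

TICK_MS = 50  # assumed simulation tick; one update becomes available per tick

for batch_size in (1, 5, 20, 50):
    batch = updates[:batch_size]
    raw = json.dumps(batch).encode()
    packed = zlib.compress(raw)
    # The oldest update in the batch waits (batch_size - 1) ticks before sending.
    added_latency_ms = (batch_size - 1) * TICK_MS
    print(f"batch={batch_size:3d}  raw={len(raw):5d}B  "
          f"zlib={len(packed):5d}B  extra latency={added_latency_ms:4d}ms")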

Then there will be the backend latency of the whole system. The
communication between ConnectionManager and CellHandler should
obviously run on a low-latency, high-speed connection; but the
networking part is quite easy today - even gigabit ethernet should do
it. :)

The rest of the latency will depend largely on how the intra-backend
communication is handled, with the slowest synchronous operations
dictating the latency. I am afraid that could easily be the
database(s). In our mobile phone business, which is real-time based,
the database resides completely in RAM and is deserialised from the
disk-based database at boot time (cold start). But given that a PW is
so far not considered to be a real-time system (which will surely
change in the future), some kind of hybrid solution could be employed
here - and that can be extremely strong caching. So strong that the
toll for accessing disk is paid only once per client session/object.
And if pre-fetching is employed, even that won't be necessary.
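
A minimal sketch of that "pay the disk toll only once" idea, assuming
one pickle file per persistent object (the storage layout and the
names are just assumptions for the example):

import os
import pickle

class ObjectStore:
    """Load each persistent object from disk at most once, then serve it
    from RAM for the rest of the session (write-back happens elsewhere)."""

    def __init__(self, directory):
        self.directory = directory
        self.cache = {}          # object id -> in-memory object

    def get(self, obj_id):
        if obj_id in self.cache:             # RAM hit: no disk toll
            return self.cache[obj_id]
        path = os.path.join(self.directory, f"{obj_id}.pkl")
        with open(path, "rb") as f:          # the one-time disk toll
            obj = pickle.load(f)
        self.cache[obj_id] = obj
        return obj

    def prefetch(self, obj_ids):
        """Warm the cache before the client actually needs the objects."""
        for obj_id in obj_ids:
            self.get(obj_id)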

Actually, that won't be called caching anymore; it will be more like
a memory-resident, on-demand deserialised database :) But I haven't
done any calculations/tests - maybe that won't be necessary?  The CM
element should not add any noticeable latency - it just translates
protocols and does some pre-processing, all things pretty
straightforward.  All operations which could take longer, such as the
validation of client data, should be done asynchronously.
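
A small sketch of that, assuming the CM pushes the slow validation
onto a worker pool so its forwarding loop never blocks (the
validation function and the two callbacks are hypothetical
placeholders):

from concurrent.futures import ThreadPoolExecutor

validators = ThreadPoolExecutor(max_workers=4)

def validate_client_data(data):
    # Placeholder for a potentially slow check (signatures, sanity limits, ...).
    return isinstance(data, dict) and "cmd" in data

def on_client_packet(data, forward_to_cellhandler, disconnect_client):
    # The CM's main loop never blocks; forwarding happens in the callback
    # once the (possibly slow) validation has finished.
    future = validators.submit(validate_client_data, data)

    def done(f):
        if f.result():
            forward_to_cellhandler(data)
        else:
            disconnect_client("validation failed")

    future.add_done_callback(done)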

>     2 - How it copes with each failure-mode. Bear in mind that any
>     computer anywhere could "fail" at any moment (and also that
>     any internet connected client could appear to fail for an
>     arbitrarily long time without actually doing so, anything up
>     to 2 minutes). So, it becomes critically important where you
>     are storing state in the individual machines - and even more
>     critical if you start talking about load-balancing.

2/ Here I am still unsure how to implement it. There should be some
watchdog logic, but it has to be a distributed function, because if
it is implemented on just one element, that element could fail too.

One possible solution would be to implement a CellHandler watchdog in
the Connection Managers, monitoring the CellHandler servers each CM
has clients on; and if one (a CH) fails, check whether there is any
possible connection from another Connection Manager, then broadcast a
message to all elements that the server has failed (to avoid being
out of sync in case some elements can still communicate with it).
Then either a backup CellHandler server could kick in, or a
rebalancing procedure could take place. When that is finished, the
lost clients should be reinserted into the CellHandlers again,
'somehow' trying to re-create as much of the lost information as
possible. (how?)
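
A sketch of such a CellHandler watchdog inside a Connection Manager,
assuming simple heartbeats plus callbacks for the cross-check with
other CMs and for the failure broadcast (all names and the timeout
are my assumptions):

import time

HEARTBEAT_TIMEOUT = 5.0   # seconds without a heartbeat before we suspect failure

class CellHandlerWatchdog:
    def __init__(self, ask_other_cms, broadcast_failure, start_failover):
        self.last_seen = {}                       # ch_id -> time of last heartbeat
        self.ask_other_cms = ask_other_cms        # can any other CM still reach it?
        self.broadcast_failure = broadcast_failure
        self.start_failover = start_failover      # backup CH / rebalancing

    def heartbeat(self, ch_id):
        self.last_seen[ch_id] = time.monotonic()

    def check(self):
        now = time.monotonic()
        for ch_id, seen in list(self.last_seen.items()):
            if now - seen <= HEARTBEAT_TIMEOUT:
                continue
            if self.ask_other_cms(ch_id):
                # Someone else can still talk to it: our link is the problem,
                # not the CH itself, so do not declare it dead.
                continue
            # Tell every element, so nobody keeps using the dead CH.
            self.broadcast_failure(ch_id)
            self.start_failover(ch_id)
            del self.last_seen[ch_id]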

But how would the Connection Managers themselves be watchdogged, if I
have only one connection from the client to them? Any ideas?  If I
had a second, opened and ready "helper connection" to some other CM
beside the main connection, it would kick in only in case the main CM
is dead, telling the client what to do (it would control the
hand-over to the other CM).  There is a need for LD (link-dead)
detection and handling, too. That should be an easy one: one counter
governing the "last successful communication", then a counter
governing disconnect - with an end-of-session command signalled to
all involved parts. ICMP handling should land here as well - "no path
to destination" and similar ICMP messages should be handled on the
same level as LD.
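
Roughly, the LD handling could look like this - two timestamps per
client and two timeouts (the values are arbitrary assumptions):

import time

LD_TIMEOUT = 30.0          # no traffic for this long -> mark link-dead
SESSION_TIMEOUT = 120.0    # link-dead for this long -> end the session

class LinkDeadDetector:
    def __init__(self, on_linkdead, on_end_of_session):
        self.last_ok = {}         # client_id -> last successful communication
        self.ld_since = {}        # client_id -> moment we declared LD
        self.on_linkdead = on_linkdead
        self.on_end_of_session = on_end_of_session   # signal all involved parts

    def traffic(self, client_id):
        self.last_ok[client_id] = time.monotonic()
        self.ld_since.pop(client_id, None)           # any traffic clears LD

    def icmp_unreachable(self, client_id):
        # "No path to destination" and similar ICMP errors are treated
        # the same way as a silently link-dead client.
        self.ld_since.setdefault(client_id, time.monotonic())
        self.on_linkdead(client_id)

    def check(self):
        now = time.monotonic()
        for client_id, seen in list(self.last_ok.items()):
            if client_id not in self.ld_since and now - seen > LD_TIMEOUT:
                self.ld_since[client_id] = now
                self.on_linkdead(client_id)
        for client_id, since in list(self.ld_since.items()):
            if now - since > SESSION_TIMEOUT:
                self.on_end_of_session(client_id)
                self.ld_since.pop(client_id, None)
                self.last_ok.pop(client_id, None)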

>     3 - Protocols (high-level ones, not byte-stuffing
>     stuff)...what's the algorithm (for example) for load
>     balancing?  How does it initiate, what data does it send, does
>     it require the servers to synchronise in advance, or to always
>     be synchronized (so that all the state is shared)? Can it cut
>     in partway through a request, or are outstanding requests
>     lost? The network overhead for a permanently-synchronized
>     load-balancing pair can be massive...

3/ Load balancing runs on each level of the architecture. The 1st
level is the Connection Managers, the next level the CellHandlers,
and the next level the as-yet-unmentioned subsystems (such as
AI/NPC).  The Connection Managers' load sharing can come on two
levels. The 1st level is sharing the number of connections; it is
actually governed by the logon procedure, which controls where the
client actually lands with his connection.
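
The first level could be as simple as the logon procedure handing the
new client to the least-loaded CM; a trivial sketch under that
assumption:

def pick_connection_manager(cms):
    """cms: list of dicts like {"host": ..., "connections": int, "capacity": int}.
    Returns the CM with the most free capacity for the new client."""
    return min(cms, key=lambda cm: cm["connections"] / cm["capacity"])

# Example: the logon procedure points the client at cm2.
cms = [
    {"host": "cm1", "connections": 480, "capacity": 500},
    {"host": "cm2", "connections": 120, "capacity": 500},
]
print(pick_connection_manager(cms)["host"])   # -> cm2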

On the second level, a CM can decide that it doesn't have enough
resources (CPU/memory/whatever is needed) to handle all its clients
effectively and can try to find a CM which does have enough
resources. The whole handover procedure is quite an expensive
operation (not only in a PW), so it should be checked that it isn't
done too often, that clients don't bounce, etc. To get a completely
invisible handover, a complete copy of the client state would be
created on the confirmed target CM, and the client (and the CH too)
would be instructed to connect there (not dropping the connection to
the 1st CM until this has completed successfully).
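
A sketch of that invisible handover, keeping the connection to the
1st CM alive until the target has confirmed the client; the objects
and their methods are invented for the example, not a real API:

def handover_client(client, source_cm, target_cm, cellhandler):
    """Move one client from source_cm to target_cm without a visible drop.
    All four arguments are assumed to expose the small interface used below."""
    # 1. Create a complete copy of the client's state on the confirmed target.
    state = source_cm.export_client_state(client)
    if not target_cm.import_client_state(client, state):
        return False                        # target refused; nothing changed

    # 2. Tell the client and the CellHandler to talk to the target CM.
    client.send_redirect(target_cm.address)
    cellhandler.rebind_client(client, target_cm)

    # 3. Only after the target confirms the client has arrived do we drop
    #    the old connection; until then the 1st CM keeps serving it.
    if target_cm.wait_for_client(client, timeout=10.0):
        source_cm.drop_client(client)
        return True

    # Handover failed: roll back so the client never notices.
    cellhandler.rebind_client(client, source_cm)
    target_cm.discard_client_state(client)
    return False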

The load sharing on the CH level would be much trickier, as it will
involve changing the cell size and would need to be done with some
crazy recursion to adjust the sizes of adjacent cells too.  The
geometry of the cells should not change. (It actually could, but I
can't even imagine how to handle that :))
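
Even reduced to one dimension, the idea is visible: shrink the
overloaded cell toward its neighbour and let the adjustment ripple
recursively into the next cell if that one becomes overloaded in
turn. A toy sketch (the load model is invented):

def rebalance(cells, index, max_load, step=10.0):
    """cells: list of dicts {"lo": float, "hi": float, "load": float}, laid out
    left to right with a uniform load density per cell.  Shrink an overloaded
    cell toward its right neighbour and recurse if the neighbour in turn
    becomes overloaded."""
    if index >= len(cells) - 1:
        return                      # last cell: nowhere to push load (toy limit)
    cell, neighbour = cells[index], cells[index + 1]
    while cell["load"] > max_load and cell["hi"] - cell["lo"] > step:
        width = cell["hi"] - cell["lo"]
        moved_load = cell["load"] * (step / width)
        cell["hi"] -= step              # the neighbour takes over a strip of the cell
        neighbour["lo"] = cell["hi"]
        cell["load"] -= moved_load
        neighbour["load"] += moved_load
    rebalance(cells, index + 1, max_load, step)   # ripple into the next cell

cells = [{"lo": 0, "hi": 100, "load": 90},
         {"lo": 100, "hi": 200, "load": 70},
         {"lo": 200, "hi": 300, "load": 20}]
rebalance(cells, 0, max_load=80)
print(cells)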

>     4 - Overload: unless you only want to support very low numbers
>     of players (*low for a distributed system*), one of the
>     critical questions is "what happens when the system gets
>     overloaded?". Does it just fall over and die? Can it corrupt
>     data in the process? Does it gracefully slow to a halt, but
>     actually remain running (albeit very, very, very slowly)?
>     Obviously, the last of these is the "ideal" case; all (!) you
>     have to do is remove some of the connected clients (recall
>     that this is compared to it *crashing*, so forceably
>     disconnecting clients is not so bad :)), and the server will
>     carry on happily without further intervention.

4/ I want to support an unlimited number of clients - if I need more
capacity, I drop in more hardware = more stock servers.  Overload
detection is the 1st thing to solve. As for element overload, each
element knows this about itself and can initiate the appropriate
handling - say, on the CellHandler level, trying to rebalance the
cell sizes. The trouble is that it should recognise when any further
tries to rebalance would be wasted effort. Also, some algorithm to
avoid a "bouncing" or "rubberbanding" effect would be needed.

When the whole system is overloaded and it has come to the conclusion
that it cannot automatically rebalance itself, an alarm must be
raised in the NOC to indicate that operator assistance is needed. The
operators could then rebalance it themselves, or could just add some
server(s). If they do nothing, the system will lag like any other PW
does :)

>     5 - ...and what's the *intrinsic* overhead of your system, and
>     how does it scale with additional players? If you've got any
>     algorithm anywhere in your system that is worse than
>     log-linear, you've got a problem. If you've got any that are
>     worse than quadratic, you most likely have a very BIG
>     problem. But even linear-cost algorithms can be devastating,
>     if the constant of proportionality is too high (what's the RAM
>     consumption per client? you only have limited RAM, no matter
>     what you do. Processor time is infinite (it may take a long
>     time to execute, but you'll get there eventually), but RAM
>     isn't).

5/ No idea as of yet. All algorithms should be log-linear, as far as
I know.  There aren't many algorithms in it as of yet :)

> Ahem. I've glossed over a few points there, and made some gross
> generalizations, but I hope it will give you a flavour of the
> questions you need to ask (and answer). This is off the top of my
> head ... I've probably omitted something important, so don't take
> it as a thorough checklist.

Thanks for the discussion - this is exactly what I have been missing
- some stimuli coming out of discussion.  Given that I haven't yet
worked that much on the design, there is much to plan, but it's fun
to do :)

Today I thought through, for the first time, the working state of the
system - things like routing of the main events/main objects, etc.

Peter "Pietro" Rossmann
_______________________________________________
MUD-Dev mailing list
MUD-Dev at kanga.nu
https://www.kanga.nu/lists/listinfo/mud-dev


