[MUD-Dev] Complexities of MMOG Servers WAS Retention Without Addiction

Amanda Walker amanda at alfar.com
Fri Dec 13 18:05:17 CET 2002


On 12/13/02 7:09 AM, adam <adam at grexengine.com> wrote:
> on Thu, Dec 12, 2002 at 01:50:36AM -0500, Amanda Walker wrote:

>> You also have to make sure that you don't hold on to locks (or,
>> if possible, use locks at all), don't rely on a single global
>> state being consistent, watch out for priority inversion, deal
>> with redundancy and failover if a machine crashes or has a
>> hardware
 
> Hmm..admittedly you end up describing some of the easiest
> problems, as opposed to the really hard ones. Those things you
> highlight are mainly difficult in the same way that your first
> sorting algorithm is difficult: they require you to take on board
> new knowledge, but after that they are, frankly, straightforward.

Indeed.  And they are the first things that even "experienced"
programmers run into when they start to deal with concurrency and
redundancy.  They are also examples of where general purpose
toolkits tend to start leaving you high and dry.

> [priority inversion] is a really well documented problem! Unless
> I'm misunderstanding what you're referring to...

Nope, that's it.  Affects everything from game engines to Mars
landers.  Concurrency in general has been widely studied, common
issues identified and solved, in both academia and industry.  That's
part of my point.
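
For anyone who hasn't met priority inversion before, here is a toy
sketch of it.  It is not taken from any real scheduler: the Task
class, the scenario() task set and the one-tick-per-step model are
all made up for illustration.  A low-priority task takes a lock, a
high-priority task then needs it, and an unrelated medium-priority
task keeps the low one off the CPU.  Setting inherit=True applies
priority inheritance, one of the standard remedies.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int           # larger number = more urgent
    release: int            # tick at which the task becomes ready
    steps: list             # each step takes one tick: "lock", "work", "unlock"
    pc: int = 0             # index of the next step to run
    finished_at: int = -1   # tick at which the task completed, -1 while pending


def scenario():
    """Three hypothetical tasks; low and high share one lock, medium does not."""
    return [
        Task("low", 1, 0, ["lock", "work", "work", "work", "unlock"]),
        Task("high", 3, 1, ["lock", "work", "unlock"]),
        Task("medium", 2, 2, ["work"] * 5),
    ]


def simulate(tasks, inherit):
    """One CPU, one lock, strict priority scheduling. Returns finish ticks."""
    holder = None  # task currently holding the single shared lock
    tick = 0
    while any(t.finished_at < 0 for t in tasks):
        ready = [t for t in tasks if t.release <= tick and t.finished_at < 0]

        def blocked(t):
            # A task is stuck if it wants the lock and someone else holds it.
            return t.steps[t.pc] == "lock" and holder is not None and holder is not t

        def effective(t):
            # Priority inheritance: the lock holder borrows the priority of
            # the most urgent task waiting on the lock.
            if inherit and t is holder:
                return max([t.priority] + [w.priority for w in ready if blocked(w)])
            return t.priority

        runnable = [t for t in ready if not blocked(t)]
        if runnable:
            task = max(runnable, key=effective)
            step = task.steps[task.pc]
            if step == "lock":
                holder = task
            elif step == "unlock":
                holder = None
            task.pc += 1
            if task.pc == len(task.steps):
                task.finished_at = tick + 1
        tick += 1
    return {t.name: t.finished_at for t in tasks}


print(simulate(scenario(), inherit=False))  # {'low': 10, 'high': 13, 'medium': 7}
print(simulate(scenario(), inherit=True))   # {'low': 5, 'high': 8, 'medium': 13}
```

Without inheritance the "high" task finishes last, behind work that
is nominally less important.  With inheritance it finishes at tick 8
and the medium task absorbs the delay instead.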

My point here was that "experienced" is not a single-valued metric.
I maintain that an experienced programmer or programming team can
indeed be out of their league when it comes to building a large,
scalable piece of network infrastructure, which is in effect what a
commercial-scale MMORPG service is.

Experience is not universally applicable.  I, for example, am not a
VB/Database jockey.  If you need to mine stuff out of a legacy
database and present it in a nice Windows application, I'm the wrong
person to hire.  I know VB/Database jockeys, and they do some great
work I couldn't touch without years of study and practice.  On the
other hand, ask them to develop a streaming media network with
predictable latency, high MTBF and fast MTTR, and the shoe is on the
other foot.

>> There's a reason that everyone's big MMORPG servers roll over and
>> die the first few times they throw thousands of clients at them
>> simultaneously.
 
> ...but you are mainly glossing over what these problems might be,
> with statements like "There's a reason that ...(but I'm not going
> to tell you what it is, possibly because I don't understand
> myself)...".

It is very clear from patch notes issued by MMORPG developers in
both beta and production that concurrency and scaling problems are
some of the big stumbling blocks to providing a pleasant gameplay
experience.  That's why they do repeated load tests, and why lag is
a persistent problem whenever they first open the floodgates to
players.  It's also why the immediate player call to "buy more
servers!" is seldom the actual solution.

The specific reasons probably vary all across the map.  A
lock on a data structure that creates an unintended serialization
bottleneck.  Priority inversion starving some essential process of
CPU time.  A 3rd party library that turns out not to be thread-safe
after all.  Relying on STP for network failover without realizing
that the link timeout causes other tasks to fail.  All of these
problems are emergent misbehavior that only appears under load
or in the presence of failure.  This general class of problem is not
one to which people are exposed in many parts of the industry,
and they are not problems that "just sprinkle in a few more lines of
code" can solve.  You have to design your system from the start to
be scalable.
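
To make the first item concrete, here is a minimal sketch of one
common way to design out a serialization bottleneck: split one
global lock into several ("lock striping").  StripedTable, hammer
and run_demo are hypothetical names invented for this example, and
the integer player ids are an assumption.  In CPython the
interpreter lock limits the actual speedup for CPU-bound work, so
read it for the structure, not as a benchmark.

```python
import threading


class StripedTable:
    """Per-key counters guarded by several locks instead of one global lock."""

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._shards = [dict() for _ in range(stripes)]

    def _index(self, key):
        # Keys that hash to different stripes never wait on each other.
        return hash(key) % len(self._locks)

    def add(self, key, amount):
        i = self._index(key)
        with self._locks[i]:  # only callers on the same stripe queue here
            shard = self._shards[i]
            shard[key] = shard.get(key, 0) + amount

    def get(self, key):
        i = self._index(key)
        with self._locks[i]:
            return self._shards[i].get(key, 0)


def hammer(table, player_ids, rounds):
    """Worker body: bump every player's counter `rounds` times."""
    for _ in range(rounds):
        for pid in player_ids:
            table.add(pid, 1)


def run_demo(workers=8, players=64, rounds=200):
    """Run several threads against one table and return the final counts."""
    table = StripedTable()
    ids = list(range(players))
    threads = [
        threading.Thread(target=hammer, args=(table, ids, rounds))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [table.get(pid) for pid in ids]
```

The update path stays correct, since every increment still happens
under a lock, but two threads touching different players usually
take different locks.  With a single table-wide lock, every update
in the whole server would queue behind every other one.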

Scalability is not a new problem, but it's a common failure mode.

> The basis for a very good point; but you neglect to give an
> example of why 50ms might be difficult to meet. Today's commodity
> CPUs are 2GHz+.  Therefore, in 50ms they can execute 100 million
> instructions. That's an awful lot of work for an awful lot of
> players, before you even get close to having spent 50ms on it!

Sure, right up until you have to do a database query, wait for a
disk to seek to the right place, a packet from your accounting
server to arrive, a lock on a data structure to be released...  If
your code isn't designed to scale, you'll spend millions of
instructions waiting.
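
As a back-of-the-envelope sketch of that arithmetic: the 2 GHz clock
and the 50 ms target are the figures quoted above, and "cycles"
stands in for "instructions" using the same one-per-cycle
simplification.  The individual wait times are assumptions picked
only for illustration, not measurements of any real system.

```python
CYCLES_PER_US = 2_000   # 2 GHz clock = 2,000 cycles per microsecond
BUDGET_US = 50_000      # the 50 ms response target under discussion

# Assumed, illustrative wait times in microseconds (not measured values).
ASSUMED_WAITS_US = {
    "disk seek": 8_000,
    "database query": 5_000,
    "accounting server round trip": 2_000,
    "contended lock": 10_000,
}


def cycles(microseconds):
    """Clock cycles that elapse in the given number of microseconds."""
    return microseconds * CYCLES_PER_US


def budget_report(waits_us=ASSUMED_WAITS_US, budget_us=BUDGET_US):
    """Summarize how much of the budget goes to waiting if waits are serial."""
    waited = sum(waits_us.values())
    return {
        "budget_cycles": cycles(budget_us),
        "cycles_spent_waiting": cycles(waited),
        "fraction_of_budget_waiting": waited / budget_us,
    }


print(budget_report())
# {'budget_cycles': 100000000, 'cycles_spent_waiting': 50000000,
#  'fraction_of_budget_waiting': 0.5}
```

Under those assumptions, half of the 100 million cycle budget is
gone before any game logic runs, and that is for a single request
that hits each wait exactly once, one after the other.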

> The few single-system MMOGs that I've seen the source code for
> (and seen them running under heavy load) don't seem to have a
> problem with lag at all, but YMMV.

I wouldn't call a single-system server an MMOG, just an MOG.  And
yes, single-system servers don't have many of these problems.
Quake3 and UT run without noticeable lag if you have a reasonably
fast network path to the server.
 
> We'll be publishing papers on some aspects of our technology next
> year with the first two public releases of products. (I'll of
> course post them to MUD-DEV too at the time :).

Sounds cool.  Scalability and network architecture are two of my
soapboxes ;-).

Amanda Walker

