[MUD-Dev] [TECH] Server Bottlenecks

J C Lawrence claw at kanga.nu
Mon Sep 1 13:58:53 CEST 2003


On Mon, 1 Sep 2003 10:41:25 +0100 
Jim Purbrick <Jpurbrick at climax.co.uk> wrote:

> Where are the normal bottlenecks in server scalability? 

That's a pretty tough thing to answer without knowing a lot more about
your server.  Things like:

  - expected connection rate and expected base level of connections
  - IO rate per connection and average message sizes
  - TCP/UDP balance and average latency to remote nodes
  - average and peak message rates
  - processing overhead for each message type, broken down by expected
    average and peak rates for that message type
  - normal working set size, and average and peak rates of working set
    churn
  - base computational load
  - synchronisation scheme and its implications for client
    communications and veracity checking

and so on.

It can be something as trivial (though unlikely in your case) as the
size of the buffers in the network cards.  Back when, with a non-game
server, we gained almost 35% more throughput just by moving to
different NICs.

> I've seen a lot of posts suggesting that thread-per-connection
> implementations limit scalability due to context switching. 

It can, but it doesn't have to: it depends on the threading model of
the base OS (N:N, N:M, or N:1).  Given an N:M threading model, it also
depends on how (and whether) you cluster the threads into groups and
which threads are grouped with which other threads (thread group tuning
can be an effective way of lessening lock contention by providing a base
scheduling beat).

> Are there any tools for Windows/Linux for diagnosing context switching
> problems?

Use a kernel profiler like OProfile.  You may also like to investigate
the new NGPT (Next Generation POSIX Threads) support in recent Linux
kernel and libc versions (it adds an N:M threading model rather than
N:N).

> I've <cough>inherited</cough> a multiple thread per connection network
> library and will need to prove that it's the limiting factor before I
> get to change it. 

Do note that it might not be THE limiting factor.  Ignoring the rest of
the system (there could be other, more binding limiting factors), it
really depends on how many threads are involved, what they are doing
(their runnable states), host OS behaviour, process scheduler choice
(eg older kernel, newer kernel, or any of the third-party scheduler
patches), and sundry other points.  With some work, OProfile should
show you what's going on here.

> Where else are server bottlenecks found and which tools are useful for
> finding them? As the context switching issue shows, straight profiling
> isn't always enough.

I also tend to do what I call lock time graphs: simple records of how
much time is spent waiting on the various types and locations of locks.
(There's probably a more standard name for this.)

--
J C Lawrence                
---------(*)                Satan, oscillate my metallic sonatas. 
claw at kanga.nu               He lived as a devil, eh?		  
http://www.kanga.nu/~claw/  Evil is a name of a foeman, as I live.

_______________________________________________
MUD-Dev mailing list
MUD-Dev at kanga.nu
https://www.kanga.nu/lists/listinfo/mud-dev
