[MUD-Dev] Matching and Maximizing: How players choose between activities

John Hopson jwh9 at acpub.duke.edu
Wed Aug 23 16:23:05 CEST 2000


	As with the previous article
(http://www.kanga.nu/archives/MUD-Dev-L/2000Q3/msg00364.php), I'll be
describing some basic findings from experimental psychology and discussing
how they apply to muds.  A couple of things to remember:

a) I'll be using the terms rat/pigeon/subject/participant/player pretty
interchangeably.  All the experiments I'm discussing have been run with a
wide variety of species, including humans.  The intent here is not to
dehumanize the players; in most cases the strategies we're discussing are
highly adaptive.  One of the primary goals of this kind of psychology was
to find general rules of intelligence that run through all learning
organisms, hence its original name of comparative psychology.

b) Don't mistake these generalizations for iron-clad rules.  There is
always room for individual differences, for the effects of the particular
history of an individual, and for just plain chance.  Don't let yourself be
led astray by "Man-who" arguments. ("That's not right, I knew a man who did
x, y, and z.")  There will always be variations, but the generalizations
are still useful tools.  In general, groups tend to fit the equations
better than individuals, since the mavericks tend to even out in the long
run.


-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Choice

	Experiments involving choice are among the richest areas of study in
experimental psychology.  These experiments can take a wide variety of
forms, all of which offer the subject two or more alternatives to choose
between.  The subject allocates responses, time, or energy
between the options.  Some examples of choice experiments are:

	* A rat has two levers it can press, each of which produces rewards
(reinforcements) on a different schedule.

	* A starling in the wild has three patches of woods in which it can
forage, each of which has a different variety/incidence of food.

	* A goldfish swims in a tank with a variety of sizes of worms to eat.
Large worms provide more nourishment, but require more work to subdue and
eat.

	There are essentially two strategies a player can follow when choosing how
to allot their time between activities: Matching and Maximizing.  These are
not always mutually exclusive and players may switch back and forth between
them frequently.

Some vocabulary:

Schedule of reinforcement:  A schedule of reinforcement is a rule or set of
rules under which rewards are provided.  Also sometimes referred to as a
contingency.

Rich vs. lean:  A rich schedule is one which produces more reinforcement or
produces reinforcement more often.  The reverse is the lean schedule.  One
alternative is usually richer than the other.

Concurrent schedules:  Two contingencies offered at the same time, with
subjects able to choose freely between them.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-


Maximizing

	Maximizing is exactly what it sounds like: choosing in a way that
maximizes one's overall rewards.  This doesn't necessarily mean that the
subject is consciously calculating how to get the most possible rewards.  A
lot of simple, stupid rules of thumb can work to maximize reward.  Players
might not notice that the dragons tend to be easier to find at a certain
time of day, they'll just have more fun and thus tend to play during that
time.

	Mostly, subjects maximize in situations where the outcomes are very
regular and predictable, especially where one alternative is clearly always
better than the other.  For example, a rat presented with two levers, one
of which produces food every fifth press and one of which produces food
every tenth press (Fixed Ratio [FR] schedules, as in the last article),
will soon concentrate all his responding on the richer schedule, the FR 5.
There's never a reason to press on the other lever, because in every choice
of which lever to press, a response on the rich lever is always more likely
to produce reward.  This also happens on concurrent Variable Ratio (VR)
schedules.
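
	To make the arithmetic concrete, here's a minimal sketch in Python
(my own illustration with made-up numbers, not from any experiment)
comparing a few fixed allocations of 1000 presses between the FR 5 and the
FR 10 levers.  Exclusive preference for the rich lever always wins:

    # Illustrative only: total food earned from 1000 lever presses
    # split between an FR 5 and an FR 10 schedule.  On a Fixed Ratio
    # schedule every Nth press pays off, so food is presses // N.
    def total_food(presses_on_fr5, presses_on_fr10):
        return presses_on_fr5 // 5 + presses_on_fr10 // 10

    for share in (0.0, 0.5, 1.0):        # fraction of presses on FR 5
        on_fr5 = int(1000 * share)
        print(share, total_food(on_fr5, 1000 - on_fr5))
    # 0.0 -> 100, 0.5 -> 150, 1.0 -> 200 pellets: going all-in on the
    # rich schedule maximizes, exactly as the rat learns to do.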

	Once a random element is introduced into the system, subjects tend to veer
away from straight maximizing and match as described below.  It's important
to remember here that the schedule as experienced by the player may be very
different from the one you programmed.  For example, an area might repop a
dragon after thirty players have passed through.  From your perspective,
this is an FR schedule, but because players are entering at random
intervals, the overall effect is a Variable Interval (VI) schedule.

	Or consider the case where the dragon repops every 20 minutes.  Because
some portion of the time the dragon will get hunted by other players, the
subjective contingency becomes probabilistic.
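
	The thirty-player repop is easy to simulate.  The sketch below is a
hypothetical Python illustration (the one-arrival-per-minute rate is
invented): the repop triggers after every 30th visitor, but because
visitors arrive at random, the intervals between repops come out variable
rather than fixed:

    import random

    # Designer's side: dragon repops after every 30th visitor (FR 30).
    # Player's side: visitors arrive at random, so repop *times* vary.
    random.seed(1)
    t, since_repop, repop_times = 0.0, 0, []
    for _ in range(3000):                  # 3000 player visits
        t += random.expovariate(1.0)       # ~1 arrival per minute
        since_repop += 1
        if since_repop == 30:
            repop_times.append(t)
            since_repop = 0

    gaps = [b - a for a, b in zip(repop_times, repop_times[1:])]
    print(min(gaps), max(gaps))  # intervals scatter around 30 minutes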

	Generally speaking, as mud developers we don't want players to
maximize.  Maximizing suggests they've figured out what's going on and have
a fixed (though probably unconscious) strategy that they're following.
This is a bad thing, because it means players have figured out the game and
are in danger of being bored.  Someone who's bored in your game this week
is someone else's player next week.  We'd much rather they were just a bit
unsure of whether they're getting the most possible rewards, because that
tends to lead to players exploring new alternatives and trying new
strategies.


-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Matching

	One of the most solid laws to come out of behavioral psychology is the
Matching Law, which can be roughly translated as:

	The number of responses on an alternative is proportional to the rate of
reinforcement on that alternative.

Or, in mathematical terms...

	Responses on option A     Rewards from option A
	---------------------  =  ---------------------
	   Total responses            Total rewards


	This rule generally applies in cases where reinforcement is probabilistic
and a function of time, such as concurrent VI schedules.

Consider the following mudding example:

	Bob can choose to go hunting for dragons in area A or in area B.  In area
A, dragons repop on an average of every 10 minutes.  In area B, dragons
repop on an average of every 20 minutes.  Area A is richer, but the longer
Bob spends there, the more likely it is that a dragon has repopped in area
B.  This prompts Bob to switch back and forth between areas as he tries to
get as many dragons as possible.  The Matching Law says that this will lead
to Bob spending two-thirds of his time in area A and one-third in area B.
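
	The prediction itself is one line of arithmetic.  Here's a minimal
sketch in Python (the rates are the repop rates from the example above,
expressed as dragons per minute):

    # The Matching Law: time on each alternative in proportion to its
    # rate of reward.  Area A repops every 10 min, area B every 20.
    def matching_allocation(rates):
        total = sum(rates)
        return [r / total for r in rates]

    print(matching_allocation([1 / 10, 1 / 20]))
    # -> roughly [0.666, 0.333]: two-thirds of Bob's time in area A.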

	Remember, the key word here is "rate".  If area A produces wyverns that
give you 100 xp about every five minutes and area B produces dragons that
give you 200 xp about every five minutes, the overall rate of xp/minute is
twice as high in area B as in area A.

	Effort can also be factored into the equation.  A dragon that produces
twice as much xp as a wyvern but is also twice as hard to kill will be
considered roughly equal in terms of richness, producing equal rates of
response.
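
	Folding effort into the same sketch is just a division, reusing
matching_allocation from the sketch above (the effort numbers here are
invented for illustration).  Equal effective richness predicts an even
split:

    # Hypothetical effort weighting: a dragon pays twice the xp of a
    # wyvern but costs twice the effort, so effective richness is equal.
    def effective_rates(xp_rates, efforts):
        return [r / e for r, e in zip(xp_rates, efforts)]

    print(matching_allocation(effective_rates([100, 200], [1, 2])))
    # -> [0.5, 0.5]: roughly equal responding on wyverns and dragons.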

	The Matching Law has been explored pretty exhaustively with experiments
adding in things like chances of being eaten by a predator, etc.  The full
equation has a large number of additional factors.  If there's interest, I
can do a more extensive writeup.


-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The Ideal Free Distribution: Matching with groups

	Yes, you say, but I don't have one guy trying to decide whether to spend
time in area A or area B; I've got a hundred guys, all of whom are
competing for those dragons.  How do you factor other people into the
equation?

	For this, I'll turn to a related theory called the "Ideal Free
Distribution" or IFD.  Essentially, a group will allocate members in the
same proportion as an individual allocates their responses.  If one area
produces dragons twice as often, it will tend to get twice as many players.

	Consider this from the player's point of view.  If everyone is in area A,
then the first guy who switches to hunting in area B will get all the
dragons there.  As more people switch over, the rewards for area B
gradually get sliced into thinner and thinner portions.  Simultaneously, as
the player population in area A goes down, each player there gets a larger
portion of the hunting.  Eventually, things settle out such that the
average portion in both areas is the same.  As long as there's an
inequality, players will keep switching selfishly, ironically producing an
overall equal distribution.


In other words, the stable distribution for the system is:

           Rewards in area A                 Rewards in area B
     -----------------------------  =   -----------------------------
      Number of players in area A       Number of players in area B
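
	Here's a minimal sketch of that selfish switching (a hypothetical
Python illustration; the dragon rates and population size are invented).
One hundred players start in area A, and anyone who would do better after
moving moves; the population settles at the IFD split:

    # IFD sketch: players defect to whichever area would give them a
    # better per-player share, until no one can gain by moving.
    rate_a, rate_b = 6.0, 3.0        # dragons per hour in each area
    players_a, players_b = 100, 0    # everyone starts in area A
    while True:
        move_to_b = (players_a > 0 and
                     rate_b / (players_b + 1) > rate_a / players_a)
        move_to_a = (players_b > 0 and
                     rate_a / (players_a + 1) > rate_b / players_b)
        if move_to_b:
            players_a, players_b = players_a - 1, players_b + 1
        elif move_to_a:
            players_a, players_b = players_a + 1, players_b - 1
        else:
            break

    print(players_a, players_b)  # -> 67 33, about the 2:1 the rates predict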


	This sounds simplistic, I know, but I encourage you to go try this in
reality.  Go find a pond with thirty or so ducks in it and station two
people along the shore.  One throws out a handful of food every thirty
seconds, the other throws one every minute.  Space them five yards apart
and you'll quickly see the population of ducks sort itself out accordingly,
with roughly two ducks at the thirty-second station for every one at the
other.

	This also works when we consider prey types.  Imagine you have wolves and
rabbits living in the same forest, with players choosing to hunt one or the
other.  The proportion of players hunting rabbits and the proportion
hunting wolves at any given time will tend to sort itself out along similar
lines.

	One thing to note: players will always be switching back and forth between
the options as they perceive temporary inequities due to chance ("I know
there are more dragons here, but the last five got taken by someone else so
I'll try area B").  What stabilizes is the overall proportion of players
exploiting a given alternative, not the specific choice of an individual.
Our previously mentioned hero Bob may stay with area A all the time or
switch back and forth every kill.  Overall, the choices of the population
as a whole will average out to reflect the IFD.


-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Switching between alternatives and Overmatching

	Of course, the real world often deviates from these nice mathematical
functions.  In particular, subjects tend to "overmatch", spending more time
on the rich alternative than predicted by these equations.  However, even
the degree of deviation from the rule can usually be predicted.  Subjects
tend to overmatch more as switching between alternatives becomes more
expensive.  If area A is close to area B, players allocate their time
pretty much as the
Matching Law says.  As the travel time between the areas grows, players
will tend to spend more time on the richer area and correspondingly less
time on the leaner area.  The amount of time required to switch between
alternatives is called the "change-over delay" or COD.
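
	One common way this overmatching is written down in the behavioral
literature (this is Baum's generalized matching law, not an equation given
earlier in this article) adds a sensitivity exponent s to the ratio form of
the Matching Law: the time ratio equals the reward-rate ratio raised to the
power s, where s = 1 is strict matching and s > 1 is overmatching.  A short
sketch, with illustrative values of s:

    # Generalized matching: allocation ratio = (rate ratio) ** s.
    # s = 1 is strict matching; s > 1 (overmatching) grows with the COD.
    def time_on_rich(rate_rich, rate_lean, s):
        ratio = (rate_rich / rate_lean) ** s
        return ratio / (ratio + 1)   # fraction of time on the rich side

    for s in (1.0, 1.5, 2.0):        # illustrative sensitivities
        print(s, round(time_on_rich(1 / 10, 1 / 20, s), 2))
    # 1.0 -> 0.67, 1.5 -> 0.74, 2.0 -> 0.8: longer changeover delays
    # push more and more of the player's time onto the richer area.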

	As the COD increases, so does the overmatching.  If it takes a month's
worth of work to switch from hunting in area A to area B, players just
won't bother to switch that often, concentrating on the richer option.  In
general, this rule makes me worried about the recent trend towards
realistic travel times in fantasy worlds.  Discouraging exploration tends to
lead to players sticking to the areas they know, speeding the day when they
get bored with your game.

	Of course, there's also a problem with reducing the COD.  If there is no
changeover time, subjects in experiments don't match.  They just bop back
and forth between alternatives as fast as possible, irrespective of the
relative rewards.  In mud terms, imagine if Bob could teleport from area A
to area B, instantly checking each one for dragons.  The way to get the
most dragons in a given amount of time is to jump back and forth,
minimizing the time between the repop and the kill.  This is also obviously
a bad thing.

	There's a happy medium between the extremes, an ideal level of effort
required to switch between the choices.  Unfortunately, that's going to be
a function of your particular game mechanics and isn't easily derived from
a general rule.  The costs of switching might not just be time.  A player
changing from hunting wolves to hunting rabbits might have to change
equipment, change trigger settings, etc.  "Changeover delay" is just a
general term, encompassing a wide variety of possible barriers between
alternatives. You may even decide that you want players to overmatch a
little.  As with any general law, there's a translation that has to occur
before it can be applied to a specific case.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

	Well, I hope you've enjoyed this installment of my rants on experimental
psychology and mud design.  Sorry about the delay between articles; I've
moved about 3000 miles in the meantime.  If you found this useful, please
drop me an email mentioning it and any other topics you'd like to hear me
babble about.  Again, here's a list of some possible topics:

	Addiction: a how-to guide.
	Flocks, herds, and dragon-slaying posses: when players gang up
defensively and offensively.
	Conditioned reinforcers: how to get players to do things for free.




