[MUD-Dev] Shift in time

Thu Oct 7 01:05:44 CEST 2004

Thomas Clive Richards wrote:

> Unfortunately, voice recognition in *any* form is *VERY* hard to
> accomplish. The method you describe is actually very similar to
> the way leading voice recognition apps do it.

Actually, it's easier to write a speech recognition engine than to
produce an MMORPG. But there's no point in writing an SR engine
because there are already plenty that are either free or very
cheap (assuming they have the features you need). To do transplanted
prosody I wrote half an engine so that I could identify where
phonemes begin and end, because the Microsoft engine no longer
supports a phoneme timing API, although it used to.

> Even the best voice recognition packages need extensive training.

This isn't exactly true. A rough guesstimate is that doing extensive
training (a speaker-dependent acoustic model) will cut the error
rate to half that of the untrained (speaker-independent) model. Thus,
if the model were 90% accurate without training, it would be 95%
accurate with training.
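
As a rough sketch of that arithmetic (the function name and the
error_reduction parameter are made up for illustration; the 0.5
default is just the guesstimate above, not a measured figure):

```python
def accuracy_after_training(untrained_accuracy, error_reduction=0.5):
    """Estimate accuracy once speaker-dependent training cuts the error rate.

    error_reduction=0.5 encodes the rough "training halves the error"
    guesstimate; it is an assumption, not a measured constant.
    """
    if not 0.0 <= untrained_accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    untrained_error = 1.0 - untrained_accuracy
    trained_error = untrained_error * (1.0 - error_reduction)
    return 1.0 - trained_error


print(f"{accuracy_after_training(0.90):.1%}")  # 95.0%
print(f"{accuracy_after_training(0.99):.1%}")  # 99.5%
```

Note that the gain shrinks as the untrained model gets better: going
from 99% to 99.5% is the same halving of errors, but far fewer
mistakes are actually removed.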

If your language model is sufficiently constrained that users are
already getting 99% accuracy without training, then there's no point
training. For example, if you wanted to add voice commands to an
MMORPG so the player could say "equip the sword of cathaway" or
"pick up gold" then there wouldn't be any need for training. It's
only when you get to dictation (the user can speak anything) that
training is an issue.
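
To make "sufficiently constrained" concrete, here is a minimal sketch
of a command grammar. Everything in it is hypothetical: the VERBS
table, the extra "the shield" item, and the use of Python's difflib
to snap a noisy transcript onto the nearest allowed phrase. A real SR
engine applies the grammar during decoding rather than fuzzy-matching
text afterwards, so treat this only as an illustration of how few
phrases the recognizer has to choose between.

```python
import difflib

# Hypothetical command grammar: each verb takes one object from a fixed list.
VERBS = {
    "equip": ["the sword of cathaway", "the shield"],
    "pick up": ["gold", "the sword of cathaway"],
}


def allowed_phrases(verbs):
    """Expand the grammar into every phrase the player is allowed to say."""
    return [f"{verb} {obj}" for verb, objects in verbs.items() for obj in objects]


def match_command(heard, verbs=VERBS, cutoff=0.6):
    """Return the allowed phrase closest to what was heard, or None."""
    matches = difflib.get_close_matches(
        heard.lower(), allowed_phrases(verbs), n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


print(len(allowed_phrases(VERBS)))  # 4
print(match_command("equip the sword of cathaway"))
print(match_command("pick up gould"))  # slightly misheard input
print(match_command("xyzzy"))  # nothing close enough, so None
```

With only four legal phrases, a slightly misheard command still has
one obvious nearest candidate. Dictation has no such list to fall
back on.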

> If you happen to have a thick Slavic, Indian, or Yorkshire accent
> (for example), it can be very hard to get decent accuracy.

People who don't speak standard American will usually get about 2x
the error rate. People with a dialect/accent that's significantly
different from standard American (such as thick Slavic, etc.)
shouldn't even apply.

It is possible to create a speaker-independent SR model for people
with thick Slavic accents, but you'd need to get about 1000
thick-Slavic speakers to record 500-1000 sentences each, costing a
fair amount of money. Indian-English dialects will eventually be
done because the market is large enough. Any population with only a
few million speakers (such as Yorkshire dialect) is out of luck
unless a thousand enthusiasts get together and record their voices
for free, along with producing a Yorkshire pronunciation lexicon.

Furthermore, 10% (or so) of the population are called "goats" by
speech recognition researchers. Even though they speak standard
American, speech recognition just doesn't like them. They get 2x to
4x the error rate of everyone else.
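
A quick sketch of what those multipliers do to the average error
rate across a player base. The 5% baseline is a made-up number for
illustration, and the function names are hypothetical; only the 10%
goat fraction and the 2x to 4x range come from the figures above.

```python
def scaled_error(baseline_error, multiplier):
    """Error rate for a group at `multiplier` times baseline, capped at 100%."""
    return min(1.0, baseline_error * multiplier)


def population_error(baseline_error, goat_fraction=0.10, goat_multiplier=2.0):
    """Average error rate when a fraction of speakers are 'goats'."""
    goat_error = scaled_error(baseline_error, goat_multiplier)
    return (1.0 - goat_fraction) * baseline_error + goat_fraction * goat_error


baseline = 0.05  # hypothetical 5% error rate for a typical speaker
for m in (2.0, 4.0):
    print(
        f"goats at {m:.0f}x: goat error {scaled_error(baseline, m):.0%}, "
        f"population average {population_error(baseline, goat_multiplier=m):.1%}"
    )
```

The average barely moves, which is why goats are easy to overlook in
aggregate accuracy figures even though their own experience is much
worse.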
_______________________________________________
MUD-Dev mailing list
MUD-Dev at kanga.nu
https://www.kanga.nu/lists/listinfo/mud-dev