[MUD-Dev] Shift in time

Mon Oct 4 18:58:16 CEST 2004

On Oct 3, 2004, at 3:16 AM, Arnau Rosselló Castelló wrote:

> Would such a thing be feasible?  How much bandwidth would the
> phoneme stream need (well, I could time myself saying something and
> then count... but maybe someone has more elaborate approximations)?
> How could such a system deal with ambient noise, or tapping the
> mic?  Is there any kind of public project (I know about Festival
> only) creating these kinds of programs?  Papers, maybe?

In academic research, this is usually called "speaker replacement,"
"voice replacement," or "voice transplantation," and it's an area of
very active research.  It's also harder than it seems at first
glance (but so are most really interesting problems).

Here are some paper references to get you started:

  http://www.etro.vub.ac.be/Research/DSSP/publications/loc_conf/SPS-2002-A.pdf

  http://www.busim.ee.boun.edu.tr/~speech/thesis/oytun_turk.pdf

    (this is actually a thesis, so it's large, but it has a good
    overview of various techniques and their advantages and
    disadvantages.)

The hard part is not the data stream--the bandwidth required is
actually not very large.  The hard part is voice modeling.  You need
a good model of the target voice, as well as a fair amount of
heavy-duty DSP at the client end to analyze the incoming voice signal
and derive the data to send upstream.
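
To put a rough number on "not very large," here is a back-of-envelope
sketch in Python.  Every figure in it is an assumption for
illustration, not a measurement: about 12 phonemes per second of
conversational speech, an inventory of 44 phonemes, and a
hypothetical 16 bits of prosody data (pitch, duration, energy) sent
along with each phoneme.  The function name and its parameters are
made up for this example.

```python
import math


def phoneme_stream_bps(phonemes_per_sec=12.0, inventory_size=44,
                       prosody_bits_per_phoneme=16):
    """Estimate bits per second for a phoneme-plus-prosody stream.

    All default values are illustrative assumptions, not measurements.
    """
    # Bits needed to index one phoneme out of the inventory.
    symbol_bits = math.ceil(math.log2(inventory_size))
    # Each phoneme carries its index plus the assumed prosody payload.
    return phonemes_per_sec * (symbol_bits + prosody_bits_per_phoneme)


if __name__ == "__main__":
    print(f"{phoneme_stream_bps():.0f} bits/s")
```

Under those assumptions the stream comes to a few hundred bits per
second, before any framing or packet overhead.  For comparison,
uncompressed telephone-quality audio (8 kHz, 8 bits per sample) is
64,000 bits per second, so even a much richer prosody payload would
leave the stream small next to the audio it replaces.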

An intermediate approach that might be useful for NPCs is to use a
tailored voice model.  Cepstral has some nice demos of their
synthesis engine, which can provide extremely lifelike TTS results
as long as the range of utterances is restricted (weather reports
being a classic example).

Amanda Walker
_______________________________________________
MUD-Dev mailing list
MUD-Dev at kanga.nu
https://www.kanga.nu/lists/listinfo/mud-dev