[MUD-Dev] [TECH] Voice in MO* - Phoneme Decomposition and Reconstruction

Eli Stevens listsub at wickedgrey.com
Fri May 24 13:35:40 CEST 2002


From: "Ted L. Chen" <tedlchen at yahoo.com>

> Ugh, I hate replying to my own posts but I guess I should just
> stick a foot in my mouth :) In my own defense, "phoneme" didn't
> pop up any good hits on the archive search engine so I thought
> this topic wasn't discussed before.  But going backwards through
> the archives manually (good reading) I finally got to the short
> sub-thread:

>   http://www.kanga.nu/archives/MUD-Dev-L/2001Q2/msg01688.php

> Sorry for the noise, but hopefully at least the Comp.speech FAQ
> and English to Phoneme Translation resources might be of some
> interest to people on that older thread.

Heh heh, noise.  :P  I believe that your use of "phoneme" was the first
time I have encountered the term.  Learn something new every week.  ;)

I have been working on the system I mentioned in the above post for
the past week or so (I graduate and all of a sudden, I have all this
free time...  weird ;).  I have been looking at the input that one
gets when recording from a PCM wave source (just the typical
microphone on a typical PC), and have been trying to come up with
ways to generalize it (i.e. stick it in a self-organizing map - see
the above message for links).

This is (roughly) what I see (apologies if you cannot view this in
fixed-width):


        ***
      **   *
    **      ****          **
   *            **       *  *
  *----------------*----*----*-------------*  Repeats...
                    ** *      *           *
                      *        **        *
                                 *      *
                                  **  **
                                    **

From what little I know about sound, this seems like a typical
sample for human speech (granted, this one is just something I made
up, but it looks about right).  Slicing the wave up into separate
crests and troughs is pretty simple (just watch for where the signal
crosses over zero and append ones that are too short to the previous
one).  You end up with a series of partial waves like:
  
        ***
      **   *
    **      ****
   *            **
  *----------------*


  *----*
   ** *
     *


    **
   *  *
  *----*


  *-------------*
   *           *
    **        *
      *      *
       **  **
         **


        ***
      **   *
    **      ****
   *            **
  *----------------* (first one again, and so on)
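
A minimal sketch of the slicing step described above, assuming the
recording is already available as a plain list of signed sample values.
The names split_at_zero_crossings and min_len are made up for
illustration, and the "too short" threshold is a guess that would need
tuning against real microphone input:

```python
def split_at_zero_crossings(samples, min_len=3):
    """Split signed PCM samples into crests and troughs.

    A new segment starts wherever the sign of the signal flips.  A
    segment shorter than min_len (an assumed threshold) is appended to
    the previous segment instead of standing on its own.
    """
    segments = []
    current = []
    for s in samples:
        # A sign change relative to the last sample marks a zero crossing.
        if current and (s >= 0) != (current[-1] >= 0):
            _flush(segments, current, min_len)
            current = []
        current.append(s)
    if current:
        _flush(segments, current, min_len)
    return segments


def _flush(segments, current, min_len):
    """Store a finished segment, merging it backwards if it is too short."""
    if segments and len(current) < min_len:
        segments[-1].extend(current)
    else:
        segments.append(current)
```

Calling split_at_zero_crossings(samples, min_len=2) on a short run of
samples returns one list per crest or trough, with any one-sample blip
folded into the segment before it.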


What I would like to know is if there is a good way (i.e. not brute
force) to figure out when a wave has finished and is now repeating.
In a perfect world, you could compare the 1st segment to the 2nd,
3rd, 4th, etc. until a match was found, then advance to the 2nd
segment and the one just after the match, compare, and they too would
match up.  However, in the real world I suspect that the waveforms
would get out of sync fairly quickly, which would lead to expensive
iterative comparisons to re-sync the pattern.  It seems there should
be a Better Way.
Actually, now that I think about it, I don't care if the wave is
split or not, it was just the direction I was heading...  Perhaps a
small window could be slid over the wave and compared to a window at
the start of the wave?  When they match, the wave has most likely
repeated?
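
A rough sketch of that sliding-window idea, under some loud
assumptions: the samples are a plain list of numbers, the sound is
steady enough that one period looks much like the next, and the names
estimate_period, window, min_lag and tolerance are invented here with
guessed defaults.  Comparing a signal against a shifted copy of itself
this way is closely related to what signal-processing texts call
autocorrelation and the average magnitude difference function, both
commonly used for pitch detection:

```python
def estimate_period(samples, window=32, min_lag=8, tolerance=0.05):
    """Return the smallest lag at which a window matches the start.

    The reference window is the first `window` samples.  It is slid
    forward one sample at a time, and the mean absolute difference is
    compared against a fraction (`tolerance`) of the reference's own
    mean magnitude.  Returns None if no lag matches.
    """
    ref = samples[:window]
    # Average magnitude of the reference, so the threshold scales with volume.
    scale = sum(abs(x) for x in ref) / window
    if scale == 0:
        scale = 1.0
    # min_lag skips the trivial match of the window against itself.
    for lag in range(min_lag, len(samples) - window + 1):
        candidate = samples[lag:lag + window]
        diff = sum(abs(a - b) for a, b in zip(ref, candidate)) / window
        if diff <= tolerance * scale:
            return lag
    return None
```

This is still a brute-force scan over lags, just a bounded one; on
noisy input the tolerance would have to be loosened, and picking the
lag with the smallest difference rather than the first one under the
threshold may be more robust.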

I suspect that someone has already solved this problem (it seems
simple enough to be fairly common).  However, I have had a hard time
coming up with decent search terms.  :/ Anyone?  Anyone?  :)

Thanks,
Eli

--
Give a man some mud, and he plays for a day.
Teach a man to mud, and he plays for a lifetime.


_______________________________________________
MUD-Dev mailing list
MUD-Dev at kanga.nu
https://www.kanga.nu/lists/listinfo/mud-dev
