[DECtalk] DECTalk At High Speaking Rates

Mon Oct 31 16:53:18 EDT 2022

On 10/31/2022 4:17 AM, Brandon Tyson wrote:
>> You might suggest to the folks working on the DT sources that they explore
>> increasing the sample rate and parameter update rates to see if this has a
>> noticeable impact on the quality of the speech at higher speaking rates.

> I'd be happy to reach out about this. This seems like it could be a good
> starting point.

I don't know what the results would sound like.  My design follows
Klatt's intent but with a more modern implementation -- computers
are thousands of times faster, today, than in the 1980s.

[Modified rhyme test]

> Running Eloquence fast I was able to figure these, I'll have to try it with
> DECTalk.

It's not a question of whether or not you can pick the correct word.
Rather, it's how little (or much) effort is required to do so.
Sighted, I can pick the correct word almost instantly.

[Low speed performance comparison]

> At low speaking rates both DECTalk and Eloquence seem to be intelligible. Is
> there something I'm missing with this one?

The point I'm making concerns the relative intelligibility of each when
operating in a "less strained" condition.

I suspect no one would have a problem understanding Siri or Alexa at
nominal speaking rates.  OTOH, a Votrax is difficult for anyone without
prior listening experience -- regardless of speaking rate.

It's relatively easy for us to "tolerate" a synthesizer's idiosyncrasies
when we give our brains time to "process" what they just heard.  But, as
that processing time ("slack") diminishes, the brain has to work harder
to get the correct interpretation because that interpretation acts as
the basis for further analysis -- context.

> Does this answer your questions?
> 
> I'd be happy to clarify anything else that might come up too.

I'm just offering possible reasons why one synthesizer may be
more or less intelligible than another.

My thesis adviser (back in school) had developed a technology to
speed up playback of audio tape WITHOUT the "Mickey Mouse"
syndrome -- where the pitch is altered as the speed of the
tape is increased.

This was done by removing portions of the speech so the speedup
had less effect on the pitch of the final result.
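
A rough sketch of that idea in Python, for illustration only.  It assumes
the audio is a plain list of samples and that fixed-size chunks are
alternately kept and discarded; the function name and the keep/drop
parameters are made up here, and this is not a description of the
adviser's actual method.  Because the kept chunks are played back
unmodified, the waveform inside each chunk (and so its pitch) is
unchanged; only the total duration shrinks.

```python
def drop_segments(samples, keep, drop):
    """Shorten audio by repeatedly keeping `keep` samples and skipping `drop`.

    Hypothetical helper for illustration. The kept samples are copied
    untouched, so the speedup factor is (keep + drop) / keep while the
    pitch within each kept chunk stays the same.
    """
    if keep <= 0 or drop < 0:
        raise ValueError("keep must be positive and drop must be non-negative")
    out = []
    period = keep + drop
    for start in range(0, len(samples), period):
        # Copy the first `keep` samples of each period; the rest are discarded.
        out.extend(samples[start:start + keep])
    return out


# Example: keep 3 samples, drop 2, over ten samples numbered 0..9.
shortened = drop_segments(list(range(10)), keep=3, drop=2)
# shortened == [0, 1, 2, 5, 6, 7]
```

The hard cuts in this sketch would likely produce audible clicks at each
join; a practical version would presumably smooth or overlap the splices.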

There may be an analogy that could be exploited, here.  Instead
of just shortening the duration of each phoneme and transition,
it may be more effective to alter the "shape" of each as the
rate is increased.
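
One possible reading of "altering the shape", sketched in Python purely
as an assumption: split each phoneme into a transition part and a
steady-state part, and take more of the shortening out of the steady
state so the transition never falls below some floor.  The two-part
model, the function names, and the 20 ms floor are all hypothetical
choices made for this example; nothing here is taken from DECtalk or
from Klatt's design.

```python
def scale_uniform(transition_ms, steady_ms, rate):
    """Shorten both parts of a phoneme by the same factor."""
    return transition_ms / rate, steady_ms / rate


def scale_shaped(transition_ms, steady_ms, rate, min_transition_ms=20.0):
    """Same total duration as uniform scaling, but protect the transition.

    The transition is not allowed to shrink below `min_transition_ms`
    (or below its original length, if that was already shorter); the
    steady-state portion absorbs the rest of the shortening.
    """
    total = (transition_ms + steady_ms) / rate
    floor = min(transition_ms, min_transition_ms)
    new_transition = max(transition_ms / rate, floor)
    # Never let the transition exceed the total time available.
    new_transition = min(new_transition, total)
    return new_transition, total - new_transition


# A 40 ms transition plus 120 ms steady state, spoken 4x faster:
uniform = scale_uniform(40.0, 120.0, 4.0)   # (10.0, 30.0)
shaped = scale_shaped(40.0, 120.0, 4.0)     # (20.0, 20.0)
```

Both versions take 40 ms in total; they differ only in how that time is
divided.  Whether this helps intelligibility is exactly the open question.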

Dunno.  Again, my interest has been in making a "low resource"
synthesizer that is readily understandable -- by folks who
aren't accustomed to interacting with one.