[DECtalk] DECTalk At High Speaking Rates

Mon Oct 31 17:51:32 EDT 2022

Hi,

These are really strong points, in my opinion, and they've definitely gotten me to think about this in a different way.

I'll reach out and see what those working on the DECTalk sources have to say about your original idea of altering the parameter update rates.

Thanks again for your input,

Brandon

> On Oct 31, 2022, at 4:53 PM, Don <Text_to_Speech at gmx.com> wrote:
> 
> On 10/31/2022 4:17 AM, Brandon Tyson wrote:
>>> You might suggest to the folks working on the DT sources that they explore
>>> increasing the sample rate and parameter update rates to see if this has a
>>> noticeable impact on the quality of the speech at higher speaking rates.
> 
>> I'd be happy to reach out about this. This seems like it could be a good
>> starting point.
> 
> I don't know what the results would sound like.  My design follows
> Klatt's intent but with a more modern implementation -- computers
> are thousands of times faster today than in the 1980s.
> 
> [Modified rhyme test]
> 
>> Running Eloquence fast, I was able to figure these out; I'll have to try it with
>> DECTalk.
> 
> It's not a question of whether or not you can pick the correct word.
> Rather, it's how little (or much) effort is required to do so.
> Sighted, I can pick the correct word almost instantly.
> 
> [Low speed performance comparison]
> 
>> At low speaking rates both DECTalk and Eloquence seem to be intelligible. Is
>> there something I'm missing with this one?
> 
> The point I'm making is about the relative intelligibility of each when
> operating in a "less strained" condition.
> 
> I suspect no one would have a problem understanding Siri or Alexa at
> nominal speaking rates.  OTOH, a Votrax is difficult for anyone without
> prior listening experience -- regardless of speaking rate.
> 
> It's relatively easy for us to "tolerate" a synthesizer's idiosyncrasies
> when we give our brains time to "process" what they just heard.  But, as
> that processing time ("slack") diminishes, the brain has to work harder
> to get the correct interpretation because that will act as the basis for
> further analysis -- context.
> 
>> Does this answer your questions?
>> I'd be happy to clarify anything else that might come up too.
> 
> I'm just offering possible reasons why one synthesizer may be
> more or less intelligible than another.
> 
> My thesis adviser (back in school) had developed a technology to
> speed up playback of audio tape WITHOUT the "Mickey Mouse"
> syndrome -- where the pitch is altered as the speed of the
> tape is increased.
> 
> This was done by removing portions of the speech so the speedup
> had less effect on the pitch of the final result.
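
A rough illustration of the "removing portions" idea quoted above, in Python. This is only a sketch under assumptions: it drops a fixed slice out of every frame, which is one simple way to shorten speech without resampling it, and is not necessarily how the technology described here worked. The function name, the 30 ms frame length, and the parameters are all made up for illustration.

```python
import math


def drop_segments(samples, rate, speedup, frame_ms=30.0):
    """Shorten `samples` by keeping only the start of each frame period.

    Hypothetical helper: for every `hop` input samples, the first `keep`
    are copied to the output and the rest are discarded. The kept samples
    are not resampled, so their pitch is left alone.
    """
    if speedup < 1.0:
        raise ValueError("speedup must be >= 1.0")
    keep = int(rate * frame_ms / 1000.0)   # samples kept from each period
    if keep < 1:
        raise ValueError("frame is shorter than one sample")
    hop = int(round(keep * speedup))       # input samples consumed per period
    out = []
    for start in range(0, len(samples), hop):
        out.extend(samples[start:start + keep])
    return out


# One second of a 200 Hz test tone at an assumed 8 kHz sample rate.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
fast = drop_segments(tone, 8000, 2.0)      # roughly half as long
```

Because every kept sample is played at its original rate, the pitch within each kept slice is unchanged; only the total duration shrinks. The abrupt joins between slices would click on real speech, so a practical version would crossfade them or align the cuts to pitch periods.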
> 
> There may be an analogy that could be exploited, here.  Instead
> of just shortening the duration of each phoneme and transition,
> it may be more effective to alter the "shape" of each as the
> rate is increased.
> 
> Dunno.  Again, my interest has been in making a "low resource"
> synthesizer that is readily understandable -- by folks who
> aren't accustomed to interacting with one.