[DECtalk] DECTalk At High Speaking Rates

Don Text_to_Speech at GMX.com
Sat Oct 29 22:35:02 EDT 2022


On 10/28/2022 1:20 AM, Brandon Tyson wrote:
> I'm curious how many users on this list use DECTalk at high speeds with a
> screen reader, say, above 460 words per minute. To me, by the point you get
> it to about 470-480 or so it sounds very much like it's mumbling and a
> friend and I described it as imitating how a person tries to speak fast
> (it's very difficult to understand them).

It may be that the "parameter update rate" is getting in the way.

Briefly, there are a few dozen parameters that are adjusted to
form the various sounds (phonemes) and the *transitions* between
them.  The overall "trajectory" of each parameter is coded into
the algorithms.  But, the actual values of the parameters are
only periodically updated; in the original MITalk implementation,
these updates occurred about 200 times per second.  Note the
default speaking rate was about 200 words per minute.

If the parameter updates remain at the 200 per second rate regardless
of the speaking rate, then the synthesizer's model of the vocal tract
may be too sluggish to provide high fidelity to the resulting speech
waveform.
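
To put rough numbers on that: if the update rate stays fixed, the number
of parameter updates available per spoken word falls in proportion to the
speaking rate.  A minimal Python sketch of the arithmetic follows.  The
function name is hypothetical, and the fixed update rate is an assumption,
not something taken from the DECTalk sources.

```python
UPDATES_PER_SECOND = 200  # update rate cited above for the original MITalk design


def updates_per_word(words_per_minute, updates_per_second=UPDATES_PER_SECOND):
    """Average number of parameter updates that land inside one word.

    Assumes the update rate does NOT scale with the speaking rate.
    """
    words_per_second = words_per_minute / 60.0
    return updates_per_second / words_per_second


if __name__ == "__main__":
    for wpm in (200, 460, 480):
        print(f"{wpm} wpm -> {updates_per_word(wpm):.1f} updates per word")
```

At 200 words per minute this works out to 60 updates per word; at 480 it
is 25, and those updates have to be shared among all of the phonemes and
transitions in that word.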

Note, also, that the original synthesizer generated 10,000 samples
per second which were filtered, electronically, to the 5,000 Hz
bandwidth of most (male) speech.

[I run my synthesizer considerably faster as I process speech in
much the same way as I process music -- which needs a sample rate
in the 40,000 range.]
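
The link between those sample rates and the usable bandwidth is the
Nyquist limit: a sampled signal can only represent frequencies up to
half its sample rate.  A trivial Python sketch (the function name is
hypothetical, chosen for illustration):

```python
def nyquist_bandwidth_hz(sample_rate_hz):
    """Highest frequency representable at the given sample rate (Nyquist limit)."""
    return sample_rate_hz / 2.0


if __name__ == "__main__":
    for rate in (10_000, 40_000):
        print(f"{rate} samples/s -> {nyquist_bandwidth_hz(rate):.0f} Hz")
```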

You might suggest to the folks working on the DT sources that they
explore increasing the sample rate and parameter update rates to
see if this has a noticeable impact on the quality of the speech
at higher speaking rates.

This is all conjecture as I've not examined the DECTalk sources but, rather,
am commenting from published literature regarding its ORIGINAL design.

> If I use Eloquence at a high speed it's much easier for me to follow that.
> I'm used to how both synthesizers sound and so I don't think it's due to not
> having used DECTalk enough.

But, the Eloquence voice inherently sounds different.  It could be that you
are more attuned to the audio artifacts in its speech -- even if its
speech is of a lesser (whatever that means) quality.  Your brain likely
fills in a lot of detail that your ears miss.

If you really want to "objectively" test intelligibility, you need to
use material that is unpredictable -- so your brain doesn't fill in
words that your ears missed.

I use a modified rhyme test to really trip up listeners as all of the
words sound very similar -- and none are related to the others conceptually.
    fun, sun, bun, run, nun, gun
"Which of these words -- first, second, third, etc. -- holds a hotdog?"
    jaw, thaw, law, raw, saw, paw
"Which of these words is used to cut wood?"
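
A simplified closed-set variant of such a test can be sketched in Python.
This is NOT the exact procedure described above (there, the listener
answers a question about the spoken list); here one target word per set
is assumed to be spoken and the listener picks it from the choices.  The
function names and the guessing correction are illustrative assumptions.

```python
import random

# Rhyme sets taken from the examples above; a real test would use many more.
RHYME_SETS = [
    ["fun", "sun", "bun", "run", "nun", "gun"],
    ["jaw", "thaw", "law", "raw", "saw", "paw"],
]


def make_trials(rhyme_sets, rng):
    """Pick one target word per set and shuffle the order of the choices."""
    trials = []
    for words in rhyme_sets:
        choices = list(words)
        rng.shuffle(choices)
        trials.append({"target": rng.choice(choices), "choices": choices})
    return trials


def score(trials, responses):
    """Return (raw fraction correct, guessing-corrected fraction)."""
    total = len(trials)
    right = sum(1 for t, r in zip(trials, responses) if r == t["target"])
    wrong = total - right
    n_choices = len(trials[0]["choices"])
    # Each wrong answer is treated as evidence of guessing: penalise by 1/(n-1).
    corrected = (right - wrong / (n_choices - 1)) / total
    return right / total, corrected


if __name__ == "__main__":
    rng = random.Random(0)
    trials = make_trials(RHYME_SETS, rng)
    for trial in trials:
        # In practice: speak trial["target"] through the synthesizer at the test rate.
        print("Choices:", ", ".join(trial["choices"]))
    perfect = [t["target"] for t in trials]
    print(score(trials, perfect))
```

Running the same word sets at several speaking rates, with listeners who
have and have not used the synthesizer before, would separate what the
ears actually catch from what familiarity fills in.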

> For those who use DECTalk with a screen reader, do you use it fast, say,
> above 470-480, or do you use it slower?
> 
> And generally speaking, for those who use DECTalk in general, such as with
> the Speak windows, is fast speech simply not a concern?

I can't comment on high speaking rates as my usage is for short messages.
But, I do want high comprehension on a single pass, as repeating a message
is a costly exercise for the listener.

> I feel that if DECTalk were to be used widely in the AT space that it could
> really do a better job at high speeds.
> 
> And I'm not saying Eloquence is perfect either--it definitely isn't, but
> it's far more intelligible for me at high speeds than DECTalk.

Flip that comment around.  At *low* speeds, how would you rate its
intelligibility?  How would one of your colleagues NOT accustomed
to listening to it answer that question?  I.e., how much of your
preference reflects training rather than an objective evaluation
of the products in each situation?

