[DECtalk] an article about speech synthesis methods

Mon Sep 13 00:17:12 EDT 2021

On 9/12/2021 7:58 AM, Blake Roberts via Dectalk wrote:
> While browsing the most recent Top Tech Tidbits newsletter, I came across this
> blog post which explains different methods for creating speech synthesis.
>
> https://tink.uk/notes-on-synthetic-speech/
>
> The author claims, at the time this message is written, that parametric
> synthesis is not available for screen readers. I wrote to her already about the
> parametric synthesis solution RHVoice. Some of you might find the article
> explaining various TTS methods of interest. I definitely did!

If the author's stated goal is a "fast speaking voice" that is also
"intelligible", a diphone synthesizer can accomplish that (it is more
intelligible than a formant-based design), provided the diphone
inventory is recorded from a speaker (as in "person who can speak")
who is able to speak very quickly.

[There are folks who can speak at 600 wpm]

You can also raise the clock frequency of the waveform output so
that samples are emitted more quickly.  This has the unfortunate
side effect of also raising the pitch -- like playing a 33 RPM
phonograph record at 45 RPM.
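
To put numbers on the pitch shift, here is a minimal sketch in Python.
It is only an illustration: the 8000 Hz recording rate, the 100 Hz test
tone, and the helper names (tone, estimate_freq, play_faster) are all
assumptions of mine, not anything from a real synthesizer.  The same
samples are simply "clocked out" at a higher rate, and the pitch is
estimated by counting zero crossings.

```python
import math

def tone(freq_hz, sample_rate, seconds):
    """Generate a cosine test tone as a list of float samples."""
    n = int(sample_rate * seconds)
    return [math.cos(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def estimate_freq(samples, sample_rate):
    """Rough pitch estimate: count positive-going zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)

def play_faster(samples, record_rate, speedup):
    """Return (duration_s, pitch_hz) when the samples are clocked out faster.

    The samples themselves are untouched; only the output clock changes.
    """
    playback_rate = record_rate * speedup
    duration = len(samples) / playback_rate
    return duration, estimate_freq(samples, playback_rate)

RECORD_RATE = 8000                      # assumed recording rate, Hz
samples = tone(100.0, RECORD_RATE, 1.0)  # one second of a 100 Hz tone

# 45 RPM versus 33 1/3 RPM is a speedup of 1.35
for speedup in (1.0, 1.35):
    duration, pitch = play_faster(samples, RECORD_RATE, speedup)
    print(f"speedup {speedup}: {duration:.3f} s, about {pitch:.1f} Hz")
```

Duration shrinks by the speedup factor and the pitch rises by the same
factor, which is exactly the phonograph effect.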

Alternatively, you can selectively remove redundant portions of the
speech waveform (created from a "normal" speaker) to increase
throughput without the "Mickey Mouse" effect of arbitrarily
speeding up the timebase.  Speech is highly redundant, so one can
chop pieces out without losing information.
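
One simple way to picture this (not necessarily how any particular
product does it) is to discard whole pitch periods from a voiced
sound.  The sketch below is in Python and rests on loud assumptions:
the pitch period is known and constant, the "speech" is a plain 100 Hz
tone, and the helper names (tone, estimate_freq, drop_periods) are
hypothetical.  Real speech would need pitch tracking, separate handling
of unvoiced sounds, and some smoothing at the splice points.

```python
import math

def tone(freq_hz, sample_rate, seconds):
    """Generate a cosine test tone as a list of float samples."""
    n = int(sample_rate * seconds)
    return [math.cos(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def estimate_freq(samples, sample_rate):
    """Rough pitch estimate: count positive-going zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)

def drop_periods(samples, period, keep=1, drop=1):
    """Keep `keep` whole pitch periods, then discard `drop`, and repeat.

    `period` is the pitch period in samples (assumed known and constant).
    Cutting on period boundaries keeps the splices in phase.
    """
    out = []
    cycle = (keep + drop) * period
    for start in range(0, len(samples), cycle):
        out.extend(samples[start:start + keep * period])
    return out

RATE = 8000                           # assumed sample rate, Hz
original = tone(100.0, RATE, 1.0)     # one second of a 100 Hz tone
period = RATE // 100                  # 80 samples per pitch period
shortened = drop_periods(original, period)

for name, sig in (("original", original), ("shortened", shortened)):
    print(f"{name}: {len(sig) / RATE:.3f} s, about {estimate_freq(sig, RATE):.1f} Hz")
```

The output clock never changes here, so the shortened signal plays in
half the time at the same pitch.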

[My thesis advisor marketed a device that does this way back in the '70s]

These aren't things that an end user is likely to be able to do,
though.