[DECtalk] Question About Formant vs Natural Speech

Don Text_to_Speech at GMX.com
Sun Jun 19 14:53:37 EDT 2022


On 6/19/2022 8:19 AM, joshknnd1982 at gmail.com wrote:
> Yes, well, just take for example all possible combinations of all 26 letters
> of the English alphabet or all possible combinations of dictionary words.
> It's a number so large you have to express it in scientific notation using
> exponents. It's just an indescribably huge number!

Actually, it isn't.

There are about 40 phones so, worst case, there would be 40*40 (1600) diphones.
But the fact that 1600 combinations are possible doesn't mean all of them
represent sound combinations that actually occur in the language.

In *a* language!  Different languages have different sound combinations,
so English and Spanish can require diphone inventories of different sizes.
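
[A quick sketch of the arithmetic, in Python.  The 40 phone symbols are
made-up placeholders, not a real phone inventory, and the 10-letter
string length is an arbitrary choice for comparison.  It only shows that
the diphone count is a fixed upper bound, while letter strings grow
exponentially with length.]

```python
from itertools import product

# Hypothetical stand-in phone set: 40 placeholder symbols (assumption,
# not an actual phone inventory for any language).
phones = [f"P{i:02d}" for i in range(40)]

# Every ordered pair of phones is a candidate diphone. This is the
# worst-case upper bound; a real language uses only a subset.
diphones = list(product(phones, repeat=2))
diphone_count = len(diphones)
print("candidate diphones:", diphone_count)

# By contrast, the number of possible letter strings grows exponentially
# with string length (length 10 chosen arbitrarily for illustration).
letter_strings_len10 = 26 ** 10
print("10-letter strings: ", letter_strings_len10)
```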

[And, some languages add other sounds that we wouldn't think of as
"phonemes" -- like clicks and whistles... the !Kung bushmen being a
perfect example of this]

But, each diphone has to be represented by a sound snippet and
accompanied by data to let the algorithm decide if/how to use it.
That's considerably more overhead than a list of a few dozen parameters
for each phoneme (DECtalk style).
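
[To put rough numbers on that overhead, here is a back-of-envelope
estimate in Python.  Every figure in it is an illustrative assumption
(parameter count, parameter width, snippet length, sample rate, metadata
size); none of them are DECtalk measurements or taken from any real
diphone database.]

```python
# All figures below are illustrative assumptions, not measurements.
NUM_PHONES = 40
PARAMS_PER_PHONEME = 36   # "a few dozen" parameters, assumed to be 36
BYTES_PER_PARAM = 2       # assumed 16-bit parameter values

SAMPLE_RATE_HZ = 16_000   # assumed sample rate for stored snippets
BYTES_PER_SAMPLE = 2      # assumed 16-bit PCM
SNIPPET_MS = 100          # assumed average diphone snippet length
METADATA_BYTES = 32       # assumed per-diphone selection data

# Parameter-table approach: one small parameter list per phoneme.
formant_bytes = NUM_PHONES * PARAMS_PER_PHONEME * BYTES_PER_PARAM

# Diphone approach, worst case: one snippet plus metadata per phone pair.
samples_per_snippet = SAMPLE_RATE_HZ * SNIPPET_MS // 1000
snippet_bytes = samples_per_snippet * BYTES_PER_SAMPLE + METADATA_BYTES
diphone_bytes = NUM_PHONES ** 2 * snippet_bytes

print("parameter tables:", formant_bytes, "bytes")
print("diphone snippets:", diphone_bytes, "bytes")
```

[Change the assumptions and the totals move, but the shape of the result
stays the same: the snippet store scales with the number of phone pairs
times the audio per snippet, while the parameter tables scale only with
the number of phonemes.]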

