[DECtalk] Question About Formant vs Natural Speech
Don
Text_to_Speech at GMX.com
Sun Jun 19 14:53:37 EDT 2022
On 6/19/2022 8:19 AM, joshknnd1982 at gmail.com wrote:
> Yes, well, just take for example all possible combinations of all 26 letters
> of the English alphabet or all possible combinations of dictionary words.
> It's a number so large you have to express it in scientific notation using
> exponents. It's just an indescribably huge number!
Actually, it isn't.
There are about 40 phones so, worst case, there would be 40*40 (1600) diphones.
But just because there are 1600 possible combinations doesn't mean all of
them represent sound combinations that actually occur in the language.
In *a* language! Different languages have different sound combinations,
so English and Spanish can require different-sized diphone inventories.
[And, some languages add other sounds that we wouldn't think of as
"phonemes" -- like clicks and whistles... the !Kung bushmen being a
perfect example of this]
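
To put the two counts side by side, here is a rough sketch in Python. The
phone labels are made-up placeholders (not a real phone set), 40 is just the
"about 40" estimate from above, and letter_strings() is a helper invented
for this illustration. It only counts the worst case; it says nothing about
which pairs actually occur in any given language.

```python
from itertools import product

# Assumption: "about 40" phones, per the estimate above.
NUM_PHONES = 40

# Hypothetical placeholder labels; a real inventory would use phone symbols.
phones = [f"p{i:02d}" for i in range(NUM_PHONES)]

# Worst case: every ordered pair of phones is a candidate diphone.
diphones = list(product(phones, repeat=2))
diphone_count = len(diphones)


def letter_strings(length, alphabet_size=26):
    """Number of distinct letter strings of exactly `length` letters."""
    return alphabet_size ** length


print(diphone_count)       # 1600
print(letter_strings(10))  # 141167095653376
```

The diphone count is fixed by the pair length (always two phones), so it
stays at 40*40. The letter-string count keeps growing with the string
length, which is where the "indescribably huge" number comes from.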
But, each diphone has to be represented by a sound snippet and
accompanied by data to let the algorithm decide if/how to use it.
That's considerably more overhead than a list of a few dozen parameters
for each phoneme (DECtalk style).