[DECtalk] DECtalk TTS licensing

Don Text_to_Speech at GMX.com
Tue Aug 31 20:34:45 EDT 2021


On 8/31/2021 4:55 PM, Jayson Smith wrote:
> Hi,
>
> This is one issue I sometimes have trouble with, the fact that the best speech
> synthesizer in the world can't know what an author was thinking or his/her mood
> when he/she wrote a message.

"On line" (in non-print media), we have adopted emoticons to help with
this.  But, we place the emoticon AFTER the content!  A smarter move
might be to place the emoticon AHEAD of the content as a "flag"
indicating how THE FOLLOWING should be interpreted.

It is conceivable that a synthesizer could "look ahead" to see if
such a flag is present at the end of the sentence.  Many already
do that to some extent to identify questions or exclamations.
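
To make the "look ahead" idea concrete, here is a minimal sketch of
checking the end of a sentence for an emoticon and turning it into a
flag that applies to the whole sentence.  The MOODS table and the
tag_sentence() name are my own inventions for illustration; they are
not part of any synthesizer's interface.

```python
import re

# Hypothetical emoticon-to-mood table; a real one would be much larger.
MOODS = {
    ":-)": "joking", ":)": "joking",
    ";-)": "joking", ";)": "joking",
    ":-(": "sad", ":(": "sad",
}

# Match a known emoticon (longest first) at the very end of the sentence.
_TRAILING_EMOTICON = re.compile(
    r"\s*("
    + "|".join(re.escape(e) for e in sorted(MOODS, key=len, reverse=True))
    + r")\s*$"
)


def tag_sentence(sentence):
    """Return (mood, text), with any trailing emoticon stripped from text."""
    match = _TRAILING_EMOTICON.search(sentence)
    if match is None:
        return ("neutral", sentence.strip())
    return (MOODS[match.group(1)], sentence[:match.start()].strip())
```

The catch is that nothing can be spoken until the whole sentence has
been seen.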

But, this assumes the input text is backed up waiting for the
synthesizer.  If, OTOH, the synthesizer was keeping pace with
the input text and there was a desire for low latency in generating
the output, this would be problematic.

Imagine getting a block of text that ends with:  "3... 2... 1... go!"
Further, imagine it takes several seconds to speak that block of
text.  So, *you* hear "go!" long after it was intended!

But, this brings us back to the original problem that I mentioned:
the flaw lies in the *application*, not the synthesizer.  In this
case, the "application" would be whoever -- or whatever -- was expecting
"go!" to coincide with a particular moment in time.

The application would have to be modified to compensate for this
latency.  It would, effectively, have to say, "Let me know when the
output has been completely processed/read... THEN I will say go!"
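
A rough sketch of that handshake, assuming the synthesizer can call
back when it has finished a block of text.  FakeSynthesizer, speak()
and on_done are hypothetical names standing in for whatever a real
engine provides; the fake one only records what it was asked to say.

```python
import threading


class FakeSynthesizer:
    """Stand-in for a real TTS engine; it only records what it 'spoke'."""

    def __init__(self):
        self.spoken = []

    def speak(self, text, on_done=None):
        """Speak text on a worker thread, then call on_done (if given)."""
        def worker():
            self.spoken.append(text)  # a real engine would render audio here
            if on_done is not None:
                on_done()

        thread = threading.Thread(target=worker)
        thread.start()
        return thread


def countdown(synth, preamble, timeout=5.0):
    """Say the preamble, wait until it has been spoken, THEN say 'go!'."""
    finished = threading.Event()
    synth.speak(preamble, on_done=finished.set)
    if not finished.wait(timeout):
        raise TimeoutError("synthesizer never reported completion")
    synth.speak("go!").join()
```

The application, not the synthesizer, decides when "go!" is sent, so
the latency of the preceding text no longer matters.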

> Perhaps the best example I can come up with on the
> spot is if I were to ask someone what they were working on, and they responded,
> "I could tell you, but then I'd have to kill you." Based on how a speech
> synthesizer reads that, you have no choice but to assume the person is really
> working on something extremely secretive. However, I might happen to know that
> they're joking around with me.

It's worse than that.

You are assuming that they use correct grammar, don't misspell, have a
coherent thought process, etc.

I have friends whose emails are complete puzzles to me.
It's as if they can't keep a single thought in their head from
the start of the sentence to the end.  I jokingly refer
to these folks as "Oh, look!  There's a butterfly!" as
it seems to capture their attention spans.

If you are reading a novel, sentences are likely well formed,
hopefully proofread AND relate, in some way, to the content
of the sentences preceding and following.

But, how does a synthesizer handle those "sequences of
characters and words and punctuation" that don't strictly
follow these rules?

I, for example, tend to get keystrokes out of order...
as if the signals from my brain arrive at my right hand
at a different time than my left.  So, I'll frequently
type "teh" instead of "the" -- just because my left hand
could transition from the 't' to the 'e' quicker than
my right hand could squeeze in the 'h'.

(sight) *reading* this, I can easily recognize what has happened.
But, if a machine is tasked with converting "teh" into a series
of sounds, it's likely not going to be suggestive of "the".
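
For what it's worth, this particular kind of slip (two adjacent
letters swapped) is one a front end could try to undo before the text
reaches the synthesizer.  A minimal sketch, assuming a word list is
available; KNOWN_WORDS and fix_transposition() are made up for
illustration, and the tiny lexicon is obviously not realistic.

```python
# Hypothetical, deliberately tiny lexicon for illustration only.
KNOWN_WORDS = {"the", "and", "this", "that"}


def fix_transposition(word, lexicon=KNOWN_WORDS):
    """Return a lexicon word reachable by swapping two adjacent letters.

    Words already in the lexicon, and words with no such repair, are
    returned unchanged.  A repaired word comes back in lower case.
    """
    lower = word.lower()
    if lower in lexicon:
        return word
    for i in range(len(lower) - 1):
        swapped = lower[:i] + lower[i + 1] + lower[i] + lower[i + 2:]
        if swapped in lexicon:
            return swapped
    return word
```

It does nothing for missing words, missing punctuation or wandering
thoughts, of course.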

Here's an email from a friend:

     Going to dentist at 1030 a.m. to have new cap installed.
     Kathies birthday was yesterday, she is sleeping as we speak,
     rather as I type. Went by your home, weeds are gone.looks great.
     Take care..

My English teachers would have a field day with that mess!
I didn't realize there was more than one Kathy!  Forget
the idea of "subject verb object".  And capitalization??

Of course, once you see the actual text, it makes sense.
But, how would a lengthy message from him fare?


