[DECtalk] Intelligibility/Listenability criteria

Jayson Smith jaybird at bluegrasspals.com
Tue Jul 23 13:47:08 EDT 2019


Hi,

You bring up many interesting points and problems.

One example that comes to mind is Stephen Hawking. For many years he 
used a speech synthesizer called the CallText 5010 by Speech Plus. But 
unlike most members of this list, he relied on that device as his 
voice, period. This meant that people who were not accustomed to 
speech synthesis were exposed to his synthetic voice, especially in 
the 80s and 90s, when synthetic speech wasn't nearly as mainstream as 
it is now: no Alexa, Google, or Siri, and few if any synthetic-voiced 
robocalls reminding you of doctor appointments. His voice sounded 
similar to DECtalk, so the two were often confused, to the point that 
the DECtalk article on Wikipedia claims that Hawking used a form of 
DECtalk. The interesting thing about this particular synthesizer is 
that an emulation of it was completed, for Hawking's use, a few months 
before his death. That brings a classic voice into the modern world, 
but unfortunately it's likely that none of us will ever get a chance 
to play with it. According to an article, the source code was turned 
over to the Hawking estate, which certainly has reason to protect that 
particular voice. One friend has told me that if this voice were made 
available, he'd use it as his default screen reader voice.

As I see it, the problem with your scenario of the crippled voice in the 
earpiece is that you can't please everyone no matter how hard you try. 
Ideally, if the big box in the closet can send synthetic speech to the 
earpiece once the user is properly authenticated, then it's already 
sending an audio stream, so it should also be able to send a 
pre-recorded file to unauthenticated/unregistered users. This 
pre-recorded file could be
spoken by a more capable synthesizer, or even by an actual human, who 
can clearly state the message. If spoken by a human, in theory that 
human would know, or could find out, how to properly pronounce unusual 
names, places, etc.
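For what it's worth, here's a rough sketch, in Python, of the decision 
I'm imagining on the big box's side. Every name in it is made up for 
illustration; none of it comes from any real API:

    from pathlib import Path

    # Recorded ahead of time by a human or by a more capable synthesizer.
    CANNED_REGISTRATION_PROMPT = Path("please_register.wav")

    def audio_for(authenticated, message, synthesize):
        """Pick the audio the big box streams to an earpiece.

        'synthesize' stands in for whatever full-featured synthesizer
        the big box actually has: it takes text and returns audio bytes.
        """
        if authenticated:
            # Registered users get live synthetic speech for the message.
            return synthesize(message)
        # Everyone else gets the canned recording.
        return CANNED_REGISTRATION_PROMPT.read_bytes()

The point being that once you're shipping audio either way, the canned 
file is no harder to deliver than the synthetic stream.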

This leaves our poor crippled synthesizer in the earpiece to deal with 
those situations where no big box can be reached at all, or a situation 
has arisen for which no canned recording is available. Once again, you 
can't please everybody. If you're designing the big boxes and writing 
their documentation, you can include best practices to ensure that when 
administrators write error messages, they make them as friendly as 
possible to the particular synthesizer you've chosen. But that doesn't 
mean the admins have to follow your recommendations. And then there's 
the newbie admin who rushes through installation and configuration, 
just wants to get things up and running as quickly as possible, skims 
your docs, and leaves some poor user who needs to register with the 
following mess, spoken by their earpiece's crippled synthesizer:

Sorry we dont recognise your device ID. Please clal dr Johnson at 
8178446611 Tahnk you

Now, for those situations where no big box can be reached at all: 
assuming you're in control of the firmware for the earpieces, you know 
exactly how your crippled synthesizer works and can work around any 
quirks it has in order to provide the most understandable messages 
possible.
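To make that concrete, here's the sort of firmware-side clean-up I 
have in mind, as a rough Python sketch. The rules below are only 
examples; real ones would be tuned to whatever quirks your particular 
synthesizer actually has:

    import re

    # Example expansions only; a real table would match your synth's quirks.
    ABBREVIATIONS = {"dr": "doctor", "appt": "appointment"}

    def tidy(text):
        words = []
        for word in text.split():
            key = word.lower().strip(".,")
            if key in ABBREVIATIONS:
                word = ABBREVIATIONS[key]
            words.append(word)
        text = " ".join(words)
        # Read long digit strings (phone numbers, IDs) digit by digit.
        text = re.sub(r"\d{5,}", lambda m: " ".join(m.group()), text)
        # End with something the synthesizer treats as end-of-sentence.
        if not text.endswith((".", "?", "!")):
            text += "."
        return text

    # tidy("Please clal dr Johnson at 8178446611")
    #   -> "Please clal doctor Johnson at 8 1 7 8 4 4 6 6 1 1."
    # It can't rescue the admin's typos, but at least the phone number
    # comes out digit by digit and the abbreviation gets expanded.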

I hope this helps,

Jayson

On 7/23/2019 1:10 PM, Don wrote:
> On 7/23/2019 6:38 AM, Jayson Smith wrote:
>> Hi,
>>
>> A few points here.
>> First, I think it's a little of both pronunciation and the voice 
>> itself that
>> gets on my nerves with ESpeak.
>>
>> Second, I'd argue that Alex and Alexa do have to contend with 
>> unrestrained
>> input. If I go into my Alexa web portal and put
>> "Sfjsaofhdsahbfiuewfbhifgbfvbiuewqfbirewqfbiwfbiubifdsava" on my 
>> shopping list,
>> then ask her to read my shopping list, she's going to have to deal 
>> with that
>> horrible mess of text.
>
> Yes, but what does it end up saying?  And, I suspect it says whatever
> in a very specific context:  "I'm sorry, I don't find anyone who is
> selling 'Sfjsaofhdsahbfiuewfbhifgbfvbiuewqfbirewqfbiwfbiubifdsava'".
>
> It doesn't have to contend with trying to apply prosody to
> incomplete sentences or a meaningless series of words:
> "Bob went snigglepuss"  "Here is teh mising peas of the puzl"
>
> I can ensure that all of the messages that I generate for the
> synthesizer (either synthesizer!) are grammatically correct.  I
> can craft them in such a way as to avoid difficult pronunciations.
> Or, to exploit known text normalization patterns (e.g., presenting
> "2,019" to the synthesizer when I want it to say "two thousand and
> nineteen" instead of "twenty nineteen".)
>
> But, I can't guarantee that folks who extend my design will be
> as disciplined.  Yet, the user will have to contend with the
> "input text" chosen by those folks for their extensions!
>
> I don't think it is acceptable (or ethical) to say "that's not
> MY problem!"
>


