[DECtalk] Intelligibility/Listenability criteria

Damien Garwood damien at daygar.plus.com
Sun Jul 21 11:15:45 EDT 2019


Hi Blake,
I hear ya. I still vividly remember the time, even though it was about 
15 years ago, when I was, let's say 1000% convinced that I wouldn't be 
able to understand voices at high speed. It's taken a long time, but I'm 
much happier for it. If I need to read something carefully, I can and 
will usually slow it down, but it means I can read at high speeds and 
generally have a good idea what it's saying. Again, that's where 
understandability plays a very important role. The more understandable 
it is, the higher the speed I can work with, at least in formant synths.
Cheers,
Damien.

On 21/07/2019 02:46 pm, Blake Roberts wrote:
> Don,
> For me, whether a speech synthesizer is tolerable or not depends on a 
> few factors.
> 1. How realistic the voice sounds (naturalness).
> 2. Whether the synthesizer can handle the amount of text given to it by 
> the screen reading software without crashing.
> 3. If I can listen to the synthesizer for a long period without getting 
> ear fatigue.
> 
> Let me provide two examples. Years ago I purchased AT&T voices from 
> nextup.com for use with the TextAloud program. Since AT&T voices are 
> SAPI5 compatible, I chose to use them with my screen reader. That was a 
> mistake. The voices are so large in size that they would consistently 
> crash after being my JAWS screen reader voice for a minute or two. To 
> me, the AT&T voices I purchased also sound monotone, so I could not 
> tolerate listening to AT&T voices for hours on end in any event. I think 
> there is a newer version of the AT&T voices from Wizard Software which 
> NextUp does not have access to/does not sell. I can only share my 
> perspective based on the voices which I have.
> 
> On my Windows 10 system at home, I prefer either Eloquence or Microsoft 
> Mark. When I am using JAWS Professional Edition on my work laptop, I 
> prefer Microsoft Mark or the Vocalizer British English voice 
> Malcolm, although I happen to reside in the U.S. Malcolm sounds natural, 
> does not crash and I enjoy listening to him for hours.
> 
> These are my thoughts. I know that some people evaluate a synthesizer 
> voice on how fast it can talk. I do not use that criterion myself as an 
> end-user because I prefer slow or medium speed. If a voice is set too 
> fast, I cannot understand it.
> Blake
> 
> 
> ----- Original Message -----
> From: mattias jonsson <mj at mjw.se>
> To: DECtalk <dectalk at bluegrasspals.com>
> Sent: Sun, 21 Jul 2019 06:40:35 -0400 (EDT)
> Subject: Re: [DECtalk] Intelligibility/Listenability criteria
> 
> My favorite voices: Vocalizer Swedish Alva for Swedish, and Vocalizer 
> Evan for US English.
> 
> 
> On 21 July 2019 10:18:11, Damien Garwood <damien at daygar.plus.com> wrote:
> 
>  > Hi Don,
>  > Here are my criteria:
>  >
>  > 1. Understandability
>  > As a screen reader user who has to listen to speech synthesis on a
>  > constant basis while using a computer, understandability is first and
>  > foremost. If the synthesiser can't be understood, then you're not going
>  > to get the feedback you need. In my opinion, eSpeak ticks every box,
>  > except this, so I can't use it.
>  > 2. Responsiveness. Again, because the speech is reading everything for
>  > me, I don't want a synthesiser that acts sluggishly with any kind of
>  > latency, whether that be a second, or 50 milliseconds, whether through
>  > lack of performance optimisation or through audio silence. When I press
>  > a key, I want instant feedback. This automatically rules out most
>  > natural-sounding synthesisers.
>  > 3. Accuracy: It needs to be able to read text accurately for the
>  > language it is designed for. It's not enough simply to have a phonetics
>  > dictionary, but it also needs to be able to distinguish between words
>  > (Present noun versus present verb, for instance).
>  > 4. Flexibility: The voice settings should be adjustable by the user, and
>  > the voice should, for the most part, adjust smoothly to the change. This is
>  > important if a user has specialist needs and cannot use the synth in its
>  > default state. Speed and pitch are definitely a must. Again, this rules
>  > out natural synths, since due to the nature of recorded samples they
>  > start to sound unnatural if you attempt to adjust the speed and
>  > pitch. The bigger the change, the more unnatural.
>  > Like Jason, I also prefer formant synths. My favourite by far is
>  > Keynote, which to me is the most understandable, but I do love DECtalk
>  > for its flexibility. I also like Eloquence and the synthetic version of
>  > Orpheus.
>  > Cheers,
>  > Damien.
>  >
>  > On 21/07/2019 05:53 am, Don wrote:
>  >> Hi,
>  >>
>  >> Perhaps a bit off-topic for this list... if so, my apologies.
>  >>
>  >> I'm looking for opinions as to how one evaluates the "effectiveness"
>  >> of a particular synthesizer.  Said another way, how one decides that
>  >> synthesizer A is "better" than synthesizer B.  Ideally, criteria that
>  >> would allow you to rank a set of them!
>  >>
>  >> I've been auditioning various synthesis devices and techniques
>  >> to try to come to my own conclusions on this.  Then, hopefully,
>  >> work backwards to come up with some objective criteria by which
>  >> they could each be "scored" (even if that was done using bogus
>  >> rating units).
>  >>
>  >> "Intelligibility" is, of course, the prime issue.  "Listenability"
>  >> comes into play for any prolonged use.  Finally, "naturalness"
>  >> matters when it comes to extended use.
>  >>
>  >> For example, the old Votrax units were intelligible -- once you
>  >> learned their "accent".  But, listenability was rather poor... you
>  >> quickly developed ear fatigue.  And, the idea of naturalness was
>  >> never even considered!
>  >>
>  >> With gobs of resources (hardware, software, processing power), you
>  >> can achieve quite acceptable results.  This seems to be the approach
>  >> most "modern" synthesizers -- and techniques -- adopt.  The real problem
>  >> lies with limited resources attempting to handle unconstrained input.
>  >> (If you know what you're going to be asked to speak, it's really easy to
>  >> come up with a good presentation!)
>  >>
>  >> Limiting the user's exposure to the synthetic voice can reduce ear fatigue.
>  >> So, dealing with it for 10 minutes might be tolerable while 2 hours
>  >> would be torture.
>  >>
>  >> But, having to face the prospect of completely unconstrained input can
>  >> tax even that brief usage.  "Dr. Jones' car -- bearing the license plate
>  >> FTDKTR -- has been parked in front of his house on Jones Dr. since 12:34A
>  >> this morning when his Polish butler finished polishing it."  Imagine you
>  >> have no other way of inspecting the input text...
>  >>
>  >> So, what makes a synthesizer "tolerable" or "intolerable"?  What is the
>  >> "threshold of pain" when it comes to tolerating an underperforming
>  >> synthesizer?
>  >> _______________________________________________
>  >> Dectalk mailing list
>  >> Dectalk at bluegrasspals.com
>  >> http://bluegrasspals.com/mailman/listinfo/dectalk
>  >>
> 
> 
> 

