[DECtalk] Intelligibility/Listenability criteria
Damien Garwood
damien at daygar.plus.com
Sun Jul 21 11:15:45 EDT 2019
Hi Blake,
I hear ya. I still vividly remember the time, even though it was about
15 years ago, when I was, let's say 1000% convinced that I wouldn't be
able to understand voices at high speed. It's taken a long time, but I'm
much happier for it. If I need to read something carefully, I can and
will usually slow it down, but it means I can read at high speeds and
generally have a good idea what it's saying. Again, that's where
understandability plays a very important role. The more understandable
it is, the higher the speed I can work with, at least in formant synths.
Cheers,
Damien.
On 21/07/2019 02:46 pm, Blake Roberts wrote:
> Don,
> For me, whether a speech synthesizer is tolerable or not depends on a
> few factors.
> 1. How realistic the voice sounds, naturalness.
> 2. Whether the synthesizer can handle the amount of text given to it by
> the screen reading software without crashing.
> 3. If I can listen to the synthesizer for a long period without getting
> ear fatigue.
>
> Let me provide two examples. Years ago I purchased AT&T voices from
> nextup.com for use with the TextAloud program. Since AT&T voices are
> SAPI5 compatible, I chose to use them with my screen reader. That was a
> mistake. The voices are so large in size that they would consistently
> crash after being my JAWS screen reader voice for a minute or two. To
> me, the AT&T voices I purchased also sound monotone, so I could not
> tolerate listening to AT&T voices for hours on end in any event. I think
> there is a newer version of the AT&T voices from Wizard Software which
> NextUp does not have access to/does not sell. I can only share my
> perspective based on the voices which I have.
>
> On my Windows 10 system at home, I prefer either Eloquence or Microsoft
> Mark. When I am using JAWS Professional Edition on my work laptop, I
> prefer Microsoft Mark or the Vocalizer British English voice
> Malcolm, although I happen to reside in the U.S. Malcolm sounds natural,
> does not crash and I enjoy listening to him for hours.
>
> These are my thoughts. I know that some people evaluate a synthesizer
> voice on how fast it can talk. I do not use that criterion myself as an
> end-user because I prefer slow or medium speed. If a voice is set too
> fast, I cannot understand it.
> Blake
>
>
> ----- Original Message -----
> From: mattias jonsson <mj at mjw.se>
> To: DECtalk <dectalk at bluegrasspals.com>
> Sent: Sun, 21 Jul 2019 06:40:35 -0400 (EDT)
> Subject: Re: [DECtalk] Intelligibility/Listenability criteria
>
> My favorite voices: Vocalizer Swedish Alva for Swedish, Vocalizer Evan for US
> English.
>
>
> On 21 July 2019 10:18:11, Damien Garwood <damien at daygar.plus.com> wrote:
>
> > Hi Don,
> > Here are my criteria:
> >
> > 1. Understandability
> > As a screen reader user who has to listen to speech synthesis on a
> > constant basis while using a computer, understandability is first and
> > foremost. If the synthesiser can't be understood, then you're not going
> > to get the feedback you need. In my opinion, ESpeak ticks every box,
> > except this, so I can't use it.
> > 2. Responsiveness. Again, because the speech is reading everything for
> > me, I don't want a synthesiser that acts sluggishly with any kind of
> > latency, whether that be a second, or 50 milliseconds, whether through
> > lack of performance optimisation or through audio silence. When I press
> > a key, I want instant feedback. This automatically rules out most
> > natural-sounding synthesisers.
> > 3. Accuracy: It needs to be able to read text accurately for the
> > language it is designed for. It's not enough simply to have a phonetics
> > dictionary, but it also needs to be able to distinguish between words
> > (Present noun versus present verb, for instance).
> > 4. Flexibility: The voice timbres should be available to the user, and
> > for the most part should adjust smoothly to the change. This is
> > important if a user has specialist needs and cannot use the synth in its
> > default state. Speed and pitch are definitely a must. Again, this rules
> > out natural synths, since due to the nature of recorded samples they
> > begin to sound unnatural if you attempt to adjust the speed and
> > pitch. The bigger the change, the more unnatural.
> > Like Jason, I also prefer formant synths. My favourite by far is
> > Keynote, which to me is the most understandable, but I do love DECTalk
> > for its flexibility. I also like Eloquence and the synthetic version of
> > Orpheus.
> > Cheers,
> > Damien.
> >
> > On 21/07/2019 05:53 am, Don wrote:
> >> Hi,
> >>
> >> Perhaps a bit off-topic for this list... if so, my apologies.
> >>
> >> I'm looking for opinions as to how one evaluates the "effectiveness"
> >> of a particular synthesizer. Said another way, how one decides that
> >> synthesizer A is "better" than synthesizer B. Ideally, criteria that
> >> would allow you to rank a set of them!
> >>
> >> I've been auditioning various synthesis devices and techniques
> >> to try to come to my own conclusions on this. Then, hopefully,
> >> work backwards to come up with some objective criteria by which
> >> they could each be "scored" (even if that was done using bogus
> >> rating units).
> >>
> >> "Intelligibility" is, of course, the prime issue. "Listenability"
> >> coming into play for any prolonged use. Finally, "naturalness"
> >> when it comes to extended use.
> >>
> >> For example, the old Votrax units were intelligible -- once you
> >> learned their "accent". But, listenability was rather poor... you
> >> quickly developed ear fatigue. And, the idea of naturalness was
> >> never even considered!
> >>
> >> With gobs of resources (hardware, software, processing power), you
> >> can achieve quite acceptable results. This seems to be the approach
> >> most "modern" synthesizers -- and techniques -- adopt. The real problem
> >> lies with limited resources attempting to handle unconstrained input.
> >> (If you know what you're going to be asked to speak, it's really easy to
> >> come up with a good presentation!)
> >>
> >> Limiting the user's exposure to the synthetic voice can reduce ear
> >> fatigue.
> >> So, dealing with it for 10 minutes might be tolerable while 2 hours
> >> would be torture.
> >>
> >> But, having to face the prospect of completely unconstrained input can
> >> tax even that brief usage. "Dr. Jones' car -- bearing the license plate
> >> FTDKTR -- has been parked in front of his house on Jones Dr. since
> >> 12:34A
> >> this morning when his Polish butler finished polishing it." Imagine you
> >> have no other way of inspecting the input text...
> >>
> >> So, what makes a synthesizer "tolerable" or "intolerable"? What is the
> >> "threshold of pain" when it comes to tolerating an underperforming
> >> synthesizer?
> >> _______________________________________________
> >> Dectalk mailing list
> >> Dectalk at bluegrasspals.com
> >> http://bluegrasspals.com/mailman/listinfo/dectalk
> >>