[DECtalk] Intelligibility/Listenability criteria

Jayson Smith jaybird at bluegrasspals.com
Sun Jul 21 03:10:57 EDT 2019


Hi,

I personally prefer formant-based speech synthesizers to systems that
use pre-recorded human speech. I'm really sorry that formant speech
synthesis seems to have fallen by the wayside in recent decades, and
that the focus now seems to be on making things sound as human as
possible. Don't get me wrong, there are some great natural voices out
there, Alex on macOS and iOS being one example, as well as Amazon Alexa.
But it seems that no matter how hard the developers try, there's always
going to be some odd case now and then where they can't quite match up
the bits of recorded speech just right to make it sound completely natural.

My favorite TTS systems are DECtalk and Eloquence. I absolutely cannot
stand eSpeak; there's just something about its sound that gets on my nerves!

Hope this helps,

Jayson

On 7/21/2019 12:53 AM, Don wrote:
> Hi,
>
> Perhaps a bit off-topic for this list... if so, my apologies.
>
> I'm looking for opinions as to how one evaluates the "effectiveness"
> of a particular synthesizer.  Said another way, how one decides that
> synthesizer A is "better" than synthesizer B.  Ideally, criteria that
> would allow you to rank a set of them!
>
> I've been auditioning various synthesis devices and techniques
> to try to come to my own conclusions on this.  Then, hopefully,
> work backwards to come up with some objective criteria by which
> they could each be "scored" (even if that was done using bogus
> rating units).
>
> "Intelligibility" is, of course, the prime issue.  "Listenability"
> coming into play for any prolonged use.  Finally, "naturalness"
> when it comes to extended use.
>
> For example, the old Votrax units were intelligible -- once you
> learned their "accent".  But, listenability was rather poor... you
> quickly developed ear fatigue.  And, the idea of naturalness was
> never even considered!
>
> With gobs of resources (hardware, software, processing power), you
> can achieve quite acceptable results.  This seems to be the approach
> most "modern" synthesizers -- and techniques -- adopt.  The real problem
> lies with limited resources attempting to handle unconstrained input.
> (If you know what you're going to be asked to speak, it's really easy to
> come up with a good presentation!)
>
> Limiting the user's exposure to the synthetic voice can reduce ear
> fatigue.  So, dealing with it for 10 minutes might be tolerable while
> 2 hours would be torture.
>
> But, having to face the prospect of completely unconstrained input can
> tax even that brief usage.  "Dr. Jones' car -- bearing the license plate
> FTDKTR -- has been parked in front of his house on Jones Dr. since 12:34A
> this morning when his Polish butler finished polishing it." Imagine you
> have no other way of inspecting the input text...
>
> So, what makes a synthesizer "tolerable" or "intolerable"?  What is the
> "threshold of pain" when it comes to tolerating an underperforming
> synthesizer?


