[DECtalk] DECtalk TTS licensing

Devin Prater r.d.t.prater at gmail.com
Mon Aug 30 12:04:07 EDT 2021


Exactly! And yeah, Emacspeak has to be taught about all that, but shoot,
it's better than what *all* other screen readers do.
Devin Prater
r.d.t.prater at gmail.com
gemini://tilde.pink/~devinprater/



On Mon, Aug 30, 2021 at 9:18 AM Don <Text_to_Speech at gmx.com> wrote:

> On 8/30/2021 6:40 AM, Devin Prater wrote:
> > Man, this should be spread far and wide throughout TTS circles. Screen
> > readers should do this too. But we've barely gotten past simple API
> > readers.
>
> That's because synthesizers are treated as *bags* that are bolted onto
> (usually preexisting!) applications that weren't designed with speech
> in mind.  So, *all* of the "conversion" from the initial medium to
> speech has to be done *in* the synthesizer; it can't "ask" the application
> what the application is trying to do.
>
> If you were reading a novel (or, a news story on the web) and you
> encountered some double quotes, you would recognize this as a
> literal quotation of someone's statement/speaking.  For example:
>
>     The sheriff said "We believe we have apprehended the sole gunman
>     in this heinous crime".  He later added that the suspect was due to
>     be booked into the local jail, later that day.
>
> Why can't the synthesizer switch from "narrator voice" to "speaker's
> voice" when it encounters the first double quote?  And, then switch
> back when it encounters the closing double quote?  Wouldn't this
> make it easier to understand that you are now *in* a quoted statement?
>
> Ah, but what if someone FORGOT a closing quote?  Does the synthesizer
> stay in that voice, forever?
>
> What if there was never intended to be a closing quote?  Your
> password is 4HJ"/*Fred
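A minimal sketch of the voice-switching idea above, in Python. It assumes plain ASCII double quotes and treats a quote as speech only when a closing mark turns up within a length limit, so a forgotten or never-intended closing quote (the password case) stays in the narrator voice. The names `split_by_voice` and `max_quote_chars` are hypothetical, and the limit is only a heuristic, not how any particular synthesizer behaves.

```python
def split_by_voice(text, max_quote_chars=300):
    """Split text into (voice, chunk) pairs, voice being 'narrator' or 'speaker'."""
    segments = []
    narr_start = 0  # where the pending narrator text begins
    pos = 0         # where to look for the next opening quote
    while True:
        open_q = text.find('"', pos)
        if open_q == -1:
            break
        close_q = text.find('"', open_q + 1)
        if close_q == -1 or close_q - open_q - 1 > max_quote_chars:
            # No plausible closing quote: leave this mark in narrator text.
            pos = open_q + 1
            continue
        if open_q > narr_start:
            segments.append(("narrator", text[narr_start:open_q]))
        segments.append(("speaker", text[open_q + 1:close_q]))
        narr_start = pos = close_q + 1
    if narr_start < len(text):
        segments.append(("narrator", text[narr_start:]))
    return segments


if __name__ == "__main__":
    for voice, chunk in split_by_voice('The sheriff said "We have him". He left.'):
        print(voice, repr(chunk))
```

A front end could hand each chunk to the synthesizer with a different voice setting; how that setting is selected depends on the synthesizer and is not shown here.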
>
> Not only can choice of voice convey information, but it also gives
> your ears a break.  Ever listen to a newscast with ONE reporter
> reading all of the stories?  Contrast that with newscasts where two or
> more reporters/anchors alternate -- especially a male and a female.
>
> > Well we now have OCR and image recognition in some of the
> > screen readers, but TTS is still a sort of binary thing with
> > "is speaking" "is not speaking" at a user-defined rate and pitch
> > and volume and sometimes intonation. I'd love to see more screen
> > readers become more like Emacspeak.
>
> EMACSpeak tries to integrate the "presentation" (speech) with the
> application.  But, it has to be "taught" about every potential
> application, because applications don't emit information that
> conveys their (current!) intent.
>
> It is fairly common to design error and informational messages
> in the form:
>     <number> <explanation>
> Like:
>     404 page not found
> A computer (program!) that sees this message gets everything that it
> needs to know in those first 4 characters -- the text that follows is
> just there for humans (the computer ignores it!).
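To illustrate the `<number> <explanation>` convention described above, here is a small Python sketch of how a program might read such a message: it keeps the leading number and treats the rest as text for humans. The function name `parse_status` is hypothetical, and the sketch assumes a single space separates the two parts.

```python
def parse_status(message):
    """Return (code, explanation) from a '<number> <explanation>' message."""
    code, _, explanation = message.partition(" ")
    if not (code.isascii() and code.isdigit()):
        raise ValueError(f"message does not start with a number: {message!r}")
    # The program acts on the number; the explanation is only for people.
    return int(code), explanation


if __name__ == "__main__":
    code, explanation = parse_status("404 page not found")
    print(code)         # the part a program needs
    print(explanation)  # the part a person reads
```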
>
> So, folks *can* make interactions that convey additional information
> for a particular type of consumer (computer vs. human).  But, there
> is little pressure to do so.
>
> How often do you see information conveyed by color?  Choice of "font"?
> Bold?  Italic?
>
> Will the synthesizer recognize that something presented in BOLD should
> be treated differently than something in italic?  Or, do you throw
> away that information -- in which case, why was it present in the
> first place???
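One way to avoid throwing that information away, sketched in Python: carry the text's styling alongside the words and map each style to a change in delivery. Everything here is an assumption for illustration. The style names, the `STYLE_TO_PROSODY` table, its pitch and rate multipliers, and `plan_speech` are hypothetical and do not correspond to any real synthesizer's interface.

```python
# Hypothetical mapping from visual style to relative speech settings.
STYLE_TO_PROSODY = {
    "plain": {"pitch": 1.0, "rate": 1.0},
    "bold": {"pitch": 1.0, "rate": 0.85},    # slower, for emphasis
    "italic": {"pitch": 1.15, "rate": 1.0},  # slightly higher pitch
}


def plan_speech(spans):
    """Turn (style, text) spans into (text, prosody) pairs.

    Unknown styles fall back to the plain settings rather than being dropped.
    """
    plain = STYLE_TO_PROSODY["plain"]
    return [(text, STYLE_TO_PROSODY.get(style, plain)) for style, text in spans]


if __name__ == "__main__":
    spans = [("plain", "Do "), ("bold", "not"), ("plain", " press that.")]
    for text, prosody in plan_speech(spans):
        print(repr(text), prosody)
```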
>
> Why have red for stop and green for go -- if a noticeable share of the
> population (roughly 8% of men) can't reliably distinguish between them?