[DECtalk] DECtalk TTS licensing

Don Text_to_Speech at GMX.com
Tue Aug 31 20:52:13 EDT 2021


On 8/31/2021 1:38 PM, Karen Lewellen wrote:
> On Mon, 30 Aug 2021, Don wrote:
>> You could have used a Votrax in the late 70's.  Or, DECtalk in the
>> early 80's.  But, both would have been *hardware* synthesizers...
>> boxes that sat next to your computer.
> ..and this is a problem why exactly?

Did you notice the dates I referenced?

What software were you running in the early 80's on your PC?  70's?
What sort of PC did you actually *have*?

The KRM ran on a MINIcomputer with genuine CORE memory -- little
tiny magnetic donuts with hair-fine wires threaded through them
as if they were fabric.

Why?  Because PCs didn't exist.  And the PCs that were first
released were dog slow.  This, when the speech was synthesized
in hardware (a Votrax board set).

> granted, I am not a Linux user, for many reasons, one of which is because no
> driver exists for the dectalk hardware I am using with my machine right now.
> I have an associate here in Toronto who builds dectalk USB boxes in his
> basement, for about $50, and he is a Linux person.
> Please do not confuse what may not have been tried by yourself personally as
> impossible for others.

Excuse me, but I'm a degreed EE.  I *design* the types of kit we're
talking about.  Chances are, I've tried and *done* more things
than you can imagine.

I worked for Kurzweil in the late 70's with the initial Reading Machine.
I've tinkered with various hardware "turn key" synthesizers since then,
as well as built hardware synthesizers from chipsets (Digitalker,
Artic/SSI, General Instrument, etc.).  I *own* a reading machine.

I have several hardware synthesizers -- including a DTC-001 and
a DECtalk Express (along with Votrax, Infovox, etc.).  You couldn't
install a software synthesizer in an early PC.  Nor could you
*before* the PC was created.  But, *I* could use these devices
which was likely impossible for MOST "others".

And, I've written 5 software synthesizers.

>> Getting text into them would have been the problem.  But, then again,
>> all that was available at that time were CP/M machines and early PCs.
> I am not sure I follow this at all.
> I began using speech in 1988, when I got my first computer and synthesizer,

So, that's 10 years later than the date I mentioned.  Please read
more carefully.

*I* owned a "personal computer" before IBM made the first PC.

I owned my first UNIX computer before Linux was even *conceived*.

What speech synthesizer did you operate in 1980?  What was it
connected to?  What SOFTWARE drove it?

> which was an Internal card.  The technology had been around long before I got
> mine, so much so that Telesensory systems <spelling>  had representatives
> around the country, who came to your house and trained you to use their screen
> reader programs.

Yes, and they had the Opticon to let you "feel" what the text looked like.
While Kurzweil had a complete reading machine.  Another company that was
too little, too late.

> I know dectalk internal cards existed in the 90s, although I did not start
> using  any tool of theirs until  mid decade.
> So, what any of what you are claiming has to do with the reality of computing
> escapes me.

I don't see early 1980's or 1970's mentioned in any of your comments,
above.   "Computing" today (or in 1990) is considerably different than
it was in the 1970-1980 timeframe.  Folks who weren't "in" the industry
likely had no access to "capable" computers and certainly were at the mercy
of others as to the software that was available for their use ON those
computers.

You're at least a decade later.

> even IBM  had a talking structure of sorts at that time, no windows required.
>
>> I'm not sure you realize just how many choices have already been
>> made for you!  And, how intimidated you would be if they had
>> been available for you to muck with.
> Are you kidding?  One of the things I can personally say as someone using
> computers, with the same operating system, since 1988, is the last thing I
> desire is someone who does not know my needs making decisions for me.

Then you are truly clueless.  Or, think that software is trivial, internally.
There are myriad decisions that you aren't even aware of, inside the
software that you think of as "turnkey".  You think because you can adjust
a few (even a HUNDRED) parameters that you have control over the software.
There are a few orders of magnitude more decisions that have been hidden
from you.

And, you should be glad for this as you wouldn't want to spend the man-hours
to try to understand each of them and their significance with respect to
your intended use.

Have a look at about:config in Firefox.  Those are just the EXPOSED settings.
I count about 3,700 of them.  How many have you touched?  Did you even know
they were there?  Would you like to hazard a guess as to what the
"browser.newtabpage.activity-stream.asrouter.providers.cfr" setting does?
Or, how altering it might affect your browsing experience?

Try to guess how many you actually understand and reason why they might want
to be changed.  Better yet, go through and randomly change half of them.
Then, wonder why your browser doesn't work as expected.  *Then* sort out
which ones you have to change *back*.

Obviously "Nikita" feels there is far more involved INSIDE a synthesizer
else it wouldn't be such an intimidating task to undertake the design of one.
Yet, you feel qualified to make all of the decisions that Klatt made *for*
you when he designed it?

Please tell me what *rule* I should adopt for pronouncing long strings of
digits:  one at a time, groups of two, three, treat the string as a
cardinal and clutter the presentation with "units" (834 trillion, 32 billion,
596 million, 8 thousand, two hundred and 15.)

Should that last bit be "and 15"?  Or, should the "and" be elided?
Should there be a setting for you to decide which YOU prefer?

Should 4 digit numbers in the range of 1500 to 2500 be spoken as
if years (nineteen hundred forty five)?  Should that, instead,
be "nineteen forty five"?  Or, "nineteen hundred AND forty five"?
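
To make those choices concrete, here is a minimal Python sketch.  It is
not taken from DECtalk or any real synthesizer; the function names and
the "style" labels are hypothetical, chosen only to show that each
question above becomes a separate switch somebody has to set.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def under_100(n):
    """Words for 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]


def under_1000(n, use_and):
    """Words for 0..999; use_and selects 'two hundred AND fifteen'."""
    hundreds, rest = divmod(n, 100)
    if hundreds == 0:
        return under_100(rest)
    words = ONES[hundreds] + " hundred"
    if rest:
        words += (" and " if use_and else " ") + under_100(rest)
    return words


def year_words(n, style):
    """Read a 4-digit number as a year.

    style is one of 'pairs', 'hundred' or 'hundred_and' (made-up labels).
    """
    century, rest = divmod(n, 100)
    if rest == 0:
        return under_100(century) + " hundred"
    if style == "pairs":
        # 1905 -> "nineteen oh five"; saying "oh" is itself another choice.
        tail = under_100(rest) if rest >= 10 else "oh " + ONES[rest]
        return under_100(century) + " " + tail
    joiner = " hundred and " if style == "hundred_and" else " hundred "
    return under_100(century) + joiner + under_100(rest)


print(under_1000(215, use_and=True))
print(under_1000(215, use_and=False))
for style in ("pairs", "hundred", "hundred_and"):
    print(year_words(1945, style))
```

Even this toy needs two switches (use_and, style) and still says nothing
about WHEN a 4-digit number should be treated as a year at all.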

Should "colour" be recognized as "color"?  And, "aluminium"
pronounced with that extra syllable while "aluminum" avoids it?

Which abbreviations should be automatically expanded?
Mr/Mrs/Dr/Messrs/Mme/etc.?  Ct/Rd/Dr/Ln/Av/Blvd/Tpk/Hwy/St?
LOL/WTF/wrt/AFAICT/IMHO/etc.?

Should you be able to augment the abbreviation expansion table?
How will you specify the context for each type of expansion
(Dr = Doctor or Drive, depending on context)?
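
As a rough illustration of "depending on context", here is a small
Python sketch.  The table and the one-token lookahead rule are
assumptions made up for this example, not how any particular
synthesizer does it: a capitalized word right after the abbreviation is
guessed to be a name.

```python
# Hypothetical table: abbreviation -> (reading before a name, reading after one)
AMBIGUOUS = {"Dr": ("Doctor", "Drive"), "St": ("Saint", "Street")}


def expand(text):
    """Expand ambiguous abbreviations using only the following token as context."""
    tokens = text.split()
    out = []
    for i, tok in enumerate(tokens):
        core = tok.rstrip(".,;")
        if core not in AMBIGUOUS:
            out.append(tok)
            continue
        following = tokens[i + 1] if i + 1 < len(tokens) else ""
        before_name, after_name = AMBIGUOUS[core]
        # Guess: a capitalized word right after the abbreviation is a name.
        word = before_name if following[:1].isupper() else after_name
        # Drop the abbreviation's period but keep other trailing punctuation.
        trailing = tok[len(core):].replace(".", "", 1)
        out.append(word + trailing)
    return " ".join(out)


print(expand("Dr. Smith lives on Elm Dr."))
print(expand("Elm Dr. Next door"))
```

The first line comes out as intended.  The second turns into "Elm Doctor
Next door", because the word after "Dr." happens to start a new sentence.
One simple rule is already wrong, and every extra rule is another
decision someone has to make.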

Should "No" be north?  Norway?  not yes?  Nobelium?
And "Ca" could be California?  Canada?  circa?
while "Il" is Israel?  Illinois?  a misspelling of "ill"?

If even these *few* decisions were exposed to a user, their
eyes would glaze over.

There are ~50 parameters that I have to specify for each
*sound* my formant-based synthesizer utters.  Not only
do I have to put real numbers on each of them (and
they vary based on the *voice* I am building), but I
also have to indicate how they should change *during*
a sound/phoneme.  Would you like to be able to tweak my
choices?
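
For a feel of what "change *during* a sound" means, here is a heavily
simplified Python sketch.  It assumes just two parameters (F1 and F2,
the first two formant frequencies) with made-up start and end targets
and plain linear interpolation; a real formant synthesizer has far more
parameters and need not move them linearly.

```python
def interpolate_track(start, end, frames):
    """Linearly interpolate one parameter from start to end over `frames` frames."""
    if frames == 1:
        return [start]
    step = (end - start) / (frames - 1)
    return [start + step * i for i in range(frames)]


# Hypothetical (start, end) targets in Hz for two parameters of one sound.
phoneme_targets = {"F1": (300.0, 700.0), "F2": (2200.0, 1200.0)}

# One value per parameter per frame; here the sound lasts 5 frames.
tracks = {name: interpolate_track(a, b, 5)
          for name, (a, b) in phoneme_targets.items()}

for name, values in tracks.items():
    print(name, values)
```

Each target pair, the frame count and the shape of the transition is a
separate number to pick, per sound and per voice.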

I have to alter the prosody of each breath group to
make it sound more natural/less monotonic/robotic.
Would you like to be able to tweak the rules that
I use when doing this?

You see a synthesizer as a black box with very FEW
"adjustments" -- pick a voice and speech rate; done.
Similar to how most people see an automobile -- press
the accelerator or brake; done.  You must be surprised
when the shop says it will be a couple of days before
your repairs are complete (for such a "simple" device)!

> What is intelligently done, in every screen reading program I have used
> regularly is  a bit of consistency.

We're not talking about screen readers.  Weren't you the one who decried
folks treating TTS and screen readers as equivalent?  Nikita is interested
in a speech synthesizer.  No details of this potential screen reader have
been released -- other than generalities.

> There may be config files that the screen reader program developer feels may be
> useful.  However there are also choices as to if you need load them, ways to
> create our own, and best of all a detailed manual, both on board and in
> external form that guides you to the process.
> There are many disappointing things about Linux, but one of them is the lack
> of  consistency.

Because the Linux ecosystem is built from software authored by many different
people even before Linux came into existence.  If you want consistency,
you run Apple (no, not Windows).

What Linux (and the many other "free" OS's) offers is a "price" of $0.
And, for the technically adept, control over the inner workings of the
system (*Linux* is just the OS -- none of the programs that one would
need to make it DO anything useful!  Like Windows *without* Solitaire,
Notepad, Internet Explorer, Media Player, etc.).

I run NetBSD (another free OS) precisely for this ability to "fix"
things that aren't working without having to wait for someone
(like Microsoft or Apple) to decide that my issue is important
enough to merit fixing.

> Still, speaking personally, Linux seems to me to be a developers operating
> system, not an end users one.

Agreed.  I don't run *any* Linux workstations -- though I've run NetBSD
since 1993 (but only to develop software and as network appliances).

One of the reasons Linux is user-unfriendly is because the folks
developing it take the attitude that the user should be able to
ADJUST EVERYTHING.  As a result, they can't adjust ANYTHING...
because too much complexity and choice is exposed.

My TV has over 100 "settings" -- there are a dozen just for picking
HOW closed captions are displayed (and other settings to decide
how they are decoded from the video stream).  I can decide how red
the reds should be, how blue the blues, how black the blacks, etc.
I can decide how audio is routed, whether I've an antenna or cable,
what time zone I'm in, etc.  And, a 66 page manual that I can view
on-screen.

As an ENGINEER -- i.e., someone who might enjoy tinkering with things
like that -- I've only really used the "scan for channels" setting.

Wanna bet most folks haven't gone beyond that point?  And, only got
to that point when the TV automatically ran the "setup wizard"
when it first detected power?!

>> How large is the speaker's *head*?  How many formants?  What
>> frequencies, bandwidths and gains for each?  How do they
>> change, over time, for each "phoneme"?
> Speaking personally, that I do not have  such choices is precicisily why, that
> and there are few consistencies, quality consistencies in how Linux make these
> decisions are why I
> am likely never going to be a Linux user.

You'd have a greater chance of making those TYPES of decisions in Linux
than in Windows.

But, even if you could, you likely wouldn't understand how to make
intelligent decisions for those types of parameters.  Just like if
I exposed the ignition timing on your vehicle for you to tweak.

> And pronunciations varied, even with DOS screen readers...certainly with tts
> tools.
> A simple example, I have a friend who uses her Kindle to read fanfiction, and
> TTS..which cannot even say the names of characters  properly.
> My dectalk  and my computer gets it correct.
>
>> How long a pause between words?  For each comma encountered?  Period?
>> Other punctuation?
> That is decided by the content, not the developer.

No.  When you type a sentence, do you insert a number after
each punctuation mark to tell a *potential* synthesizer how
long the associated pause should be?  Silly me, I've forgotten
to type ANY such numbers after MY punctuation!
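
In other words, the durations come from a table inside the synthesizer,
not from the text.  A minimal Python sketch of such a table follows; the
millisecond values and names are invented for illustration and are not
DECtalk's.

```python
# Hypothetical pause lengths in milliseconds.  Somebody had to pick these.
PAUSE_MS = {",": 150, ";": 250, ":": 250, ".": 400, "?": 400, "!": 400}
WORD_GAP_MS = 30  # assumed gap after a word with no punctuation


def with_pauses(text):
    """Return (word, pause_ms) pairs; the pause depends on trailing punctuation."""
    result = []
    for tok in text.split():
        word = tok.rstrip(",;:.?!")
        mark = tok[len(word):][:1]  # first trailing punctuation mark, if any
        result.append((word, PAUSE_MS.get(mark, WORD_GAP_MS)))
    return result


print(with_pauses("No.  When you type, stop."))
```

The text supplies only the punctuation marks; every number in the output
comes from the table.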

>> How do I pronounce 1234?  1,234?  2021?  9/1/2021?
>>
> A quality screen reader leaves that to the end user, because different individual
>   life situations impact how one desires numbers be announced, and dates.

I think you seriously underestimate the complexity of "normalizing"
text.

Please summarize *your* rules for how "strings of digits" should be pronounced;
obviously, you MUST have developed some rules because you are an end user,
right?

Here are some examples that you are likely to encounter:

Phone numbers:   555-1212
                  555 1212
                  (800) 555-1212
                  555-1212 x3-1234
                  020 3352 2100
                  +44 020 3352 2100
                  +61 2 8075 8800
Dates:           09/01/2021
                  9/01/2021
                  01/9/2021
                  09-01-21
                  01 Sept 2021
                  20210901
Cardinals:       102
                  1255
                  1945
                  2099 (twenty ninety nine? two zero nine nine?)
                  3100 (thirty one hundred?  three thousand one hundred?)
                  1,945 ("one thousand nine hundred and forty five" or skip "and"?)
                  1234567890 ("one two, three four, five six, seven eight, nine"?)
                  1,234,567,890
Ordinals:        1st, 2nd, 3rd, 4th, 27th, 105th?
Decimals:        9.05 ("nine point oh five" or "nine point zero five"?)
                  9,05
                  3.125 (is that "three and an eighth"?)
Fractions:       1/2, 3/4, 9/11, 42/100, 22/7
Mixed numbers:   3-1/3, 1-1/2
MAC addresses:   01:23:45:67:89:00
                  C3:d0:56:Ff:00:92
IP addresses:    10.1.23.222
                  10.0.23.0/24

And how are you going to tell me which rules apply to which numeric
presentations?  Will you recognize a US telephone number AND a UK
phone number?  How big does a cardinal number (comma delimited) have
to get before you stop trying to group it into labeled triads?
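
Here is roughly what the FIRST step of that looks like: a Python sketch
that only tries to decide which kind of thing a token is, before any
pronunciation rule is applied.  The labels, the regular expressions and
their ordering are all assumptions made for this example.

```python
import re

# Ordered (label, pattern) pairs; the first full match wins.
PATTERNS = [
    ("mac address", r"[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}"),
    ("ip address", r"\d{1,3}(\.\d{1,3}){3}(/\d{1,2})?"),
    ("date", r"\d{1,2}/\d{1,2}/\d{4}"),
    ("us phone", r"(\(\d{3}\) )?\d{3}-\d{4}"),
    ("ordinal", r"\d+(st|nd|rd|th)"),
    ("grouped cardinal", r"\d{1,3}(,\d{3})+"),
    ("decimal", r"\d+\.\d+"),
    ("fraction", r"\d+/\d+"),
    ("digits", r"\d+"),
]


def classify(token):
    """Return the label of the first pattern that matches the whole token."""
    for label, pattern in PATTERNS:
        if re.fullmatch(pattern, token):
            return label
    return "unknown"


for sample in ["(800) 555-1212", "09/01/2021", "1,945", "9,05", "9/11",
               "20210901", "020 3352 2100", "C3:d0:56:Ff:00:92",
               "10.0.23.0/24"]:
    print(f"{sample!r:22} -> {classify(sample)}")
```

Note what it gets wrong or cannot know: "9/11" is called a fraction,
"20210901" is just digits, the UK phone number and "9,05" are unknown,
and "09/01/2021" is a date without saying whether it is September 1 or
January 9.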

If I told you the first 6 digits of a number were "105 novemdecillion,
892 octodecillion..." could you, as a listener, sort out how
many digits are present in the ENTIRE number?  (so, if you ever resort
to using units like millions and billions, when do you give up on
that approach?  and, what do you do in its stead??)
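
One possible answer, sketched in Python.  The cutoff (stop at trillions)
and the fallback (read the digits one at a time) are arbitrary choices
made for this example, which is rather the point.

```python
SCALES = ["", "thousand", "million", "billion", "trillion"]  # short scale


def read_grouped(digits, max_groups=len(SCALES)):
    """Label triads with scale names, or give up and read digit by digit."""
    original = digits
    groups = []
    while digits:
        groups.insert(0, digits[-3:])
        digits = digits[:-3]
    if len(groups) > max_groups:
        return " ".join(original)  # fallback: one digit at a time
    parts = []
    for i, group in enumerate(groups):
        value = int(group)
        if value == 0:
            continue  # silently skipping empty triads is yet another choice
        scale = SCALES[len(groups) - 1 - i]
        parts.append(f"{value} {scale}".strip())
    return ", ".join(parts) or "0"


print(read_grouped("834032596008215"))
print(read_grouped("1234567890"))
print(read_grouped("1234567890123456"))
```

The first two come out with labeled triads; the 16-digit string exceeds
the cutoff and is read as sixteen separate digits.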

> You do not make that decision for the user if building a quality product, there
> may be a default, but that default can be changed.
> Linux likely does not trust its end users, because, again speaking personally,
> Linux is  for programmers who may build things for people, not for  individuals.

You've completely missed the mark.  You can adjust all sorts of things
in a Linux system precisely because there are geeks behind the scenes.

I mentioned my TV has 100+ adjustments.  I have a little box sitting
next to it that runs Kodi -- a Linux-based multimedia system.  Note that
it has no access to the types of adjustments that are INTERNAL to my TV
(which is sensible as it has to work with ANY TV -- or video monitor!).
It likely has *1000* settings as it has hundreds of "plug-ins" that
each have their own settings.

Because of this, it is impractical to use.  We rely on an off-the-shelf
"appliance" that has very few settings to provide that functionality.
(The Kodi box is for *my* use -- as an engineer).

>>>  of
>>>  dictionaries made it an `extremely wonderful experience. DecTalk came with
>>>  my
>>>  first pc in 1994. I listen to it more hours each day than any1 including
>>>  my
>>>  Wife, so it better be enjoyable. The thing about choices, its your choice
>>>  to
>>>  make them or accept the defalts.
>
> And everyone should, regardless of system, have that flexibility..I am thankful
> every single day, several hours a day, that  I still have those dectalk rich
> vibrant quality choices, even though I was never a vocal eyes user.  having a
> solid consistent computer floor a screen reader that reliably gives you, and
> only you, what you need, providing the ability for you to choose what that
> means, so you know when there is a problem, and when there is not?
>
> Mercy if I had a dollar for every time someone unaware of how good adaptive
> technology should function, tell me the problem is my screen reader when it was
> not, I would be Oprah Winfrey.

Mercy if I had a dollar for every time someone unaware of how products are
designed tells me how they SHOULD be designed when they are completely
clueless as to what's actually involved, I would be Oprah Winfrey.


