[DECtalk] DECtalk's development, and old speech synthesizer recordings

Fri Nov 4 12:24:56 EDT 2022

Don:
Thank you for the interesting comments, and the recommendation of "How
Klattalk became DECtalk: An Academic's Experiences in the Business World".
It sounds like interesting literature, which I would certainly like to
read.
As stated, I haven't read that literature yet, but as I understood it,
Perfect Paul was supposed to be a model of Klatt's own voice, but yes, it
could make sense that it was a combination of research including several
source voices.

Regarding your comments about implementation, are you saying that the method
of implementation had a great influence on the sound? I have heard audio
samples of the DECtalk DTC-01, both from the real hardware and from a MAME
emulator, and they sound more or less the same to me. I use a DECtalk
Express daily, and its "voice" sounds very similar to the early software
DECtalk versions, such as 4.3. The DECtalk Express is running version 4.2CD.
Because of this, I found it interesting that the recordings from 1986 sound
more like the DECtalk Express, or DECtalk 4.3, than the DECtalk DTC-01 from
1984, running either version 1.8 or 2.0. How much of the implementation
could have changed in those two years?

-----Original Message-----
From: Dectalk [mailto:dectalk-bounces at bluegrasspals.com] On Behalf Of Don
Sent: 4 November 2022 12:26
To: dectalk at bluegrasspals.com
Subject: Re: [DECtalk] DECtalk's development, and old speech synthesizer
recordings

> 2: What did Dennis Klatt's own voice sound like, and how much was he
> involved with DEC's development of DECtalk, after they licensed his work?

For a description (in DK's own words) of the transition process, see:
"How Klattalk became DECtalk: An Academic's Experiences in the Business
World"

> I haven't been able to find any recordings of Klatt's original voice on the
> internet, so I think this is really interesting. Personally, I find it most
> likely that his voice sounded more like the recordings in his collection,
> because they, to me, sound more human and natural than DECtalk 1.8 and 2.0.

Saying that a particular DECtalk voice was DK's is misleading.
If you read the literature, you will see reference to various
"speakers" whose voices were analyzed to determine the characteristics
of "speech".  The initials "DK" figure prominently in these lists
of subjects.

This makes sense.  Everyone -- even the author of a research paper -- has
a voice.  If doing research and needing sample voices to analyze, what
more convenient "test subject" than the author himself?

So, one can say that the *parameters* of DK's voice factored heavily
into the default parameters used for DECtalk voices.  And, thus, those voices
bore a resemblance to his speaking STYLE.  Whether that was intentional, on
his part, or just an obvious consequence of his choice of test subjects
is unknown.

OTOH, a voice created from diphone synthesis *is* the voice of the subject
who recorded the speech samples from which the diphones were extracted (as
"audio recordings").  So, had he opted to implement the synthesis
portion with diphones (leaving all of the rest of the synthesizer as is),
then it truly *would* be DK's voice that you were hearing.

> I believe I read somewhere that the Doctor Dennis voice was supposed to
> represent Klatt's voice in later years, due to his illness breaking his
> voice, but then, it seems very strange that the later DECtalk versions would
> make the voice sound much better than the DECtalk 2.0 version, because I
> can't imagine his voice got that much better before his death in 1988.

I think trying to draw conclusions, retroactively, from a set of sound
samples is fraught with potential misunderstandings.

Keep in mind that, what *you* know as DECtalk, today, differs significantly
from Klattalk/MITalk in terms of implementation and performance.

Klatt was an academic.  AFAICT, he had never designed a product.
Klattalk/MITalk was an "intellectual exercise" but only existed
"in the lab".  DECtalk (the "hardware" synthesizer) was the first
attempt at reifying it into a commercially viable product.

It cost several thousand dollars, at that time!  ($4K comes to mind.)

But, when computers were still confined to universities and
businesses (recall, the IBM PC didn't come around until 1981),
a $4000 peripheral was par for the course -- a disk drive was
the size of a washing machine, a computer the size of a refrigerator,
etc.

The original MITalk ran on a PDP-11 minicomputer with special audio
output hardware.  Klatt didn't have to worry about accessing mass
storage -- the operating system took care of that.  He didn't have to
worry about "doing two things at once" -- the operating system took
care of that.  And, he didn't have to worry about CREATING an operating
system -- which would be required for a stand-alone device to
mimic his "minicomputer solution"!

The port to the 68000 microprocessor had to replicate his design
in a way that was feasible to implement on the (newly marketed)
microprocessor.  It had to include the necessary memory to store
his program, the parameters that drove it as well as the memory
for the input text and output waveforms.  AFAICT, there was no real
operating system in place; the program had free rein over the
hardware.

The original implementors obviously were more concerned about making
it speak than about making it do so inexpensively -- they added a
second processor (a Digital Signal Processor) to do the waveform
synthesis.  This added a lot of cost and complexity but must have
been the easiest way forward, for them, without reengineering
his solution.  (A 68000 has enough horsepower to do all of the
work without additional augmentation -- but, you have to
engineer the solution with that in mind!)

The DECtalk that you likely use (PC-based) returns the implementation
to the original environment -- on a "host computer" (instead of
on a dedicated computer).  Memory takes the form of files on a
disk, not individual "chips" (each with a co$t).

Each time the implementation is "touched", it likely incurs changes.
Some well-intended.  Some "necessary" for the "port".  Some just
hopeful improvements.

For example, MITalk was designed NOT to rely on a "pronouncing dictionary"
for anything but "a few" exceptional words.  Yet, the PC-based DECtalk
includes such a dictionary.  How often do you think "aardvark" comes up
in text?  How crucial is it for it to hold a spot in that dictionary??
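
To make the trade-off concrete, here is a minimal sketch of the "small
exceptions list plus rules" approach. It is not MITalk's or DECtalk's actual
code: the dictionary entries, the one-letter-per-sound rules, and the phoneme
symbols are all hypothetical and far cruder than any real letter-to-sound
rule set.

```python
# Hypothetical illustration only: a tiny exceptions dictionary backed by
# naive letter-to-sound rules. Real rule sets are context-sensitive.

# Words whose pronunciation the rules would get wrong (assumed examples).
EXCEPTIONS = {
    "of": ["AH", "V"],
    "one": ["W", "AH", "N"],
}

# One made-up phoneme symbol per letter; a stand-in for real rules.
LETTER_RULES = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F",
    "g": "G", "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L",
    "m": "M", "n": "N", "o": "AA", "p": "P", "q": "K", "r": "R",
    "s": "S", "t": "T", "u": "AH", "v": "V", "w": "W", "x": "K",
    "y": "Y", "z": "Z",
}


def to_phonemes(word):
    """Return a phoneme list: exceptions first, rules as the fallback."""
    word = word.lower()
    if word in EXCEPTIONS:
        # Dictionary hit: costs memory per entry, but is always "right".
        return list(EXCEPTIONS[word])
    # No entry: derive the pronunciation from the spelling.
    return [LETTER_RULES[ch] for ch in word if ch in LETTER_RULES]


print(to_phonemes("one"))
print(to_phonemes("cat"))
```

Every word moved into the dictionary is a word the rules no longer have to
handle, but it is also storage that a stand-alone device had to pay for.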

There are lots of "degrees of freedom" in the implementation.
Lots of places where a developer has a choice as to how to do something.
Each choice has ramifications.  You might not be able to foresee
the consequences of these in order to avoid "unfortunate ones".

Klatt was obviously well versed in the theory of what he was trying
to do, even if his implementation skills were dubious.  The DEC folks
were likely more skilled at the (hardware) implementation and lacking
in all of the theory.

[Speech was a hot topic in the 70's.  There was a LOT of research on which
Klatt drew.  His thesis adviser -- Allen -- had tried to tackle the same
problem when *he* prepared his own thesis, a decade earlier!]

But, you can always move backwards and revisit something that *did*
work (or, worked "well enough").  Or, you can make a change that *subtly*
impacts the character of the output (would you be able to hear if
a particular phoneme's duration was extended by 5%?  Or, if the
pitch contour changed subtly?  Yet, you'd "sense" a difference...)
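
A minimal sketch of how small such a change is in numeric terms, assuming an
utterance is represented as a list of (phoneme, duration in milliseconds)
pairs. The representation, the function name, and the durations are
hypothetical and are not taken from any DECtalk version.

```python
# Hypothetical illustration: lengthen one phoneme by 5% and see how little
# the overall timing of the utterance moves.

def stretch_phoneme(segments, target, factor=1.05):
    """Return a copy of segments with the target phoneme's duration scaled."""
    return [(p, d * factor if p == target else d) for p, d in segments]


def total_duration(segments):
    """Sum of all segment durations, in milliseconds."""
    return sum(d for _, d in segments)


# Made-up durations for a short word.
hello = [("HH", 60.0), ("AH", 100.0), ("L", 70.0), ("OW", 180.0)]
changed = stretch_phoneme(hello, "AH")

print(total_duration(hello), total_duration(changed))
```

With these made-up numbers the whole word grows by only a few milliseconds,
which is the sort of difference a listener may "sense" without being able to
name.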
