[DECtalk] Some DECtalk history and what I think we can and can't reasonably do

ebruckert Bruckert edbruckert at gmail.com
Wed Aug 3 11:17:15 EDT 2011


   First of all, let me make you aware that I use DragonDictate, as I can't
see very well and proofreading is quite painful, so you'll have to forgive
and interpret around any mistakes DragonDictate may make.
   I was taught about formant speech synthesis by Dennis Klatt, and by
reading, but before my involvement with him I knew next to nothing. One of
the questions in the early days was whether you could achieve higher
intelligibility by super-articulation and do better than natural speech.
What testing revealed was really two things. At normal speaking rates the
answer always seemed to be that the closer you matched real speech, the
better the intelligibility. At higher speaking rates, above what humans
could normally achieve, things were a little different, and I'm not going to
go into the specifics of what we did to make things better at high speed
other than to say they were based on knowledge of speech perception.
     The second thing we learned is that listening to a synthesizer has a
very fast but steep learning curve, somewhat analogous to learning to
understand a person with a strong dialect or speech impediment. One of the
problems we encountered is that people often preferred the version they were
used to over any succeeding version, but actual tests did not support the
preference.
     One example is the way tilt was done inside DECtalk. The original
mechanism was a crude approximation of spectral tilt. Before he died, Dennis
developed a much more accurate (meaning closer to human production) tilt
filter that could not be incorporated until a later date. As a point of
interest, Dennis was so dedicated that he last modified the DECtalk code 3
days before he passed away. So the spectral tilt was changed, and this
changed what you might think of as the tone control on an old radio or
record player. That is just one of many reasons why DECtalk changed slightly
over the years.
      The 5.0 DECtalk incorporated the work of Prof. Ken Stevens, who was
Dennis's boss at MIT and close friend. The 5.0 code unfortunately did not
yield the expected results, but we did learn a lot from the attempt.
       There are even some changes to DECtalk that alter the way it sounds
relative to any particular version, such as intonation, that I am unwilling
to revert because I know for a fact that the old behavior caused a loss of
information. So my goal is very simple: I am working to create a very
functional, intelligible DECtalk to put back out. I am unwilling to try to
make it sound exactly the way any given person wants it to. I have been
through this before. The ear is very sensitive, and if you are directly
comparing two versions side by side, you are not testing anything but
whether they sound the same, and that is an exercise in futility.

Any specific issues I can address. Secondly, a word of warning to listeners
providing feedback. The other thing we've learned is that listeners are
excellent at deciding that something is not right, but are absolutely
terrible at pinpointing exactly what the problem is. The reason for this is
quite simple: people judge the output as speech, which it only kind of is.
By this I mean that a synthesizer can make mistakes that humans cannot
possibly make and that listeners, as a consequence, can't possibly
recognize. An example of this is that after so many years of working with
it I have learned to hear a formant that's moving too rapidly, but most
people cannot hear it. This is because
to make life easy we try to ignore stuff that's not important in our
language, such as the nasalized vowels in French or the retroflex R's in
American English, which some non-native speakers have a heckuva time hearing.