[DECtalk] More Gnuspeech demos added
Carlos Fernandez
cf530a at gmail.com
Fri Oct 23 12:06:33 EDT 2015
I believe the problems with understanding the speech are somewhat
related to the technical input, but not entirely. I'm sorry, but I don't
agree about the flawless pronunciation. In terms of badly mangling the
order of phonemes, you're right, the system manages to avoid that very
well. However, the system's intelligibility is harmed by several
problems in its processing:
1. The accent is all wrong. Words are often badly accented, which
sounds very unnatural even when the pronunciation is correct.
2. Bad phonemes for various letters make it very strange. Some are OK,
but several more are simply cringeworthy.
We will start with L and R, which unnecessarily add prior vowels. For
example, take the word "work". In English, it is of course pronounced
almost entirely without a vowel, as W, R, K, with the R elongated as is
typical with -er and, in some cases, -or, -ir, and -ur. If the system
were too literal, it might pronounce the O, making something like woark.
However, GNUSpeech generates a sound like wairk, which I'm assuming is a
representation of an -er phoneme that is being improperly rendered as
first -e and afterward -r.
After this, we have a problem that arises when a C is used with
its softer (similar to S) sound and, to a lesser extent, with the actual
S itself. It never sounds quite like a standard S and is lacking in
high-frequency noise. It thus falls somewhere between the S and the
TH (in English). It is much closer to the correct S, and maybe it
has something to do with the frequency of audio being used, but
when I am not focusing on the sounds individually, it sounds to me like
the voice has a slight lisp, which does not make it easier to understand.
3. The voice seems hesitant when beginning each new word, but
quickly builds up steam partway through it, such that, to me, the
word is begun slowly but babbled out quickly. This creates a jerky
effect that is a bit difficult to handle. I am very used to using
high-speed synthesizers, but they at least stay at one speed. Sometimes
the voice will continue at its previous speed if the words are in the
same sentence, but sometimes not.
You mentioned DECtalk, Eloquence, and eSpeak in the failure to pronounce
section, so I decided to try these on the same passage (I didn't make it
all the way through, of course, but got quite a ways in).
Eloquence mispronounced one word, and it was copyleft. As this is more
of a play on words than an actual dictionary term, I understand and
accept this as a less-seen word, especially in the 1990s.
eSpeak pronounced everything impeccably. I could not find a single
error. It even pronounced GNU the way I do, with the G enunciated. I did
not regard the silencing of the G for other synthesizers as an error, as
a word gnu exists with this silent letter. The sound quality may not be
everyone's cup of tea, but the pronunciation is clearly not lacking.
DECtalk had a few words that were not so much mispronounced as
misaccented. It was completely understandable throughout my section,
despite minor glitches that might make it slightly less desirable.
All three, in other words, could be listened to naturally and understood
completely, which I do not find true of GNUSpeech at this time.
On NeXT, it is true that OS X was mostly based on the NeXTSTEP operating
system, but it has been independent of that original codebase and
updated by Apple for sixteen years. Programs written for NeXTSTEP do
not compile and work on OS X; the operating systems are related but
very different. Therefore, when the page says that the NeXT
version is complete but the Linux one is not and gives information about
obtaining a computer on which to run the original 1990s versions of
NeXTSTEP, I do worry slightly about the logic behind this. This also
leads me to wonder where the 1990s code came from, as it doesn't
purport to be from NeXT but from some other company, and how (and indeed
whether) the project got the rights to use it. As OS X and Linux are my
most frequently used operating systems, I have downloaded the code and
will investigate further.
Here are some quotes about NeXTSTEP that prompted my questions. I have
added some bracketed notes inside these as well:
"gnuspeech is currently fully available as a NextSTEP 3.x version in the
SVN repository along with the Gnu/Linux/GNUStep version, which is
incomplete though functional."
"The original NeXT User and Developer Kits are complete, but do not run
under OS X or under GNUStep on GNU/Linux. They also suffer from the
limitations of a slow machine, so that shorter TRM lengths (< ~15 cm)
cannot be used in real time, though the software synthesis option allows
this restriction to be avoided."
"In fact, you can use these passwords [why are there passwords at all?
Maybe this is a NeXT thing?]. But you need a NeXT computer, of
course—try [a commercial company, linked here, that sells vintage NeXT
computers and copies of the software. They recommend the latest version,
3.3, in order to avoid Y2K bugs.] if you'd like one."
Carlos
On 10/23/2015 09:42, Tony Baechler via Dectalk wrote:
> For your amusement and interest, I've added two more mp3 Gnuspeech
> demos, including one of the female voice. As always, comments
> appreciated.
>
> http://classicradio.us/iso/
>