eSpeak - added punctuation and capitals indications


From: Jonathan Duddington
Subject: eSpeak - added punctuation and capitals indications
Date: Mon Sep 4 09:59:49 2006

In article <address@hidden>,
   Gary Cramblitt <address@hidden> wrote:

>  The main problem I noticed is that you do not provide a library or
> api for interfacing directly with espeak. Everything is done through
> command line and manipulation of configuration files....

> You are aware, I believe, that we are currently discussing a new TTS
> Engine API on address@hidden  Hynek and I are hoping
> that this new API will be approved soon, whereupon we will begin a
> major refactoring of Speech Dispatcher to use this api.  Therefore,
> it would be best if you could design an api for espeak that would be
> aligned with that specification.

Yes.  I intend to produce a version of eSpeak as a library, with direct
function calls from a Speech Dispatcher driver.  It would not include
the portaudio interface, but would return synthesized speech in memory
buffers, through a function callback.
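
As a rough illustration of the callback idea only (this is not the eSpeak library interface, which has not been specified yet), the sketch below uses a hypothetical `synthesize` function that hands fixed-size chunks of PCM data to a caller-supplied function instead of writing to an audio device or a file. The names, the chunk size, and the silent "audio" are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical callback type: receives one chunk of 16-bit PCM bytes.
AudioCallback = Callable[[bytes], None]


def synthesize(text: str, on_audio: AudioCallback, chunk_size: int = 4096) -> int:
    """Stand-in synthesizer (assumption): emits silence, 100 samples per character.

    A real engine would generate speech here; the point is only that audio
    is delivered in memory buffers through the callback.
    """
    pcm = b"\x00\x00" * (100 * len(text))  # 2 bytes per 16-bit sample
    for start in range(0, len(pcm), chunk_size):
        on_audio(pcm[start:start + chunk_size])
    return len(pcm)  # total number of bytes delivered


# The caller (e.g. a driver) collects the buffers as they arrive.
received = []
total = synthesize("hello", received.append, chunk_size=256)
```

The caller decides what to do with each buffer (queue it, play it, forward it over a socket), so the synthesizer itself needs no audio output code.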

I'm hoping to produce a proposed interface specification for the eSpeak
Library within a few days, which will take into account the latest
(28.4.06) Common TTS API draft. 

> Some of the major changes I would like to see in espeak are:

> 1.  Support for SSML.  I noticed that you now support embedded
> commands for controlling rate, pitch, volume, etc.  In order to use
> these, SD would have to parse the SSML itself and translate embedded
> SSML into your command syntax.  That would be inefficient and
> probably imperfect.  It would be better if espeak directly supported
> SSML.

The embedded commands (and the UTF-8 capability) are a preparation for
this.  The SSML tags need to be broken down into the lower level
commands to change pitch, rate, etc.  I had thought that Speech
Dispatcher needs to do this for synthesizers which don't process SSML
themselves, and that I might also be able to take advantage of S.D.'s
ability to do this :-)  But if you think it's better for eSpeak to process
SSML directly, then that's OK, subject to a couple of problems which
I'll address with the interface proposal.
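
To show what "breaking down" SSML could look like, whichever side ends up doing it, here is a minimal sketch that flattens `<prosody>` elements into a sequence of set/text/reset items. The tuple format is invented for the example and is not eSpeak's embedded command syntax; the input is assumed to be well-formed SSML without XML namespaces, and only `<prosody>` is handled.

```python
import xml.etree.ElementTree as ET


def flatten_ssml(ssml: str) -> list:
    """Flatten SSML into ('set', name, value), ('text', s), ('reset', name) items.

    The item format is hypothetical; a real driver would emit whatever
    low-level commands its synthesizer understands.
    """
    out = []

    def walk(elem):
        # Only <prosody> attributes are translated in this sketch.
        attrs = sorted(elem.attrib.items()) if elem.tag == "prosody" else []
        for name, value in attrs:
            out.append(("set", name, value))
        if elem.text and elem.text.strip():
            out.append(("text", elem.text.strip()))
        for child in elem:
            walk(child)
            # Text following a child element belongs to the parent's scope.
            if child.tail and child.tail.strip():
                out.append(("text", child.tail.strip()))
        # Undo the settings in reverse order when the element closes.
        for name, _ in reversed(attrs):
            out.append(("reset", name))

    walk(ET.fromstring(ssml))
    return out


commands = flatten_ssml(
    '<speak>Hello <prosody rate="fast" pitch="high">quick</prosody> world</speak>'
)
```

Nested `<prosody>` elements would need a stack of previous values rather than a plain reset; that is left out here.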

> 2.  Espeak needs to return audio directly to SD.  Writing a .wav file
> to disk is inefficient.  A more direct method, such as callbacks or
> socket io would be better.

Yes, see above.

> 3.  Support for index marking.  Ideally, espeak should provide
> callbacks and/or index mark information for the following:

> a.  Begin/end of entire message.
> b.  Begin/end of sentence.
> c.  Begin/end of word.
> d.  Custom index marks as <mark> tags in SSML.

Yes, eSpeak's callbacks would return both synthesized speech data and
index data to S.D.  The RISC OS version of eSpeak currently
produces word index information which an application uses to move a
caret through the text and scroll it as it's spoken.
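
A small sketch of the kind of index data meant here, under assumed names: a hypothetical `speak_with_index` function reports message and word boundaries as character offsets into the text, which is enough for an application to move a caret. Event names and the whitespace-based word splitting are assumptions for the example; sentence boundaries and SSML `<mark>` events are omitted.

```python
import re
from typing import Callable


def speak_with_index(text: str, on_event: Callable[[str, int], None]) -> None:
    """Report index events as (kind, character offset) pairs.

    A real synthesizer would raise these as the matching audio is
    produced; this stand-in just walks the text.
    """
    on_event("message_begin", 0)
    for match in re.finditer(r"\S+", text):  # words = runs of non-whitespace
        on_event("word_begin", match.start())
        on_event("word_end", match.end())
    on_event("message_end", len(text))


# The application records each event; a text viewer would move its caret.
events = []
speak_with_index("move the caret", lambda kind, pos: events.append((kind, pos)))
```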

I hope to produce a library version of eSpeak soon, with basic features
(i.e. those equivalent to the current command-line version), so it
would be better to wait for that rather than doing any work on a S.D.
driver for the command-line version.

-- 
_______________________________________________
Speechd mailing list
address@hidden
http://lists.freebsoft.org/mailman/listinfo/speechd
