speechd-discuss


From: Luke Yelavich
Subject: Supporting eSpeak variants properly, or, how to offer synthesizer specific settings to clients.
Date: Mon, 15 Jun 2015 09:59:14 +1000

Hey folks.
Of late we have been adding support to Speech Dispatcher to let users work
with extra eSpeak functionality in some way. Most recently, a configuration
option was added to the espeak driver to present the available voice variants
along with the available voices. However, this is suboptimal. Last year,
additions were also made to the Speech Dispatcher API to support eSpeak's
pitch range functionality. This has not yet been released in a tarball and is
only in git master, as it hasn't received sufficient testing, at least from
me. Again, I think this is suboptimal.

It's likely that the various synthesizers Speech Dispatcher supports all offer
some extra functionality, and it is also likely that said functionality differs
from synthesizer to synthesizer. It would be nice to offer all of this extra
functionality to users, but I would rather not add additional API functionality
to support a single synthesizer's features, e.g. eSpeak's pitch range and voice
variants. What we could do, however, is add an API that would list, get, and
set synthesizer specific settings.

Here is a list of some early ideas as to what would be supported.

* Every setting must have a get; set is optional.
* Need to support int value range get/set, string value get/set, and string
list get.

Each synthesizer specific setting could be represented as a data structure in C 
along the lines of the following:

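typedef struct {
        char *name;
        char *description;              /* This should be localized */
        SynthSettingValueType get_type; /* Type used when reading */
        SynthSettingValueType set_type; /* Type used when writing */
        int min_value;                  /* Range bounds for number values */
        int max_value;
        char **value_list;              /* Choices for string list values */
        void *cur_value;
} SynthSetting;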

In the C API, a NULL terminated array of this structure would be returned for 
all settings a synth offers.
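
As a rough illustration of how a client might walk such a NULL terminated
array, here is a minimal sketch. The helper names synth_settings_count() and
synth_settings_find() are hypothetical and not part of any existing Speech
Dispatcher API; the types are copied from the proposal in this mail.

```c
#include <stddef.h>
#include <string.h>

/* Types as proposed in this mail; nothing here exists in a release. */
typedef enum {
        SYNTH_SETTING_VALUE_UNKNOWN = 0,
        SYNTH_SETTING_VALUE_NUMBER = 1,
        SYNTH_SETTING_VALUE_STRING = 2,
        SYNTH_SETTING_VALUE_STRING_LIST = 3
} SynthSettingValueType;

typedef struct {
        char *name;
        char *description;
        SynthSettingValueType get_type;
        SynthSettingValueType set_type;
        int min_value;
        int max_value;
        char **value_list;
        void *cur_value;
} SynthSetting;

/* Hypothetical helper: count the entries before the terminating NULL. */
size_t synth_settings_count(SynthSetting **settings)
{
        size_t n = 0;

        if (settings == NULL)
                return 0;
        while (settings[n] != NULL)
                n++;
        return n;
}

/* Hypothetical helper: look a setting up by name, NULL if not found. */
SynthSetting *synth_settings_find(SynthSetting **settings, const char *name)
{
        size_t i;

        if (settings == NULL || name == NULL)
                return NULL;
        for (i = 0; settings[i] != NULL; i++) {
                if (settings[i]->name != NULL &&
                    strcmp(settings[i]->name, name) == 0)
                        return settings[i];
        }
        return NULL;
}
```

The NULL terminator means no separate length has to be passed around, at the
cost of a linear scan to count or search the settings.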

The SynthSettingValueType enum would look something like this:

typedef enum {
        SYNTH_SETTING_VALUE_UNKNOWN = 0,
        SYNTH_SETTING_VALUE_NUMBER = 1,
        SYNTH_SETTING_VALUE_STRING = 2,
        /* A list of strings for the user to choose from, e.g. voice variants */
        SYNTH_SETTING_VALUE_STRING_LIST = 3
} SynthSettingValueType;
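
To show how a client might use the enum, here is a small sketch. Both helper
names are hypothetical, and so is the convention that a read-only setting
(get without set) is marked with a set_type of SYNTH_SETTING_VALUE_UNKNOWN;
the proposal does not yet say how "set is optional" would be expressed.

```c
/* Enum as proposed in this mail. */
typedef enum {
        SYNTH_SETTING_VALUE_UNKNOWN = 0,
        SYNTH_SETTING_VALUE_NUMBER = 1,
        SYNTH_SETTING_VALUE_STRING = 2,
        SYNTH_SETTING_VALUE_STRING_LIST = 3
} SynthSettingValueType;

/* Hypothetical helper: a printable name for a value type. */
const char *synth_setting_type_name(SynthSettingValueType type)
{
        switch (type) {
        case SYNTH_SETTING_VALUE_NUMBER:
                return "number";
        case SYNTH_SETTING_VALUE_STRING:
                return "string";
        case SYNTH_SETTING_VALUE_STRING_LIST:
                return "string list";
        default:
                return "unknown";
        }
}

/* Assumed convention: UNKNOWN as set_type means the setting is read-only. */
int synth_setting_is_writable(SynthSettingValueType set_type)
{
        return set_type != SYNTH_SETTING_VALUE_UNKNOWN;
}
```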

I don't see why we would need to support anything more than ints, as even now 
we are only dealing with ints for numerical values.

C API methods to work with these data types could be as follows:

SynthSetting **spd_synth_get_settings(SPDConnection *connection);
int spd_synth_set_setting(SPDConnection *connection, SynthSetting *setting,
                          void *value);
void free_synth_settings(SynthSetting **settings);
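
Since the value is passed as a void pointer, something has to check it against
the setting before it is applied. Below is a minimal sketch of such a check.
The function name synth_setting_value_is_valid() is hypothetical, and the
sketch assumes that a number value points to an int, that string values are
NUL terminated, that value_list is NULL terminated, and that a set_type of
SYNTH_SETTING_VALUE_STRING_LIST means "one of the strings in value_list".
None of this is settled by the proposal.

```c
#include <stddef.h>
#include <string.h>

/* Types as proposed in this mail; nothing here exists in a release. */
typedef enum {
        SYNTH_SETTING_VALUE_UNKNOWN = 0,
        SYNTH_SETTING_VALUE_NUMBER = 1,
        SYNTH_SETTING_VALUE_STRING = 2,
        SYNTH_SETTING_VALUE_STRING_LIST = 3
} SynthSettingValueType;

typedef struct {
        char *name;
        char *description;
        SynthSettingValueType get_type;
        SynthSettingValueType set_type;
        int min_value;
        int max_value;
        char **value_list;
        void *cur_value;
} SynthSetting;

/* Hypothetical check: returns 1 if value is acceptable, 0 otherwise. */
int synth_setting_value_is_valid(const SynthSetting *setting,
                                 const void *value)
{
        char **choice;

        if (setting == NULL || value == NULL)
                return 0;

        switch (setting->set_type) {
        case SYNTH_SETTING_VALUE_NUMBER: {
                /* Assumption: number values are passed as a pointer to int. */
                int v = *(const int *)value;
                return v >= setting->min_value && v <= setting->max_value;
        }
        case SYNTH_SETTING_VALUE_STRING:
                /* Any non-NULL string is accepted in this sketch. */
                return 1;
        case SYNTH_SETTING_VALUE_STRING_LIST:
                /* Assumption: the value must match one of the listed choices. */
                if (setting->value_list == NULL)
                        return 0;
                for (choice = setting->value_list; *choice != NULL; choice++) {
                        if (strcmp(*choice, (const char *)value) == 0)
                                return 1;
                }
                return 0;
        default:
                /* UNKNOWN set_type: treated here as not settable. */
                return 0;
        }
}
```

A check like this could run in the client library before anything is sent to
the server, or in the server before the value reaches the driver.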

I haven't yet given any thought to either the SSIP protocol, or the protocol 
between the server and drivers, but that should be trivial.

With the above, we could then allow synthesizers to provide as much specific 
functionality as is desirable. We cannot expect clients to be able to locally 
store these settings in their own config, so it would be up to Speech 
Dispatcher to do that. Fortunately, I think GSettings provides sufficient 
functionality to help with that task.

I'd be interested in any thoughts, suggestions, or questions anyone has. There 
is still a bit to get done before I get to implementing this functionality, and 
there are still pieces that likely need further fleshing out, like the SSIP 
protocol.

Luke


