speechd-discuss

symbolic voice-types versus synthesis voices


From: Tomas Cerha
Subject: symbolic voice-types versus synthesis voices
Date: Mon, 08 Nov 2010 12:11:02 +0100

On 7.11.2010 21:11, Andrei Kholodnyi wrote:
>> Unfortunately, I don't have a very solid proposal.  I just find it
>> slightly confusing that there are two ways to select a voice.
> 
> yes, completely agree. I also think we need to stay with one way.

Yes.

>> When I replied to Andrei's message yesterday, I commented that
>> it would be nice to do away with the synthesis voice entirely.  From a
>> user's perspective, it is perhaps easier to think about voices in general
>> terms, such as male1 and female1.
> 
> also agree, it makes sense to have the same look and feel for all
> voices regardless any module specifics.
> however at the moment we have SPDVoiceType which is restricted to 3
> types per gender per module.

Historically, SPDVoiceType was initially the only method.  Then "list
synthesis voices" and "set synthesis voice" were added because Orca required
them: it exposes the available synthesis voices directly to the user for
selection.  This makes sense to me, since users are used to referring to
voices by their real names.

> Probably it is better to use approach defined in SSML?
> 
> gender:      Enumerated values are: "male", "female", "neutral", or
> the empty string "".
> age:          preferred age in years (since birth) of the voice to
> speak the contained text.
> variant:      a preferred variant of the other voice characteristics
> to speak the contained text. (e.g. the second male child voice).
> name:        a processor-specific voice name to speak the contained text.
> languages: list of languages the voice is desired to speak.

Yes, voice selection by properties might be a good thing, but I believe it is
an independent feature.  We can allow retrieving the voice properties of a
particular synthesis voice.  If the client has this information, it can select
the best-matching voice client-side.  Or we can provide some additional
convenience functions, such as limiting the result of the synthesis voice
listing by given criteria.
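To illustrate the client-side selection, here is a minimal sketch.  It assumes
the server could return each synthesis voice together with its properties; the
Voice record, its fields and best_match() are hypothetical names used only for
illustration, not part of any existing API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Voice:
    """Hypothetical record for one synthesis voice and its properties."""
    name: str                  # processor-specific name, as the synth reports it
    language: str              # e.g. "en" or "cs"
    gender: str = ""           # "male", "female", "neutral" or "" (unknown)
    age: Optional[int] = None  # age in years, None if unknown


def best_match(voices: Sequence[Voice], language: Optional[str] = None,
               gender: Optional[str] = None,
               age: Optional[int] = None) -> Optional[Voice]:
    """Return the voice that fits the requested properties best, or None."""
    # Language is treated as a hard requirement; the rest are preferences.
    candidates = [v for v in voices
                  if language is None or v.language == language]
    if not candidates:
        return None

    def rank(v: Voice):
        gender_ok = gender is None or v.gender == gender
        if age is None:
            age_gap = 0
        elif v.age is None:
            age_gap = 1000  # unknown age ranks behind any known age
        else:
            age_gap = abs(v.age - age)
        # Tuples compare left to right: gender first, then age closeness.
        return (gender_ok, -age_gap)

    return max(candidates, key=rank)
```

A client would call something like best_match(voices, language="en",
gender="female") on the listing.  Treating language as mandatory and the other
properties as preferences is just one possible policy.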

> but it also gives a big diversity in naming, since different synths
> name voices differently.

But does this diversity matter?  If these diverse names are exposed to the end
user, I think that is still better than exposing nicely aligned symbolic
names, which carry no information (except for the gender).  The client can
also expose voice properties to the user if this is implemented (and
available).

So if I had to choose between the two methods, I'd choose synthesis voices +
voice params, since this allows greater flexibility.  It also allows symbolic
voice names to be implemented client-side, while the other way round is not
possible (implementing synthesis voices client-side if the server only
supports symbolic voice types).
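As a rough sketch of how symbolic names could be layered on top of synthesis
voices client-side: the function below numbers the listed voices per gender,
up to three each, in listing order.  The function name, the (name, gender)
input format and the "male1"-style keys are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def symbolic_names(voices: Iterable[Tuple[str, str]],
                   per_gender: int = 3) -> Dict[str, str]:
    """Map symbolic names such as "male1" to real synthesis voice names.

    `voices` yields (synthesis_voice_name, gender) pairs in listing order.
    """
    counters = defaultdict(int)  # how many voices were assigned per gender
    mapping: Dict[str, str] = {}
    for name, gender in voices:
        if gender not in ("male", "female"):
            continue  # no symbolic slot for other or unknown genders
        if counters[gender] >= per_gender:
            continue  # all symbolic slots for this gender are taken
        counters[gender] += 1
        mapping[f"{gender}{counters[gender]}"] = name
    return mapping
```

The client then resolves "female1" through the returned mapping and requests
the corresponding synthesis voice by its real name.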

Best regards, Tomas


