
Speech Dispatcher roadmap discussion.


From: Bohdan R . Rau
Subject: Speech Dispatcher roadmap discussion.
Date: Thu, 16 Oct 2014 15:17:23 +0200

On 2014-10-13 10:45, Bohdan R. Rau wrote:
> PART ONE

PART TWO

1. Introducing private sessions

Since speech-dispatcher would be used to retrieve speech waveforms for 
different purposes, this may interfere with normal sessions. Fetching (or 
storing) data must not be interruptible by other clients (so commands like 
CANCEL ALL won't cancel data being generated for a fetch). On the other 
hand, a long piece of text sent to speech-dispatcher may cause long lags 
in applications like a screen reader or speech notifier - the synthesizer 
may be slow, and synthesizing 10 minutes of speech may lock some 
synthesizers (like Ivona) for a minute.

So it should be possible to start speech-dispatcher in private mode 
- for example with a "--private" command line parameter. A better 
solution, in my opinion, is to create a softlink with a different name 
(for example /usr/bin/speechd-private or similar) - this makes it 
possible to use the pkill or killall commands to kill only private or 
only standard sessions.
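The softlink-based mode detection could be sketched like this (a toy illustration; both the speechd-private name and the --private flag are proposals from this mail, not existing speech-dispatcher options):

```python
import os

def is_private_mode(argv):
    """Detect private mode from the invocation name or a --private flag.

    Hypothetical logic for this proposal: a softlink named
    speechd-private (or an explicit --private argument) selects
    private mode; anything else is a standard session.
    """
    prog = os.path.basename(argv[0])
    return prog == "speechd-private" or "--private" in argv[1:]

print(is_private_mode(["/usr/bin/speechd-private"]))             # True
print(is_private_mode(["/usr/bin/speech-dispatcher"]))           # False
print(is_private_mode(["/usr/bin/speech-dispatcher", "--private"]))  # True
```

Checking the basename rather than the full path is what lets pkill/killall distinguish the two kinds of sessions by process name.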

In private mode speech-dispatcher reads from stdin and writes to 
stdout. All logs are redirected to stderr, and pid files are not used. 
Closing either the stdin or stdout stream causes speech-dispatcher to 
quit immediately.

In private mode all speech-synthesis commands must be prefixed with 
FILE or FETCH. Also, speech-dispatcher's answer to the CAPA command 
must not announce the SPEECH capability.

As not all modules are able to produce waveforms, I propose a new 
AddPrivateModule line in the configuration file. AddPrivateModule lines 
are ignored in standard mode. If an AddPrivateModule line is found in 
private mode, AddModule lines are ignored.
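The proposed precedence rule could work roughly like this (a sketch only; AddPrivateModule is this proposal's invention, and real speech-dispatcher config parsing is more involved than splitting lines):

```python
def select_modules(config_lines, private_mode):
    """Apply the proposed precedence: AddPrivateModule lines are
    ignored in standard mode; if any AddPrivateModule line is present
    in private mode, all AddModule lines are ignored."""
    add = [l.split(None, 1)[1] for l in config_lines
           if l.startswith("AddModule ")]
    private = [l.split(None, 1)[1] for l in config_lines
               if l.startswith("AddPrivateModule ")]
    if private_mode and private:
        return private
    return add

conf = [
    'AddModule "espeak" "sd_espeak" "espeak.conf"',
    'AddPrivateModule "festival" "sd_festival" "festival.conf"',
]
print(select_modules(conf, private_mode=False))  # only the AddModule entry
print(select_modules(conf, private_mode=True))   # only the AddPrivateModule entry
```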

If possible, in private mode no modules should be initialized at start 
(in almost all cases we will use only a single module per session). 
Instead, modules will be initialized on demand. One exception: if there 
is only one AddPrivateModule line in the configuration, that module may 
be initialized at start (because we have no choice).

By the way - it should be possible for speech-dispatcher to quit 
automatically when the client closes its connection. For example: if I 
have speech enabled in the login manager, then after a successful login 
Orca quits, but speech-dispatcher still sits on its socket, consumes 
memory and waits for system shutdown. I see no reason to waste a 
kilobyte of memory on a program I do not use.

2. FILE prefix

Each speech-synthesis command (SPEAK, CHAR etc.) may be prefixed by the 
FILE command followed by a filename. The filename should be an absolute 
path; otherwise it is taken relative to the home directory of the user 
running speech-dispatcher. Server responses:

2xx OK FILE STORED

or:

5xx FILE ERROR:<errno>:<error string>

This is the simplest way to create prerecorded waves for various 
applications. I personally used prerecorded hours and minutes as a 
talking clock in my Symbian audiobook player.

We must distinguish between the FILE capability of the server and of 
the module. Even if a module has only the FETCH capability (no FILE), 
the server must fetch data from the module and store it in a file. So 
if a module has only the FETCH capability, the server must respond to 
the CAPA command with both FETCH and FILE.
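The capability-merging rule above amounts to this (a toy sketch; the FETCH/FILE capability names are from this proposal, not current SSIP):

```python
def server_capabilities(module_caps):
    """Capabilities the server announces on behalf of a module: if the
    module can FETCH, the server can also offer FILE by storing the
    fetched data in a file itself (the rule proposed above)."""
    caps = set(module_caps)
    if "FETCH" in caps:
        caps.add("FILE")
    return sorted(caps)

print(server_capabilities(["FETCH"]))  # ['FETCH', 'FILE']
print(server_capabilities(["FILE"]))   # ['FILE']
```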

The format of the file depends on the module's internal capabilities, 
but in any case the WAV format must be accepted. Other formats (like 
mp3 or ogg) would be recognized by the filename extension, but this is 
not guaranteed.

There is one exception (derived from Mbrola).

If the filename has the form "-" (a dash), possibly followed by an 
extension, the server should return the file content on the listening 
socket, for example:

FILE -.wav CHAR a

2xx-OK FILE DATA FOLLOWS:<data length>
<binary data followed by LF>
200 OK

This may be useful if we are connected to a speech-dispatcher on a 
remote machine.
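A client could parse that inline-data reply like this (a minimal sketch; the status codes and wording are placeholders from this proposal, not current SSIP):

```python
import io

def read_file_data(stream):
    """Parse the proposed '2xx-OK FILE DATA FOLLOWS:<length>' reply:
    a header line, <length> bytes of binary data, a trailing LF, and
    a closing status line."""
    header = stream.readline().decode("ascii").rstrip("\r\n")
    if "FILE DATA FOLLOWS:" not in header:
        raise ValueError("unexpected reply: " + header)
    length = int(header.rsplit(":", 1)[1])
    data = stream.read(length)       # exactly <length> bytes of payload
    stream.readline()                # the LF after the binary data
    final = stream.readline().decode("ascii").rstrip("\r\n")
    return data, final

reply = b"200-OK FILE DATA FOLLOWS:4\nRIFF\n200 OK\n"
data, status = read_file_data(io.BytesIO(reply))
print(data, status)  # b'RIFF' 200 OK
```

The explicit byte count in the header is what makes binary payloads safe on a line-oriented protocol: the client switches to counted reads for exactly that many bytes.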

3. FETCH (server-module protocol).

As we can use the FETCH command for different purposes, in 
server-module communication FETCH takes an extra parameter: FILE or 
REALTIME. Depending on the module's internal capabilities this 
parameter may be ignored, so the server should accept both responses.

In FILE mode, the module simply responds:

2xx-OK FILE DATA FOLLOWS:<data length>
<binary data followed by LF>
2xx OK

or:

2yy-OK RAW DATA FOLLOWS:<data length>:<format specification>
<binary data followed by LF>
2yy OK

In REALTIME mode the module always sends chunks of raw data, possibly 
interleaved with SYNC, AUTOSYNC and MOUTH responses. A typical response:

2zz-OK CHUNKED DATA FOLLOWS:<format specification>
2zz-CHUNK:<chunk length>
<binary data followed by LF>
...
2zz OK

SYNC, AUTOSYNC and MOUTH are sent before a chunk (if needed).
Chunks are sent as fast as possible. In particular, if the synthesizer 
can produce the wave in real time (like Mbrola, eSpeak or the Linux 
versions of Ivona), the module must send the synthesized wave in small 
parts.
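A receiver of the chunked stream could look like this (a sketch of the format proposed above; status codes, the CHUNK keyword and the SYNC/AUTOSYNC/MOUTH event lines are all this proposal's, and the sketch simply skips event lines rather than handling them):

```python
import io

def read_chunks(stream):
    """Iterate over chunks in the proposed REALTIME reply: a
    '...-OK CHUNKED DATA FOLLOWS:<format>' header, then any number of
    '...-CHUNK:<length>' lines each followed by <length> bytes and an
    LF, terminated by a plain '... OK' status line."""
    stream.readline()  # header: 2zz-OK CHUNKED DATA FOLLOWS:<format>
    while True:
        line = stream.readline().decode("ascii").rstrip("\r\n")
        if "-CHUNK:" in line:
            length = int(line.rsplit(":", 1)[1])
            yield stream.read(length)   # one chunk of raw audio
            stream.readline()           # LF after the binary payload
        elif line.endswith(" OK"):
            return                      # final status line ends the stream
        # any other line (SYNC, AUTOSYNC, MOUTH, ...) is ignored here

reply = (b"201-OK CHUNKED DATA FOLLOWS:raw\n"
         b"201-CHUNK:3\nabc\n"
         b"201-CHUNK:2\nde\n"
         b"201 OK\n")
print(list(read_chunks(io.BytesIO(reply))))  # [b'abc', b'de']
```

Because each chunk carries its own length, the receiver can start playing audio as soon as the first chunk arrives, which is the whole point of REALTIME mode.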

REALTIME mode may be used by an application for its internal purposes - 
in this case data sent by the module is simply received by the server 
and forwarded to the application as fast as possible. But the primary 
goal is to play the received waveform through the server's internal 
audio system. We will then be able to create modules that are 
completely audio-system independent.

The server must be able to convert chunked data (if the module responds 
in realtime to a file request) into the file data requested by the 
application as the answer to a FILE-prefixed command.

4. FETCH (SSIP protocol)

The FETCH prefix is intended for use by applications that have their 
own audio systems. A good example is a subtitle reader - each dialogue 
line is prefetched, possibly postprocessed, and played synchronously 
with the original audio track of the movie being watched. Another 
example: a robot driven by an external Linux box, with an internal 
speaker and a mouth driven by small servomotors.

A command prefixed by FETCH always returns raw data, but in two forms: 
realtime and all. So the FETCH prefix may be followed by the ALL 
parameter.

In realtime (default) mode the application expects a data format 
similar to that of the FETCH REALTIME module command. If the module 
answers with a realtime response, it is simply copied from module to 
client. If the module responds as FILE, the server must convert this 
response into realtime form, adding the necessary SYNC (begin-end) or 
AUTOSYNC (0-length) lines. Similarly, if the module has only the FILE 
capability (no FETCH), the server must convert its response into 
realtime form. So our robot will talk...

The response from the server should be the same as the response from a 
module in REALTIME mode.

If the FETCH prefix is followed by ALL, the application expects the 
synthesized waveform of the full text in one big chunk in raw format. 
SYNC and MOUTH responses should be removed by the server. Even if the 
module has only the FILE capability (no FETCH), the server must convert 
the temporary file into the required response. So our subtitle reader 
will work...

The response from the server must look like:

2yy-RAW DATA FOLLOWS:<data length>:<format specification>
<binary data followed by LF>
2yy OK DATA SENT

The format specification is not the subject of this discussion.
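A client for the FETCH ALL reply is then very simple (a sketch; the status codes and the sample format string are placeholders, since the format specification is left open above):

```python
import io

def read_raw_all(stream):
    """Parse the proposed FETCH ALL reply: a
    '2yy-RAW DATA FOLLOWS:<length>:<format>' header, <length> bytes of
    raw audio, a trailing LF, and a closing '2yy OK DATA SENT' line."""
    header = stream.readline().decode("ascii").rstrip("\r\n")
    _, length, fmt = header.split(":", 2)
    data = stream.read(int(length))  # the whole waveform in one chunk
    stream.readline()                # LF after the binary payload
    stream.readline()                # 2yy OK DATA SENT
    return data, fmt

reply = b"203-RAW DATA FOLLOWS:5:s16le,22050,1\nhello\n203 OK DATA SENT\n"
data, fmt = read_raw_all(io.BytesIO(reply))
print(len(data), fmt)  # 5 s16le,22050,1
```

This is the shape a subtitle reader would use: one counted read per dialogue line, postprocess, then play in sync with the movie.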

I hope that - my poor English notwithstanding - everything is clear now...

Remarks:

a) a subtitle reader exists (my applications SubAloud and Milena-ABC), 
but it is limited to certain synthesizers. In offline mode (Milena-ABC 
creates a new audio track with spoken subtitles and a precisely 
controlled level of the original soundtrack) only Polish is supported. 
SubAloud is practically untested - I personally never watch movies on a 
Linux box - but I have had some positive responses from occasional 
testers.

b) a robot speaking with the Mbrola synthesizer, together with a simple 
function converting Mbrola phonemes into instructions for the 
servomotors, also exists.


ethanak
-- 
http://milena.polip.com/ - Pa pa, Ivonko!


