speechd-discuss

TTS / Speech Recognition: Text Preprocessing


From: Peter Grasch
Subject: TTS / Speech Recognition: Text Preprocessing
Date: Mon, 05 Sep 2011 22:16:37 +0200

Hi everybody,

I'm currently working on the AT-SPI integration of simon and, as part of that,
I ran into an issue that turns out to be quite common. To know which voice
commands to listen for, I need to not only read but also "understand" the text
in open applications.
For example, to allow the user to control the calculator with voice commands, 
I need to know that "1" becomes "One" when spoken out loud and that "+" is 
called "plus", etc.

Screen readers require a similar logic when reading text out loud so I've been 
looking into how things are handled there to get an idea of how to implement 
this in simon.

However, as I started to look around and ask people how they implement this in
their applications (Orca, Jovie), I noticed that it is done in a lot of
different places.
There are filters in both Orca and Jovie (mostly for user-specific extensions),
but eSpeak seems to do most of the heavy lifting internally.

A quick Google search showed that quite a few people were trying to disable or
change the expansion of certain abbreviations (Drive / Doctor, for example) and
were having problems with it because it's currently done at a very low layer.

Now, I know it's a huge undertaking, but I was wondering whether it wouldn't
make a lot of sense to move this language processing into a small, dedicated,
shared library?

The library could provide a pretty simple API to "humanify" text input
according to a set of language-specific rules that would ideally be
managed through pluggable frontends for Orca, Jovie, simon, etc.
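
To make the idea a bit more concrete, here is a rough sketch of what such an
API could look like. Everything in it is made up for illustration: the
function name humanify(), the EN_RULES table and its layout are assumptions,
not an existing library, and numbers are simply read digit by digit.

```python
import re

# Hypothetical English rule set; a real library would load one per language.
EN_RULES = {
    "symbols": {"+": "plus", "-": "minus", "=": "equals"},
    "digits": ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"],
    "abbreviations": {"Dr.": "Doctor", "etc.": "et cetera"},
}


def humanify(text, rules=EN_RULES):
    """Return the words a speaker might say for `text` (digits read one by one)."""
    words = []
    for token in text.split():
        # Whole-token abbreviations are expanded before anything else.
        if token in rules["abbreviations"]:
            words.append(rules["abbreviations"][token])
            continue
        # Split the token into runs of letters, single digits and single symbols.
        for part in re.findall(r"[^\W\d_]+|\d|\S", token):
            if part.isdecimal():
                words.append(rules["digits"][int(part)])
            else:
                # Unknown symbols and plain words pass through unchanged.
                words.append(rules["symbols"].get(part, part))
    return " ".join(words)
```

With these assumed rules, humanify("1 + 2") would give "one plus two", which
is the kind of string simon could then listen for.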

Of course, there is also some stuff that needs to be done by application
developers (as Jeremy Whiting pointed out, for example, "+" can mean "plus" in
a calculator or "add contact" in an instant messenger), but at least the basic
stuff (abbreviations, symbols, numbers) should be doable in an abstract,
multi-purpose way.
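
One way this split could work is a shared default table that applications
override where needed. The snippet below is only a sketch of that idea;
BASE_SYMBOLS, speak_symbol() and the override dictionaries are hypothetical
names, not part of Orca, Jovie or simon.

```python
# Hypothetical shared defaults that a central library might ship.
BASE_SYMBOLS = {"+": "plus", "-": "minus"}


def speak_symbol(symbol, app_overrides=None):
    """Look up a symbol, letting the application override the shared default."""
    if app_overrides and symbol in app_overrides:
        return app_overrides[symbol]
    # Fall back to the shared table, then to the symbol itself.
    return BASE_SYMBOLS.get(symbol, symbol)


# A calculator needs no overrides; an instant messenger redefines "+".
calculator_overrides = None
messenger_overrides = {"+": "add contact"}

calculator_plus = speak_symbol("+", calculator_overrides)
messenger_plus = speak_symbol("+", messenger_overrides)
```

Here the calculator gets "plus" from the shared table, while the messenger
gets "add contact" and still inherits "minus" for "-".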

And as language processing always requires a tremendous amount of effort for
each language, I think it would be great to do this in a central place...

Best regards,
Peter


