speechd-discuss

punctuation.scm


From: Pierre Lorenzon
Subject: punctuation.scm
Date: Sat, 01 Aug 2009 12:19:53 +0200 (CEST)

Hi,

The festival-freebsoft-utils version on my system is a CVS
checkout no more than 15 days old.


More precisely, my question is about the
`punctuation-process-words' method, and in particular about
its behavior for languages other than English. This method is
involved in the `Token' step of utterance processing. Here is
what it produces for French with the punctuation mode set to
"all":

Before entering the method, (utt.relation_tree utt 'Word)
returns:

> -- 

((("Bonjour" ((id "_2") (name "Bonjour"))))
 (("." ((id "_3") (name ".")))))

> -- End 

And after the method has been applied, we have:

> -- 

((("Bonjour" ((id "_2") (name "Bonjour"))))
 (("." ((id "_3") (name "."))))
 (("." ((id "_4") (name ".")))))

> -- End 

This means that the "." has been duplicated. Why? Is the French
tokenization incorrect (that is, incompatible with
speech-dispatcher.scm)? Here is the output of
(utt.relation_tree utt 'Token) after `Initialize', `Text' and
`Token_POS' have been applied:

> -- 

((("Bonjour"
   ((id "_1")
    (name "Bonjour")
    (punc ".")
    (whitespace "")
    (prepunctuation "")))))

> -- End 
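
To make the comparison between the dumps explicit, here is a
minimal Python sketch (not Festival code). It only models the
names shown above; the helper names `expected_words' and
`surplus' are hypothetical, and the expectation that each
punctuation character of a token yields exactly one word in
punctuation mode "all" is an assumption, not something taken
from punctuation.scm.

```python
from collections import Counter

# Token relation as printed above (only the features used here).
tokens = [{"name": "Bonjour", "punc": ".", "prepunctuation": ""}]

# Word names taken from the two Word-relation dumps above.
words_before = ["Bonjour", "."]
words_after = ["Bonjour", ".", "."]


def expected_words(tokens):
    """Assumed expectation: prepunctuation characters, the token name,
    then punc characters, each appearing exactly once as a word."""
    out = []
    for tok in tokens:
        out.extend(tok["prepunctuation"])  # empty string adds nothing
        out.append(tok["name"])
        out.extend(tok["punc"])            # one word per punctuation character
    return out


def surplus(observed, expected):
    """Return the words that occur more often in observed than in expected."""
    return dict(Counter(observed) - Counter(expected))


if __name__ == "__main__":
    exp = expected_words(tokens)
    print("expected:", exp)
    print("surplus before:", surplus(words_before, exp))
    print("surplus after: ", surplus(words_after, exp))
```

Under that assumption, the Word relation before the method
already matches the Token relation, and the extra "." only shows
up afterwards.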

Regards,

Pierre




