speechd-discuss

punctuation.scm


From: Pierre Lorenzon
Subject: punctuation.scm
Date: Mon, 03 Aug 2009 11:41:13 +0200 (CEST)

Hi Milan,

Thanks for your answer!

From: Milan Zamazal <address@hidden>
To: address@hidden
Subject: Re: punctuation.scm
Date: Mon, 03 Aug 2009 10:22:25 +0200

>>>>>> "PL" == Pierre Lorenzon <devel at pollock-nageoire.net> writes:
> 
>     PL> Before entering the method (utt.relation_tree utt 'Word)
>     PL> returns :
> 
>     >> -- 
> 
>     PL> ((("Bonjour" ((id "_2") (name "Bonjour"))))
>     PL>  (("." ((id "_3") (name ".")))))
> 
>     >> -- End 
> 
>     PL> And after the method has been applied we have :
> 
>     >> -- 
> 
>     PL> ((("Bonjour" ((id "_2") (name "Bonjour"))))
>     PL>  (("." ((id "_3") (name "."))))
>     PL>  (("." ((id "_4") (name ".")))))
> 
>     >> -- End 
> 
>     PL> It means that the "." has been duplicated. Why? Is the French
>     PL> tokenization not correct (I mean not compatible with
>     PL> speech-dispatcher.scm)?
> 
> Hi Pierre,
> 
> try to add French language identifier(s) as returned by the call
> 
>   (Param.get 'Language)
> 
> to punctuation-punc-languages variable defined at the beginning of
> punctuation.scm file and tell me whether it helps.

  Indeed it helps: with that change the duplication no
  longer occurs! But unless we also modify the code, punctuation
  will have its English pronunciation even in
  French. Personally I would say that I "don't care", but I know
  French users who won't share this point of view!
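
  A minimal sketch of that experiment, assuming that the identifier
  returned by (Param.get 'Language) is french (the value my patch
  tests for) and that the form is evaluated only after
  punctuation.scm has been loaded:

```scheme
;; Assumption: (Param.get 'Language) yields french under FranFest.
;; punctuation-punc-languages must already be defined by punctuation.scm.
(if (not (member 'french punctuation-punc-languages))
    (set! punctuation-punc-languages
          (cons 'french punctuation-punc-languages)))
```

  This only moves French into the branch written for English, which
  is why the punctuation then gets its English pronunciation.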


> 
> One of the practical problems with Festival is that there is no fixed
> text processing schema, so it's impossible to handle all possible
> situations in festival-freebsoft-utils, nor is it easy to define what's
> correct or compatible.  We have to resolve each problem
> individually.

  We might say that the English protocol is "the right one" and
  simply follow it. But there are other language
  implementations over which we have no control! So
  let's treat the problem individually, as you say:

  My opinion is that, given the French language
  implementation in Festival, the best thing to do for French
  with punctuation mode set to all is to do NOTHING! Hence I
  made the following modification to punctuation.scm, which calls a
  franfest_token_punctuation_all method that does nothing for
  the moment. This method must be defined whenever French is
  selected as the current language.

  The advantage is that, if you accept this modification to
  the punctuation.scm code, any further modifications that turn
  out to be necessary can be made in this method without modifying
  punctuation.scm again.

  The problem is that if another "Festival
  frenchification" appears some day, it will have to implement this
  method. Anyway, for the moment there is no other French
  module for Festival.
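
  On the French module's side, the do-nothing method could be as
  small as the following sketch. Where it lives and what it returns
  are assumptions: the patch only requires that the function exists
  and accepts the utterance.

```scheme
;; Hypothetical stub, to be loaded together with the French module.
;; It receives the utterance and leaves its Word relation untouched.
;; Returning utt is an assumption; the patch does not rely on the value.
(define (franfest_token_punctuation_all utt)
  utt)
```

  Any later French-specific handling would then go into this
  function body.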

;; -- Diff Mon Aug  3 11:13:44 2009

diff -c /home/devel/share/festival/lib/freebsoft/punctuation.scm.\~1.12.\~ /home/devel/share/festival/lib/freebsoft/punctuation.scm
*** /home/devel/share/festival/lib/freebsoft/punctuation.scm.~1.12.~    Mon Jul 28 18:33:27 2008
--- /home/devel/share/festival/lib/freebsoft/punctuation.scm    Mon Aug  3 11:06:06 2009
***************
*** 93,126 ****
  (define (punctuation-process-words utt)
    (cond
     ((eq? punctuation-mode 'all)
!     (if (member (intern (Param.get 'Language)) punctuation-punc-languages)
!         ;; Standard English lexicon has no notion of punctuation pronounciation
!         (do-relation-items (w utt Word)
!           (let ((trans (assoc (item.name w) punctuation-pronunciation)))
!             (if (and trans
!                      (not (word-mapping-of w)))
!                 (begin
!                   (item.set_name w (car (cdr trans)))
!                   (set! trans (cdr (cdr trans)))
!                   (while trans
!                     (let ((i (item.insert w (list (car trans)))))
!                       (item.append_daughter (item.parent (item.relation w 'Token))
!                                             i))
!                     (set! trans (cdr trans)))))))
!         ;; We assume other languages don't insert punctuation words themselves
!         (do-relation-items (w utt Word)
!           (let* ((w* (item.relation w 'Token))
!                  (token (item.parent w*)))
!             (when (and (not (item.prev w*))
!                        (item.has_feat token 'prepunctuation))
!               (dolist (p (reverse (symbolexplode (item.feat token 'prepunctuation))))
!                 (let ((i (item.insert w `(,p ((name ,p))) 'before)))
!                   (item.prepend_daughter token i))))
!             (when (and (not (item.next w*))
!                        (item.has_feat token 'punc))
!               (dolist (p (reverse (symbolexplode (item.feat token 'punc))))
!                 (let ((i (item.insert w `(,p ((name ,p))))))
!                   (item.append_daughter token i))))))))
     ;; Delete punctuation when punctuation-mode is none
     ;; (We actually don't delete the words as this might discard annotations
     ;; such as index marks.  So we just make the word names empty.)
--- 93,133 ----
  (define (punctuation-process-words utt)
    (cond
     ((eq? punctuation-mode 'all)
!     (cond
!      ((member (intern (Param.get 'Language))
!             punctuation-punc-languages)
!       ;; Standard English lexicon has no notion of punctuation pronounciation
!       (do-relation-items
!        (w utt Word)
!        (let ((trans (assoc (item.name w) punctuation-pronunciation)))
!        (if (and trans
!                 (not (word-mapping-of w)))
!            (begin
!              (item.set_name w (car (cdr trans)))
!              (set! trans (cdr (cdr trans)))
!              (while trans
!                     (let ((i (item.insert w (list (car trans)))))
!                       (item.append_daughter (item.parent (item.relation w 'Token))
!                                             i))
!                     (set! trans (cdr trans))))))))
!      ;; For French language, the simplest seems to do nothing !
!      ((eq? (intern (Param.get 'Language)) 'french)
!       (franfest_token_punctuation_all utt))
!      ;; We assume other languages don't insert punctuation words themselves
!      (t (do-relation-items
!        (w utt Word)
!        (let* ((w* (item.relation w 'Token))
!               (token (item.parent w*)))
!          (when (and (not (item.prev w*))
!                     (item.has_feat token 'prepunctuation))
!                (dolist (p (reverse (symbolexplode (item.feat token 'prepunctuation))))
!                        (let ((i (item.insert w `(,p ((name ,p))) 'before)))
!                          (item.prepend_daughter token i))))
!          (when (and (not (item.next w*))
!                     (item.has_feat token 'punc))
!                (dolist (p (reverse (symbolexplode (item.feat token 'punc))))
!                        (let ((i (item.insert w `(,p ((name ,p))))))
!                          (item.append_daughter token i)))))))))
     ;; Delete punctuation when punctuation-mode is none
     ;; (We actually don't delete the words as this might discard annotations
     ;; such as index marks.  So we just make the word names empty.)

Diff finished.  Mon Aug  3 11:13:11 2009


;; -- End Diff Mon Aug  3 11:13:44 2009



> 
> BTW, what package do you use for French in Festival?


  The so-called FranFest package:
http://download.gna.org/lliaphon/franfest/franfest-1.96-beta-rc01.tar.bz2

        http://www.pollock-nageoire.net/franfest.html
        The latter is not available at the moment because the
        server is down.

        I was in fact just doing a few updates and maintenance on
        FranFest. I have known for a long time that there was a
        problem with punctuation. The first one I had to solve
        was due to the so-called "liaisons" in French. At that
        point FranFest conflicted with punctuation.scm because
        of the empty word "" inserted when punctuation mode was
        set to none. I solved this problem inside
        FranFest. Then this problem of punctuation
        duplication appeared, which might be solved as described above.

        If you integrate my patch, I'll see the commit message
        on the list and will update my CVS version.

        Regards

        Pierre
> 




