emacs-devel

Re: paragraphs.el: do forward-sentence and friends not work?


From: Richard Stallman
Subject: Re: paragraphs.el: do forward-sentence and friends not work?
Date: Thu, 14 Feb 2008 19:02:22 -0500

    Sentence tokenization is a known problem. You can throw machine
    learning algorithms at it, but that's not a viable option in our case.
    However, Grefenstette and Tapanainen (1994) examined this in detail for
    English, using the Brown corpus. They report that, using a small
    lexicon of common abbreviations, they can classify 99.1% of all
    periods correctly. Even without the lexicon, you can achieve 97.7%
    accuracy (on English) using the right regular expressions, and I expect
    the results to be similar for other languages. That should be good
    enough for M-e and M-a.

    http://citeseer.ist.psu.edu/grefenstette94what.html

I encourage someone to implement this; then we will see how well it
works.  If it works well, we could set sentence-end-double-space to
nil for languages where the new feature makes that an improvement.
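As a rough illustration of the "regular expressions plus a small abbreviation lexicon" idea quoted above, here is a minimal Python sketch. It is not the Grefenstette and Tapanainen algorithm and not Emacs code; the abbreviation list, the names `is_abbreviation` and `split_sentences`, and the specific heuristics (letter-period chains such as "U.S", single-letter initials, and a lowercase next word meaning "no boundary") are assumptions chosen for illustration.

```python
import re

# Hypothetical, deliberately tiny abbreviation lexicon (lowercase, final period omitted).
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "etc", "vs", "st", "fig"}

# A candidate boundary is '.', '?' or '!', optionally followed by closing
# quotes or brackets, and then whitespace or the end of the text.
CANDIDATE = re.compile(r"[.?!][\"')\]]*(?=\s|$)")

# Single letters ("J") and letter-period chains ("U.S", "e.g", "p.m") before a period.
INITIALS = re.compile(r"[A-Za-z](?:\.[A-Za-z])*")


def is_abbreviation(token: str) -> bool:
    """Guess whether TOKEN, the word just before a period, is an abbreviation."""
    token = token.lstrip("\"'([")  # ignore opening quotes/brackets, e.g. "(Dr"
    return token.lower() in ABBREVIATIONS or INITIALS.fullmatch(token) is not None


def split_sentences(text: str) -> list[str]:
    """Split TEXT into sentences using regular expressions plus a small lexicon."""
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(text):
        rest = text[match.end():].lstrip()
        if text[match.start()] == ".":
            words = text[start:match.start()].split()
            if words and is_abbreviation(words[-1]):
                continue  # "Dr. Smith", "U.S. office": not a boundary.
            if rest and rest[0].islower():
                continue  # A lowercase next word suggests a mid-sentence period.
        sentence = text[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

For example, `split_sentences("Dr. Smith arrived at 5 p.m. on Monday. He met J. R. Jones in the U.S. office! Was it raining? No.")` returns four sentences, keeping "Dr.", "p.m.", "J. R." and "U.S." inside their sentences. The heuristics have obvious failure cases (a sentence ending in the pronoun "I", or in "etc."), which is the kind of thing a real evaluation would need to measure. Emacs's own M-e and M-a instead work from the regexp computed by the function `sentence-end`, which consults sentence-end-double-space, so an Emacs implementation would need to fit that structure.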




