bibledit-general

Re: [be] unicode hyphen


From: Teus Benschop
Subject: Re: [be] unicode hyphen
Date: Fri, 06 Nov 2009 10:52:50 +0200

Bibledit uses the Enchant library for spell checking. The word
boundaries, however, are determined by the functions
"gtk_text_iter_forward_word_end" and "gtk_text_iter_backward_word_start"
from the Gtk library. The documentation for one of these functions
says: "Word breaks are determined by Pango and should be correct for
nearly any language (if not, the correct fix would be to the Pango word
break algorithms)." Since Pango determines the word breaks, it may help
to report this to the Pango project at http://www.pango.org/.

Hope that this helps. Teus.
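
To make the word-break issue concrete, here is a rough sketch in Python.
It is only a stand-in: it uses a regular expression, not Pango's word
break algorithm, and the names WORD_HYPHENS, split_default and
split_tailored are made up for this illustration. It shows how a default
word splitter cuts a hyphenated word in two, and how a tailored one
could keep it together.

```python
import re

# Assumption for this sketch: these hyphen characters should count as
# part of a word (hyphen-minus, hyphen, non-breaking hyphen).
WORD_HYPHENS = "\u002D\u2010\u2011"


def split_default(text):
    """Split on anything that is not a letter, digit or underscore.

    Every hyphen character acts as a word boundary here.
    """
    return re.findall(r"\w+", text)


def split_tailored(text):
    """Keep runs of word characters joined by single hyphens together."""
    hyphen_class = "[" + re.escape(WORD_HYPHENS) + "]"
    pattern = r"\w+(?:" + hyphen_class + r"\w+)*"
    return re.findall(pattern, text)


if __name__ == "__main__":
    sample = "ka-lo mana"
    print(split_default(sample))
    print(split_tailored(sample))
```

A spell checker fed by the first splitter would check "ka" and "lo"
separately; one fed by the second would check "ka-lo" as a single word.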


On Thu, 2009-11-05 at 23:35 -0500, Doug Glidden wrote:
> Hmm, that's a tough one.  As far as I can tell, Unicode does not
> specify any hyphen character that must not act as a word boundary
> (except for the soft hyphen, but that would not fulfill your needs
> because it is not visible except when it is followed by a line break);
> pretty much any of the hyphen characters may be tailored not to act as
> a word boundary, though.  I would say that should be something that
> the localization of the spell checker should take care of, so if none
> of the characters works (see the list below—in particular notice that
> the actual hyphen Unicode character is not what you get when you type
> a hyphen on your keyboard; that is instead the hyphen-minus
> character), you may want to submit this as a bug/feature request with
> the project for whatever spell checking library is used by Bibledit
> (I'm assuming BE uses one of the many open-source spelling libraries
> and not a home-grown one).  In particular, I would say that a person
> who uses a "non-breaking hyphen" probably typically expects its
> "non-breaking" aspect to apply to words as well as lines (although in
> reality the Unicode standard requires only that a non-breaking hyphen
> prevent line breaks).
> 
> Doug
> 
> P.S.  The complete list of hyphen characters in Unicode is as follows:
> 
> Hyphen or minus sign (hyphen-minus or hyphus) - U+002D
> Soft (or discretionary) hyphen - U+00AD
> Hyphen - U+2010
> Non-breaking hyphen - U+2011
> Hyphen bullet - U+2043
> 
> See also the set of dash characters (U+2012 through U+2015), but note
> that these are not the same size as a standard hyphen.
> 
> On Thu, Nov 5, 2009 at 9:55 PM, Birch Champeon
> <address@hidden> wrote:
>         I'm working with a bunch of languages that use the hyphen
>         within their words.  The hyphen is seen as a word break in
>         BE.  Does anyone know if there is a Unicode character that
>         looks like a hyphen, but will be viewed as a regular
>         character?  I've tried a non-breaking hyphen and the
>         spellchecker still sees it as a word break.
>         
>         Thanks
>         Birch
>         
>         
> 
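
The hyphen characters listed in the quoted message look alike on screen,
so it can help to check which one a text actually contains. A small
sketch using Python's standard unicodedata module follows; the names
HYPHENS and describe are made up for this illustration.

```python
import unicodedata

# Code points taken from the list in the quoted message.
HYPHENS = [0x002D, 0x00AD, 0x2010, 0x2011, 0x2043]


def describe(code_point):
    """Return (U+XXXX label, Unicode name, general category)."""
    char = chr(code_point)
    label = f"U+{code_point:04X}"
    return (label, unicodedata.name(char), unicodedata.category(char))


for cp in HYPHENS:
    print(*describe(cp))
```

The output shows, for instance, that the key on the keyboard produces
HYPHEN-MINUS (U+002D) and not HYPHEN (U+2010), and that the soft hyphen
is a format character rather than dash punctuation.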




