Re: [be] unicode hyphen
From: Birch Champeon
Subject: Re: [be] unicode hyphen
Date: Fri, 6 Nov 2009 04:32:55 -0500
Thanks for all the info, guys. I'll ask a question on the Pango list
and then file a bug if needed.
On Fri, Nov 6, 2009 at 3:52 AM, Teus Benschop <address@hidden> wrote:
> The spelling checker library that is being used is enchant. However, the
> word boundaries are determined using the functions
> "gtk_text_iter_forward_word_end" and "gtk_text_iter_backward_word_start"
> as provided in the Gtk library. The information on one of these
> functions says: "Word breaks are determined by Pango and should be
> correct for nearly any language (if not, the correct fix would be to the
> Pango word break algorithms).". Since Pango determines the word breaks,
> what might help is to report this to the Pango project. It is at
> http://www.pango.org/. Hope that this helps. Teus.
>
>
> On Thu, 2009-11-05 at 23:35 -0500, Doug Glidden wrote:
>> Hmm, that's a tough one. As far as I can tell, Unicode does not
>> specify any hyphen character that must not act as a word boundary
>> (except for the soft hyphen, but that would not fulfill your needs
>> because it is not visible except when it is followed by a line break);
>> pretty much any of the hyphen characters may be tailored not to act as
>> a word boundary, though. I would say that should be something that
>> the localization of the spell checker should take care of, so if none
>> of the characters works (see the list below—in particular notice that
>> the actual hyphen Unicode character is not what you get when you type
>> a hyphen on your keyboard; that is instead the hyphen-minus
>> character), you may want to submit this as a bug/feature request with
>> the project for whatever spell checking library is used by Bibledit
>> (I'm assuming BE uses one of the many open-source spelling libraries
>> and not a home-grown one). In particular, I would say that a person
>> who uses a "non-breaking hyphen" probably typically expects its
>> "non-breaking" aspect to apply to words as well as lines (although in
>> reality the Unicode standard requires only that a non-breaking hyphen
>> prevent line breaks).
>>
>> Doug
>>
>> P.S. The main hyphen characters in Unicode are as follows:
>>
>> Hyphen or minus sign (hyphen-minus or hyphus) - U+002D
>> Soft (or discretionary) hyphen - U+00AD
>> Hyphen - U+2010
>> Non-breaking hyphen - U+2011
>> Hyphen bullet - U+2043
>>
>> See also the set of dash characters (U+2012 through U+2015), but note
>> that these are not the same size as a standard hyphen.
>>
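A minimal sketch for checking the characters listed above, assuming Python 3 and its standard unicodedata module. The names HYPHENS, DASHES and describe are made up for this illustration. It prints each code point with its official Unicode name and general category, which makes it easy to see which character a keyboard or editor actually inserted.

```python
import unicodedata

# Code points taken from the list above, plus the dash range U+2012..U+2015.
HYPHENS = [0x002D, 0x00AD, 0x2010, 0x2011, 0x2043]
DASHES = list(range(0x2012, 0x2016))


def describe(cp):
    """Return (code point label, Unicode name, general category)."""
    ch = chr(cp)
    return f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch)


for cp in HYPHENS + DASHES:
    print(*describe(cp))
```

To identify an unknown hyphen copied from a document, pass ord(ch) to describe().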
>> On Thu, Nov 5, 2009 at 9:55 PM, Birch Champeon <address@hidden> wrote:
>>> I'm working with a bunch of languages that use the hyphen within
>>> their words. The hyphen is seen as a word break in BE. Does anyone
>>> know if there is a Unicode character that looks like a hyphen but
>>> will be treated as a regular character? I've tried a non-breaking
>>> hyphen, and the spellchecker still sees it as a word break.
>>>
>>> Thanks
>>> Birch
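The tailoring discussed in this thread (treating a hyphen as part of a word instead of as a word boundary) can be illustrated with a small sketch. This is not how Pango, Gtk, enchant or Bibledit split words. It is a plain regular-expression approach in Python, and the names JOINERS, WORD_RE and split_words are hypothetical. The set of joining characters is an assumption and would need adjusting per language.

```python
import re

# Hyphen-like characters treated as word-internal (assumed for illustration):
# hyphen-minus, hyphen, non-breaking hyphen.
JOINERS = "\u002D\u2010\u2011"

# A word is a run of letters/digits, optionally continued by a single
# joiner followed by more letters/digits. A trailing or doubled hyphen
# still ends the word.
WORD_RE = re.compile(rf"[^\W_]+(?:[{re.escape(JOINERS)}][^\W_]+)*")


def split_words(text):
    """Split text into words, keeping word-internal hyphens attached."""
    return WORD_RE.findall(text)


print(split_words("ka\u2011lo is one word, a-b too"))
```

A spell checker fed with words split this way would see "a-b" as a single token, which is the behaviour asked for in the original question.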