bug-libunistring

From: Bruno Haible
Subject: Re: [bug-libunistring] Hangul Jamo vowels and trailing consonants should probably be 0 width
Date: Thu, 30 Dec 2021 01:26:45 +0100

I wrote:
>   - GNOME vte based terminal emulators are probably 50% today,
>   - konsole comes second,

So I tested how the attached file renders in gnome-terminal and
konsole.
  - In gnome-terminal the precomposed and decomposed lines render
    identically.
  - In konsole they do not, but in kate they do; therefore konsole
    will probably support this correctly as well within a few years.

Luis Javier Merino wrote:
> Yes. wcwidth() interfaces lack context. wcswidth()-style interfaces
> are better in that regard.

But if we start to modify wcswidth(), mbswidth(), and all the other
functions that evaluate the displayed length of a string so that they
consider context-dependent widths, things are going to get very complex.

> > 2) People argue about the use of these Hangul Jamo characters when
> > they form a complete Hangul syllable, and that in this case the
> > total width should be 2, and therefore, since 2 = 2 + medial + final,
> > the medial and final parts should have width 0.
> >
> > But in this case people would be using a precomposed Hangul syllable.
> 
> The Mac OS X filesystem stores filenames as NFD, which would separate
> syllables into component Jamos. See:
> 
> https://github.com/neovim/neovim/issues/4476

Indeed, this shows that the problem affects many users.
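
To make the NFD point concrete, here is a minimal sketch in Python using
the standard unicodedata module. The helper code_points() and the choice
of the syllable U+D55C are only illustrative assumptions; they do not come
from the neovim issue.

```python
import unicodedata

def code_points(text):
    """Return the code points of text as 'U+XXXX' strings."""
    return ["U+%04X" % ord(ch) for ch in text]

# One precomposed Hangul syllable (HANGUL SYLLABLE HAN).
precomposed = "\uD55C"

# NFD splits it into leading consonant + vowel + trailing consonant.
decomposed = unicodedata.normalize("NFD", precomposed)

print(code_points(precomposed))  # ['U+D55C']
print(code_points(decomposed))   # ['U+1112', 'U+1161', 'U+11AB']
```

A width function that assigns 2 to each of the three Jamos would report
6 columns for what a conforming renderer displays in 2.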

> > What I am more concerned about: When you look at the code charts
> > https://www.unicode.org/charts/PDF/U1100.pdf
> > https://www.unicode.org/charts/PDF/UD7B0.pdf
> > you see that there are glyphs.
> > - In which circumstances are these characters used individually?
> >   Maybe in a text book for Korean children?
> > - How are they supposed to be rendered in these situations? Surely
> >   as glyphs of width 2, no?
> 
> To render as separate components, there are several options:
> 
>  - Use the non-conjoining forms from the Hangul Compatibility Jamo:
> U+3130–U+318F block.

Good point. So, we can assume that texts in which the conjoining
behaviour is undesired will use these characters U+3130–U+318F.
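
The difference between the two blocks can be checked with a small Python
sketch. It only uses unicodedata.normalize(); the variable names and the
helper nfc() are illustrative, and the pair KIYEOK + A is an arbitrary
example.

```python
import unicodedata

def nfc(text):
    """Apply canonical composition (NFC) to text."""
    return unicodedata.normalize("NFC", text)

# Conjoining Jamo: HANGUL CHOSEONG KIYEOK + HANGUL JUNGSEONG A.
conjoining = "\u1100\u1161"

# Hangul Compatibility Jamo: HANGUL LETTER KIYEOK + HANGUL LETTER A.
compatibility = "\u3131\u314F"

# The conjoining pair composes into a single syllable, U+AC00.
print(nfc(conjoining) == "\uAC00")           # True

# The compatibility pair stays as two separate letters under NFC.
print(nfc(compatibility) == compatibility)   # True
```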

The only remaining argument for having Hangul Jamo vowels and trailing
consonants be marked as having width 2 is Unicode's EastAsianWidth.txt file.
However, the corresponding explanation <https://www.unicode.org/reports/tr11/>
makes it clear that the purpose of this file is to guarantee compatibility
with traditional Japanese rendering. Such rendering did not know about
the Hangul conjoining behaviour; therefore what EastAsianWidth.txt
says about these characters is irrelevant.

I'm therefore doing the requested change.

https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=8026587b94e4274f3406a36bc89348a24ea86b6a
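
As a rough illustration of the intended effect (not the code of the commit
above), here is a simplified width function in Python. The range table
ZERO_WIDTH_JAMO is an assumption taken from the two code charts cited
earlier, and char_width() / string_width() are hypothetical names. The
sketch ignores combining marks and control characters.

```python
import unicodedata

# Assumed ranges of Hangul Jamo vowels and trailing consonants,
# read off the U1100 and UD7B0 code charts (not copied from gnulib).
ZERO_WIDTH_JAMO = (
    (0x1160, 0x11FF),  # Hangul Jamo: vowels and trailing consonants
    (0xD7B0, 0xD7C6),  # Hangul Jamo Extended-B: vowels
    (0xD7CB, 0xD7FB),  # Hangul Jamo Extended-B: trailing consonants
)

def char_width(ch):
    """Return the column width of one character (simplified)."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in ZERO_WIDTH_JAMO):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def string_width(text):
    """Sum the per-character widths; no context is needed."""
    return sum(char_width(ch) for ch in text)

precomposed = "\uD55C"
decomposed = unicodedata.normalize("NFD", precomposed)
print(string_width(precomposed), string_width(decomposed))  # 2 2
```

With this table the leading consonant carries the width 2 of the whole
syllable, so a per-character function stays context-free while the
precomposed and decomposed forms get the same total.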

Bruno

Attachment: Hangul.utf-8
Description: Text document