bug#64420: string-width of … is 2 in CJK environments
From: Eli Zaretskii
Subject: bug#64420: string-width of … is 2 in CJK environments
Date: Thu, 13 Jul 2023 08:23:43 +0300
> From: Yuan Fu <casouri@gmail.com>
> Date: Wed, 12 Jul 2023 14:11:14 -0700
> Cc: Eli Zaretskii <eliz@gnu.org>,
> 64420@debbugs.gnu.org
>
> Here’s what I know: In a CJK “context”, “…” is supposed to be one ideograph
> wide (like all CJK punctuation), i.e., width=2.
>
> However, it’s not as simple as “they used the wrong font”, because both Latin
> and CJK use the same Unicode code point for “…”, but expect different glyphs.
> In publication, this is solved by manually marking the text with style or
> font, so the software uses the desired glyph. Terminals and editors don’t
> have this luxury.
>
> BTW, it’s not just ellipses: CJK and Latin share the same code points for
> quotes, em dash, and middle dot while expecting different glyphs for them.
>
> Since most terminals and editors (especially terminals) query ASCII/Latin
> fonts before falling back to CJK fonts, I expect most terminals and editors
> to show the Latin glyph for “…” (width=1) most of the time.
>
> So practically, it would be correct most of the time if we assume the
> following code points have a width of 1, regardless of locale:
>
> – HORIZONTAL ELLIPSIS …
> – LEFT/RIGHT DOUBLE QUOTATION MARK “”
> – LEFT/RIGHT SINGLE QUOTATION MARK ‘’
> – EM DASH —
> – MIDDLE DOT ·
>
> But obviously if someone configures their terminal or editor to use a CJK
> font first, these characters MIGHT have width = 2. I said MIGHT because there
> are plenty of CJK fonts that use the 1-width Latin glyph for these characters
> by default.
>
> It might be helpful to have a wrapper string-width that considers heuristics
> like this, while string-width goes strictly by Unicode and locale.
Thanks. My conclusion from the above is a bit different: we should
introduce a user option to modify the behavior of
use-cjk-char-width-table, such that users who have fonts where these
characters are not double-width could have the width of these
characters left at their Unicode values.
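
To make the proposal concrete, here is a rough sketch in Python (not Emacs Lisp, and not how Emacs computes widths). It assumes a simplified width model based only on the Unicode East Asian Width property: Wide and Fullwidth characters count as 2, Ambiguous characters count as 2 only in a CJK setting, and everything else counts as 1. The names char_width, string_width, NARROW_OVERRIDES, and the cjk and narrow_overrides parameters are hypothetical, chosen for illustration; combining and control characters are ignored.

```python
import unicodedata

# Code points listed in the quoted message. All of them have the
# East Asian Width property "A" (Ambiguous).
NARROW_OVERRIDES = frozenset("\u2026\u201c\u201d\u2018\u2019\u2014\u00b7")


def char_width(ch, cjk=False, narrow_overrides=frozenset()):
    """Return a simplified display width (1 or 2) for a single character.

    cjk: treat East Asian Ambiguous characters as double-width.
    narrow_overrides: characters kept at width 1 even when cjk is True
    (a stand-in for the user option suggested in the reply).
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2
    if eaw == "A" and cjk and ch not in narrow_overrides:
        return 2
    return 1


def string_width(s, cjk=False, narrow_overrides=frozenset()):
    """Sum the simplified widths of all characters in s."""
    return sum(char_width(ch, cjk, narrow_overrides) for ch in s)


if __name__ == "__main__":
    # Ellipsis under a CJK width table, without and with the override.
    print(string_width("\u2026", cjk=True))
    print(string_width("\u2026", cjk=True, narrow_overrides=NARROW_OVERRIDES))
```

With the override set empty, the sketch behaves like a plain CJK width table; passing NARROW_OVERRIDES leaves the listed punctuation at width 1 while genuinely wide characters such as ideographs still count as 2.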