bug-gnu-emacs

bug#64420: string-width of … is 2 in CJK environments


From: Eli Zaretskii
Subject: bug#64420: string-width of … is 2 in CJK environments
Date: Thu, 13 Jul 2023 08:23:43 +0300

> From: Yuan Fu <casouri@gmail.com>
> Date: Wed, 12 Jul 2023 14:11:14 -0700
> Cc: Eli Zaretskii <eliz@gnu.org>,
>  64420@debbugs.gnu.org
> 
> Here’s what I know: In a CJK “context”, “…” is supposed to be one ideograph
> wide (like all CJK punctuation), i.e., width=2.
> 
> However, it’s not as simple as “they used the wrong font”, because both Latin 
> and CJK use the same Unicode code point for “…”, but expect different glyphs. 
> In publication, this is solved by manually marking the text with style or 
> font, so the software uses the desired glyph. Terminals and editors don’t 
> have this luxury.
> 
> BTW, it’s not just ellipses: CJK and Latin share the same code points for
> quotes, em dash and middle dot while expecting different glyphs for them.
> 
> Since most terminals and editors (especially terminals) query ASCII/Latin
> fonts before falling back to CJK fonts, I expect most terminals and editors
> to show the Latin glyph for “…” (width=1) most of the time.
> 
> So practically, it would be correct most of the time if we assume the 
> following code points have a width of 1, regardless of locale:
> 
> – HORIZONTAL ELLIPSIS …
> – LEFT/RIGHT DOUBLE QUOTATION MARK “”
> – LEFT/RIGHT SINGLE QUOTATION MARK ‘’
> – EM DASH —
> – MIDDLE DOT ·
> 
> But obviously if someone configures their terminal or editor to use a CJK
> font first, these characters MIGHT have width = 2. I said MIGHT because there
> are plenty of CJK fonts that use the 1-width Latin glyph for these characters
> by default.
> 
> It might be helpful to have a wrapper around string-width that considers
> heuristics like this, while string-width itself goes strictly by Unicode and
> locale.

Thanks.  My conclusion from the above is a bit different: we should
introduce a user option to modify the behavior of
use-cjk-char-width-table, such that users who have fonts where these
characters are not double-width could have the width of these
characters left at their Unicode values.
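
To make the proposal concrete, here is a rough sketch in Python (not Emacs
Lisp, and not how Emacs implements its width tables).  It assumes a simplified
model in which width depends only on the Unicode East_Asian_Width property:
Wide and Fullwidth characters take 2 columns, and Ambiguous characters take 2
columns only in a CJK environment.  The names NARROW_OVERRIDES, char_width,
string_width and the narrow_overrides parameter are hypothetical, chosen for
illustration; the override set stands in for the suggested user option.

```python
import unicodedata

# Hypothetical override set: the code points listed in the quoted message.
# All of them have East_Asian_Width "A" (Ambiguous) in Unicode.
NARROW_OVERRIDES = frozenset(
    "\u2026"        # HORIZONTAL ELLIPSIS
    "\u201c\u201d"  # LEFT/RIGHT DOUBLE QUOTATION MARK
    "\u2018\u2019"  # LEFT/RIGHT SINGLE QUOTATION MARK
    "\u2014"        # EM DASH
    "\u00b7"        # MIDDLE DOT
)


def char_width(ch, cjk=False, narrow_overrides=frozenset()):
    """Column width of one character under the simplified model."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2  # Wide and Fullwidth are always 2 columns
    if eaw == "A" and cjk and ch not in narrow_overrides:
        return 2  # Ambiguous is widened only in a CJK environment
    return 1      # everything else, including overridden characters


def string_width(s, cjk=False, narrow_overrides=frozenset()):
    """Sum of the per-character widths of S."""
    return sum(char_width(ch, cjk, narrow_overrides) for ch in s)


if __name__ == "__main__":
    print(string_width("\u2026"))                 # non-CJK environment
    print(string_width("\u2026", cjk=True))       # CJK, no override
    print(string_width("\u2026", cjk=True,
                       narrow_overrides=NARROW_OVERRIDES))
```

With an empty override set the CJK behavior is unchanged, so in this model
the option only matters to users whose fonts render these characters narrow.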




