Re: [Bug-readline] Bug re: Unicode combining characters
From: Chet Ramey
Subject: Re: [Bug-readline] Bug re: Unicode combining characters
Date: Sat, 29 Jan 2011 20:46:53 -0500
User-agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.8) Gecko/20100802 Lightning/1.0b2 Thunderbird/3.1.2
On 1/22/11 4:17 AM, Keith Winstein wrote:
> Hello,
>
> Readline 6.1 seems to have a problem when deleting a cell that consists of
> a Unicode base character followed by a combining character, if the next
> character in the line has the same base character.
>
> It seems like it ignores the combining character (such as an accent), and
> takes a shortcut by deleting the cell *after* the one it's supposed to
> delete. This would be ok if the two cells really were identical, but it
> isn't if the first one has an accent attached to it and the second one
> doesn't.
>
> How to reproduce:
>
> (1) Run in a UTF-8 locale (like LANG=en_US.utf8) and
> a UTF-8 terminal emulator.
>
> (2) Run readline-6.1/examples/rl
>
> (3) Type "zyx̂xab".
>
> (This string can be generated with perl -we '$|=1; binmode STDOUT, ":utf8";
> print "zyx"; print pack "U", 0x0302; print "xab\n"')
>
> (4) Hit the left arrow three times.
>
> (5) Hit the backspace key.
>
> What you see: zyx̂ab
>
> What you should see (and do see after Control-L): zyxab
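Keith's expected result treats x̂ as a single screen cell, so backspace should remove the base character together with any combining marks that follow it. A minimal C sketch of that rule over a wide-character line follows; it is not readline's code. The helper is_combining_mark is hypothetical and only recognizes the Combining Diacritical Marks block (U+0300 to U+036F), as a locale-independent stand-in for a test such as wcwidth(wc) == 0. The names prev_cell_start and rubout_cell are likewise illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for "wcwidth(wc) == 0": only the Combining
 * Diacritical Marks block U+0300..U+036F counts as combining here. */
static int is_combining_mark(wchar_t wc)
{
    return wc >= 0x0300 && wc <= 0x036F;
}

/* Index where the screen cell that ends at `point` begins: skip back
 * over any combining marks, then over the base character they modify. */
size_t prev_cell_start(const wchar_t *line, size_t point)
{
    size_t i = point;
    while (i > 0 && is_combining_mark(line[i - 1]))
        i--;
    if (i > 0)
        i--;                        /* the base character itself */
    return i;
}

/* Delete the whole cell before `point` (base character plus its
 * combining marks) in place and return the new cursor position. */
size_t rubout_cell(wchar_t *line, size_t point)
{
    size_t len = wcslen(line);
    size_t start;

    assert(point <= len);
    start = prev_cell_start(line, point);
    /* Shift the tail left, including the terminating L'\0'. */
    wmemmove(line + start, line + point, len - point + 1);
    return start;
}
```

With point on the second x (index 4), rubout_cell removes the x at index 2 and the U+0302 at index 3, leaving zyxab, which is the result Keith expects.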
Interesting. When I run the perl command above, I get these bytes
in the readline line buffer:
'z' 'y' 'x' '\204' '\130' 'x' 'a' 'b'
(the two middle values are the decimal bytes 204 and 130, i.e. 0xCC 0x82,
the UTF-8 encoding of U+0302), with rl_point on the second `x'. Hitting
backspace removes those two middle bytes, since together they form a single
valid character in the current character set, leaving zyxxab. I use the
standard mbrtowc interface; it returns a single wide character with length 2.
I'm not sure how to distinguish a combining character from a non-combining
one using only the standard Posix interfaces.
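To make the byte-level view concrete, here is a minimal sketch (not readline's code) that walks a buffer with mbrtowc(), one call per character, and records each character's byte length along with its wcwidth() value. wcwidth() is the XSI function usually consulted to spot zero-width combining marks, and what it returns for U+0302 depends on the C library and its locale data. The function name decode_chars is hypothetical, and the caller is assumed to have selected a UTF-8 locale with setlocale() first.

```c
#define _XOPEN_SOURCE 700   /* exposes wcwidth() in <wchar.h> on glibc */
#include <assert.h>
#include <locale.h>         /* callers select a UTF-8 locale via setlocale() */
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Decode `len` bytes of `buf` one character at a time with mbrtowc().
 * For each character, store the wide character, its length in bytes,
 * and its wcwidth() column count.  Returns the number of characters
 * decoded; stops early on an invalid or truncated sequence, or once
 * `max` entries have been filled. */
size_t decode_chars(const char *buf, size_t len,
                    wchar_t *wcs, size_t *lens, int *widths, size_t max)
{
    mbstate_t ps;
    size_t i = 0, n = 0;

    assert(buf != NULL || len == 0);
    memset(&ps, 0, sizeof ps);          /* initial conversion state */
    while (i < len && n < max) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, buf + i, len - i, &ps);

        if (r == (size_t)-1 || r == (size_t)-2)
            break;                      /* invalid or incomplete sequence */
        if (r == 0)
            r = 1;                      /* an embedded NUL is one byte */
        wcs[n] = wc;
        lens[n] = r;
        /* May or may not be 0 for a combining mark, depending on the
         * platform; this is the ambiguity discussed in this thread. */
        widths[n] = wcwidth(wc);
        n++;
        i += r;
    }
    return n;
}
```

In a UTF-8 locale the report's string decodes to seven characters, and the fourth, U+0302, spans two bytes, matching the single length-2 result from mbrtowc. Whether its width comes back as 0 is the platform-dependent part.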
The problem is that mbrtowc and wcwidth don't seem to tell the redisplay
code that U+0302 is a combining character. That fools redisplay into
miscalculating where the old and new lines actually begin to differ.
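One plausible way this goes wrong can be sketched as follows; this is an assumption about the mechanism, not a description of readline's redisplay code. In Keith's report the line changes from z, y, x, U+0302, x, a, b to z, y, x, a, b. Measured in wide characters, the two lines share the three-character prefix zyx, so a redraw starting after that prefix leaves the x̂ already on screen and only rewrites what follows it, producing zyx̂ab. Backing the prefix up so it never ends just before a combining mark in either line makes the redraw start at the x̂ cell instead. The classifier is_zero_width_mark is hypothetical and only covers U+0300 to U+036F; a real implementation would need something like wcwidth() returning 0, which is exactly the information reported missing here.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical classifier: treats only U+0300..U+036F as zero-width
 * combining marks, standing in for "wcwidth(wc) == 0". */
static int is_zero_width_mark(wchar_t wc)
{
    return wc >= 0x0300 && wc <= 0x036F;
}

/* Length, in wide characters, of the common prefix of two lines, backed
 * up so it never ends between a base character and a combining mark
 * that follows it in either line (i.e. it ends on a cell boundary). */
size_t common_prefix_cells(const wchar_t *old_line, const wchar_t *new_line)
{
    size_t i = 0;

    assert(old_line != NULL && new_line != NULL);
    while (old_line[i] != L'\0' && old_line[i] == new_line[i])
        i++;
    /* If either line continues with a combining mark, the character at
     * i - 1 is only part of a screen cell; back up to the cell's start. */
    while (i > 0 && (is_zero_width_mark(old_line[i]) ||
                     is_zero_width_mark(new_line[i])))
        i--;
    return i;
}
```

For the lines above this returns 2 rather than 3, so the redraw begins at the cell holding x̂. Lines without combining marks get the ordinary common prefix.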
(FWIW, I'm using MacOS X and Terminal.)
Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU address@hidden http://cnswww.cns.cwru.edu/~chet/