Re: [Bug-readline] Bug re: Unicode combining characters


From: Chet Ramey
Subject: Re: [Bug-readline] Bug re: Unicode combining characters
Date: Sat, 29 Jan 2011 20:46:53 -0500
User-agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.8) Gecko/20100802 Lightning/1.0b2 Thunderbird/3.1.2

On 1/22/11 4:17 AM, Keith Winstein wrote:
> Hello,
> 
> Readline 6.1 seems to have a problem when deleting a cell that consists of
> a Unicode base character followed by a combining character, if the next
> character in the line has the same base character.
> 
> It seems like it ignores the combining character (such as an accent), and
> takes a shortcut by deleting the cell *after* the one it's supposed to
> delete. This would be ok if the two cells really were identical, but it
> isn't if the first one has an accent attached to it and the second one
> doesn't.
> 
> How to reproduce:
> 
> (1) Run in a UTF-8 locale (like LANG=en_US.utf8) and a UTF-8 terminal
> emulator.
> 
> (2) Run readline-6.1/examples/rl
> 
> (3) Type "zyx̂xab".
> 
> (This string can be generated with perl -we '$|=1; binmode STDOUT, ":utf8";
> print "zyx"; print pack "U", 0x0302; print "xab\n"')
> 
> (4) Hit the left arrow three times.
> 
> (5) Hit the backspace key.
> 
> What you see: zyx̂ab
> 
> What you should see (and do see after Control-L): zyxab
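The reproduction steps above can be modelled outside readline. The following is a minimal Python sketch, not readline's code: `step_left` and `backspace` are hypothetical helpers, and Python's `unicodedata.combining` stands in for whatever combining-character test an editor might use. It assumes cursor motion and deletion work on whole cells (a base character plus any combining marks that follow it).

```python
import unicodedata

def step_left(buf, point):
    """Move point left by one display cell.

    Steps back one code point, then keeps going while the code point
    under the cursor is a combining mark, so it lands on the base.
    """
    point -= 1
    while point > 0 and unicodedata.combining(buf[point]):
        point -= 1
    return max(point, 0)

def backspace(buf, point):
    """Delete the whole cell that ends just before point."""
    start = step_left(buf, point)
    return buf[:start] + buf[point:], start

# Step 3: the typed line, with U+0302 (combining circumflex) after the first x.
line = "zyx\u0302xab"
point = len(line)

# Step 4: left arrow three times (past b, a, and the second x).
for _ in range(3):
    point = step_left(line, point)

# Step 5: backspace removes the accented cell, base and mark together.
line, point = backspace(line, point)
print(line)
```

Under these assumptions the buffer ends up holding the unaccented string from the report, with the cursor on the remaining x.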

Interesting.  When I run the perl command above, I get distinct characters
in the readline line buffer:

'z' 'y' 'x' '\204' '\130' 'x' 'a' 'b'

with rl_point on the second `x'.  Hitting backspace takes out the middle
two bytes, since together they form a valid character in the current
character set, leaving zyxxab.  I use the standard mbrtowc interface; it
returns a single wide character with length 2.  Using only the standard
POSIX interfaces, I'm not sure how to distinguish a combining character
from a non-combining one.
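As an illustration of what a combining-character test could look at, here is a small Python sketch. It is an assumption-laden stand-in, not a POSIX interface and not anything readline does: it uses the Unicode character database through Python's `unicodedata` module. `describe` is a hypothetical helper name. The two UTF-8 bytes of U+0302 come out as 204 and 130 in decimal, which appears to match the two middle values in the buffer dump above.

```python
import unicodedata

text = "zyx\u0302xab"
raw = text.encode("utf-8")

# U+0302 occupies two bytes in UTF-8: 0xCC 0x82 (204 and 130 in decimal).
mark_bytes = "\u0302".encode("utf-8")
print(list(mark_bytes))

def describe(ch):
    """Collect the properties that separate a combining mark from a base."""
    return {
        "char": f"U+{ord(ch):04X}",
        "utf8_len": len(ch.encode("utf-8")),
        # Canonical combining class: 0 for ordinary base characters.
        "combining_class": unicodedata.combining(ch),
        # General category: "Mn" is a nonspacing mark.
        "category": unicodedata.category(ch),
    }

info = [describe(ch) for ch in text]
for entry in info:
    print(entry)
```

The multibyte length alone (2 here) says nothing about whether the character combines; in this sketch it is the nonzero combining class or the "Mn" category that marks it.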

The fact that mbrtowc and wcwidth can't seem to tell the redisplay code
that there's a combining character is the problem.  It fools redisplay
into miscalculating where the lines actually begin to differ.
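To make the miscalculation concrete, here is a simplified Python model of comparing the old and new lines. It is only a sketch under stated assumptions: `cells`, `strip_marks`, and `first_diff` are hypothetical helpers, the "mark-blind" comparison is a guess at the effect described above rather than a transcription of readline's redisplay code, and `unicodedata.combining` again stands in for a combining-character test.

```python
import unicodedata

def cells(s):
    """Group code points into display cells: a base plus trailing marks."""
    out = []
    for ch in s:
        if out and unicodedata.combining(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out

def strip_marks(s):
    """Model a comparison that cannot see combining marks at all."""
    return [ch for ch in s if not unicodedata.combining(ch)]

def first_diff(a, b):
    """Index of the first position where two sequences differ."""
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return i
    return n

old = "zyx\u0302xab"   # what is on screen before the backspace
new = "zyxab"          # what the buffer holds afterwards

cell_aware = first_diff(cells(old), cells(new))
mark_blind = first_diff(strip_marks(old), strip_marks(new))
print(cell_aware, mark_blind)
```

In this model the cell-aware comparison reports the first difference at the accented cell, while the mark-blind one reports it one column later. Deleting a cell at that later column would leave the accented x on screen, which is consistent with the display in the report.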

(FWIW, I'm using MacOS X and Terminal.)

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    address@hidden    http://cnswww.cns.cwru.edu/~chet/
