Re: [PATCH] rl_change_case: skip over invalid mbchars
From: Grisha Levit
Subject: Re: [PATCH] rl_change_case: skip over invalid mbchars
Date: Thu, 23 May 2024 20:56:07 -0400
On Thu, May 23, 2024 at 4:11 PM Chet Ramey <chet.ramey@case.edu> wrote:
>
> On 5/23/24 3:25 PM, Grisha Levit wrote:
> > On Thu, May 23, 2024 at 10:25 AM Chet Ramey <chet.ramey@case.edu> wrote:
> >>
> >> On 5/21/24 2:42 PM, Grisha Levit wrote:
> >>> Avoid using (size_t)-1 as an offset.
> >>
> >> I can't reproduce this on macOS. Where is the code that's using -1 as an
> >> offset?
> >
> > The loop in rl_change_case does the following:
> >
> > rl_change_case(count=-1, op=2) at text.c:1483:9
> > 1481 while (start < end)
> > 1482 {
> > -> 1483 c = _rl_char_value (rl_line_buffer, start);
> >
> > _rl_char_value(buf="\xc0", ind=0) at mbutil.c:493:23
> > 491 l = strlen (buf);
> > 492 if (ind + 1 >= l)
> > -> 493 return ((WCHAR_T) buf[ind]);
> >
> > (wchar_t) c = L'À'
>
> Nope, this is where you lose me. Using lldb with an input file created
> from the string you sent, I get c = (wchar_t) L'\U0000fffd', which fails
> the rl_walphabetic test. Even running the command as you posted it just
> prints `?'. What os are you using?
I think this is lldb being too clever and showing _any_ negative wchar_t
as the Unicode replacement character:
(lldb) p (wchar_t)-1
(wchar_t) L'\U0000fffd'
(lldb) p (wchar_t)-64
(wchar_t) L'\U0000fffd'
The issue here is that on arm64 Linux, char is unsigned, so converting a
plain char in the '\x80'-'\xFF' range to wchar_t yields a valid wide
character: '\xC0', for example, becomes L'À' (U+00C0), which is the value
shown in the trace above.
(lldb) p (wchar_t)( signed char)'\xC0' == L'\u00C0'
(bool) false
(lldb) p (wchar_t)(unsigned char)'\xC0' == L'\u00C0'
(bool) true