Re: [PATCH] rl_change_case: skip over invalid mbchars
From: Grisha Levit
Subject: Re: [PATCH] rl_change_case: skip over invalid mbchars
Date: Thu, 23 May 2024 15:25:19 -0400
On Thu, May 23, 2024 at 10:25 AM Chet Ramey <chet.ramey@case.edu> wrote:
>
> On 5/21/24 2:42 PM, Grisha Levit wrote:
> > Avoid using (size_t)-1 as an offset.
>
> I can't reproduce this on macOS. Where is the code that's using -1 as an
> offset?
The loop in rl_change_case does the following:
rl_change_case(count=-1, op=2) at text.c:1483:9
1481 while (start < end)
1482 {
-> 1483 c = _rl_char_value (rl_line_buffer, start);
_rl_char_value(buf="\xc0", ind=0) at mbutil.c:493:23
491 l = strlen (buf);
492 if (ind + 1 >= l)
-> 493 return ((WCHAR_T) buf[ind]);
(wchar_t) c = L'À'
This seems questionable, since a string consisting of the lone byte \xC0 and
a string actually representing \u00C0 (\xC3\x80) both produce the same value.
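To make the ambiguity concrete, here is a minimal sketch, not readline code. Both helpers are hypothetical names introduced for illustration: byte_as_wchar imitates the bare cast (going through unsigned char, which is an assumption made here to avoid sign extension), and decode_utf8_2byte decodes a two-byte UTF-8 sequence by hand so the example does not depend on the current locale.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: the value a bare cast of one byte produces.
   The unsigned char step is an assumption of this sketch; it keeps
   0xC0 from being sign-extended where plain char is signed. */
static wchar_t
byte_as_wchar (const char *buf, size_t ind)
{
  return (wchar_t) (unsigned char) buf[ind];
}

/* Hypothetical helper: decode a two-byte UTF-8 sequence by hand.
   Returns the code point, or -1 if BUF does not start with a
   well-formed, non-overlong two-byte sequence. */
static long
decode_utf8_2byte (const char *buf)
{
  unsigned char b0 = (unsigned char) buf[0];
  unsigned char b1 = (unsigned char) buf[1];
  long cp;

  if ((b0 & 0xE0) != 0xC0 || (b1 & 0xC0) != 0x80)
    return -1;                  /* bad lead byte or missing continuation */
  cp = ((long) (b0 & 0x1F) << 6) | (long) (b1 & 0x3F);
  return (cp < 0x80) ? -1 : cp; /* reject overlong encodings */
}
```

With these, byte_as_wchar ("\xC0", 0) and decode_utf8_2byte ("\xC3\x80") both give 0xC0, while decode_utf8_2byte ("\xC0") reports the lone byte as invalid.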
The next check passes, since C is LATIN CAPITAL LETTER A WITH GRAVE, which
is alphanumeric:
rl_change_case(count=-1, op=2) at text.c:1487:28
-> 1487 if (_rl_walphabetic (c) == 0)
1488 {
1489 inword = 0;
1490 start = next;
1491 continue;
_rl_walphabetic(wc=L'À') at util.c:89:5
88 if (iswalnum (wc))
-> 89 return (1);
So we call mbrtowc on the same string position, and since this is not a
valid multibyte character, (size_t)-1 is stored in M:
rl_change_case(count=-1, op=2) at text.c:1512:22
-> 1512 m = MBRTOWC (&wc, rl_line_buffer + start, end - start,
&mps);
(size_t) m = 18446744073709551615
Then we again interpret \xC0 as if it were \u00C0:
rl_change_case(count=-1, op=2) at text.c:1514:20
1513 if (MB_INVALIDCH (m))
-> 1514 wc = (WCHAR_T)rl_line_buffer[start];
(wchar_t) wc = L'À'
We then lowercase that character and store the length of the converted
character's multibyte encoding in MLEN:
rl_change_case(count=-1, op=2) at text.c:1517:11
-> 1517 nwc = (nop == UpCase) ? _rl_to_wupper (wc) :
_rl_to_wlower (wc);
rl_change_case(count=-1, op=2) at text.c:1524:28
-> 1524 mlen = WCRTOMB (mb, nwc, &ts);
(wchar_t) nwc = L'à'
(int) mlen = 2
Since WC and NWC are different, and M (being (size_t)-1) is greater than MLEN:
rl_change_case(count=-1, op=2) at text.c:1544:13
1541 else if (m > mlen)
1542 {
1543 memcpy (s, mb, mlen);
-> 1544 memmove (s + mlen, s + m, (e - s) - m);
So the second argument to memmove, S + M, is a pointer one byte before S.
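The arithmetic behind that last step can be sketched on its own, assuming the usual conversion of the int MLEN to size_t in the comparison. All three helper names are hypothetical. clamp_mb_length shows one possible guard (treating an invalid sequence as a single byte) and is not necessarily what the patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the `m > mlen' test: the int operand converts to size_t,
   so (size_t)-1 compares greater than any small positive length. */
static int
takes_shrink_branch (size_t m, int mlen)
{
  return m > (size_t) mlen;
}

/* Mirrors the memmove length `(e - s) - m' in size_t arithmetic.
   REMAINING stands for e - s.  The subtraction wraps modulo
   SIZE_MAX + 1, so m == (size_t)-1 yields REMAINING + 1. */
static size_t
tail_length (size_t remaining, size_t m)
{
  return remaining - m;
}

/* Hypothetical guard: map the mbrtowc error returns (size_t)-1 and
   (size_t)-2 to a length of one byte so M is a usable offset. */
static size_t
clamp_mb_length (size_t m)
{
  if (m == (size_t) -1 || m == (size_t) -2)
    return 1;
  return m;
}
```

For the values in the trace, takes_shrink_branch ((size_t) -1, 2) is true, and with five bytes remaining tail_length (5, (size_t) -1) is 6, one byte more than the text that is left.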