From: Ulf Magnusson
Subject: [Bug-readline] Search with multibyte characters broken with custom rl_getc_function() (with analysis)
Date: Tue, 24 Feb 2015 05:13:08 +0100
text.c has the following comment:
  /* Bytes too short to compose character, try to wait for next byte.
     Restore the state of the byte sequence, because in this case the
     effect of mbstate is undefined. */
However, there doesn't seem to be corresponding handling for the search case.
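
For reference, the pattern that comment describes can be sketched with plain mbrtowc(). This is only an illustration under stated assumptions, not readline's code: push_byte() and select_utf8_locale() are hypothetical helpers, the locale names tried are guesses at what is commonly installed, and the decoding results depend on a UTF-8 locale being active.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: tries two common UTF-8 locale names (assumed to be
   installed on the system). Returns 1 on success, 0 otherwise. */
static int select_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "C.UTF-8") != NULL
        || setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}

/* Hypothetical helper, not part of readline. Appends one byte to buf and
   tries to decode the bytes collected so far.
   Returns 1 if buf holds a complete character (stored in *out),
   0 if more bytes are needed, -1 if the bytes are invalid. */
static int push_byte(char *buf, size_t *len, size_t cap, char byte,
                     mbstate_t *ps, wchar_t *out)
{
    mbstate_t backup = *ps;      /* snapshot before mbrtowc() touches it */
    size_t r;

    assert(*len < cap);
    buf[(*len)++] = byte;
    r = mbrtowc(out, buf, *len, ps);

    if (r == (size_t)-2) {
        /* Too few bytes: put the state back and wait for the next byte,
           so the next call decodes the whole buffer from its start. */
        *ps = backup;
        return 0;
    }
    if (r == (size_t)-1) {
        /* Invalid sequence: drop the bytes and start from a clean state. */
        memset(ps, 0, sizeof *ps);
        *len = 0;
        return -1;
    }
    return 1;
}
```

With a zeroed mbstate_t and a UTF-8 locale selected, feeding 0xC3 should return 0 (wait for more), and feeding 0xB6 afterwards should return 1. The caller is expected to reset *len to 0 after each complete character.
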
If e.g. the first byte of a UTF-8 'ö' (0xC3 0xB6) character is input during
incremental search, it ends up in the following code in input.c:
  if (_rl_get_char_len (mb, &ps) == -2)
    {
      /* Read more for multibyte character */
      RL_SETSTATE (RL_STATE_MOREINPUT);
      c = rl_read_key ();
      RL_UNSETSTATE (RL_STATE_MOREINPUT);
    }
rl_read_key() will in turn call rl_getc_function() without checking whether
any input is available. In the test case below, the custom function has no
new byte to hand out at that point and returns the previous one again, so
'mb' gets set to { 0xC3, 0xC3 } (the first byte twice), and at this point
things are clearly broken. (What happens next is that the second byte (0xB6)
is recognized as a meta character, which aborts the search.)
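
To make the failure mode concrete, here is a small standalone model of that loop. It is a sketch under assumptions, not readline code: utf8_len() is a simplified stand-in for _rl_get_char_len() that only looks at UTF-8 lead and continuation bits, stale_getc() mimics a custom rl_getc_function() that returns its last stored byte, and the check_avail flag is a hypothetical guard, not a proposed patch.

```c
#include <assert.h>
#include <stddef.h>

/* One-byte "input queue" in the style of the test case in this report. */
static unsigned char input;
static int input_avail = 0;

/* Returns the stored byte whether or not a new one has arrived. */
static int stale_getc(void)
{
    input_avail = 0;
    return input;
}

/* Simplified stand-in for _rl_get_char_len(): -2 if buf is an incomplete
   UTF-8 sequence, -1 if it is invalid, otherwise the sequence length.
   Does not reject overlong encodings. */
static int utf8_len(const unsigned char *buf, size_t n)
{
    size_t need, i;

    assert(n > 0);
    if (buf[0] < 0x80)
        return 1;
    else if ((buf[0] & 0xE0) == 0xC0)
        need = 2;
    else if ((buf[0] & 0xF0) == 0xE0)
        need = 3;
    else if ((buf[0] & 0xF8) == 0xF0)
        need = 4;
    else
        return -1;

    for (i = 1; i < n && i < need; i++)
        if ((buf[i] & 0xC0) != 0x80)
            return -1;           /* not a continuation byte */
    return n < need ? -2 : (int)need;
}

/* Model of the loop: store `first`, then keep reading while the sequence
   is incomplete. Returns the number of bytes stored in mb. */
static size_t read_mbstring(int first, unsigned char *mb, size_t cap,
                            int check_avail)
{
    size_t i = 0;
    int c = first;

    while (i < cap) {
        mb[i++] = (unsigned char)c;
        if (utf8_len(mb, i) != -2)
            break;               /* complete or invalid: stop reading */
        if (check_avail && !input_avail)
            break;               /* hypothetical guard: nothing to read yet */
        c = stale_getc();        /* unguarded: may return the old byte */
    }
    return i;
}
```

Storing 0xC3 in input, setting input_avail to 1, taking the byte once with stale_getc() and passing it to read_mbstring() with check_avail set to 0 leaves mb holding { 0xC3, 0xC3 }. With check_avail set to 1 the loop stops after the single 0xC3. A real fix in callback mode would presumably also have to remember the partial character until the next byte arrives, which this model does not attempt.
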
Below is a test case that triggers the bug. Without sending Ctrl-R, 'ö' is
displayed properly. When sending Ctrl-R, we end up with an aborted search and
broken output.

#include <locale.h>
#include <stdio.h>              /* FILE, used by readline_getc() */
#include <readline/readline.h>
#include <unistd.h>

static unsigned char input;
static int input_avail = 0;

/* Hands the stored byte to readline and marks it as consumed. */
static int readline_getc(FILE *dummy) {
    input_avail = 0;
    return input;
}

static int readline_input_avail(void) {
    return input_avail;
}

static void got_command(char *line) {}

/* Stores one byte and lets readline process it. */
static void feed_to_readline(char c) {
    input = c;
    input_avail = 1;
    rl_callback_read_char();
}

int main(void) {
    setlocale(LC_ALL, "");
    rl_getc_function = readline_getc;
    rl_input_available_hook = readline_input_avail;
    rl_callback_handler_install("> ", got_command);
    feed_to_readline('\x12'); // Ctrl-R (comment out to display 'ö' properly)
    feed_to_readline('\xC3'); // First byte of UTF-8 'ö'
    feed_to_readline('\xB6'); // Second byte of UTF-8 'ö'
    sleep(1);
    rl_callback_handler_remove();
    return 0;
}

(The actual code I discovered this in is
https://github.com/ulfalizer/readline-and-ncurses
by the way. There it suffices to just press Ctrl-R and ö.)
Here are two unrelated nits I found:
- "(*rl_redisplay_function) ();" at the end of rl_display_search() might be
redundant. rl_message() already calls it.
- _rl_isearch_callback() assigns 'c' but doesn't use it.
PS. If I'm not confused and this is a real bug, it would be nice if you
included my nick (Ulfalizer) in any attributions. Looks good when job hunting.
:)
/Ulf