From: Eduardo Bustamante
Subject: Bash doesn't handle C with acute accent properly during readline's rl_change_case
Date: Thu, 11 May 2017 07:56:06 -0500
The C with acute accent character: https://en.wikipedia.org/wiki/%C4%86
- Upper case
dualbus@debian:~$ printf '\U0106\n'
Ć
- Lower case
dualbus@debian:~$ printf '\U0107\n'
ć
Now, in bash, if you type ć and then run the readline command `upcase-word'
on it (M-u in emacs mode), instead of ending up with the UTF-8 multibyte
string for U+0106 (0xC4 0x86), you end up with 0x07 0x87.
Parameter expansion doesn't seem to have that problem, so I think it's a
bug in readline:
dualbus@debian:~/src/gnu/bash$ a=ć; echo ${a^^}
Ć
dualbus@debian:~/src/gnu/bash$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
(gdb) bt
#0 rl_change_case (count=1, op=1) at text.c:1339
#1 0x0000000000525cad in rl_upcase_word (count=1, key=117) at text.c:1304
#2 0x00000000004fe7a7 in _rl_dispatch_subseq (key=117, map=0x771d80 <emacs_meta_keymap>, got_subseq=0) at readline.c:851
#3 0x00000000004fed6f in _rl_dispatch_subseq (key=27, map=0x772d90 <emacs_standard_keymap>, got_subseq=0) at readline.c:985
#4 0x00000000004fe149 in _rl_dispatch (key=27, map=0x772d90 <emacs_standard_keymap>) at readline.c:797
#5 0x00000000004fe0b9 in readline_internal_char () at readline.c:629
#6 0x00000000004ff6a2 in readline_internal_charloop () at readline.c:656
#7 0x00000000004fda12 in readline_internal () at readline.c:670
#8 0x00000000004fd8d0 in readline (prompt=0x899ce8 "bash-4.4$ ") at readline.c:374
#9 0x000000000042cae8 in yy_readline_get () at ./parse.y:1456
#10 0x0000000000431a8b in yy_getc () at ./parse.y:1389
#11 0x0000000000432328 in shell_getc (remove_quoted_newline=1) at ./parse.y:2289
#12 0x0000000000430bb7 in read_token (command=0) at ./parse.y:3138
#13 0x000000000042c14e in yylex () at ./parse.y:2675
#14 0x0000000000428abe in yyparse () at y.tab.c:1827
#15 0x00000000004285ab in parse_command () at eval.c:294
#16 0x0000000000428392 in read_command () at eval.c:338
#17 0x0000000000428091 in reader_loop () at eval.c:140
#18 0x00000000004253bb in main (argc=1, argv=0x7fffffffe498, env=0x7fffffffe4a8) at shell.c:794
(gdb) p rl_line_buffer
$1 = 0x83a408 "ć"
(gdb) finish
Run till exit from #0 rl_change_case (count=1, op=1) at text.c:1339
0x0000000000525cad in rl_upcase_word (count=1, key=117) at text.c:1304
1304 return (rl_change_case (count, UpCase));
Value returned is $2 = 0
(gdb) p rl_line_buffer
$3 = 0x83a408 "\a\207"
For some reason, rl_change_case thinks `c` is ASCII:
(gdb) call isascii((unsigned char)c)
$8 = 1