UTF-8 multi-byte characters are not displayed properly on Windows consoles
From: LIU Hao
Subject: UTF-8 multi-byte characters are not displayed properly on Windows consoles
Date: Thu, 12 Jan 2023 15:30:20 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2
Hello folks,
I'm a mingw-w64 developer and MSYS2 contributor, and I maintain a port of GNU nano to Windows [1]. First
of all, thank you for the great work!
Since Windows 10, the Windows console has supported UTF-8, but this has to be enabled explicitly in
the system control panel. Once UTF-8 support is enabled and the UTF-8 code page has been selected
with the `chcp 65001` command, all standard C ctype functions work on UTF-8 strings.
However, when GNU nano attempts to display a UTF-8 string, the string is processed byte by byte and
comes out as gibberish. I have created this test case as an example:
```
#include <ncursesw/ncurses.h>

int
main(void)
{
  initscr();
  addstr("»·");  // hex: C2 BB C2 B7
  refresh();
  getch();
}
```
The string literal (see the comment) contains two characters encoded as four bytes. On Linux it is displayed
properly, but on a Windows UTF-8 console I get `»·`. How should I fix it?
[1] https://github.com/lhmouse/nano-win
--
Best regards,
LIU Hao