Re: UTF-8 - retrieving and displaying multibyte characters.
From: Chris Jones
Subject: Re: UTF-8 - retrieving and displaying multibyte characters.
Date: Wed, 10 Jun 2009 22:53:09 -0400
User-agent: Mutt/1.5.13 (2006-08-11)
On Sat, Jun 06, 2009 at 06:09:11PM EDT, Thomas Dickey wrote:
> On Sat, 6 Jun 2009, Chris Jones wrote:
[..]
> It's only minimal for the display (provided that the display uses
> POSIX characters ;-).
>
> I/O for UTF-8 does involve some changes...
I figured everything would have been transparent thanks to the magic of
the ncurses library.
> getch() will return (in effect) bytes;
> UTF-8 is (except for 0-127) a multibyte code.
I need to test further, but my getch() returns the first byte of the
UTF-8 sequence, and subsequent calls to getch() request more input from
the terminal.
> For reading UTF-8 you should be using wget_wch, which
> makes a distinction between characters and KEY_xxx codes.
With wget_wch, I do retrieve a single integer value for the multibyte
character typed at the terminal, but, presumably due to my setup or the
options in my code, the value of that integer is the "keycode", 0x20AC,
not the UTF-8 encoding.
> >I was naively expecting getch() to return 0xE282AC or maybe 0x20AC -
> >since I have coded the raw() function.
> >
> >What happened to the other two bytes, and is there a way to retrieve
> >them?
>
> wget_wch will return the 3 bytes as one character. wgetch will read
> each byte separately.
I need to run more tests.
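
To make the relationship between the two results concrete, here is a
minimal sketch of how the three bytes that wgetch delivers one at a time
(0xE2 0x82 0xAC) combine into the single value 0x20AC. The names
utf8_decoder and utf8_feed are hypothetical helpers written for this
illustration, not part of ncurses, and the sketch does not reject
overlong encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an ncurses type: state for assembling one
 * code point from bytes that arrive one at a time. */
typedef struct {
    uint32_t cp;        /* bits collected so far */
    int      remaining; /* continuation bytes still expected */
} utf8_decoder;

/* Feed one byte. Returns 1 when a complete code point has been stored
 * in *out, 0 when more bytes are needed, and -1 on a malformed byte
 * (the decoder is then ready for a fresh sequence).
 * Simplification: overlong encodings are not rejected. */
int utf8_feed(utf8_decoder *d, unsigned char byte, uint32_t *out)
{
    assert(d != NULL && out != NULL);

    if (d->remaining == 0) {
        if (byte < 0x80) {                 /* 0-127: single byte */
            *out = byte;
            return 1;
        }
        if ((byte & 0xE0) == 0xC0) {       /* 110xxxxx: 2-byte lead */
            d->cp = byte & 0x1F;
            d->remaining = 1;
        } else if ((byte & 0xF0) == 0xE0) { /* 1110xxxx: 3-byte lead */
            d->cp = byte & 0x0F;
            d->remaining = 2;
        } else if ((byte & 0xF8) == 0xF0) { /* 11110xxx: 4-byte lead */
            d->cp = byte & 0x07;
            d->remaining = 3;
        } else {
            return -1;                     /* stray continuation byte */
        }
        return 0;
    }

    if ((byte & 0xC0) != 0x80) {           /* expected 10xxxxxx */
        d->remaining = 0;
        return -1;
    }
    d->cp = (d->cp << 6) | (byte & 0x3F);  /* append 6 payload bits */
    if (--d->remaining == 0) {
        *out = d->cp;
        return 1;
    }
    return 0;
}
```

Feeding 0xE2, 0x82 and 0xAC in turn returns 0, 0 and then 1, with the
output set to 0x20AC.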
[..]
> >Does anyone have some basic code available that demonstrates how I
> >would go about writing a program that might provide a dialog such as
> >this:
> The 'A' test in ncurses' test/ncurses.c exercises wget_wch
> A = wide-character keyboard and mouse input test
It took a while, but I eventually found in the "INSTALL" file that I
needed to configure ncurses with wide-character support, via the
--enable-widec option, to make it UTF-8 aware.
Now I do have an 'A' option (capital A) when I ../run test/ncurses, and
the other wide-character-specific test programs in the test/ directory
run OK.
I haven't had time to play with it yet, but it looks like I'm getting
somewhere.
From the look of it, it would appear that the "wide" functions return
the U+ value, such as 20AC for the euro symbol, not the UTF-8 encoded
value, which means that if I want to do anything with the encoded bytes,
I would need to convert it myself to its 3-byte UTF-8 encoding.
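
A minimal sketch of that conversion, assuming the value obtained from
wget_wch is a Unicode code point (which depends on how the platform
defines wchar_t). The function name utf8_encode is hypothetical and not
part of ncurses; the standard C function wcrtomb may do the same job
when the program runs under a UTF-8 locale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an ncurses function: encode one Unicode
 * code point as UTF-8. Writes up to 4 bytes to out and returns the
 * number of bytes written, or 0 if cp is a surrogate or lies above
 * U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* outside the Unicode range */
}
```

For the euro symbol, utf8_encode(0x20AC, buf) returns 3 and fills buf
with 0xE2 0x82 0xAC.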
Would you know if there is a simple way to have the "make" step build
the test programs with debugging symbols, as in "gcc -g ncurses.c", so
that the source code is available when I run the test/* programs under
gdb?
Thanks,
CJ