Re: UTF-8 - retrieving and displaying multibyte characters.

bug-ncurses

From:	Chris Jones
Subject:	Re: UTF-8 - retrieving and displaying multibyte characters.
Date:	Wed, 10 Jun 2009 22:53:09 -0400
User-agent:	Mutt/1.5.13 (2006-08-11)

On Sat, Jun 06, 2009 at 06:09:11PM EDT, Thomas Dickey wrote:
> On Sat, 6 Jun 2009, Chris Jones wrote:

[..]

> It's only minimal for the display (provided that the display uses
> POSIX characters ;-).
> 
> I/O for UTF-8 does involve some changes...

I figured everything would have been transparent thanks to the magic of
the ncurses library.

> getch() will return (in effect) bytes;
> UTF-8 is (except for 0-127) a multibyte code.

I need to test further, but my getch() returns the first UTF-8 encoded
byte, and subsequent invocations of getch() request more input from the
terminal.

> For reading UTF-8 you should be using wget_wch, which
> makes a distinction between characters and KEY_xxx codes.

With wget_wch, I do retrieve a single integer value for the multibyte
character typed at the terminal. But, presumably due to my setup or the
options in my code, that value is the Unicode code point, 0x20AC, not
the UTF-8 encoding.
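To make the relationship between the two values concrete, here is a
minimal sketch in plain C (no ncurses involved) that decodes one UTF-8
sequence into its code point. The names utf8_seq_len and utf8_decode are
made up for illustration and are not part of ncurses or libc. It also
does not reject overlong encodings, so treat it as a demonstration
rather than a validating decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: length of a UTF-8 sequence judged from its
 * lead byte, or 0 if the byte cannot start a sequence. */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx: ASCII      */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx             */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx             */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx             */
    return 0;                             /* continuation/invalid */
}

/* Hypothetical helper: decode one sequence from s (n bytes available)
 * into *cp. Returns the number of bytes consumed, or 0 if the input is
 * truncated or malformed. Overlong forms are NOT detected. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    static const unsigned char lead_mask[5] = { 0, 0x7F, 0x1F, 0x0F, 0x07 };
    size_t len = n ? utf8_seq_len(s[0]) : 0;
    size_t i;
    uint32_t v;

    if (len == 0 || len > n)
        return 0;

    v = s[0] & lead_mask[len];            /* payload bits of lead byte */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)        /* must be 10xxxxxx */
            return 0;
        v = (v << 6) | (s[i] & 0x3F);     /* append 6 payload bits */
    }
    *cp = v;
    return len;
}
```

Feeding it the three bytes 0xE2 0x82 0xAC yields 0x20AC, so the integer
from wget_wch and the bytes seen one at a time through getch() are two
views of the same character.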

> >I was naively expecting getch() to return 0xE282AC or maybe 0x20AC -
> >since I have coded the raw() function.
> >
> >What happened to the other two bytes, and is there a way to retrieve
> >them?
> 
> wget_wch will return the 3 bytes as one character.  wgetch will read
> each byte separately.

I need to run more tests.

[..]

> >Does anyone have some basic code available that demonstrates how I
> >would go about writing a program that might provide a dialog such as
> >this:

> The 'A' test in ncurses' test/ncurses.c exercises wget_wch
>       A = wide-character keyboard and mouse input test

Took a while, but I eventually found in the "INSTALL" file that I
needed to configure ncurses with the --enable-widec option to get
wide-character (and thus UTF-8) support.

Now I do have an 'A' option (capital A) when I ../run test/ncurses, and
the other wide-character-specific test programs in the test/ directory
run OK.

I haven't had time to play with it yet, but it looks like I'm getting
somewhere.

From the look of it, the "wide" functions appear to return the U+ value
(such as 20AC for the euro symbol) rather than the UTF-8 encoded value.
That means that if I want to do anything with the encoded bytes, I would
need to convert the value myself to its 3-byte UTF-8 encoding.
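For reference, the conversion I have in mind could look something like
the sketch below. It is hand-rolled and untested against ncurses itself;
utf8_encode is a name I made up, not a library function. Standard C also
offers wcrtomb(), which should produce the same bytes when the program
runs under a UTF-8 locale, but the manual version shows the bit layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes up to 4 bytes into out and returns the number written,
 * or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF) /* surrogates are not characters */
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: 11110xxx + 3 continuation */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* outside the Unicode range */
}
```

Calling utf8_encode(0x20AC, buf) returns 3 and leaves 0xE2 0x82 0xAC in
buf, which is the 3-byte sequence for the euro symbol.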

Would you know if there is a simple way to have the "make" step build
with debugging symbols, as in "gcc -g ncurses.c", so that I have the
source code available when I run the test/* programs under gdb?

Thanks,

CJ
