Re: insstr (from ncursesw) UTF-8 issue

From: Thomas Dickey
Subject: Re: insstr (from ncursesw) UTF-8 issue
Date: Thu, 15 Oct 2009 19:34:26 -0400 (EDT)

On Thu, 15 Oct 2009, Ciprian Dorin, Craciun wrote:

   Any updates on the insstr issue?

I made a fix here:

        + add some test programs (and make these use the same special keys
          by sharing linedata.h functions):
        + correct internal _nc_insert_ch() to use _nc_insert_wch() when
          inserting wide characters, since the wins_wch() function that it used
          did not update the cursor position (report by Ciprian Craciun).

(it seems to work; I'll be out of town this Saturday and won't have time
for more than a small fix or two before then).

On Tue, Sep 8, 2009 at 3:49 AM, Thomas Dickey <address@hidden> wrote:
On Mon, 7 Sep 2009, Ciprian Dorin, Craciun wrote:

  Hello all!

  I've discovered a problem (I think it's a bug, but maybe I'm doing
something wrong) about the behavior of the insstr procedure in the
context of UTF-8 encodings. (I'm using the ncursesw library.)

  To keep the email short, I've attached a tar file that contains
all the necessary code to reproduce the bug. (Both a Python version
(from which I've started) and a C version.)
  * context.txt contains the current versions of ncursesw library,
and the result of the linking of my application;
  * bug.c and bug.py contain the source code;
  * bug.c.bash and bug.py.bash contain the commands used to run the code.

  In summary the following code does not work correctly:
      // the correct expected order (without special characters)
      mvinsstr (4, 0, "|aaist|");
      // the wrong order (with special characters)
      mvinsstr (5, 0, "|\xc4\x83\xc3\xa2\xc3\xae\xc5\x9f\xc5\xa3|");

  It should print the characters |ăâîşţ| (some Romanian diacritics),
in the same order as |aaist|, but instead it prints ||ţşîâă. It seems
to reverse the non-ASCII characters... Note that addstr works as expected.

Offhand, it looks like a bug in ncurses - I'll check/see what the fix would
be.

(ins_wstr and related functions should work, though they're more cumbersome
to use).

Thomas E. Dickey


