Re: Surrogate pairs for addwstr?

bug-ncurses

From:	Bill Gray
Subject:	Re: Surrogate pairs for addwstr?
Date:	Mon, 11 Oct 2021 13:40:37 -0400
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0

On 10/11/21 12:05 AM, Tim Allen wrote:

> On Sun, Oct 10, 2021 at 11:38:22AM -0400, Bill Gray wrote:
>
>>    The other way to put this would be to ask : if you're on a
>> system with 32-bit wchar_ts,  what should happen for this line?
>>
>>    mvaddwstr( 0, 2, L"\xd834\xdd1e Treble clef with a surrogate pair");
>
> Honestly, what I'd *expect* to happen is a compile-time or run-time
> error.


   As you thought,  it can't be a compile-time error in C,
because the string is not necessarily a Unicode one;  other
locales are supported.

   The run-time error is an interesting thought.  At least in
PDCurses,  addch() only fails if it can't scroll.  I could
imagine a "couldn't render that character" error condition
as well.  In PDCurses,  that would occur within waddch() and
then cause waddstr() to return ERR.
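
A minimal sketch of what such a "couldn't render that character" check might look like, written as a standalone pre-check. The names `is_surrogate()`, `check_renderable()`, `SKETCH_OK` and `SKETCH_ERR` are hypothetical and are not part of PDCurses or ncurses; the assumption is that a surrogate code unit in a wider-than-16-bit wchar_t string counts as unrenderable.

```c
#include <stddef.h>
#include <stdint.h>   /* WCHAR_MAX */
#include <wchar.h>

/* Hypothetical status codes, mirroring the OK/ERR convention of curses. */
#define SKETCH_OK    0
#define SKETCH_ERR (-1)

/* True if c lies in the UTF-16 surrogate range U+D800..U+DFFF. */
static int is_surrogate(wchar_t c)
{
    return c >= 0xD800 && c <= 0xDFFF;
}

/*
 * Hypothetical pre-render check.  Where wchar_t is wider than 16 bits,
 * a surrogate code unit is not a valid character on its own, so the
 * string is reported as unrenderable.  Where wchar_t is 16 bits,
 * surrogates are ordinary UTF-16 code units and are accepted here.
 */
static int check_renderable(const wchar_t *s)
{
#if WCHAR_MAX > 0xFFFF
    for (; *s != L'\0'; s++)
        if (is_surrogate(*s))
            return SKETCH_ERR;
#else
    (void)s;
#endif
    return SKETCH_OK;
}
```

A caller could run `check_renderable()` before handing the string to the wide-string output routine and propagate the error status if it fails.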

   I would still be arguing in favor of handling surrogates in
all cases,  but your point about them still not being handled
elsewhere changed my mind.  That's a tougher hurdle to get around.

   And thanks for the WCHAR_MAX == 65535 pointer.  I can't see
why that wouldn't work.
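
For illustration, a sketch of how a WCHAR_MAX test could select the right spelling of a non-BMP character at compile time. The macro `TREBLE_CLEF` and the array `clef_label` are made-up names, and the sketch assumes that a wchar_t wider than 16 bits holds UTF-32 code points.

```c
#include <stdint.h>   /* WCHAR_MAX */
#include <wchar.h>

/* U+1D11E MUSICAL SYMBOL G CLEF, spelled to match the width of wchar_t. */
#if WCHAR_MAX == 65535
/* 16-bit wchar_t holding UTF-16 code units: a surrogate pair is needed. */
#define TREBLE_CLEF L"\xd834\xdd1e"
#else
/* Wider wchar_t (assumed to hold UTF-32): a single code point suffices. */
#define TREBLE_CLEF L"\x1d11e"
#endif

/* Adjacent literals are concatenated after the hex escapes are processed,
 * so the escape cannot swallow the text that follows it. */
static const wchar_t clef_label[] = TREBLE_CLEF L" Treble clef";
```

With this in place, a call such as `mvaddwstr( 0, 2, clef_label)` would receive a string that is valid for either width of wchar_t.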

   In re "just use UTF8" : agreed,  yet another good reason to
do so.

Thanks!        -- Bill

> Printing gibberish is never particularly helpful, but encouraging people
> to assume wide-string literals (or wide-strings in general) use UTF-16
> encoding seems like a bad idea. Sure, you can make it work transparently
> for curses, but there are other libraries (like libc) that are likely to
> get tripped up, and that seems like a foot-gun waiting to happen. Even
> if you provide a utf16towcs() helper, people are going to forget to call
> it since the input and output types are both wchar_t*.
>
> The absolute simplest and safest thing a portable program could do is to
> restrict itself to the Basic Multilingual Plane. The second simplest and
> safest thing would probably be to store strings as UTF-8 (narrow) string
> literals, and provide some kind of utf8stowcs() that decodes to UTF-16
> or to UTF-32 depending on the value of WCHAR_MAX.
>
>
> Tim.
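
The utf8stowcs() helper suggested in the quoted text does not exist in libc or curses. A rough sketch of one possible shape follows, with the argument order borrowed from mbstowcs(). The return convention is an assumption: the number of wchar_t units written, or (size_t)-1 on malformed input or a destination that is too small.

```c
#include <stddef.h>
#include <stdint.h>   /* WCHAR_MAX, uint32_t */
#include <wchar.h>

/*
 * Hypothetical utf8stowcs(): decode NUL-terminated UTF-8 from src into
 * dst, which has room for dst_len wchar_t units including the terminator.
 * Emits UTF-16 surrogate pairs when WCHAR_MAX == 65535, otherwise one
 * wchar_t per code point.
 */
static size_t utf8stowcs(wchar_t *dst, const char *src, size_t dst_len)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t n = 0;

    while (*p != 0) {
        uint32_t cp, min_cp;
        int extra;

        /* Classify the lead byte: payload bits, continuation count,
         * and the smallest code point allowed for that length. */
        if (*p < 0x80)                { cp = *p;        extra = 0; min_cp = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; min_cp = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; min_cp = 0x800; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; min_cp = 0x10000; }
        else return (size_t)-1;       /* stray continuation or invalid lead byte */
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)
                return (size_t)-1;    /* truncated sequence (also stops at NUL) */
            cp = (cp << 6) | (uint32_t)(*p++ & 0x3F);
        }

        /* Reject overlong forms, encoded surrogates, and values past U+10FFFF. */
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

#if WCHAR_MAX == 65535
        if (cp >= 0x10000) {          /* outside the BMP: split into a pair */
            if (n + 2 >= dst_len)
                return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (wchar_t)(0xD800 + (cp >> 10));
            dst[n++] = (wchar_t)(0xDC00 + (cp & 0x3FF));
            continue;
        }
#endif
        if (n + 1 >= dst_len)
            return (size_t)-1;        /* no room for this unit plus terminator */
        dst[n++] = (wchar_t)cp;
    }

    if (dst_len == 0)
        return (size_t)-1;
    dst[n] = L'\0';
    return n;
}
```

Because the input type is char* and the output type is wchar_t*, the conversion step is hard to skip by accident, which is the advantage over a wchar_t*-to-wchar_t* helper noted in the quoted text.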
