Re: Surrogate pairs for addwstr?
From: Bill Gray
Subject: Re: Surrogate pairs for addwstr?
Date: Mon, 11 Oct 2021 13:40:37 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0

On 10/11/21 12:05 AM, Tim Allen wrote:
> On Sun, Oct 10, 2021 at 11:38:22AM -0400, Bill Gray wrote:
>> The other way to put this would be to ask: if you're on a
>> system with 32-bit wchar_ts, what should happen for this line?
>>
>> mvaddwstr( 0, 2, L"\xd834\xdd1e Treble clef with a surrogate pair");
>
> Honestly, what I'd *expect* to happen is a compile-time or run-time
> error.

As you thought, it can't be a compile-time error in C,
because the string is not necessarily a Unicode one; other
locales are supported.
The run-time error is an interesting thought. At least in
PDCurses, addch() only fails if it can't scroll. I could
imagine a "couldn't render that character" error condition
as well. In PDCurses, that would occur within waddch() and
then cause waddstr() to return ERR.
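
The error propagation described above could look roughly like the following. This is a minimal sketch, not PDCurses code: sketch_addwch(), sketch_addwstr() and the SKETCH_OK / SKETCH_ERR constants are hypothetical stand-ins, and it assumes a 32-bit wchar_t, where a value in the surrogate range can never be a renderable character.

```c
#include <stddef.h>
#include <wchar.h>

/* Stand-ins for the curses OK / ERR return values (hypothetical names). */
enum { SKETCH_OK = 0, SKETCH_ERR = -1 };

/* Hypothetical per-character output routine. Assumes 32-bit wchar_t, so a
   code unit in the surrogate range D800..DFFF cannot be rendered. */
static int sketch_addwch(wchar_t ch)
{
    unsigned long u = (unsigned long)ch;

    if (u >= 0xD800UL && u <= 0xDFFFUL)
        return SKETCH_ERR;          /* "couldn't render that character" */
    /* A real implementation would store ch in the window here. */
    return SKETCH_OK;
}

/* The string routine stops at the first failing character and reports it. */
static int sketch_addwstr(const wchar_t *s)
{
    for (; *s != L'\0'; s++)
        if (sketch_addwch(*s) == SKETCH_ERR)
            return SKETCH_ERR;
    return SKETCH_OK;
}
```

With this shape, a caller that checks the return value of the string call learns that something in the string was rejected, though not which character.
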
I would still be arguing in favor of handling surrogates in
all cases, but your point about them still not being handled
elsewhere changed my mind. That's a tougher hurdle to get around.
And thanks for the WCHAR_MAX == 65535 pointer. I can't see
why that wouldn't work.
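
As an illustration of the WCHAR_MAX test, here is a minimal sketch that picks the right spelling of U+1D11E (the treble clef) at compile time. The name sketch_clef is hypothetical, and the code assumes that a 16-bit wchar_t holds UTF-16 code units while anything wider holds the code point directly.

```c
#include <stdint.h>   /* WCHAR_MAX */
#include <wchar.h>

#if WCHAR_MAX == 0xFFFF
/* 16-bit wchar_t: assume UTF-16, so spell the character as a surrogate pair. */
static const wchar_t sketch_clef[] = { 0xD834, 0xDD1E, 0 };
#else
/* Wider wchar_t: assume it can hold the code point as a single unit. */
static const wchar_t sketch_clef[] = { 0x1D11E, 0 };
#endif
```

Either way the array can be handed to a wide-string call such as mvaddwstr() without the caller caring which branch was compiled.
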
In re "just use UTF-8": agreed, that's yet another good reason
to do so.
Thanks! -- Bill

> Printing gibberish is never particularly helpful, but encouraging people
> to assume wide-string literals (or wide-strings in general) use UTF-16
> encoding seems like a bad idea. Sure, you can make it work transparently
> for curses, but there are other libraries (like libc) that are likely to
> get tripped up, and that seems like a foot-gun waiting to happen. Even
> if you provide a utf16towcs() helper, people are going to forget to call
> it since the input and output types are both wchar_t*.
>
> The absolute simplest and safest thing a portable program could do is to
> restrict itself to the Basic Multilingual Plane. The second simplest and
> safest thing would probably be to store strings as UTF-8 (narrow) string
> literals, and provide some kind of utf8stowcs() that decodes to UTF-16
> or to UTF-32 depending on the value of WCHAR_MAX.
>
> Tim.
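
The utf8stowcs() idea suggested above could be sketched as follows. No such function exists in libc or curses; sketch_utf8stowcs() and its signature are hypothetical. The sketch assumes a 16-bit wchar_t means UTF-16 and a wider one means UTF-32, and for brevity it does not reject overlong UTF-8 encodings.

```c
#include <stddef.h>
#include <stdint.h>   /* WCHAR_MAX */
#include <wchar.h>

/* Hypothetical helper: decode NUL-terminated UTF-8 from src into dst, which
   has room for n wchar_t units including the terminator. Returns the number
   of units written (excluding the terminator), or (size_t)-1 on malformed
   input or insufficient space. Overlong encodings are not rejected. */
static size_t sketch_utf8stowcs(wchar_t *dst, const char *src, size_t n)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t out = 0;

    if (n == 0)
        return (size_t)-1;
    while (*s) {
        unsigned long cp;
        int extra;

        /* The lead byte gives the sequence length and the top bits. */
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;       /* stray continuation or invalid byte */
        s++;
        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80)
                return (size_t)-1;    /* truncated sequence (also hits NUL) */
            cp = (cp << 6) | (unsigned long)(*s++ & 0x3F);
        }
        if (cp > 0x10FFFFUL || (cp >= 0xD800UL && cp <= 0xDFFFUL))
            return (size_t)-1;        /* out of range or encoded surrogate */
#if WCHAR_MAX == 0xFFFF
        if (cp > 0xFFFFUL) {          /* UTF-16: emit a surrogate pair */
            if (out + 2 >= n)
                return (size_t)-1;
            cp -= 0x10000UL;
            dst[out++] = (wchar_t)(0xD800UL + (cp >> 10));
            dst[out++] = (wchar_t)(0xDC00UL + (cp & 0x3FFUL));
            continue;
        }
#endif
        if (out + 1 >= n)
            return (size_t)-1;        /* no room left for the terminator */
        dst[out++] = (wchar_t)cp;
    }
    dst[out] = L'\0';
    return out;
}
```

A program would then keep its literals narrow, for example "\xF0\x9D\x84\x9E" for the treble clef, decode them once into a wchar_t buffer, and pass that buffer to mvaddwstr(). Because the input is char* and the output is wchar_t*, the compiler catches a forgotten conversion, which is the weakness noted above for a utf16towcs()-style helper.
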