bug-ncurses

Re: Question: How does ncurses store and handle "wide" characters?


From: Thomas Dickey
Subject: Re: Question: How does ncurses store and handle "wide" characters?
Date: Mon, 31 May 2021 09:45:14 -0400
User-agent: Mutt/1.10.1 (2018-07-13)

On Sun, May 30, 2021 at 02:08:52PM -0400, pjfarley3@earthlink.net wrote:
> This is just a question, not a bug report.
> 
> I've been trying to determine from examining the ncurses header how "wide"
> characters are actually stored by ncurses, in pursuit of analyzing
> differences between ncurses and PDCurses, because PDCurses is all that is
> available on Windows systems unless one is using msys2 or cygwin.
> 
> As far as I can tell, the cchar_t type uses a 32-bit int in which the
> low-order 8 bits are for the character, the next-higher 8 bits are for the

No, cchar_t stores an array of wchar_t values.

There's an array because some characters are designed to combine with
(e.g., overprint or otherwise modify) the base character, which is the
first item in the array.

In cchar_t, attributes and colors are stored in different fields.

chtype stores an 8-bit character, combined with attributes and colors.

(it's in the header file).

> color pair number and the high-order 16 bits are for attribute flags.  When
> I tracked down the wchar_t type in a linux system (Ubuntu 20.04) it seems to
> be defined as an "int".

actually an unsigned value.  wint_t is a signed value more/less equivalent,
but allows for a negative value which represents an error.
 
> I do not understand how the wchar_t and cchar_t types are used by ncurses to
> support wide characters, or the uses of the wint_t type.

wint_t and wchar_t are standard types defined outside ncurses.
ncurses' header file has fallback definitions for those to handle old systems.
 
> In the PDCursesMod fork of PDCurses in the docs/MANUAL.md text file there is
> a clear description of the internal storage for characters, pasted below.
> The 32-bit version sacrifices 8 attribute flag bits for a 16-bit character
> storage, while the 64-bit version stores true Unicode characters in 21 bits,
> supports all ncurses attribute bits and has space to support much larger
> numbers of color pairs.
> 
> Is there any similar documentation for how ncurses currently stores and
> handles wide characters?

I haven't found it necessary to show the bit layout, since the header file
is commented well enough :-)
 
> Peter
> 
> 
> Extract from
> https://github.com/Bill-Gray/PDCursesMod/blob/master/docs/MANUAL.md:
> 
> Text Attributes
> ===============
> 
> If CHTYPE_32 is #defined,  PDCurses uses a 32-bit integer for its chtype:
> 
>     +--------------------------------------------------------------------+
>     |31|30|29|28|27|26|25|24|23|22|21|20|19|18|17|16|15|14|13|..| 2| 1| 0|
>     +--------------------------------------------------------------------+
>           color pair        |     modifiers         |   character eg 'a'
> 
> There are 256 color pairs (8 bits), 8 bits for modifiers, and 16 bits
> for character data. The modifiers are bold, underline, right-line,
> left-line, italic, reverse and blink, plus the alternate character set
> indicator.
> 
>    By default,  a 64-bit chtype is used :

...which doesn't allow for combining-characters :-)
 
> -------------------------------------------------------------------------------
> |63|62|61|60|59|..|34|33|32|31|30|29|28|..|22|21|20|19|18|17|16|..| 3| 2| 1| 0|
> -------------------------------------------------------------------------------
>          color number   |        modifiers      |         character eg 'a'
> 
>    We take five more bits for the character (thus allowing Unicode values
> past 64K;  the full range of Unicode goes up to 0x10ffff,  requiring 21 bits
> total),  and four more bits for attributes.  Three are currently used as
> A_OVERLINE, A_DIM, and A_STRIKEOUT;  one more is reserved for future use.
> On some platforms,  bits 33-40 are used to select a color pair (can run from
> 0 to 255). Bits 41 and 42 have been added to this to get 1024 color pairs.
> On some platforms (as of 2020 May 17,  WinGUI and VT),  bits 33-52 are used,
> allowing 2^20 = 1048576 color pairs.  That should be enough for anybody, and
> leaves twelve bits for other uses.

Go take a look at the header file, and see where the bits go.

Changing that would require an ABI change, which is far more work
than you might imagine, with little benefit.

-- 
Thomas E. Dickey <dickey@invisible-island.net>
https://invisible-island.net
ftp://ftp.invisible-island.net
