bug-ncurses

Re: CDK selection widget and aligning text


From: Tim Allen
Subject: Re: CDK selection widget and aligning text
Date: Sat, 21 Mar 2009 23:52:36 +1100
User-agent: Mutt/1.5.18 (2008-05-17)

On Sat, Mar 21, 2009 at 04:41:18AM -0700, TheLonelyStar wrote:
> The trouble is, I use UTF-8 characters (umlauts, for example), and I don't
> know how to count the displayed string length ...

Most GUI toolkits have a 'measure text' function that tells you how wide
a particular text string will be in a particular font, after taking
different character widths into account. For ncurses, it's more
difficult since the on-screen width of a particular string is entirely
up to the terminal displaying the text - a terminal that doesn't support
UTF-8 will likely use one character-cell per byte, a terminal that
doesn't support combining characters will likely use an extra character
cell for them and so forth.
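To see why neither the byte count nor the code-point count gives the on-screen width, compare a precomposed umlaut with its decomposed form. This is a small Python illustration (Python is chosen only because the pseudocode below reads like it); the helper name `byte_and_codepoint_counts` is made up for the example.

```python
import unicodedata

def byte_and_codepoint_counts(s):
    """Return (UTF-8 byte count, code point count) for a string."""
    return len(s.encode("utf-8")), len(s)

precomposed = "\u00fc"                                  # "ü" as one code point, U+00FC
decomposed = unicodedata.normalize("NFD", precomposed)  # "u" followed by U+0308 COMBINING DIAERESIS

print(byte_and_codepoint_counts(precomposed))  # (2, 1)
print(byte_and_codepoint_counts(decomposed))   # (3, 2)
```

On a terminal that handles UTF-8 and combining characters, both strings should occupy a single cell; a terminal that uses one cell per byte would use two or three.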

I believe that in an ideal world, an algorithm for measuring the
on-screen width of a Unicode string would be something like this:

    width = 0

    for each codepoint in string:
        combining_class = unicode_combining_class(codepoint)
        if combining_class > 0:
            # This is a combining character, taking no space on its own.
            continue

        east_asian_width = unicode_east_asian_width(codepoint)
        if east_asian_width in ('F', 'W'):
            # A "full-width" or "wide" East Asian glyph
            width = width + 2
        else:
            # A normal, one-character-cell glyph
            width = width + 1

    return width

There may be additional complications: for example, if the first
character of the string is a combining character, it might take up
a cell of its own since it has nothing to combine with. Also, the above
code assumes that all characters whose East Asian Width is "A" (for
"Ambiguous") are single-width. OS X's Terminal.app terminal emulator
actually has a configuration option to treat such characters as single-
or double-width, so if that setting is not set to "single width", your
display will be corrupted and there's nothing you can do about it. This
is particularly annoying since the ACS line-drawing characters so many
programs use are marked as Ambiguous in Unicode.
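As a rough sketch, the pseudocode translates almost line for line into Python, whose standard `unicodedata` module provides both lookups: `unicodedata.combining()` returns the canonical combining class and `unicodedata.east_asian_width()` returns the East Asian Width value ('F', 'W', 'A', 'Na', 'H' or 'N'). The version below also folds in the two complications just mentioned: a leading combining character gets a cell of its own, and a hypothetical `ambiguous_is_wide` flag stands in for a terminal option like Terminal.app's. It is an estimate under those assumptions, not a description of any particular terminal. It also keeps the pseudocode's combining-class test, although some nonspacing marks have combining class 0, which is why wcwidth()-style implementations usually check the General Category (Mn, Me) instead.

```python
import unicodedata

def display_width(text, ambiguous_is_wide=False):
    """Estimate how many terminal cells `text` occupies.

    ambiguous_is_wide is a hypothetical flag mirroring a terminal setting
    (such as Terminal.app's) that draws East Asian Width "A" characters
    in two cells instead of one.
    """
    width = 0
    for index, ch in enumerate(text):
        if unicodedata.combining(ch) > 0:
            if index == 0:
                # Nothing to combine with: assume it takes a cell of its own.
                width += 1
            # Otherwise it attaches to the previous character and takes no space.
            continue

        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("F", "W") or (eaw == "A" and ambiguous_is_wide):
            # Full-width or wide glyph (or ambiguous, on a terminal that widens them).
            width += 2
        else:
            # Narrow, halfwidth, neutral, or ambiguous drawn single-width.
            width += 1
    return width
```

For example, `display_width("\u2500" * 10)` returns 10, but `display_width("\u2500" * 10, ambiguous_is_wide=True)` returns 20, which is exactly the mismatch that corrupts box-drawing output when the terminal setting and the program's assumption disagree.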

Note the hypothetical "unicode_combining_class" and
"unicode_east_asian_width" functions. These would look up the relevant
codepoint properties from the Unicode Character Database. You'll want to
read its documentation for more details:

    http://www.unicode.org/Public/4.1.0/ucd/UCD.html

The East Asian Width property is defined in more detail here:

    http://unicode.org/reports/tr11/
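To check what the database actually says about particular characters, Python's `unicodedata` module can be queried directly; its `unidata_version` attribute reports which UCD release it was built from, which will normally be newer than the 4.1.0 documentation linked above. The helper name `describe` is only for illustration.

```python
import unicodedata

def describe(ch):
    """Return the two UCD properties the width algorithm needs for one character."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "combining_class": unicodedata.combining(ch),
        "east_asian_width": unicodedata.east_asian_width(ch),
    }

print("UCD version:", unicodedata.unidata_version)
for ch in (
    "A",        # LATIN CAPITAL LETTER A: class 0, width "Na" (narrow)
    "\u0308",   # COMBINING DIAERESIS: class 230, width "A"
    "\u2500",   # BOX DRAWINGS LIGHT HORIZONTAL: class 0, width "A" (ambiguous)
    "\u4e00",   # CJK ideograph: class 0, width "W" (wide)
    "\uff21",   # FULLWIDTH LATIN CAPITAL LETTER A: class 0, width "F"
):
    print(describe(ch))
```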

There are all kinds of interesting corner cases you can get into once
you start trying to take double-width characters into account - for
example, if you print a double-width glyph in the very last column of
the terminal, xterm will wrap the entire glyph onto the next line, while
iTerm on OS X will print a "#" in the last column of the current line,
and another in the first column of the next line.
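A program that wants to predict where its output lands near the right margin has to pick one of these behaviours and model it. The hypothetical helper `wrap_cells` below models the xterm behaviour described above: a glyph that would straddle the last column is moved whole to the next row, and in this model the skipped cell is simply left unused. It takes cell widths that have already been computed and assumes the terminal is at least two columns wide; a terminal behaving like iTerm as described would need a different rule.

```python
def wrap_cells(widths, columns):
    """Assign a (row, column) position to each glyph of the given cell widths.

    A glyph that does not fit in the cells left on the current row is moved
    whole to the start of the next row (xterm-like). Assumes columns >= 2.
    """
    positions = []
    row, col = 0, 0
    for w in widths:
        if col + w > columns:
            # Not enough room left on this row: wrap the entire glyph.
            row, col = row + 1, 0
        positions.append((row, col))
        col += w
    return positions

print(wrap_cells([1, 1, 1, 2], 4))  # [(0, 0), (0, 1), (0, 2), (1, 0)]
```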
