RE: ISO 2022 in terminals


From: Robinson, Herbie
Subject: RE: ISO 2022 in terminals
Date: Fri, 4 Feb 2011 17:19:15 -0500

Some information on character set encodings that may prove helpful.

1.  ISO-2022 (aka ECMA-35,
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-035.pdf)
provides control sequences for leaving the ISO-2022 world and using a
completely different character encoding.  It's called DOCS or Designate
Other Coding System.  The other coding system may, or may not, have a
mechanism for returning to ISO-2022.  See section 15.4.2 in ECMA-35.

2.  The Unicode escape sequences are registered
(http://www.itscj.ipsj.or.jp/ISO-IR/):

        esc % G         UTF-8 (any level) with standard return
        esc % / G       UTF-8 level 1 with no standard return
        esc % / H       UTF-8 level 2 with no standard return
        esc % / I       UTF-8 level 3 with no standard return

If I read the standard right, the standard sequence for getting out of
UTF-8 and back to ISO-2022 is "esc % @" and it is supposed to restore the
designate and invoke state.  This would only be valid if the terminal
accepted "esc % G" to get into UTF-8 mode (i.e., there was no '/').
Also, note that according to the standard, ISO-2022 controls are not
going to be valid after designating another coding system without a '/'
in the escape sequence (because it's supposed to restore the state upon
return).  If there is a '/' in the escape sequence, then there is no
standard behavior at all; so, the other coding system might allow
ISO-2022 controls for designates and invokes or might not.
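
To make the byte values concrete, here is a minimal Python sketch of the
DOCS sequences listed above.  The dictionary keys and the helper name
are made up for illustration, and the return sequence is taken to be
ESC % @ (the DOCS introducer followed by final byte 4/0), which is my
reading of ECMA-35 rather than something verified against a terminal.

```python
ESC = b"\x1b"

# DOCS sequences for UTF-8 as listed in the registry excerpt above.
# The key names are hypothetical labels, not registry terminology.
DOCS_UTF8 = {
    "utf8_any_level": ESC + b"%G",    # with standard return
    "utf8_level1":    ESC + b"%/G",   # no standard return
    "utf8_level2":    ESC + b"%/H",   # no standard return
    "utf8_level3":    ESC + b"%/I",   # no standard return
}

# Assumed standard return to ISO-2022 (ESC % @).
DOCS_RETURN = ESC + b"%@"


def has_standard_return(seq: bytes) -> bool:
    """True if seq is a DOCS sequence without the '/' intermediate byte."""
    return seq.startswith(ESC + b"%") and not seq.startswith(ESC + b"%/")


def leave_sequence(enter_seq: bytes):
    """Return the bytes to send to get back to ISO-2022, or None if the
    sequence used to enter the other coding system has no standard return."""
    return DOCS_RETURN if has_standard_return(enter_seq) else None
```

With this, leave_sequence(DOCS_UTF8["utf8_any_level"]) gives the return
sequence, while the three '/' variants give None, which mirrors the "no
standard behavior at all" case.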

So as far as the standards committees are concerned, UTF-8 should really be
a separate entity.  Of course, it's anybody's guess how many people have
actually followed the standards...

So how does this relate to ncurses?  

Well, the basic concept is that one defines the terminal capabilities to
the system by telling it what kind of terminal is connected on the line.
The terminal type name is basically connected to a set of capabilities.
This was great 20 years ago when there was an actual hardware boat
anchor at the end of a 7 bit async line and the number of different
kinds of boat anchors was limited.  Now, we have software on the other
end of the line with a wide variety of implementations (an xterm on a 15
year old Sun workstation is way different from an xterm on the latest
Linux distro).  To make matters worse, all of the software emulators are
widely configurable; so, one would really need an entire family of
terminal types for one emulator!  UTF-8 support is just another example
of either being stuck using the lowest common denominator or being in
terminal type hell.

What's really needed here is some way to query the "terminal" for
capabilities.  The basic idea would be to have "optional" capabilities
in a terminfo and a way to query the terminal to see if they are
supported.  The ncurses state would start out with all the optional
capabilities off and send a query to find out what capabilities are
supported.  If the terminal answered the query, the capabilities it
claimed to support would be turned on.  I think it would be possible to
use device control strings to formulate the queries and responses
(because they are ignored if they are unsupported).  Well, I've got to
get back to what I am supposed to be working on...
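
A rough Python sketch of the query idea described above, purely as an
illustration.  Only the DCS and ST framing bytes are standard; the
"?cap;" and "!cap;" payload prefixes, the capability names and all the
function names are invented here and do not correspond to any existing
terminal or ncurses feature.

```python
ESC = "\x1b"
DCS = ESC + "P"    # Device Control String introducer (7-bit form)
ST = ESC + "\\"    # String Terminator (7-bit form)

# Hypothetical payload prefixes, invented for this sketch.
QUERY_PREFIX = "?cap;"
REPLY_PREFIX = "!cap;"


def build_query(names):
    """Wrap a list of optional capability names in a DCS ... ST string."""
    return DCS + QUERY_PREFIX + ";".join(names) + ST


def parse_reply(data):
    """Return the set of capability names claimed in a reply, or an
    empty set if the data is not a well-formed reply."""
    if not (data.startswith(DCS) and data.endswith(ST)):
        return set()
    body = data[len(DCS):-len(ST)]
    if not body.startswith(REPLY_PREFIX):
        return set()
    return {name for name in body[len(REPLY_PREFIX):].split(";") if name}


def negotiate(optional, reply):
    """Start with every optional capability off; turn on only those the
    terminal claims.  reply is None when the terminal never answered."""
    state = {name: False for name in optional}
    if reply is None:
        return state
    for name in parse_reply(reply):
        if name in state:
            state[name] = True
    return state
```

A terminal that ignores the query leaves everything off, which is the
lowest-common-denominator behavior; one that answers only gets the
capabilities it explicitly claimed.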

-----Original Message-----
From: address@hidden
[mailto:address@hidden On
Behalf Of Thomas Dickey
Sent: Thursday, February 03, 2011 9:17 PM
To: Keith Winstein
Cc: address@hidden
Subject: Re: ISO 2022 in terminals

On Thu, 3 Feb 2011, Keith Winstein wrote:

> On Fri, 4 Feb 2011, Tim Allen wrote:
>
>> On Thu, Feb 03, 2011 at 01:58:42PM -0500, Keith Winstein wrote:
>>> Is there any progress on finishing the move to UTF-8 so we can turn
>>> off the interpretation of ISO 2022 sequences, or did it turn out
>>> this was a bad idea?
>>
>> The trouble is that existing terminal emulators, and existing
>> terminal-using programs, aren't trying to target a hypothetical
>> 'best-practices terminal', they're targeting actual hardware
>> terminals that supported ISO2022 sequences (or in the case of more
>> modern attempts like gnome-terminal, targeting xterm which targets
>> actual hardware terminals). A pure UTF-8 terminal protocol is
>> certainly possible, but compatibility concerns[1] would make it
>> pretty frustrating to use.
>
> Tim, that's fair, but from my point of view we already broke
> compatibility by going to UTF-8.
>
> A "UTF-8 vt220" is already a break with the vt220 -- e.g. if you send
> a raw C1 control it will not work, since the UTF-8 is below the
> ECMA-48/vt220 layer. Not to mention a raw GR char.
>
> Apparently Markus Kuhn was hoping (well, his page still says) that the
> world would declare that a "UTF-8 vt220" also would refuse to honor
> ISO 2022 shift sequences and would require UTF-8 for everything,
> including ACS characters.
>
> Which does not seem that crazy -- if applications have to be
> locale-aware to generate C1 controls and GR characters in the proper
> encoding, why not require them to be locale-aware to generate ACS
> characters?

so it would seem - but real terminals weren't locale-sensitive...

> Sounds like it didn't work out that way, though.
>
>> If you're willing to settle for merely not having to implement ISO
>> 2022 yourself, rather than erasing it completely, have you tried luit?
>>
>>      http://invisible-island.net/luit/
>
> Yeah, implementing the vt220 shifts isn't the problem -- I just was
> hoping to be able to free the user from the possibility of getting
> locked into an alternate charset and having to type "reset".
>
> I could have it set NCURSES_NO_UTF8_ACS=1, but that won't be carried
> over an SSH connection (only TERM and sometimes LANG/LC_* are).
>
> Another question: Let's say I did make a new terminfo entry and you
> guys agreed to carry it in the terminfo database. Is there anything
> that one can put in the terminfo entry that gets this behavior
> (equivalent to NCURSES_NO_UTF8_ACS=1), or do you really just have to
> have the _name_ "linux"?

Well, it was "linux" only at first, since it was an exception.
"screen" crept in later, and then PuTTY (which is where I added
the environment variable).  I could add a terminfo flag, but
hadn't seen a limitation for the environment variable.

A terminfo flag would be only recognized by ncurses, anyway.

(none of the other curses implementations recognize ncurses' extended
features).
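
For a locally launched program, the environment-variable route can be
sketched like this in Python.  The helper name is made up for
illustration; the only real name here is NCURSES_NO_UTF8_ACS itself.
Over SSH the variable would still have to be forwarded explicitly (with
OpenSSH that means SendEnv on the client and AcceptEnv on the server),
which is the limitation Keith describes.

```python
import os


def env_with_no_utf8_acs(base=None):
    """Return a copy of the environment with NCURSES_NO_UTF8_ACS=1 set.

    base defaults to the current process environment; the mapping passed
    in is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["NCURSES_NO_UTF8_ACS"] = "1"
    return env
```

The result can be passed as the env argument of subprocess.run() when
starting an ncurses application from a terminal emulator or wrapper.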

> I tried removing the smacs and rmacs capabilities but ncurses still
> seems to send ISO 2022 escape sequences. Perhaps there could be a new
> terminfo capability that indicates that the terminal requires UTF-8
> for ACS characters and will not honor ISO 2022?
>
> Thanks for your help,
> Keith
>

-- 
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net

_______________________________________________
Bug-ncurses mailing list
address@hidden
http://lists.gnu.org/mailman/listinfo/bug-ncurses


