Re: charsets and character sets (was: Re: 21.1: list-charset-chars)
From: Janusz S. Bień
Subject: Re: charsets and character sets (was: Re: 21.1: list-charset-chars)
Date: 19 Feb 2002 19:42:36 +0100
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1
I quote my letter in full because I intended to send it to emacs-devel
as well, but forgot to add that list to the addressees.
On 19 Feb 2002 address@hidden (Janusz S. Bień) wrote:
> On Mon, 18 Feb 2002 "Eli Zaretskii" <address@hidden> wrote:
>
> > > From: "Ulrich Windl" <address@hidden>
> > > Date: Mon, 18 Feb 2002 15:58:51 +0100
> > >
> > > I found out that the result of list-charset-chars (e.g. for latin15) is
> > > contrary to the documentation: Only characters > 127 are displayed, but
> > > the name and documentation creates the impression that all characters
> > > are listed.
> >
> > What led you to believe that ASCII characters with codes below 128
> > belong to the other charsets? Whatever gave you that impression is
> > the place where the documentation should be improved, because ASCII
> > characters are a separate charset in Emacs.
>
> On Tue, 19 Feb 2002 "Ulrich Windl" <address@hidden> wrote:
>
> [...]
>
> > "list charset chars": What else than listing the characters in the
> > charset could be expected?
> >
> > Regards,
> > Ulrich
>
> The Emacs documentation fails to make a clear distinction between Emacs
> charsets and character sets in the sense of ISO and related
> standards.
>
> A charset named e.g. latin15 *is not* the ISO/IEC Latin 15 character
> set; it is just its right-hand part, registered as such in the ISO
> International Register (available online) as ISO-IR 203. However, the
> iso-8859-15 *coding system* is equivalent to ISO/IEC Latin 15, cf. the
> output of `describe-coding-system':
>
> ------------------------------------------------------------------------------
> 0 -- iso-8859-15 (alias of iso-latin-9)
> ISO 2022 based 8-bit encoding for Latin-9 (MIME:ISO-8859-15)
> Type: 2 (variant of ISO-2022)
> Initial designations:
> G0 -- ascii:ASCII (ISO646 IRV)
> G1 -- latin-iso8859-15:Right-Hand Part of Latin Alphabet 9 (ISO/IEC
> 8859-15): ISO-IR-203
> -----------------------------------------------------------------------------
>
> Long, long ago I proposed renaming the charsets appropriately, but
> my suggestion was rejected and I didn't press the point. I think now
> is the right time to come back to the problem, as correct terminology
> is important for the development work.
>
> My current proposal is:
>
> - make explicit in the manuals and documentation strings that
> charsets are Emacs-specific technical terms,
>
> - add `describe-charset', analogous to `describe-coding-system', to
> minimize the chance of user confusion,
>
> - on the first convenient occasion rename `latin-15' and related
> charsets to something more adequate, e.g. `latin-no9-rp' (15 is the
> number of the part of the ISO/IEC 8859 standard which contains the
> definition of Latin alphabet number 9, while `latin-15' suggests
> Latin alphabet number 15; `rp' stands for `right-hand part of',
> which is an ISO/IEC technical term).
>
> Best regards
>
> Janusz
>
> --
> ,
> dr hab. Janusz S. Bien, prof. UW
> Prof. Janusz S. Bien, Warsaw University
> http://www.orient.uw.edu.pl/~jsbien/
> ---------------------------------------------------------------------
> Na tym koncie czytam i wysylam poczte i wiadomosci offline.
> On this account I read/post mail/news offline.
On Tue, 19 Feb 2002 "Eli Zaretskii" <address@hidden> wrote:
[...]
> > I don't have a v21 Emacs at hand in the moment, but a ISO 8859 15
> > charset is a superset of US-ASCII
>
> Not in Emacs, it isn't.
Because a charset *is not* a character set.
> The full name of latin-iso8859-15 in Emacs
> is this:
>
> "Right-Hand Part of Latin Alphabet 9 (ISO/IEC 8859-15): ISO-IR-203."
>
> See mule-conf.el for more information. The ``right-hand part'' thing
> means that characters below 128 are not included.
In other words, the charset name is not adequate.
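The split itself can be illustrated outside Emacs. What follows is only
a Python sketch under my own assumptions, not Emacs code: the function
name split_halves and the half labels are made up for the illustration.
It shows how the bytes of a text encoded in ISO 8859-15 fall into the
ASCII half and the 96-position right-hand part.

```python
# Illustrative sketch only: the names below are hypothetical, not Emacs API.

def split_halves(data: bytes):
    """Classify each byte of an 8-bit ISO 8859 text.

    Bytes below 0x80 belong to the ASCII half. Bytes from 0xA0 to 0xFF
    belong to the right-hand part and are reported by their 7-bit
    position (byte value minus 0x80). Bytes 0x80-0x9F are C1 controls
    and belong to neither half.
    """
    result = []
    for b in data:
        if b < 0x80:
            result.append(("ascii", b))
        elif b >= 0xA0:
            result.append(("right-hand part", b - 0x80))
        else:
            result.append(("c1-control", b))
    return result


# The euro sign (U+20AC) followed by the digit 5, encoded as ISO 8859-15.
encoded = "\u20ac5".encode("iso8859_15")
halves = split_halves(encoded)

# Number of positions available in the right-hand part (0xA0..0xFF).
right_hand_size = len(range(0xA0, 0x100))
```

Here the euro sign lands in the right-hand part and the digit in ASCII,
which is the situation list-charset-chars reflects when it shows only
the characters above 127.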
> What I'm asking is where would you suggest to explain this
> fundamental fact so that it becomes clear.
For example, after
-------------------------------------------------------------------------
International Character Set Support
***********************************
Emacs supports a wide variety of international character sets,
including European variants of the Latin alphabet, as well as Chinese,
Cyrillic, Devanagari (Hindi and Marathi), Ethiopic, Greek, Hebrew, IPA,
Japanese, Korean, Lao, Thai, Tibetan, and Vietnamese scripts. These
features have been merged from the modified version of Emacs known as
MULE (for "MULti-lingual Enhancement to GNU Emacs")
------------------------------------------------------------------------
add
To implement character set support, Emacs uses the notion
of a charset. For historical reasons most 8-bit character codes
are considered to consist of two separate 7-bit charsets,
namely ASCII and the so-called right-hand part of the appropriate
character code, for example...
Please note also that characters belonging to different
charsets are always different, even if they look the same: the
letter o with acute accent from Latin alphabet no 1 (charset
`latin-no1-rp', intended to be used e.g. for French) is
different from the letter o with acute accent from Latin
alphabet no 2 (charset `latin-no2-rp', intended to be used
e.g. for Polish).
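To make the last point concrete, here is a rough Python model of it. It
is an assumption-laden sketch, not the Emacs implementation: MuleChar
and decode_byte are hypothetical names, and the charset names are the
ones proposed above, not names Emacs currently uses. A character is
modelled as a pair (charset, 7-bit code).

```python
# Illustrative model only: MuleChar and decode_byte are hypothetical names.
from typing import NamedTuple


class MuleChar(NamedTuple):
    charset: str  # name of the charset the character belongs to
    code: int     # 7-bit position within that charset


def decode_byte(byte: int, rhp_charset: str) -> MuleChar:
    """Map one byte of an 8-bit text to a (charset, code) pair.

    Bytes below 0x80 go to the shared ASCII charset; all others go to
    the right-hand-part charset named by the caller.
    """
    if byte < 0x80:
        return MuleChar("ascii", byte)
    return MuleChar(rhp_charset, byte - 0x80)


# Letter o with acute accent (U+00F3) in Latin alphabet no 1 and no 2.
o_acute_latin1 = decode_byte("\u00f3".encode("iso8859_1")[0], "latin-no1-rp")
o_acute_latin2 = decode_byte("\u00f3".encode("iso8859_2")[0], "latin-no2-rp")

# Same position in both right-hand parts, yet different characters.
same_position = o_acute_latin1.code == o_acute_latin2.code
same_character = o_acute_latin1 == o_acute_latin2
```

In this model same_position is true while same_character is false,
whereas an ASCII letter decoded from either text yields the same pair.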
Best regards
Janusz
--
,
dr hab. Janusz S. Bien, prof. UW
Prof. Janusz S. Bien, Warsaw University
http://www.orient.uw.edu.pl/~jsbien/
---------------------------------------------------------------------
Na tym koncie czytam i wysylam poczte i wiadomosci offline.
On this account I read/post mail/news offline.