bug-standards

Re: [gnu.org #1363250] ASCII maintain.txt is no longer ASCII


From: Antonio Diaz Diaz
Subject: Re: [gnu.org #1363250] ASCII maintain.txt is no longer ASCII
Date: Sat, 02 Mar 2019 00:50:23 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14

Alfred M. Szmidt wrote:
    ASCII is the common subset of both UTF-8 and most single-byte charsets,
    especially ISO-8859-X.

Can you define what you mean with "pure ASCII"?

As you can see, I already defined ASCII in the message you are answering. ASCII (the common subset of both UTF-8 and ISO-8859-X) is the set of characters 0x00 to 0x7F. These are the only single-byte characters in UTF-8. Characters >= 0x80 are multibyte in UTF-8, and each of them is shown as two or more seemingly random characters on 8-bit terminals.
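
The definition above can be checked mechanically. The following is a minimal Python sketch, not taken from any GNU tool; the helper names is_pure_ascii and first_non_ascii are hypothetical and chosen only for illustration. It tests whether every byte is below 0x80 and shows that the left double quotation mark (U+201C) needs three bytes in UTF-8 while the ASCII quote needs one.

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is in the ASCII range 0x00..0x7F."""
    return all(b < 0x80 for b in data)


def first_non_ascii(data: bytes) -> int:
    """Return the offset of the first byte >= 0x80, or -1 if there is none."""
    for offset, b in enumerate(data):
        if b >= 0x80:
            return offset
    return -1


ascii_quote = '"'.encode("utf-8")        # U+0022: a single byte
curly_quote = "\u201c".encode("utf-8")   # U+201C: three bytes, all >= 0x80

print(ascii_quote.hex(), is_pure_ascii(ascii_quote))   # 22 True
print(curly_quote.hex(), is_pure_ascii(curly_quote))   # e2809c False
```

Running first_non_ascii over the bytes of a file such as maintain.txt would report where the first non-ASCII byte sits, assuming the file has been read in binary mode.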


Initially you mentioned issues viewing things on terminals.  That has
little to do with not using the eighth bit in a byte, since about one
third of the ASCII table is not viewable on dumb terminals.

I never mentioned dumb terminals, but single-byte (8-bit) terminals, like those showing ISO-8859-X. I also called them simple terminals because displaying an 8-bit charset is orders of magnitude simpler than displaying the full UTF-8 charset.

Let's see four examples. (TL;DR: multibyte UTF-8 characters look like shit on 8-bit terminals.)

This is how maintain.txt should look:

   This document uses the gender-neutral third-person pronouns "person"
(which can be shortened to "perse"), "per", "pers" and "perself." These
pronouns (aside from "perse") were promoted, and perhaps invented, by
Marge Piercy in 'Woman on the Edge of Time'.  They are used just like
"she", "her", "hers" and "herself", except that they apply regardless of
gender.  For example, "Person placed per new program under the GNU GPL,
to maintain freedom for all users of per work, and this way perse knows
perse has done the right thing."

This is how maintain.txt looks on my ISO-8859-15 terminal viewed with less:

This document uses the gender-neutral third-person pronouns â<80><9C>personâ (which can be shortened to â<80><9C>perseâ<80><9D>), â<80><9C>perâ<80><9D>, â pronouns (aside from â<80><9C>perseâ<80><9D>) were promoted, and perhaps invente Marge Piercy in â<80><98>Woman on the Edge of Timeâ<80><99>. They are used just â<80><9C>sheâ<80><9D>, â<80><9C>herâ<80><9D>, â<80><9C>hersâ<80><9D> and â<80> gender. For example, â<80><9C>Person placed per new program under the GNU GPL,
to maintain freedom for all users of per work, and this way perse knows
perse has done the right thing.â<80><9D>
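
The 'â<80><9C>' sequences in the less output can be reproduced with a short Python sketch. It assumes Python's iso8859_15 codec as a stand-in for the terminal charset, and the helper name visible is hypothetical. The three UTF-8 bytes of each curly quote are read as three separate single-byte characters: 0xE2 is 'â', and the other two bytes fall in the non-printable range 0x80..0x9F.

```python
text = "\u201cperson\u201d"           # "person" between U+201C and U+201D
raw = text.encode("utf-8")            # the bytes as stored in the UTF-8 file
as_latin9 = raw.decode("iso8859_15")  # the same bytes, one character per byte


def visible(s: str) -> str:
    """Show characters in the range 0x80..0x9F as <XX>, other characters as-is."""
    return "".join(
        f"<{ord(c):02X}>" if 0x80 <= ord(c) <= 0x9F else c for c in s
    )


print(raw.hex(" "))        # e2 80 9c 70 65 72 73 6f 6e e2 80 9d
print(visible(as_latin9))  # â<80><9C>personâ<80><9D>
```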

This is how maintain.txt looks on my ISO-8859-15 terminal viewed with ed:

   This document uses the gender-neutral third-person pronouns âpersonâ
(which can be shortened to âperseâ), âperâ, âpersâ and âperself.â These
pronouns (aside from âperseâ) were promoted, and perhaps invented, by
Marge Piercy in âWoman on the Edge of Timeâ.  They are used just like
âsheâ, âherâ, âhersâ and âherselfâ, except that they apply regardless of
gender.  For example, âPerson placed per new program under the GNU GPL,
to maintain freedom for all users of per work, and this way perse knows
perse has done the right thing.â

This is how maintain.txt looks on my ISO-8859-15 terminal viewed with moe, which shows the non-printable bytes of the multibyte UTF-8 characters:

   This document uses the gender-neutral third-person pronouns address@hidden@]
(which can be shortened to address@hidden@]), address@hidden@], address@hidden@] and address@hidden@
pronouns (aside from address@hidden@]) were promoted, and perhaps invented, by
Marge Piercy in address@hidden on the Edge of address@hidden  They are used just like
address@hidden@], address@hidden@], address@hidden@] and address@hidden@], except that they apply regar
gender.  For example, address@hidden placed per new program under the GNU GPL,
to maintain freedom for all users of per work, and this way perse knows
perse has done the right address@hidden

This is how maintain.info looks on my ISO-8859-15 terminal viewed with info:

   This document uses the gender-neutral third-person pronouns âpersonâ
(which can be shortened to âperseâ), âperâ, âpersâ and âperself.â\
 These
pronouns (aside from âperseâ) were promoted, and perhaps invented, by
Marge Piercy in âWoman on the Edge of Timeâ.  They are used just like
âsheâ, âherâ, âhersâ and âherselfâ, except that they apply rega\
rdless of
gender.  For example, âPerson placed per new program under the GNU GPL,
to maintain freedom for all users of per work, and this way perse knows
perse has done the right thing.â


    This is especially important for 'maintain.info' because it can't be
    converted (the tag table becomes incorrect). Any user of a single-byte
    terminal will need to rebuild 'maintain.info' from source (as I need to
    do to see it on one of my machines).

Why should it be converted? Info files are meant for an Info viewer,
it would be the task of the Info viewer to adjust its locale.

I know of no info viewer able to convert every UTF-8 character to a printable character in ASCII or ISO-8859-X. The info viewer on this machine (info (GNU texinfo) 4.13+) is not even able to convert the three-byte UTF-8 quotes present in maintain.info to ASCII.
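
For the four quote characters seen in the examples, the conversion itself is simple. The following is a hedged Python sketch, not what any info viewer does; the table ASCII_EQUIVALENTS and the function to_ascii_quotes are hypothetical names, and the mapping covers only these four characters.

```python
# Hypothetical mapping: only the curly quotes seen in maintain.txt.
ASCII_EQUIVALENTS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}


def to_ascii_quotes(text: str) -> str:
    """Replace each curly quote with its one-byte ASCII counterpart."""
    return text.translate(str.maketrans(ASCII_EQUIVALENTS))


sample = "\u201cper\u201d, \u201cpers\u201d and \u2018Woman on the Edge of Time\u2019"
converted = to_ascii_quotes(sample)

print(converted)
# Each replaced quote shrinks the text by two bytes.
print(len(sample.encode("utf-8")) - len(converted.encode("ascii")))   # 12
```

Every replacement shortens the file by two bytes. If the tag table of an Info file records byte offsets (an assumption here), that shift would be consistent with the remark quoted above that converting 'maintain.info' leaves the tag table incorrect.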


anything that represents a character as an 8-bit byte will handle
UTF-8 just fine

It will handle UTF-8 just fine, but will probably display it wrong, like 'address@hidden' instead of '"'.

So, please, could maintain.txt and maintain.info be encoded in ASCII again (as advertised) for maximum compatibility with UTF-8 and 8-bit terminals? Thanks.


Best regards,
Antonio.


