bug-standards

Re: [gnu.org #1363250] ASCII maintain.txt is no longer ASCII


From: John Darrington
Subject: Re: [gnu.org #1363250] ASCII maintain.txt is no longer ASCII
Date: Wed, 27 Feb 2019 08:29:43 +0100
User-agent: NeoMutt/20170113 (1.7.2)

Before we go off on too many tangents, here is a bit of background for
this discussion.

* ASCII is a well-defined standard, and every ASCII text is also valid
  UTF-8 (but the converse is not true).

* The command iconv -f UTF-8 -t ASCII file will fail unless every
  character in file is already ASCII.  Hence it isn't a very useful
  command for converting such files.

* The coding standards say that we should prefer ASCII wherever
  possible.  If it is not possible, then we should use UTF-8.
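To make the first two points concrete, here is a minimal sketch in Python
(my choice of language, not something the coding standards prescribe).
The helper name is_pure_ascii is hypothetical; it performs the same kind
of strict check that makes the iconv command above fail on non-ASCII
input.

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte of data is in the 7-bit ASCII range."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


# Every ASCII string has identical bytes in ASCII and in UTF-8.
plain = "GNU maintainers"
assert plain.encode("ascii") == plain.encode("utf-8")

# The converse does not hold: valid UTF-8 need not be ASCII.
accented = "risqu\u00e9".encode("utf-8")  # the word with an e-acute

print(is_pure_ascii(plain.encode("utf-8")))  # True
print(is_pure_ascii(accented))               # False
```

A check like this only tells you whether a file is already ASCII; it does
not convert anything.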


I think that Therese is saying that some files use UTF-8 where ASCII
would have sufficed.
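As an illustration of what "would have sufficed" could mean in practice,
the sketch below replaces typographic characters with ASCII stand-ins and
reports whatever is left over.  The mapping and the name asciify are my
own assumptions for the example, not an agreed list; which replacements
are acceptable is a matter for the maintainers.

```python
from typing import Set, Tuple

# Hypothetical mapping of avoidable characters to ASCII stand-ins.
ASCII_EQUIVALENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2014": "--",   # em dash
    "\u2022": "*",    # bullet
    "\u00a9": "(C)",  # copyright sign
}


def asciify(text: str) -> Tuple[str, Set[str]]:
    """Replace mapped characters; return the new text and any
    non-ASCII characters that still remain."""
    converted = text.translate(str.maketrans(ASCII_EQUIVALENTS))
    leftover = {ch for ch in converted if ord(ch) > 127}
    return converted, leftover


sample = "\u201crisqu\u00e9\u201d \u2014 it\u2019s"
converted, leftover = asciify(sample)
print(converted)  # "risqué" -- it's
print(leftover)   # {'é'}
```

Anything reported in leftover is a character for which ASCII may not
suffice, which is the case where the standards say to use UTF-8.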

J'


On Tue, Feb 26, 2019 at 12:06:56PM -0500, Alfred M. Szmidt wrote:
        > I have noticed that maintain.txt and maintain.info[1] are no longer
        > in ASCII, but in UTF-8. In particular they contain lots of easily
        > avoidable UTF-8 quoting characters (single and double quotes) that
        > break displaying them in non-UTF-8 terminals. This is a pity because
        > the main use of such simple formats is to be displayed in simple
        > terminals.
     
     I'm not sure what is the definition of "ASCII" here, are you talking
     about "printable" characters?  In that case, the Info format has
     always contained non-printable/non-ASCII characters, most notably #o37
     for section splitting, the "#o0 #10 [" sequence for images, etc.  So
     these files have never been very readable on "simple text terminals"
     (what do you mean by that more exactly? VT100 dumb terminal?).
     
     For the text files, I think it still makes more sense to use UTF-8,
     the default locale these days on GNU/Linux is UTF-8, and many of the
     command line tools will output UTF-8 style quoting characters if that
     is so.  
     
     Could you run your files through iconv and convert them from UTF-8 to
     ASCII?  Maybe,
     
        iconv -f UTF-8 -t ASCII file...
     
        > Given that there is just one letter out of the ASCII range in
        > maintain.{txt,info} (the 'é' in 'risqué'), could it be possible to
        > keep these files as pure ASCII? Thanks.
     
     990 matches in 490 lines for "[^[:ascii:]]" in buffer: maintain.txt
     988 matches in 489 lines for "[^[:ascii:]]" in buffer: maintain.info
     
     These are mostly quotes, but you have bullets and copyright, em-dashes
     as well.
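
For anyone who wants to reproduce counts like the ones quoted above
without an editor, a rough Python equivalent is sketched below.  The
function name count_non_ascii is hypothetical, and I am assuming the
quoted "[^[:ascii:]]" search counts each non-ASCII character once, so the
totals may not match exactly.

```python
import re
from typing import Tuple

# Any character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def count_non_ascii(text: str) -> Tuple[int, int]:
    """Return (number of non-ASCII characters, number of lines
    containing at least one of them)."""
    matches = 0
    lines = 0
    for line in text.splitlines():
        found = len(NON_ASCII.findall(line))
        if found:
            matches += found
            lines += 1
    return matches, lines


sample = "a \u2018quoted\u2019 word\nplain line\n\u00a9 notice\n"
print(count_non_ascii(sample))  # (3, 2)
```

To run it on a real file, read the file as UTF-8 first, for example with
open(path, encoding="utf-8").read().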

-- 
Avoid eavesdropping.  Send strong encrypted email.
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.



