Re: [Groff] German Umlaute, ISO-8859-1, postscript.


From: Roger Leigh
Subject: Re: [Groff] German Umlaute, ISO-8859-1, postscript.
Date: Sun, 16 May 2004 22:58:29 +0100
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)

Jorgen Grahn <address@hidden> writes:

> This is maybe offtopic ... but the problem doesn't seem to be entirely your
> fault.  IMHO, either Redhat shouldn't make UTF-8 the default, or your vi
> clone is buggy. People do not expect their text files to *by default*
> contain anything other than one-octet-per-character text[0].

If you are using a UTF-8 locale, you *are* saying that UTF-8 is the
default encoding, and it's then perfectly reasonable to save the file
as UTF-8 by default.

If you edit a file encoded in ISO-8859-1, the editor should be able to
switch to this encoding just while editing the file, and preserve the
encoding while saving it.

If you use iconv(1), it should be trivial to recode it.
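
For anyone without iconv to hand, the same recoding can be sketched in
a few lines of Python.  This is only an illustration, roughly what
`iconv -f ISO-8859-1 -t UTF-8` does; the function name and the sample
bytes are made up for the example.

```python
def recode_latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret ISO-8859-1 bytes as text and re-encode them as UTF-8."""
    # Every byte value is a valid ISO-8859-1 character, so this decode
    # cannot fail; it also cannot tell you if the input was really
    # some other encoding.
    text = data.decode("iso-8859-1")
    return text.encode("utf-8")


# Hypothetical sample: a-umlaut, o-umlaut, u-umlaut and sharp s,
# one octet each in ISO-8859-1.
sample = b"\xe4\xf6\xfc\xdf"
converted = recode_latin1_to_utf8(sample)

# Each of these characters takes two octets in UTF-8.
print(len(sample), len(converted))
```

Note that the conversion is only correct if the input really is
ISO-8859-1; the decode step will happily accept any byte sequence.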

> [0] At least not yet, and not in countries where iso8859-1 is
> enough.

UTF-8 locales are usable right now.  Although I find the odd glitch,
it's 99% there.  The main problem I find is with fonts that don't
provide a Unicode HYPHEN when I view a manpage that requires it.
But that's down to my choice of terminal font.

All the major Linux distributions are switching to UTF-8 as the
default encoding.  Within the next year, there shouldn't be a single
distribution that doesn't have UTF-8 locales as the default.  And it
won't stop there.  For example, it is highly likely that in Debian,
all package control files will be UTF-8, and it may be required that
all documentation is (re-)coded in UTF-8.

Local character encodings and fonts using those encodings, such as are
common today, are going to be replaced with Unicode equivalents.  Even
if you find a particular ISO-8859 charset sufficient, these encodings
are the cause of a lot of problems.  Having a single universal
character set solves many problems we face today, and the myriad
encodings we have are essentially deprecated *right now*.

> Protocols and specific file formats yes, but not plain metadata-less
> text files.

The latter should be ASCII only.  IMO of course, but there are only
two sane encodings to use today: ASCII and UTF-8.  Anything else
should be recoded to UTF-8.

The only things holding this back are tools (such as groff) that
can't cope.  For those, you can recode on the fly, or leave the files
in the old encoding for the time being.
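
Recoding on the fly could look something like the Python sketch
below: keep the source file in UTF-8 and convert it to ISO-8859-1
just before handing it to groff.  The function names are invented for
the example, and it assumes groff is installed and that the document
only uses characters that exist in ISO-8859-1 (anything else is
replaced with "?").

```python
import subprocess


def utf8_to_latin1(data: bytes) -> bytes:
    """Convert UTF-8 bytes to ISO-8859-1 for a tool that expects the latter."""
    # Characters with no ISO-8859-1 equivalent become "?" rather than
    # aborting the conversion.
    return data.decode("utf-8").encode("iso-8859-1", errors="replace")


def run_groff(utf8_source: bytes, args=("-Tps",)) -> bytes:
    """Feed recoded input to groff on stdin and return its output.

    Assumes a groff binary on PATH; -Tps selects PostScript output.
    """
    result = subprocess.run(
        ["groff", *args],
        input=utf8_to_latin1(utf8_source),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


# The conversion step on its own: u-umlaut and sharp s shrink from
# two octets each to one.
print(utf8_to_latin1("Gr\u00fc\u00dfe".encode("utf-8")))
```

The file on disk stays UTF-8 throughout; only the stream that groff
sees is in the old encoding.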


Regards,
Roger

-- 
Roger Leigh

                Printing on GNU/Linux?  http://gimp-print.sourceforge.net/
                GPG Public Key: 0x25BFB848.  Please sign and encrypt your mail.

