bug-texinfo

Re: [PATCHes] Add basic multibyte charset handling to makeinfo


From: Miloslav Trmac
Subject: Re: [PATCHes] Add basic multibyte charset handling to makeinfo
Date: Tue, 05 Dec 2006 12:51:29 +0100
User-agent: Thunderbird 1.5.0.8 (X11/20061107)

Eli Zaretskii wrote:
>> Date: Mon, 4 Dec 2006 16:18:53 -0600
>> From: address@hidden (Karl Berry)
>> Cc: address@hidden
>>
>>     The attached patches add support for multibyte character sets (e.g.
>>     UTF-8) and multi-column characters (e.g. Chinese) to makeinfo.
> Not that I don't think this is great and don't thank Miloslav; I do.
> But can we please first discuss the problem with using the locale's
> encoding instead of @documentencoding?  Surely, we can solve that,
> can't we?
Not really.  As far as I know:
- character set names are not portable across operating systems;
- even if you know that "iso-8859-1" is an acceptable character set
  name, that does not mean a locale using that character set exists:
  $current_locale.iso-8859-1 most likely doesn't exist.

So, if we want @documentencoding, we cannot use the system locales; we
need a replacement that provides at least the equivalents of mbtowc ()
and wcwidth ().  Implementing this directly inside the texinfo sources
is completely unreasonable, and I don't think it is really practical to
make texinfo depend on some other library that provides this
functionality (ICU, maybe?).

The standalone info reader ignores the "Local Variables: coding: ..."
trailer anyway, so the assumption that info files use the system's
character set is already in place, even though makeinfo doesn't
currently rely on it.

The UNIX world basically assumes a single system-wide character set (at
the very least, a single character set must be used for the names in
the filesystem).  While technically possible, adding a character set
indication to every text file format, and character set conversion to
every program that uses the format, is not practical: it is too much
work, it adds confusing failure modes, and it breaks traditional text
manipulation tools.

Thus I prefer a model in which all info files installed on the system
use a common character set, which is the same as the character set the
system is using for other purposes (UTF-8 is the obvious candidate).  If
the .texi files in released tarballs don't use this character set,
converting them would be the distributor's task.
        Mirek



