bug-texinfo

Re: `texindex` output depends on locale settings


From: arnold
Subject: Re: `texindex` output depends on locale settings
Date: Sun, 06 Nov 2022 06:33:33 -0700
User-agent: Heirloom mailx 12.5 7/5/10

Hi.

Thanks for the report. As written, texindex is indeed suitable only
for English; when I wrote it ~ 9 years ago, nobody said anything about
support for other languages.

I think this can be remedied, although there may be issues with
awk versions other than gawk, as most don't support Unicode or other
multibyte character sets.
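To make the multibyte issue concrete: if an index initial is taken as the first *byte* of a UTF-8 entry (as presumably happens with a byte-oriented awk, or with gawk under `LANG=C`), then "à" and "ù" both yield the byte 0xC3. That matches the single `\initial {0xc3}` group in the report quoted below. The sketch is in Python rather than awk, and the helper names `initial_bytewise` and `initial_charwise` are hypothetical. It shows that decoding with an explicitly supplied encoding, rather than relying on the locale, gives whole characters.

```python
# Hypothetical helpers, not texindex's own code: contrast a byte-oriented
# "initial" with a character-oriented one for UTF-8 index entries.

def initial_bytewise(entry: bytes) -> bytes:
    """First raw byte of the entry; bytes.upper() only changes ASCII."""
    return entry[:1].upper()


def initial_charwise(entry: bytes, encoding: str = "utf-8") -> str:
    """First character after decoding with an explicitly given encoding."""
    return entry.decode(encoding)[:1].upper()


# The four @findex entries from the test document: a, à, u, ù.
entries = ["a", "\u00e0", "u", "\u00f9"]
for word in entries:
    raw = word.encode("utf-8")
    print(word, initial_bytewise(raw), initial_charwise(raw))
```

With the bytewise version, "à" and "ù" collapse into one group headed by the lone byte 0xC3. The charwise version yields "À" and "Ù".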

Arnold

Werner LEMBERG <wl@gnu.org> wrote:

>
> [texindex (GNU texinfo) 6.8dev]
> [GNU Awk 4.2.1, API: 2.0]
> [openSUSE Leap 15.4]
>
>
> There are two bugs with texindex, making it basically unusable for
> everything except English as the main document language.  For the
> report below, here is an input file.
>
> ```
> \input texinfo.tex
>
> @documentencoding UTF-8
> @documentlanguage ca
>
> @findex a
> @findex à
> @findex u
> @findex ù
>
> @printindex fn
>
> @bye
> ```
>
> * The first, really severe bug is that the resulting output is
>   completely broken if `texindex` is called with `LANG=C`.  Saying
>
>   ```
>   LANG=C texi2pdf sort-ca.texi 
>   ```
>
>   creates the following `.fns` output
>
>   ```
>   \initial {0xc3}
>   \entry{\code {à}}{1}
>   \entry{\code {ù}}{1}
>   \initial {A}
>   \entry{\code {a}}{1}
>   \initial {U}
>   \entry{\code {u}}{1}
>   ```
>
>   As can be seen, the `\initial` line contains a single byte (where
>   '0xc3' is a real byte), which surprisingly doesn't make pdftex abort,
>   but both xetex and luatex stop with errors.  I have to use a UTF-8
>   locale like `en_US.utf8` to get decent output.
>
>   I consider it very bad that `texindex` is locale-dependent.  IMHO
>   the proper solution is to make `texinfo.tex` emit a document
>   encoding statement to the (unsorted) index file that in turn gets
>   acknowledged by `texindex`.
>
> * While `texindex` is sensitive to the locale regarding the input
>   encoding, it isn't for collation: any `LANG` or `LC_COLLATE` setting
>   gets ignored.  Similarly, it ignores the `@documentlanguage`
>   instruction to derive a sorting order.  For example, the Catalan
>   order for the above example should be `aàuù`; however, in the output
>   it is sorted as `àùau`.
>
>   The proper fix would be to make `texinfo.tex` emit a document
>   language statement to the (unsorted) index file that in turn gets
>   acknowledged by `texindex`.
>
>
>      Werner
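The ordering Werner asks for (`aàuù`, with each accented letter sorted right after its base letter) can be approximated with a two-level key: base letters first, accents second. This is a minimal Python sketch under that assumption. It is not the Unicode Collation Algorithm and knows nothing about language-specific tailoring, so a real fix along the lines proposed would presumably select a proper collation from `@documentlanguage` instead of `LANG` or `LC_COLLATE`. The function name `collation_key` is hypothetical.

```python
import unicodedata


def collation_key(entry: str) -> tuple[str, str]:
    """Simplified two-level sort key (an assumption, not real Catalan rules).

    Primary level: the entry's base letters, case-folded.
    Secondary level: the combining accent marks, so an accented letter
    sorts immediately after its unaccented base letter.
    """
    decomposed = unicodedata.normalize("NFD", entry)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    marks = "".join(c for c in decomposed if unicodedata.combining(c))
    return (base.casefold(), marks)


# The @findex entries from the test document, in scrambled order.
entries = ["\u00f9", "u", "\u00e0", "a"]
print(sorted(entries))                     # plain code-point order
print(sorted(entries, key=collation_key))  # base letter first, then accent
```

A plain code-point sort puts both accented entries last (a, u, à, ù), which is not the Catalan order either. The two-level key yields a, à, u, ù.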


