Re: `texindex` output depends on locale settings
From: arnold
Subject: Re: `texindex` output depends on locale settings
Date: Sun, 06 Nov 2022 06:33:33 -0700
User-agent: Heirloom mailx 12.5 7/5/10

Hi.

Thanks for the report. As written, texindex is indeed suitable only
for English; when I wrote it about 9 years ago, nobody said anything
about support for other languages.

I think this can be remedied, although there may be issues with
awk versions other than gawk, as most don't support Unicode or other
multibyte character sets.

Arnold

Werner LEMBERG <wl@gnu.org> wrote:
>
> [texindex (GNU texinfo) 6.8dev]
> [GNU Awk 4.2.1, API: 2.0]
> [openSUSE Leap 15.4]
>
>
> There are two bugs with texindex, making it basically unusable for
> everything except English as the main document language. For the
> report below, here is an input file.
>
> ```
> \input texinfo.tex
>
> @documentencoding UTF-8
> @documentlanguage ca
>
> @findex a
> @findex à
> @findex u
> @findex ù
>
> @printindex fn
>
> @bye
> ```
>
> * The first, really severe bug is that the resulting output is
> completely broken if `texindex` is called with `LANG=C`. Saying
>
> ```
> LANG=C texi2pdf sort-ca.texi
> ```
>
> creates the following `.fns` output
>
> ```
> \initial {0xc3}
> \entry{\code {à}}{1}
> \entry{\code {ù}}{1}
> \initial {A}
> \entry{\code {a}}{1}
> \initial {U}
> \entry{\code {u}}{1}
> ```
>
> As can be seen, the `\initial` line contains a single byte (where
> '0xc3' is a real byte), which surprisingly doesn't make pdftex abort,
> although both xetex and luatex stop with errors. I have to use a
> UTF-8 locale like `en_US.utf8` to get decent output.
>
> I consider it very bad that `texindex` is locale-dependent. IMHO
> the proper solution is to make `texinfo.tex` emit a document
> encoding statement to the (unsorted) index file that in turn gets
> acknowledged by `texindex`.
>
> * While `texindex` is sensitive to the locale regarding the input
> encoding, it isn't for collation: any `LANG` or `LC_COLLATE` setting
> gets ignored. Similarly, it ignores the `@documentlanguage`
> instruction to derive a sorting order. For example, the Catalan
> order for the above example should be `aàuù`; however, in the output
> it is sorted as `àùau`.
>
> The proper fix would be to make `texinfo.tex` emit a document
> language statement to the (unsorted) index file that in turn gets
> acknowledged by `texindex`.
>
>
> Werner
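
The `\initial {0xc3}` line in the report is consistent with taking the
first byte, rather than the first character, of a UTF-8 encoded entry:
both `à` (C3 A0) and `ù` (C3 B9) start with the byte 0xc3. Below is a
minimal Python sketch of that difference, together with a crude
accent-aware sort key. It assumes nothing about texindex's actual awk
code; the helper names `first_byte_initial`, `first_char_initial`, and
`accent_insensitive_key` are hypothetical, and the sort key is only an
approximation, not real Catalan collation.

```python
import unicodedata

# The four index entries from the report: a, à, u, ù (precomposed forms).
ENTRIES = ["a", "\u00e0", "u", "\u00f9"]


def first_byte_initial(entry: str) -> bytes:
    """Initial as a byte-oriented tool sees it: first byte of the UTF-8 form."""
    return entry.encode("utf-8")[:1]


def first_char_initial(entry: str) -> str:
    """Initial as a character-aware tool sees it: first code point, uppercased."""
    return entry[0].upper()


def accent_insensitive_key(entry: str):
    """Crude sort key (an assumption, not CLDR collation):
    compare base letters first, then the original string as a tie-breaker."""
    decomposed = unicodedata.normalize("NFD", entry)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), entry)


if __name__ == "__main__":
    for entry in ENTRIES:
        print(repr(entry), first_byte_initial(entry), first_char_initial(entry))
    print("".join(sorted(ENTRIES, key=accent_insensitive_key)))
```

A proper fix would presumably rely on locale or CLDR collation data
selected from the document language rather than on accent stripping.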