groff
[Groff] Re: preconv


From: Bruno Haible
Subject: [Groff] Re: preconv
Date: Tue, 3 Jan 2006 18:01:49 +0100

Werner Lemberg writes:

> Can you provide a complete list of encodings supported on XEmacs (the
> latest version preferred)?  I would like to mark them correctly in my
> table for reference purposes.

XEmacs 21.5.24 appears to have the following coding-systems (excluding
useless iso-2022 variants):

iso-8859-1
iso-8859-2
iso-8859-3
iso-8859-4
iso-8859-5
iso-8859-6
iso-8859-7 = greek-iso-8bit
iso-8859-8
iso-8859-8-e
iso-8859-9
iso-8859-15
iso-8859-16
koi8-r
alternativnyj
gb2312 = cn-gb-2312 = chinese-euc
hz = hz-gb-2312
big5 = cn-big5
iso-2022-jp = junet
iso-2022-jp-1978-irv = old-jis
iso-2022-jp-2
jis7
jis8
euc-jp = euc-japan = japanese-euc
shift_jis = shift-jis
iso-2022-int-1
euc-kr = euc-korea
iso-2022-kr = korean-iso-7bit-lock
tis-620 = tis620 = th-tis620 = thai-tis620
tibetan = tibetan-iso-8bit
viscii = vietnamese-viscii
vscii = vietnamese-vscii
viqr = vietnamese-viqr
devanagari = in-is13194-devanagari
lao
windows-037
windows-437
windows-500
windows-708
windows-709
windows-710
windows-720
windows-737
windows-775
windows-850
windows-852
windows-855
windows-857
windows-860
windows-861
windows-862
windows-863
windows-864
windows-865
windows-866
windows-869
windows-874
windows-875
windows-932
windows-936
windows-949
windows-950
windows-1026
windows-1200
windows-1250
windows-1251
windows-1252
windows-1253
windows-1254
windows-1255
windows-1256
windows-1257
windows-1258
windows-1361
windows-10000
windows-10001
windows-10006
windows-10007
windows-10029
windows-10079
windows-10081

> >   - EUC-JISX0213 and Shift_JISX0213 are supported by glibc and
> >     libiconv nowadays.  You can add them to the table.
>
> I suppose those encodings exist on XEmacs, right?

These encodings are not built into XEmacs; rather, they come as a
Mule-UCS add-on.

> > - In BOM_table, I would not comment out the little-endian UTF-32
> >   BOM.  It is the only way to prevent misinterpreting a file in
> >   little-endian UTF-32 as little-endian UTF-16.  You have to trust
> >   that the input file will not have NUL characters.
>
> Well, it's actually not necessary to make a difference: The `extract'
> method of the groff's `string' class removes all null bytes before
> passing the data to the function which tests for the coding tag --
> note that `check_encoding_tag' is called before `iconv'.

You are confusing me now, because check_encoding_tag looks for a
"-*- ... -*-" line, which is actually useless if a manual page is
encoded in UTF-16 or UTF-32.
It is even more confusing to see how the result of get_BOM is used:
get_BOM splits the input into 'BOM' and 'data', and later the two are
pasted together again without the 'BOM' value ever being examined.
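
To illustrate the point about the tag line, here is a small sketch (in
Python, not groff's C++ code; the helper name has_tag_marker is made up
for this example). It shows that a plain byte search for "-*-" fails on
UTF-16 or UTF-32 data, and only succeeds once the NUL bytes have been
removed, which is the step Werner describes for the `extract' method:

```python
# Illustration only; this is not how preconv is written.
TAG = "-*- coding: iso-8859-15 -*-"

ascii_bytes = TAG.encode("ascii")
utf16_bytes = TAG.encode("utf-16-le")  # each ASCII byte is followed by one NUL
utf32_bytes = TAG.encode("utf-32-le")  # each ASCII byte is followed by three NULs


def has_tag_marker(data: bytes) -> bool:
    """Hypothetical helper: naive byte-level search for the Emacs tag marker."""
    return b"-*-" in data


def strip_nuls(data: bytes) -> bytes:
    """Remove all NUL bytes, as a stand-in for the NUL removal discussed above."""
    return data.replace(b"\x00", b"")


print(has_tag_marker(ascii_bytes))              # marker is found
print(has_tag_marker(utf16_bytes))              # marker is not found
print(has_tag_marker(strip_nuls(utf16_bytes)))  # marker is found again
```

Stripping NULs only recovers the tag for text that is ASCII to begin
with; it says nothing about how the rest of the file should be decoded.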

The way I would implement it, if a BOM has been found that indicates
a particular UTF-8/16/32 variant, it would set the value of the 'encoding'
variable, without even calling 'check_encoding_tag'. After all, if you
find a UTF-16 encoded file that carries a "-*- coding: iso-8859-15 -*-"
line, the encoding is really UTF-16.
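
A minimal sketch of that order of checks, again in Python rather than
groff's C++. The names BOM_TABLE, CODING_TAG and detect_encoding, the
two-line search window, and the ISO-8859-1 fallback are all assumptions
made for this example, not preconv's actual code. The UTF-32 BOMs are
listed before the UTF-16 ones because the little-endian UTF-32 BOM
(FF FE 00 00) begins with the little-endian UTF-16 BOM (FF FE):

```python
import codecs
import re

# Order matters: test the 4-byte UTF-32 BOMs before the 2-byte UTF-16 BOMs.
BOM_TABLE = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]

# Simplified pattern for an Emacs-style "-*- coding: NAME -*-" tag.
CODING_TAG = re.compile(rb"-\*-.*?coding:\s*([A-Za-z0-9._-]+).*?-\*-")


def detect_encoding(data: bytes, default: str = "ISO-8859-1"):
    """Return (encoding, payload); a BOM takes precedence over any coding tag."""
    for bom, name in BOM_TABLE:
        if data.startswith(bom):
            # BOM found: trust it and do not look at the coding tag at all.
            return name, data[len(bom):]
    # No BOM: search the first two lines for a coding tag (assumed window).
    head = b"\n".join(data.split(b"\n", 2)[:2])
    match = CODING_TAG.search(head)
    if match:
        return match.group(1).decode("ascii"), data
    return default, data


# A UTF-16 file that claims to be iso-8859-15 is still reported as UTF-16.
sample = codecs.BOM_UTF16_LE + "-*- coding: iso-8859-15 -*-\n".encode("utf-16-le")
print(detect_encoding(sample)[0])
```

As noted earlier in this thread, the UTF-32LE test relies on the input
not starting with a NUL character: a UTF-16LE file whose first character
after the BOM is U+0000 would be taken for UTF-32LE.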

Bruno

