
Re: [Groff] mom : unicode in .INCLUDE'd files


From: Ralph Corderoy
Subject: Re: [Groff] mom : unicode in .INCLUDE'd files
Date: Fri, 21 Jul 2017 11:30:00 +0100

Hi Erich,

> When I enter unicode, like:
>
>                          ÄÖÜ SS ÒÓÔÕŎŌ Ç äöü ß òóôõŏō ç
>
> ...and process them with pdfmom, they show up perfectly.  But if I
> include the same characters in a file with the .INCLUDE macro, they
> disappear.

Those are Unicode codepoints, but what encoding are you using to
represent them in a file as bytes?  Is it UTF-8?  Only `Ŏ', `Ō', `ŏ',
and `ō' (U+014E, U+014C, U+014F, U+014D) aren't in ISO 8859-1, AKA
Latin1.
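
One way to check which characters fit in Latin1 is to ask iconv(1) to
convert them.  This is only a sketch: `fits_latin1' is a name made up
for illustration, and it assumes an iconv (glibc or GNU libiconv) that
exits non-zero when a character has no ISO 8859-1 equivalent.  Other
implementations may substitute a placeholder character instead.

```shell
# Hypothetical helper: succeed only if the UTF-8 string in $1 can be
# represented in ISO 8859-1.  Assumes iconv fails on unconvertible input.
fits_latin1() {
    printf '%s' "$1" | iconv -f UTF-8 -t ISO-8859-1 >/dev/null 2>&1
}

for c in Ä ß Ŏ ō; do
    if fits_latin1 "$c"; then
        echo "$c: in ISO 8859-1"
    else
        echo "$c: not in ISO 8859-1"
    fi
done
```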

> Processed with -P-bcu -Tutf8, they show up like wrong encoded strings.

troff(1) reads its input as ISO 8859-1.  It sounds like, in this
particular test, you're giving it bytes of UTF-8 that it's trying to
interpret as ISO 8859-1.

U+00A3 is a `£'.  In UTF-8, it's the two bytes c2 a3;  the 0a is the
linefeed.

    $ hd <<<£
    00000000  c2 a3 0a                                          |...|

iso-8859-1(7) shows c2 is `Â' and a3 is `£' and that's how groff
interprets these bytes.

    $ groff -Tutf8 <<<£ | grep .
    Â£
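
The same misreading can be reproduced without groff.  As a rough
sketch, iconv(1) can be told to treat the UTF-8 bytes of `£' as
ISO 8859-1 and re-encode them as UTF-8 for the terminal;
`misread_as_latin1' is just an illustrative name.

```shell
# Emit the UTF-8 encoding of U+00A3 (bytes c2 a3, octal 302 243) and
# decode those bytes as if they were ISO 8859-1, one character per byte.
misread_as_latin1() {
    printf '\302\243\n' | iconv -f ISO-8859-1 -t UTF-8
}

misread_as_latin1
```

The two bytes come out as the two characters `Â£'.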

> I tried, in vain, the following pipe:
>
>     soelim example.mom | preconv -eutf8 |
>     groff -mom -Tutf8 -P-bcu  > example.txt

As Denis said, soelim(1) looks for `.so' lines.  `.INCLUDE' means
nothing to it.
http://git.savannah.gnu.org/cgit/groff.git/tree/src/preproc/soelim/soelim.cpp#n169
You could try replacing `.INCLUDE' with `.so'.
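
If editing the source isn't convenient, the replacement could be done
on the fly.  This is an untested sketch, not something from mom
itself: `include_to_so' is a made-up helper, and it assumes each
`.INCLUDE' starts its line and is followed by whitespace and a file
name.

```shell
# Hypothetical helper: rewrite mom's .INCLUDE requests as .so requests
# so that soelim(1) will act on them.  Reads the named files, or stdin.
include_to_so() {
    sed 's/^\.INCLUDE[[:space:]][[:space:]]*/.so /' "$@"
}

# Possible use, assuming example.mom and everything it includes is UTF-8:
#   include_to_so example.mom | soelim | preconv -e utf-8 |
#   groff -mom -Tutf8 -P-bcu >example.txt
```

Running preconv(1) after soelim(1) means the included text is
converted too.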

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy


