groff

Re: Not predefined Extended Latin character needed, interesting solution


From: Oliver Corff
Subject: Re: Not predefined Extended Latin character needed, interesting solution found
Date: Mon, 17 May 2021 15:47:02 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

Hi Ingo,

that's interesting. When producing output for a UTF-8 target, your
observation is correct, but for PDF output groff does not behave as one
would naively expect.

When I write U+010C (whether as a literal character or in escape form
makes no difference), my installation produces an "Ä" (A umlaut).

Try

printf '\xc4\x8cSSR' | groff -kT pdf > Ae.pdf

I get the following warning:

troff: <standard input>:1: warning: can't find special character 'u0043_030C'

and the PDF shows "ÄSSR".
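
As a sanity check on the input itself, the following sketch confirms that
the two bytes in the printf command above are the UTF-8 encoding of U+010C,
and shows what the first byte alone becomes when read as ISO-8859-1. It
assumes a POSIX shell with od, tr and iconv available; the helper name
to_hex is made up for illustration. It does not establish why the PDF
device prints "Ä"; it only shows that 0xC4 happens to be "Ä" in Latin-1.

```shell
#!/bin/sh
# to_hex (hypothetical helper): print stdin as lowercase hex digits, no separators.
to_hex() {
    od -An -tx1 | tr -d ' \n'
}

# Encode U+010C (UTF-16BE bytes 0x01 0x0C, written as octal escapes) to UTF-8.
utf8_bytes=$(printf '\001\014' | iconv -f UTF-16BE -t UTF-8 | to_hex)

# Read the single byte 0xC4 (octal \304) as ISO-8859-1 and re-encode it as UTF-8.
latin1_as_utf8=$(printf '\304' | iconv -f ISO-8859-1 -t UTF-8 | to_hex)

echo "U+010C in UTF-8:           $utf8_bytes"
echo "0xC4 as Latin-1, in UTF-8: $latin1_as_utf8"
```

The second value, c3 84, is the UTF-8 encoding of U+00C4 (Ä). Octal escapes
are used instead of \x escapes because not every sh printf understands \x.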

My system: Linux fedora 5.11.15-200.fc33, groff 1.22.4, both in default
installation out of the box.

Anyway, for my purpose .AM solves the problem. Is it possible to include
that in the man pages of the groff system? I only found it online, as
indicated in my original post.

Best regards,

Oliver.


On 17/05/2021 15:35, Ingo Schwarze wrote:
Hi Oliver,

Oliver Corff wrote on Sat, May 15, 2021 at 11:39:31PM +0200:

I try to use the correct abbreviation for the former Czechoslovak
Socialist Republic, which is U+010C SSR (C + hacek, caron, wedge).
The first attempt (enter Unicode 0x010C directly, leaving everything to
preconv(1)) did not work.

Works for me:

    $ printf '\xc4\x8cSSR' | mandoc
    $ printf '\xc4\x8cSSR' | groff -kT utf8

Both commands above produce the expected output for me (OpenBSD-current
with no fancy configuration changes, just using the default installation).

00000000  c4 8c 53 53 52 0a 0a 0a  0a 0a 0a 0a 0a 0a 0a 0a  |..SSR...........|
00000010  0a 0a 0a 0a 0a 0a 0a 0a  0a 0a 0a 0a 0a 0a 0a 0a  |................|


Then I consulted groff_char(7) but there is no
predefined \[vC], only \[vS] etc. for base letters s, S, z and Z. No C!
I keep scratching my head.

Works for me:

    $ printf '\\[u010C]SSR' | mandoc
    $ printf '\\[u010C]SSR' | groff -T utf8

Both commands above produce the expected output; specifically:

    $ printf '\\[u010C]SSR' | groff -T utf8 | hexdump -C
   00000000 c4 8c 53 53 52 0a 0a 0a  0a 0a 0a 0a 0a 0a 0a 0a |..SSR...........|
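
To compare results across installations without reading hexdumps by eye,
a small helper along the following lines could be used. This is only a
sketch: the function names first_bytes_hex and is_ccaron_utf8 are invented
here, and it assumes a POSIX shell with od and tr. The block tests itself
on known input so that it runs without groff installed.

```shell
#!/bin/sh
# first_bytes_hex N (hypothetical): print the first N bytes of stdin as hex.
first_bytes_hex() {
    od -An -tx1 -N "$1" | tr -d ' \n'
}

# is_ccaron_utf8 (hypothetical): succeed if stdin begins with c4 8c,
# the UTF-8 encoding of U+010C.
is_ccaron_utf8() {
    [ "$(first_bytes_hex 2)" = "c48c" ]
}

# Self-check on known input (octal \304\214 are the bytes 0xC4 0x8C).
if printf '\304\214SSR\n' | is_ccaron_utf8; then
    result=ok
else
    result=mismatch
fi
echo "$result"
```

With the functions defined, the command from the transcript could be piped
into the check:

    $ printf '\\[u010C]SSR' | groff -T utf8 | is_ccaron_utf8 && echo ok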


None of the other suggested notations (like \[u0043_030C]) work out of
the box (see groff(7)).

Mandoc doesn't support that syntax, but with groff, even that works for me:

    $ printf '\\[u0043_030C]SSR' | mandoc -T lint
   mandoc: <stdin>:1:1: WARNING: invalid escape sequence: \[u0043_030C]
    $ printf '\\[u0043_030C]SSR' | groff -T utf8 | hexdump -C
   00000000 c4 8c 53 53 52 0a 0a 0a  0a 0a 0a 0a 0a 0a 0a 0a |..SSR...........|
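
The name u0043_030C spells out a decomposed sequence: the base letter
U+0043 (C) followed by the combining caron U+030C. The sketch below decodes
both the decomposed and the precomposed UTF-8 input into code points so the
correspondence is visible. It assumes a POSIX shell with iconv, od and tr;
the helper name hex_utf16be is made up for illustration, and the script
does not involve groff itself.

```shell
#!/bin/sh
# hex_utf16be (hypothetical): decode stdin as UTF-8, print UTF-16BE code units as hex.
hex_utf16be() {
    iconv -f UTF-8 -t UTF-16BE | od -An -tx1 | tr -d ' \n'
}

# Decomposed: "C" (U+0043) followed by combining caron U+030C (bytes 0xCC 0x8C).
decomposed=$(printf 'C\314\214' | hex_utf16be)

# Precomposed: U+010C (bytes 0xC4 0x8C).
precomposed=$(printf '\304\214' | hex_utf16be)

echo "decomposed:  $decomposed"
echo "precomposed: $precomposed"
```

The decomposed input yields the two code units 0043 and 030c, matching the
two halves of the glyph name in the warning and in the escape sequence; the
precomposed input yields the single code unit 010c.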


.AM
I don't think any fancy workarounds are needed.

Yours,
   Ingo


