Hi Oliver,
Oliver Corff wrote on Sat, May 15, 2021 at 11:39:31PM +0200:
> I try to use the correct abbreviation for the former Czechoslovak
> Socialist Republic, which is U+010C SSR (C + hacek, caron, wedge).
> The first attempt (enter Unicode 0x010C directly, leaving everything to
> preconv(1)) did not work.
Works for me:
$ printf '\xc4\x8cSSR' | mandoc
$ printf '\xc4\x8cSSR' | groff -kT utf8
Both commands above produce the expected output for me (OpenBSD-current
with no fancy configuration changes, just using the default installation).
As a hex dump, the output starts with:
00000000 c4 8c 53 53 52 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a |..SSR...........|
00000010 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a |................|
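
If it still fails on your side, it may be worth ruling out the input
bytes first. The following is only a sketch of such a check; the variable
name input_bytes is made up for illustration. It uses octal escapes
because POSIX printf(1) does not specify \x escapes, so some shells
print them literally. U+010C encoded as UTF-8 is the two bytes c4 8c.

```shell
# Octal 304 214 is hex c4 8c, the UTF-8 encoding of U+010C.
# od -An -tx1 prints one hex byte per column without offsets;
# tr and sed only normalize the whitespace od emits.
input_bytes=$(printf '\304\214SSR' | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
echo "$input_bytes"    # prints: c4 8c 53 53 52
```

If this prints anything else, the problem is in the shell or the editor,
not in preconv(1) or groff.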
> Then I consulted groff_char(7) but there is no
> predefined \[vC], only \[vS] etc. for base letters s, S, z and Z. No C!
> I keep scratching my head.
Works for me:
$ printf '\\[u010C]SSR' | mandoc
$ printf '\\[u010C]SSR' | groff -T utf8
Both commands above produce the expected output; specifically:
$ printf '\\[u010C]SSR' | groff -T utf8 | hexdump -C
00000000 c4 8c 53 53 52 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a |..SSR...........|
> None of the other suggested notations (like \[u0043_030C]) work (see
> groff(7)) out of the box.
Mandoc doesn't support that syntax, but with groff, even that works for me:
$ printf '\\[u0043_030C]SSR' | mandoc -T lint
mandoc: <stdin>:1:1: WARNING: invalid escape sequence: \[u0043_030C]
$ printf '\\[u0043_030C]SSR' | groff -T utf8 | hexdump -C
00000000 c4 8c 53 53 52 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a |..SSR...........|
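
To compare the notations on your own system without reading hex dumps by
eye, something along these lines might help. It is a rough sketch, not a
tested tool: starts_with_ccaron is a hypothetical helper name, and the
loop only runs if groff is found in PATH.

```shell
# Hypothetical helper: succeeds if stdin begins with c4 8c,
# the UTF-8 encoding of U+010C.
starts_with_ccaron() {
    # od -N 2 reads only the first two bytes of stdin.
    first_two=$(od -An -tx1 -N 2 | tr -d ' \n')
    [ "$first_two" = "c48c" ]
}

# Try the two escape notations from this thread, if groff is installed.
if command -v groff >/dev/null 2>&1; then
    for s in '\[u010C]SSR' '\[u0043_030C]SSR'; do
        # %s passes the backslash through to groff unchanged.
        if printf '%s\n' "$s" | groff -T utf8 | starts_with_ccaron; then
            echo "ok: $s"
        else
            echo "FAIL: $s"
        fi
    done
fi
```

A FAIL line for a notation that works here would point at a difference
in the local groff installation or configuration.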
.AM
I don't think any fancy workarounds are needed.
Yours,
Ingo