Re: Question about Unicode Greek


From: G. Branden Robinson
Subject: Re: Question about Unicode Greek
Date: Fri, 12 Feb 2021 17:13:04 +1100
User-agent: NeoMutt/20180716

Hi Robert and Steffen,

At 2021-02-11T23:03:47+0100, Steffen Nurpmeso wrote:
> Robert Goulding wrote in
>  <CACE7msuTMpqMaMg8c9m1AeBJOt1DtwFQtd0f_vp10i8vZLvjTw@mail.gmail.com>:
>  |I've been away from groff for a long time; I think the last time I used it,
>  |there was no Unicode support at all. Now I'm interested in using it as a
>  |filter from markdown, through pandoc to groff to pdf.
>  |
>  |This is working well for me, except for a handful of files in which I use
>  |Greek with accents. I understand that groff doesn't have characters for
>  |accented Greek characters, and I'm willing to do the work to add them, I'm
>  |just trying to understand what's involved.
>  |
>  |So, here is a tiny document with some Greek in it:
>  |
>  |.LP
>  |ἐν ἀρχῇ ἦν ὁ λόγος, καὶ ὁ λόγος ἦν πρὸς τὸν θεόν, καὶ θεὸς ἦν ὁ λόγος
>  |
>  |When I run this through preconv, I get the following:
>  |
>  |.lf 1 rubbish.ms
>  |.LP
>  |\[u1F10][...]
>  |
>  |with all of the Unicode characters turned into the correct code numbers.
>  |When I run this through groff -ms -Tps I get the following errors:
>  ...
>  |This is what is puzzling me. The very first letter, ἐ, is correctly given
>  |its unicode description \[u1F10] by preconv; but then troff seems to
>  |decompose it into \[u03B5] which is ε and \[u0313] which is ̓ . So, if I
>  |wanted to tell groff how to print ἐ, how do I go about it, when there seem
>  |to be two internal representations?
> 
> It seems the groff source repository contains the necessary update
> in the afmtodit tables to include this character for non per-se
> Unicode aware output devices.  A new release will ship it thus.
> You could try to update the %AGL_to_unicode hash in
> /usr/bin/afmtodit of your installed groff accordingly, too.

I am not so sure this _is_ fixed.  Interestingly, it works for the grotty
output driver but not grops.  Here's what I get with groff git HEAD.

$ ./test-groff -Tutf8 -k -ms EXPERIMENTS/greek.ms | cat -s

ἐν  ἀρχῇ  ἦν ὁ λόγος, καὶ ὁ λόγος ἦν πρὸς τὸν θεόν, καὶ θεὸς
ἦν ὁ λόγος

$ ./test-groff -Tps -z -k -ms EXPERIMENTS/greek.ms
troff: backtrace: file 'EXPERIMENTS/greek.ms':2
troff: EXPERIMENTS/greek.ms:2: warning: can't find special character 'u03B5_0313'
troff: backtrace: file 'EXPERIMENTS/greek.ms':2
troff: EXPERIMENTS/greek.ms:2: warning: can't find special character 'u03B1_0313'
troff: backtrace: file 'EXPERIMENTS/greek.ms':2
troff: EXPERIMENTS/greek.ms:2: warning: can't find special character 'u03B7_0342_0345'
troff: backtrace: file 'EXPERIMENTS/greek.ms':2
troff: EXPERIMENTS/greek.ms:2: warning: can't find special character 'u03B7_0313_0342'
troff: backtrace: file 'EXPERIMENTS/greek.ms':2
troff: EXPERIMENTS/greek.ms:2: warning: can't find special character 'u03BF_0314'
troff: backtrace: file 'EXPERIMENTS/greek.ms':2
troff: EXPERIMENTS/greek.ms:2: warning: can't find special character 'u03BF_0301'
troff: backtrace: file 'EXPERIMENTS/greek.ms':2
troff: EXPERIMENTS/greek.ms:2: warning: can't find special character 'u03B9_0300'
troff: backtrace: file 'EXPERIMENTS/greek.ms':2
troff: EXPERIMENTS/greek.ms:2: warning: can't find special character 'u03BF_0300'

Moreover, I don't recall any update to the afmtodit tables that covered
this sort of character combination.  We (I) did update them to capture
some new code points from Unicode 13.0[1] and to kern the ellipsis
correctly[2].

Shouldn't the output driver (grops) be taking these NFD-decomposed
sequences and building the combined glyphs with overstriking?
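
For anyone who wants to see where the names in those warnings come from,
here is a rough sketch in Python (not anything groff itself runs).  It
assumes that the composite names are simply the NFD code points of the
input character, in uppercase hex, joined with underscores after a
leading "u".  The function name groff_decomposed_name is made up for
this illustration.

```python
import unicodedata


def groff_decomposed_name(ch):
    """Build a groff-style composite name from the NFD form of ch.

    Hypothetical helper for illustration only: it mimics the
    'uXXXX_YYYY' names seen in the troff warnings, and is not
    part of groff or preconv.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    # Each code point becomes at least four uppercase hex digits.
    return "u" + "_".join(f"{ord(c):04X}" for c in decomposed)


if __name__ == "__main__":
    # U+1F10, U+1FC7 and U+1F26 all occur in the sample text.
    for ch in ("\u1f10", "\u1fc7", "\u1f26"):
        print(f"U+{ord(ch):04X} -> {groff_decomposed_name(ch)}")
```

Under that assumption, U+1F10 comes out as u03B5_0313, which is the
first name troff complains about.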

Also, the PostScript output seems to be rendering all the Greek letters
slanted instead of upright.  Surely that's not correct?

Regards,
Branden
