Re: [groff] Accented Cyrillic characters


From: Ralph Corderoy
Subject: Re: [groff] Accented Cyrillic characters
Date: Thu, 02 Aug 2018 12:26:31 +0100

Hello Robin!

> Currently, I'm just adding a standalone UTF composite accent character
> (U+0301) after every vowel I want to show stress on since Unicode does
> not seem to define separate codepoints for all of the Cyrillic
> accented vowels.

That's the recommendation in
https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode

> the terminal emulator (at least URXVT) will combine the accent and the
> vowel into a single glyph.

xterm(1) does too.  libvte-based terminals seem to place it on the line
above!?

> This approach of adding accents causes problems with tbl, though. The
> combination of the two characters into a single glyph screws up tbl's
> (and/or Groff's) assumptions. For instance, in a table like:
>     | саморазруше́ние |
>     | foo bar         |
> the bars won't properly line up.

It boils down to persuading `\w', used by tbl(1), that the U+0301 takes
no space.

    $ groff -Tutf8 >/dev/null
    .nr w \w'A'       
    .tm \nw 
    24
    .nr w \w'\[u0435]'
    .tm \nw 
    24 
    .nr w \w'\[u0435]\[u0301]'
    .tm \nw          
    48 
    $
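
To make the target concrete, here is a rough Python sketch (not groff code,
and not how tbl(1) works internally) of the width we would like `\w' to
report.  It assumes the 24 units per cell seen in the session above and
treats any character with a non-zero Unicode combining class as taking no
space; the names `cell_width' and `UNITS_PER_CELL' are made up for
illustration, and East Asian wide characters are ignored.

```python
import unicodedata

# Assumption: one terminal cell is 24 units, as in the -Tutf8 session above.
UNITS_PER_CELL = 24


def cell_width(text: str) -> int:
    """Count terminal cells, treating combining marks as zero-width."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# The word from the table, with U+0301 written as an explicit escape.
word = "саморазруше\u0301ние"

# What \w effectively reports today: every code point counts as one cell.
print(len(word) * UNITS_PER_CELL)         # 360
# What tbl would need for the bars to line up: the accent adds nothing.
print(cell_width(word) * UNITS_PER_CELL)  # 336
```

The one-cell difference between the two figures is exactly the
misalignment of the closing bar in the table.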

Tricks like overstriking with `\o' and moving left with `\h' do change
what `\w' reports, but they don't give the desired output because
grotty(1) also processes them.

> For instance, \[u0435_0301] should theoretically also format as an
> accented Cyrillic e.  But what happens instead is that the accent is
> dropped during formatting.  Curiously, this works when using latin
> characters. For instance, \[e u0301], \[e aa], \[e '] will result in a
> properly accented latin e.

I think those are mapped onto their precomposed Unicode rune, and as you
start by saying, there isn't one for U+0435 combined with U+0301.

    $ cd /usr/share/groff/1.22.3/font/devutf8
    $ grep 0435 R
    u0435_0300  24      0       0x0450
    u0435_0308  24      0       0x0451
    u0435_0306  24      0       0x04D7
    $ grep '0045.*0301' R 
    u0045_0301      24      0       0x00C9
    u0045_0304_0301 24      0       0x1E16
    u0045_0302_0301 24      0       0x1EBE
    $
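
The same gap shows up outside groff.  As a cross-check, this small Python
sketch asks Unicode's own NFC normalisation whether a base plus a
combining mark has a single precomposed code point.  It consults Python's
Unicode tables rather than groff's font description file, so it only
suggests why the `R' file has no u0435_0301 entry; the helper name
`precomposed' is hypothetical.

```python
import unicodedata


def precomposed(base: str, mark: str):
    """Return the single precomposed character for base+mark, or None."""
    composed = unicodedata.normalize("NFC", base + mark)
    return composed if len(composed) == 1 else None


pairs = [
    ("E", "\u0301"),       # Latin E + acute
    ("\u0435", "\u0300"),  # Cyrillic ie + grave
    ("\u0435", "\u0308"),  # Cyrillic ie + diaeresis
    ("\u0435", "\u0301"),  # Cyrillic ie + acute
]

for base, mark in pairs:
    hit = precomposed(base, mark)
    label = f"0x{ord(hit):04X}" if hit else "none"
    print(f"u{ord(base):04X}_{ord(mark):04X}  {label}")
```

The first three pairs come back as 0x00C9, 0x0450 and 0x0451, matching the
grep output above, and the last one comes back as none.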

I look forward to solutions and workarounds from the others here.  :-)

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy


