Re: [groff] Mapping of \(bu to MIDDLE DOT


From: G. Branden Robinson
Subject: Re: [groff] Mapping of \(bu to MIDDLE DOT
Date: Thu, 28 Mar 2019 21:00:48 +1100
User-agent: NeoMutt/20180716

At 2019-03-27T04:34:18+0000, Jeff Conrad wrote:
> Is there a reason that tty.tmac translates \(bu to \(pc or \(md
> regardless of the output device or whether \(bu is available?
> 
> .ie c\[pc] \
> .  tr \[bu]\[pc]
> .el \
> .  if c\[md] \
> .    tr \[bu]\[md]

Are you looking at an old implementation?  There's some important
context missing here:

$ nl /usr/share/groff/1.22.4/tmac/tty.tmac | sed -n '14,21p'
    14  .if !'\*[.T]'utf8' \{\
    15  .  ie c\[pc] \
    16  .    tr \[bu]\[pc]
    17  .  el \
    18  .    if c\[md] \
    19  .      tr \[bu]\[md]
    20  .\}
    21  .

> The only thing I can find on this is Ingo's message of 30 November 2015
> ("bullets render as question marks"); I agree with his statement
> "Perhaps not the best possible choice."

It sure seems like you might be re-reporting a problem that Carsten
Kunze raised in June 2015, which prompted Werner to wrap the conditional
you mention in an "if device is not UTF-8" block:

https://lists.gnu.org/archive/html/groff/2015-06/msg00040.html

> Perhaps it makes sense for Tlatin1, which doesn't have a true bullet,
> but it seems silly for Tutf8 (or my Tcp1252).

Your cp1252 device does raise a good point.

I reason about it this way:

Really we shouldn't be conditional on UTF-8 per se, but on the existence
of the bullet glyph in the font for the tty device.  However, the tty
device ignores fonts; tty devices are really just character encodings,
which do or do not support given characters, and the actual glyph
repertoire is hidden from us by the device implementation.  Terminal
emulators (including Unix "console" devices) are probably far and away
the most numerous nroff-style devices in the world, and there is no
interface layer defined anywhere I know of by which these devices can
report their character repertoire up to an application.  VGA-style
console devices, framebuffer consoles, and GUI terminal emulators can
even change their fonts on the fly.  (Who else remembers live-hacking
the display font in MS-DOS?)

So Werner's fix worked because there were (and are) no nroff/tty devices
in the groff tree that supported the bullet character _except_ -Tutf8.

My recommendations are:
1) Upgrade to groff 1.22.4; and
2) Change the conditional on line 14 of tty.tmac from:

    14  .if !'\*[.T]'utf8' \{\

to:

    14  .if !c\[bu] \{\

...and tell us if that fixes your problem.
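
For reference, here is a sketch of how the whole block would read with
that one-line change applied.  It is untested on a cp1252 device (which
is not part of groff), and it assumes nothing earlier in tty.tmac has
already defined \[bu] with .char or .fchar; as far as I know, the "c"
conditional also reports true for glyphs defined that way.

```roff
.\" Sketch only: tty.tmac lines 14-20 with the suggested condition.
.\" Translate \[bu] only when the device does not report a bullet glyph.
.if !c\[bu] \{\
.  ie c\[pc] \
.    tr \[bu]\[pc]
.  el \
.    if c\[md] \
.      tr \[bu]\[md]
.\}
```

The design point is that the test asks about the glyph itself rather
than about one particular device name, so a device such as cp1252 that
has a bullet would skip the translation without being listed explicitly.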

Personally, I advocate incorporating cp1252 into groff.  It's only an
8-bit character set, should therefore be a low maintenance burden, and
really should make life a bit more bearable for groff's Windows users.
And that's good PR for groff, GNU, copyleft, and Free Software.

> Even for Tlatin1, I'd prefer an asterisk or even the age-old
> overstruck '+' and 'o'.  Isn't the general rule for nroff to make the
> best possible visual approximation when the true character isn't
> available?

As noted above, knowing what will actually show up on the output device
is, in principle, impossible for nroff/tty output devices.  However, we
can generally assume that users of 8-bit encodings will have
comprehensive fonts available by default--they'd have to go out of their
way to avoid them.

Life is harder in the UTF-8 world.

To get that asterisk:

In your documents, or your .troffrc, could you not do this?

.fchar \[bu] *
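
A slightly fuller sketch of that idea follows.  Two things in it are
assumptions rather than anything verified here: that the lines go near
the top of a document, and that a translation of \[bu] already set up by
tty.tmac would be applied before the fallback is looked up, which is why
the sketch first maps \[bu] back to itself.  The "n" condition restricts
both requests to nroff-style output.

```roff
.\" Sketch only; placement near the top of a document is an assumption.
.\" Assumption: a translation of \[bu] set up earlier (for example by
.\" tty.tmac) would be applied before any fallback lookup, so reset it.
.if n .tr \[bu]\[bu]
.\" Fallback definition: consulted only when the font lacks \[bu].
.if n .fchar \[bu] *
.
\[bu] first item
.br
\[bu] second item
```

On a device whose font does provide \[bu], the fallback should simply go
unused and the real bullet should appear.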

As a minor point, I do think the existing fallback should be reversed in
order:

From:

.fchar \[bu] \z+o

To:

.fchar \[bu] \zo+

That way the plus sign wins on a non-overstriking device, instead of the
"o".

The \z+o status quo seems to follow a pattern that makes sense for
modified letterforms, i.e., \z'a; on a 7-bit ASCII, non-overstriking
device, you want the "a" to "win", because it carries the more important
semantic information.

That reasoning does not hold for bullet substitutes, which simply need
to stand out graphically (your argument for not using a middle dot or
centered period, which may be as small as one pixel on some devices),
and not be semantically confusable with text.

As "o" is actually a word (even in English, though much more prominently
in Spanish), I find the present arrangement unfortunate.
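
The two orderings discussed above can be compared directly with a small
test document like the following sketch, formatted for a non-overstriking
device (for example with groff -Tascii).  The labels are only there for
readability; which glyph ends up visible depends on the device and its
postprocessor, so no particular output is claimed here.

```roff
.\" Sketch only: compare the two orderings on a non-overstriking device.
.\" \z formats the following glyph without advancing, so the glyph after
.\" it lands in the same cell.
current fallback:  \z+o
.br
proposed fallback: \zo+
```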

Regards,
Branden
