groff
Re: [BUG] -T html: \- rendered as something different than ASCII 45


From: Alejandro Colomar (man-pages)
Subject: Re: [BUG] -T html: \- rendered as something different than ASCII 45
Date: Tue, 25 Jan 2022 14:02:05 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.4.1

Hi Branden,

On 1/25/22 01:58, G. Branden Robinson wrote:
Hi Alex,

At 2022-01-24T22:13:32+0100, Alejandro Colomar wrote:
Hi Branden,

And another html bug; however, this one seems to be a browser bug, but
please confirm.

Maybe not.

For the following code:

[
.TP
.B \(aq\-\(aq
Empty white cell.
]

groff(1) generates the following HTML code:

[
<p><b>'&minus;'</b></p></td>
<td width="5%"></td>
<td width="22%">
]

However, both Firefox and Chrome show something that, if copied and
pasted into a terminal, is different from ASCII 45 and is longer than
the proper minus sign.

If your system works like mine, it _is_ a "proper minus sign".

$ lynx -dump EXPERIMENTS/chess-init.6.html | sed -n '21p' | xxd
00000000: 2020 2063 6865 7373 e288 9269 6e69 740a     chess...init.

It seems that my system is not like yours in some way:

$ lynx -dump http://www.alejandro-colomar.es/man/man6/chess-init.6.html | sed -n '21p' | xxd
00000000: 2020 2063 6865 7373 2d69 6e69 740a          chess-init.

lynx(1) is printing 2d, which is ASCII 45.
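
The two dumps above differ only in the bytes between "chess" and "init".
As a minimal sketch for reproducing that comparison without lynx(1) or
xxd(1), assuming a POSIX shell with od(1) and tr(1) available (hexbytes
is a hypothetical helper name, not something that groff provides):

```shell
# hexbytes: hypothetical helper; prints the bytes on stdin as one hex string.
hexbytes() {
    od -An -tx1 | tr -d ' \t\n'
    echo
}

# ASCII hyphen-minus (U+002D): a single byte.
printf '%s' '-' | hexbytes

# MINUS SIGN (U+2212) encoded as UTF-8, written with octal escapes so the
# result does not depend on the locale or on printf supporting \x.
printf '\342\210\222' | hexbytes
```

The first call should print 2d and the second e28892, which are the byte
sequences that distinguish the two dumps.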



And UTF-8 E2 88 92 is...

$ unicode −
U+2212 MINUS SIGN
UTF-8: e2 88 92 UTF-16BE: 2212 Decimal: &#8722; Octal: \021022
−
Category: Sm (Symbol, Math); East Asian width: N (neutral)
Unicode block: 2200..22FF; Mathematical Operators
Bidi: ES (European Number Separator)

Should I report a bug to firefox?

No, you're getting correct output...almost.

\- to U+2212 is a wholly legitimate mapping for troff typesetting going
back to 1973.

But man(7) pages are an issue.  There, a "real" minus sign is almost
never wanted.  It makes sense for the man(7) package to have a bespoke
mapping for the minus sign glyph to the basic Latin hyphen-minus on
devices that distinguish them.

I see the following in /etc/groff/man.local on my Debian system with
groff 1.22.4:

.  \" Debian: "\-" is more commonly used for option dashes than for minus
.  \" signs in manual pages, so map it to plain "-" for HTML/XHTML output
.  \" rather than letting it be rendered as "&minus;".
.  ie '\*[.T]'html' \
.    char \- \N'45'
.  el \{\
.    if '\*[.T]'xhtml' \
.      char \- \N'45'
.  \}

I don't know which .tmac file is in effect on my system now that I'm
using groff compiled from source instead of the system one.
`make install` added a huge number of configuration files:

$ find / 2>/dev/null | grep tmac | grep -v ^/home | grep local | wc -l
102


In groff Git HEAD, we have this in an.tmac:

.\" === Define/remap characters. ===
.
.\" For UTF-8, map the minus sign to the hyphen-minus to facilitate
.\" copy and paste of code examples, file names, and URLs embedding it.
.if '\*[.T]'utf8' \{\
.  char \- \N'45'
.  char  - \N'45'
.\}

As a related matter I would kill the second 'char' request (remapping
the unescaped input dash).  The first should be done not just for
'utf8', but 'html' and 'xhtml' as well.

Yup.


Would you like to file this one as well?

Done.


Regards,

Alex


--
Alejandro Colomar
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/


