Re: [Wishlist] Don't discard groff comments on HTML output


From: G. Branden Robinson
Subject: Re: [Wishlist] Don't discard groff comments on HTML output
Date: Tue, 25 Jan 2022 12:43:37 +1100
User-agent: NeoMutt/20180716

Hi Alex,

At 2022-01-24T22:48:29+0100, Alejandro Colomar wrote:
> Hi Branden,
> 
> I'd like to see groff comments preserved in the HTML output (as HTML
> comments).
> 
> So, for `groff -T html ...`,
> 
> .\" hello world
> 
> would be transformed to
> 
> <!-- hello world -->
> 
> Sounds good?

That's a bigger challenge than the other items you've raised so far
(well, the grohtml relative inset thing, I can imagine being a real PITA
to hammer out, but _conceptually_ it's easy).

The problem is that troff(1) disposes of comments entirely very early in
parsing.  Importantly, they're stripped out of macro definitions before
the definition is even stored.

It's possible these issues could be overcome by converting comments into
a device control command escape sequence (\X''), but there are quoting
issues to consider (although _maybe_ my recent change to how characters
in such escape sequences get mapped when being written to the
device-independent output addresses that, or makes doing so easier[1]),
and possibly other matters I haven't thought of.
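
To make the quoting problem concrete, here is a rough sketch of the idea as an
external preprocessor rather than a change to troff itself. Everything in it is
an assumption for illustration: the function name is hypothetical, the `html:`
prefix is assumed to be something grohtml passes through, and replacing `'` and
`\` with the \(aq and \(rs special characters assumes they are mapped back to
ASCII inside device control escapes as described in [1]. It only handles
comments that occupy a whole line, and it does nothing about comments inside
macro definitions.

```python
import re

# A line that consists only of a roff comment, e.g.:  .\" hello world
COMMENT_LINE = re.compile(r'''^[.']\s*\\"\s?(.*)$''')


def comment_to_device_control(line: str) -> str:
    # Hypothetical helper: rewrite a full-line roff comment as a
    # device control escape carrying an HTML comment.
    # Lines that are not full-line comments are returned unchanged.
    m = COMMENT_LINE.match(line)
    if m is None:
        return line
    text = m.group(1).rstrip()
    # "--" is not allowed inside an HTML comment; split runs of hyphens.
    text = re.sub(r'-(?=-)', '- ', text)
    # A backslash would start a roff escape sequence; use the \(rs
    # special character instead (assumed to map back to "\").
    text = text.replace('\\', '\\(rs')
    # An apostrophe would end the \X'...' argument early; use \(aq
    # (assumed to map back to "'" in device control escapes).
    text = text.replace("'", '\\(aq')
    # The "html:" prefix is an assumption about what grohtml accepts.
    return "\\X'html:<!-- " + text + " -->'"


if __name__ == '__main__':
    for sample in ['.\\" hello world', '.\\" it\'s a--b', 'Plain text line']:
        print(comment_to_device_control(sample))
```

Even in this toy form, three separate characters need special handling before
the comment text is safe to place inside \X'', which is the kind of quoting
issue meant above.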

So this one is a heavier lift, I think.

Regards,
Branden

[1] commit 9d61b3d142842589b90d7eda0ed3270fbbf6166f
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
Date:   Fri Oct 1 19:20:25 2021 +1000

    [troff]: Enable ASCII in device control escapes.

    [troff]: Convert special character glyphs corresponding to Unicode Basic
    Latin ("ASCII") code points to those code points when they occur in
    device escapes.  (They should be correct for IBM code page 1047 as well,
    but this is untested.)  This is necessary for encoding URLs in device
    control commands.  Special character identifiers are presumed to be the
    defaults documented in groff_char(7); this is a design gap that we
    should consider addressing.  (We don't have a way to ask "is this the
    special character corresponding to Unicode basic Latin code point X?")

    * src/roff/troff/input.cpp (encode_char): Do it.

    I'm not documenting this in NEWS as it feels like a pretty dusty corner
    even though I'm about to leverage it for something of much higher
    visibility.

Also see:
65737d48ad7e75353a67e4f408bb68bc5d5b0773
3d1988cabc90f3c4b0b0000bb4a809be61eeba3c
eb695ab2b5e2bae54afa102355c493bda6e29d3e
