Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP


From: G. Branden Robinson
Subject: Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP
Date: Sat, 28 Oct 2023 14:34:45 -0500

Hi Ingo,

Fair warning: this message is (even) more opinionated than usual.

At 2023-10-26T18:37:58+0200, Ingo Schwarze wrote:
> G. Branden Robinson wrote on Tue, Oct 24, 2023 at 04:54:21AM -0500:
> > I haven't explicitly made the connection to HTML before,
> 
> Well, when designing a new language from scratch, you should
> always consider prior art, to reduce the risks of reinventing the
> wheel, introducing new design mistakes, and leaving design gaps.

I agree, particularly as I observed this industry send self-proclaimed
star kickers of programming language design onto the field only to see
themselves ricochet the ball off the goal posts into their own faces
over and over again as they fought to supplant C without even paying
attention to anything that had been done by, say, Ada.

(Take a drink.  You knew I was going to use that example.)

But hey, a lot of books and training got sold, so they were successful.

At least Stroustrup was aware of Ada, which is why he kept trying to
clone its best features (except for tasking; Narain Gehani, also of Bell
Labs, tried that, and did not succeed in the market[1]).

> In particular, when designing a markup language for documentation, i
> consider it critical to carefully compare the design to HTML, LaTeX,
> and mdoc(7) before making final decisions, and there may be a few
> more that might also be worth looking at for comparison.

I was not attempting to confess ignorance of HTML, but making the much
simpler statement that I had not previously bothered to attempt a
mapping from man(7) macro names to HTML elements.  I am not yet au
courant with HTML5, but I was paying close attention when HTML4 appeared
on the scene and regarded it as a significant improvement over its
predecessors.  Subsequently I put a lot of distance between myself and
web development.  Watching the W3C/WHATWG debacle and the repeated
attempts by market behemoths to colonize web technologies with one
rent-extraction scheme after another sufficed to keep me away.  I've
started reading up on HTML5 and I see some nice possibilities.  Enough
that I'm beginning to think about what an attempt to attack the grohtml
problem again from the ground up would look like.[2]

> Looking at DocBook is likely *not* a good idea though simply because
> its design is so atrocious in so many respects that you will waste
> massive amounts of time without learning anything, except maybe what
> not to do.

I remember excitedly buying O'Reilly's DocBook book back around the year
2000, with high hopes.  You can imagine the excitement draining away
from my face as I beheld the unholy mess before me.

> That doesn't mean the new design must follow the existing languages,
> but not even considering HTML 5 when designing a markup language
> feels like straightforward negligence to me.  :-(

Worry not.  I'm reading.  The 1,500-page spec of the "Living Document"
is a bit discouraging in its length, however.  Alex Colomar thinks
groff_man_style(7) is dauntingly long at 20 pages.

> > LS -> <UL>
> > TP -> <LI> ... </LI>
> > LE -> </UL>
[...]
> Well, *if* you really want to totally redesign the very foundations
> of man(7) and change it from almost presentation-only and almost
> in-line-macro only to the totally different paradigms of semantic
> markup and block oriented, that is definitely one among the many tasks
> involved in redesigning.

At this point I don't think "totally redesigning" man(7) is necessary,
either in general or to achieve the specific aim above.  man(7) already
has no list-structuring macros, so I don't have to delete anything.
Just add a bit of information that, for anything other than HTML output,
can be harmlessly ignored anyway.
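
As a rough illustration of the LS/TP/LE mapping quoted above, here is a
minimal Python sketch.  It assumes `.LS` and `.LE` exist as proposed (no
current man(7) implementation provides them), handles only bare `.TP`
calls and plain text lines, and uses hypothetical function names.  It is
not how groff or mandoc work internally.

```python
from html import escape


def man_list_to_html(lines):
    """Translate a tiny man(7) subset into HTML list markup.

    .LS and .LE are the *proposed* list-bracketing macros from this
    thread; they are assumptions, not existing man(7) macros.
    """
    out = []
    in_list = False
    in_item = False
    for line in lines:
        if line == ".LS":
            out.append("<ul>")
            in_list = True
        elif line == ".LE":
            if in_item:
                out.append("</li>")
                in_item = False
            out.append("</ul>")
            in_list = False
        elif line == ".TP" and in_list:
            # Each .TP closes the previous item and opens a new one.
            if in_item:
                out.append("</li>")
            out.append("<li>")
            in_item = True
        elif not line.startswith("."):
            out.append(escape(line))
        # Any other macro call is ignored in this sketch.
    return "\n".join(out)


def drop_list_brackets(lines):
    """What a non-HTML consumer could do: ignore the bracketing macros."""
    return [line for line in lines if line not in (".LS", ".LE")]
```

Calling `drop_list_brackets()` on a page leaves exactly the input that
existing formatters already handle, which is the sense in which the
added information can be ignored for non-HTML output.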

The same goes for the keep macros KS/KE that I want.  Not setting type
on a page?  Ignore them.

> Yes, structural markup of a list requires saying where the list
> starts, where the list ends, and where each item begins.  So that
> part of the design of .LS feels right.  If i understand correctly,
> the .LS macro will not accept text arguments, which is also good.
> The naming seems fair, too.  I did *not* review the proposal though,
> so there may be downsides that i am unaware of.  All i'm saying is
> that these three points look good, and that the man(7) language indeed
> has one of its major weaknesses in this area.

Okay.  Well, crashing ideas up against reality is one thing feature
branches are good for.

> I agree that .br+.ns is not better than .TQ in a man(7) page.
> 
> The following probably wouldn't be too hard to fix:
> 
>    $ man -O tag=ns roff
>      ns      Turn on no-space mode.  Currently ignored.
>   [...]
> 
> The fact that mandoc(1) still ignores .ns is a (weak) indication
> that the number of real-world manual pages directly using it is
> likely very small.

I think that is likely.

> > The "semantic lift" that `TQ` attempts to achieve here is, I think,
> > an improvement on use of `br` and `ns`, even if it imparts no
> > semantic content per se, but rather works around the entangled
> > formatting consequences of the `TP` macro that does.
> 
> To avoid muddling the waters, i tend to distinguish "semantic markup"
> (when talking about markup that indicates, for example, a variable
> name or stress emphasis) and "structural markup" (when talking about
> markup that indicates, for example, section headings, lists, and
> displays).

You've said this before.  I put "semantic lift" in scare quotes
purposefully--mockingly.  I think it's a term that is, in practice, used
without a clear definition so as to aid the production of hype in
promoting "solutions".  You know, kind of like "open source".

_Linguistically_, I don't know that stress emphasis falls within the
domain of semantics.  Maybe.  I'd be interested to hear from a competent
linguist.

> The concept of structural markup is closely related to the concept of
> block nesting, just like the structural programming paradigm is
> closely related to the concept of code block nesting.  Both became
> mainstream around the same time: FORTRAN 77 and C are structural
> languages, FORTRAN 66 is not.  (The concepts involved are of course
> much older.)

Agreed.  In the mid-1970s, Bell Labs was pretty excited about RATFOR
(Rational Fortran), which added structure to spaghetti-noodle FORTRAN.
Too bad they didn't take the opportunity to develop Rational Roff.  ;-)

Thanks for acknowledging the errors.  I don't want to beat you up over
'em, just to get the record straightened, and I think it is.

I'm sure to make fresh errors of my own, probably sooner than I even
suspect...

Regards,
Branden

[1] https://www.amazon.com/stores/Narain-Gehani/author/B001HQ5MQ8

    I've read it (own a copy, even).  Lexically it makes no effort to
    hide its inspiration, and it's a pretty awkward fit with C.  But my
    guess is that it failed for a deeper reason than that.  The Labs had
    Unix kernel hackers and (so I feverishly imagine) there was no
    damned way they were going to delegate scheduling decisions to a
    language runtime in user space.  An unfortunate thing to be stubborn
    about, because FEWER MODE SWITCHES.  Much of the history of the
    Linux kernel is story after story of services being pulled into it
    to overcome latencies imposed by context and (especially) mode
    switches.  A lot of these switches, you could get rid of if you had
    user-space services authenticating to each other with strong
    cryptography and then passing messages or sharing memory.  This is
    nothing new; it's the microkernel concept.  The biggest swindle ever
    sold in the study of operating systems is that Mach was deserving of
    the name "microkernel".  But a lot of people profited by that
    swindle--and still are profiting--so it is sacrosanct.

    https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.55.9939

[2] That may sound like a surprising statement, or at least one begging
    follow-up.  The idea is this: grohtml attempted to solve the HTML
    translation problem by working _completely generally_ with any valid
    *roff input, which is hopelessly loose and not block-structured.

    But who on Earth has a raw *roff document that they want to convert
    to HTML, except *roff developers themselves?

    So the nucleus of the idea is this: grohtml's core principle of
    reading context-supplying tags--"device tags" that get written to
    device-independent output as "x" extension commands--from the input
    is a sound one.  But I don't see why the formatter itself has to
    care about them.  I don't get why we have a "mini-troff state
    machine" inside troff.  I think that (A) only macro packages (or
    ambitious document authors) should produce these tags and (B) only
    the HTML output driver should interpret them.  What the output
    driver _should_ do is be absolutely fascist about requiring proper
    block structure.  Wanna make a paragraph?  Fine.  Write
    paragraph-begin _and_ paragraph-end tags around it.  Want to center
    a few lines?  Fine.  Write centering-begin and centering-end tags
    around them.  Wanna whine that the formatter itself already knows
    how much to center because you told it how many lines you wanted?
    Tough.  Go block-structured or go home, if you want HTML output.

    This way, instead of boiling the boundless ocean of all possible
    *roff input documents, all you have to do is add block-structured
    device tagging to a handful of macro packages: the ones we already
    ship, and which we need to be updating to support PDF bookmarks and
    similar anyway.  Those are also the packages in which 99+% of all
    groff documents destined for HTML formatting will be written in the
    first place.

    I don't think this idea is all that different from that in Mulley
    and Lemberg's paper,[3] but I have to admit, I find parts of that
    paper difficult to understand.

    I think we might have better luck if we tried to make groff's HTML
    story a little _less_ ambitious at the formatter level.  Nothing
    else in a typical groff pipeline gives a damn about block structure,
    and nothing else needs to.

    I have to say I also really hate the names currently in use for the
    device tags.  Recapitulating the inscrutable names of *roff requests
    is not helpful.  Say what you mean.  In English.

[3] https://www.gnu.org/software/groff/grohtml.pdf
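
The strict block-structure check described in note [2] could look
something like the following Python sketch.  The tag names
("paragraph-begin", "centering-end", and so on) and the function name
are hypothetical, chosen to follow the "say what you mean, in English"
preference; they are not the device tags grohtml uses today.  The input
is assumed to be the sequence of tags an output driver has already
extracted from the "x" extension commands.

```python
def check_block_structure(tags):
    """Raise ValueError unless every NAME-begin has a matching NAME-end.

    Tags must nest properly: the most recently opened block is the only
    one that may be closed next.
    """
    stack = []
    for tag in tags:
        name, _, kind = tag.rpartition("-")
        if kind == "begin" and name:
            stack.append(name)
        elif kind == "end" and name:
            if not stack:
                raise ValueError(f"{tag}: nothing is open")
            if stack[-1] != name:
                raise ValueError(f"{tag}: expected {stack[-1]}-end")
            stack.pop()
        else:
            raise ValueError(f"{tag}: not a begin/end tag")
    if stack:
        raise ValueError(f"unclosed block: {stack[-1]}")


# A centered run of lines inside a paragraph is accepted.
check_block_structure([
    "paragraph-begin",
    "centering-begin",
    "centering-end",
    "paragraph-end",
])
```

A driver built this way rejects input rather than guessing, so the
burden of emitting balanced tags falls on the macro package.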
