[Groff] ESR in manpages versus the WEB


From: mhobgood
Subject: [Groff] ESR in manpages versus the WEB
Date: Sat, 30 Dec 2006 14:26:43 -0600

Hi all,
It has been an overly long time since I have burdened the list with any opinions of a personal nature. Current discussion about simplifying groff documentation is the culprit responsible for this gross violation of silence.

Eric S. Raymond wrote on 22 December 2006:
I want to drastically simplify the markup used in several pieces of
groff documentation, eliminating a lot of the hairy custom macros they
presently use.

Technically this won't be hard; I could make the required changes in a few
hours.  But I hear you asking "Why fix what ain't broken?".

The immediate technical answer is "the macro hackery is getting in the
way of lossless translation to a Web-ready format".  The more extended
answer raises some philosophical issues about groff's place in the world.


So, we start out with two subjects. One, the lossless translation of man pages to some Web-ready format. This is a purely technical subject, akin to translating C language source code to assembler source. The other, groff's place in the world, is indeed a philosophical question. It may or may not have any legitimate bearing on the first subject.

It's the 21st century, all the documentation on my system ought to present as a hypertexted local Web through my browser.

Subject two. That is your personal preference. Myself, I'm quite happy to use other forms of documentation; forms that do not invoke my browser at all. Do not be mistaken into thinking I speak only of legacy documents. The mail program I am using at this moment comes with Mac OS X. The only documentation it has is via the Mac help system. A quick check shows that the help viewer doesn't invoke any browser at all. When I click on help, the OS launches the help viewer, a standalone program. This is true of the mail program, the Safari web browser, and all other programs that come with this Mac.

But a "big bang" solution -- everybody rewriting their stuff in HTML or whatever -- can't be imposed, if for no other reason than that the coordination problem is too hard.

General. Coordination would indeed be hard. I suspect you will find, however, that coordination is easy compared to defeating human nature.

That means that in order for us to get to hypertext Nirvana, there
have to be lossless (or near-to-lossless) translation paths from every
legacy format to HTML.

General.  Nirvana, from Webster's Third New International Dictionary...
2a -- a place or state of rest, harmony, or pleasure, (oblivion, paradise).
2b -- a goal hoped for but apparently unattainable, (dream).

The hardest format to webify in the Unix world is also the most
important one -- man pages. (By way of GNUish contrast, TeXinfo is
much easier.)  There are a large number of tools that attempt this
out there.  In general, they do a crappy job.

Subject one. Question: what constitutes a crappy job? Does information disappear? Or does the output merely fail to display in an aesthetically pleasing way? The first would be a crappy job; the second, not so. Another question: how do you define webify? Are you concerned with just displaying man pages in some web format, or are you attempting to create hyperlinks and such?

Five years ago I decided to solve this problem.  And I did.  I wrote a
program called 'doclifter' that takes man-page sources in one end and
emits XML-Docbook out the other.  XML-Docbook to HTML is, of course,
easy.

Subject one.  Commendable.  I congratulate you on this.

In the five years since I wrote doclifter, I've been using it to
do periodic audits of the man-page corpus, or at least as much of it
as is represented by a full-boat Red-Hat/Fedora-Core installation.  In
FC6 this is over 13,000 man pages.

The purpose of these audits is twofold:

(1) Improve doclifter's performance (its clean-translation rate is
now 96%).

(2) Feed fix patches back to man-page maintainers to clean up
broken markup (I've had nearly 300 patches accepted).

Subject one. Again commendable. 96% is good, very good. Helping authors to find broken markup is good. If your program did nothing else, this would justify its existence.

The end goal is to be able to announce that transitioning away
from man pages to HTML is a *solved problem*.  When I get the
look-ma-no-hands rate below 1%, I figure we can declare victory
and go to the next phase.

Clue about the next phase: last year I got a change into the man(1)
sources that tells it what to do when it finds an HTML source
where it's expecting a man page, e.g. hand off to a browser. The
technical preconditions are nearly in place to kill off man pages
as a presentation format. Think about that :-)

Subject two. This is really your nirvana. Others may not agree with you. With some 13,000 man pages in your corpus, I would suggest that many find man pages the way to go. Had they not, they would have put them into some other format.


After five years of effort, I am down to fewer than 4% translation
failures.  I'm to the point where pushing individual man-page
cleanups to individual projects is actually more efficient than
crocking doclifter to handle yet another weird edge case.

Subject one. It may be more efficient from your point of view, but do not forget that it is you who are attempting to do a translation. Man page authors are under no obligation to change the way they do things to support some other program. The real onus is on you to make your program cope.

There remains a tiny hard core of 47 pages (0.3%) that can't be
fix-patched.  They remain unliftable.  Of these, 25 are from netpbm
and 7 (0.05%) are from groff.

Thus, groff is my second largest source of man pages that can't be lifted
to DocBook. The largest is netpbm, and I'm working with its maintainer
to fix that now.
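
As a rough check on the percentages quoted above, the following Python sketch recomputes each share. It assumes a corpus of exactly 13,000 pages, although the message says only "over 13,000", so the results are approximate. The names `TOTAL_PAGES`, `unliftable`, and `share_percent` are illustrative and do not come from doclifter.

```python
# Figures quoted in the thread. 13,000 is a lower bound ("over 13,000"),
# so every percentage computed here is an approximation.
TOTAL_PAGES = 13_000

# Pages that cannot be lifted, by source, as quoted in the message.
unliftable = {"all projects": 47, "netpbm": 25, "groff": 7}


def share_percent(count: int, total: int = TOTAL_PAGES) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


shares = {name: share_percent(n) for name, n in unliftable.items()}

for name, pct in shares.items():
    print(f"{name}: {unliftable[name]} pages = {pct:.2f}% of the corpus")
```

Under that assumption, netpbm and groff together account for 32 of the 47 unliftable pages.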

So this is the answer to "why fix it?".  Because the groff pages
presently do elaborate, bizarre things that doclifter can't cope with.
In this they are *unique*.  I mean *unique*.  Everywhere else the
problem is almost entirely broken markup, not things people did
deliberately.

Subject one. Again, if it is a case of broken markup your program is helping. Feedback to authors allows them to fix the manpage. But, if a manpage does elaborate, bizarre things, it is your program that must cope. I will return to this in a moment.

I want to fix the groff documentation so that it's no longer in the
way of automatic lifting of *everything* to HTML.  (As a side benefit,
the markup in the groff documentation will become easier to maintain.)
The only downside might be a slight decrease in the visual quality of
the printed versions -- in particular, command synopses might no
longer look quite as pretty.

Subject one. Easier maintenance of groff documentation is a good benefit. The decrease in visual quality isn't.

Eric S. Raymond, on 30 December 2006, wrote in response to D.E.Evans:
D. E. Evans <address@hidden>:

P.S. I know that Eric is shooting for something more dynamic, but
does this have to be made anymore complicated than a better export
facility for groff (improvements or replace for grohtml that is
both standardized for HTML, and perhaps has an XSLT function for a
simplified XHTML or XML doc)?


grohtml produces crappy HTML because groff sees nothing but
presentation level.  I've explained this general problem at least
twice now.


It seems that if DocBook wants to do something with man pages, why don't they provide an import function?


Because DocBook isn't a program, it's a markup format.  In effect,
doclifter *is* its "import function" for man pages.  The DocBook
maintainers point to doclifter from their website.


Subject one. It is your program. It is you who are attempting to translate groff markup to html.

There's enough problems in the semantic web world that a dynamic
system along Eric's thinking seems premature or superfluous,


Dynamic?  What's dynamic about it?  Feed man markup in one end of a
pipeline, get high-quality HTML out the other.

Subject one. And, the crux of the problem. You are attempting to convert one format, man page markup, to another format, html.

Earlier, I have promised to return to subject one. Allow me to do so now.

Anytime it is desired to translate one format, one language into another, the translator must understand, in full, the source language. And make no mistake, you are attempting to translate a "typesetting assembly language" into a "hypertext markup language". Your program must be able to handle any valid construct in troff. If it cannot, it is broken, and it, your program, must be fixed. Finding invalid troff markup proves your program is capable of partial translation. Not being able to handle valid troff markup proves your program still needs work.
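
The distinction drawn above can be put as a small decision rule. The Python sketch below is only an illustration of the argument: `Outcome` and `triage` are hypothetical names, not part of doclifter or groff, and it assumes that the validity of a page's markup can be judged independently of the translator.

```python
from enum import Enum


class Outcome(Enum):
    """Who has to act after a translation attempt (hypothetical labels)."""
    TRANSLATED = "nothing to fix"
    FIX_THE_PAGE = "report the broken markup to the page author"
    FIX_THE_TRANSLATOR = "extend the translator to cope"


def triage(markup_is_valid: bool, translator_handled_it: bool) -> Outcome:
    """Assign responsibility for a page the translator was run on."""
    if translator_handled_it:
        return Outcome.TRANSLATED
    if not markup_is_valid:
        # Broken markup: feedback to the author is a service.
        return Outcome.FIX_THE_PAGE
    # Valid troff that the translator cannot handle is the translator's problem.
    return Outcome.FIX_THE_TRANSLATOR


print(triage(markup_is_valid=True, translator_handled_it=False).value)
```

By this rule, the groff pages described in the thread as deliberate rather than broken fall into the last case.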

Really, all the other discussion you have made concerns subject two, the philosophical place of groff in this world. I stated earlier that this may or may not have any legitimate bearing on subject one. Truthfully, Eric, it does not. If your desire is to create a program that can convert valid troff markup to html, you must take that markup as it is. Man page markup is valid troff markup. To ask that man page markup be changed, to ask that others change what they want in order to allow your program to function, is plain conceited. On the other hand, to request changes to man page markup to ease maintenance, or to make it easier to write man pages, would be a legitimate request.

D.E. Evans asked about an improved grohtml, or even a replacement. Perhaps grohtml can be improved. Since its inception, it has come along quite nicely. Perhaps your program could replace it once your program is capable enough.

Of course, many emails have flown back and forth, not just the two I've extracted quotes from. Ideas of creating manpage templates, writing manuals explaining how to write man pages, gathering good statistics on what markup is actually being used, etc., have all arisen from your proposal. This, coupled with now knowing we have available a fairly good "man page checker" in the form of your program, may not get you nirvana, but it has been beneficial to "man".

Cordially,
Michael D. Hobgood




