Re: [Groff] Building a troff parser


From: Steffen Nurpmeso
Subject: Re: [Groff] Building a troff parser
Date: Fri, 27 Feb 2015 13:50:23 +0100
User-agent: s-nail v14.7.11-208-gc14db07-dirty

Hello Ingo,

Ingo Schwarze <address@hidden> wrote:
 |> Eric Andrew Lewis wrote on Thu, 26 Feb 2015 07:49:18 -0500:
 |>> I'm interested in building a troff parser to extract information
 |>> from manpages (e.g. what do the flags mean when we say `rm -rf *`?).

 |Steffen Nurpmeso wrote on Fri, 27 Feb 2015 11:31:36 +0100:
 |> For the mdocmx(7) project i have written a simple mdoc(7)
 |> parser in awk(1), the entire thing 18966 bytes [...]
 |
 |That's clearly bad advice.  There are still missing parts
 |in mandoc, but the mdoc(7) parser is among the parts that are
 |most stable and best understood.  Rewriting *that* over and over
 |again is not going to solve a problem.  Besides, an mdoc(7)
 |parser written in awk(1) already exists, written in 1991
 |by Henry Spencer:
 |
 |  http://manpages.bsd.lv/history/spencer_22_10_2011.txt
 |  http://manpages.bsd.lv/history.html#x1991_awf

Of course mdocmx.sh is not a formatter but only a parser.
Surely mdocml(1) may serve him better regarding completeness,
robustness against sick constructs etc., but, as i said,
"Dependent on what you want it may be a starting point".
Since the parser core handles continuation lines, quoting etc. it
may really be that.  Imho his desire cannot be fulfilled for any
language other than mdoc(7) anyway, not even with immense amounts
of lookahead and lookbehind hacks, which could well result in
false interpretations (though for example

 (7) (a reference extension for the
 .I mdoc
 (7) semantic markup language used for manual pages).
 .IP "r or ^R or ^L"
  Repaint the screen.

may be relatively easy to get right).
So he will reach that dead-end much sooner when he uses a parser
that can be adjusted in a few minutes than when he first has to
learn to use a complete C library.
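
To illustrate what "handles continuation lines, quoting etc."
involves, here is a minimal Python sketch.  It is not the awk(1) code
from mdocmx.sh; the function names join_continuations and
split_macro_line are made up for this example, and the rules are
simplified assumptions: a line ending in an odd number of backslashes
continues on the next line, arguments are separated by blanks, and
inside a double-quoted argument "" stands for one literal quote.

```python
def join_continuations(lines):
    """Join each line ending in an unescaped backslash with its successor."""
    out, buf = [], ""
    for line in lines:
        line = line.rstrip("\n")
        trailing = len(line) - len(line.rstrip("\\"))
        if trailing % 2 == 1:
            # Odd count: the last backslash escapes the newline (assumption).
            buf += line[:-1]
        else:
            out.append(buf + line)
            buf = ""
    if buf:
        out.append(buf)  # dangling continuation at end of input
    return out


def split_macro_line(line):
    """Return (name, args) for a macro line, or None for text and comments."""
    if not line or line[0] not in ".'":
        return None  # plain text line
    body = line[1:].lstrip(" \t")
    if body.startswith('\\"'):
        return None  # comment line such as: .\" ...
    i = 0
    while i < len(body) and body[i] not in " \t":
        i += 1
    name, rest = body[:i], body[i:]

    args, j, n = [], 0, len(rest)
    while j < n:
        if rest[j] in " \t":
            j += 1
        elif rest[j] == '"':
            # Quoted argument: runs to the closing quote, "" is a literal ".
            j += 1
            buf = []
            while j < n:
                if rest[j] == '"':
                    if j + 1 < n and rest[j + 1] == '"':
                        buf.append('"')
                        j += 2
                        continue
                    j += 1
                    break
                buf.append(rest[j])
                j += 1
            args.append("".join(buf))
        else:
            # Bare argument: runs to the next blank.
            k = j
            while k < n and rest[k] not in " \t":
                k += 1
            args.append(rest[j:k])
            j = k
    return name, args


if __name__ == "__main__":
    sample = ['.I mdoc', '.IP "r or ^R or ^L"', ' Repaint the screen.']
    for line in join_continuations(sample):
        print(split_macro_line(line))
```

Getting from such (name, args) pairs to "what does this flag mean" is
the hard part for man(7), because nothing in .IP says that its
argument is a flag or a key; that is the guessing meant above.
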
People should do what i did at the end of December 2012 and
convert their manual from man(7) to mdoc(7).  It was a frustrating
week, but most projects don't have such monsters lying around:

  nail.1   | 6154 +[..]
   4 files changed, 2820 insertions(+), 3350 deletions(-)

For example and for testing purposes i have converted the NetBSD
manual page bus_dma.9 to mdocmx(7) and it wasn't that hard to do.
And maybe at some later time good PDF output will be
possible, too, with table of contents, document-internal
cross-references etc.
Which i think would be crucial for better acceptance -- and that
is the sole road to better man(1)uals, all in all, and imho.

--steffen


