Re: [Groff] Status of the portability work, and plans for the future


From: Eric S. Raymond
Subject: Re: [Groff] Status of the portability work, and plans for the future
Date: Mon, 8 Jan 2007 05:22:00 -0500
User-agent: Mutt/1.4.2.2i

Werner LEMBERG <address@hidden>:
> > Werner Lemberg wanted to know the status of \~.  I found 17 uses
> > within the groff documentation and 4 outside it.  Of those 4, two
> > were errors.  So it's not much needed for manual pages, which is a
> > good thing as it is not portable.  In particular, I was unable to
> > discover any corresponding ISO entity or Unicode character.
> 
> Both `\<SP>' and `\~' (and \0) are equivalent to &nbsp;

Good. That's the behavior you're already getting from doclifter conversion,
so I guess we can close this issue.
 
> > I think we can declare Latin-1 and the intersection of groff glyphs
> > with HTML entities portable as well, [...]
> 
> I think this is something beyond us. Restricting man pages to latin-1
> encoding is bad.

Right.  Gunnar had already mostly persuaded me I was mistaken about
this; I was waiting on (and expecting) your confirmation that I had
misstepped.  You guys understand i18n much better than me, so I will
try to do as you and he direct.

>                   Instead, I suggest the route which is outlined in
> preconv.man (part of the CVS).
> 
>   1. If the input encoding has been explicitly specified, use it.
> 
>   2. Otherwise, check whether the input starts with a Byte Order Mark.
>      If found, use it.
> 
>   3. Finally, check whether there is a known coding tag in either the
>      first or second input line.  If found, use it.
> 
>   4. If everything fails, use a default encoding as given by the
>      current locale, or `latin1' if the locale is set to `C', `POSIX',
>      or empty.

I'm willing to try to implement this protocol for doclifter, but it
doesn't settle what the portability rule ought to be, which is our
concern right at the moment.  What encoding(s) are we willing to count
on third-party viewers to support?  

Gunnar seems to think UTF-8 is the right direction.  I could go with
that; doclifter happens to be written in Python, which has good UTF-8
support so implementing the right things shouldn't be too hard.

> Instead of using the groff's `uXXXX' glyphs, doclifter would directly
> map to HTML entities.

There may be a misunderstanding here -- doclifter never generates HTML
entities.  Instead it generates ISO XML entities.  These sets do 
overlap, but they are neither formally nor actually identical. The
HTML set is much smaller.

In fact, *all* defined groff-1.19 glyphs except the old Bell Labs
bracket-pile graphics get mapped to ISO entities -- even the exotica
like yogh and o-with-ogonek.  It took a lot of work building
translation tables, but I have nailed this part of the problem down
solid.
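
To make the idea of a glyph-to-entity translation table concrete, here is
a small Python sketch.  The four entries and the names GLYPH_TO_ENTITY
and translate_glyphs are illustrative assumptions, not doclifter's actual
tables or code, and the regex is deliberately naive (it does not handle
an escaped backslash in front of a glyph, for instance).

```python
import re

# Illustrative subset: groff glyph name -> ISO entity name.
GLYPH_TO_ENTITY = {
    "co": "copy",    # copyright sign
    "em": "mdash",   # em dash
    "ss": "szlig",   # German sharp s
    ":a": "auml",    # a with dieresis
}

# Matches \[name] and the older two-character form \(xx.
_GLYPH = re.compile(r"\\\[([^\]]+)\]|\\\((..)")


def translate_glyphs(text):
    """Replace known groff glyph escapes with XML entity references."""
    def replace(match):
        name = match.group(1) or match.group(2)
        entity = GLYPH_TO_ENTITY.get(name)
        # Glyphs missing from the table are left untouched.
        return "&%s;" % entity if entity else match.group(0)
    return _GLYPH.sub(replace, text)
```

A real table would cover the whole groff_char repertoire; the lookup
itself stays this simple.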

> > 1) Trim the groff manual pages so they use only the portable subset,
> > plus the .SY and .OP macros that Werner and I have characterized.
> 
> While I fully support .SY and .OP I wonder whether we need another
> macro to better separate content from formatting issues.  Gunnar, any
> suggestions here?

I would also welcome any such suggestions.  Especially from Gunnar,
but from anyone else as well.

> > Yes, I know, Bernd Warken is in love with the hyperextended macros
> > on groffer.1 and elsewhere, and will go ballistic.  Too bad for him;
> > we've established that they break too much software to live.
> 
> Well, I won't change groffer.man -- this is his contribution.

Uh oh.  You just invoked my hacker-anthropologist mode...I've seen
this kind of talk before and the results tend to be *bad*.

It's possible that "no change" is the right answer, but "it's his
contribution" is not a sufficient reason.  As the project lead,
you have the responsibility to make a decision on factual and
technical grounds.  If you then fail to carry through that decision
merely to avoid upsetting someone, you will be failing your
responsibility, your other developers, and eventually your users.

And note that I am not saying you should only carry through your
decision if it goes the way I want.  If you conclude that simplifying
the groff-page macros is the wrong thing to do on technical and
factual grounds, you should act consistently in accord with that
decision and tell *me* to get stuffed.

It is not required that either Bernd or I *like* your decision, only
that we live with it -- unless we're willing to fork the project and
lead the forks ourselves.

On any project (with rare exceptions that don't work very well) there
is someone who has to make these decisions even when they are
uncomfortable and someone is likely to throw a fit.  On this project
it's you.  Sorry to have to rough you up a bit about this, but you're
talking about shirking that duty.  *Don't.* Evading it never works out
well.

> It seems that grohtml does a quite decent job for this man page: What
> about putting it into an exception list (even if it is the only
> member) so that it is converted with `groff -Thtml' instead of
> doclifter?

Werner, in situations like this, exception lists frighten the shit out of me.

The problem is that once it is known that you have one, people invent
all sorts of clever, plausible reasons they should be on it rather
than doing the bit of extra work needed for a clean solution.  The
complexity overhead of managing the exceptions goes up at least as the
square of the number of exceptions.  In an amazingly short time, you
end up head-down in a swamp as nasty and fetid as the one you
originally set out to drain.

Does it sound like I'm speaking from bitter experience?  Yes.  Yes, in
fact, I am.  *shudder* Let's not go there...there may have to be an
exception list someday, but we should fight to avoid starting one as
long as possible.  Nothing in the present corpus makes one necessary.

> BTW, some man pages documenting groff itself will never be conformant.
> It would be completely ridiculous to modify, say, groff_char.man so
> that groff specific extensions would be avoided.  We need an Orwellian
> approach here: All man pages are equal, but some are more equal than
> others. :-)

I agree with your point here, but let's be careful not to muddle
separate issues together -- the undoubted fact that groff_char.man
cannot be portable is no reason to refrain from cleaning up pages that
*can* be portable, like groffer.1.

And even for pages that can't be strictly viewer-portable, simplifying
them to the point where doclifter can lift them will have benefits.

It's interesting that you picked groff_char.man as an example, because I
can tell you this: there is no reason in the universe we should
be unable to generate good XML-DocBook from that page.  I've already
done the hard part by embedding all the right glyph-to-entity mappings
in doclifter.
 
> > 2) Patches for .SY/.OP/.EX/.EE/.DS/.DE support should be developed
> > for the KDE help browser and shipped as soon as possible.
> 
> What I consider even more important is that all man pagers (which
> don't use groff internally) emit a warning if they can't display the
> man page correctly.

Fair point. I'll add this to the work plan as a long-term item
I'm not ready to schedule yet.

>                  Ideally, they should use groff for formatting
> (opening a TTY window showing `man' output would be sufficient IMHO)
> if the number of problems exceeds a certain threshold.

And that's an excellent idea for a general fallback. 
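
A viewer could approximate the warn-then-fall-back policy along these
lines.  This is only a sketch under stated assumptions: the SUPPORTED
set, the threshold of 3, and both function names are invented for the
example and do not describe any existing pager.

```python
import re

# Hypothetical set of macros that a non-groff viewer knows how to render.
SUPPORTED = {"TH", "SH", "SS", "PP", "TP", "IP", "B", "I", "BR", "IR",
             "RS", "RE"}

# A request or macro call: control character at the start of a line,
# optional blanks, then the name.  Comment lines (.\") do not match.
_REQUEST = re.compile(r"^[.'][ \t]*([A-Za-z][A-Za-z0-9]*)", re.MULTILINE)


def unsupported_requests(source):
    """List every request or macro name in source the viewer cannot render."""
    return [name for name in _REQUEST.findall(source)
            if name not in SUPPORTED]


def should_fall_back(source, threshold=3):
    """True when the problem count exceeds threshold, so groff should format."""
    return len(unsupported_requests(source)) > threshold
```

Below the threshold the viewer would print one warning per unsupported
name; above it, it would hand the page to `man' or groff instead.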

> > 2) When, in the portable-subset description, can we say that
> > .EX/.EE, .SY/.OP, and .DS/.DE should be considered portable and no
> > longer need local definitions?
> 
> I really don't know.  Just remember that Debian (and thus probably
> Ubuntu as well) still uses the groff 1.18 series, for example.

Yes.  Actually, I suspected before you brought it up that Debian stable
is probably the longest release cycle we'll have to cope with.
-- 
                <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>