
Re: [groff] groff as the basis for comprehensive documentation?


From: Ingo Schwarze
Subject: Re: [groff] groff as the basis for comprehensive documentation?
Date: Sun, 22 Apr 2018 22:38:42 +0200
User-agent: Mutt/1.8.0 (2017-02-23)

Hi John,

John Gardner wrote on Sat, Apr 21, 2018 at 04:48:33PM +1000:

> Ingo, I've spent the last 13 years in front-end web development,
> and I've been writing standards-compliant websites for almost
> a decade.

Sounds like you might have valuable input that could end up
improving the mandoc -Thtml output.  Note that i did *not*
claim that i specialize in anything related to HTML/CSS, which
i actually do not.  Quite to the contrary, whenever i had
questions related to HTML/CSS, i had a hard time finding any
developer who knew much about it.  So i might finally get some
real help, looking forward to that...

>> I see absolutely nothing semantic in there, it looks like a
>> purely presentational style sheet to me.

> ... yes, that's the entire reason CSS exists: to separate
> presentation from content (the latter being tantamount with
> "semantics" as understood by web authors and those of us who
> actively follow modern web standards).

Wait - the point of CSS is to select adequate presentation
for content of a given kind or class, right?  So the CSS, on its
input side, first needs to be told, by HTML elements and attributes,
what kind or class of content some text belongs to, and then has
to select the presentational attributes using selectors addressing
these kinds and classes of HTML elements, right?

What i meant by the above sentence is that the CSS you gave,

https://rawgit.com/Alhadis/Stylesheets/master/complete/manpage/manpage.css

with the exception of the dfn{} and kbd{} selectors, selects nothing
based on kind or class of content or semantic function.  What you
do with dfn{} handles one single macro, and kbd{} is used for very
different kinds of content that need completely different formatting,
namely fixed syntax elements like command line options (.Fl) and
fixed option arguments (.Cm) on the one hand but also code examples
(.Dl) on the other hand - consequently, the EXAMPLES section is
indeed rendered in a misleading way on your example page, compare to

  https://man.openbsd.org/mandoc.1#EXAMPLES

So, with your stylesheet, almost all semantic information from all
macros seems completely unhandled to me - or what am i missing?

> Redundant title attributes on everything.

They are not redundant.  Their purpose is to display the semantic
function of the word in a tooltip when hovering the mouse over it.
I'll gladly do that in a better way if i find one or someone points
me in the right direction, but when i read the CSS standard, i
failed to find any other way.

As a matter of fact, the very document you quote uses title
attributes for exactly the same purpose:

https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-*

Look at the source code for the elements that show tool tips,
like "HTMLElement" or "DOMStringMap".

> Actually, worse than redundant:
>    it screws with assistive technologies like screen-readers, which
>    might read the contents of a tag to the user using the title
>    attribute if one is present.

The title attribute would be read *in addition* to the contents of
the element, rather than instead of the contents, right?  That
would actually be useful, because not being able to get a visual
impression of the page as a whole, hearing the page word by word,
it is even harder to correctly guess the semantic function of key
words because you at first lack the necessary context of these
words, which a visual impression of the page can provide without
reading anything.  Besides, listening to a screen reader, you
wouldn't hear what is bold or italic, so getting the meta-information
across in some different, verbal way seems useful to me.

I admit, though, that i relatively rarely work with people who use
screen readers, only every few months maybe, and never asked them
to test the mandoc output.

>    If you want to attach page or application-specific metadata to
>    elements, use data-*
>    
> <https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-*>
>    instead.

Looking at the link given there,

https://html.spec.whatwg.org/multipage/dom.html#embedding-custom-non-visible-data-with-the-data-*-attributes

i read:

  3.2.6.6 Embedding custom non-visible data with the data-* attributes
  [...]
  These attributes are not intended for use by software that is not
  known to the administrators of the site that uses the attributes.

That is *not* at all what i want.  It is vital that any browser,
even those i never heard of, is able to show this information to
the user.

It is neither "non-visible" nor "private" data, but an important
part of the output, to be shown directly to the user.

So i don't quite understand why you suggest data-*.
No user would ever see those attributes, right?

> *Presentational tags used instead of those conveying text-level
> semantics:*
>    You're literally doing what mdoc(7) tells you not to do, except
>    in HTML form:
>       - -b, -S, -o:
>       Flags/options should be represented using kbd tags, as they
>       describe user input
>       <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/kbd>.

Not doing that is a deliberate compromise.

I want the output to be comprehensible even without any stylesheet.

For that reason, i cannot use <kbd> for fixed syntax elements 
because without a style sheet, that would make them indistinguishable
from example user input, which is a very important distinction in
manual pages, clearly visible even in terminal output.

I could not find any HTML element that is adequate for fixed
syntax elements and clearly distinguishable from example code.
Can you point me to one?

Besides, i never said that the final output rendering of a document
must be perfect code in the final target language.  As a matter of
fact, that is almost never possible, just like you cannot preserve
all the subtleties and beauty of a poem when translating it into a
different language.  The target language always provides more
potential for making distinctions in some areas than the source
language (causing clumsy final output not using the full potential
of the target language because the source language lacks information)
and it always provides less potential for making distinctions in
other areas, causing information to be lost or represented in
non-standard ways.

All i'm saying is that the *source* document must be rich in semantic
markup.  Most of that is inevitably lost when rendering into any
target format, even if the target language is semantically rich
itself, like HTML.  There is no problem with that, because you must
never use the transformed, inevitably degraded document as the
starting point of another transformation, but you should only display
it as it is.

Being forced to use <b> for .Fl and .Cm is a consequence of the
fact that i could not find distinct elements for syntax elements
and examples in HTML.

>       - =*option*:
>       Parameters should use var tags to indicate a placeholder
>       name for an expectant value

Not sure what you are asking for here; for all i can tell, they
*do* use <var> tags; i see var.Ar and var.Fa in mandoc.css,
and running mandoc(1) shows to me that these <var> elements
actually get emitted for .Ar macros.

Can you show at which place exactly this goes wrong?

>       - Use dfn to markup the defining subject's name.
>         For mdoc, this means *Nm*

I dimly remember that i considered that, but decided against it
because the default rendering is indistinguishable from <var>,
so you get the extremely confusing situation that the topic
of the page looks as if it needed to be replaced by something else.

Similar compromise as for .Fl and .Cm:  Fall back to presentational
formatting that also works without any style sheet.

To summarize so far, i disagree with your explicit statement that
the only reason why semantic markup matters in the *source*
document is as a means to help getting to a good final formatting
and visual result.  It has uses beyond that, some of which were
mentioned in replies.  But i also disagree with your apparently
implicit assumption that the markup in the *target* language must adhere
to language purity standards.  At *that* stage, all that matters
is presentation, and when presentational needs and language purity
conflict, at *that* stage, language purity must be sacrificed to
achieve the best possible visual result.  Obviously, HTML language
purity would be important if HTML were the source language.

(Side note:  The situation is slightly different for -Tman
and -Tmarkdown because the whole point of these output formats
is that they *will* get translated again.  So in these two cases,
language purity is paramount, and visual quality often has to be
sacrificed because trying to be too smart would ruin portability,
which is the whole point of *that* exercise.)

But you are certainly not supposed to ever process the HTML output
of mandoc again.  If you need it in a different format, restart
from the original document, please.


>    - *Inconsistent or incorrect use of sectioning elements*
>    You linked to https://man.openbsd.org/gcc.1 as an example. CTRL+F and
>    search for "Options Controlling the Kind of Output".
>    I'd hotlink the section directly, but you neglected to use an ID
>    attribute or even an anchor element with a name attribute.

That's not my fault.  The original markup is:

.br
.ne 5
.PP
\fBOptions Controlling the Kind of Output\fP
.PP

How is mandoc(1) supposed to figure out that that is intended as a
section header?  It looks like an ordinary, admittedly very short
paragraph of text even to a human reader.

>    Did you mean to use all those separate <dl> tags as an
>    indication of quality output, or was
>    that an oversight?

You mean,

  <dl class="Bl-tag">
    <dt class="It-tag"><i>file</i><b>.cc</b></dt>
    <dd class="It-tag"></dd>
  </dl>

?

Sorry, but that is in the original input file, too:

  .IP "\fIfile\fR\fB.cc\fR" 4
  .IX Item "file.cc"
  .PD 0
  .IP "\fIfile\fR\fB.cp\fR" 4
  .IX Item "file.cp"
  .IP "\fIfile\fR\fB.cxx\fR" 4
  .IX Item "file.cxx"

The document explicitly requests paragraphs with text in the
head and empty bodies, and mandoc faithfully renders that.
How could it guess that the author actually meant a single
list entry with hard line breaks inside the head element?
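To make that concrete, here is a minimal sketch in Python (not
mandoc's actual code, which is written in C; the function name and
the simplified parsing rules are assumptions made for illustration).
It reads the excerpt quoted above and collects a head and a body for
each .IP request.  Every head ends up with an empty body, because no
text line follows any of them:

```python
import re

# Matches an .IP request with a quoted head argument, as in the excerpt.
IP_MACRO = re.compile(r'^\s*\.IP\s+"([^"]*)"')

def tagged_paragraphs(lines):
    """Return (head, body) pairs, one per .IP request (hypothetical helper)."""
    items = []
    for line in lines:
        match = IP_MACRO.match(line)
        if match:
            # Each .IP starts a new tagged paragraph with an empty body.
            items.append((match.group(1), []))
        elif line.lstrip().startswith("."):
            # Other requests (.IX, .PD) contribute no body text here.
            continue
        elif items:
            items[-1][1].append(line.strip())
    return [(head, " ".join(body)) for head, body in items]

SAMPLE = [
    r'.IP "\fIfile\fR\fB.cc\fR" 4',
    r'.IX Item "file.cc"',
    r'.PD 0',
    r'.IP "\fIfile\fR\fB.cp\fR" 4',
    r'.IX Item "file.cp"',
    r'.IP "\fIfile\fR\fB.cxx\fR" 4',
    r'.IX Item "file.cxx"',
]

for head, body in tagged_paragraphs(SAMPLE):
    print(repr(head), repr(body))
```

Nothing in the input says that the three heads belong together, so
any merging would have to be a heuristic.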

I came across man pages in practice where it was unclear which of
the two was intended even to the human eye, though in the case at
hand, a human reader *can* probably understand what is meant -
but a program can hardly decide that.

>    - *Pointless empty elements everywhere*

You mean,

  <i></i><i>source</i><i>.</i><i>suffix</i> <i></i>

The input file is actually forcing that with the following
nonsensical low-level roff(7) code:

  \fI\fIsource\fI.\fIsuffix\fI\fR

I say, garbage in, garbage out.  The output is correct, by the
way, and renders correctly.  The empty elements are rendered
faithfully and have no effect.
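
As a rough illustration of why the empty elements appear, consider
the following sketch in Python.  It is not how mandoc is implemented;
the function name, the font table, and the rule "every font escape
closes the current element and opens a new one" are simplifying
assumptions, and only \fI, \fB, and \fR are handled:

```python
import re

# Hypothetical mapping from roff font letters to HTML elements;
# R (roman) means that no element is open.
FONT_TAGS = {"I": "i", "B": "b", "R": None}

def naive_font_escapes_to_html(text):
    """Translate font escapes one by one, without any cleanup."""
    out = []
    current = None  # tag that is currently open, if any
    pos = 0
    for match in re.finditer(r"\\f([IBR])", text):
        out.append(text[pos:match.start()])
        pos = match.end()
        if current is not None:
            out.append("</%s>" % current)  # close the previous font
        current = FONT_TAGS[match.group(1)]
        if current is not None:
            out.append("<%s>" % current)   # open the requested font
    out.append(text[pos:])
    if current is not None:
        out.append("</%s>" % current)
    return "".join(out)

print(naive_font_escapes_to_html(r"\fI\fIsource\fI.\fIsuffix\fI\fR"))
```

Two escapes in a row, with no text in between, necessarily produce an
element with no content under such a straightforward translation.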

Is your point that the parser should filter such nonsense out?  That
doesn't seem like a particularly good idea to me.  Such nonsensical
input is rare in the first place, so filtering would provide little
benefit, but trying to detect and remove it adds additional code
with potential for additional bugs, and the possibilities for insane
input are limitless, so you can't possibly filter all insanity out,
even if you tried to do so with substantial amounts of additional
code.

By the way, the reason for the insanity is that the gcc.1
man(7) code is autogenerated from perlpod(1) code which is
in turn autogenerated from texinfo(5) code, and the POD
already contains weird stuff like "F<I<source>.I<suffix>>"
which pod2man(1) handles poorly.

You can't really expect to win a beauty contest by putting make up
on a pig that has been passed through a meat grinder, glued back
together, and passed through another meat grinder.  ;)

>    - *Class attributes assigned to elements which should be
>      styled using SIMPLE stylesheet declarations*

Can you be more specific?  I considered each element very carefully,
and for many of them, i could not find good matches in the HTML
standard.  So i decided to use classes for *all* elements that carry
semantic significance, both in those few cases where HTML provides
adequate standard elements like <var> and <code> and in the larger
number of cases where good matches are not available and i had to
fall back to presentational elements or <span>.

> Also, half of your "semantic stylesheet <https://man.openbsd.org/mandoc.1>"
> is redundant and repeating default properties. Many values aren't actually
> doing anything,

There are three reasons for that:

 1. When elements nest, seemingly redundant attributes that
    match the defaults without nesting can suddenly become
    relevant.
 2. People can edit the stylesheet and add their own rules.
    In that case, attributes that are redundant in the
    unchanged CSS can suddenly become relevant.
 3. Because 1 & 2 apply to a significant fraction of cases
    and there is a risk of overlooking cases when trying to
    minimize the default CSS, i decided to list all attributes
    that a given element wants to set for clarity and
    robustness, even if it can be shown that the default
    would do for a specific case.
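
Reason 1 can be sketched with a toy model of CSS inheritance, written
in Python.  This is a deliberate simplification covering one single
inherited property (font-style); the function name and the two rule
sets are hypothetical and not taken from mandoc.css:

```python
def computed_font_style(chain, rules):
    """Compute font-style for the innermost element of a nesting chain.

    chain: element names from outermost to innermost.
    rules: element name -> declared font-style; a missing entry means
           that the value is inherited from the parent element.
    """
    value = "normal"  # initial value of font-style
    for name in chain:
        value = rules.get(name, value)
    return value

# A minimized rule set and one that spells out the default for <b>.
minimal = {"i": "italic"}
explicit = {"i": "italic", "b": "normal"}

# At the top level, the explicit declaration looks redundant ...
print(computed_font_style(["b"], minimal))        # normal
print(computed_font_style(["b"], explicit))       # normal

# ... but inside <i>, it decides the result.
print(computed_font_style(["i", "b"], minimal))   # italic
print(computed_font_style(["i", "b"], explicit))  # normal
```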

Of course, without reference to a specific attribute of a specific
rule, i can't say which reason it is, or which combination of
multiple reasons.

> and several rulesets are empty altogether.

Of course.  The default stylesheet serves a double purpose.
It is short and small enough to be used per default out of the
box.  But it is also intended as a starting point for people
who want to customize their rendering, so it provides a
complete listing of the classes that mandoc emits.

> I can't go on.
> I'm feeling queasy with fremdschaemen.
> Seriously.

I do have the impression that you might be able to provide useful
feedback that could result in specific improvements, but from the
above relatively unspecific comments, i so far can't deduce any
specific plans regarding what to improve.

All the same, thanks for looking at these matters,
  Ingo


