
Re: [groff] groff as the basis for comprehensive documentation?


From: John Gardner
Subject: Re: [groff] groff as the basis for comprehensive documentation?
Date: Mon, 23 Apr 2018 13:17:16 +1000

It's taken me several hours to write this, so it damn well better be
enlightening.

> What i meant by the above sentence is that the CSS you gave,
> with the exception of the dfn{} and kbd{} selectors, selects
> nothing based on kind or class of content or semantic function.
>
> consequently, the EXAMPLES section is indeed rendered in a
> misleading way on your example page

Actually, that stylesheet is unfinished. It's because of the browser's
default styling that most of the page's content is convincingly styled. As
for the examples section, yeah, it was jarring enough that I pushed a fix
just for that. Note the selector I used in the ruleset
<https://github.com/Alhadis/Stylesheets/blob/master/complete/manpage/manpage.css#L68-L70>:
‘samp kbd’ applies its properties to any <kbd> element nested inside a
<samp>. As the spec states
<https://www.w3.org/TR/html50/text-level-semantics.html#the-kbd-element>,
that combination represents user input as echoed back by the system (for a
command prompt, that means the entire line). Contextually applying or
removing properties is common in CSS.
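To illustrate the idea, here is a hypothetical sketch. It is not the actual rule at the link above, and the property values are assumptions. A contextual selector lets the same <kbd> element render one way for a flag mentioned in running text and another way inside an echoed command line:

```css
/* Hypothetical sketch; property values are illustrative only.
   Markup assumed (names made up):
     running text:  <kbd>-a</kbd> enables the extra output.
     EXAMPLES:      <samp><kbd>foo -a bar.txt</kbd></samp>          */

/* A flag or fixed argument mentioned in running text (.Fl, .Cm) */
kbd {
	font-weight: bold;
}

/* A whole command line echoed inside a transcript (.Dl). The entire
   line is user input, so un-bolding it keeps it from reading as a
   run of individual flags. "samp kbd" is more specific than "kbd",
   so this rule wins inside <samp>. */
samp kbd {
	font-weight: normal;
}
```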

> So, with your stylesheet, almost all semantic information from all
> macros seems completely unhandled to me - or what am i missing?

I don't understand what you mean when you say "semantic information". The
stylesheet is building off the default browser styling, which doesn't need
to be repeated. It's senseless to write this, for example:

b { font-weight: bold; }
i { font-style: italic; }
u { text-decoration: underline; }

... because <b> tags are already bold by default, <i> tags are italic by
default, etc. The same is true for this:

kbd { font-weight: bold; }

Since a browser's default styling varies between vendors, it's common
practice to use something like Normalize.css
<http://necolas.github.io/normalize.css/> to smooth out variances. I use a
modified version of it
<https://github.com/Alhadis/Stylesheets/blob/master/reset/normalize.css>,
and the HTML page I wrote uses it.

> They are not redundant. Their purpose is to display the semantic
> function of the word in a tooltip when hovering the mouse over it.

What the hell. Why? Is every reader expected to understand what Ar means if
the cursor rolls over “*options*” or what-have-you? I'm puzzled, and can't
see any *practical* reason for this. Compare that with the tooltips of the
links you mentioned, which show a short summary for *unrelated* content:

    The HTMLElement interface represents any HTML element.
    Some elements directly implement this interface, others
    implement it via an interface that inherits it.

> The title attribute would be read *in addition* to the contents of
> the element, rather than instead of the contents, right?

No. In place of. Moreover, this depends on how a screen-reader is
configured to handle title attributes.

> That is *not* at all what i want.  It is vital that any browser,
> even those i never heard of, is able to show this information to
> the user.
>
> It is neither "non-visible" nor "private" data, but an important
> part of the output, to be shown directly to the user.

Again, you'll have to explain why tag-names in a tooltip are at all
relevant to the authored material. If information is important for the
reader, it should obviously be part of the content itself. Whether
something is an argument, a flag, or a command modifier should be quite
evident to a reader by context.

> I want the output to be comprehensible even without any stylesheet.

I'd say the following is comprehensible without *anything*:

[-ac] [-I os=name] [-K encoding] [-mdoc | -man]


> I could not find any HTML element that is adequate for fixed
> syntax elements and clearly distinguishable from example code.
> Can you point me to one?

What do you mean by "fixed syntax"? Monospaced text? There are several:

   - <kbd> <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/kbd> is
   used to represent user input verbatim. Historically, this was only used to
   represent keys on a keyboard (hence the name). Its meaning was broadened in
   HTML5 to refer to any form of user input: menus, command lines, spoken
   commands, et al.
   - <samp> <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/samp>
   is used for program output, such as a command transcript
   <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/samp#Sample_output_including_user_input>.
   - <code> <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/code> is
   used for anything else, and is typically what you'll use for inline
   monospace text. For code blocks, it should be used inside <pre>
   <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/pre>, e.g.,
   <pre><code> … </code></pre>. Note that <pre> alone doesn't mean source
   code: it could be used for chat logs and ASCII art, for example.
   Furthermore, it doesn't imply a fixed-pitch typeface will *always* be
   used: simply that whitespace must be preserved by the user-agent verbatim.

> I dimly remember that i considered that, but decided against it
> because the default rendering is indistinguishable from <var>

So use your stylesheet. Recall that defaults will differ between browsers,
and are also subject to user preference.
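For instance, a couple of element rules are enough to keep .Nm from looking like a placeholder, even though the HTML Standard's suggested default rendering italicizes both <dfn> and <var>. This is a sketch; the exact properties are a matter of taste:

```css
/* Placeholders such as .Ar and .Fa: keep the usual italics. */
var {
	font-style: italic;
}

/* The page's own name (.Nm): upright and bold, so it cannot be
   mistaken for something the reader has to replace. */
dfn {
	font-style: normal;
	font-weight: bold;
}
```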

> At *that* stage, all that matters is presentation, and when
> presentational needs and language purity conflict, at *that*
> stage, language purity must be sacrificed to achieve the best
> possible visual result.

There are two languages. CSS is for presentation, HTML is for expressing
document content and structure ("structure" being used in the sense of an
element hierarchy, not a visual one). Keeping them separate is less about
"purity" than about keeping both lean and flexible.

> I say, garbage in, garbage out.  The output is correct, by the
> way, and renders correctly.  The empty elements are rendered
> faithfully and have no effect.

Is this why you listed these pages as examples of mandoc's output quality?
To assert how accurately it renders crap? Or did you not check the links
before giving them to me?

> So i decided to use classes for *all* elements that carry semantic
> significance

That's going way too far. Stuff like code.Fl and code.Cm makes sense. These
do not: h1.Sh, h2.Ss, div.Pp.

> But it is also intended as a starting point for people
> who want to customize their rendering, so it provides a
> complete listing of the classes that mandoc emits

Consider limiting classes to mdoc macros that lack an obvious HTML
counterpart. It's also best to group them in a shared ancestor and use
descendant selectors to target elements within. For example,

dl.Bl-inset { }
dt.It-inset { }
dd.It-inset { }

is more cleanly expressed as:

dl.inset    { }
dl.inset dt { }
dl.inset dd { }

Many mdoc elements have sensible overlap with HTML, and benefit from not
having classes to distinguish them. In these cases, it's better to style
tags specifically:

.Lk => a[href]
.Mt => a[href^="mailto:"]
.Sh => h1
.Ss => h2
.Pp => p
.Em => i
.Sy => b
.Bl => ol, ul, dl

Note that I've recommended <i> as a replacement for .Em: the <em> tag is
visually similar, but also affects how text is spoken aloud by a
screen-reader. I understand mdoc's .Em macro carries no such meaning.
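Styled by element rather than by class, the mapping above might look something like the following. This is a hypothetical sketch: the property values are placeholders, and it assumes the output actually uses the elements listed.

```css
/* .Sh and .Ss: section and subsection headings */
h1, h2 {
	font-family: sans-serif;
}

/* .Pp: ordinary paragraphs */
p {
	margin: 0.5em 0;
}

/* .Mt: mailto links only, leaving other links (.Lk) alone */
a[href^="mailto:"] {
	text-decoration: none;
}

/* .Bl: whichever list element a given list type becomes */
ol, ul, dl {
	margin-left: 2em;
}
```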

Also, consider enclosing each section and subsection in an element with an
ID (the way I've done). If an author wants to style .Nm differently in the
preamble, for example, they can use ‘#name dfn’.
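A sketch of that arrangement follows. It assumes each section is wrapped in a <section> element whose id is derived from its heading; the id values and the NAME line are made up for illustration.

```css
/* Assumed markup (hypothetical):
     <section id="name">
       <h1>NAME</h1>
       <p><dfn>foo</dfn> - do something useful</p>
     </section>
     <section id="synopsis"> ... </section>                        */

/* .Nm throughout the page */
dfn {
	font-weight: bold;
}

/* .Nm in the NAME section only. The ID selector is more specific,
   so this overrides the plain dfn rule above. */
#name dfn {
	font-weight: normal;
}
```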

Nitpick: consider renaming .selflink to .permalink instead. The former is
quite vague.


On 23 April 2018 at 06:38, Ingo Schwarze <address@hidden> wrote:

> Hi John,
>
> John Gardner wrote on Sat, Apr 21, 2018 at 04:48:33PM +1000:
>
> > Ingo, I've spent the last 13 years in front-end web development,
> > and I've been writing standards-compliant websites for almost
> > a decade.
>
> Sounds like you might have valuable input that could end up
> improving the mandoc -Thtml output.  Note that i did *not*
> claim that i specialize in anything related to HTML/CSS, which
> i actually do not.  Quite to the contrary, whenever i had
> questions related to HTML/CSS, i had a hard time finding any
> developer who knew much about it.  So i might finally get some
> real help, looking forward to that...
>
> >> I see absolutely nothing semantic in there, it looks like a
> >> purely presentational style sheet to me.
>
> > ... yes, that's the entire reason CSS exists: to separate
> > presentation from content (the latter being tantamount to
> > "semantics" as understood by web authors and those of us who
> > actively follow modern web standards).
>
> Wait - the point of CSS is to select adequate presentation
> for content of a given kind or class, right?  So the CSS, on its
> input side, first needs to be told, by HTML elements and attributes,
> what kind or class of content some text belongs to, and then has
> to select the presentational attributes using selectors addressing
> these kinds and classes of HTML elements, right?
>
> What i meant by the above sentence is that the CSS you gave,
>
> https://rawgit.com/Alhadis/Stylesheets/master/complete/manpage/manpage.css
>
> with the exception of the dfn{} and kbd{} selectors, selects nothing
> based on kind or class of content or semantic function.  What you
> do with dfn{} handles one single macro, and kbd{} is used for very
> different kinds of content that need completely different formatting,
> namely fixed syntax elements like command line options (.Fl) and
> fixed option arguments (.Cm) on the one hand but also code examples
> (.Dl) on the other hand - consequently, the EXAMPLES section is
> indeed rendered in a misleading way on your example page, compare to
>
>   https://man.openbsd.org/mandoc.1#EXAMPLES
>
> So, with your stylesheet, almost all semantic information from all
> macros seems completely unhandled to me - or what am i missing?
>
> > Redundant title attributes on everything.
>
> They are not redundant.  Their purpose is to display the semantic
> function of the word in a tooltip when hovering the mouse over it.
> I'll gladly do that in a better way if i find one or someone directs
> me in the right direction, but when i read the CSS standard, i
> failed to find any other way.
>
> As a matter of fact, the very document you quote uses title
> attributes for exactly the same purpose:
>
> https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-*
>
> Look at the source code for the elements that show tool tips,
> like "HTMLElement" or "DOMStringMap".
>
> > Actually, worse than redundant:
> >    it screws with assistive technologies like screen-readers, which
> >    might read the contents of a tag to the user using the title
> >    attribute if one is present.
>
> The title attribute would be read *in addition* to the contents of
> the element, rather than instead of the contents, right?  That
> would actually be useful, because not being able to get a visual
> impression of the page as a whole, hearing the page word by word,
> it is even harder to correctly guess the semantic function of key
> words because you at first lack the necessary context of these
> words, which a visual impression of the page can provide without
> reading anything.  Besides, listening to a screen reader, you
> wouldn't hear what is bold or italic, so getting the meta-information
> across in some different, verbal way seems useful to me.
>
> I admit, though, that i relatively rarely work with people who use
> screen readers, only every few months maybe, and never asked them
> to test the mandoc output.
>
> >    If you want to attach page or application-specific metadata to
> >    elements, use data-*
> >    <https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-*>
> >    instead.
>
> Looking at the link given there,
>
> https://html.spec.whatwg.org/multipage/dom.html#embedding-custom-non-visible-data-with-the-data-*-attributes
>
> i read:
>
>   3.2.6.6 Embedding custom non-visible data with the data-* attributes
>   [...]
>   These attributes are not intended for use by software that is not
>   known to the administrators of the site that uses the attributes.
>
> That is *not* at all what i want.  It is vital that any browser,
> even those i never heard of, is able to show this information to
> the user.
>
> It is neither "non-visible" nor "private" data, but an important
> part of the output, to be shown directly to the user.
>
> So i don't quite understand why you suggest data-*.
> No user would ever see those attributes, right?
>
> > *Presentational tags used instead of those conveying text-level
> > semantics:* You're
> >    literally doing what mdoc(7) tells you not to do, except in HTML form:
> >       - -b, -S, -o:
> >       Flags/options should be represented using kbd tags, as they
> >       describe user input
> >       <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/kbd>.
>
> Not doing that is a deliberate compromise.
>
> I want the output to be comprehensible even without any stylesheet.
>
> For that reason, i cannot use <kbd> for fixed syntax elements
> because without a style sheet, that would make them indistinguishable
> from example user input, which is a very important distinction in
> manual pages, clearly visible even in terminal output.
>
> I could not find any HTML element that is adequate for fixed
> syntax elements and clearly distinguishable from example code.
> Can you point me to one?
>
> Besides, i never said that the final output rendering of a document
> must be perfect code in the final target language.  As a matter of
> fact, that is almost never possible, just like you cannot preserve
> all the subtleties and beauty of a poem when translating it into a
> different language.  The target language always provides more
> potential for making distinctions in some areas than the source
> language (causing clumsy final output not using the full potential
> of the target language because the source language lacks information)
> and it always provides less potential for making distinctions in
> other areas, causing information to be lost or represented in
> non-standard ways.
>
> All i'm saying is that the *source* document must be rich in semantic
> markup.  Most of that is inevitably lost when rendering into any
> target format, even if the target language is semantically rich
> itself, like HTML.  There is no problem with that, because you must
> never use the transformed, inevitably degraded document as the
> starting point of another transformation, but you should only display
> it as it is.
>
> Being forced to use <b> for .Fl and .Cm is a consequence of the
> fact that i could not find distinct elements for syntax elements
> and examples in HTML.
>
> >       - =*option*:
> >       Parameters should use var tags to indicate a placeholder
> >       name for an expectant value
>
> Not sure what you are asking for here, for all i can tell, they
> *do* use <var> tags; i see var.Ar and var.Fa in mandoc.css,
> and running mandoc(1) shows to me that these <var> elements
> actually get emitted for .Ar macros.
>
> Can you show at which place exactly this goes wrong?
>
> >       - Use dfn to markup the defining subject's name.
> >         For mdoc, this means *Nm*
>
> I dimly remember that i considered that, but decided against it
> because the default rendering is indistinguishable from <var>,
> so you get the extremely confusing situation that the topic
> of the page looks as if it needed to be replaced by something else.
>
> Similar compromise as for .Fl and .Cm:  Fall back to presentational
> formatting that also works without any style sheet.
>
> To summarize so far, i disagree with your explicit statement that
> the only reason why semantic markup matters in the *source*
> document is as a means to help getting to a good final formatting
> and visual result.  It has uses beyond that, some of which were
> mentioned in replies.  But i also disagree with your apparently
> implicit assumption that the markup in *target* language must adhere
> to language purity standards.  At *that* stage, all that matters
> is presentation, and when presentational needs and language purity
> conflict, at *that* stage, language purity must be sacrificed to
> achieve the best possible visual result.  Obviously, HTML language
> purity would be important if HTML were the source language.
>
> (Side note:  The situation is slightly different for -Tman
> and -Tmarkdown because the whole point of these output formats
> is that they *will* get translated again.  So in these two cases,
> language purity is paramount, and visual quality often has to be
> sacrificed because trying to be too smart would ruin portability,
> which is the whole point of *that* exercise.)
>
> But you are certainly not supposed to ever process the HTML output
> of mandoc again.  If you need it in a different format, restart
> from the original document, please.
>
>
> >    - *Inconsistent or incorrect use of sectioning elements*
> >    You linked to https://man.openbsd.org/gcc.1 as an example. CTRL+F and
> >    search for "Options Controlling the Kind of Output".
> >    I'd hotlink the section directly, but you neglected to use an ID
> >    attribute or even an anchor element with a name attribute.
>
> That's not my fault.  The original markup is:
>
> .br
> .ne 5
> .PP
> \fBOptions Controlling the Kind of Output\fP
> .PP
>
> How is mandoc(1) supposed to figure out that that is intended as a
> section header?  It looks like an ordinary, admittedly very short
> paragraph of text even to a human reader.
>
> >    Did you mean to use all those separate <dl> tags as an
> >    indication of quality output, or was
> >    that an oversight?
>
> You mean,
>
>   <dl class="Bl-tag">
>     <dt class="It-tag"><i>file</i><b>.cc</b></dt>
>     <dd class="It-tag"></dd>
>   </dl>
>
> ?
>
> Sorry, but that is in the original input file, too:
>
>   .IP "\fIfile\fR\fB.cc\fR" 4
>   .IX Item "file.cc"
>   .PD 0
>   .IP "\fIfile\fR\fB.cp\fR" 4
>   .IX Item "file.cp"
>   .IP "\fIfile\fR\fB.cxx\fR" 4
>   .IX Item "file.cxx"
>
> The document explicitly requests paragraphs with text in the
> head and empty bodies, and mandoc faithfully renders that.
> How could it guess that the author actually meant a single
> list entry with hard line breaks inside the head element?
>
> I came across man pages in practice where it was unclear which of
> the two was intended even to the human eye, though in the case at
> hand, a human reader *can* probably understand what is meant -
> but a program can hardly decide that.
>
> >    - *Pointless empty elements everywhere*
>
> You mean,
>
>   <i></i><i>source</i><i>.</i><i>suffix</i> <i></i>
>
> The input file is actually forcing that with the following
> nonsensical low-level roff(7) code:
>
>   \fI\fIsource\fI.\fIsuffix\fI\fR
>
> I say, garbage in, garbage out.  The output is correct, by the
> way, and renders correctly.  The empty elements are rendered
> faithfully and have no effect.
>
> Is your point that the parser should filter such nonsense out?  That
> doesn't seem like a particularly good idea to me.  Such nonsensical
> input is rare in the first place, so filtering would provide little
> benefit, but trying to detect and remove it adds additional code
> with potential for additional bugs, and the possibilities for insane
> input are limitless, so you can't possibly filter all insanity out,
> even if you tried to do so with substantial amounts of additional
> code.
>
> By the way, the reason for the insanity is that the gcc.1
> man(7) code is autogenerated from perlpod(1) code which is
> in turn autogenerated from texinfo(5) code, and the POD
> already contains weird stuff like "F<I<source>.I<suffix>>"
> which pod2man(1) handles poorly.
>
> You can't really expect to win a beauty contest by putting makeup
> on a pig that has been passed through a meat grinder, glued back
> together, and passed through another meat grinder.  ;)
>
> >    - *Class attributes assigned to elements which should be
> >      styled using SIMPLE stylesheet declarations*
>
> Can you be more specific?  I considered each element very carefully,
> and for many of them, i could not find good matches in the HTML
> standard.  So i decided to use classes for *all* elements that carry
> semantic significance, both in those few cases where HTML provides
> adequate standard elements like <var> and <code> and in the larger
> number of cases where good matches are not available and i had to
> fall back to presentational elements or <span>.
>
> > Also, half of your "semantic stylesheet <https://man.openbsd.org/mandoc.1>"
> > is redundant and repeating default properties. Many values aren't actually
> > doing anything,
>
> There are three reasons for that:
>
>  1. When elements nest, seemingly redundant attributes that
>     match the defaults without nesting can suddenly become
>     relevant.
>  2. People can edit the stylesheet and add their own rules.
>     In that case, attributes that are redundant in the
>     unchanged CSS can suddenly become relevant.
>  3. Because 1 & 2 apply to a significant fraction of cases
>     and there is a risk to overlook cases when trying to
>     minimize the default CSS, i decided to list all attributes
>     that a given element wants to set for clarity and
>     robustness, even if it can be shown that the default
>     would do for a specific case.
>
> Of course, without reference to a specific attribute of a specific
> rule, i can't say which reason it is, or which combination of
> multiple reasons.
>
> > and several rulesets are empty altogether.
>
> Of course.  The default stylesheet serves a double purpose.
> It is short and small enough to be used per default out of the
> box.  But it is also intended as a starting point for people
> who want to customize their rendering, so it provides a
> complete listing of the classes that mandoc emits.
>
> > I can't go on.
> > I'm feeling queasy with fremdschaemen.
> > Seriously.
>
> I do have the impression that you might be able to provide useful
> feedback that could result in specific improvements, but from the
> above relatively unspecific comments, i so far can't deduce any
> specific plans regarding what to improve.
>
> All the same, thanks for looking at these matters,
>   Ingo
>

