groff

Re: [PATCH v2] Support 2-digit \sNN only in compatibility mode.


From: G. Branden Robinson
Subject: Re: [PATCH v2] Support 2-digit \sNN only in compatibility mode.
Date: Mon, 6 Apr 2020 04:11:02 +1000
User-agent: NeoMutt/20180716

At 2020-04-04T16:11:38+0200, Ingo Schwarze wrote:
> Hi Branden,
> 
> G. Branden Robinson wrote on Sat, Apr 04, 2020 at 05:29:38AM +1100:
> 
> > Second attempt.
> 
> I have no very strong opinion about this, so i don't strictly object
> to changing it, if people here really consider it valuable to have.
> Then again, I'm still not quite sure what exactly the point is.

To make the language more regular.

>  * We can discourage using \s12 and \s48 in the documentation
>    without changing the language syntax and semantics, if we want
>    to (that would probably make sense, as far as it is not already
>    done).

True.  We _could_ have a warning/deprecation cycle, but the general
tenor of feedback (and my own inclination) is to be more aggressive than
that.

>  * We can throw a warning about \s36 (Consider using \s[36] because
>    that's more readable.) and also about \s40 (Do you really want
>    to print the digit zero in point size 4?  If so, consider saying
>    \s[4]0 for clarity.) without changing any behaviour.

I looked into this.  The first is already implemented, just in
compatibility mode only.  To me, with my limited understanding of the
parser internals, the latter appears to be impossible without
token pushback.
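
To show the lookahead involved, here is a minimal C++ sketch. It assumes the post-patch semantics and works on a plain std::string, where peeking at the next character costs nothing. groff's token-based parser has no such free peek, which is where pushback comes in. The function name s_escape_warning and the message wording are hypothetical. This is not groff's diagnostic code.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical diagnostic for the unbracketed, unsigned \sN form.
// `arg` is the text immediately after "\s"; the caller has already
// seen that its first character is a digit.  Returns an empty string
// when there is nothing to warn about.
std::string s_escape_warning(const std::string& arg, bool compat) {
  assert(!arg.empty() && std::isdigit(static_cast<unsigned char>(arg[0])));
  // This peek at arg[1] is the one-character lookahead that a
  // token-based parser would have to undo (push back) afterwards.
  bool next_is_digit = arg.size() > 1
      && std::isdigit(static_cast<unsigned char>(arg[1]));
  if (!next_is_digit)
    return "";  // a lone digit: unambiguous
  if (compat && arg[0] >= '1' && arg[0] <= '3')
    // Compatibility mode reads both digits, e.g. \s36 is 36 points.
    return "consider \\s[" + arg.substr(0, 2) + "] for readability";
  // Otherwise only one digit is the size, e.g. \s40 is a 4-point '0'.
  return "digit '" + arg.substr(1, 1) + "' will be set in point size "
      + arg.substr(0, 1) + "; consider \\s[" + arg.substr(0, 1) + "]"
      + arg.substr(1, 1) + " for clarity";
}
```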

>  * Either way, i would consider \s12 a legacy idiom, supported for
>    backward compatibility only.  What is the point of making a
>    feature more intuitive that is only supported for backward
>    compatibility in the first place?
> 
>  * While the new interpretation of \s12 is formally rigorous, it
>    makes extremely little sense for practical use.  Why would anybody
>    ever want to print anything in point size 1?  Or even in point
>    size 2 or 3?  That feels really useless.  Syntactic purity and
>    simplicity of a language is indeed a goal, but not for its own
>    sake, only in so far as it makes learning the language easier,
>    and that's not the case here because no new users need to learn
>    about \s12 anyway.

I don't have an argument against these points; I can only say that for
me, the learnability of the language, which already seems a bit baroque,
is made easier by ensuring that the syntax is regular.  It may come down
to you and me simply weighting factors differently.

>  * While i don't really see much benefit in making a legacy feature
>    nicer when there already are better modern features providing
>    the same functionality, there definitely is a cost: some
>    documents will become harder to process in a correct way,
>    and some people who value their finger memory more than code
>    clarity will be annoyed for little gain.  It's definitely
>    setting a trap for some experienced users.

That is indeed a hazard.  I'm not thrilled about it.

>  * "Compatibility mode", in general, is much less helpful than it
>    usually seems.  Every time you think about it, you tend to only
>    think about that one feature at hand and all seems fine in that
>    mindset: if you want the old behaviour, switch the mode on, else
>    leave it off.
>    But software evolves gradually over the decades.  There isn't a
>    single point in time such that before is legacy and afterwards
>    is modern.  So the more you add to compatibility mode, the more
>    often people will have the problem that they need to process
>    documents that *require* compatibility mode in some respects,
>    but only work *without* compatibility mode in other respects,
>    because the documents were written at a time when some mildly
>    modern features had already been introduced and could already
>    be used freely (outside compatibility mode, that is), while at
>    that same time, some antique features were still fully supported
>    that now begin to require compatibility mode.  The more time goes
>    by, the more severely broken the whole fundamentally flawed concept
>    of compatibility mode will eventually become.  In that sense,
>    the concept of compatibility mode is a time bomb, or more precisely
>    a smouldering fire that will slowly, but eventually develop into
>    a considerable blaze.  The less it is touched, the less risk is
>    added.

This is a legitimate gripe, but given the choice between shunting the
legacy parse into compatibility mode and abandoning it entirely, the
decision seemed straightforward.

In my opinion, compatibility mode is best used for documents of
historical value for which exact preservation is itself a virtue.  As
the earliest troff documents reach fifty years in age, I feel this is a
legitimate concern.

What compatibility mode is not, is an escape hatch for people who hate
groff extensions on general principle, or hate only some of them.  For
that we'd need a whole new interface for cherry-picking them, as you
imply, and the result would invite a combinatorial explosion of
supported configurations.

I don't see compatibility mode as the smoldering fire that you do; it's
a closet.  We put things there that are oddments and quirks that mar or
obstruct clean design.

> The phrase "single-digit point-size escapes with two digits" sounds
> confusing and oxymoronic.  Maybe "point size escapes of the form
> \sNN", if this goes in?

I've changed this for my commit.  The ironic phrasing was deliberate, a
sort of wry commentary on my part.

It's now:

        Move special-case interpretation of the '\sN' form of point-size
        escapes when 'N' is 1, 2, or 3 to compatibility mode (groff -C)
        only, and throw an error diagnostic with a suggestion for remedy if
        encountered.
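
The rule described in that commit message can be sketched roughly as follows. This is a minimal C++ illustration under stated assumptions, not groff's implementation. It handles only the unsigned, unbracketed \sN form (no \s+N, \s-N, or \s[N]). It takes the characters after "\s" as a std::string. It omits the diagnostic. The names SizeEscape and parse_s_escape are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result: the point size read and the number of
// characters consumed after the "\s".
struct SizeEscape {
  int size;
  std::size_t consumed;
};

// `arg` is the text immediately after "\s"; its first character must
// be a digit.
SizeEscape parse_s_escape(const std::string& arg, bool compat) {
  assert(!arg.empty() && std::isdigit(static_cast<unsigned char>(arg[0])));
  int n = arg[0] - '0';
  // Legacy AT&T form \s10 through \s39: a leading 1, 2, or 3 followed
  // by another digit is read as a two-digit size, now only in
  // compatibility mode (groff -C).
  if (compat && n >= 1 && n <= 3 && arg.size() > 1
      && std::isdigit(static_cast<unsigned char>(arg[1])))
    return {n * 10 + (arg[1] - '0'), 2};
  // Otherwise \sN takes exactly one digit: "\s12" is a 1-point "2".
  return {n, 1};
}
```

With compat set, "12" yields size 12 and consumes two characters. Without it, the result is size 1 with one character consumed, leaving "2" to be typeset. "40" takes one digit in either mode.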

> Indeed, the groff warning categories are severely ill-designed.
[...]
> I have no idea how to deal with the current mess.  Just add it to
> whatever category comes closest, even if it doesn't really fit?

Yikes.

> In any case, *please* do not use shortcomings of the GNU troff
> specific message system as an argument for or against any syntactic
> or semantic choices regarding the roff(7) language at large.

I don't think I am.  The only issue here is the diagnostic level.

People _do_ have a way of silencing the message: fix the document.

Museum curators who hold exact preservation of historical documents as
their highest priority should not be offended by the diagnostic.

To paraphrase a Debian colleague, running a modern compiler against
historical C code "is like turning on the pain", even when all problems
related to changes in the language itself are discarded.

> >   (this behavior of AT&T troff dates to 1976 but apparently was not
> >   documented until 1992).  Why not throw the warning outside of
> >   compatibility mode too?  Because outside of compatibility mode we
> >   (now) have an unambiguous parse.
> 
> Right, but the meaning of what was parsed makes absolutely no
> sense,

I think you overstate.  We're one stage in a pipeline.  People might
have perfectly legitimate reasons to render something in 1-point type.
For instance, they might use that as a form of in-band signaling to some
device backend, or even a PostScript or PDF mangler.

That said, I surmise right along with you that when people write \s12
they usually mean "give me the next text in 12-point type".

And in Lions-book-era C, when people wrote:

sp=-4;

They meant "decrement 'sp' by 4".
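
In early C, "=-" was the spelling of today's "-=" operator. A modern C or C++ compiler reads the same characters as an assignment of negative four. Here is a small C++ sketch of the contrast (the function names are illustrative only):

```cpp
// Modern parse of the Lions-era statement: "sp = -4", an assignment.
constexpr int modern_parse(int sp) {
  sp=-4;
  return sp;
}

// What "sp=-4;" meant in early C, where =- was the spelling of
// today's -= operator.
constexpr int historical_meaning(int sp) {
  sp -= 4;
  return sp;
}

static_assert(modern_parse(10) == -4, "modern C assigns -4");
static_assert(historical_meaning(10) == 6, "early C decremented by 4");
```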

Would that the level of fastidious cleanup that C received by 1978 had
been applied to troff/nroff.  Ossanna's untimely passing doubtless
played a role here.

> so it would be clearly useful to warn about \s12 in the
> new world:  what it does is almost certainly not what the author
> intended, and with such a small point size, the misformatting
> will even be quite severe, likely producing unreadable text.

That's why this is a NEWS item.  When we get around to issuing release
candidates, perhaps we'll start to get a better picture of the level of
potential controversy here.

> This comment feels quite excessive to me.  Very long comments may
> occasionally make sense above functions that are very long and very
> complicated, but you clearly shouldn't comment a five-line if block
> in the middle of a relatively simple, straight-forward function
> with a 15-line comment, in particular if the block is neither
> all that important nor hard to understand.  Such a long comment
> makes the actual code hard to find, and makes the file hard to read
> because you have to scroll so much.
> 
> "Note: Very special case!" says nothing at all.

Again, editorializing on my part about the ridiculousness of the
grammar.  Superfluous, I admit.

> "If we have .. read another digit" and "We therefore support it
> only in compatibility mode" is totally obvious in the first place:
> 
>       i++;  /* increment the variable i by one */

Harsh, man.  :P

From the perspective of the grammar, a better analogue is:

void read_one_token() {
  tok.get();
  if (chain && of && conditions)
    tok.get();  // ...and sometimes reads a second token after all
}

It's just loony to have a syntax form that implies a one-char argument
which SOMETIMES reads two chars.  That's what got this whole thing
started.  When one violates the principle of least surprise--in this
case from the perspective of someone struggling to mentally model
groff's parser state in their head, which is precisely what people do
when they write documents to be interpreted by it--one should damn well
justify oneself.

To those who howl that parsing '\s12' as "typeset a 1-point '2'"
violates THEIR principle of least surprise, I agree!  It may have been a
bad idea to admit this syntax form into the language in the first place
given the unlikelihood of documents confining themselves strictly to
single-decimal-digit point sizes while also not availing themselves of
the .ps request.

But dropping support for \sN (with no ±) would be even more disruptive
for about the same gain.  \s0 remains handy and we'd then have to
justify why we accept ONLY a zero there--if in fact we did.

I could be talked into it.  It'd make diagnosing the obsolete form
regardless of compatibility mode quite easy.

But not for this commit.

> "This is because ... living fossil" may be appropriate in a commit
> message, but not as a comment, and in the commit message, a reference
> to the mailing list discussion may nearly suffice.
> 
> "This DWIM syntax is surprising" doesn't belong in the code at all;
> we might want to warn users in the manual page, though.
> 
> I don't think any comment is really needed at all because the
> code is very obvious and straightforward.  If you absolutely
> want to say something,
> 
>       // Support the legacy form \s10 to \s39.
> 
> or something like that should be sufficient.

Okay.  My long comment is not a hill I care to die on.

Regards,
Branden
