Re: groff maintainership, release, and blockers (was: groff 1.23.0.rc2 r

From:	Ingo Schwarze
Subject:	Re: groff maintainership, release, and blockers (was: groff 1.23.0.rc2 readiness)
Date:	Sat, 27 Aug 2022 11:50:56 +0200

Hi Branden,

G. Branden Robinson wrote on Fri, Aug 26, 2022 at 02:04:57PM -0500:

[...]
> The FSF provides useful infrastructure.

Fair point.  You are right that the effort required to run servers for
a VCS, ticket handling, web, and mail is not negligible.  So if the
people actually doing the administrative work think the FSF services
are worth spending a few months on processes now and then, so be it.
Of course i do *not* advocate selling out to some commercial hosting
service (like github), or to some third-party service with a totally
insane, unusable API (like github).

[...]
> The FSF has its problems but selling out to a copyright rentier firm
> seems like a pretty low risk.

Even i consider that particular risk as so low that it doesn't matter.
And even if it happened, the FSF could still be abandoned at *that*
point.

[...]
> The grass isn't so green.  In my experience nearly everything in
> software management that looks "agile" and low-friction is that way
> because there is some serious infrastructure beneath that people have
> worked hard to make unobtrusive.

That rings very true to me.


> At 2022-08-26T13:51:25+0200, Ingo Schwarze wrote:
>> Branden wrote:

[...]
>>> But for the sake of transparency, in the meantime, he asked if the
>>> current HEAD was good enough to tag as "rc2" and I said "yes".

>> Sorry, i fail to understand that.  The acronym "RC" stands for "release
>> candidate".  I would define a "release candidate" as "a version that
>> is believed to be ready for release".

> Apparently we have a terminological and/or philosophical disagreement.

The conversation below reveals that indeed the majority of the points
i raised were caused by me misunderstanding what you meant when
saying "RC".

> My objective since Bertrand added the first automatic tests before the
> groff 1.22.4 release has been to _never_ have Git HEAD in a state where
> _any_ tests fail,

That's a worthy goal - incidentally, i do the same for mandoc, and
i think that is *very* common practice in most free or commercial
software contexts.

It has nothing to do with whether or not a tree is in a state that
is close to mature enough for a release in the near future.  Even
if a project has an above-average regression suite, the tree can easily
be in a state that is highly unstable and likely contains many new
regressions even when the test suite succeeds.

Slowly adding tests to groff is not a bad idea, but right now, the
groff test suite still has close to zero coverage, so it is almost
meaningless in this respect.

> Therefore, by that standard, any commit not marked "Test fails at this
> commit."..._is a release candidate_.

I consider that a ridiculous standard.  What's the point of
having a term for something if the defining property is trivial?
Or expressing the same question differently, why have a term that
means nothing?  In particular if the term is commonly used in a
completely different sense in the context.

But i do admit that disagreement is purely terminological, so we won't
die from not resolving it.

> On the other hand, that statement is unrealistic.  We don't have a
> regression test for every defect in groff because, like any
> non-formally-verified codebase of non-trivial complexity, groff has bugs
> we don't know about and therefore cannot test for.  It can also have
> bugs that we know about but don't understand well enough to write a
> test for, and bugs that manifest only on platforms or configurations
> that its regular developers don't test.

Indeed.  It is by definition impossible to measure and/or prove
the density of such bugs.  But sometimes, one does have reasonable
grounds for an informal, non-quantitative estimate of the density,
and when well-informed people feel it is lower than usual in their
project, they usually say "now would be a good time for release".

> None of these are novel observations; it's why people have "continuous
> integration" infrastructures.

What i'm saying is that even though *functionally*, groff-current
is decisively better than groff-1.22.4 in large numbers of respects,
my impression is the regression density in groff has, during the last
two years, never been as high as it is right now.  I feel less sure
about the years before, but if i remember correctly, that statement
not only applies to the last two years, but to the last decade.

I would call such a state "aggressive unstable development", i.e. the
direct opposite of "beta", let alone "RC".  My question is: how do we
get from an unstable development state to an RC-ready state?

Then again, you seem to disagree that the current development state
is unstable, in which case maybe mopping up the remaining known
puddles and then releasing is not unreasonable.

>> The purpose of an RC is to have it tested on as many platforms and for
>> as many different purposes as possible, to confirm that indeed no
>> undiscovered regressions exist

> I wouldn't say "no" undiscovered regressions, I'd say "minimal".  There
> will be bugs we don't turn up, and bugs that even exotic platforms don't
> expose.

Yes, bad wording on my part, i meant s/exist$/are found/.

>> and that in particular the last few commits made before the RC did not
>> cause regressions.

> groff is an assembly of modular components.  We can be reasonably
> confident that a change to a macro package cannot cause a bug in the
> formatter.  (It might _expose_ one.)  We can be reasonably confident
> that a change to xtotroff(1) is not going to cause a bug in grog(1).

True.  Then again, a serious bug in *any* component may be a serious
bug in the product as a whole.

> More importantly and practically, everyone who looks at groff's commit
> history knows that I make lots of changes to documentation.  Such a
> change can be ill-advised, incorrect, or even stupid, but it's not going
> to cause a SEGV in the formatter.

Yes, an argument can be made that documentation changes can be made right
up to the final RC - even though it would be unfortunate to put in a
*major* documentation change so close to release that it slips in,
and then find out after release that lots of people hate that change
in particular.  But that can be handled with common sense and is not
a critical consideration.

> So, while I think your narrowly drawn concept of an RC works well for an
> individual software module, it applies less comfortably to a more
> complex assembly with loose coupling among components.

I think the reverse is true.  If you release a small one-source-file
program, it may occasionally be feasible to sneak in a small fix
right before release (even though it's a risk even in that case).

But in a larger project, release discipline is *more* important,
not less.  In both free and commercial software projects i'm familiar
with, the tree is typically locked after an RC is issued, and
non-critical changes are typically no longer allowed.  In particular
in large projects because those tend to be harder to test than small
ones.

>> These tests cause non-trivial work for significant numbers of people,
>> most of whom are *not* groff developers, so an RC should only be made
>> when the software is really believed to be ready - both out of respect
>> for testers' time and because releasing multiple RCs will weary out
>> testers and increase the likelihood of serious bugs slipping into the
>> release: some testers will not have the time to test over and over
>> again, so the more RCs you ship, the less test coverage you get.

> I agree with this, but I reiterate that, in a sense, we've had literally
> thousands of RCs since groff 1.22.4.  Stability, on the platforms that
> are readily available to be tested by the developers, has not been a
> problem.  Even when I've made what I consider to be a boneheaded mistake
> and moved quickly to fix it, the stakes have typically not been high
> (even if one considers typesetting software mission-critical).
> 
> Take the most recent example.
> 
> https://git.savannah.gnu.org/cgit/groff.git/commit/?id=87efb8ff373d5cf3b92be9d21445a80b264fa961
> 
> What happened here?  When an invalid document (one that selected an
> impossible or unavailable font family) is rendered, groff 1.22.4 would
> issue a diagnostic, albeit a slightly confusing one, but for nearly the
> past year, groff Git would not.
> 
> A _valid_ input document was not affected at all.

Indeed, i agree the *majority* of regressions is likely low-impact.
Releasing when there are as few as reasonably possible, and not at
a random point in time as your definition of RC appears to suggest,
still makes sense, though.

[...]
> But according to Savannah, we have 7 blocker issues right now.
[...]
>     #62933 [man] produce hyperlinks in PDF output
>       Macro man 5 - Blocker gbranden 2022-08-21

I think it would be a major mistake to include it.  If i understand
correctly, the code needed for the feature does not even exist yet.
How can something be included in a release that cannot even be tested
by the time people want to roll an RC?  So it seems obvious to me
this has to be deferred.  It is pretty obvious the order needs to be
"development before release", not the other way round, right?

>     #62926 [mdoc] align styling of titles and man page cross references
>       with man(7)
>       Macro mdoc 5 - Blocker gbranden 2022-08-20
> 
>       This is a complaint of Ingo's, and a goal of mine.  The idea is to
>       have a more consistent presentation of man(7) and mdoc(7) pages,
>       making it harder for the _reader_ of a man page to deduce which
>       macros were used.  This makes a better user experience.  (Why
>       should they care which package a document uses?  Man pages should
>       look like man pages.)  groff man(7) and mdoc(7) have differed in
>       minor rendering details for as long as they have existed.

While i agree with the broad direction you give in your summary in this
mail (but not with the ticket title), i think broadly user-visible
formatting changes should go in *way* before a release.  Rushed in at
the last minute, they cause major noise if people run diff(1) to look
for regressions, making it likely that unrelated bugs slip into the
release, hidden by the noise.  So IMHO this ought to be postponed until
after release.
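
To illustrate the noise problem, here is a minimal sketch, not anything
that exists in groff or mandoc.  It assumes you already have the
rendered text of each page from the last release and from the current
tree; the function names and the dictionary layout are hypothetical.
It counts how many lines differ per page, so a tree-wide styling change
shows up as a small diff in nearly every page, which is exactly the
background in which an unrelated regression is easy to overlook.

```python
import difflib


def changed_line_count(old_text, new_text):
    """Count lines that differ between two renderings of one page."""
    old, new = old_text.splitlines(), new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    # Each non-equal opcode covers a block of replaced, deleted, or
    # inserted lines; count the larger side of the block.
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal")


def pages_with_changes(renderings):
    """Map of page name -> (old, new) text; return changed pages,
    most heavily changed first."""
    counts = {name: changed_line_count(old, new)
              for name, (old, new) in renderings.items()}
    changed = [(name, n) for name, n in counts.items() if n > 0]
    return sorted(changed, key=lambda item: (-item[1], item[0]))
```

If nearly every page appears in the result with a count of one or two,
the diff is dominated by the styling change, and a page with a real
regression no longer stands out.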

>     #62918 Wrong GhostScript version reported during build
>       Font - others/general 5 - Blocker Need Info alhadis 2022-08-19
> 
>       This needs research.  I think it might be the same as #62860
>       below.

It may be sane to treat this as a blocker.  Generally, making sure
the build system works properly is important for release, and much
improvement was achieved in the surroundings, so this could possibly
be a regression, related to one, or worth fixing before release for
other reasons.

>     #62860 [build] audit for inappropriate use of system groff resources
>       General 5 - Blocker gbranden 2022-08-04
> 
>       This is the most serious from a strict engineering perspective.
>       Apparently sometimes the build finds itself with recourse to
>       installed groff and will use it instead of the groff artifacts
>       that are generated _by_ the build.  Since groff is pretty stable,
>       no known _practical_ problems arise from this, but it still feels
>       to me a bit like driving around town with your car doors removed.
>       A strictly disciplined build environment (i.e., build in a chroot
>       with no installed groff) won't have a problem with this, but users
>       of the groff distribution archive may well play fast and loose.
>       We should keep that from influencing our build artifacts.
> 
>       Still, if nothing is known to be _broken_, can it be a blocker?
> 
>       It is also the item on the list requiring the most effort, so I
>       have to admit I've been procrastinating it a bit.  grepping
>       straces is not my idea of a party.  Someone want to volunteer?  :)

Two comments:

 (1) It is possible this *cannot* be a blocker because the scope of
     the effort may be too vast and the effort cannot be spent.
 (2) It is possible one or more problems hide in there that would
     be worth fixing before release.

So, not treating it as a blocker certainly wouldn't be wrong,
and yet, *if* a specific issue is found and the fix is low-risk,
committing it may be OK even close to release.
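
For whoever volunteers to grep the straces, the search could be
sketched roughly as follows.  This is only an illustration under
stated assumptions: the build was traced with strace(1) following
child processes, the log uses the usual `syscall("path", ...) = result`
line format, and the prefixes listed are hypothetical install
locations of a system groff that need adjusting for the host.

```python
import re

# Hypothetical install locations of a system groff; adjust per host.
SYSTEM_PREFIXES = ("/usr/share/groff/", "/usr/lib/groff/", "/usr/bin/groff")

# Captures the syscall name and its first quoted path argument, e.g.
#   openat(AT_FDCWD, "/usr/share/groff/...", O_RDONLY) = 3
SYSCALL_RE = re.compile(r'\b(open|openat|execve)\([^"]*?"([^"]+)"')

# A return value of -1 means the call failed and nothing was used.
FAILED_RE = re.compile(r'=\s*-1\b')


def system_groff_hits(strace_lines, prefixes=SYSTEM_PREFIXES):
    """Return (syscall, path) pairs that successfully touched a path
    under one of the given system groff prefixes."""
    hits = []
    for line in strace_lines:
        match = SYSCALL_RE.search(line)
        if match is None or FAILED_RE.search(line):
            continue
        syscall, path = match.group(1), match.group(2)
        if path.startswith(prefixes):
            hits.append((syscall, path))
    return hits
```

Any hit is a place where the build reached outside its own tree; an
empty result on one host proves nothing about other hosts, of course.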

>     #62774 [mdoc] warn if any of `Dd`, `Dt`, `Os` not called
>       Macro mdoc 5 - Blocker gbranden 2022-07-16
> 
>       This is Ingo and me wanting to be more fastidious about handling
>       of invalid, degenerate mdoc(7) input.  I don't think handling
>       invalid input in an ugly way can be a blocker unless you _fail_ to
>       handle it, which in C all too often means an infinite loop, SEGV,
>       or undefined behavior.  But here, the handler is a macro package.
>       The consequence appears to be a choice between different ugly ways
>       of rendering invalid pages.

Useful non-existent functionality that needs non-trivial design effort,
and after that some implementation effort, to resolve.  Clearly not
release-critical, and likely to cause further regressions if something
goes wrong.  So i would advise against attempting it after an RC2
in the sense you define.

>     #61423 [libgroff] allow paths in "name" directive of font description
>       file, restoring historical groff behavior
>       Font devps 5 - Blocker Need Info gbranden barx 2021-11-04
> 
>       This ticket is, at this point, a documentation issue.  I need to
>       decide whether a behavior change (to stop accepting dubious input)
>       is worth documenting more prominently.

Fair enough, i don't feel stressed about this either way.

>     #58930 take baby steps toward Unicode
>       Core 5 - Blocker Need Info gbranden barx 2020-08-10
> 
>       Like the previous, this is a reminder to myself to decide how I
>       want to document something.  Once that is addressed, the ticket
>       itself will remain open, with a lower severity.  Sorry to say,
>       "full Unicode support" is not pending and not expected for the
>       1.23 release.  (But I'm getting better ideas about how to perturb
>       the code base toward it afterward.)

So, obviously not a "release blocker" in the usual sense, and you are
already planning to change the severity field.  At which point that
reassignment is done is not important.

> There have been recent discussions about gropdf's handling of "download"
> files, including multiple people expressing a desire for grops to work
> the same way.  That's not a blocker yet because, last I checked, the
> gropdf work wasn't done yet.  (Ralph had some good suggestions.)  I'd
> really like to get these changes and alignment into 1.23 but it's
> obviously at risk because the grops changes haven't been written yet.
> But, under your definition, it _can't_ be a blocker because the programs
> _work_.  As far as I know they work at least as well as they did in
> groff 1.22.4.  gropdf assuredly works better; Deri has fixed several
> Savannah tickets regarding it over the past 4 years.

Wait a second.  A new feature *can* be a blocker for a release if the
release is far enough in the future that the feature can be designed,
implemented, reviewed, systematically tested, and tested in practice
before getting anywhere near beta state.  In fact, that's how software
development cycles often work: A release is driven out of the barn,
then right afterwards a list of desired features is drafted and marked
as "blockers" even though not even the design has been decided yet,
let alone the implementation.

Yes, during my history of commercial software development, i did
occasionally hear people say: "I need an RC by tomorrow 2 P.M.
Oh and by the way, my customer X had this smart and conceptually
novel idea.  It seems like a good suggestion.  Can you include that?"
But those people were Sales Account Managers, not Engineers.
Not even talking about the reaction of the Engineering Department,
suffice to say the Support Department was rarely enthusiastic
about calling that kind of release management "agile".  ;-)

> _None_ of these are really release critical.  They are simply things
> that I feel would be much better to address before release than not.

Some are, some may be better deferred to after release.

> On the other hand, applying your definition strictly, we'd almost never
> have any blocker bugs at all.  I can't remember one ever having arisen
> in the five-plus years I've been contributing.  (Maybe there was, it was
> my fault, I fixed it really quickly, and my sense of shame has effaced
> it from my memory.)
> 
> You could indeed say that I am abusing the Savannah ticket tracker and
> its "Blocker" severity to serve as a sort of to-do list, since we don't
> have anything legitimately blocking the release.

Fair enough.  Not a real problem to use it that way.

> A lot of software projects would call this a nice problem to have.
> 
> It has occurred to me that, once a Blocker item is resolved, I should
> knock its severity back down where it really belongs; I'm not sure that
> I have consistently done so in the past, and as the mantle of
> "maintainer" slowly settles over my shoulders it will be more obviously
> my (self-imposed) duty to do that.

Sounds desirable to me.  Then again, if the severity of a few closed
tickets is inaccurate, that is not a major problem.

> So, we could indeed stop using the Savannah Blocker severity for the
> purpose I'm employing it.
> 
> But I ask you: what _good_ would that do, apart from satisfying your
> personal esthetic of release management?  groff needs release processes
> that work for us.  I haven't caused anyone to leap onto any Blocker
> tickets and spend late nights on them.  (If I have, TELL ME!)

Right, fair enough.

> [resequencing two paragraphs here]
>> Not only do we have a significant number of open blockers, but i
>> also reported that the mandoc test suite found thirty-seven changes
>> of behaviour between the last groff release and groff-current that
>> i did not find the time to analyze just yet (in addition to changes
>> that i already investigated and that turned out to be in part groff
>> regressions, in part mandoc bugs, and in part intentional and useful
>> changes in groff behaviour).

> It seems likely that the RC2 process is going to take long enough that
> you will have time to triage the remainder of these 37 issues.  I regret
> any bugs I've introduced to groff; at the same time it sounds like
> mandoc's test suite is doing its job admirably--it is creating work,
> yes, but also exposing bugs in multiple formatters and identifying
> places where increasing behavioral parity between groff and mandoc will
> redound to all users' benefit.

Do not over-estimate the quality of the mandoc test suite.
It is very strongly focussed on -mandoc -Tascii, and even in that
very narrow region, i believe its coverage is far below 50%.
Testing coverage for -Tutf8, -Thtml, and raw roff(7) is likely
in the low single-digit percent range, and coverage for -Tpdf
and -Tps is precisely zero.  And yet, triaging the results causes
non-trivial work.

So far, i focussed on analysis, i.e. i picked the first issue,
looked at it until i understood it, then either fixed it in mandoc
or reported it to groff or adjusted the test to the new desired
behaviour.  After repeating that a few times, i occasionally
did a git pull, resulting in a slow *increase* rather than a slow
*decrease* of the number of un-analyzed issues over time, on average.

If you say "we want to issue an RC in about two weeks", i would
temporarily switch to triaging mode:

 (1) keep inspection times short
 (2) add mandoc issues to TODO rather than fixing them
 (3) try to order issues by priority (even though the true
     importance is only known *after* analysis)
 (4) if an issue seems potentially medium or high priority
     but a full analysis turns out to take too much time,
     report the partial results to groff and move on to the
     next one

That way, we would get at least a rough estimate of how much
unintentional change in behaviour we are accepting in those areas
where the mandoc test suite provides partial coverage.

But if you talk about "some good suggestions" and "let's roll an RC"
in the same breath, i feel seriously confused as to whether i should
minimize the total time needed for analysis (accepting that my report
"i looked at least briefly at every difference and while some are
annoying, none are critical" would come later) or work in triage
mode (getting an overview as early as possible, but increasing the
total working time needed).

>> After the RC, it is critical to not commit anything except fixes
>> for critical regressions that people reported from RC testing.
>> In particular, after an RC, no bugs must be fixed that were already
>> known before the RC was sent out.

> Bertrand and I have agreed upon a less strict approach.  We, or
> possibly I, will identify a recent commit in groff Git from which he
> will upload a distribution archive of groff 1.23.0.rc2.  I've
> volunteered to write (most of) a release announcement email to try to
> drum up excitement and get people who didn't test rc1 to test this one.
> 
> We'll get feedback on that, and either address or postpone work on
> issues currently marked as Blockers, and address any (likely build
> system-related) problems reported, probably from relatively exotic host
> environments.  (I find it quite sad that any ISA that isn't x86 or ARM
> is "exotic" these days.  I remember when Debian shipped its stable
> release for 11 platforms, including PowerPC, HP PA-RISC, MIPS, SPARC,
> DEC Alpha, Itanium, and m68k.)
> 
> That done, RC3 will be tagged and released.  At that point, no further
> "code" changes will be done on the master branch until after final
> release.  Even a build/install failure on a platform might not gate the
> release at that point, unless it's a regression from RC2, because we
> asked for testers _for_ RC2.  If an "exotic" host environment regresses,
> it might have to wait for a 1.23.1 release (which I personally would be
> anxious to do under such circumstances).

That does not sound unreasonable to me.  I think saying the following
in the announcement mail, with this priority but not necessarily in
this order or wording, would be useful, but that is ultimately your
call:

 (1) If you are short on time and can only afford one set of tests,
     testing the final RC3 is more important than testing RC2.
 (2) If you have the time needed for *two* sets of tests, testing
     both is useful because the earlier we know of problems the better.
     That's particularly relevant if you use an unusual platform
     or use groff in unusual ways.
 (3) RC2 is absolutely *not* an improved version of RC1 but a
     completely new beta with lots of new functionality.
 (4) The reason for calling RC2 an "RC" is exclusively file name
     uniformity.  In reality, it is more like an unfinished release
     preview or a beta.

> If, somehow, we break Linux x86-64 with the RC3 tag (I can't imagine
> this would happen--Bertrand and I would both refuse to attach such a tag to
> a commit that misbehaved so badly), then we'd need to have an RC4.
> 
> But not otherwise.

Yes, very unexpected things may sometimes happen, but i agree Linux x86-64
is by far the least likely platform to break, so i don't worry too much
about that.

> Documentation updates will proceed throughout the RC process, all the
> way up to final release.

Maybe i would refrain from major, conceptual, potentially controversial
documentation changes during the RC process, but that's not a
critical point.

> If there are none after RC3, then the "1.23.0"
> tag will be applied to the same commit as "1.23.0.rc3".  Then, an
> official distribution archive gets created and announced to info-gnu
> (and here, of course).
> 
> That's the plan.  This is, as far as I can recall, exactly what we did
> for groff 1.22.4.

It does sound reasonable.

>> So it is totally obvious to me that the code base is *not* in a good
>> shape and quite far from being ready for an RC.

> I don't agree.  I think it is very close, though of course I reserve the
> right to change my mind if one of the 37 mandoc regression tests points
> out something horrific.  But if it does, I will wonder how _I_ didn't
> have a regression or unit test for the same misbehavior already, since
> I've spent the last 4+ years studying groff's own man page renderings in
> text, HTML, and PDF closely on a nearly daily basis.
> 
> Discussions like this are why I _didn't_ want the maintainer job.  It
> takes a lot of time to explain and justify what are ultimately
> irreducibly subjective criteria.  On the other hand, we all benefit from
> the transparency.

I agree.

> [snipping your projected timeline because while I'd like to dismiss your
> time estimates as too pessimistic, I can't]

If we beat the timeline with a good release, i'm happy.

Yours,
  Ingo