
Re: parallel make problem with our Texinfo manual


From: Ingo Schwarze
Subject: Re: parallel make problem with our Texinfo manual
Date: Fri, 21 May 2021 18:23:56 +0200
User-agent: Mutt/1.12.2 (2019-09-21)

Hi Branden,

G. Branden Robinson wrote on Fri, May 21, 2021 at 04:57:59PM +1000:

> I have a reproducible problem that I don't understand.

Not very surprisingly, i didn't instantly manage to reproduce your
problem on a different machine and operating system.  That's typical
for races: they depend on a lot of details.

> I've elided my actual build directory name below because it's not
> important.
> 
> $ echo '@c' >> doc/groff.texi
> $ (cd build && make -j all check)

The next line is from "make check":

> make  check-am

The next line is from "make all":

> LANG=C \
> LC_ALL=C \
> makeinfo -o doc/groff.info --enable-encoding -I.../groff/build/../doc 
> .../groff/build/../doc/groff.texi

Strangely, i don't get the next line at all:

> make[1]: Entering directory '.../groff/build'

The next line is again from "make check":

> make

I don't get the following line either:

> make[2]: Entering directory '.../groff/build'

I do not get the following second copy of the makeinfo invocation:

> LANG=C \
> LC_ALL=C \
> makeinfo -o doc/groff.info --enable-encoding -I.../groff/build/../doc 
> .../groff/build/../doc/groff.texi
> makeinfo: rename doc/groff.info failed: No such file or directory
> make[2]: *** [Makefile:12224: doc/groff.info] Error 1
[...]

> Essential parts of this include:
> (1) I have to actually modify groff.texi.  Touching its timestamp does
> not suffice.
> (2) I have to call the "all" _and_ "check" targets.
> (3) I have to use the "-j" flag.

In my tests, i'm using "make -j 4 all check" because without specifying
the maximum number of parallel jobs as an argument "4" to the "-j" option,
i get nothing but

   $ make -j all check
  make: illegal argument to -j option -- all -- invalid
  usage: make [-BeiknpqrSst] [-C directory] [-D variable] [-d flags] [-f mk]
            [-I directory] [-j max_processes] [-m directory] [-V variable]
            [NAME=value] [target ...]

> What I think is happening is that make(1) is forking off a job for each
> of the "all" and "check" targets, and they are racing against each
> other.  One of them always loses, so I always get the error.

That seems likely, yes.

> I see Ingo is feeling feisty,

Heh.  :-|

> so I'll add this: please don't advise me to not use -j.
> Our *.am files work fine with it in most respects.

No, and yes.

Parallel builds can be useful, not only for wasting less time waiting
for builds to finish but also for finding bugs in Makefiles.  More often
than not, build failures that only happen with -j indicate Makefile bugs,
for example in dependency specifications, and sometimes other bugs, too.

What you describe certainly feels like a build system bug to me.

> In a minor mystery, I don't know what's generating that ENOENT
> diagnostic; on my system, makeinfo is a symlink to texi2any (as it is
> for most people, I expect),

For me, /usr/local/bin/gmakeinfo is just a Perl script, not a symlink,
installed by the texinfo-6.5p4 package.

> and in that Perl script I can't find a line corresponding to it.

Well, makeinfo is a large beast, including lots and lots of modules:

    require Texinfo::ModulePath;
    use Texinfo::Common;
    use Texinfo::Convert::Converter;
    require Texinfo::Parser;
    require Texinfo::Convert::HTML;

Just as a few examples...
It won't be easy finding anything in there.
Oh wait, grep(1) to the rescue:

   $ grep -R 'rename.*failed' /usr/local/share/texinfo/Texinfo 
  /usr/local/share/texinfo/Texinfo/Convert/Info.pm:
  $self->document_error(sprintf($self->__("rename %s failed: %s"),

The line before that is:

          unless (rename($self->{'output_file'}, 
                         $self->{'output_file'}.'-'.$out_file_nr)) {

So makeinfo(1) writes stuff to doc/groff.info, then renames the
written file to groff.info-1 and so on.  Now if two jobs do that
at the same time, we get this beautiful example of cooperation:

  job 1: starts writing doc/groff.info
  job 2: starts writing doc/groff.info - same name, different inode;
         the file being written by job 1 is now unlinked but still open
  job 1: finishes writing the unlinked file
  job 1: renames doc/groff.info to doc/groff.info-1 - and actually,
         that is the file being written by job 2
  job 2: finishes writing the file that it originally called doc/groff.info
         but that is now already called doc/groff.info-1 instead
  job 2: tries to rename doc/groff.info to doc/groff.info-1 -
         BOOM, because doc/groff.info no longer exists
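
The interleaving above can be replayed deterministically in plain sh,
without makeinfo(1) and without any real parallelism.  This is only a
sketch: the file names and the "from job N" contents are made up, and
the "rm -f" stands in for the assumption stated above that job 2 ends
up with a new inode under the same name.

```shell
#!/bin/sh
# Replay of the two-job interleaving using two file descriptors.
# Illustrative only; this does not call makeinfo(1).
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

exec 3>groff.info                 # job 1: starts writing groff.info
rm -f groff.info                  # job 2: same name, new inode (assumption)
exec 4>groff.info                 # job 2: starts writing groff.info
echo "from job 1" >&3; exec 3>&-  # job 1: finishes writing the unlinked file
mv groff.info groff.info-1        # job 1: renames what is really job 2's file
echo "from job 2" >&4; exec 4>&-  # job 2: finishes writing, now as groff.info-1

# job 2: tries the same rename, but groff.info no longer exists
if mv groff.info groff.info-1 2>/dev/null; then
    result="rename ok"
else
    result="rename failed"
fi
content=$(cat groff.info-1)
echo "$result; groff.info-1 contains: $content"

cd / && rm -rf "$tmp"
```

The second rename fails even though groff.info-1 already holds the
output of job 2, and the output of job 1 is lost with its unlinked inode.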

Quite funny that the content of doc/groff.info-1 is possibly
already correct, and so is the file name, and yet makeinfo(1) crashes.

Now, obviously this code in makeinfo(1) is rather fragile.
The standard idiom for handling such tasks is using mkstemp(3)
rather than using constant filenames for temporary files, which
would mitigate the problem somewhat.  But that's not our task here,
improving that is a job for the texinfo crowd over there --->.  ;-)
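
For illustration only, here is roughly what the unique-temporary-file
idiom looks like in sh, using mktemp(1) as the shell counterpart of
mkstemp(3).  The helper name write_output and the file example.info
are hypothetical; this is not how makeinfo(1) is written, nor a
proposed patch for it.

```shell
#!/bin/sh
# write_output NAME TEXT (hypothetical helper): write TEXT to a uniquely
# named temporary file next to NAME, then rename it into place.
write_output() {
    tmpfile=$(mktemp "$1.XXXXXX") || return 1
    printf '%s\n' "$2" > "$tmpfile" && mv "$tmpfile" "$1"
}

workdir=$(mktemp -d) || exit 1

# Two concurrent writers targeting the same final name.
write_output "$workdir/example.info" "from job 1" & pid1=$!
write_output "$workdir/example.info" "from job 2" & pid2=$!
wait "$pid1"; status1=$?
wait "$pid2"; status2=$?

# Count what is left in the directory: just the final file, no temporaries.
nfiles=$(ls "$workdir" | wc -l | tr -d ' ')
echo "exit statuses: $status1 $status2; files left: $nfiles"
```

Each writer only ever renames a file that it created itself, so neither
rename can fail because the other job moved the source away.  Which
job's content ends up in example.info still depends on timing, which is
why this merely mitigates the problem.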

Our job here is to make sure our build system doesn't run makeinfo(1)
twice in the same make invocation.  Avoiding that makes sense anyway,
and even more so given makeinfo's apparent fragility.

> Any ideas?

I don't understand yet why your make falls into the trap of running
makeinfo twice and mine doesn't, but i thought i might share these
partial results right away to reduce the risk of multiple people
doing the same analysis in parallel.  Races between developers, y'know.

Yours,
  Ingo


