bug-ncurses

Re: make install can fail with parallel make


From: Egmont Koblinger
Subject: Re: make install can fail with parallel make
Date: Sun, 14 Aug 2016 00:22:50 +0200

Hi guys,

My 2 cents, not related to ncurses in particular:

10+ years ago I used to work for a small Linux distribution. We built our build system from scratch. Each package was built separately in a designated chroot (usually ./configure; make; make install, or whatever that package required), and these builds were parallelized across multiple computers. It worked like a charm. We had very few special cases (maybe 2-3 packages out of 1000) where the build was nondeterministic for various weird reasons, which we tracked down individually. The one I remember: the 1-second precision of file timestamps wasn't fine-grained enough, so we needed to insert a "sleep 1.1" somewhere to make absolutely sure the timestamp increased.
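To illustrate the timestamp case: make rebuilds a target when a prerequisite's modification time is newer than the target's. The sketch below is only a model of that comparison, not the actual package or filesystem from the anecdote; the function name, the millisecond inputs, and the `precision_ms` parameter are all made up for illustration.

```python
def needs_rebuild(target_ms, source_ms, precision_ms=1000):
    """Model of make's "is the prerequisite newer?" check.

    Times are integer milliseconds (hypothetical inputs). Both are
    truncated to the filesystem's timestamp granularity before being
    compared, which is where the information gets lost.
    """
    target_stamp = target_ms // precision_ms
    source_stamp = source_ms // precision_ms
    return source_stamp > target_stamp


# Source regenerated 0.5 s after the target was built: with 1-second
# granularity both land on the same stamp, so the stale target is missed.
print(needs_rebuild(100_200, 100_700))                  # False

# Same events with a 1.1 s pause in between: the stamp is guaranteed
# to advance, so the rebuild is triggered.
print(needs_rebuild(100_200, 101_300))                  # True

# With millisecond granularity the pause would not have been needed.
print(needs_rebuild(100_200, 100_700, precision_ms=1))  # True
```

Any pause longer than one full granularity step forces the truncated stamp to increase, which is why "sleep 1.1" is enough while "sleep 0.5" would not be.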

At one point a colleague of mine insisted on trying "make -j2" (or -j4, I can't recall) builds by default to speed things up. I firmly opposed, but we gave it a try. Out of the ~1000 packages we had, we saw a failure by around the 10th package we tried to build. That was the moment we reverted this change for good.

What's behind the story?

"make" (without -j) is deterministic and hence reproducible. Apart from a very few really tricky cases, if it works on the author's computer then it'll work on yours.

"make -j..." is nondeterministic and hence not reproducible. Even for small projects the order of builds can vary in hundreds or thousands of ways. No one is ever able to test them all. Even if it builds successfully on the author's machine many times, there's still absolutely no guarantee that it'll always build on yours (especially since the author probably doesn't use parallelization).
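A rough way to get a feel for the numbers is to count how many orders the targets of a small dependency graph could finish in while still respecting the declared prerequisites. The graph below is invented for illustration (it is not taken from ncurses or any real project), and the count only covers completion orders, so it understates the real variety of timings.

```python
from functools import lru_cache

# Hypothetical dependency graph: target -> prerequisites it must wait for.
DEPS = {
    "a.o": [],
    "b.o": [],
    "c.o": [],
    "d.o": [],
    "libfoo.a": ["a.o", "b.o"],
    "libbar.a": ["c.o", "d.o"],
    "app": ["libfoo.a", "libbar.a"],
}


def count_build_orders(deps):
    """Count the orders in which all targets can finish, given that a
    target may only finish after every one of its prerequisites."""
    targets = tuple(sorted(deps))
    index = {t: i for i, t in enumerate(targets)}
    # Bitmask of prerequisites for each target.
    need = [sum(1 << index[p] for p in deps[t]) for t in targets]
    full = (1 << len(targets)) - 1

    @lru_cache(maxsize=None)
    def count(done):
        if done == full:
            return 1
        total = 0
        for i in range(len(targets)):
            bit = 1 << i
            # Target i is eligible if not done yet and all prerequisites are.
            if not done & bit and need[i] & ~done == 0:
                total += count(done | bit)
        return total

    return count(0)


print(count_build_orders(DEPS))  # 80
```

Seven targets already allow 80 orders, whereas a serial make always walks the same one. Each additional independent object file multiplies the count further, which is how even small projects reach hundreds or thousands.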

Getting Makefiles (or any other build rules) correct is tricky. There is absolutely no guarantee that these build rules are correct and solid across all kinds of parallelization and timings. And if something's not tested, you should not expect it to work correctly.
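One common way for build rules to be wrong is a recipe that reads a file it never declared as a prerequisite. A serial build may still happen to produce that file first, while a parallel build gives no such ordering. The sketch below checks for this in a toy model; the target names, and the assumption that you somehow know what each recipe really reads (for example from tracing), are hypothetical, and it is not a diagnosis of the ncurses failure.

```python
# Hypothetical rules: the prerequisites each target declares...
DECLARED = {
    "lib.a": [],
    "lib_g.a": [],
    "install": ["lib.a"],  # lib_g.a was forgotten here
}
# ...versus the files each recipe actually reads (assumed to be known).
READS = {
    "lib.a": [],
    "lib_g.a": [],
    "install": ["lib.a", "lib_g.a"],
}


def declared_closure(target, declared):
    """All targets guaranteed to be finished before `target` starts,
    following declared prerequisites transitively."""
    seen = set()
    stack = list(declared[target])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(declared[t])
    return seen


def undeclared_inputs(declared, reads):
    """Map each target to the inputs it reads without any declared
    ordering; these are the candidates for a parallel-build race."""
    problems = {}
    for target, inputs in reads.items():
        missing = sorted(set(inputs) - declared_closure(target, declared))
        if missing:
            problems[target] = missing
    return problems


print(undeclared_inputs(DECLARED, READS))  # {'install': ['lib_g.a']}
```

An empty result in such a model only says that the declared ordering covers the known reads; it says nothing about reads you did not observe, which is part of why untested parallel builds are hard to trust.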

My conclusion:

Feel free to use -j for your own project, provided that as soon as you hit a glitch you immediately track it down or, even better, you have automated tools that generate build rules guaranteed to always work.

Feel free to use -j for someone else's project if you know for sure that they also use it, or perhaps it's there by default in the build rules.

If you don't have solid proof that the author uses -j, or some other guarantee that it works: just forget it!

I'm not saying there's no such bug in ncurses. You've found it so apparently it's there. But I guess you could similarly find thousands of other projects that are broken similarly and it's just not feasible to get them all fixed (and make sure that they are indeed robustly fixed, no matter how the parallel build happens to go). Just accept that we live in an imperfect world, and stay safe by using serial builds unless you're absolutely certain you're safe to go parallel. Go and grab one more coffee or take a longer walk until the build completes, or use your remaining CPUs/cores to run a totally independent task. The amount you'd win with a successful parallel build is nothing compared to what you'd lose from uncaught/untraceable/unreproducible bugs of a failing parallel one.

Cheers,
egmont



On Sat, Aug 13, 2016 at 9:57 PM, Shaun Jackman <address@hidden> wrote:
Thanks for looking into it, Thomas. Could you please respond at this GitHub issue:
https://github.com/Homebrew/homebrew-dupes/pull/630

-- 
Shaun Jackman
http://sjackman.ca

On August 13, 2016 at 15:11:00, Thomas Dickey (address@hidden) wrote:

On Mon, Jul 04, 2016 at 02:34:45AM -0400, Shaun Jackman wrote:
> `make install` can fail with parallel make. `make -j1` works around the issue.
> See https://github.com/Homebrew/homebrew-dupes/issues/622
> and https://github.com/Homebrew/homebrew-dupes/pull/630
>
> ```
> installing ../lib/libncursesw.a as
> /home/nh79w/.linuxbrew/Cellar/ncurses/6.0_1/lib/libncursesw.a
> /home/nh79w/.linuxbrew/bin/ginstall -c -m 644 ../lib/libncursesw.a
> /home/nh79w/.linuxbrew/Cellar/ncurses/6.0_1/lib/libncursesw.a
> ranlib /home/nh79w/.linuxbrew/Cellar/ncurses/6.0_1/lib/libncursesw.a
> installing ../lib/libncursesw_g.a as
> /home/nh79w/.linuxbrew/Cellar/ncurses/6.0_1/lib/libncursesw_g.a
> /home/nh79w/.linuxbrew/bin/ginstall -c -m 644 ../lib/libncursesw_g.a
> /home/nh79w/.linuxbrew/Cellar/ncurses/6.0_1/lib/libncursesw_g.a
> /home/nh79w/.linuxbrew/bin/ginstall: cannot stat
> '../lib/libncursesw_g.a': No such file or directory
> make[1]: *** [Makefile:2840: install] Error 1
> make[1]: Leaving directory
> '/tmp/ncurses-20160613-61631-qne5hl/ncurses-6.0/ncurses'
> make: *** [Makefile:115: install] Error 2
> ```

hmm - the logs at

https://gist.github.com/anonymous/3d5a6fa1fcb71d7d55f0afbabcc7b6b3#file-02-make-L2-L1440

give the needed information. But my attempt to reproduce this with -j4
hasn't succeeded (I don't have hardware that could do -j20).

--
Thomas E. Dickey <address@hidden>
http://invisible-island.net
ftp://invisible-island.net

_______________________________________________
Bug-ncurses mailing list
address@hidden
https://lists.gnu.org/mailman/listinfo/bug-ncurses


