guix-devel

Re: Parallel guix builds can trample?


From: Phil Beadling
Subject: Re: Parallel guix builds can trample?
Date: Mon, 17 Jan 2022 17:23:24 +0000

Hi Ricardo, all,

 

I think we’ve worked out what the issue is, and have a proposed workaround, and perhaps a case for solving the problem in Guix itself (depending on what you people think!).

 

The issue is that despite each build being performed in its own isolated container, these containers are fed by the same per-user cached source directory.  In the case where different versions of the same repo are built at once, this results in a race condition.

 

In our case we have one Linux account that does a lot of automated Guix builds for us.

 

One example: this account watches our source control and automatically rebuilds all outstanding Pull Requests (PRs) on a repo after a separate successful merge to our integration branch.  PRs are uniquely identified by monotonically increasing PR numbers, e.g. PR-1, PR-2, PR-3 and so on.  Each is a different branch on the same repo with slightly different candidate changes in it.  They are automatically kept up to date with the integration branch.

 

To do this our watcher fires off (near) instantaneously dozens of guix builds, each with their own local channel customized for the PR it is building.  Doing them in parallel is important to make the system usably responsive.

 

Each fired process does this:

 

What we think is happening is the following:

 

 

Thus there is a race condition in this scenario.  We can provide a longer test script to demo this if required – it’s quite straightforward to reproduce just with a bash script, now we know what is causing it.
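
As a rough illustration of the kind of race described above, here is a toy model in bash.  It is not Guix's code and makes no claim about the exact steps Guix takes: it only assumes that each build first points a shared directory at the commit it wants and later copies from it.  The names simulated_build and demo_race, and the delays, are made up for the sketch.

```shell
#!/usr/bin/env bash
# Toy model: a shared "checkout" directory that each build updates to its
# own commit and then copies from after a delay.
simulated_build() {
  local cache=$1 commit=$2 delay=$3 out=$4
  echo "$commit" > "$cache/HEAD"   # "check out" the wanted commit
  sleep "$delay"                   # window in which another build may run
  cp "$cache/HEAD" "$out"          # "copy the source into the build"
}

demo_race() {
  local cache
  cache=$(mktemp -d) || return 1
  # The pr-1 build starts first but is slow to copy its source.
  simulated_build "$cache" commit-pr-1 1 "$cache/pr-1.src" &
  sleep 0.2
  # The pr-3 build overwrites the shared checkout in the meantime.
  simulated_build "$cache" commit-pr-3 0 "$cache/pr-3.src"
  wait
  echo "pr-1 built: $(cat "$cache/pr-1.src")"
  echo "pr-3 built: $(cat "$cache/pr-3.src")"
  rm -rf -- "$cache"
}
```

Calling demo_race in this model reports that the pr-1 build ended up with the pr-3 commit, which is the same symptom as a build whose source does not match the commit in its package.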

 

Our workaround has been to change XDG_CACHE_HOME for each PR build we do.  This is a bit unsatisfactory because it affects processes beyond Guix and so casts too wide a net, but it does resolve the problem for the time being.
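
A minimal sketch of one way to apply this workaround, assuming bash and mktemp are available.  The helper name with_private_cache and the guix-cache- directory prefix are hypothetical.  Setting the variable only for the one command limits the change to that process and its children rather than the whole session, though each build then has to populate its own cache from scratch.

```shell
#!/usr/bin/env bash
# Run a command with a private XDG_CACHE_HOME so that concurrent builds
# do not share cached clones.
# Usage: with_private_cache NAME COMMAND [ARGS...]
with_private_cache() {
  local name=$1; shift
  local cache rc
  cache=$(mktemp -d "${TMPDIR:-/tmp}/guix-cache-${name}.XXXXXX") || return 1
  # The assignment prefix applies to this one command only,
  # not to the calling shell.
  XDG_CACHE_HOME=$cache "$@"
  rc=$?
  rm -rf -- "$cache"   # the cache is never reused, so discard it
  return "$rc"
}

# Hypothetical invocation, one per PR directory:
#   with_private_cache pr-1 guix build -K -L pr-1 my-package
```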

 

Do people think this is enough of an issue to make a switch available in Guix to prevent sharing of cached clones?  This would be easy enough to implement – a crude solution would be that each cache directory name would simply be generated using a SHA of a string which includes the PID or similar to ensure a unique name, and because it is never going to be reused it could be deleted immediately after the build.
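
The naming idea can be sketched in bash as below.  This only illustrates the proposal and is not how Guix names its cache directories; the function name unique_cache_name is hypothetical, and sha256sum from GNU coreutils is assumed to be installed.

```shell
#!/usr/bin/env bash
# Derive a cache directory name from the repository URL plus a PID, so
# that two concurrent processes never compute the same name.
# Usage: unique_cache_name URL [PID]   (PID defaults to this shell's PID)
unique_cache_name() {
  local url=$1 pid=${2:-$$}
  # Hash "URL PID" and keep only the hex digest.
  printf '%s %s' "$url" "$pid" | sha256sum | cut -d' ' -f1
}

# Hypothetical use:
#   dir="$HOME/.cache/example/$(unique_cache_name "ssh://same@repo:7999/same/repo.git")"
```

Because the name depends on the PID, the directory is never shared or reused, which is why it could be deleted as soon as the build finishes.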

 

Whilst this is unlikely to happen at the console, I can see it causing a headache as people script guix build use-cases to fit their own problems (in particular building lots of variations of a single piece of software).  I think at the very least the manual should make it clear that you cannot build 2 packages referencing the same repo at the same time with the same user (unless I’ve missed this bit, I don’t think it’s made explicitly clear?).  An even simpler change would be to introduce a lock file that refuses the 2nd build, which would at least prevent the race condition and ensure referential transparency; simpler still would be just a warning on stderr.
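
The lock-file idea could look roughly like this from the outside, as a wrapper script rather than a change to Guix.  It assumes flock(1) from util-linux; the helper name with_repo_lock, the choice of file descriptor 9 and the exit status 75 are arbitrary choices for the sketch.

```shell
#!/usr/bin/env bash
# Run a command only if no other process holds the given lock file;
# otherwise warn on stderr and refuse to start.
# Usage: with_repo_lock LOCKFILE COMMAND [ARGS...]
with_repo_lock() {
  local lockfile=$1; shift
  (
    # Open the lock file on fd 9 and try to take an exclusive lock
    # without blocking.
    exec 9>"$lockfile" || exit 1
    if ! flock -n 9; then
      echo "warning: another build holds $lockfile; refusing to start" >&2
      exit 75
    fi
    # The lock is released when the subshell and the command exit.
    "$@"
  )
}

# Hypothetical invocation:
#   with_repo_lock /tmp/same-repo.lock guix build -K -L pr-1 my-package
```

Dropping the -n option would make the second build wait for the first instead of being refused, at the cost of serialising builds of the same repo.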

 

If people are amenable to adding a switch or other config option, we’d be happy to look at writing the patch.


Any thoughts/comments/advice?


Cheers!
Phil.


On Wed, 12 Jan 2022 at 09:37, Phil <phil@beadling.co.uk> wrote:
Hi - more details below.

Ricardo Wurmus writes:

>
> How are you using Guix with this?  Do you generate Guix package
> expressions?  Do you use “guix build --with-commit”?
>

The situation is like this - if we had a directory of clones of my
channel:

- pr-1
- pr-2
- pr-3
- pr-4
... and so on

Initially all the clones are taken from the master branch of my
channel and are all identical - but we change the version and commit to
match the head of each PR branch as per below.

Each clone looks like this:
- pr-1
      - my-package.scm
- pr-2
      - my-package.scm
and so on....

Each my-package.scm has a package like the one below - the initial packages are all
identical, but my system effectively seds the version and commit values
as shown.  These values are never committed back to master; they
are used only as local channels to build each PR and test that each build
still passes.

(define-public my-package
  (package
    (name "my-package")
    (version "this-is-different-for-each-pr")  ;; replace master version
    (source
      (git-checkout
        (url "ssh://same@repo:7999/same/repo.git")
        (commit "this-is-different-for-each-pr") ;; replace master version
        ;; everything else remains the same in the package....


At this point we have lots of local channels referencing different commits, in
the same package, ready to build - so I spawn them all simultaneously -
the equivalent pseudo-shell that I will mock up today would be:

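# Map each PR directory to the PID of its background build.
declare -A PIDS

for dir in pr-*; do
  guix build -K -L "${dir}" my-package > "/tmp/${dir}.log" 2>&1 &  # note the ampersand
  PIDS[${dir}]=$!
done

# Wait for each build in turn and report any that failed.
for dir in "${!PIDS[@]}"; do
  if ! wait "${PIDS[${dir}]}"; then
    echo "build failed for ${dir}" >&2
  fi
done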

What I'm seeing occasionally is that the logs and return code for, say, directory pr-1
are appearing in the guix build for pr-3 or pr-6 instead.

We know this because the code is different enough in pr-1 that its logs
are unique across all the PRs.  If the build fails, we can also check the
source code kept by --keep-failed to show that it doesn't match the commit id
in the package used to build it.

Hopefully that makes sense?  I can post the actual shell script once
I've written the mock.
