Re: Update on bordeaux.guix.gnu.org
From: Ludovic Courtès
Subject: Re: Update on bordeaux.guix.gnu.org
Date: Sun, 28 Nov 2021 18:26:05 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Hello,
Christopher Baines <mail@cbaines.net> skribis:
> I've been doing some performance tuning: submitting builds is now more
> parallelised, a source of slowness when fetching builds has been
> addressed, and one of the long queries involved in allocating builds
> has been removed, which also improved handling of the WAL (SQLite
> write-ahead log).
>
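For readers unfamiliar with SQLite's write-ahead log: long-running read
transactions keep the WAL from being checkpointed, so it grows. A
minimal Python sketch of the mechanism (illustrative only; the
coordinator itself is not written in Python):

```python
import os
import sqlite3
import tempfile

# Use a throwaway on-disk database: WAL mode is not supported for ":memory:".
path = os.path.join(tempfile.mkdtemp(), "builds.db")
conn = sqlite3.connect(path)

# Enable write-ahead logging so readers no longer block the writer.
mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
print(mode)  # prints "wal"

# Long-running read queries pin the WAL and prevent checkpointing; once
# they are gone (as described above), the log can be truncated again.
conn.execute("PRAGMA wal_checkpoint(TRUNCATE);")
conn.close()
```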
> There are also a few new features. Agents can now be deactivated,
> which means they won't have any builds allocated to them. The
> coordinator now checks the hashes of submitted outputs, a safeguard I
> added because the coordinator also supports resuming the uploads of
> outputs. This is particularly important when uploading large (> 1GiB)
> outputs over slow connections.
>
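Verifying a resumed upload amounts to rehashing the reassembled file.
A hypothetical streaming sketch in Python (the coordinator's actual
hashing code and algorithm are not shown in this thread):

```python
import hashlib

def file_sha256(path, chunk_size=64 * 1024):
    """Hash a file in chunks so large (> 1 GiB) outputs never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

After the final chunk arrives, the server can compare this digest with
the one the agent reported and reject the output on a mismatch.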
> I also added a new x86_64 build machine. It's a 4-core Intel NUC that
> I had sitting around, but I cleaned it up and got it building things. This
> was particularly useful as I was able to use it to retry building
> guile@3.0.7, which is extremely hard to build [2]. This was blocking
> building the channel instance derivations for x86_64-linux.
>
> 2:
> https://data.guix.gnu.org/gnu/store/7k6s13bzbz5fd72ha1gx9rf6rrywhxzz-guile-3.0.7.drv
Neat! (Though I wouldn’t say building Guile is “extremely hard”,
especially on x86_64. :-)) The ability to keep retrying is very
welcome.
> On the related subject of data.guix.gnu.org (which is the source of
> derivations for bordeaux.guix.gnu.org, as well as a recipient of build
> information), there have been a couple of changes. There was some web
> crawler activity that was slowing data.guix.gnu.org down significantly;
> nginx now has some rate-limiting configuration to prevent crawlers from
> abusing the service. The other change is that substitutes for the latest
> processed revision of master will be queried on a regular basis, so this
> page [3] should be roughly up to date, including for ci.guix.gnu.org.
>
> 3:
> https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-substitute-availability
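Per-client rate limiting of the kind described is typically done with
nginx's limit_req module. A hypothetical fragment (the actual
data.guix.gnu.org configuration, zone name, and rates are not shown in
this thread and are made up here for illustration):

```nginx
# Track request rates per client IP in a 10 MiB shared zone.
limit_req_zone $binary_remote_addr zone=perip:10m rate=2r/s;

server {
    location / {
        # Allow short bursts but throttle sustained crawling.
        limit_req zone=perip burst=20 nodelay;
    }
}
```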
That’s good news. That also means that things like
<https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-reproducibility>
should be more up to date, which is really cool! This can have a
drastic impact on how we monitor and address reproducibility issues.
> Now for some not so good things:
>
> Submitting builds wasn't working quite right for around a month: one
> of the changes I made to speed things up led to some builds being
> missed. This is now fixed, and all the missed builds have been
> submitted, but this amounted to more than 50,000 builds. This, along
> with all the channel instance derivation builds that can now proceed,
> means that there's a very large backlog of x86 and ARM builds, which
> will probably take at least another week to clear. While this backlog
> exists, substitute availability for x86_64-linux will be lower than
> usual.
At least it’s nice to have a clear picture of which builds are missing,
how much of a backlog we have, and what needs to be rebuilt.
> Space is running out on bayfront, the machine that runs the
> coordinator, stores all the nars and build logs, and serves the
> substitutes. I knew this was probably going to be an issue, as
> bayfront didn't have much space to begin with, but I had hoped I'd be
> further along in developing some way of moving the nars around between
> multiple machines, to remove the need to store all of them on
> bayfront. I have a plan, based on some ideas I mentioned back in
> February [4], but I haven't got around to implementing anything yet.
> The disk space usage trend is pretty much linear, so if things
> continue without any change, I think it will be necessary to pause the
> agents within a month to avoid filling up bayfront entirely.
Ah, bummer. I hope we can find a solution one way or another.
Certainly we could replicate nars on another machine with more disk,
possibly buying the necessary hardware with the project funds.
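As a back-of-the-envelope check of the “within a month” estimate, a
linear trend extrapolates directly (all figures below are hypothetical;
the thread gives no actual disk numbers for bayfront):

```python
# Hypothetical figures for illustration only; not from the thread.
capacity_gib = 2000.0     # total disk on the substitute server
used_gib = 1800.0         # space already consumed by nars and build logs
growth_gib_per_day = 7.0  # observed (roughly linear) growth rate

# With linear growth, time to fill is just remaining space over rate.
days_until_full = (capacity_gib - used_gib) / growth_gib_per_day
print(f"about {days_until_full:.0f} days")
```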
Thanks for the update!
Ludo’.