guix-devel

From: Ludovic Courtès
Subject: Re: Thoughts on building things for substitutes and the Guix Build Coordinator
Date: Tue, 17 Nov 2020 23:10:36 +0100

Hi!

Christopher Baines <mail@cbaines.net> skribis:

> The way the Guix Build Coordinator generates compressed nars where the
> agent runs, then sends them over the network to the coordinator has a
> few benefits. The (sometimes expensive) work of generating the nars
> takes place where the agents are, so if you've got a bunch of machines
> running agents, that work is distributed.
>
> Also, when the nars are received by the coordinator, you have exactly
> what you need for serving substitutes. You just generate narinfo files,
> and then place the nars + narinfos where they can be fetched. The Guix
> Build Coordinator contains code to help with this.
>
> Because you aren't copying the store items back in to a single store, or
> serving substitutes from the store, you don't need to scale the store to
> serve more substitutes. You've still got a bunch of nars + narinfos to
> store, but I think that is an easier problem to tackle.

Yes, this is good for the use case of providing substitutes and it would
certainly help on a big build farm like berlin.

I see a lot could be shared with (guix scripts publish) and (guix
scripts substitute).  We should extract the relevant bits and move them
to new modules explicitly meant for more general consumption.  I think
it’s important to reduce duplication.

> The Guix Build Coordinator supports prioritisation of builds. You can
> assign a priority to builds, and it'll try to order builds in such a way
> that the higher priority builds get processed first. If the aim is to
> serve substitutes, doing some prioritisation might help building the
> most fetched things first.

Neat!
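
As an illustration of the ordering described in the quoted paragraph, here is a minimal sketch in Python. It is not the Guix Build Coordinator's actual code; the `BuildQueue` class, its method names, and the convention that a larger number means a higher priority are all assumptions made for the example.

```python
import heapq
import itertools


class BuildQueue:
    """Hypothetical in-memory queue: highest priority first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker so that builds with the
        # same priority come out in submission order.
        self._counter = itertools.count()

    def submit(self, derivation, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the
        # highest-priority build first.
        heapq.heappush(self._heap, (-priority, next(self._counter), derivation))

    def next_build(self):
        """Return the next derivation to build, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, derivation = heapq.heappop(self._heap)
        return derivation
```

With something like this, a frequently fetched package could be submitted with a higher priority than the rest so that an agent picks it up first.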

> Another feature supported by the Guix Build Coordinator is retries. If a
> build fails, the Guix Build Coordinator can automatically retry it. In a
> perfect world, everything would succeed first time, but because the
> world isn't perfect, there still can be intermittent build
> failures. Retrying failed builds even once can help reduce the chance
> that a failure leads to no substitutes for that build as well as any
> builds that depend on that output.

That’s nice too; intermittent failures are one of the practical issues
we have with Cuirass, and one that’s tempting to ignore because “hey
it’s all functional!”, but then reality gets in the way.

> Now the not so good things:
>
> The Guix Build Coordinator just builds things, if you want to build all
> Guix packages, you need to work out the derivations, then submit builds
> for all of them. There's a script I wrote that does this with the help
> of a Guix Data Service instance, but that might not be ideal for all
> deployments. Even though it can handle the building of things, and most
> of the serving substitutes part (just not the serving bit), some other
> component(s) are needed.

That’s OK; it’s good that these two things (computing derivations and
building them) are separate.

> Because the build results don't end up in a store (they could, but as
> set out above, not being in the store is a feature I think), you can't
> use `guix gc` to get rid of old store entries/substitutes. I have some
> ideas about what to implement to provide some kind of GC approach over a
> bunch of nars + narinfos, but I haven't implemented anything yet.

‘guix publish’ has support for that via (guix cache), so if we could
share code, that’d be great.

One option would be to populate /var/cache/guix/publish and to let ‘guix
publish’ serve it from there.
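
To make the idea of garbage-collecting a directory of nars and narinfos concrete, here is a rough Python sketch of one possible policy: delete files that have not been accessed within a time-to-live. That policy is an assumption made for the example, as are the `remove_expired_entries` name and its parameters; this is not the code in (guix cache).

```python
import os
import time


def remove_expired_entries(cache_dir, ttl_seconds, now=None):
    """Delete files under cache_dir not accessed within ttl_seconds.

    Returns the sorted list of removed paths.  The 'now' argument exists
    only so the cutoff can be controlled in tests.
    """
    now = time.time() if now is None else now
    removed = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            # st_atime is the last access time; it may be unreliable on
            # file systems mounted with noatime.
            if now - os.stat(path).st_atime > ttl_seconds:
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```

The sketch treats every file independently; a real implementation would presumably want to expire a nar together with its narinfo so that neither is left dangling.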

> There could be issues with the implementation… I'd like to think it's
> relatively simple, but that doesn't mean there aren't issues. For some
> reason or another, getting backtraces for exceptions rarely works. Most
> of the time the coordinator tries to print a backtrace, the part of
> Guile doing that raises an exception. I've managed to cause it to
> segfault, through using SQLite incorrectly, which hasn't been obvious to
> fix at least for me. Additionally, there are some places where I'm
> fighting against bits of Guix, things like checking for substitutes
> without caching, or substituting a derivation without starting to build
> it.

I haven’t yet watched your talk, but I’ve watched Mathieu’s, where he
admits to being concerned about the reliability of code involving Fibers
and/or SQLite (which I can understand given his/our experience, although
I’m maybe less pessimistic).  What’s your experience, how do you feel
about it?

> Finally, the instrumentation is somewhat reliant on Prometheus, and if
> you want a pretty dashboard, then you might need Grafana too. Both of
> these things aren't packaged for Guix, Prometheus might be feasible to
> package within the next few months, I doubt the same is true for Grafana
> (due to the use of NPM).

Heh.  :-)

Thanks for this update!

Ludo’.


