bug-guix

bug#31785: Multiple client 'build-paths' RPCs can lead to daemon deadlock


From: Ludovic Courtès
Subject: bug#31785: Multiple client 'build-paths' RPCs can lead to daemon deadlock
Date: Sat, 21 Dec 2024 17:22:15 +0100
User-agent: Gnus/5.13 (Gnus v5.13)

ludo@gnu.org (Ludovic Courtès) wrote:

> I tried running this:
>
>   guix build --max-jobs=200 $(guix gc -R $(guix build -d inkscape --no-grafts) | sort) & \
>   guix build --max-jobs=200 $(guix gc -R $(guix build -d inkscape --no-grafts) | sort -r)
>
> … also in parallel with this (for good measure):
>
>   guix build --max-jobs=200 $(guix gc -R $(guix build -d inkscape --no-grafts) | sort -R)
>
> Since we have 3 clients, that leads to 3 guix-daemon processes, and
> those are stuck in a deadlock:

This strikes again: ‘cuirass remote-worker’ processes on berlin
occasionally end up deadlocking in the exact same way.
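The reproducer quoted above hands the same set of derivations to several clients in different orders (`sort`, `sort -r`, `sort -R`). A toy Python model of why that matters follows. It is hypothetical and not derived from guix-daemon code: it assumes each client locks store items one at a time in the order of its list and keeps every lock until it holds them all. Under that assumption, clients with opposite orders can end up waiting on each other, while clients sharing one order always finish.

```python
"""Toy model (hypothetical, NOT guix-daemon code): each client locks
items one at a time in list order and holds every lock until it has
acquired all of them, then releases them all at once."""


def simulate(orders):
    """Round-robin simulation.  `orders` maps a client name to the list
    of lock names it wants, in acquisition order.  Returns the set of
    clients stuck forever (an empty set means every client finished)."""
    owner = {}                                   # lock name -> holding client
    progress = {client: 0 for client in orders}  # index of next wanted lock
    done = set()
    while True:
        moved = False
        for client, wanted in orders.items():
            if client in done:
                continue
            i = progress[client]
            if i == len(wanted):
                # All locks acquired: do the work, then release everything.
                for lock in wanted:
                    del owner[lock]
                done.add(client)
                moved = True
                continue
            lock = wanted[i]
            if lock not in owner:                # lock is free: take it
                owner[lock] = client
                progress[client] += 1
                moved = True
        if not moved:                            # nobody can advance
            return set(orders) - done
```

With `items = ["dav1d", "libva", "rav1e"]`, `simulate({"A": items, "B": items})` returns an empty set, whereas giving B the reversed list returns `{"A", "B"}`: A ends up holding dav1d and libva, B holds rav1e, and each waits for a lock the other holds.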

When running ‘cuirass remote-worker --workers=4’, 4 sessions (4 clients)
are used, which can lead to that situation, as in this example:

--8<---------------cut here---------------start------------->8---
root@hydra-guix-126 ~# guix processes |guix shell recutils -- recsel -p 'SessionPID,ClientCommand,LockHeld'
SessionPID: 27250
ClientCommand: /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile -e main -s /gnu/store/ll18sc406b5cqapmvz17v22gh4sryb24-cuirass-1.2.0-11.e96f088/bin/.cuirass-real remote-worker --user=cuirass-worker --workers=4 --systems=x86_64-linux,i686-linux --publish-port=5558 --substitute-urls=http://141.80.167.131

SessionPID: 27269
ClientCommand: /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile -e main -s /gnu/store/ll18sc406b5cqapmvz17v22gh4sryb24-cuirass-1.2.0-11.e96f088/bin/.cuirass-real remote-worker --user=cuirass-worker --workers=4 --systems=x86_64-linux,i686-linux --publish-port=5558 --substitute-urls=http://141.80.167.131
LockHeld: /gnu/store/72s7500g3zg2p6fjdc1paazvm1w2xdr2-libva-2.19.0.lock
LockHeld: /gnu/store/0bbnhq7bagn6sbj2lmapmdiiw50v3dgz-rav1e-0.7.1.lock

SessionPID: 27308
ClientCommand: /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile -e main -s /gnu/store/ll18sc406b5cqapmvz17v22gh4sryb24-cuirass-1.2.0-11.e96f088/bin/.cuirass-real remote-worker --user=cuirass-worker --workers=4 --systems=x86_64-linux,i686-linux --publish-port=5558 --substitute-urls=http://141.80.167.131
LockHeld: /gnu/store/zf5w9ypk8il0i9y22n81aamypr2qgsmm-dav1d-1.5.0.lock

SessionPID: 27345
ClientCommand: /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile -e main -s /gnu/store/ll18sc406b5cqapmvz17v22gh4sryb24-cuirass-1.2.0-11.e96f088/bin/.cuirass-real remote-worker --user=cuirass-worker --workers=4 --systems=x86_64-linux,i686-linux --publish-port=5558 --substitute-urls=http://141.80.167.131
LockHeld: /gnu/store/0xbi2bgq34yyx2fqjjwpgdv4gkfyaf60-gst-plugins-bad-minimal-1.22.3.lock
LockHeld: /gnu/store/ij5igi5xrp4sx6c78nbvg24lb4ma2f4l-libcbor-0.11.0.lock
LockHeld: /gnu/store/czfvm14yy517vb8w2hpp46nyrdrymqyp-libfido2-1.12.0.lock
LockHeld: /gnu/store/1ldcq0p20nqy7d3mxdy4yra1ax5ik3xc-mpg123-1.31.2.lock
LockHeld: /gnu/store/sadbf1fmb0n9k754x5jbbdklcxbjqlhx-openssh-9.9p1.lock
LockHeld: /gnu/store/86rl29llmb7s4sl3bx0vl465mmq7nk6f-gcr-3.41.2.lock

SessionPID: 27382
ClientCommand: /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile -e main -s /gnu/store/ll18sc406b5cqapmvz17v22gh4sryb24-cuirass-1.2.0-11.e96f088/bin/.cuirass-real remote-worker --user=cuirass-worker --workers=4 --systems=x86_64-linux,i686-linux --publish-port=5558 --substitute-urls=http://141.80.167.131
--8<---------------cut here---------------end--------------->8---
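To tabulate which session holds which lock, the record output above can be parsed into a dictionary. A minimal Python sketch follows; `locks_by_session` is a hypothetical helper, and it assumes the unwrapped form of the output, with one `Field: value` per line and records separated by blank lines (recutils continuation lines starting with `+` are not handled).

```python
def locks_by_session(text):
    """Map each SessionPID to the list of lock files it holds, given the
    record output of `guix processes` (optionally filtered by recsel).

    Assumption: one `Field: value` per line, records separated by a
    blank line; recutils '+ ' continuation lines are not handled."""
    held = {}
    for record in text.strip().split("\n\n"):
        session = None
        for line in record.splitlines():
            field, _, value = line.partition(": ")
            if field == "SessionPID":
                session = int(value)
                held.setdefault(session, [])   # keep lock-free sessions too
            elif field == "LockHeld" and session is not None:
                held[session].append(value.strip())
    return held
```

The input could be, for instance, the `stdout` of `subprocess.run(["guix", "processes"], capture_output=True, text=True)`; sessions with no `LockHeld` field, such as 27250 above, map to an empty list.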

Here process 27269 holds the libva and rav1e locks and waits forever for
the dav1d lock, held by 27308; process 27308 waits for the rav1e lock,
held by 27269; and process 27345 waits for the libva lock, also held by
27269.  So 27269 and 27308 wait on each other, and 27345 is stuck behind
them.
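The situation described above is a cycle in a ‘wait-for’ graph, with an edge from each session to the session holding the lock it waits for. A small Python sketch makes this concrete. The function names are hypothetical, lock names are shortened to package names, and the waits are transcribed from the paragraph above, since the `guix processes` output shown does not say which lock a session is waiting for.

```python
# Locks held, from the `guix processes` output (names shortened).
holds = {
    27269: {"libva", "rav1e"},
    27308: {"dav1d"},
    27345: {"gst-plugins-bad-minimal", "libcbor", "libfido2",
            "mpg123", "openssh", "gcr"},
}
# Lock each session waits for, transcribed from the analysis above.
waits_for_lock = {27269: "dav1d", 27308: "rav1e", 27345: "libva"}


def wait_for_graph(holds, waits_for_lock):
    """Edge p -> q when session p waits for a lock held by session q."""
    owner = {lock: pid for pid, locks in holds.items() for lock in locks}
    return {pid: owner[lock] for pid, lock in waits_for_lock.items()
            if lock in owner}


def find_cycle(graph):
    """Return the sessions forming a cycle, or None if there is none.
    Each node has at most one outgoing edge (one awaited lock)."""
    for start in graph:
        path, node = [], start
        while node in graph and node not in path:
            path.append(node)
            node = graph[node]
        if node in path:                  # walked back onto our own path
            return path[path.index(node):]
    return None
```

Here `find_cycle(wait_for_graph(holds, waits_for_lock))` returns `[27269, 27308]`. Session 27345 is not part of the cycle, but its only edge points into it, so it cannot make progress either.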

FWIW, each of them is trying to substitute (not build) those things, via
the ‘build-things’ call made after the “substituting ~a inputs for ~a”
message in remote-worker.

Ludo’.




