bug#31785: Multiple client 'build-paths' RPCs can lead to daemon deadlock
From: Ludovic Courtès
Subject: bug#31785: Multiple client 'build-paths' RPCs can lead to daemon deadlock
Date: Sat, 21 Dec 2024 17:22:15 +0100
User-agent: Gnus/5.13 (Gnus v5.13)
ludo@gnu.org (Ludovic Courtès) wrote:
> I tried running this:
>
> guix build --max-jobs=200 $(guix gc -R $(guix build -d inkscape
> --no-grafts) | sort) & \
> guix build --max-jobs=200 $(guix gc -R $(guix build -d inkscape
> --no-grafts) | sort -r)
>
> … also in parallel with this (for good measure):
>
> guix build --max-jobs=200 $(guix gc -R $(guix build -d inkscape
> --no-grafts) | sort -R)
>
> Since we have 3 clients, that leads to 3 guix-daemon processes, and
> those are stuck in a deadlock:
This strikes again: ‘cuirass remote-worker’ processes on berlin
occasionally end up deadlocking in the exact same way.
When running ‘cuirass remote-worker --workers=4’, 4 sessions (4 clients)
are used, which can lead to that situation, as in this example:
--8<---------------cut here---------------start------------->8---
root@hydra-guix-126 ~# guix processes | guix shell recutils -- recsel -p 'SessionPID,ClientCommand,LockHeld'
SessionPID: 27250
ClientCommand:
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile -e main -s
/gnu/store/ll18sc406b5cqapmvz17v22gh4sryb24-cuirass-1.2.0-11.e96f088/bin/.cuirass-real
remote-worker --user=cuirass-worker --workers=4
--systems=x86_64-linux,i686-linux --publish-port=5558
--substitute-urls=http://141.80.167.131
SessionPID: 27269
ClientCommand:
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile -e main -s
/gnu/store/ll18sc406b5cqapmvz17v22gh4sryb24-cuirass-1.2.0-11.e96f088/bin/.cuirass-real
remote-worker --user=cuirass-worker --workers=4
--systems=x86_64-linux,i686-linux --publish-port=5558
--substitute-urls=http://141.80.167.131
LockHeld: /gnu/store/72s7500g3zg2p6fjdc1paazvm1w2xdr2-libva-2.19.0.lock
LockHeld: /gnu/store/0bbnhq7bagn6sbj2lmapmdiiw50v3dgz-rav1e-0.7.1.lock
SessionPID: 27308
ClientCommand:
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile -e main -s
/gnu/store/ll18sc406b5cqapmvz17v22gh4sryb24-cuirass-1.2.0-11.e96f088/bin/.cuirass-real
remote-worker --user=cuirass-worker --workers=4
--systems=x86_64-linux,i686-linux --publish-port=5558
--substitute-urls=http://141.80.167.131
LockHeld: /gnu/store/zf5w9ypk8il0i9y22n81aamypr2qgsmm-dav1d-1.5.0.lock
SessionPID: 27345
ClientCommand:
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile -e main -s
/gnu/store/ll18sc406b5cqapmvz17v22gh4sryb24-cuirass-1.2.0-11.e96f088/bin/.cuirass-real
remote-worker --user=cuirass-worker --workers=4
--systems=x86_64-linux,i686-linux --publish-port=5558
--substitute-urls=http://141.80.167.131
LockHeld: /gnu/store/0xbi2bgq34yyx2fqjjwpgdv4gkfyaf60-gst-plugins-bad-minimal-1.22.3.lock
LockHeld: /gnu/store/ij5igi5xrp4sx6c78nbvg24lb4ma2f4l-libcbor-0.11.0.lock
LockHeld: /gnu/store/czfvm14yy517vb8w2hpp46nyrdrymqyp-libfido2-1.12.0.lock
LockHeld: /gnu/store/1ldcq0p20nqy7d3mxdy4yra1ax5ik3xc-mpg123-1.31.2.lock
LockHeld: /gnu/store/sadbf1fmb0n9k754x5jbbdklcxbjqlhx-openssh-9.9p1.lock
LockHeld: /gnu/store/86rl29llmb7s4sl3bx0vl465mmq7nk6f-gcr-3.41.2.lock
SessionPID: 27382
ClientCommand:
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile -e main -s
/gnu/store/ll18sc406b5cqapmvz17v22gh4sryb24-cuirass-1.2.0-11.e96f088/bin/.cuirass-real
remote-worker --user=cuirass-worker --workers=4
--systems=x86_64-linux,i686-linux --publish-port=5558
--substitute-urls=http://141.80.167.131
--8<---------------cut here---------------end--------------->8---
Here process 27269 holds locks on libva and rav1e and waits forever
trying to get the dav1d lock, held by 27308; process 27308 tries to get
the rav1e lock; process 27345 tries to get the libva lock.
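
To make the cycle explicit, here is a rough sketch (not daemon code) that
builds a wait-for graph from the holders and waiters listed above. The
function names and the dictionary layout are made up for illustration,
store hashes are dropped for brevity, and it assumes each session waits
on a single lock at a time:

```python
# Locks held per session PID, taken from the 'guix processes' output above.
# Locks held by 27345 are left out: none of the waits described involve them.
held = {
    27269: {"libva-2.19.0.lock", "rav1e-0.7.1.lock"},
    27308: {"dav1d-1.5.0.lock"},
}

# The lock each session is blocked on, as described in the paragraph above.
wanted = {
    27269: "dav1d-1.5.0.lock",
    27308: "rav1e-0.7.1.lock",
    27345: "libva-2.19.0.lock",
}


def wait_for_graph(held, wanted):
    """Map each waiting PID to the PID that holds the lock it wants."""
    owner = {lock: pid for pid, locks in held.items() for lock in locks}
    return {pid: owner[lock] for pid, lock in wanted.items() if lock in owner}


def find_cycle(graph, start):
    """Follow wait-for edges from START; return the cycle reached, or None."""
    path = []
    pid = start
    while pid in graph and pid not in path:
        path.append(pid)
        pid = graph[pid]
    if pid in path:
        return path[path.index(pid):]
    return None


graph = wait_for_graph(held, wanted)
for pid in sorted(wanted):
    print(pid, "waits on", graph[pid], "-> cycle", find_cycle(graph, pid))
```

With this data, 27269 and 27308 wait on each other, and 27345 is stuck
behind 27269 without being part of the cycle itself.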
FWIW, each of them is trying to substitute (not build) those things, via
the ‘build-things’ call made after the “substituting ~a inputs for ~a”
message in remote-worker.
Ludo’.