bug#72166: Shepherd periodically goes unresponsive on one of my machines
From: Jonathan Frederickson
Subject: bug#72166: Shepherd periodically goes unresponsive on one of my machines
Date: Wed, 17 Jul 2024 20:43:15 -0400
User-agent: Cyrus-JMAP/3.11.0-alpha0-568-g843fbadbe-fm-20240701.003-g843fbadb
I've been running into an issue with Shepherd on one of my machines. Every so
often (and I haven't figured out what conditions trigger it), my Shepherd
instances (both home and PID 1) will go unresponsive. I thought I had tracked
it down to a misbehaving home service that I had configured, but it's just
happened again without that service running.
'herd status' hangs indefinitely:
jfred@terracard ~$ sudo herd status
Password:
<never returns>
...on both instances:
jfred@terracard ~$ herd status
<never returns>
The PID 1 shepherd instance isn't reaping defunct processes:
jfred@terracard ~$ ps aux | grep -i lock
jfred 541 0.0 0.0 3700 2304 ? S 18:30 0:00 swayidle -w timeout 300 swaylock -f -i ~/.wallpapers/user-manual.jpg timeout 10 if pgrep swaylock; then swaymsg "output * dpms off"; fi resume swaymsg "output * dpms on" before-sleep swaylock -f -i ~/.wallpapers/user-manual.jpg
jfred 3111 0.0 0.0 0 0 ? Z 18:53 0:00 [swaylock] <defunct>
jfred 3112 0.0 0.0 0 0 ? Zs 18:53 0:00 [swaylock] <defunct>
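For anyone who wants to check for the same symptom without eyeballing 'ps' output, here is a minimal Python sketch that lists zombie processes and their parent PID by reading /proc/<pid>/stat. It assumes a Linux /proc layout; the function names parse_stat and find_zombies are made up for this illustration and are not part of Shepherd or any other tool.

```python
import os


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    rest = text[right + 1:].split()
    # The fields after comm are: state, ppid, ...
    return pid, comm, rest[0], int(rest[1])


def find_zombies(proc="/proc"):
    """Return (pid, comm, ppid) for every process currently in state Z."""
    zombies = []
    if not os.path.isdir(proc):
        return zombies
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited during the scan, or stat was unreadable
        if state == "Z":
            zombies.append((pid, comm, ppid))
    return zombies


if __name__ == "__main__":
    for pid, comm, ppid in find_zombies():
        print(f"zombie pid={pid} comm={comm} parent={ppid}")
```

A zombie whose parent is PID 1 and which stays in the list across repeated runs would match the symptom described above.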
Some further troubleshooting: strace shows one thread of PID 1 (pid 144) blocked in a read()
on fd 9, while the other threads wait on a futex:
jfred@terracard ~ [env]$ sudo strace -fp 1
Password:
strace: Process 1 attached with 5 threads
[pid 144] read(9, <unfinished ...>
[pid 142] futex(0x7fa43892abe8, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 141] futex(0x7fa43892abe8, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 140] futex(0x7fa43892abe8, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY^
...which seems to be:
jfred@terracard ~ [env]$ sudo ls -l /proc/1/fd/9
lr-x------ 1 root root 64 Jul 17 20:39 /proc/1/fd/9 -> 'pipe:[4015]'
jfred@terracard ~ [env]$ sudo lsof -n | grep 4015
lsof: WARNING: can't stat() fuse.portal file system /run/user/1000/doc
Output information may be incomplete.
shepherd 1 root 9r FIFO 0,15 0t0 4015 pipe
shepherd 1 root 11w FIFO 0,15 0t0 4015 pipe
shepherd 1 140 GC-marker root 9r FIFO 0,15 0t0 4015 pipe
shepherd 1 140 GC-marker root 11w FIFO 0,15 0t0 4015 pipe
shepherd 1 141 GC-marker root 9r FIFO 0,15 0t0 4015 pipe
shepherd 1 141 GC-marker root 11w FIFO 0,15 0t0 4015 pipe
shepherd 1 142 GC-marker root 9r FIFO 0,15 0t0 4015 pipe
shepherd 1 142 GC-marker root 11w FIFO 0,15 0t0 4015 pipe
shepherd 1 144 shepherd root 9r FIFO 0,15 0t0 4015 pipe
shepherd 1 144 shepherd root 11w FIFO 0,15 0t0 4015 pipe
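The same lookup can be done without lsof by walking /proc directly. Below is a minimal Python sketch, assuming a Linux /proc layout and enough privileges to read other processes' fd directories (root, for PID 1). The names pipe_holders and access_mode are invented for this illustration. It reports which PIDs hold a descriptor on a given pipe inode and whether each descriptor is the read or the write end.

```python
import os


def access_mode(fdinfo_path):
    """Read the open flags from /proc/<pid>/fdinfo/<fd> and map them to r/w/u."""
    with open(fdinfo_path) as f:
        for line in f:
            if line.startswith("flags:"):
                flags = int(line.split()[1], 8)  # the kernel prints flags in octal
                # The low two bits hold the access mode:
                # 0 = O_RDONLY, 1 = O_WRONLY, 2 = O_RDWR.
                return {0: "r", 1: "w", 2: "u"}.get(flags & 0o3, "?")
    return "?"


def pipe_holders(inode, proc="/proc"):
    """Return (pid, fd, mode) for every open descriptor on pipe:[inode]."""
    target = f"pipe:[{inode}]"
    holders = []
    if not os.path.isdir(proc):
        return holders
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join(proc, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process gone, or we are not allowed to look
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) != target:
                    continue
                mode = access_mode(os.path.join(proc, entry, "fdinfo", fd))
            except OSError:
                continue  # descriptor was closed while we were scanning
            holders.append((int(entry), int(fd), mode))
    return holders


if __name__ == "__main__":
    # 4015 is the pipe inode from the 'ls -l /proc/1/fd/9' output above.
    for pid, fd, mode in pipe_holders(4015):
        print(f"pid={pid} fd={fd}{mode}")
```

Unlike lsof, this scans per process rather than per thread, so a pipe held only by PID 1 shows up as two entries (fd 9 for reading, fd 11 for writing) instead of one pair per thread.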
I last ran 'guix pull' on June 21. My system configuration for this machine can
be found here:
https://github.com/jfrederickson/dotfiles/blob/master/guix/guix/system/machines/terracard/config.scm
Has anyone else run into this?