bug-guix

bug#72166: Shepherd periodically goes unresponsive on one of my machines


From: Jonathan Frederickson
Subject: bug#72166: Shepherd periodically goes unresponsive on one of my machines
Date: Wed, 17 Jul 2024 20:43:15 -0400
User-agent: Cyrus-JMAP/3.11.0-alpha0-568-g843fbadbe-fm-20240701.003-g843fbadb

I've been running into an issue with Shepherd on one of my machines. Every so
often (I haven't figured out what conditions trigger it), both of my Shepherd
instances (the home one and PID 1) become unresponsive. I thought I had tracked
it down to a misbehaving home service that I had configured, but it has just
happened again without that service running.

'herd status' hangs indefinitely:

jfred@terracard ~$ sudo herd status
Password: 
<never returns>

...on both instances:

jfred@terracard ~$ herd status
<never returns>

The PID 1 shepherd instance isn't reaping defunct processes:

jfred@terracard ~$ ps aux | grep -i lock
jfred      541  0.0  0.0   3700  2304 ?        S    18:30   0:00 swayidle -w timeout 300 swaylock -f -i ~/.wallpapers/user-manual.jpg timeout 10 if pgrep swaylock; then swaymsg "output * dpms off"; fi resume swaymsg "output * dpms on" before-sleep swaylock -f -i ~/.wallpapers/user-manual.jpg
jfred     3111  0.0  0.0      0     0 ?        Z    18:53   0:00 [swaylock] <defunct>
jfred     3112  0.0  0.0      0     0 ?        Zs   18:53   0:00 [swaylock] <defunct>
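
For anyone trying to confirm the same symptom, here is a minimal sketch that lists only the defunct processes together with their parent PID, so it is easy to see which process has failed to reap them. The function name `list_zombies` is hypothetical, and the sketch assumes a procps-style `ps` that accepts `-e` and repeated `-o` options.

```shell
# Hypothetical helper: print PID, PPID, state and command name of every
# zombie (defunct) process. Assumes a procps-style ps.
list_zombies() {
    # "-o name=" prints the column without a header; a state beginning
    # with Z marks a process that has exited but has not been reaped.
    ps -e -o pid= -o ppid= -o stat= -o comm= | awk '$3 ~ /^Z/'
}

list_zombies
```

The second column is the parent that still has to call wait() on each zombie. Orphaned processes are reparented to PID 1 unless a sub-reaper adopts them first.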

Some further troubleshooting: strace shows thread 144 of PID 1 blocked in a
read() on fd 9, while the other threads shown are waiting on a futex:

jfred@terracard ~ [env]$ sudo strace -fp 1
Password: 
strace: Process 1 attached with 5 threads
[pid   144] read(9,  <unfinished ...>
[pid   142] futex(0x7fa43892abe8, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid   141] futex(0x7fa43892abe8, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid   140] futex(0x7fa43892abe8, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY^

...which seems to be:

jfred@terracard ~ [env]$ sudo ls -l /proc/1/fd/9
lr-x------ 1 root root 64 Jul 17 20:39 /proc/1/fd/9 -> 'pipe:[4015]'
jfred@terracard ~ [env]$ sudo lsof -n | grep 4015
lsof: WARNING: can't stat() fuse.portal file system /run/user/1000/doc
      Output information may be incomplete.
shepherd     1                      root    9r     FIFO               0,15       0t0       4015 pipe
shepherd     1                      root   11w     FIFO               0,15       0t0       4015 pipe
shepherd     1  140 GC-marker       root    9r     FIFO               0,15       0t0       4015 pipe
shepherd     1  140 GC-marker       root   11w     FIFO               0,15       0t0       4015 pipe
shepherd     1  141 GC-marker       root    9r     FIFO               0,15       0t0       4015 pipe
shepherd     1  141 GC-marker       root   11w     FIFO               0,15       0t0       4015 pipe
shepherd     1  142 GC-marker       root    9r     FIFO               0,15       0t0       4015 pipe
shepherd     1  142 GC-marker       root   11w     FIFO               0,15       0t0       4015 pipe
shepherd     1  144 shepherd        root    9r     FIFO               0,15       0t0       4015 pipe
shepherd     1  144 shepherd        root   11w     FIFO               0,15       0t0       4015 pipe
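
Since lsof warned that its output may be incomplete, the same check can be done directly against /proc. The following is only a sketch: the function name `pipe_holders` is hypothetical, it assumes a Linux /proc layout, and it has to run as root to see the descriptors of every process.

```shell
# Hypothetical helper: print "PID FD" for every open descriptor under
# /proc whose target is pipe:[INODE]. Assumes a Linux /proc filesystem.
pipe_holders() {
    inode=$1
    for fd in /proc/[0-9]*/fd/*; do
        # readlink fails for descriptors that vanished or that we may
        # not inspect; skip those.
        target=$(readlink "$fd" 2>/dev/null) || continue
        if [ "$target" = "pipe:[$inode]" ]; then
            pid=${fd#/proc/}
            pid=${pid%%/*}
            echo "$pid ${fd##*/}"
        fi
    done
}

# 4015 is the pipe inode reported by ls for /proc/1/fd/9.
pipe_holders 4015
```

If only PID 1 shows up, as in the lsof listing, then shepherd itself holds both the read end (fd 9) and the write end (fd 11) of that pipe, and no other process can write to it.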

My system configuration for this machine can be found here:
https://github.com/jfrederickson/dotfiles/blob/master/guix/guix/system/machines/terracard/config.scm

I last ran 'guix pull' on June 21.

Has anyone else run into this?




