[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#68800: Guix waits for termination of a kernel thread
From: |
Tomas Volf |
Subject: |
bug#68800: Guix waits for termination of a kernel thread |
Date: |
Mon, 29 Jan 2024 18:34:33 +0100 |
Hello,
when Guix machine is shutting down, it keeps waiting for PID associated with
[mt76-tx phy0] to terminate. Since it is a kernel thread, it does not happen.
Previous discussion on this bug was done via email, and is copied here:
Date: Sun, 7 Jan 2024 15:59:51 +0100
From: Tomas Volf <~@wolfsden.cz>
To: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released
On 2024-01-07 15:08:59 +0100, Ludovic Courtès wrote:
> We are pleased to announce the GNU Shepherd version 0.10.3, a bug-fix
> release of the new 0.10.x series, representing 51 commits over 6 months.
Congratulations on the release :)
>
> ** Do not accidentally wait for Linux kernel thread completion
> (<https://issues.guix.gnu.org/67132>)
>
> In cases a PID file contained a bogus PID or one that’s only valid in a
> separate PID namespace, shepherd could end up waiting for the termination of
> what’s actually a Linux kernel thread, such as PID 2 (“kthreadd”). This
> situation is now recognized and avoided.
This is great, I will not have to remember to run `modprobe -r mt7921e' before
each shutdown anymore. I hope. Looking forward to getting it in the Guix :)
Have a nice 2024,
Tomas Volf
Date: Wed, 10 Jan 2024 00:34:48 +0100
From: Ludovic Courtès <ludo@gnu.org>
To: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released
Tomas Volf <~@wolfsden.cz> skribis:
> On 2024-01-07 15:08:59 +0100, Ludovic Courtès wrote:
[...]
>> ** Do not accidentally wait for Linux kernel thread completion
>> (<https://issues.guix.gnu.org/67132>)
>>
>> In cases a PID file contained a bogus PID or one that’s only valid in a
>> separate PID namespace, shepherd could end up waiting for the termination
>> of
>> what’s actually a Linux kernel thread, such as PID 2 (“kthreadd”). This
>> situation is now recognized and avoided.
>
> This is great, I will not have to remember to run `modprobe -r mt7921e' before
> each shutdown anymore. I hope. Looking forward to getting it in the Guix :)
D’oh, why did you have to do that? How did Shepherd end up with “wrong” PID?
I hope this release fixes it!
Ludo’.
Date: Wed, 10 Jan 2024 17:38:17 +0100
From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès <ludo@gnu.org>
Cc: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released
On 2024-01-10 00:34:48 +0100, Ludovic Courtès wrote:
> Tomas Volf <~@wolfsden.cz> skribis:
>
> > On 2024-01-07 15:08:59 +0100, Ludovic Courtès wrote:
>
> [...]
>
> >> ** Do not accidentally wait for Linux kernel thread completion
> >> (<https://issues.guix.gnu.org/67132>)
> >>
> >> In cases a PID file contained a bogus PID or one that’s only valid in a
> >> separate PID namespace, shepherd could end up waiting for the
> >> termination of
> >> what’s actually a Linux kernel thread, such as PID 2 (“kthreadd”). This
> >> situation is now recognized and avoided.
> >
> > This is great, I will not have to remember to run `modprobe -r mt7921e'
> > before
> > each shutdown anymore. I hope. Looking forward to getting it in the Guix
> > :)
>
> D’oh, why did you have to do that?
Otherwise the shepherd would be stuck on shutdown waiting for process named
[mt76-tx phy0]
to terminate with messages along the lines of:
shepherd[1]: waiting for process termination (processes left: (1 678))
It is a kernel thread as far as I can tell (based on
https://stackoverflow.com/a/12231039):
$ cd /proc/678
$ cat cmdline
$ readlink exe; echo $?
1
Removing the module mt7921e stops the thread, so shepherd does not wait for it.
> How did Shepherd end up with “wrong” PID?
That I do not know. It is visible in `ps' output, so I assume shepherd picked
it up on its own somehow.
>
> I hope this release fixes it!
As far as I can tell, the 0.10.3 was already added into guix:
$ ps 1 | cat
PID TTY STAT TIME COMMAND
1 ? Sl 0:01
/gnu/store/bhynhk0c6ssq3fqqc59fvhxjzwywsjbb-guile-3.0.9/bin/guile
--no-auto-compile
/gnu/store/06mz0yjkghi7r6d7lmhvv7gryipljhdd-shepherd-0.10.3/bin/shepherd
+--config /gnu/store/klkqq2y65k141rlipq4ls0w2rlhds12h-shepherd.conf
So I have to say it sadly did not resolve this issue. I am unsure why though.
I am not familiar with Shepherd's code base, but quick look at the git log
suggested that procedure (@@ (shepherd service) pseudo-process?) is the relevant
one. When I try it from a REPL, it returns #t.
$ guix shell guile shepherd guile-fibers -- guile
GNU Guile 3.0.9
Copyright (C) 1995-2023 Free Software Foundation, Inc.
Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.
Enter `,help' for help.
scheme@(guile-user)> ,use (shepherd service)
scheme@(guile-user)> ((@@ (shepherd service) pseudo-process?) 688)
$1 = #t
So it *should* work? However the issue is caused by non-free WiFi driver on a
corrupted kernel, so I am not sure if it is even problem that needs to be
solved... I would (obviously) like to see it resolved, but I probably cannot
even bug report it, since it requires non-free hardware and software to
reproduce.
Tomas
PS: It is interesting that `guix shell guile shepherd' is not enough, the
guile-fibers have to be explicitly specified as well. Is that expected?
Date: Wed, 10 Jan 2024 17:50:19 +0100
From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès <ludo@gnu.org>, guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released
PS:
On 2024-01-10 17:38:17 +0100, Tomas Volf wrote:
> scheme@(guile-user)> ((@@ (shepherd service) pseudo-process?) 688)
The pid is different than above, because this was after a reboot.
Tomas
Date: Thu, 11 Jan 2024 13:41:39 +0100
From: Ludovic Courtès <ludo@gnu.org>
To: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released
Hello,
Tomas Volf <~@wolfsden.cz> skribis:
> Otherwise the shepherd would be stuck on shutdown waiting for process named
>
> [mt76-tx phy0]
>
> to terminate with messages along the lines of:
>
> shepherd[1]: waiting for process termination (processes left: (1 678))
>
> It is a kernel thread as far as I can tell (based on
> https://stackoverflow.com/a/12231039):
>
> $ cd /proc/678
> $ cat cmdline
> $ readlink exe; echo $?
> 1
>
> Removing the module mt7921e stops the thread, so shepherd does not wait for
> it.
Ooooh.
Then I’m afraid this bug isn’t fixed yet because that code (“waiting for
process termination”) is currently in Guix, not in Shepherd.
However, ‘processes’, which is what is used here and which is defined in
(guix build syscalls), already checks for kernel threads, though it
does it differently than what I implemented in shepherd:
(define (kernel? pid)
"Return #t if PID designates a \"kernel thread\" rather than a normal
user-land process."
(let ((stat (call-with-input-file (format #f "/proc/~a/stat" pid)
(compose string-tokenize read-string))))
;; See proc.txt in Linux's documentation for the list of fields.
(match stat
((pid tcomm state ppid pgrp sid tty_nr tty_pgrp flags min_flt
cmin_flt maj_flt cmaj_flt utime stime cutime cstime
priority nice num_thread it_real_value start_time
vsize rss rsslim
(= string->number start_code) (= string->number end_code) _ ...)
;; Got this obscure trick from sysvinit's 'killall5' program.
(and (zero? start_code) (zero? end_code))))))
It would be great if you could check whether this approach works for
you.
(I had completely forgotten about this code. Funny thing is this one
was inspired by sysvinit, whereas the one in Shepherd was inspired by
systemd. A sign of times!)
Ludo’.
Date: Thu, 11 Jan 2024 14:12:51 +0100
From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès <ludo@gnu.org>
Cc: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released
On 2024-01-11 13:41:39 +0100, Ludovic Courtès wrote:
> Tomas Volf <~@wolfsden.cz> skribis:
>
> > Otherwise the shepherd would be stuck on shutdown waiting for process named
> >
> > [mt76-tx phy0]
> >
> > to terminate with messages along the lines of:
> >
> > shepherd[1]: waiting for process termination (processes left: (1 678))
> >
> > It is a kernel thread as far as I can tell (based on
> > https://stackoverflow.com/a/12231039):
> >
> > $ cd /proc/678
> > $ cat cmdline
> > $ readlink exe; echo $?
> > 1
> >
> > Removing the module mt7921e stops the thread, so shepherd does not wait for
> > it.
>
> Ooooh.
>
> Then I’m afraid this bug isn’t fixed yet because that code (“waiting for
> process termination”) is currently in Guix, not in Shepherd.
>
> However, ‘processes’, which is what is used here and which is defined in
> (guix build syscalls), already checks for kernel threads, though it
> does it differently than what I implemented in shepherd:
>
> (define (kernel? pid)
> "Return #t if PID designates a \"kernel thread\" rather than a normal
> user-land process."
> (let ((stat (call-with-input-file (format #f "/proc/~a/stat" pid)
> (compose string-tokenize read-string))))
> ;; See proc.txt in Linux's documentation for the list of fields.
> (match stat
> ((pid tcomm state ppid pgrp sid tty_nr tty_pgrp flags min_flt
> cmin_flt maj_flt cmaj_flt utime stime cutime cstime
> priority nice num_thread it_real_value start_time
> vsize rss rsslim
> (= string->number start_code) (= string->number end_code) _ ...)
> ;; Got this obscure trick from sysvinit's 'killall5' program.
> (and (zero? start_code) (zero? end_code))))))
>
> It would be great if you could check whether this approach works for
> you.
Ah, that code indeed returns #f for the pid in question:
scheme@(guix-user)> ((@@ (guix build syscalls) kernel?) 688)
$1 = #f
The stat file:
$ cat /proc/688/stat
688 (mt76-tx phy0) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 -2 0 1 0 964 0 0
18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 5 1 1 0 0 0 0 0 0 0 0
0 0 0
So the start_code is not zero (I would guess it is -1). I have no idea what
that means though.
Tomas
Date: Mon, 29 Jan 2024 17:31:33 +0100
From: Ludovic Courtès <ludo@gnu.org>
To: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released
Hi,
Tomas Volf <~@wolfsden.cz> skribis:
> Ah, that code indeed returns #f for the pid in question:
>
> scheme@(guix-user)> ((@@ (guix build syscalls) kernel?) 688)
> $1 = #f
>
> The stat file:
>
> $ cat /proc/688/stat
> 688 (mt76-tx phy0) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 -2 0 1 0 964 0 0
> 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 5 1 1 0 0 0 0 0 0 0
> 0 0 0 0
>
> So the start_code is not zero (I would guess it is -1). I have no idea what
> that means though.
What about this method (from shepherd)?
--8<---------------cut here---------------start------------->8---
(define (linux-process-flags pid)
"Return the process flags of @var{pid} (or'd @code{PF_} constants), assuming
the Linux /proc file system is mounted; raise a @code{system-error} exception
otherwise."
(call-with-input-file (string-append "/proc/" (number->string pid)
"/stat")
(lambda (port)
(define line
(get-string-all port))
;; Parse like systemd's 'is_kernel_thread' function.
(let ((offset (string-index line #\)))) ;offset past 'tcomm' field
(match (and offset
(string-tokenize (string-drop line (+ offset 1))))
((state ppid pgrp sid tty-nr tty-pgrp flags . _)
(or (string->number flags) 0))
(_
0))))))
;; Per-process flag defined in <linux/sched.h>.
(define PF_KTHREAD #x00200000) ;I am a kernel thread
(define (linux-kernel-thread? pid)
"Return true if @var{pid} is a Linux kernel thread."
(= PF_KTHREAD (logand (linux-process-flags pid) PF_KTHREAD)))
--8<---------------cut here---------------end--------------->8---
If it works better, we can use that in syscalls.scm as well.
Ludo’.
PS: Having an entry in bug-guix would ensure we don’t lose track of this.
--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
signature.asc
Description: PGP signature
- bug#68800: Guix waits for termination of a kernel thread,
Tomas Volf <=