
Re: A Critique of Shepherd Design


From: raid5atemyhomework
Subject: Re: A Critique of Shepherd Design
Date: Wed, 24 Mar 2021 14:29:47 +0000




‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, March 23, 2021 1:02 AM, Ludovic Courtès <ludo@gnu.org> wrote:

> Hi,
>
> raid5atemyhomework <raid5atemyhomework@protonmail.com> skribis:
>
> > I'm not sure you can afford to keep it simple.
>
> It has limitations but it does the job—just like many other init systems
> did the job before the advent of systemd.
>
> > Consider: https://issues.guix.gnu.org/47253
> > In that issue, the `networking` provision comes up potentially before the 
> > network is, in fact, up. This means that other daemons that require 
> > `networking` could potentially be started before the network connection is 
> > up.
>
> The ‘networking’ service is just here to say that some network interface
> is up or will be up. It’s admittedly vague and weak, but it’s enough
> for most daemons; they just need to be able to bind and listen to some
> port.
>
> > One example of such a daemon is `transmission-daemon`. This daemon will 
> > bind itself to port 9091 so you can control it. Unfortunately, if it gets 
> > started while network is down, it will be unable to bind to 9091 (so you 
> > can't control it) but still keep running. On my system that means that on 
> reboot I have to manually `sudo herd restart transmission-daemon`.
>
> Could you report a bug for this one? I don’t see why it’d fail to bind.

Let me see if I can still get to the old syslogs where `transmission-daemon` 
claims it cannot bind but keeps going anyway.  I've pulled my 
`transmission-daemon` service definition directly into my `configuration.scm` 
so I can edit its `requirement` to include a custom `networking-online` service 
that runs `nm-online`.
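
For concreteness, here is roughly what that service looks like.  This is a 
sketch from memory, assuming the standard `simple-service` / `shepherd-service` 
API from `(gnu services)` and `(gnu services shepherd)`, with `network-manager` 
in scope for the `nm-online` binary:

    (simple-service 'networking-online shepherd-root-service-type
      (list (shepherd-service
             (provision '(networking-online))
             (requirement '(networking))
             ;; Block until NetworkManager reports the network as really
             ;; up, or give up after 30 seconds.
             (start #~(lambda _
                        (zero? (system* #$(file-append network-manager
                                                       "/bin/nm-online")
                                        "-q" "-t" "30"))))
             (respawn? #f))))

Daemons that really need the network can then put `networking-online` in their 
`requirement` instead of `networking`.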


> > In another example, I have a custom daemon that I have set up to use
> > the Tor proxy over 127.0.0.1:9050. It requires both `networking` and
> > `tor`. When it starts after `networking` comes up but before the
> > actual network does, it dies because it can't access the proxy at
> > 127.0.0.1:9050 (apparently NetworkManager handles loopback as well).
>
> Loopback is handled by the ‘loopback’ shepherd service, which is
> provided via ‘%base-services’. Perhaps you just need to have your
> service depend on it?
>
> > Switching to a concurrent design for Shepherd --- any concurrent design --- 
> > is probably best done sooner rather than later, because it risks strongly 
> > affecting customized `configuration.scm`s like mine that have almost a half 
> > dozen custom Shepherd daemons.
>
> I suspect the main issue here is undeclared dependencies of some of the
> Shepherd services you mention.
>
> I like the “sooner rather than later” bit, though: it sounds like you’re
> about to send patches or announce some sponsorship program?… :-)

Not particularly, but I *have* looked over Shepherd and here are some notes.  
Maybe I'll even send patches, but the reaction to the ZFS patches makes me just 
shrug; I'd need to devote more time than I spent on ZFS, and the ZFS patches 
aren't getting into Guix, so why bother.  If I get annoyed enough I'll just 
patch my own system and call it a day.  My own system has `nm-online` and I 
don't expect to be without networking, so the `nm-online` delay is unlikely to 
be an issue.  I don't intend to mess with the `configuration.scm` anymore 
because it's just too brittle; I'll just host VMs instead and use a SystemD 
fully-free OS like Trisquel.  I only need Guix for ZFS (which Trisquel does not 
have, for some reason).

Anyway...

It seems to me that a good design would be for each `<service>` to have its own 
process.  The big loop in `modules/shepherd.scm` then becomes "just" a 
completely event-based command dispatcher that forwards commands to the correct 
per-`<service>` process.

Now, one feature Shepherd has is that it can be started with `--socket-file=-`, 
which causes GNU Shepherd to enable GNU readline and use the readline library 
to read `herd`-like (?) commands from stdin.

Unfortunately the `readline` interface is inherently blocking instead of 
event-based.  The C interface of GNU readline has an alternative interface that 
is compatible with event-based operation (I've used it in the past to create a 
toy chat program that would display messages from other users while you were 
typing your own), but it looks like this interface is not exposed in Guile.  I 
checked `readline-port` as well, but the code I could find online suggests that 
it just uses the blocking `readline` interface, and would (?) be incompatible 
with the Guile `select`.  (Side note: the `SIGCHLD` problem could probably be 
fixed if Guile exposed `pselect`, but apparently it doesn't, and I'm not 
prepared to dedicate even more time to fiddling with the lack of syscalls in 
Guile.  Maybe a signal-via-pipe technique would work as an alternative, though, 
since that supposedly works on every UNIX-like system --- but presumably the 
Shepherd authors already knew that, so maybe there is a good reason not to use 
it.)
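
To sketch what I mean by signal-via-pipe (the classic "self-pipe trick") in 
Guile --- untested, and `event-loop-step` here is just a stand-in for whatever 
the real dispatcher does --- the handler does nothing but write one byte, and 
the event loop notices it through the same `select` it already uses for 
sockets:

    ;; Self-pipe trick, sketched: the SIGCHLD handler only pokes the pipe.
    (define %sig-pipe  (pipe))
    (define %sig-read  (car %sig-pipe))
    (define %sig-write (cdr %sig-pipe))
    (fcntl %sig-write F_SETFL
           (logior O_NONBLOCK (fcntl %sig-write F_GETFL)))

    (sigaction SIGCHLD
      (lambda (signum)
        (write-char #\! %sig-write)     ;just wake up the select below
        (force-output %sig-write)))

    (define (event-loop-step ports)
      ;; Step the event loop once: wait for PORTS or the signal pipe.
      (let ((readable (car (select (cons %sig-read ports) '() '()))))
        (when (memq %sig-read readable)
          (read-char %sig-read)         ;drain the wake-up byte
          ;; Signals coalesce, so reap until there are no more zombies.
          (let reap ()
            (let ((res (false-if-exception (waitpid WAIT_ANY WNOHANG))))
              (when (and (pair? res) (positive? (car res)))
                (reap)))))
        (delq %sig-read readable)))     ;caller dispatches the rest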

Since `readline` is blocking, one possibility would be to *fork off* the 
process that does the stdin communication.  Then it can continue to use the 
blocking `readline`.  It just needs to invoke `herd stop root` when it gets EOF 
(note: I still need to check how commands are sent; my unverified impression is 
that they are sent as plain text).

Since the goal is to make the mainloop into a very parallel dispatcher, we need 
some way to send commands to it in stdin mode.  We can take advantage of the 
little-known fact that UNIX domain sockets can pass file descriptors across 
processes, with the file descriptor number being remapped to the receiving 
process via magic kernel stuff.  So, we create a `socketpair` (NOTE: CHECK IF 
GUILE HAS `socketpair`!!!  Also review how the fd-passing gets done; maybe 
Guile doesn't expose the needed syscalls either, sigh), then each time the 
`readline` process receives a command, it creates a new `socketpair`, sends one 
end over to the mainloop, sends the command via the other end, then waits for a 
response and prints it.  This should make the experience very close to that of 
the blocking Shepherd.
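
To partly answer my own note above: core Guile does have `socketpair`.  What I 
could not find is a core binding for `sendmsg`/`recvmsg` with `SCM_RIGHTS`, so 
the actual fd-passing may need the FFI or a helper library --- treat that part 
as an open question.  The rest of the round-trip is plain Scheme; here is a toy 
version that sends the command as an s-expression instead of passing an fd:

    ;; Quick check that the plumbing exists.
    (define %pair (socketpair AF_UNIX SOCK_STREAM 0))

    (let ((pid (primitive-fork)))
      (if (zero? pid)
          (begin                           ;child: stands in for readline
            (close-port (car %pair))
            (write '(start transmission-daemon) (cdr %pair))
            (force-output (cdr %pair))
            (display (read (cdr %pair)))   ;print the mainloop's reply
            (newline)
            (primitive-exit 0))
          (begin                           ;parent: stands in for mainloop
            (close-port (cdr %pair))
            (let ((command (read (car %pair))))
              ;; ... dispatch COMMAND here, then reply:
              (write 'done (car %pair))
              (force-output (car %pair)))
            (waitpid pid))))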

If the above pattern is workable, ***we can use the same pattern for 
`--socket-file=/file/path`***.  We ***always*** fork off *some* process to 
handle `--socket-file`, whether `stdin`-mode or not.  In 
`--socket-file=/file/path` mode, the `socket-file` process binds the socket 
file, `listen`s on it on a loop, and then just passes the opened socket over to 
the mainloop.
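
Sketched out (with `pass-to-mainloop` standing in for whatever fd-passing 
mechanism we end up with), the `socket-file` process would be roughly:

    (define (run-socket-file-process path mainloop-port)
      ;; Bind and listen on PATH; hand each accepted connection over to
      ;; the mainloop, then go back to accepting.
      (let ((sock (socket AF_UNIX SOCK_STREAM 0)))
        (bind sock AF_UNIX path)
        (listen sock 16)
        (let loop ()
          (let ((connection (car (accept sock))))
            (pass-to-mainloop connection mainloop-port)
            (loop)))))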

We also need this pattern as a general feature of the mainloop.  An action on 
one `<service>` can trigger actions on another service (in theory; my cursory 
review of the Guix code suggests that current services only trigger actions on 
themselves --- `ganeti`, `guix-daemon` --- but this is not a full review and 
there may be other services in Guix that do other things); note in particular 
that `start` causes every `requirement` to be started anyway.  So I think we 
need a general mechanism for the mainloop to receive commands from remote 
processes; we might as well use the same mechanism for both the Shepherd<->user 
interaction and the Shepherd<->service interaction.

So, for clarity of exposition, let me list the processes created by Shepherd:

* The `mainloop` process which handles the main massively-parallel event loop.  
This is PID 1.
* The `socket-file` process which either gets commands from `stdin`, or via the 
`socket-file`.
* Multiple per-`<service>` processes, one per `<service>`.

Now, the mainloop has to parse the command in order to learn which 
per-`<service>` process the command should get forwarded to.  And as mentioned, 
each per-`<service>` process also needs a command-sending socket to go to the 
mainloop.  So for each per-`<service>` process:

* The mainloop maintains a mainloop->service socket to send commands over.
* The mainloop maintains a service->mainloop socket over which it receives 
command socket fds.

The mainloop process also special-cases the `root` service --- it handles those 
commands directly (BUT this probably needs a lot of fiddling with the data 
structures involved --- `herd status root` can now occur while a `herd start 
<whatever>` is still running, so we need status reporting for "being started 
up" and "being stopped" as well --- `/` for "starting up", `\` for "stopping"?).

Now, the `action` and other procedures need to be special-cased.  We need a 
global variable indicating:

* The current process is `root`, i.e. the mainloop process.
* The current process is some `<service>`.

Every `action` behaves differently depending on this variable 
(`%process-identity`?); a sketch follows the list below.

* If the action is going to the same `<service>` (including `root`):
  * Just tail-call into the action code.
* If the current process is `root` and a non-`root` action is being performed:
  * Check if the per-`<service>` process has been started, and start if needed.
  * Schedule the command to be sent via the event loop.
  * Keep operating the mainloop until the command has completed.
    * Use an event-loop stepper function (i.e. one that just calls `select`, 
dispatches appropriately, then returns, so the caller has to implement the 
loop).
    * Initially set a mutable variable to `#f`.
    * Schedule the command with a callback that sets the above variable to `#t`.
    * Call the event-loop stepper function until the mutable variable is true.
    * This implements the current semantics where a `.conf` file running an 
action will block until the action finishes, while still allowing commands to 
be sent to the Shepherd daemon.
* If the current process is not `root` and the action to be performed is for a 
different `<service>`:
  * Create a socketpair and pass one end over to the mainloop (blocking).
  * Send the command over the other end (blocking).
  * Wait for completion (blocking).
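
Putting the three cases together, and leaning on hypothetical helpers 
(`local-action`, `send-command`, `pass-to-mainloop`, `%mainloop-ports` --- none 
of these exist in Shepherd today), the dispatch might look like:

    ;; %PROCESS-IDENTITY is 'root in the mainloop, and is rebound to the
    ;; service's name in each per-<service> process after the fork.
    (define %process-identity 'root)

    (define (perform-action service action . args)
      (cond
       ;; Same process (including root-on-root): tail-call the action.
       ((eq? service %process-identity)
        (apply local-action service action args))
       ;; Mainloop delegating to a per-<service> process: keep stepping
       ;; the event loop until the remote command completes.
       ((eq? %process-identity 'root)
        (let ((result #f) (done? #f))
          (send-command service action args
                        (lambda (r)
                          (set! result r)
                          (set! done? #t)))
          (let loop ()
            (unless done?
              (event-loop-step %mainloop-ports)
              (loop)))
          result))
       ;; Per-<service> process calling out: fresh socketpair, block.
       (else
        (let ((pair (socketpair AF_UNIX SOCK_STREAM 0)))
          (pass-to-mainloop (cdr pair))
          (write (list service action args) (car pair))
          (force-output (car pair))
          (read (car pair))))))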

Each per-`<service>` process has a simple blocking loop: it waits for commands 
from the mainloop process, executes them, then loops again.
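
That loop is short enough to sketch in full; `run-local-action` is again a 
hypothetical helper that runs the action code in-process, and commands are 
assumed to arrive as `(action . args)` s-expressions:

    (define (service-process-loop mainloop-port)
      ;; Wait for a command, run it, reply, repeat.
      (let loop ()
        (let ((msg (read mainloop-port)))
          (if (eof-object? msg)
              (primitive-exit 0)        ;mainloop went away: exit quietly
              (begin
                (write (apply run-local-action (car msg) (cdr msg))
                       mainloop-port)
                (force-output mainloop-port)
                (loop))))))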

In particular, this means that any `start` actions in the `.conf` file will 
block (which is the expected behavior of legacy `.conf` files), but even so, 
the Shepherd will be able to handle commands even while it is still loading the 
`.conf`.


=== Concurrency is Like Regexps, Now You Have Two Problems ===

But take note that this means it is possible to deadlock services in the 
following way:

* There are two services `A` and `B`.
  * `A` has an action `AtoB` which invokes an action `BtoA` on service `B`.
  * The `B` `BtoA` action invokes an action `Anoop` on service `A`.
    * In the above structure, because the `A` per-`<service>` process is 
waiting on the `BtoA` action on service `B`, it cannot handle the `Anoop` 
action!
    * In the current single-threaded Shepherd, such a structure is actually 
possible and does not deadlock, provided the `A` `Anoop` action terminates 
normally.

HOWEVER, this is probably a very unusual setup!  It may be tolerable to simply 
require that services which perform actions on other services form an acyclic 
relationship.  Cyclic action invocation is Turing-complete --- consider the 
case where the `B` `BtoA` action reinvokes the `A` `AtoB` action with the same 
arguments, causing it to invoke the `B` `BtoA` action again ad infinitum --- 
whereas an acyclic-relationship requirement provably terminates.  It may even 
be possible to detect violations dynamically by passing a "stack" (i.e. the 
list of service names that caused a particular action of a particular service 
to be invoked) along with each command: the `root` service always passes an 
empty stack, and each `<service>`-to-`<service>` action prepends the name of 
the calling service, so the mainloop can detect a dynamic cycle and just fail 
the command without forwarding it.
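
The dynamic-cycle check itself would be cheap; in the mainloop it could be as 
simple as this (with `send-to-service` as the hypothetical forwarding step):

    (define (forward-command service action args stack)
      ;; STACK is the list of services whose actions led to this command,
      ;; innermost caller first; `root` always forwards an empty stack.
      (if (memq service stack)
          (error "cycle detected in service actions:" (cons service stack))
          (send-to-service service action args (cons service stack))))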

IF this restriction is too onerous, then it may be possible to use an event 
loop in the per-`<service>` processes as well, reusing the same 
wait-for-events-while-blocking logic as the mainloop --- the code might even be 
shareable.  BUT I worry about this bit: it could mean that an action is invoked 
in the dynamic context (including fluids, sigh) of another on-going, completely 
unrelated action, which is BAAAAAD.  This is fine for the `root` service, since 
the `root` service is Shepherd itself (? I think) and we can ensure that the 
Shepherd code does not depend on dynamic context.


Another issue: in the current single-threaded Shepherd, any service's action 
can (re-)register a new `<service>`.  This is problematic since a registration 
done in a per-`<service>` process obviously cannot affect the mainloop process 
post-fork.

Again, this is probably a very unusual setup.  While such a thing would be 
cute, I think it would be tolerable to require that the mainloop process (which 
is what loads the `.conf` file) be the only one allowed to (re-)register 
`<service>`s.


=== Taking Advantage of Concurrency ===

Now that each `<service>` gets its own process, we can add a new 
`force-destroy-service` action on `root`.

    herd force-destroy-service root <some-service>

This forces the per-`<service>` process, and that of every dependent 
`<service>`, as well as all processes in the process trees of the affected 
per-`<service>` processes, to be force-killed (IMPORTANT: check how to get the 
process tree; also look into "process groups" --- maybe that would work better, 
but does it exist on Hurd?).
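
On the POSIX side, process groups look like the right tool, and Guile does 
expose `setpgid` and `kill`; as far as I know process groups exist on the Hurd 
too.  Assuming each per-`<service>` process calls `(setpgid 0 0)` right after 
the fork, the destroy step reduces to the sketch below (note that daemons which 
call `setsid` would still escape the group, so this is not airtight):

    ;; In the child, right after the fork:
    (setpgid 0 0)                  ;become leader of a fresh process group

    ;; In the mainloop, to destroy a service and its process tree:
    (define (force-destroy-service-process pid)
      (kill (- pid) SIGKILL))      ;a negative pid signals the whole group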

This new command helps mitigate the issue where Shepherd `start` actions are 
Turing-complete and can potentially contain infinite-loop bugs.  If this 
happens, the sysadmin can `herd force-destroy-service root <some-service>` the 
misbehaving service, reconfigure the system to correct the issue, and restart 
it.


Another issue is that the current Guix startup is like this (paraphrased):

    (for-each (lambda (service) (start service)) '(guix-daemon #;...))

Now consider this in the context of the `nm-online` example above.  The reason 
why Guix+Shepherd cannot just ***always*** use `nm-online` like SystemD does is 
that `nm-online` can delay boot for an arbitrary number of seconds.  And since 
`start` is a known-blocking legacy interface, it should remain a known-blocking 
interface (backward compatibility is a bitch).

This means that for the Guix startup, we should expose a new Shepherd API:

    (start-multiple '(guix-daemon #;...))

This basically does the following (see the sketch after the list):

* Set a mutable variable to the length of the input list.
* For each service, if the per-`<service>` process isn't started yet, start it.
* For each service, schedule its `start`; have the callback decrement the above 
variable (and maybe also push any service that fails to start onto a list).
* In a loop:
  * If the above mutable variable is zero, exit.
  * Otherwise, call the mainloop stepper function, then loop again.
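
Using the same hypothetical helpers as in the earlier sketches 
(`ensure-service-process`, `send-command`, `event-loop-step`, 
`%mainloop-ports`), that is roughly:

    (define (start-multiple services)
      (let ((pending (length services))
            (failed  '()))
        (for-each
         (lambda (service)
           (ensure-service-process service)   ;fork the process if needed
           (send-command service 'start '()
                         (lambda (ok?)
                           (unless ok?
                             (set! failed (cons service failed)))
                           (set! pending (- pending 1)))))
         services)
        ;; Keep stepping the event loop until every start has reported.
        (let loop ()
          (unless (zero? pending)
            (event-loop-step %mainloop-ports)
            (loop)))
        failed))                    ;list of services that failed to start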

This allows multiple `start` actions to be executed in parallel (each could 
trigger `start` on its requirements again, but if a service is already started 
then its `start` action should do nothing).  There may be a thundering-herd 
effect on the mainloop, though, because of the hammering of `start` on shared 
requirements.  Hmm.  Concurrency is hard.

Then Guix also has to be modified to use `start-multiple` instead.

This further step also ensures that infinite-loop problems in custom service 
definitions do not delay startup --- startup is fully parallel (modulo 
thundering herds of daemons).


=== Overall ===

It seems to me that with the above redesign, there would be very little code 
left of the original Shepherd, making this more of a reimplementation than a 
patch or even a fork.


Thanks
raid5atemyhomework


