guix-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Latest guile-daemon changes and bewilderment


From: Caleb Ristvedt
Subject: Re: Latest guile-daemon changes and bewilderment
Date: Fri, 28 Jul 2017 06:19:33 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)

> Is there a line above or below the backtrace mentioning the uncaught
> exception?  Could you ‘strace -f’ the daemon process?

No, no line above or below. Very strange.

> BTW, I see the code uses ‘clone’ directly.  It would be safer to use
> ‘call-with-container’, which already handles bind mounts, non-local
> exits, and so on.  Would it be an option?

There are a couple of issues with using call-with-container. In
decreasing order of perceived difficulty to solve:

1. Copying the output from the build to the real store. How would that
work? It wouldn't work with call-with-container - the container can't
access the outside world, and the chroot directory is automatically
deleted once the scope of the container is left. On top of that, there's
no guarantee that anything inside the chroot directory is visible
outside of the container namespace, since it's on a separate filesystem
mounted in that separate namespace.

The primary solution I have in mind for this is to add a separate
procedure argument to call-with-container (maybe "use-output"?) to be
run after the main thunk has finished running but before the temporary
directory has been deleted and which takes the chroot directory as its
only argument. Also to change the mounting of a tmpfs on the chroot dir
to happen before the clone, and explicitly unmount it after running the
aforementioned use-output thunk.

2. Some MS_PRIVATE stuff. Cleaning up will fail when whatever filesystem
the chroot directory is on is mounted as MS_SHARED, which according to a
comment in build.cc is what systemd mounts stuff as. (I'm curious why we
don't seem to have run into this issue yet, perhaps I have misunderstood
something)

My solution here is to change the propagation type of the chroot
directory inside the namespace to MS_PRIVATE. Anything mounted under it
will inherit that propagation type and not appear mounted in the
original namespace, and unmounting the chroot directory should work
fine.

3. Miscellaneous order changes. I don't know enough about the inner
workings of linux to be completely sure to what extent the order of some
operations is significant.

4. Minor differences. For example, the C++ daemon doesn't currently
bind-mount /dev/ptmx or /dev/fuse or /dev/console. I don't think those
would be a problem, but I dunno.

... and then I paused writing this for 2 days while I checked whether my
in-theory solutions would work in practice. And it seems like they
actually do (see recent branch update). Mostly. I need to figure out why
it fails when a new user namespace is created - for some reason
pivot-root fails when new-root was mounted from a different user
namespace.

But on the bright side, it somehow solved the bug I described earlier. I
still haven't managed to make it all the way to building hello (a
libunistring test hangs), but it's getting much farther.


> Regarding scanning, (guix build grafts) contains a special-purpose
> reference scanner that Mark carefully optimized; it might be worth
> looking at.

Huh. I did not know that. In hindsight, it seems obvious that there must
have been something like that for grafts to function. I'll look into
that.

>> You'll notice that among the environment variables is
>> GUILE_AUTO_COMPILE=0. This is actually something I added myself because
>> for some reason the bootstrap guile wasn't honoring the
>> --no-auto-compile flag, but does honor the environment
>> variable. Strange.
>
> Yeah, we shouldn’t add this environment variable to derivations because
> that would influence everything that goes in there.

Aye, it was mostly just for debugging. That problem has also disappeared
with the switch to using call-with-container. It's nice and all, but I
do wish I knew what caused it.

> Could it be that ‘argv’ lacked the executable name as argv[0], and thus
> the argument list was shifted to the left?

That's what I thought too, but the same behavior happened when adding
the executable name explicitly as the first argument both to system* and
execl.

> Let’s maybe try to further debug this interactively on IRC.

... and then I promptly fell asleep and spent the next few days
(nights?) tinkering. Oops.

Oh well, there will be no shortage of debugging to be done. It seems
like that's going to be the pattern for the near future - add an
isolation mechanism or something that conforms better to what is
currently done, try to build stuff, get a bit farther, look for more
stuff to fix.

- reepca



reply via email to

[Prev in Thread] Current Thread [Next in Thread]