[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1'
From: |
Ludovic Courtès |
Subject: |
bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out) |
Date: |
Fri, 10 Dec 2021 11:17:30 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) |
Hi Maxim!
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
[...]
> Oh! Marius found this interesting issue [0] that they shared on IRC
> today; I wonder if it may be related. sd-bus apparently pipeline things
> aggressively, which is not always handled well by other D-bus
> participants.
>
> [0] https://github.com/systemd/systemd/issues/16610
Interesting.
>> Anyway, the “Hello” message is sent to the system bus asynchronously in
>> ‘sd-bus.c’:
[...]
>> static int hello_callback(sd_bus_message *reply, void *userdata,
>> sd_bus_error *error) {
>>
>> [...]
>>
>> fail:
>> /* When Hello() failed, let's propagate this in two ways: first we
>> return the error immediately here,
>> * which is the propagated up towards the event loop. Let's also
>> invalidate the connection, so that
>> * if the user then calls back into us again we won't wait any
>> longer. */
>>
>> bus_set_state(bus, BUS_CLOSING);
>> return r;
>> }
>>
>>
>> It’s not clear from that whether the authors intended for the thing to
>> keep going in case of failure. In our case it’s not helpful.
>
> If we picture this in the systemd use case, I believe sd-bus must be
> used as *the* d-bus daemon, perhaps? So it should never die, and expect
> users to call back into it to retry things? In our case, it acts as a
> D-Bus client, not a server (IIUC), so perhaps its behavior is not tuned
> right for our use case.
I think systemd-logind is a separate process, just like elogind, that
talks over D-Bus; I don’t think there’s any difference here. It just
seems that this corner case, where we don’t get a reply for ‘Hello’, is
not correctly handled.
> Interesting that you mention this; that's what I worked on yesterday
> (see attached patches). I managed to get elogind panicking the kernel
> during a guix system reconfigure, which corrupted GRUB, so had to chroot
> and reconfigure from there [1]. Not sure what happened, but it seems
> that killing and restarting elogind is susceptible to cause hard locks.
Ouch! This is weird. Note that “everything” depends on elogind, as can
be seen with ‘guix system shepherd-graph’. So “herd stop elogind” is
not something you should try at home.
Incidentally, can you reproduce the bug in a VM? That would be much
nicer.
>> We know that elogind is started after dbus-daemon has written its PID
>> file (there’s a Shepherd service dependency). Is there a possibility
>> that dbus-daemon writes its PID file before it has set
>> ‘new_connection_function’?
>
> Are PID files conventionally agreed to be synchronization primitives?
Yes. Daemons are supposed to write their PID file once they’re ready to
do their job.
> I'm sorry for my lack of deep analysis here, I've spent most of my
> evening attempting to fix my system just to boot ^^'. I've at least
> managed to collect the following verbose D-Bus log (54 MiB uncompressed)
> which hopefully can shed some light onto how this failure came to be.
Excerpt:
--8<---------------cut here---------------start------------->8---
366: 0x7f28f396e740: 1639108961.938358
[dbus-sysdeps-util-unix.c(237):_dbus_write_pid_to_file_and_pipe] writing pid
file /var/run/dbus/pid
366: 0x7f28f396e740: 1639108961.938438
[dbus-sysdeps-util-unix.c(291):_dbus_write_pid_to_file_and_pipe] No pid pipe to
write to
366: 0x7f28f396e740: 1639108961.938474
[dbus-userdb.c(177):_dbus_user_database_lookup] Using cache for UID 986
information
366: 0x7f28f396e740: 1639108961.938665
[dbus-sysdeps-unix.c(3514):_dbus_socketpair] full-duplex pipe 6 <-> 7
366: 0x7f28f396e740: 1639108961.938700 [main.c(719):main] We are on D-Bus...
366: 0x7f28f396e740: 1639108961.938725 [dbus-mainloop.c(884):_dbus_loop_run]
Running main loop, depth 0 -> 1
366: 0x7f28f396e740: 1639108962.566557
[dbus-server-socket.c(182):socket_handle_watch] Handling client connection,
flags 0x1
366: 0x7f28f396e740: 1639108962.566623 [dbus-sysdeps-unix.c(2344):_dbus_accept]
client fd 8 accepted
366: 0x7f28f396e740: 1639108962.566665
[dbus-server-socket.c(94):handle_new_client_fd_and_unlock] Creating new client
connection with fd 8
366: 0x7f28f396e740: 1639108962.566779
[dbus-connection.c(1360):_dbus_connection_new_for_transport] LOCK
366: 0x7f28f396e740: 1639108962.566824
[dbus-transport-socket.c(180):check_read_watch] fd = 8
366: 0x7f28f396e740: 1639108962.566869
[dbus-transport-socket.c(227):check_read_watch] setting read watch enabled = 1
[...]
366: 0x7f28f396e740: 1639108962.568765
[dbus-transport-socket.c(974):socket_handle_watch] handling read watch
0x23c2b00 flags = 9
366: 0x7f28f396e740: 1639108962.568807
[dbus-transport-socket.c(348):exchange_credentials] exchange_credentials:
do_reading = 1, do_writing = 0
366: 0x7f28f396e740: 1639108962.568862
[dbus-transport-socket.c(383):exchange_credentials] Failed to read credentials
Failed to read credentials byte (zero-length read)
366: 0x7f28f396e740: 1639108962.568904
[dbus-transport.c(510):_dbus_transport_disconnect] start
366: 0x7f28f396e740: 1639108962.568945
[dbus-transport-socket.c(1041):socket_disconnect]
366: 0x7f28f396e740: 1639108962.568986
[dbus-transport-socket.c(76):free_watches] start
366: 0x7f28f396e740: 1639108962.569029
[dbus-watch.c(423):_dbus_watch_list_remove_watch] Removing watch on fd 8
--8<---------------cut here---------------end--------------->8---
We see this sequence:
1. Writes PID file.
2. Accepts incoming connection.
3. “Failed to read credentials”.
4. Close connection.
So we need to understand #3. Perhaps going back to strace logs could
help this time…
Thanks,
Ludo’.
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Maxim Cournoyer, 2021/12/07
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Maxim Cournoyer, 2021/12/07
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Ludovic Courtès, 2021/12/08
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Maxim Cournoyer, 2021/12/09
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Ludovic Courtès, 2021/12/09
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Maxim Cournoyer, 2021/12/10
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out),
Ludovic Courtès <=
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Josselin Poiret, 2021/12/16
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Timothy Sample, 2021/12/21
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Leo Famulari, 2021/12/21
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Timothy Sample, 2021/12/21
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Maxime Devos, 2021/12/21
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Leo Famulari, 2021/12/21
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Maxime Devos, 2021/12/21
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Leo Famulari, 2021/12/21
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Leo Famulari, 2021/12/21
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Leo Famulari, 2021/12/21