bug#71238: Installer image consistently fails to run system init due to TLS error
From: adanskana
Subject: bug#71238: Installer image consistently fails to run system init due to TLS error
Date: Tue, 28 May 2024 05:36:09 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.15.0
Hi Richard
On 5/28/24 4:44 AM, Richard Sent <richard@freakingpenguin.com> wrote:
Richard Sent <richard@freakingpenguin.com> writes:
> What the heck is going on here? Those two images are wildly different
> and are downloading wildly different sets of substitutes.
Bad news. I connected my device to a different network with just an
ordinary consumer router and the installation succeeded (using the guix
00384aed media). Ordinarily my devices are behind an opnsense router with a
/very/ lightly-customized firewall. To me, this means there are three
possibilities, none of which is particularly comforting:
1. There was a transient network issue for ~3 hours when I attempted to
install Guix ~4 times using different installation media that caused a
specific TLS handshake to fail.
2. A specific TLS handshake Guix undertakes during the installation
process fails to pass one of the built-in firewall rules shipped with
opnsense.
3. Some other odd aspect of my network messes things up for a specific
TLS handshake.
My money is on 2 given how this is a seemingly common issue on
enterprise networks [1] and the rules I have added seem irrelevant. (I'd
rather not talk openly about my firewall rules in an archived public
forum, but can discuss off-list). However, there is another comment in
that thread that says IT didn't notice any firewall blocking.
>> Sometimes, usually when I'm on an enterprise network like my
>> university's or library's wifi, the `guix substitute` process dies
>> with a "TLS error in procedure 'write_to_session_record_port': Error
>> in the push function" error message. My connection is rock-solid
>> otherwise, and sometimes it doesn't happen at all.
I was actually going to reopen this issue, as I'm still encountering this bug in the exact same scenarios. Nothing has changed at all.
> I get the same error on guix pull almost always when I am on my
> enterprise network. Re-running guix pull a second time also almost
> always then runs fine. I checked with our IT: nothing suspicious on
> the network, i.e. no firewall blocking.
Running Guix pull to work around the problem is great...... unless
you're trying to install Guix via the guided installer! :) In my case it
also wasn't guix pull that was failing.
I want to emphasize that the error occurred in the same phase of the
installer every time, it was not the first handshake, no other machine
has ever had this issue, and the installer was (3/4 times) on a commit
that should include the fix described in [1].
I'm happy to assist with debugging this, although I'm not some TLS
networking genius so trying to solve it outright is probably beyond me.
I'd also LOVE to hear if other people using a largely stock opnsense or
other firewall software encountered this issue, particularly with the
installation media.
Same, I'm happy to assist. The test that Ludo' provided to try and reproduce the bug doesn't work as referenced in previous emails. Is there some way I can attach a debugger to a guile process running `guix upgrade` or something like that?
At some point I'll attempt to gradually "de-enterprise" parts of my
network and see exactly when (if ever) the problem is resolved. Due to
the nature of the problem, reliably reproducing it in the future will be
a challenge.
CC'ing people involved in [1] because this is just so weird and I don't
want it to be consigned to the dustbins of history. I don't think we
heard anyone with the issue explicitly say the fix resolved or at least
mitigated the problem.
Thanks for CC'ing me. Yes, the problem was never resolved. For someone just upgrading their system, it's annoying, but can be mitigated pretty easily. For someone trying to install Guix, on the other hand, this is an intensely annoying problem. After my exams are finished in a couple of weeks I want to try and fix this problem and also upgrade GRUB to fix issues with it not properly recognising ext4 partitions that have certain features enabled.
[1]: https://lists.gnu.org/archive/html/guix-devel/2024-03/msg00150.html
Anyway, please let me know how I can help. If someone could help me attach some sort of debugger, I can reproduce the error fairly easily on my uni's wifi if I do a `guix gc -d 2w && guix upgrade && sudo guix system reconfigure config.scm`. The sheer number of substitutes downloaded seems to be enough for it to happen at least once.
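
In case it helps anyone else trying to catch this, here is a minimal sketch of a wrapper that reruns a command, saves each run's output to a log file, and stops at the first run whose output mentions the TLS failure. The marker string comes from the error message quoted earlier in this thread; the function names, the log file prefix, and the idea of matching on a substring are my own assumptions, not anything Guix provides.

```python
import subprocess
import sys

# Substring taken from the error quoted in this thread. Matching on it is an
# assumption; the exact wording may differ between Guix versions.
TLS_ERROR_MARKER = "write_to_session_record_port"


def contains_tls_error(output: str) -> bool:
    """Return True if the captured output mentions the TLS failure."""
    return TLS_ERROR_MARKER in output


def run_until_tls_error(command, max_attempts=5, log_prefix="guix-tls"):
    """Run `command` up to `max_attempts` times, logging each run.

    `command` is an argument list such as ["guix", "upgrade"].
    Returns the 1-based attempt number on which the marker appeared,
    or None if it never appeared. `log_prefix` is a hypothetical name
    for the per-attempt log files written to the current directory.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep stdout and stderr interleaved
            text=True,
        )
        with open(f"{log_prefix}-{attempt}.log", "w", encoding="utf-8") as log:
            log.write(result.stdout)
        if contains_tls_error(result.stdout):
            return attempt
    return None
```

Calling `run_until_tls_error(["guix", "upgrade"])` after a `guix gc -d 2w` would leave one log per attempt, which might be easier to share than a terminal scrollback. It does not attach a debugger; it only records when the failure shows up.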
Warmly,
Ada