bug#41625: Sporadic guix-offload crashes due to EOF errors
From: Maxim Cournoyer
Subject: bug#41625: Sporadic guix-offload crashes due to EOF errors
Date: Mon, 24 May 2021 01:33:21 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Hi,
Ludovic Courtès <ludo@gnu.org> writes:
> Hi,
>
> Marius Bakke <marius@gnu.org> skribis:
>
>> Marius Bakke <marius@gnu.org> writes:
>>
>>> 'guix offload test' passes without problems.
>>
>> Not so fast, running it in a loop reveals the crash.
>>
>> There is a trace file in /root/offloadtest.trace on Berlin with such an
>> occurrence. It looks like a timeout is reached shortly before the EOF
>> error:
>>
>> 10139 poll([{fd=14, events=POLLIN|POLLOUT}], 1, 0) = 1 ([{fd=14,
>> revents=POLLOUT}])
>> 10139 poll([{fd=14, events=POLLIN}], 1, 15000) = 0 (Timeout)
>> 10139 write(2, "Backtrace:\n", 11) = 11
>>
>> This seems to be from a different node than the one reported previously,
>> as the preceding connect() was to this machine:
>>
>> 10139 connect(44, {sa_family=AF_INET, sin_port=htons(22),
>> sin_addr=inet_addr("141.80.167.186")}, 16) = -1 EINPROGRESS
>> (Operation now in progress)
>
> So it looks like ‘connect’ fails and eventually we get an EOF object.
> However, I don’t see where that EOF comes from because the return value
> of ‘connect!’ (the Guile-SSH procedure) is properly checked.
>
> Ludo’.
I got a slightly different backtrace, which suggests that establishing
the connection is not at fault; rather, the error occurs during the
read-repl-response call:
--8<---------------cut here---------------start------------->8---
guix offload: testing 1 build machines defined in '/etc/guix/machines.scm'...
Backtrace:
8 (primitive-load "/home/maxim/.config/guix/current/bin/guix")
In guix/ui.scm:
2165:12 7 (run-guix-command _ . _)
In ice-9/boot-9.scm:
1752:10 6 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
1747:15 5 (with-exception-handler #<procedure 7f2caf885780 at
ice-9/boot-9.scm:1831:7 (exn)> _ # _ # …)
In guix/scripts/offload.scm:
704:21 4 (check-machine-availability _ _)
In srfi/srfi-1.scm:
586:17 3 (map1 (#<session maxim@overdrive1.guix.gnu.org:52522 (connected)
7f2cae396fc0>))
In guix/inferior.scm:
258:2 2 (port->inferior _ _)
240:2 1 (read-repl-response _ _)
In ice-9/boot-9.scm:
1685:16 0 (raise-exception _ #:continuable? _)
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Throw to key `match-error' with args `("match" "no matching pattern" #<eof>)'.
--8<---------------cut here---------------end--------------->8---
I seem to get this more often than not with the overdrive1 offload
machine.
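The `match-error' with `#<eof>' as its argument is what one would expect
if the reply read from the remote end is dispatched on its shape and
there is no case for end-of-file. Below is a minimal Python sketch of
that failure mode and of one way to handle it. It is only an analogy,
not the Guix code: the line-based protocol and the names read_response,
check_machine and EndOfStream are all hypothetical.

```python
import io


class EndOfStream(Exception):
    """Raised when the peer closes the stream before sending a reply."""


def read_response(port):
    """Read one reply line from PORT and dispatch on its leading tag.

    Hypothetical stand-in for a REPL protocol reader; the wire format
    ("values ..." / "exception ...") is an assumption for illustration.
    """
    line = port.readline()
    if line == "":
        # readline() returns "" only at end of file.  Without this
        # explicit case, EOF would fall through to the generic
        # "unexpected reply" error below, much like an unmatched pattern.
        raise EndOfStream("peer closed the connection before replying")
    tag, _, payload = line.rstrip("\n").partition(" ")
    if tag == "values":
        return payload.split()
    if tag == "exception":
        raise RuntimeError(payload)
    raise ValueError(f"unexpected reply: {line!r}")


def check_machine(port):
    """Return True if PORT yields a reply, False if it hit EOF first."""
    try:
        read_response(port)
        return True
    except EndOfStream:
        return False
```

With a dedicated end-of-file case, the caller can report an unresponsive
machine and carry on instead of aborting with a backtrace.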
Maxim