bug#35181: Hydra offloads often get stuck while exporting build requisites


From: Ludovic Courtès
Subject: bug#35181: Hydra offloads often get stuck while exporting build requisites
Date: Tue, 09 Apr 2019 12:54:20 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Hi Mark,

Mark H Weaver <address@hidden> writes:

> Ludovic Courtès <address@hidden> writes:
>
>> Mark H Weaver <address@hidden> writes:
>>
>>> The source checkout currently being transferred for build 3432472
>>> (/gnu/store/…-font-google-material-design-icons-3.0.1-checkout) is 176
>>> megabytes uncompressed, as measured by "du -s --si", which is not
>>> precisely the same as the NAR size, but hopefully close enough for a rough
>>> estimate.  As I write this, build 3432472 has been stuck here for 24 hours
>>> 15 minutes.  Even if the average transfer rate were 4 kilobytes per
>>> second, it should have been done in half that time.
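For reference, the estimate quoted above can be checked with a few lines of
Python.  This is only a rough sketch: it assumes SI units (as "du --si"
reports them) and treats the 4 kilobytes per second figure as a hypothetical
average rate, not a measured one.

```python
# Rough check of the quoted estimate; all sizes are SI (powers of 1000).
size_bytes = 176 * 1000**2      # 176 MB, as measured by "du -s --si"
rate_bytes_per_s = 4 * 1000     # hypothetical average rate of 4 kB/s
stuck_hours = 24 + 15 / 60      # build stuck for 24 hours 15 minutes

# Time the transfer would need at that rate, in hours.
transfer_hours = size_bytes / rate_bytes_per_s / 3600

print(f"transfer at 4 kB/s: {transfer_hours:.1f} h")
print(f"half of the time spent stuck: {stuck_hours / 2:.1f} h")
```

At that rate the transfer would need about half of the time the build has
been stuck, which is consistent with the estimate above.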
>>
>> This is weird, could it be that data transfers get stuck somehow?
>
> As far as I can tell, that's what seems to happen.
>
>> Did you try to check the status of the ‘nix-store’ and ‘guix offload’
>> processes on the head node?
>
> Here are the corresponding 'guix offload' processes:
>
> address@hidden:~$ ps auxwwf | head -1; ps auxwwf | egrep -B1 'off()load'

[...]

> root     14769  0.0  0.2 145668 10912 ?        SLsl Apr07   0:16  |       |   \_ /gnu/store/yihvhxv3xyyvl1m2cy1lnf1lyi9h76fk-guile-2.2.2/bin/guile --no-auto-compile /gnu/store/fkkjhida23k612naa9d4q6avqj5v3b28-guix-0.13.0-8.357ab93/bin/.guix-real offload x86_64-linux 3600 1 72000

The problem is that this is an ancient Guix.  In the meantime,
offloading has seen relevant changes, in particular commits such as
ed7b44370f71126087eb953f36aad8dc4c44109f, which addresses stability issues
with the Guile-SSH module (ssh dist node) that was previously used.

I think we should upgrade Guix on hydra.gnu.org; otherwise we’re likely
to end up chasing old bugs.

> The 'nix-store' processes seem to be stuck sleeping in 'read', if I'm
> interpreting the 'strace' output correctly:
>
> address@hidden:~# strace -p 8983
> Process 8983 attached - interrupt to quit
> read(3, ^C <unfinished ...>
> Process 8983 detached
> address@hidden:~# strace -p 14767
> Process 14767 attached - interrupt to quit
> read(3, ^C <unfinished ...>
> Process 14767 detached
>
>
> "netstat --inet --program" shows that the SSH connections are still
> open:
>
> address@hidden:~# netstat --inet --program | grep 'hydra\.net\.in\.tum\.'
> tcp        0      0 20121227-hydra.gn:53216 hydra.net.in.tum.de:ssh ESTABLISHED 14769/guile
> tcp        0      0 20121227-hydra.gn:52434 hydra.net.in.tum.de:ssh ESTABLISHED 8985/guile
> tcp        0      0 20121227-hydra.gnu.:www hydra.net.in.tum.:52104 TIME_WAIT   -
> tcp        0      0 20121227-hydra.gnu.:www hydra.net.in.tum.:52103 TIME_WAIT   -

This could be the kind of issue that we had with (ssh dist node).  It’s
hard to tell.

> I could easily believe that this problem is specific to
> hydra.gnunet.org, but even if that's the case, it would be good if
> offloading would reliably time out before days have passed.

That’s the case with commit a708de151c255712071e42e5c8284756b51768cd,
but again, the Guix installation on hydra may predate that.  :-/
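
To illustrate the general idea of a read that gives up instead of sleeping
forever, here is a minimal Python sketch.  It is not the code from that
commit, and it does not use Guile-SSH: the helper name read_with_timeout and
the local socket pair standing in for the SSH connection are assumptions
made purely for illustration.

```python
import socket

def read_with_timeout(sock, nbytes, timeout_s):
    """Return up to nbytes from sock, or None if nothing arrives in time."""
    sock.settimeout(timeout_s)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # The peer stayed silent for timeout_s seconds; report it to the
        # caller instead of blocking indefinitely in read().
        return None

# A local socket pair stands in for the connection to the build machine.
local, remote = socket.socketpair()
remote.sendall(b"nar-chunk")

first = read_with_timeout(local, 1024, 0.2)   # data is available
second = read_with_timeout(local, 1024, 0.2)  # peer is silent: times out

local.close()
remote.close()
```

With a bound like this, a stalled transfer shows up as an error the caller
can handle (retry, or mark the build as failed) rather than as a process
sleeping in 'read' for days.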

Thanks,
Ludo’.




