bug-guix

bug#65787: time-machine is doing too much network requests


From: Simon Tournier
Subject: bug#65787: time-machine is doing too much network requests
Date: Wed, 06 Sep 2023 18:26:18 +0200

Hi,

Well, I am on a quest to make Guix more robust in the worst-case
scenario: Savannah is unreachable (as well as other servers).  For
context, this matters when using Guix to reproduce scientific
results.  For an example of such a worst case, see [1].  Well, this quote:

        The first annoyance is that guix time-machine needs an access to the
        server git.savannah.gnu.org, although the Git repository is already
        cloned and already contains the required commit. 

is almost tackled by #65352; at least tracked. :-)

While investigating that, I noticed that the current design is suboptimal,
IMHO.  I am reporting it here and I hope to improve the situation by
reducing the number of network requests.

It matters in the worst-case scenario of scientific production.  And it also
matters for people with a poor or unstable network link.

Sorry if the report is hard to follow, I did my best to be clear.

To keep the discussion simple, I only consider the Git reference
specifications ’branch’ and ’tag-or-commit’.  The Git reference
specifications used by various internal procedures are poorly
documented.  See the docstring of the procedure ’update-cached-checkout’
from (guix git) for an idea, or the implementation of ’resolve-reference’
for the complete list.

Let us consider only these Git reference specifications:

    (branch        . "string")
    (tag-or-commit . "string")

because those are what “guix time-machine” sets from the CLI or reads
from channels.scm files, IIUC.

The command “guix time-machine” starts by calling ’cached-channel-instance’,
passing the procedure ’validate-guix-channel’ as an argument.

This procedure ’cached-channel-instance’ starts by collecting all the
commits for each channel.  It maps the channels list using the procedure
’channel-full-commit’.  And that procedure calls
’update-cached-checkout’. (1)

Then, ’cached-channel-instance’ calls ’validate-guix-channel’.  And this
procedure also calls ’update-cached-checkout’. (2)

Then, ’cached-channel-instance’ calls ’latest-channel-instances’ which
calls ’latest-channel-instance’.  And guess what, this procedure also
calls ’update-cached-checkout’. (3)
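
To make the three call sites concrete, here is a toy model of the call
chain described above.  It is only a sketch and not Guix code: the
Python names mirror the Scheme procedures mentioned in this report, the
bodies are placeholders, and the channel list with a single "guix" entry
is an assumption for illustration.

```python
# Toy model of the call chain; the bodies are placeholders, not Guix code.
calls = []  # records which caller reached update-cached-checkout


def update_cached_checkout(channel, caller):
    calls.append((caller, channel))


def channel_full_commit(channel):
    update_cached_checkout(channel, "channel-full-commit")  # (1)


def validate_guix_channel(channels):
    # Assumption: only the channel named "guix" is validated.
    for channel in channels:
        if channel == "guix":
            update_cached_checkout(channel, "validate-guix-channel")  # (2)


def latest_channel_instance(channel):
    update_cached_checkout(channel, "latest-channel-instance")  # (3)


def latest_channel_instances(channels):
    for channel in channels:
        latest_channel_instance(channel)


def cached_channel_instance(channels, validate):
    for channel in channels:
        channel_full_commit(channel)
    validate(channels)
    latest_channel_instances(channels)


cached_channel_instance(["guix"], validate_guix_channel)
print(len(calls))                      # number of update-cached-checkout calls
print([caller for caller, _ in calls])  # in call order
```

With a single channel, the model reaches ’update-cached-checkout’ once
per call site, in the order (1), (2), (3).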

OK, let us take a look at ’update-cached-checkout’.

This procedure ’update-cached-checkout’ first checks whether the Git reference
specification is already in the cached Git checkout using the procedure
’reference-available?’.

Consider that the Git reference specification is (branch . "some"), then
’reference-available?’ returns #false, so it triggers ’remote-fetch’
from Guile-Git.  If I read correctly, this generates network traffic and
Savannah needs to be reachable. (I)

Hum, I am not convinced someone is following.  Who knows? :-)

Let us continue.  ’update-cached-checkout’ then checks some commit
relations and friends.  There is an ’if’ whose then-branch calls
’switch-to-ref’ and whose else-branch calls ’resolve-reference’.  Under
the hood, the procedure ’switch-to-ref’ also calls ’resolve-reference’.

For the case (branch . "some"), this ’resolve-reference’ procedure calls
’branch-lookup’ from Guile-Git.  If I read correctly, this generates
network traffic because of BRANCH-REMOTE and Savannah needs to be
reachable. (II)
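
The two steps (I) and (II) for a single ’update-cached-checkout’ call
can be sketched as follows.  Again, this is a toy model under the
reading given in this report, not the (guix git) implementation: only
the (branch . "some") case is modeled, and whether ’branch-lookup’
really generates network traffic is the assumption stated above.

```python
# Toy model of one update-cached-checkout call for a branch reference.
# Each appended string stands for one network exchange with the server.


def reference_available(ref):
    kind, _value = ref
    if kind == "branch":
        return False  # as described above for (branch . "some")
    raise NotImplementedError("only the branch case is modeled here")


def resolve_reference(ref, log):
    kind, _value = ref
    if kind == "branch":
        # (II): counted as network traffic, following the reading above.
        log.append("branch-lookup")


def update_cached_checkout(ref):
    log = []
    if not reference_available(ref):
        log.append("remote-fetch")  # (I)
    # Both branches of the 'if' end up in resolve-reference.
    resolve_reference(ref, log)
    return log


print(update_cached_checkout(("branch", "some")))
```

In this model, one call with a branch reference records two entries,
one for (I) and one for (II).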

Summary: ( (1) + (2) + (3) ) * ( (I) + (II) ) = 6.

If I am correct and if I am not missing something, the current design
requires 6 network exchanges with Savannah, and most of this traffic is
useless because it has already been done, somehow.
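
The count can be spelled out by enumerating every pair of call site and
network step.  This is plain arithmetic on the labels used in this
report; the figure of 2 "needed" exchanges assumes that a single pass of
(I) and (II) would be enough, which is my reading and not something
measured.

```python
from itertools import product

call_sites = [
    "channel-full-commit",      # (1)
    "validate-guix-channel",    # (2)
    "latest-channel-instance",  # (3)
]
network_steps = [
    "remote-fetch",   # (I)
    "branch-lookup",  # (II)
]

# Every call site triggers every network step once.
exchanges = list(product(call_sites, network_steps))
total = len(exchanges)

# Assumption: one pass of (I) and (II) would suffice.
needed = len(network_steps)
redundant = total - needed

for site, step in exchanges:
    print(f"{site}: {step}")
print(total, needed, redundant)
```

Under that assumption, 4 of the 6 exchanges repeat work that was
already done.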

Well, (branch . "some") is the worst case, IMHO.  The short commit
ID (tag-or-commit . "1234abc") and the tag (tag-or-commit . "v1.4.0")
are as well.

Applying my proposal from #65352 (DRAFT v2) removes some useless
’remote-fetch’ calls.

Well, let me know if this diagnosis is correct.

To be continued…


Cheers,
simon


1: https://simon.tournier.info/posts/2023-06-23-hackathon-repro.html
