Re: [lmi] VCS caching

From:	Vadim Zeitlin
Subject:	Re: [lmi] VCS caching
Date:	Sun, 15 Apr 2018 13:32:34 +0200

On Sun, 15 Apr 2018 00:54:42 +0000 Greg Chicares <address@hidden> wrote:

GC> On 2018-04-14 23:05, Vadim Zeitlin wrote:
GC> > 
GC> > Why do we absolutely need to have this local
GC> > cache that we must be able to modify instead of just using it if it's
GC> > available?
GC> 
GC> In retrospect I see that I should have explained my goal clearly up
GC> front, but let me do that now. Efficiency is good, but robustness is
GC> absolutely indispensable.
GC> 
GC> If github is down, or if my network connectivity is broken, or if I'm
GC> behind some insidious corporate firewall that has added a new rule that
GC> breaks everything...I don't want such circumstances to prevent me from
GC> working.

 I do understand this motivation, I just don't understand how these
changes help with it.

GC> Running 'install_msw.sh' on an msw machine with no copied-over cache
GC> will fail if any one of these servers is unreachable (or blocked by a
GC> firewall):
GC> 
GC>   savannah.nongnu.org
GC>   ftp.gnu.org
GC>   sourceforge.net
GC>   cygwin.com
GC>   apache.org
GC>   storage.googleapis.com
GC>   github.com
GC>   xmlsoft.org
GC> 
GC> The probability that they're all up simultaneously may be ninety
GC> percent,

 I am quite sure it's significantly higher than 90%, but it's definitely
still less than 100% and I'm all for supporting offline installation. It's
just that, again, I don't see any scenario in which the latest version of
the script helps more with it than the initial version did.

GC> But if I copy over the contents of a current cache directory, I want
GC> to know that 'install_msw.sh' is guaranteed to succeed.

 The very first version of the script already implemented this: if you
copied the contents of the cache directory and set wx_git_url to
/cache_for_lmi/vcs, it would be guaranteed to succeed provided the cached
version contained the currently used lmi commit.

 The only tiny thing to improve would, IMO, be to use /cache_for_lmi/vcs by
default, which could be trivially achieved by testing if this directory
exists (or running git-rev-parse in it) and using it as wx_git_url if it
does.
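
 As a rough sketch of that default, assuming a plain directory-existence
test is enough: the helper name choose_wx_git_url and the fallback_url
placeholder are hypothetical, only wx_git_url and /cache_for_lmi/vcs come
from the discussion above.

```shell
#!/bin/sh
# Hypothetical helper: print the location to clone wxWidgets from.
#   $1: local cache directory to prefer if it exists
#   $2: URL to fall back to otherwise
choose_wx_git_url()
{
    if [ -d "$1" ]
    then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}

# Placeholder only, not the script's real upstream URL.
fallback_url="https://example.invalid/wxWidgets.git"

# Keep a value set by the user; otherwise prefer the cache if present.
if [ -z "${wx_git_url:-}" ]
then
    wx_git_url=$(choose_wx_git_url /cache_for_lmi/vcs "$fallback_url")
fi
```

 With this, a user-supplied wx_git_url still wins, and nothing is created
if the cache directory is absent.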

GC> And I don't want that guarantee to depend on my diligence in manually
GC> maintaining a mirror of everything

 But you don't really need to maintain a mirror constantly, it only needs
to be updated when wx_commit_sha changes, which doesn't happen that often.
Is it really better to rerun install_msw.sh when it does change and rely on
it implicitly updating the cache directory rather than just updating the
mirror manually using "for d in *.git; do git -C $d fetch; done"?

GC> I want scripts and makefiles to take care of their own caching
GC> automatically, and 'install_msw.sh' to take care of all the other
GC> scripts and makefiles, so that if 'install_msw.sh' worked yesterday,
GC> it'll work today.

 Just to be clear, if install_msw.sh worked yesterday and wx_commit_sha
didn't change since then, it will definitely work today because it doesn't
need to use any other repository than the existing one in /opt/lmi/local/vcs
at all. And if wx_commit_sha did change, then having the cache from
yesterday doesn't help any longer because it could lack the new commit (if
it doesn't lack it, then the repository in /opt/lmi/local/vcs has it too,
so it still works). So I don't see any advantage in having the second level
cache in /cache_for_lmi in this case (first level cache is the existing Git
repository under /opt/lmi/local). Am I missing something here, i.e. do you
see any scenario in which having the cache on the machine helps with
re-running install_msw.sh on the same machine?

 Or is the *only* motivation for having this cache to be able to copy it
to another machine in order to install lmi there? This is the only possible
reason for having it I see, but if this is indeed the case, I'm surprised
it has never been mentioned explicitly (or did I just miss it?) and I also
think that this promotes a rather weird workflow, in which you need to run
install_msw.sh on one machine only in order to be able to then run it on
another one. Wouldn't it be better to provide some update_cache.sh that
could do what its name indicates and create the repositories under
/cache_for_lmi from the existing one under /opt/lmi/local only when, and
if, necessary?
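
 A minimal sketch of what the core of such an update_cache.sh might look
like. The function name, its arguments and the choice of a bare repository
are all assumptions, and it further assumes that the wanted commit is
reachable from some ref of the source repository:

```shell
#!/bin/sh
# Hypothetical core of update_cache.sh: make sure the bare repository
# DST contains COMMIT, creating or refreshing it from the existing
# local repository SRC only if that commit is missing.
#   $1: SRC, e.g. a repository under /opt/lmi/local
#   $2: DST, e.g. a repository under /cache_for_lmi
#   $3: COMMIT, e.g. the value of wx_commit_sha
update_cache()
{
    src=$1; dst=$2; commit=$3

    # Already cached: nothing to do, and no network access either.
    if git -C "$dst" rev-parse --quiet --verify "${commit}^{commit}" \
        >/dev/null 2>&1
    then
        return 0
    fi

    # Create an empty bare repository on first use.
    if [ ! -d "$dst" ]
    then
        git init --quiet --bare "$dst" || return 1
    fi

    # Copy every ref (branches, tags, remote-tracking refs) from SRC.
    git -C "$dst" fetch --quiet "$src" '+refs/*:refs/*' || return 1

    # Fail if the commit is still missing after the update.
    git -C "$dst" rev-parse --quiet --verify "${commit}^{commit}" \
        >/dev/null
}
```

 It would be called once per repository, for example as
update_cache "$src_dir" "$cache_dir" "$wx_commit_sha" with the two
directory variables set by the caller.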
        
 Of course, concerning installing lmi on multiple machines, I still think
that the thing that would make most sense would be to have a central
location with this cache, updated whenever the versions of any dependencies
change (i.e. not that often). Note also that there could be several
locations like this, i.e. this cache directory/tarball could be replicated
on both savannah.nongnu.org and github.com, replacing the requirement for
all the sites above to be online with the much weaker requirement for any
one of them being accessible.


 If you decide to keep the current version of the script only in order to
always have the repository mirrors in /cache_for_lmi for the purposes of
copying them to another machine, I'd like, once again, to suggest making it
very clear in a comment in the text of the script that this is what the
goal is, because it's almost impossible to understand why the script does
what it does without knowing that it's done to facilitate lmi
installation on _another_ machine and not the one where the script is
running: none of the existing comments even mentions copying the cache
directory to another machine. And without such a comment any refactoring of
this script in a couple of years risks breaking its purpose completely.

 And I'd also still prefer to use the clearer git-rev-parse rather than
git-ls-remote for testing the cache repositories' existence/validity. You
really don't have to worry about them being remote as it doesn't make sense
from the point of view of the motivation above and is not supported at all
by the current script anyhow (you can't clone into a remote repository).
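
 For illustration, a purely local check along these lines could look as
follows; the function name cache_has_commit is hypothetical, and the sketch
assumes that "valid" means "is a Git repository containing the wanted
commit":

```shell
#!/bin/sh
# Hypothetical helper: succeed only if directory $1 is a local Git
# repository that contains commit $2. No remote is ever contacted.
cache_has_commit()
{
    # The ^{commit} suffix forces git to look the object up, so a
    # well-formed but absent hash is rejected too.
    git -C "$1" rev-parse --quiet --verify "$2^{commit}" >/dev/null 2>&1
}
```

 It fails both when the directory is not a repository and when the commit
is absent, which are the two cases the script needs to distinguish from a
usable cache.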

 Finally, if you'd like to make my life a bit simpler, I'd like to ask you
to use wx_git_url as it was used originally, i.e. allow setting it to a
URL different from https://github.com/wxWidgets.git. Currently it looks
like it's still supported, but actually it isn't because the cached
repository is always cloned from the $default_url, i.e. GitHub. Conversely,
if you don't want to support my workflow (i.e. using an existing local
clone of the repository on the LAN), then wx_git_url should be
eradicated entirely as it's just confusing to have it currently (it's also
pretty confusing to switch default_url from the GitHub URL to the cache one
IMO, I'd rather use $cache_wx_url explicitly if we always use it anyhow).


 To summarize:

0. I believe that the simplest and best thing to do would be to revert to
   the initial, simpler version of the script which uses the cache if it
   exists and doesn't insist on creating it if it doesn't. This works just
   as well as the current version for a single machine and also works very
   well if there is a central location for the cache for multiple ones, as
   well as being faster for the initial checkout without cache due to
   allowing checking out submodules in parallel.

1. If the current logic is to be preserved, it must be made explicitly
   clear that we create the cached repository solely for the possibility of
   copying it to another machine. Ideally I'd also use a different term
   instead of "cache" (mirror?) as it isn't really used as a cache in the
   classic sense here. Also, the remnants of the old logic, such as
   wx_git_url, should be removed as they just confuse things.

2. I'm, of course, ready to do any or all of the changes proposed above if
   you'd like me to.

 Please let me know what, if anything, I should do.
 
 Thanks,
VZ