bug#65720: Guile-Git-managed checkouts grow way too much


From: Ludovic Courtès
Subject: bug#65720: Guile-Git-managed checkouts grow way too much
Date: Mon, 04 Sep 2023 23:47:49 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Ludovic Courtès <ludo@gnu.org> writes:

> As reported by Tobias on IRC (in the context of ‘hpcguix-web’),
> checkouts managed by Guile-Git appear to grow beyond reason.  As an
> example, here’s the same ‘.git’ managed with Guile-Git and with Git:
>
> $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> 6.7G    /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> $ du -hs .git
> 517M    .git

Unsurprisingly, GC makes a big difference:

--8<---------------cut here---------------start------------->8---
$ cp -r ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq /tmp/checkout
$ (cd /tmp/checkout/; git gc)
Enumerating objects: 717785, done.
Counting objects: 100% (717785/717785), done.
Delta compression using up to 4 threads
Compressing objects: 100% (154644/154644), done.
Writing objects: 100% (717785/717785), done.
Total 717785 (delta 569440), reused 710535 (delta 562274), pack-reused 0
Enumerating cruft objects: 103412, done.
Traversing cruft objects: 81753, done.
Counting objects: 100% (64171/64171), done.
Delta compression using up to 4 threads
Compressing objects: 100% (17379/17379), done.
Writing objects: 100% (64171/64171), done.
Total 64171 (delta 52330), reused 58296 (delta 46792), pack-reused 0
Expanding reachable commits in commit graph: 133730, done.
$ du -hs /tmp/checkout
539M    /tmp/checkout
--8<---------------cut here---------------end--------------->8---

> It would seem that libgit2 doesn’t do the equivalent of ‘git gc’.

Confirmed: <https://github.com/libgit2/libgit2/issues/3247>.

My inclination for the short term would be to work around this
limitation by (1) finding a heuristic to determine whether a checkout has
likely accumulated too much cruft, and (2) considering such checkouts as
expired (thereby forcing a re-clone) or running ‘git gc’ on them if
‘git’ is available.
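
Option (2) could look roughly like the sketch below.  It is only an
illustration in shell, not code from Guix: the function name
‘gc_or_expire’ and the return code 2 are made up for the example.

```shell
# Hypothetical helper, not part of Guix: compact the checkout at $1 with
# 'git gc' when a 'git' binary is on PATH; otherwise return 2 so that the
# caller can treat the checkout as expired and re-clone it.
gc_or_expire() {
    if command -v git >/dev/null 2>&1; then
        git -C "$1" gc --quiet
    else
        return 2
    fi
}
```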

I can’t think of a good heuristic for (1).  Birth time could be one, but
we’d need statx(2):

--8<---------------cut here---------------start------------->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -4
Access: 2023-09-04 23:13:54.668279105 +0200
Modify: 2023-09-04 11:34:41.665385000 +0200
Change: 2023-09-04 11:34:41.661629102 +0200
 Birth: 2021-08-09 10:48:17.748722151 +0200
--8<---------------cut here---------------end--------------->8---

Lacking statx(2), we can approximate creation time by looking at
‘.git/config’:

--8<---------------cut here---------------start------------->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config | tail -3
Modify: 2021-08-09 10:50:28.031760953 +0200
Change: 2021-08-09 10:50:28.031760953 +0200
 Birth: 2021-08-09 10:50:28.031760953 +0200
--8<---------------cut here---------------end--------------->8---
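
As a rough shell illustration of that approximation (the helper name
‘checkout_age_days’ is hypothetical, and GNU ‘stat’ and ‘date’ are
assumed), the age of a checkout can be derived from the mtime of its
‘.git/config’:

```shell
# Hypothetical helper: print the age, in whole days, of the checkout at $1,
# using the mtime of its '.git/config' as a stand-in for creation time.
# Assumes GNU coreutils 'stat' and 'date'.
checkout_age_days() {
    config="$1/.git/config"
    [ -f "$config" ] || return 1          # not a checkout, or already deleted
    created=$(stat -c %Y "$config")       # mtime in seconds since the epoch
    now=$(date +%s)
    echo $(( (now - created) / 86400 ))
}
```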

This strategy can be implemented like this:

diff --git a/guix/git.scm b/guix/git.scm
index ebe2600209..ed3fa56bc8 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -405,7 +405,16 @@ (define cached-checkout-expiration
 
   ;; Use the mtime rather than the atime to cope with file systems mounted
   ;; with 'noatime'.
-  (file-expiration-time (* 90 24 3600) stat:mtime))
+  (let ((ttl (* 90 24 3600))
+        (max-checkout-retention (* 9 30 24 3600)))
+    (lambda (file)
+      (match (false-if-exception (lstat file))
+        (#f 0)                     ;FILE may have been deleted in the meantime
+        (st (min (pk 'ttl (+ (stat:mtime st) ttl))
+                 (pk 'maxttl (match (false-if-exception
+                          (lstat (in-vicinity file ".git/config")))
+                    (#f +inf.0)
+                    (st (+ (stat:mtime st) max-checkout-retention))))))))))
 
 (define %checkout-cache-cleanup-period
   ;; Period for the removal of expired cached checkouts.

Namely, a cached checkout is considered “expired” after 9 months.  In
my case, it gives this:

--8<---------------cut here---------------start------------->8---
scheme@(guix git)> (cached-checkout-expiration "/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/")

;;; (ttl 1701596081)

;;; (maxttl 1651827028)
$6 = 1651827028
--8<---------------cut here---------------end--------------->8---
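
For reference, both values can be reproduced from the ‘stat’ output
shown earlier.  The shell sketch below assumes GNU ‘date’ and only
redoes the arithmetic; the variable names are chosen for the example:

```shell
# Reproduce the two candidate expiration times from the 'stat' output above.
# Assumes GNU 'date'.
ttl=$(( 90 * 24 * 3600 ))                  # 90 days after last modification
max_retention=$(( 9 * 30 * 24 * 3600 ))    # about 9 months after creation

mtime=$(date -d '2023-09-04 11:34:41 +0200' +%s)     # checkout directory
created=$(date -d '2021-08-09 10:50:28 +0200' +%s)   # '.git/config'

by_ttl=$(( mtime + ttl ))
by_retention=$(( created + max_retention ))

# The expiration time is the earlier of the two.
if [ "$by_ttl" -lt "$by_retention" ]; then
    expiration=$by_ttl
else
    expiration=$by_retention
fi
echo "ttl=$by_ttl maxttl=$by_retention expiration=$expiration"
```

This prints ‘ttl=1701596081 maxttl=1651827028 expiration=1651827028’.
The resulting expiration time falls in May 2022, well before the date of
this message, so this checkout would count as expired.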

Of course having to re-clone entire repositories every 9 months is
ridiculous, but storing gigabytes of packs is worse IMO (I’m
specifically thinking about the Guix repo, which every user copies via
‘guix pull’).

Thoughts?

Thanks,
Ludo’.
