bug#65720: Guile-Git-managed checkouts grow way too much
From: Ludovic Courtès
Subject: bug#65720: Guile-Git-managed checkouts grow way too much
Date: Mon, 04 Sep 2023 23:47:49 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Ludovic Courtès <ludo@gnu.org> wrote:
> As reported by Tobias on IRC (in the context of ‘hpcguix-web’),
> checkouts managed by Guile-Git appear to grow beyond reason. As an
> example, here’s the same ‘.git’ managed with Guile-Git and with Git:
>
> $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> 6.7G /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> $ du -hs .git
> 517M .git
Unsurprisingly, GC makes a big difference:
--8<---------------cut here---------------start------------->8---
$ cp -r ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq /tmp/checkout
$ (cd /tmp/checkout/; git gc)
Enumerating objects: 717785, done.
Counting objects: 100% (717785/717785), done.
Delta compression using up to 4 threads
Compressing objects: 100% (154644/154644), done.
Writing objects: 100% (717785/717785), done.
Total 717785 (delta 569440), reused 710535 (delta 562274), pack-reused 0
Enumerating cruft objects: 103412, done.
Traversing cruft objects: 81753, done.
Counting objects: 100% (64171/64171), done.
Delta compression using up to 4 threads
Compressing objects: 100% (17379/17379), done.
Writing objects: 100% (64171/64171), done.
Total 64171 (delta 52330), reused 58296 (delta 46792), pack-reused 0
Expanding reachable commits in commit graph: 133730, done.
$ du -hs /tmp/checkout
539M /tmp/checkout
--8<---------------cut here---------------end--------------->8---
> It would seem that libgit2 doesn’t do the equivalent of ‘git gc’.
Confirmed: <https://github.com/libgit2/libgit2/issues/3247>.
My inclination for the short term would be to work around this
limitation by (1) finding a heuristic to determine whether a checkout has
likely accumulated too much cruft, and (2) considering such checkouts as
expired (thereby forcing a re-clone) or running ‘git gc’ on them if
‘git’ is available.
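For illustration, step (2) could look roughly like the shell sketch below.
It is not part of any patch: the function name ‘compact_or_expire’ and the
two words it prints are made up for this example, and it assumes the caller
has already decided, by whatever heuristic, that the checkout needs
attention.

```shell
# compact_or_expire DIR   (hypothetical helper, for illustration only)
#
# Try to compact the checkout at DIR with 'git gc'.  Print "compacted" on
# success.  Print "expire" when 'git' is not on PATH or 'git gc' fails, in
# which case the caller would treat DIR as expired and re-clone it.
compact_or_expire() {
    dir=$1
    if command -v git >/dev/null 2>&1 \
        && git -C "$dir" gc --quiet >/dev/null 2>&1; then
        echo compacted
    else
        echo expire
    fi
}
```

Falling back to expiry keeps things working on machines where only libgit2
is installed, at the cost of a full re-clone.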
I can’t think of a good heuristic for (1). Birth time could be one, but
we’d need statx(2):
--8<---------------cut here---------------start------------->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -4
Access: 2023-09-04 23:13:54.668279105 +0200
Modify: 2023-09-04 11:34:41.665385000 +0200
Change: 2023-09-04 11:34:41.661629102 +0200
Birth: 2021-08-09 10:48:17.748722151 +0200
--8<---------------cut here---------------end--------------->8---
Lacking statx(2), we can approximate creation time by looking at
‘.git/config’:
--8<---------------cut here---------------start------------->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config | tail -3
Modify: 2021-08-09 10:50:28.031760953 +0200
Change: 2021-08-09 10:50:28.031760953 +0200
Birth: 2021-08-09 10:50:28.031760953 +0200
--8<---------------cut here---------------end--------------->8---
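As a rough sketch of that approximation, the age of a checkout can be derived
from the mtime of ‘.git/config’ alone. The helper name ‘checkout_age_days’ is
hypothetical, and the sketch assumes GNU coreutils, whose ‘stat -c %Y’ prints
the mtime in seconds since the epoch (BSD ‘stat’ uses different options).

```shell
# checkout_age_days DIR [NOW]   (hypothetical helper, for illustration only)
#
# Print the approximate age of the checkout at DIR in whole days, using the
# mtime of DIR/.git/config as a stand-in for the creation time.  NOW is in
# seconds since the epoch and defaults to the current time.  Return non-zero
# if DIR/.git/config cannot be read.
checkout_age_days() {
    config=$1/.git/config
    now=${2:-$(date +%s)}
    # Assumes GNU stat: '-c %Y' is the last modification time in seconds.
    created=$(stat -c %Y "$config" 2>/dev/null) || return 1
    echo $(( (now - created) / 86400 ))
}
```

This only holds as long as nothing rewrites ‘.git/config’ after the initial
clone; any later change to the file resets the apparent creation time.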
This strategy can be implemented like this:
diff --git a/guix/git.scm b/guix/git.scm
index ebe2600209..ed3fa56bc8 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -405,7 +405,16 @@ (define cached-checkout-expiration
   ;; Use the mtime rather than the atime to cope with file systems mounted
   ;; with 'noatime'.
-  (file-expiration-time (* 90 24 3600) stat:mtime))
+  (let ((ttl (* 90 24 3600))
+        (max-checkout-retention (* 9 30 24 3600)))
+    (lambda (file)
+      (match (false-if-exception (lstat file))
+        (#f 0) ;FILE may have been deleted in the meantime
+        (st (min (pk 'ttl (+ (stat:mtime st) ttl))
+                 (pk 'maxttl (match (false-if-exception
+                                     (lstat (in-vicinity file ".git/config")))
+                               (#f +inf.0)
+                               (st (+ (stat:mtime st) max-checkout-retention))))))))))
 (define %checkout-cache-cleanup-period
   ;; Period for the removal of expired cached checkouts.
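Namely, a cached checkout is considered “expired” 9 months after its
creation (as approximated by the mtime of ‘.git/config’), or 90 days after
it was last modified, whichever comes first. In my case, it gives this: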
--8<---------------cut here---------------start------------->8---
scheme@(guix git)> (cached-checkout-expiration
"/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/")
;;; (ttl 1701596081)
;;; (maxttl 1651827028)
$6 = 1651827028
--8<---------------cut here---------------end--------------->8---
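The two ‘pk’ values can be checked by hand. The sketch below redoes the
arithmetic in shell; the function name ‘expiration’ is made up for
illustration, and the two inputs are the ‘Modify’ times from the ‘stat’
output quoted earlier, converted to seconds since the epoch
(2023-09-04 11:34:41 +0200 is 1693820081 and 2021-08-09 10:50:28 +0200 is
1628499028).

```shell
ttl=$((90 * 24 * 3600))                  # 90 days   = 7776000 s
max_retention=$((9 * 30 * 24 * 3600))    # ~9 months = 23328000 s

# expiration CHECKOUT_MTIME CONFIG_MTIME   (hypothetical helper)
#
# Print the earlier of "last modification + ttl" and
# ".git/config mtime + max_retention", mirroring the 'min' in the patch.
expiration() {
    by_ttl=$(( $1 + ttl ))
    by_age=$(( $2 + max_retention ))
    if [ "$by_ttl" -lt "$by_age" ]; then
        echo "$by_ttl"
    else
        echo "$by_age"
    fi
}

expiration 1693820081 1628499028    # prints 1651827028
```

The result, 1651827028, falls in May 2022, well before the date of this
message, so this particular checkout would already count as expired.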
Of course having to re-clone entire repositories every 9 months is
ridiculous, but storing gigabytes of packs is worse IMO (I’m
specifically thinking about the Guix repo, which every user copies via
‘guix pull’).
Thoughts?
Thanks,
Ludo’.