[bug#39258] [PATCH v4 0/3] Faster cache generation (similar as v3)
From: zimoun
Subject: [bug#39258] [PATCH v4 0/3] Faster cache generation (similar as v3)
Date: Sun, 3 May 2020 17:01:51 +0200
Dear,
The aim of v4 is to keep the same search performance as v3 while drastically
reducing the time needed to generate the cache. On my laptop, the overhead is
now about 4 seconds, compared to more than 20 seconds for v2 and v3.
--8<---------------cut here---------------start------------->8---
# default
time guix build \
  /gnu/store/0nfpp82mqglpwvl1nbfpaphw5db2ivcp-guix-package-cache.drv --check
# v4
time guix build \
  /gnu/store/y78gfh1n7m3kyrj8wsqj25qc2cbc1a4d-guix-package-cache.drv --check
--8<---------------cut here---------------end--------------->8---
|      | default  | v4        |
|------+----------+-----------|
| real | 0m6.012s | 0m10.244s |
| user | 0m0.541s | 0m0.542s  |
| sys  | 0m0.033s | 0m0.032s  |
In v3, the cache is built using 'cons' and 'fold-packages' (a wrapper around
'fold-module-public-variables'). v4 instead modifies the procedure
'generate-package-cache', which uses 'vhash' and
'fold-module-public-variables*', so that it stores additional information.
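As a rough illustration of this kind of accumulation (a toy sketch, not the
code from the patches), the snippet below folds over a list of entries, builds
one vector per entry and uses a vhash to skip entries already seen. The names
'toy-packages', 'expand-cache' and 'toy-cache' and the entry layout are
assumptions made up for the example.

```scheme
(use-modules (ice-9 match)
             (ice-9 vlist)
             (srfi srfi-1))

;; Hypothetical toy entries standing in for packages:
;; (name version synopsis).  The values are invented.
(define toy-packages
  '(("hello" "0.1" "Toy greeting program")
    ("libb2" "0.2" "Toy hash library")
    ("hello" "0.1" "Toy greeting program")))   ;duplicate on purpose

(define (expand-cache entry result+seen)
  ;; RESULT+SEEN is a pair: the list of cache entries built so far and a
  ;; vhash recording the names already added.
  (match result+seen
    ((result . seen)
     (match entry
       ((name version synopsis)
        (if (vhash-assoc name seen)
            result+seen                         ;already cached, skip it
            (cons (cons (vector name version synopsis) result)
                  (vhash-cons name #t seen))))))))

(define toy-cache
  ;; Start from an empty result list and an empty vhash.
  (match (fold expand-cache (cons '() vlist-null) toy-packages)
    ((result . _) (reverse result))))
```

Here 'toy-cache' ends up with two vectors, one per distinct name. The vhash
keeps the "already seen?" test cheap compared with scanning the result list.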
Therefore the cache '/lib/guix/package.cache' contains more information. (The
v4 structure of 'package.cache' is a quick draft, so the details should be
discussed. An interesting move would be to have a structured S-exp (binary and
all strings), because it could become an entry point for exporting the package
list to JSON. WDYT?)
Now we are comparing apples to apples, and the cost of computing BM25 (v2) is
not free at all. Remember that BM25 is the state of the art in information
retrieval (relevance ranking) and that it is delegated to Xapian (v2). I do not
know whether there is a performance bottleneck between Guix, Guile-Xapian and
Xapian itself, but the computation of BM25 is certainly not free. More about
that soon.
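For reference, the textbook form of the BM25 score of a document D for a query
Q with terms q_1, ..., q_n is sketched below. This is the generic formula, not
necessarily the exact variant or parametrization used by Xapian; IDF stands for
whichever inverse-document-frequency weight is chosen.

```latex
\[
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
\]
```

Here f(q_i, D) is the frequency of q_i in D, |D| is the length of D, avgdl is
the average document length in the collection, and k_1 and b are free
parameters. The statistics over the whole collection are the reason an index
has to be built beforehand.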
To be clear about BM25 and caching, what I have in mind is:
 1. "guix search --build-index", optionally run by users if they want, for
    example, the BM25 ranking.
 2. Use BM25 metrics to detect poor package meta-data (synopsis and
    description); if it is worth it, why not add another checker to
    "guix lint".
However, ranking is another story and I am not yet convinced whether BM25 fits
Guix's needs.
* Details
~~~~~~~~~
The patches apply on top of commit a357849f5b (they are not yet rebased).
--8<---------------cut here---------------start------------->8---
time ./pre-inst-env guix pull --branch=search-v4 --url=$PWD -p /tmp/v4
--8<---------------cut here---------------end--------------->8---
Same test as in the previous benchmark (cold cache).
--8<---------------cut here---------------start------------->8---
time ./pre-inst-env /tmp/v4/bin/guix search crypto library \
  | recsel -P name | grep libb2
name: libb2
real 0m0.784s
user 0m0.810s
sys 0m0.037s
--8<---------------cut here---------------end--------------->8---
The option '--load-path' turns off the cache, and the search falls back to the
usual 'fold-packages'.
--8<---------------cut here---------------start------------->8---
time ./pre-inst-env /tmp/v4/bin/guix search -L /tmp/my-pkgs crypto library \
  | recsel -C -p name | grep libb2
name: libb2
real 0m2.446s
user 0m1.872s
sys 0m0.187s
--8<---------------cut here---------------end--------------->8---
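The fallback can be pictured with the toy sketch below. It assumes that the
cache is only trusted when the module search path is the default one, which is
my reading of the behaviour and not code from the patches; all names
('%default-path', '%current-path', 'cache-usable?', 'search-backend') are
hypothetical.

```scheme
;; Hypothetical stand-ins; none of these names come from the patches.
(define %default-path '("/default/modules"))
(define %current-path (make-parameter %default-path))

(define (cache-usable?)
  ;; Assumption: the cache only describes the default modules, so any
  ;; extra directory on the search path makes it incomplete.
  (equal? (%current-path) %default-path))

(define (search-backend)
  ;; Return a symbol naming the strategy a search would use.
  (if (cache-usable?)
      'cache
      'full-scan))

(define without-load-path (search-backend))

(define with-load-path
  ;; Mimic '-L /tmp/my-pkgs' by extending the search path.
  (parameterize ((%current-path (cons "/tmp/my-pkgs" %default-path)))
    (search-backend)))
```

With these definitions 'without-load-path' is the symbol 'cache' and
'with-load-path' is 'full-scan', which mirrors the two timings above.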
* Still draft
~~~~~~~~~~~~~
 1. The name 'fold-packages*' may be misleading since the procedure does not
    return "true" packages.
--8<---------------cut here---------------start------------->8---
(define (get-hello p r)
  (if (string=? (package-name p) "hello")
      p
      r))

(define no-cache   (fold-packages  get-hello '()))
(define from-cache (fold-packages* get-hello '()))

(equal? no-cache from-cache)
;;; #f
--8<---------------cut here---------------end--------------->8---
Another name for the procedure is welcome if it is an issue.
 2. The procedure 'package->recutils' in 'guix/ui.scm' is modified, but the
    change is not ideal.
--8<---------------cut here---------------start------------->8---
(match (package-supported-systems p)
  (('cache supported-systems)
   (string-join supported-systems))
  (_
   (string-join (package-transitive-supported-systems p))))
--8<---------------cut here---------------end--------------->8---
However, it avoids the code duplication present in v3.
 3. Deprecated packages are displayed (bug in v3 too).
 4. The impolite '@@' is used to access the private license constructor.
 5. Commit messages are incomplete, as are the copyright headers, etc.
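To illustrate item 1, here is a toy sketch of why an object rebuilt from a
cache is not 'equal?' to the original. It assumes that the cache keeps only a
subset of the fields, which may differ from what the patches do; the record
type and its fields are hypothetical and unrelated to the real <package>
record.

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical record with three fields, standing in for a package.
(define-record-type <toy-package>
  (make-toy-package name version inputs)
  toy-package?
  (name    toy-package-name)
  (version toy-package-version)
  (inputs  toy-package-inputs))

;; The "true" object, as it would come from a module.
(define full (make-toy-package "hello" "0.1" '("toy-compiler")))

;; Assumption: the cache stores only the name and the version.
(define cached-fields
  (vector (toy-package-name full) (toy-package-version full)))

;; Rebuilding from the cache cannot restore the missing field.
(define rebuilt
  (make-toy-package (vector-ref cached-fields 0)
                    (vector-ref cached-fields 1)
                    '()))

(equal? full rebuilt)                              ;=> #f
(string=? (toy-package-name full)
          (toy-package-name rebuilt))              ;=> #t
```

The rebuilt object is good enough for searching and display, but any code that
relies on the dropped fields would need the real package.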
* Next?
~~~~~~~
IMHO, simply caching improves the current situation:
- a bit of extra time at pull time (less than 5s on my machine)
+ speed up at search time (2x faster)
* maintainable code?
Is this going in the right direction?
Could you advise on how to make the code more compliant?
Could you test on your machines to provide another point of comparison?
Best regards,
simon
zimoun (3):
  DRAFT packages: Add fields to packages cache.
  DRAFT packages: Add new procedure 'fold-packages*'.
  DRAFT guix package: Use cache in 'find-packages-by-description'.

 gnu/packages.scm         | 98 ++++++++++++++++++++++++++++++++++++++--
 guix/scripts/package.scm |  2 +-
 guix/ui.scm              | 29 +++++++-----
 tests/packages.scm       | 31 +++++++++++++
 4 files changed, 143 insertions(+), 17 deletions(-)
--
2.26.1