[bug#39258] [PATCH v4 0/3] Faster cache generation (similar as v3)


From: zimoun
Subject: [bug#39258] [PATCH v4 0/3] Faster cache generation (similar as v3)
Date: Sun, 3 May 2020 17:01:51 +0200

Dear,

The aim of this version v4 is to keep the same search performance as the 
previous version v3 while drastically reducing the time needed to generate the 
cache.  On my laptop, the overhead is now about 4 seconds, compared to more 
than 20 seconds for v2 and v3.

--8<---------------cut here---------------start------------->8---
# default
time guix build \
  /gnu/store/0nfpp82mqglpwvl1nbfpaphw5db2ivcp-guix-package-cache.drv --check
# v4
time guix build \
  /gnu/store/y78gfh1n7m3kyrj8wsqj25qc2cbc1a4d-guix-package-cache.drv --check
--8<---------------cut here---------------end--------------->8---

|      | default  | v4        |
|------+----------+-----------|
| real | 0m6.012s | 0m10.244s |
| user | 0m0.541s | 0m0.542s  |
| sys  | 0m0.033s | 0m0.032s  |


In version v3, the cache is built using 'cons' and 'fold-packages' (a wrapper 
around 'fold-module-public-variables').  Version v4 instead modifies the 
procedure 'generate-package-cache', which uses 'vhash' and 
'fold-module-public-variables*', so that it stores additional information.
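
To make the 'vhash' approach concrete, here is a minimal sketch in plain 
Guile.  It is not the code of 'generate-package-cache': the sample entries, 
'build-cache' and 'lookup' are hypothetical names used only for illustration, 
and a toy list stands in for the real package records.

```scheme
(use-modules (ice-9 vlist)
             (srfi srfi-1))

;; Hypothetical sample data standing in for package metadata:
;; (name version synopsis).  These are not real cache entries.
(define %sample-entries
  '(("hello" "0.2" "Sample greeting program")
    ("libb2" "0.1" "Sample hashing library")
    ("hello" "0.1" "Sample greeting program, older version")))

;; Build the cache with a fold, adding one entry per package to a vhash
;; keyed by the package name.
(define (build-cache entries)
  (fold (lambda (entry cache)
          (vhash-cons (first entry) (cdr entry) cache))
        vlist-null
        entries))

(define cache (build-cache %sample-entries))

;; Return every value stored under NAME; several versions may share a name.
(define (lookup name)
  (vhash-fold* cons '() name cache))
```

Here '(lookup "hello")' returns both sample entries for "hello", while 
'(vhash-assoc "libb2" cache)' returns the single key/value pair stored for 
"libb2".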

Therefore the cache '/lib/guix/package.cache' contains more information.  (The 
v4 structure of 'package.cache' is a quick draft, so the details should be 
discussed.  An interesting move would be to have a structured (binary and all 
strings) S-exp, because it should become an entry point to export the package 
list to JSON.  WDYT?)


Now we are comparing apples to apples, and computing BM25 (v2) is not free at 
all.  Remember that BM25 is the state of the art in information retrieval 
(relevance ranking) and that it is delegated to Xapian (v2).  I do not know 
whether there is a performance bottleneck between Guix, Guile-Xapian and 
Xapian itself, but for sure the computation of BM25 is not free.  More about 
that soon.

To be clear about BM25 and caching, what I have in mind is:
  1. "guix search --build-index", optionally run by users if they want, for 
example, the BM25 ranking.
  2. Use BM25 metrics to detect poor package metadata (synopsis and 
description); if it is worth it, why not add another checker to "guix lint".

However, ranking is another story and I am not yet convinced whether BM25 fits 
Guix's needs.
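
For readers unfamiliar with BM25, here is a minimal sketch of the textbook 
Okapi BM25 score in plain Guile.  It is not what Xapian or Guile-Xapian 
actually runs; the parameter values (k1 = 1.2, b = 0.75), the IDF variant and 
the sample documents are assumptions chosen for illustration.  Documents are 
represented as lists of terms.

```scheme
(use-modules (srfi srfi-1))

;; Conventional default parameters (an assumption, not Xapian's settings).
(define k1 1.2)
(define b 0.75)

;; Hypothetical sample "descriptions", already split into terms.
(define %docs
  '(("crypto" "library" "hash")
    ("text" "editor")
    ("crypto" "crypto" "tool" "library" "fast")))

;; Number of occurrences of TERM in DOC.
(define (term-frequency term doc)
  (count (lambda (word) (string=? word term)) doc))

;; Inverse document frequency, using the variant that stays non-negative.
(define (idf term docs)
  (let ((total (length docs))
        (containing (count (lambda (doc) (member term doc)) docs)))
    (log (+ 1 (/ (+ (- total containing) 0.5)
                 (+ containing 0.5))))))

;; BM25 score of DOC for the list of terms QUERY, relative to corpus DOCS.
(define (bm25 query doc docs)
  (let ((avgdl (/ (apply + (map length docs)) (length docs)))
        (dl (length doc)))
    (fold (lambda (term score)
            (let ((f (term-frequency term doc)))
              (+ score
                 (* (idf term docs)
                    (/ (* f (+ k1 1))
                       (+ f (* k1 (+ (- 1 b) (* b (/ dl avgdl))))))))))
          0
          query)))
```

Every query term requires corpus-wide statistics (document frequency, average 
length), which is one reason why this ranking cannot be free.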



* Details
~~~~~~~~~

The patches apply against commit a357849f5b (and are not yet rebased).

--8<---------------cut here---------------start------------->8---
time ./pre-inst-env guix pull --branch=search-v4 --url=$PWD -p /tmp/v4
--8<---------------cut here---------------end--------------->8---


Same test as in the previous benchmark (cold cache).

--8<---------------cut here---------------start------------->8---
time ./pre-inst-env /tmp/v4/bin/guix search crypto library \
     | recsel -p name | grep libb2
name: libb2

real    0m0.784s
user    0m0.810s
sys     0m0.037s
--8<---------------cut here---------------end--------------->8---

The option '--load-path' turns off the cache, and the search falls back to the 
usual 'fold-packages'.

--8<---------------cut here---------------start------------->8---
time ./pre-inst-env /tmp/v4/bin/guix search -L /tmp/my-pkgs crypto library \
     | recsel -C -p name | grep libb2
name: libb2

real    0m2.446s
user    0m1.872s
sys     0m0.187s
--8<---------------cut here---------------end--------------->8---



* Still draft
~~~~~~~~~~~~~

 1. The name 'fold-packages*' may be misleading since the procedure does not 
return "true" packages.

--8<---------------cut here---------------start------------->8---
(define (get-hello p r)
  (if (string=? (package-name p) "hello")
      p
      r))
(define no-cache   (fold-packages  get-hello '()))
(define from-cache (fold-packages* get-hello '()))

(equal? no-cache from-cache)
;;; #f
--8<---------------cut here---------------end--------------->8---

    Another name for the procedure is welcome if it is an issue.

 2. The procedure 'package->recutils' in 'guix/ui.scm' is modified, but the 
change is not ideal.

--8<---------------cut here---------------start------------->8---
          (match (package-supported-systems p)
            (('cache supported-systems)
             (string-join supported-systems))
            (_
             (string-join (package-transitive-supported-systems p)))))
--8<---------------cut here---------------end--------------->8---

    However, it avoids duplicating code, as is done in version v3.


 3. Deprecated packages are displayed (bug in v3 too).

 4. Impolite '@@' is used to access the private license constructor.

 5. Commit messages are incomplete, copyright headers too, etc.
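
The dispatch from item 2 can be tried in isolation with the sketch below, 
written in plain Guile.  The procedure 'systems->string' and its thunk 
argument are hypothetical stand-ins: in the patch, the fallback calls 
'package-transitive-supported-systems' on a real package.

```scheme
(use-modules (ice-9 match))

;; SUPPORTED is either a tagged value coming from the cache,
;; '(cache ("system" ...)), or anything else, in which case the thunk
;; COMPUTE-TRANSITIVE stands in for the expensive computation.
(define (systems->string supported compute-transitive)
  (match supported
    (('cache systems)
     ;; Cached entry: the list of systems is already known.
     (string-join systems))
    (_
     ;; Regular package: compute the list on demand.
     (string-join (compute-transitive)))))
```

A plain list of system strings never matches the first clause, because its 
first element is a string rather than the symbol 'cache', so regular packages 
always take the fallback branch.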



* Next?
~~~~~~~

IMHO, simply caching improves the current situation:

 - a bit of extra time at pull time (less than 5s on my machine)
 + speed up at search time (2x faster)
 * maintainable code?

Is it in the right direction?
Could you advise on how to make the code more compliant?
Could you test on your machines to have another point of comparison?



Best regards,
simon


zimoun (3):
  DRAFT packages: Add fields to packages cache.
  DRAFT packages: Add new procedure 'fold-packages*'.
  DRAFT guix package: Use cache in 'find-packages-by-description'.

 gnu/packages.scm         | 98 ++++++++++++++++++++++++++++++++++++++--
 guix/scripts/package.scm |  2 +-
 guix/ui.scm              | 29 +++++++-----
 tests/packages.scm       | 31 +++++++++++++
 4 files changed, 143 insertions(+), 17 deletions(-)

-- 
2.26.1





