[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[bug#39258] [PATCH v5 0/4] Optimize guix search
From: |
zimoun |
Subject: |
[bug#39258] [PATCH v5 0/4] Optimize guix search |
Date: |
Mon, 1 Jun 2020 03:25:09 +0200 |
Hi Arun,
On Mon, 1 Jun 2020 at 02:00, Arun Isaac <arunisaac@systemreboot.net> wrote:
> Sorry for the long delay in replying to this thread.
Based on the Ludo's comments [1] on v4 which is a simple re-write of
your v3, I am finishing a vN+1.. but time flies and I am late on the
topic too. :-)
Well, this still unsent vN+1 series has the same performance of v4 on
"guix pull" which is a key point compared to v3. Obviously, the
performance on "guix search" are equivalent on both version. This
vN+1 builds two caches -- to avoid binary breakage -- in only one go;
the consuming 'fold-modules-public-variables*' is applied only once.
[1] http://issues.guix.gnu.org/39258#93
> I think Ludo is right in that we can improve guix search performance with only
> simple code improvements rather than including xapian or improving our
> existing cache. Here are a few patches on those lines.
Well, improving the cache is easy; at least as you did in v3 by adding
another one.
The most annoying part is the arguments rewrite of 'package->recutils'
to be compliant.
However after some comparisons, I am not convinced that BM25 will be
worth to implement...
> In `relevance`, we set our score to 0 if any of the regexps don't match. Then,
> we might as well not match the remaining regexps. Patch 1 does this early cut
> off optimization.
Interesting.
> Often our search strings are only literal strings. So, we can save some time
> by using string-contains instead of invoking the regexp engine. Patch 2 does
> this. In addition, guile's string-contains uses a naive O(n^2) string search
> algorithm. We should perhaps use the O(n) Knuth-Morris-Pratt algorithm[1]. In
> fact, a comment on line 2006 of libguile/srfi-13.c in the guile source code
> mentions this. If implemented, the KMP algorithm could speed up guix search
> further.
>
> [1]:
> https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
Really interesting idea,
> Patch 3 and 4 are minor improvements.
>
> Here's a rough performance comparison.
On cold or warm cache?
> --8<---------------cut here---------------start------------->8---
> time ./pre-inst-env guix search game
>
> real 0m2.261s
> user 0m2.351s
> sys 0m0.104s
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> time guix search game
>
> real 0m2.661s
> user 0m2.843s
> sys 0m0.080s
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> time ./pre-inst-env guix search strategy game
>
> real 0m1.613s
> user 0m1.635s
> sys 0m0.096s
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> time guix search strategy game
>
> real 0m2.520s
> user 0m2.583s
> sys 0m0.112s
> --8<---------------cut here---------------end--------------->8---
So in the best case, you have the ratio old/new is 1.5; this new
version is 1.5 faster.
Well, in the extra cache approach (v3 or v4) the ration old/new is
really higher: 3.1 faster on cold cache (which is the one I am
interested in) and 2.4 faster on warm cache.
I will give a look to this new series and report what happens on my
laptop. But basically, I would like "guix search" under the 1.0
second on my machine. ;-)
Thank you for this new input.
Cheers,
simon