guix-devel

Re: File search progress: database review and question on triggers


From: Pierre Neidhardt
Subject: Re: File search progress: database review and question on triggers
Date: Wed, 12 Aug 2020 21:10:08 +0200

I've done some benchmarking.

1. I tried to fine-tune the SQL a bit:
  - Open and close the database only once for the whole indexing run.
  - Use "insert" instead of "insert or replace".
  - Use a numeric ID as the key instead of the path.

  Result: It still takes around 15-20 minutes to build.  Switching to
  numeric IDs shrank the database by half.

2. I tried the following naive one-file-per-line format:

--8<---------------cut here---------------start------------->8---
"/gnu/store/97p5gvb7jglmn9jpgwwf5al1798wi61f-acl-2.2.53//share/man/man5/acl.5.gz"
"/gnu/store/97p5gvb7jglmn9jpgwwf5al1798wi61f-acl-2.2.53//share/man/man3/acl_add_perm.3.gz"
"/gnu/store/97p5gvb7jglmn9jpgwwf5al1798wi61f-acl-2.2.53//share/man/man3/acl_calc_mask.3.gz"
...
--8<---------------cut here---------------end--------------->8---

  Result: Building takes between 2 and 20 minutes and the resulting file
  is 32 MiB.  (I don't know why the timing varies so much.)

  A string-contains filter takes less than 1 second.

  A regexp search with string-match takes about 3 seconds (Ryzen 5 @ 3.5
  GHz).  I'm not sure we can go faster.  I still need to measure how long
  a regexp match takes in SQL.
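For point 1, here is a minimal sketch of what keying on numeric IDs could look like on the Scheme side, assuming the index stores each store item once and each file row refers to it by a small integer plus the rest of the file name.  The helpers make-interner and split-store-file are hypothetical, not part of the actual indexer.

--8<---------------cut here---------------start------------->8---
;; Hypothetical helpers, for illustration only.
(define (make-interner)
  "Return a procedure mapping each distinct string to a stable integer ID,
allocating IDs 0, 1, 2 and so on in order of first appearance."
  (let ((table (make-hash-table))       ; hash-ref compares keys with equal?
        (next-id 0))
    (lambda (key)
      (or (hash-ref table key)
          (let ((id next-id))
            (hash-set! table key id)
            (set! next-id (+ next-id 1))
            id)))))

(define (split-store-file file)
  "Split FILE, a file name under /gnu/store, into a pair: the store item
and the remainder of the file name."
  (let* ((start (string-length "/gnu/store/"))
         (slash (string-index file #\/ start)))
    (if slash
        (cons (substring file 0 slash) (substring file slash))
        (cons file ""))))
--8<---------------cut here---------------end--------------->8---

With this, a row only needs (intern (car parts)) and (cdr parts) instead of repeating the full /gnu/store/... prefix for every file of a package, which is one plausible reason the numeric keys make the database smaller.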
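The two searches from point 2 could be sketched like this over the list of lines.  As far as I know, string-match from (ice-9 regex) compiles its pattern on every call, so compiling it once with make-regexp and reusing it with regexp-exec might cut part of those 3 seconds.  That is a guess to be measured, not a result; search-substring and search-regexp are hypothetical names.

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 regex))             ; make-regexp, regexp-exec

(define (search-substring lines pattern)
  "Return the elements of LINES that contain the string PATTERN."
  (filter (lambda (line) (string-contains line pattern)) lines))

(define (search-regexp lines pattern)
  "Return the elements of LINES matching the POSIX regexp PATTERN.
The regexp is compiled once rather than once per line."
  (let ((rx (make-regexp pattern)))
    (filter (lambda (line) (regexp-exec rx line)) lines)))
--8<---------------cut here---------------end--------------->8---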

Question: any idea how to load the database as fast as possible?  I
tried the following; it takes 1.5 s on my machine:

--8<---------------cut here---------------start------------->8---
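(use-modules (ice-9 textual-ports))     ; get-line

;; %textual-db, the file name of the index, is defined elsewhere.
(define (load-textual-database)
  "Return the lines of %TEXTUAL-DB as a list of strings, in reverse order."
  (call-with-input-file %textual-db
    (lambda (port)
      (let loop ((line (get-line port))
                 (result '()))
        (if (eof-object? line)
            result                      ; consed in reverse file order
            (loop (get-line port) (cons line result)))))))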
--8<---------------cut here---------------end--------------->8---
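One alternative that might be worth timing against the loop above: read the whole file into a single string with get-string-all and split it with string-split.  Whether this beats 1.5 s is untested; load-lines is a hypothetical name, and it takes the file name as an argument instead of using %textual-db.

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 textual-ports)      ; get-string-all
             (srfi srfi-1))             ; last, drop-right

(define (load-lines file)
  "Return the lines of FILE as a list of strings, in file order,
reading the whole file in one go."
  (let ((lines (string-split (call-with-input-file file get-string-all)
                             #\newline)))
    ;; A trailing newline leaves an empty string at the end; drop it.
    (if (and (pair? lines) (string-null? (last lines)))
        (drop-right lines 1)
        lines)))
--8<---------------cut here---------------end--------------->8---

Unlike the loop above, this keeps the lines in file order, at the cost of holding the whole 32 MiB string in memory while splitting.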

Cheers!

--
Pierre Neidhardt
https://ambrevar.xyz/
