guix-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: File search progress: database review and question on triggers


From: Arun Isaac
Subject: Re: File search progress: database review and question on triggers
Date: Thu, 13 Aug 2020 05:47:18 +0530

Hi,

> 1. I tried to fine-tune the SQL a bit:
>   - Open/close the database only once for the whole indexing.
>   - Use "insert" instead of "insert or replace".
>   - Use numeric ID as key instead of path.
>
>   Result: Still around 15-20 minutes to build.  Switching to numeric
>   indices shrank the database by half.

sqlite insert statements can be very fast. sqlite.org claims 50000 or
more insert statements per second. But in order to achieve that speed
all insert statements have to be grouped together in a single
transaction. See https://www.sqlite.org/faq.html#q19

>   A string-contains filter takes less than 1 second.

Guile's string-contains function uses a naive O(nk) implementation,
where 'n' is the length of string s1 and 'k' is the length of string
s2. If it was implemented using the Knuth-Morris-Pratt algorithm, it
could cost only O(n+k). So, there is some scope for improvement here. In
fact, a comment on line 2007 of libguile/srfi-13.c in the guile source
tree makes this very point.

> I need to measure the time SQL takes for a regexp match.

sqlite, by default, does not come with regexp support. You might have to
load some external library. See
https://www.sqlite.org/lang_expr.html#the_like_glob_regexp_and_match_operators

--8<---------------cut here---------------start------------->8---
The REGEXP operator is a special syntax for the regexp() user
function. No regexp() user function is defined by default and so use of
the REGEXP operator will normally result in an error message. If an
application-defined SQL function named "regexp" is added at run-time,
then the "X REGEXP Y" operator will be implemented as a call to
"regexp(Y,X)".
--8<---------------cut here---------------end--------------->8---

Regards,
Arun

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]