unofficial mirror of guix-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Arun Isaac <arunisaac@systemreboot.net>
To: Pierre Neidhardt <mail@ambrevar.xyz>,
	Ricardo Wurmus <rekado@elephly.net>
Cc: guix-devel@gnu.org
Subject: Re: File search progress: database review and question on triggers
Date: Thu, 13 Aug 2020 05:47:18 +0530	[thread overview]
Message-ID: <87eeobh01d.fsf@systemreboot.net> (raw)
In-Reply-To: <87r1sbel4f.fsf@ambrevar.xyz>

[-- Attachment #1: Type: text/plain, Size: 1756 bytes --]


Hi,

> 1. I tried to fine-tune the SQL a bit:
>   - Open/close the database only once for the whole indexing.
>   - Use "insert" instead of "insert or replace".
>   - Use numeric ID as key instead of path.
>
>   Result: Still around 15-20 minutes to build.  Switching to numeric
>   indices shrank the database by half.

sqlite insert statements can be very fast. sqlite.org claims 50000 or
more insert statements per second. But in order to achieve that speed
all insert statements have to be grouped together in a single
transaction. See https://www.sqlite.org/faq.html#q19

>   A string-contains filter takes less than 1 second.

Guile's string-contains function uses a naive O(nk) implementation,
where 'n' is the length of string s1 and 'k' is the length of string
s2. If it was implemented using the Knuth-Morris-Pratt algorithm, it
could cost only O(n+k). So, there is some scope for improvement here. In
fact, a comment on line 2007 of libguile/srfi-13.c in the guile source
tree makes this very point.

> I need to measure the time SQL takes for a regexp match.

sqlite, by default, does not come with regexp support. You might have to
load some external library. See
https://www.sqlite.org/lang_expr.html#the_like_glob_regexp_and_match_operators

--8<---------------cut here---------------start------------->8---
The REGEXP operator is a special syntax for the regexp() user
function. No regexp() user function is defined by default and so use of
the REGEXP operator will normally result in an error message. If an
application-defined SQL function named "regexp" is added at run-time,
then the "X REGEXP Y" operator will be implemented as a call to
"regexp(Y,X)".
--8<---------------cut here---------------end--------------->8---

Regards,
Arun

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

  parent reply	other threads:[~2020-08-13  0:17 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-10 14:32 File search progress: database review and question on triggers Pierre Neidhardt
2020-08-11  9:43 ` Mathieu Othacehe
2020-08-11 12:35   ` Pierre Neidhardt
2020-08-15 12:48     ` Hartmut Goebel
2020-08-11 15:43 ` Ricardo Wurmus
2020-08-11 17:54   ` Pierre Neidhardt
2020-08-11 17:58     ` Pierre Neidhardt
2020-08-11 20:08       ` Ricardo Wurmus
2020-08-12 19:10         ` Pierre Neidhardt
2020-08-12 20:13           ` Julien Lepiller
2020-08-12 20:43             ` Pierre Neidhardt
2020-08-12 21:29               ` Julien Lepiller
2020-08-12 22:29                 ` Ricardo Wurmus
2020-08-13  6:55                   ` Pierre Neidhardt
2020-08-13  6:52                 ` Pierre Neidhardt
2020-08-13  9:34                   ` Ricardo Wurmus
2020-08-13 10:04                     ` Pierre Neidhardt
2020-08-15 12:47                       ` Hartmut Goebel
2020-08-15 21:20                         ` Bengt Richter
2020-08-16  8:18                           ` Hartmut Goebel
2020-08-12 20:32           ` Pierre Neidhardt
2020-08-13  0:17           ` Arun Isaac [this message]
2020-08-13  6:58             ` Pierre Neidhardt
2020-08-13  9:40             ` Pierre Neidhardt
2020-08-13 10:08               ` Pierre Neidhardt
2020-08-13 11:47               ` Ricardo Wurmus
2020-08-13 13:44                 ` Pierre Neidhardt
2020-08-13 12:20               ` Arun Isaac
2020-08-13 13:53                 ` Pierre Neidhardt
2020-08-13 15:14                   ` Arun Isaac
2020-08-13 15:36                     ` Pierre Neidhardt
2020-08-13 15:56                       ` Pierre Neidhardt
2020-08-15 19:33                         ` Arun Isaac
2020-08-24  8:29                           ` Pierre Neidhardt
2020-08-24 10:53                             ` Pierre Neidhardt
2020-09-04 19:15                               ` Arun Isaac
2020-09-05  7:48                                 ` Pierre Neidhardt
2020-09-06  9:25                                   ` Arun Isaac
2020-09-06 10:05                                     ` Pierre Neidhardt
2020-09-06 10:33                                       ` Arun Isaac
2020-08-18 14:58 ` File search progress: database review and question on triggers OFF TOPIC PRAISE Joshua Branson
2020-08-27 10:00 ` File search progress: database review and question on triggers zimoun
2020-08-27 11:15   ` Pierre Neidhardt
2020-08-27 12:56     ` zimoun
2020-08-27 13:19       ` Pierre Neidhardt
2020-09-26 14:04         ` Pierre Neidhardt
2020-09-26 14:12           ` Pierre Neidhardt
2020-10-05 12:35           ` Ludovic Courtès
2020-10-05 18:53             ` Pierre Neidhardt
2020-10-09 21:16               ` zimoun
2020-10-10  8:57                 ` Pierre Neidhardt
2020-10-10 14:58                   ` zimoun
2020-10-12 10:16                   ` Ludovic Courtès
2020-10-12 11:18                     ` Pierre Neidhardt
2020-10-13 13:48                       ` Ludovic Courtès
2020-10-13 13:59                         ` Pierre Neidhardt
2020-10-10 16:03               ` zimoun
2020-10-11 11:19                 ` Pierre Neidhardt
2020-10-11 13:02                   ` zimoun
2020-10-11 14:25                     ` Pierre Neidhardt
2020-10-11 16:05                       ` zimoun
2020-10-12 10:20               ` Ludovic Courtès
2020-10-12 11:21                 ` Pierre Neidhardt
2020-10-13 13:45                   ` Ludovic Courtès
2020-10-13 13:56                     ` Pierre Neidhardt
2020-10-13 21:22                       ` Ludovic Courtès
2020-10-14  7:50                         ` Pierre Neidhardt
2020-10-16 10:30                           ` Ludovic Courtès
2020-10-17  9:14                             ` Pierre Neidhardt
2020-10-17 19:17                               ` Pierre Neidhardt
2020-10-21  9:53                               ` Ludovic Courtès
2020-10-21  9:58                                 ` Pierre Neidhardt
2020-10-12 11:23                 ` zimoun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87eeobh01d.fsf@systemreboot.net \
    --to=arunisaac@systemreboot.net \
    --cc=guix-devel@gnu.org \
    --cc=mail@ambrevar.xyz \
    --cc=rekado@elephly.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).