From: Pierre Neidhardt <mail@ambrevar.xyz>
To: Ricardo Wurmus <rekado@elephly.net>
Cc: guix-devel@gnu.org
Subject: Re: File search progress: database review and question on triggers
Date: Wed, 12 Aug 2020 21:10:08 +0200 [thread overview]
Message-ID: <87r1sbel4f.fsf@ambrevar.xyz> (raw)
In-Reply-To: <87pn7x3pyw.fsf@elephly.net>
[-- Attachment #1: Type: text/plain, Size: 1764 bytes --]
I've done some benchmarking.
1. I tried to fine-tune the SQL a bit:
- Open/close the database only once for the whole indexing.
- Use "insert" instead of "insert or replace".
- Use numeric ID as key instead of path.
Result: Still around 15-20 minutes to build. Switching to numeric
indices shrank the database by half.
2. I've tried with the following naive 1-file-per-line format:
--8<---------------cut here---------------start------------->8---
"/gnu/store/97p5gvb7jglmn9jpgwwf5al1798wi61f-acl-2.2.53//share/man/man5/acl.5.gz"
"/gnu/store/97p5gvb7jglmn9jpgwwf5al1798wi61f-acl-2.2.53//share/man/man3/acl_add_perm.3.gz"
"/gnu/store/97p5gvb7jglmn9jpgwwf5al1798wi61f-acl-2.2.53//share/man/man3/acl_calc_mask.3.gz"
...
--8<---------------cut here---------------end--------------->8---
Result: Takes between 20 and 2 minutes to complete and the result is
32 MiB big. (I don't know why the timing varies.)
A string-contains filter takes less than 1 second.
A string-match (regex) search takes some 3 seconds (Ryzen 5 @ 3.5
GHz). I'm not sure if we can go faster. I need to measure the time
SQL takes for a regexp match.
Question: Any idea how to load the database as fast as possible? I
tried the following, it takes 1.5s on my machine:
--8<---------------cut here---------------start------------->8---
(define (load-textual-database)
(call-with-input-file %textual-db
(lambda (port)
(let loop ((line (get-line port))
(result '()))
(if (string? line)
(loop (get-line port) (cons line result))
result)))))
--8<---------------cut here---------------end--------------->8---
Cheers!
--
Pierre Neidhardt
https://ambrevar.xyz/
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]
next prev parent reply other threads:[~2020-08-12 19:10 UTC|newest]
Thread overview: 73+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-10 14:32 File search progress: database review and question on triggers Pierre Neidhardt
2020-08-11 9:43 ` Mathieu Othacehe
2020-08-11 12:35 ` Pierre Neidhardt
2020-08-15 12:48 ` Hartmut Goebel
2020-08-11 15:43 ` Ricardo Wurmus
2020-08-11 17:54 ` Pierre Neidhardt
2020-08-11 17:58 ` Pierre Neidhardt
2020-08-11 20:08 ` Ricardo Wurmus
2020-08-12 19:10 ` Pierre Neidhardt [this message]
2020-08-12 20:13 ` Julien Lepiller
2020-08-12 20:43 ` Pierre Neidhardt
2020-08-12 21:29 ` Julien Lepiller
2020-08-12 22:29 ` Ricardo Wurmus
2020-08-13 6:55 ` Pierre Neidhardt
2020-08-13 6:52 ` Pierre Neidhardt
2020-08-13 9:34 ` Ricardo Wurmus
2020-08-13 10:04 ` Pierre Neidhardt
2020-08-15 12:47 ` Hartmut Goebel
2020-08-15 21:20 ` Bengt Richter
2020-08-16 8:18 ` Hartmut Goebel
2020-08-12 20:32 ` Pierre Neidhardt
2020-08-13 0:17 ` Arun Isaac
2020-08-13 6:58 ` Pierre Neidhardt
2020-08-13 9:40 ` Pierre Neidhardt
2020-08-13 10:08 ` Pierre Neidhardt
2020-08-13 11:47 ` Ricardo Wurmus
2020-08-13 13:44 ` Pierre Neidhardt
2020-08-13 12:20 ` Arun Isaac
2020-08-13 13:53 ` Pierre Neidhardt
2020-08-13 15:14 ` Arun Isaac
2020-08-13 15:36 ` Pierre Neidhardt
2020-08-13 15:56 ` Pierre Neidhardt
2020-08-15 19:33 ` Arun Isaac
2020-08-24 8:29 ` Pierre Neidhardt
2020-08-24 10:53 ` Pierre Neidhardt
2020-09-04 19:15 ` Arun Isaac
2020-09-05 7:48 ` Pierre Neidhardt
2020-09-06 9:25 ` Arun Isaac
2020-09-06 10:05 ` Pierre Neidhardt
2020-09-06 10:33 ` Arun Isaac
2020-08-18 14:58 ` File search progress: database review and question on triggers OFF TOPIC PRAISE Joshua Branson
2020-08-27 10:00 ` File search progress: database review and question on triggers zimoun
2020-08-27 11:15 ` Pierre Neidhardt
2020-08-27 12:56 ` zimoun
2020-08-27 13:19 ` Pierre Neidhardt
2020-09-26 14:04 ` Pierre Neidhardt
2020-09-26 14:12 ` Pierre Neidhardt
2020-10-05 12:35 ` Ludovic Courtès
2020-10-05 18:53 ` Pierre Neidhardt
2020-10-09 21:16 ` zimoun
2020-10-10 8:57 ` Pierre Neidhardt
2020-10-10 14:58 ` zimoun
2020-10-12 10:16 ` Ludovic Courtès
2020-10-12 11:18 ` Pierre Neidhardt
2020-10-13 13:48 ` Ludovic Courtès
2020-10-13 13:59 ` Pierre Neidhardt
2020-10-10 16:03 ` zimoun
2020-10-11 11:19 ` Pierre Neidhardt
2020-10-11 13:02 ` zimoun
2020-10-11 14:25 ` Pierre Neidhardt
2020-10-11 16:05 ` zimoun
2020-10-12 10:20 ` Ludovic Courtès
2020-10-12 11:21 ` Pierre Neidhardt
2020-10-13 13:45 ` Ludovic Courtès
2020-10-13 13:56 ` Pierre Neidhardt
2020-10-13 21:22 ` Ludovic Courtès
2020-10-14 7:50 ` Pierre Neidhardt
2020-10-16 10:30 ` Ludovic Courtès
2020-10-17 9:14 ` Pierre Neidhardt
2020-10-17 19:17 ` Pierre Neidhardt
2020-10-21 9:53 ` Ludovic Courtès
2020-10-21 9:58 ` Pierre Neidhardt
2020-10-12 11:23 ` zimoun
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://guix.gnu.org/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87r1sbel4f.fsf@ambrevar.xyz \
--to=mail@ambrevar.xyz \
--cc=guix-devel@gnu.org \
--cc=rekado@elephly.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/guix.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).