On August 12, 2020 at 15:10:08 GMT-04:00, Pierre Neidhardt <mail@ambrevar.xyz> wrote:
I've done some benchmarking.
1. I tried to fine-tune the SQL a bit:
- Open/close the database only once for the whole indexing.
- Use "insert" instead of "insert or replace".
- Use numeric ID as key instead of path.
Result: Still around 15-20 minutes to build. Switching to numeric
indices shrank the database by half.
2. I've tried with the following naive 1-file-per-line format:
--8<---------------cut here---------------start------------->8---
"/gnu/store/97p5gvb7jglmn9jpgwwf5al1798wi61f-acl-2.2.53//share/man/man5/acl.5.gz"
"/gnu/store/97p5gvb7jglmn9jpgwwf5al1798wi61f-acl-2.2.53//share/man/man3/acl_add_perm.3.gz"
"/gnu/store/97p5gvb7jglmn9jpgwwf5al1798wi61f-acl-2.2.53//share/man/man3/acl_calc_mask.3.gz"
...
--8<---------------cut here---------------end--------------->8---
Result: Takes between 2 and 20 minutes to complete, and the result is
32 MiB. (I don't know why the timing varies.)
A string-contains filter takes less than 1 second.
A string-match (regex) search takes about 3 seconds (Ryzen 5 @ 3.5
GHz). I'm not sure we can go faster. I still need to measure how long
SQL takes for a regexp match.
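For reference, the two kinds of filter could look roughly like the
following. This is a minimal sketch, not the code that was benchmarked:
`search-substring` and `search-regexp` are hypothetical names, and it
assumes the database has already been loaded as a list of path strings.

```scheme
;; Hypothetical helpers, for illustration only.
;; ENTRIES is a list of path strings, one per indexed file.
;; string-contains, make-regexp, regexp-exec and filter are Guile built-ins.

(define (search-substring entries needle)
  "Return the elements of ENTRIES that contain NEEDLE literally."
  (filter (lambda (entry) (string-contains entry needle))
          entries))

(define (search-regexp entries pattern)
  "Return the elements of ENTRIES that match the POSIX regexp PATTERN."
  ;; Compile the pattern once, outside the loop over ENTRIES.
  (let ((rx (make-regexp pattern)))
    (filter (lambda (entry) (regexp-exec rx entry))
            entries)))
```

Compiling the regexp once rather than per entry is the main design
choice here. Whether it accounts for any of the difference between the
two timings above has not been measured.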
Question: Any idea how to load the database as fast as possible? I
tried the following, which takes 1.5 s on my machine:
--8<---------------cut here---------------start------------->8---
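(use-modules (ice-9 textual-ports))     ; for get-line

(define (load-textual-database)
  ;; Return the lines of %textual-db as a list, in reverse file order.
  (call-with-input-file %textual-db
    (lambda (port)
      (let loop ((line (get-line port))
                 (result '()))
        (if (string? line)              ; get-line returns the EOF object at end of file
            (loop (get-line port) (cons line result))
            result)))))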
--8<---------------cut here---------------end--------------->8---
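One variant that might be worth timing is to read the whole file in a
single call and split it afterwards, instead of reading line by line.
This is an untested idea rather than a measured improvement:
`load-textual-database/slurp` is a hypothetical name, and it takes the
file name as an argument instead of using %textual-db.

```scheme
(use-modules (ice-9 textual-ports))     ; for get-string-all

;; Hypothetical alternative loader, for illustration only.
(define (load-textual-database/slurp file)
  "Return the non-empty lines of FILE as a list of strings, in file order."
  (let ((content (call-with-input-file file get-string-all)))
    ;; Drop empty strings, such as the one a trailing newline leaves
    ;; after the split.
    (filter (lambda (line) (not (string-null? line)))
            (string-split content #\newline))))
```

Unlike the loop above, this returns the entries in file order, and it
holds the whole file contents in memory as one string before splitting.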
Cheers!
--
Pierre Neidhardt
https://ambrevar.xyz/