From: Julien Lepiller <julien@lepiller.eu>
To: Pierre Neidhardt <mail@ambrevar.xyz>,
guix-devel@gnu.org, Ricardo Wurmus <rekado@elephly.net>
Subject: Re: File search progress: database review and question on triggers
Date: Wed, 12 Aug 2020 17:29:39 -0400
Message-ID: <72845E8B-35E1-4A27-95E6-452D1D1F626B@lepiller.eu>
In-Reply-To: <87o8nfy4qu.fsf@ambrevar.xyz>
Why wouldn't it help? Can't you make it a trie from basename -> complete name? If I'm looking for "libcord.so" (which is a key in the trie), I don't think I need to look at every path. I only need to follow the trie until I reach a pointer to some structure that contains the data I'm looking for (e.g. a list of complete filenames).
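To make the basename -> complete name idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration, not the code from nani's trie.scm: the class name BasenameTrie, its methods, and the store paths are all made up for the example.

```python
import posixpath


class BasenameTrie:
    """Character trie keyed by basename; a node stores the full paths ending there.

    Hypothetical illustration only, not the nani trie.scm implementation.
    """

    def __init__(self):
        self.children = {}  # next character -> child node
        self.paths = []     # complete file names whose basename ends at this node

    def insert(self, full_path):
        node = self
        for ch in posixpath.basename(full_path):
            node = node.children.setdefault(ch, BasenameTrie())
        node.paths.append(full_path)

    def lookup(self, basename):
        # Follow one edge per character; no other path is ever examined.
        node = self
        for ch in basename:
            node = node.children.get(ch)
            if node is None:
                return []
        return list(node.paths)


# Made-up store paths, for illustration only.
trie = BasenameTrie()
for path in [
    "/gnu/store/aaa-libgc-8.0.4/lib/libcord.so",
    "/gnu/store/aaa-libgc-8.0.4/lib/libgc.so",
    "/gnu/store/bbb-hello-2.10/bin/hello",
]:
    trie.insert(path)

print(trie.lookup("libcord.so"))
```

The cost of a lookup depends on the length of the basename, not on the number of entries. This only covers exact basename matches; a substring or glob search would still need a different strategy.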
On August 12, 2020 16:43:37 GMT-04:00, Pierre Neidhardt <mail@ambrevar.xyz> wrote:
>Julien Lepiller <julien@lepiller.eu> writes:
>
>> Have you tried something more structured? I have some code for
>> creating a binary search tree and even compressing/decompressing
>> strings with Huffman, as well as code to serialize all that (my
>> deserialization is in Java though, so not very useful to you):
>> https://framagit.org/nani-project/nani-website
>>
>> See modules/nani/trie.scm for instance.
>>
>> The idea is to have a binary search tree whose keys are filenames and
>> whose values are pointers in the file to a structure that holds data
>> for this filename (packages and versions of Guix for instance). It's
>> super fast to look it up, because you don't load the whole file, you
>> only seek to the right position. It's a lot longer to build, and not
>> so easy to update though.
>
>Assuming we'd have only 1 Guix generation per file, I'm not sure a trie
>would help because we _always_ need to search _all_ file paths, so in
>all cases we've got some 10,000+ entries to load in memory and loop
>over them.
>
>The total number of entries is the bottleneck here, both for the
>database load and the individual search queries.
>
>An obvious optimization is to memoize the database load. This has a
>significant memory cost though.
>
>The trie could be helpful for multiple Guix generations in the same
>database, but I'm not sure it warrants the increased complexity.
>
>Thoughts?
>
>--
>Pierre Neidhardt
>https://ambrevar.xyz/
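
The "seek to the right position" lookup described in the quoted text can be sketched as follows. This is a simplified stand-in, not the nani serialization format: it uses a sorted offset table with binary search instead of a serialized tree, and the byte layout and the function names build_index and lookup are assumptions made up for the example.

```python
import io
import struct


def build_index(entries):
    """Serialize {basename: [full paths]} into a sorted, seekable byte string.

    Assumed layout (illustration only):
      uint32 record count, then one uint32 offset per record,
      then records of: uint16 key length, key, uint32 value length, value.
    """
    keys = sorted(entries, key=lambda k: k.encode("utf-8"))
    records = []
    for key in keys:
        k = key.encode("utf-8")
        v = "\n".join(entries[key]).encode("utf-8")
        records.append(struct.pack(">H", len(k)) + k + struct.pack(">I", len(v)) + v)
    offsets, pos = [], 4 + 4 * len(records)  # records start after the offset table
    for rec in records:
        offsets.append(pos)
        pos += len(rec)
    table = b"".join(struct.pack(">I", o) for o in offsets)
    return struct.pack(">I", len(records)) + table + b"".join(records)


def lookup(f, basename):
    """Binary-search the offset table, reading only one key per step."""
    target = basename.encode("utf-8")
    f.seek(0)
    (count,) = struct.unpack(">I", f.read(4))
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(4 + 4 * mid)
        (offset,) = struct.unpack(">I", f.read(4))
        f.seek(offset)
        (klen,) = struct.unpack(">H", f.read(2))
        key = f.read(klen)
        if key == target:
            (vlen,) = struct.unpack(">I", f.read(4))
            return f.read(vlen).decode("utf-8").split("\n")
        if key < target:
            lo = mid + 1
        else:
            hi = mid
    return []


# Made-up entries; io.BytesIO stands in for a file opened in binary mode.
index = io.BytesIO(build_index({
    "libcord.so": ["/gnu/store/aaa-libgc-8.0.4/lib/libcord.so"],
    "hello": ["/gnu/store/bbb-hello-2.10/bin/hello"],
}))
print(lookup(index, "libcord.so"))
```

Each query reads a number of keys logarithmic in the number of entries, so nothing has to be loaded up front. The trade-off is the one mentioned in the quoted text: the file has to be rebuilt (or carefully patched) when entries change.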