Why wouldn't it help? Can't you make it a trie from basename -> complete name? If I'm looking for "libcord.so" (which is a key in the trie), I don't think I need to look for every path. I only need to follow the trie until I find a pointer to some structure that contains the data I look for (ex: a list of complete filenames).

On 2020年8月12日 16:43:37 GMT-04:00, Pierre Neidhardt <mail@ambrevar.xyz> wrote:
Julien Lepiller <julien@lepiller.eu> writes:

Have you tried something more structured? I have some code for creating a binary search tree and even compressing/decompressing strings with huffman, as well as code to serialize all that (my deserialization is in Java though, so not very useful to you): https://framagit.org/nani-project/nani-website

See modules/nani/trie.scm for instance.

The idea is to have a binary search tree whose keys are filenames and value is a pointer in the file to a structure that holds data for this filerame (packages and versions of guix for instance). It's super fast to look it up, because you don't load the whole file, you only seek to the right position. It's a lot longer to build, and not so easy to update though.

Assuming we'd have only 1 Guix generation per file, I'm not sure a trie
would help because we _always_ need to search _all_ file paths, so in
all cases we've got some 10,000+ entries to load in memory and loop over
them.

The total number of entries is the bottleneck here, both for the
database load and the individual search queries.

An obvious optimization is to memoize the database load. This has a
significant memory cost though.

The trie could be helpful for multiple Guix generations in the same
database, but I'm not sure it warrant the increased complexity.

Thoughts?