Julien Lepiller <julien@lepiller.eu> writes:

> Have you tried something more structured? I have some code for creating a binary search tree and even compressing/decompressing strings with huffman, as well as code to serialize all that (my deserialization is in Java though, so not very useful to you): https://framagit.org/nani-project/nani-website
>
> See modules/nani/trie.scm for instance.
>
> The idea is to have a binary search tree whose keys are filenames and value is a pointer in the file to a structure that holds data for this filerame (packages and versions of guix for instance). It's super fast to look it up, because you don't load the whole file, you only seek to the right position. It's a lot longer to build, and not so easy to update though.

Assuming we'd have only 1 Guix generation per file, I'm not sure a trie
would help because we _always_ need to search _all_ file paths, so in
all cases we've got some 10,000+ entries to load in memory and loop over
them.

The total number of entries is the bottleneck here, both for the
database load and the individual search queries.

An obvious optimization is to memoize the database load.  This has a
significant memory cost though.

The trie could be helpful for multiple Guix generations in the same
database, but I'm not sure it warrant the increased complexity.

Thoughts?

-- 
Pierre Neidhardt
https://ambrevar.xyz/