Why wouldn't it help? Can't you make it a trie from basename -> complete name? If I'm looking for "libcord.so" (which is a key in the trie), I don't think I need to look for every path. I only need to follow the trie until I find a pointer to some structure that contains the data I look for (ex: a list of complete filenames). On 2020年8月12日 16:43:37 GMT-04:00, Pierre Neidhardt wrote: >Julien Lepiller writes: > >> Have you tried something more structured? I have some code for >creating a binary search tree and even compressing/decompressing >strings with huffman, as well as code to serialize all that (my >deserialization is in Java though, so not very useful to you): >https://framagit.org/nani-project/nani-website >> >> See modules/nani/trie.scm for instance. >> >> The idea is to have a binary search tree whose keys are filenames and >value is a pointer in the file to a structure that holds data for this >filerame (packages and versions of guix for instance). It's super fast >to look it up, because you don't load the whole file, you only seek to >the right position. It's a lot longer to build, and not so easy to >update though. > >Assuming we'd have only 1 Guix generation per file, I'm not sure a trie >would help because we _always_ need to search _all_ file paths, so in >all cases we've got some 10,000+ entries to load in memory and loop >over >them. > >The total number of entries is the bottleneck here, both for the >database load and the individual search queries. > >An obvious optimization is to memoize the database load. This has a >significant memory cost though. > >The trie could be helpful for multiple Guix generations in the same >database, but I'm not sure it warrant the increased complexity. > >Thoughts? > >-- >Pierre Neidhardt >https://ambrevar.xyz/