Hi Mathieu,

Thanks for your comments! Answers below.

>> 3. Size of the database:
>> I've persisted all locally-present store items for my current Guix version
>> and it produced a database of 72 MiB. It compresses down to 8 MiB in zstd.
>>
>> But since we can have multiple Guix versions, this means that the
>> packages have one entry per store path, so we might end up with more
>> entries than that as the number of Guix generations grows.
>
> I'm not sure we actually need to save the full history. I think we could
> just store that the package X produces [Y1, Y2, ...] executable files. Then
> on package X update, the executable files list could be refreshed.

Maybe you are missing some context. The original discussion is here:
https://lists.gnu.org/archive/html/guix-devel/2020-01/msg00019.html.

Unlike Nix, we would like to do more than just index executable files.
Indeed, it's very useful to know where to find, say, a C header, a .so
library, a TeXlive .sty file, etc.

However, as you hinted, maybe it's unnecessary to save the file listings of
the packages for every Guix version. Maybe we could store only the "diffs"
between Guix generations. I don't know if SQLite supports this. If not, it
sounds like a rather complex thing to do.

But really, if the compressed database over multiple Guix generations is
under 100 MiB, then size is not a big problem.

>> Question: Should we include empty directories in the database? I'm tempted
>> to answer no.
>
> I would also say no, and also exclude non-executable files.

See above: I think we would lose a lot by not including non-executable files.

>> Question: This bounds us to the SQLite syntax for pattern matching. Is it a
>> problem?
>> It seems powerful enough in practice. But maybe we can use regular
>> expression in SQLite as well?
>
> From the UI perspective, we already have "guix search" that expects a
> regex.
> If we were to include a "guix file-search" command, then I think
> it would make sense that it uses the same regex syntax.

I found out that SQLite has a REGEXP operator; I'll see if it works well
enough.

>> 7. Have substitute servers distribute database content. When the user performs
>> a file search, Guix asks the substitute server for a database update. Only
>> the diff should be sent over the network, not the whole thing since it might
>
> If I understand correctly, you are proposing to create local databases
> that would be kept in sync with a master database populated by the CI
> server. This seems a bit complex.
>
> What about extending Cuirass database to add the two tables you are
> proposing. Then, each time a package is built, if the version is
> updated, the "Files" table would be updated.
>
> Then we could add an HTTP interface such as "/search/file?query=libxxx"
> to Cuirass, that would directly query the database. In Guix itself, we
> could add the counterpart in the (guix ci) module.

The problem with this approach is that it would not work offline. In my
opinion, this is a big limitation. I'd rather have a local database.

Besides, we need a local database for non-official, locally-built packages
anyway (which Cuirass would not know about). Since this is a requirement,
the only missing piece is database synchronization.

Thoughts?

--
Pierre Neidhardt
https://ambrevar.xyz/
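A minimal sketch of the REGEXP point, in Python purely for illustration. SQLite parses `X REGEXP Y`, but it ships no regexp() function by default, so the application must register one; SQLite then calls it as regexp(Y, X), with the pattern first. The `Packages`/`Files` schema, the helper names (`open_index`, `add_package`, `search_files`) and the sample rows are all hypothetical. Python's `re` syntax is also only a stand-in for the regex flavour "guix search" uses. This assumes whatever SQLite binding Guix ends up using can register user functions.

```python
import re
import sqlite3

# Hypothetical schema for illustration only: table and column names are
# assumptions, not the layout proposed in this thread.
SCHEMA = """
CREATE TABLE IF NOT EXISTS Packages (
  id      INTEGER PRIMARY KEY,
  name    TEXT NOT NULL,
  version TEXT NOT NULL,
  output  TEXT NOT NULL DEFAULT 'out'
);
CREATE TABLE IF NOT EXISTS Files (
  package INTEGER NOT NULL REFERENCES Packages(id),
  path    TEXT NOT NULL  -- relative to the package's store item
);
"""


def regexp(pattern, value):
    """Back end for SQLite's REGEXP operator.

    SQLite rewrites `X REGEXP Y` into the call regexp(Y, X), so the
    pattern arrives first and the column value second.
    """
    if pattern is None or value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


def open_index(path=":memory:"):
    conn = sqlite3.connect(path)
    # SQLite defines no regexp() function by default, so REGEXP fails
    # with "no such function" until the application registers one.
    conn.create_function("regexp", 2, regexp)
    conn.executescript(SCHEMA)
    return conn


def add_package(conn, name, version, files, output="out"):
    """Insert one package output and its file listing; return its row id."""
    cur = conn.execute(
        "INSERT INTO Packages (name, version, output) VALUES (?, ?, ?)",
        (name, version, output),
    )
    conn.executemany(
        "INSERT INTO Files (package, path) VALUES (?, ?)",
        [(cur.lastrowid, f) for f in files],
    )
    return cur.lastrowid


def search_files(conn, pattern):
    """Return (name, version, output, path) rows whose path matches pattern."""
    return conn.execute(
        """SELECT p.name, p.version, p.output, f.path
           FROM Files AS f JOIN Packages AS p ON p.id = f.package
           WHERE f.path REGEXP ?
           ORDER BY p.name, f.path""",
        (pattern,),
    ).fetchall()


if __name__ == "__main__":
    conn = open_index()
    # Made-up sample rows, not real Guix file listings.
    add_package(conn, "zlib", "1.2.11", ["include/zlib.h", "lib/libz.so.1"])
    add_package(conn, "coreutils", "8.31", ["bin/cat", "bin/ls"])
    for row in search_files(conn, r"\.h$"):
        print(row)
```

Since the database is local, a query like this works offline and covers locally-built packages too. Only the function registered under the name regexp would need to change to match the "guix search" regex syntax.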