From: "Ludovic Courtès" <ludo@gnu.org>
To: Guix Devel <guix-devel@gnu.org>
Subject: File search
Date: Fri, 21 Jan 2022 10:03:43 +0100
Message-ID: <8735lh5ukw.fsf@inria.fr>
[-- Attachment #1: Type: text/plain, Size: 2372 bytes --]
Hello Guix!
Lately I found myself going to <https://packages.debian.org> several
times to look for packages providing a given file, and I thought it’s
time to do something about it.

The script below creates an SQLite database for the current set of
packages, but only for those already in the store:

  guix repl file-database.scm populate

That creates /tmp/db; it took about 25 minutes on berlin for 18K
packages.  Then you can run, say:

  guix repl file-database.scm search boot-9.scm

to find which packages provide a file named ‘boot-9.scm’.  That part is
instantaneous.
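
To illustrate what the search matches on, here is a small sketch of how
the attached script splits each file name before storing it: the name
relative to the package output goes into Files.name, and the last
component goes into the indexed Files.basename column.  The store
directory below is a made-up placeholder, and the variable names are
only for illustration; the two expressions mirror the 'strip' helper and
the 'basename' call in 'insert-files'.

```scheme
;; Sketch only.  DIRECTORY is a made-up placeholder, not a real store item.
(define directory "/gnu/store/HASH-guile-1.0")

(define file
  (string-append directory "/share/guile/3.0/ice-9/boot-9.scm"))

;; Same computation as 'strip' in 'insert-files': drop the directory
;; name and the slash that follows it.
(define relative
  (string-drop file (+ (string-length directory) 1)))

;; This is the value that 'search boot-9.scm' is compared against.
(define base
  (basename file))

(format #t "~a -> ~a~%" relative base)
```

Because only the basename is indexed, a search for ‘boot-9.scm’ would
match, whereas a search for ‘ice-9/boot-9.scm’ would not.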
The database for 18K packages is quite big:
--8<---------------cut here---------------start------------->8---
$ du -h /tmp/db*
389M /tmp/db
82M /tmp/db.gz
61M /tmp/db.zst
--8<---------------cut here---------------end--------------->8---
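
For a rough sense of what compression buys, the figures above can be
turned into percentages of the uncompressed size.  This is plain
arithmetic on the 'du' output; the names 'sizes' and 'percent-of-raw'
are made up for this sketch.

```scheme
;; Sizes as reported by 'du -h' above, in the unit 'du' printed (M).
(define sizes '((raw . 389) (gzip . 82) (zstd . 61)))

(define (percent-of-raw size)
  "Return SIZE as an integer percentage of the uncompressed database."
  (inexact->exact
   (round (* 100. (/ size (assq-ref sizes 'raw))))))

(for-each (lambda (entry)
            (format #t "~a: ~aM (~a%)~%"
                    (car entry) (cdr entry)
                    (percent-of-raw (cdr entry))))
          sizes)
;; gzip comes out at about 21% of the raw size, zstd at about 16%.
```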
How do we expose that information? There are several criteria I can
think of: accuracy, freshness, privacy, responsiveness, off-line
operation.
I think accuracy (making sure you get results that correspond precisely
to, say, your current channel revisions and your current system) is not
a high priority: some result is better than no result. Likewise for
freshness: results for an older version of a given package may still be
valid now.
In terms of privacy, I think it’s better if we can avoid making one
request per file searched for. Off-line operation would be sweet, and
it comes with responsiveness; fast off-line search is necessary for
things like ‘command-not-found’ (where the shell tells you what package
to install when a command is not found).
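
As a rough sketch of the ‘command-not-found’ case, assuming the client
has a local table mapping command names to package names: the lookup is
purely local, so no request leaves the machine.  The table below is
hand-written sample data and '%sample-index' and 'command-not-found' are
hypothetical names; a real table would be derived from the database.

```scheme
(use-modules (ice-9 match))

;; Hand-written sample data: each entry is a command name followed by
;; the packages assumed to provide it.
(define %sample-index
  '(("guile" "guile")
    ("ls" "coreutils")
    ("gcc" "gcc-toolchain")))

(define (command-not-found command)
  "Return a hint string for COMMAND based on %SAMPLE-INDEX."
  (match (assoc-ref %sample-index command)
    ((or #f ())
     (string-append command ": command not found"))
    ((packages ...)
     (string-append command ": command not found; try installing: "
                    (string-join packages ", ")))))

(display (command-not-found "ls"))
(newline)
```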
Based on that, it is tempting to just distribute a full database from
ci.guix, say, that the client command would regularly fetch.  The
downside is that this is quite a lot of data to download (61M even with
zstd); if you use the file search command infrequently, you might find
yourself spending more time downloading the database than actually
searching it.
We could have a hybrid solution: distribute a database that contains
only files in /bin and /sbin (it should be much smaller), and for
everything else, resort to a web service (the Data Service could be
extended to include file lists). That way, we’d have fast
privacy-respecting search for command names, and on-line search for
everything else.
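
A minimal sketch of how the smaller database could be selected, assuming
file names are stored relative to the package output as the attached
script does (for example "bin/guile").  'command-file?' is a
hypothetical helper, not part of the script, and the sample list is
hand-written.

```scheme
;; Hypothetical helper: true if FILE, a name relative to a package
;; output, lives under bin/ or sbin/.
(define (command-file? file)
  (or (string-prefix? "bin/" file)
      (string-prefix? "sbin/" file)))

;; Hand-written sample of relative file names.
(define sample
  '("bin/guile"
    "sbin/ldconfig"
    "share/guile/3.0/ice-9/boot-9.scm"
    "lib/libguile-3.0.so"))

;; Only the first two entries would go into the small local database;
;; the rest would be left to the on-line service.
(define commands (filter command-file? sample))
```

In 'insert-files', such a predicate could be applied to the result of
'strip' before binding the file statement.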
Thoughts?
Ludo’.
[-- Attachment #2: The file database tool --]
[-- Type: text/plain, Size: 7549 bytes --]
;;; GNU Guix --- Functional package management for GNU
;;; Copyright © 2022 Ludovic Courtès <ludo@gnu.org>
;;;
;;; This file is part of GNU Guix.
;;;
;;; GNU Guix is free software; you can redistribute it and/or modify it
;;; under the terms of the GNU General Public License as published by
;;; the Free Software Foundation; either version 3 of the License, or (at
;;; your option) any later version.
;;;
;;; GNU Guix is distributed in the hope that it will be useful, but
;;; WITHOUT ANY WARRANTY; without even the implied warranty of
;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;;; GNU General Public License for more details.
;;;
;;; You should have received a copy of the GNU General Public License
;;; along with GNU Guix. If not, see <http://www.gnu.org/licenses/>.
(define-module (file-database)
  #:use-module (sqlite3)
  #:use-module (ice-9 match)
  #:use-module (ice-9 format)           ;for the "~20a" directive below
  #:use-module (guix store)
  #:use-module (guix monads)
  #:autoload   (guix grafts) (%graft?)
  #:use-module (guix derivations)
  #:use-module (guix packages)
  #:autoload   (guix build utils) (find-files)
  #:autoload   (gnu packages) (fold-packages)
  #:use-module (srfi srfi-1)
  #:use-module (srfi srfi-9)
  #:export (file-database))
(define schema
  "
create table if not exists Packages (
  id        integer primary key autoincrement not null,
  name      text not null,
  version   text not null
);

create table if not exists Directories (
  id        integer primary key autoincrement not null,
  name      text not null,
  package   integer not null,
  foreign key (package) references Packages(id) on delete cascade
);

create table if not exists Files (
  name      text not null,
  basename  text not null,
  directory integer not null,
  foreign key (directory) references Directories(id) on delete cascade
);

create index if not exists IndexFiles on Files(basename);")
(define (call-with-database file proc)
  (let ((db (sqlite-open file)))
    (dynamic-wind
      (lambda () #t)
      (lambda ()
        (sqlite-exec db schema)
        (proc db))
      (lambda ()
        (sqlite-close db)))))
(define (insert-files db package version directories)
  "Insert the files contained in DIRECTORIES as belonging to PACKAGE at
VERSION."
  (define last-row-id-stmt
    (sqlite-prepare db "SELECT last_insert_rowid();"
                    #:cache? #t))

  (define package-stmt
    (sqlite-prepare db "\
INSERT OR REPLACE INTO Packages(name, version)
VALUES (:name, :version);"
                    #:cache? #t))

  (define directory-stmt
    (sqlite-prepare db "\
INSERT INTO Directories(name, package) VALUES (:name, :package);"
                    #:cache? #t))

  (define file-stmt
    (sqlite-prepare db "\
INSERT INTO Files(name, basename, directory)
VALUES (:name, :basename, :directory);"
                    #:cache? #t))

  (sqlite-exec db "begin immediate;")
  (sqlite-bind-arguments package-stmt
                         #:name package
                         #:version version)
  (sqlite-fold (const #t) #t package-stmt)
  (match (sqlite-fold cons '() last-row-id-stmt)
    ((#(package-id))
     (pk 'package package-id package)
     (for-each (lambda (directory)
                 (define (strip file)
                   (string-drop file (+ (string-length directory) 1)))

                 (sqlite-reset directory-stmt)
                 (sqlite-bind-arguments directory-stmt
                                        #:name directory
                                        #:package package-id)
                 (sqlite-fold (const #t) #t directory-stmt)

                 (match (sqlite-fold cons '() last-row-id-stmt)
                   ((#(directory-id))
                    (for-each (lambda (file)
                                ;; If DIRECTORY is a symlink, (find-files
                                ;; DIRECTORY) returns the DIRECTORY singleton.
                                (unless (string=? file directory)
                                  (sqlite-reset file-stmt)
                                  (sqlite-bind-arguments file-stmt
                                                         #:name (strip file)
                                                         #:basename
                                                         (basename file)
                                                         #:directory
                                                         directory-id)
                                  (sqlite-fold (const #t) #t file-stmt)))
                              (find-files directory)))))
               directories)
     (sqlite-exec db "commit;"))))
(define (insert-package db package)
  "Insert all the files of PACKAGE into DB."
  (mlet %store-monad ((drv (package->derivation package #:graft? #f)))
    (match (derivation->output-paths drv)
      (((labels . directories) ...)
       (when (every file-exists? directories)
         (insert-files db (package-name package) (package-version package)
                       directories))
       (return #t)))))
(define (insert-packages db)
  "Insert all the current packages into DB."
  (with-store store
    (parameterize ((%graft? #f))
      (fold-packages (lambda (package _)
                       (run-with-store store
                         (insert-package db package)))
                     #t
                     #:select? (lambda (package)
                                 (and (not (hidden-package? package))
                                      (not (package-superseded package))
                                      (supported-package? package)))))))
(define-record-type <package-match>
  (package-match name version file)
  package-match?
  (name    package-match-name)
  (version package-match-version)
  (file    package-match-file))
(define (matching-packages db file)
  "Return a list of <package-match> corresponding to packages containing
FILE."
  (define lookup-stmt
    (sqlite-prepare db "\
SELECT Packages.name, Packages.version, Directories.name, Files.name
FROM Packages
INNER JOIN Files, Directories
ON files.basename = :file AND directories.id = files.directory
AND packages.id = directories.package;"))

  (sqlite-bind-arguments lookup-stmt #:file file)
  (sqlite-fold (lambda (result lst)
                 (match result
                   (#(package version directory file)
                    (cons (package-match package version
                                         (string-append directory "/" file))
                          lst))))
               '() lookup-stmt))

(define (file-database . args)
  (match args
    ((_ "populate")
     (call-with-database "/tmp/db"
       (lambda (db)
         (insert-packages db))))
    ((_ "search" file)
     (let ((matches (call-with-database "/tmp/db"
                      (lambda (db)
                        (matching-packages db file)))))
       (for-each (lambda (result)
                   (format #t "~20a ~a~%"
                           (string-append (package-match-name result)
                                          "@" (package-match-version result))
                           (package-match-file result)))
                 matches)
       (exit (pair? matches))))
    (_
     (format (current-error-port)
             "usage: file-database [populate|search] args ...~%")
     (exit 1))))

(apply file-database (command-line))