all messages for Guix-related lists mirrored at yhetil.org
From: "Ludovic Courtès" <ludo@gnu.org>
To: Guix Devel <guix-devel@gnu.org>
Subject: File search
Date: Fri, 21 Jan 2022 10:03:43 +0100
Message-ID: <8735lh5ukw.fsf@inria.fr>

[-- Attachment #1: Type: text/plain, Size: 2372 bytes --]

Hello Guix!

Lately I found myself going to <https://packages.debian.org> several
times to look for packages providing a given file, and I thought it’s
time to do something about it.

The script below creates an SQLite database for the current set of
packages, but only for those already in the store:

  guix repl file-database.scm populate

That creates /tmp/db; it took about 25 minutes on berlin, for 18K packages.
Then you can run, say:

  guix repl file-database.scm search boot-9.scm

to find which packages provide a file named ‘boot-9.scm’.  That part is
instantaneous.
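
A minimal sketch of what gets recorded for each file, assuming the same
layout as the Files table in the attached script: the name relative to
the store directory, plus the base name, which is the column the search
matches on.  The helper ‘file->row’ and the store path are made up for
illustration; they are not part of the script.

```scheme
;; Hypothetical helper, not part of the script: compute the
;; (NAME . BASENAME) pair recorded for FILE found under DIRECTORY.
(define (file->row directory file)
  (cons (string-drop file (+ (string-length directory) 1)) ;relative name
        (basename file)))                                  ;search key

;; Made-up store path, for illustration only.
(file->row "/gnu/store/abc-guile-3.0.7"
           "/gnu/store/abc-guile-3.0.7/share/guile/3.0/ice-9/boot-9.scm")
;; evaluates to ("share/guile/3.0/ice-9/boot-9.scm" . "boot-9.scm")
```

Since only the base name is indexed, a search matches exact file names
such as ‘boot-9.scm’, not path fragments.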

The database for 18K packages is quite big:

--8<---------------cut here---------------start------------->8---
$ du -h /tmp/db*
389M    /tmp/db
82M     /tmp/db.gz
61M     /tmp/db.zst
--8<---------------cut here---------------end--------------->8---

How do we expose that information?  There are several criteria I can
think of: accuracy, freshness, privacy, responsiveness, off-line
operation.

I think accuracy (making sure you get results that correspond precisely
to, say, your current channel revisions and your current system) is not
a high priority: some result is better than no result.  Likewise for
freshness: results for an older version of a given package may still be
valid now.

In terms of privacy, I think it’s better if we can avoid making one
request per file searched for.  Off-line operation would be sweet, and
it comes with responsiveness; fast off-line search is necessary for
things like ‘command-not-found’ (where the shell tells you what package
to install when a command is not found).
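
To make the ‘command-not-found’ case concrete, here is a rough sketch of
an off-line lookup.  Everything in it is an assumption for illustration:
the in-memory table, the procedure names, and the placeholder command and
package names; a real hook would query a local database instead.

```scheme
(use-modules (ice-9 match))

;; Hypothetical in-memory index mapping a command name to the packages
;; that provide it.  The entries below are placeholders, not real results.
(define %commands (make-hash-table))

(define (register-command! command package)
  "Record that PACKAGE provides COMMAND."
  (hash-set! %commands command
             (cons package (hash-ref %commands command '()))))

(define (suggest command)
  "Return the message a 'command-not-found' hook could print for COMMAND."
  (match (hash-ref %commands command '())
    (()
     (format #f "~a: command not found" command))
    (packages
     (format #f "~a: command not found; provided by: ~a"
             command (string-join packages ", ")))))

(register-command! "frob" "frobnicator")          ;placeholder names
(display (suggest "frob"))
(newline)
```

The lookup touches only local data, so no request leaves the machine and
the answer comes back at shell-prompt speed.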

Based on that, it is tempting to just distribute a full database from
ci.guix, say, that the client command would regularly fetch.  The
downside is that that’s quite a lot of data to download; if you use the
file search command infrequently, you might find yourself spending more
time downloading the database than actually searching it.

We could have a hybrid solution: distribute a database that contains
only files in /bin and /sbin (it should be much smaller), and for
everything else, resort to a web service (the Data Service could be
extended to include file lists).  That way, we’d have fast
privacy-respecting search for command names, and on-line search for
everything else.
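
As a sketch of how the smaller database could be carved out, the
predicate below keeps only entries directly under bin/ or sbin/.  It
assumes file names are relative to the package’s store directory, as in
the ‘name’ column of the Files table; the procedure names are
hypothetical, and ignoring sub-directories of bin/ is my own choice here.

```scheme
(use-modules (srfi srfi-1))

;; Assumption: NAME is relative to the store directory, e.g. "bin/guile".
(define (command-file? name)
  "Return true if NAME lives directly under bin/ or sbin/."
  (any (lambda (prefix)
         (and (string-prefix? prefix name)
              ;; Reject entries in sub-directories such as "bin/sub/tool".
              (not (string-index name #\/ (string-length prefix)))))
       '("bin/" "sbin/")))

(define (command-files names)
  "Return the subset of NAMES that would go into the small database."
  (filter command-file? names))

(command-files '("bin/guile" "sbin/mkfs" "share/doc/README" "bin/sub/tool"))
;; evaluates to ("bin/guile" "sbin/mkfs")
```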

Thoughts?

Ludo’.


[-- Attachment #2: The file database tool --]
[-- Type: text/plain, Size: 7549 bytes --]

;;; GNU Guix --- Functional package management for GNU
;;; Copyright © 2022 Ludovic Courtès <ludo@gnu.org>
;;;
;;; This file is part of GNU Guix.
;;;
;;; GNU Guix is free software; you can redistribute it and/or modify it
;;; under the terms of the GNU General Public License as published by
;;; the Free Software Foundation; either version 3 of the License, or (at
;;; your option) any later version.
;;;
;;; GNU Guix is distributed in the hope that it will be useful, but
;;; WITHOUT ANY WARRANTY; without even the implied warranty of
;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;;; GNU General Public License for more details.
;;;
;;; You should have received a copy of the GNU General Public License
;;; along with GNU Guix.  If not, see <http://www.gnu.org/licenses/>.

(define-module (file-database)
  #:use-module (sqlite3)
  #:use-module (ice-9 match)
  #:use-module (guix store)
  #:use-module (guix monads)
  #:autoload   (guix grafts) (%graft?)
  #:use-module (guix derivations)
  #:use-module (guix packages)
  #:autoload   (guix build utils) (find-files)
  #:autoload   (gnu packages) (fold-packages)
  #:use-module (srfi srfi-1)
  #:use-module (srfi srfi-9)
  #:export (file-database))

(define schema
  "
create table if not exists Packages (
  id        integer primary key autoincrement not null,
  name      text not null,
  version   text not null
);

create table if not exists Directories (
  id        integer primary key autoincrement not null,
  name      text not null,
  package   integer not null,
  foreign key (package) references Packages(id) on delete cascade
);

create table if not exists Files (
  name      text not null,
  basename  text not null,
  directory integer not null,
  foreign key (directory) references Directories(id) on delete cascade
);

create index if not exists IndexFiles on Files(basename);")

(define (call-with-database file proc)
  (let ((db (sqlite-open file)))
    (dynamic-wind
      (lambda () #t)
      (lambda ()
        (sqlite-exec db schema)
        (proc db))
      (lambda ()
        (sqlite-close db)))))

(define (insert-files db package version directories)
  "Insert the files contained in DIRECTORIES as belonging to PACKAGE at
VERSION."
  (define last-row-id-stmt
    (sqlite-prepare db "SELECT last_insert_rowid();"
                    #:cache? #t))

  (define package-stmt
    (sqlite-prepare db "\
INSERT OR REPLACE INTO Packages(name, version)
VALUES (:name, :version);"
                    #:cache? #t))

  (define directory-stmt
    (sqlite-prepare db "\
INSERT INTO Directories(name, package) VALUES (:name, :package);"
                    #:cache? #t))

  (define file-stmt
    (sqlite-prepare db "\
INSERT INTO Files(name, basename, directory)
VALUES (:name, :basename, :directory);"
                    #:cache? #t))

  (sqlite-exec db "begin immediate;")
  (sqlite-bind-arguments package-stmt
                         #:name package
                         #:version version)
  (sqlite-fold (const #t) #t package-stmt)
  (match (sqlite-fold cons '() last-row-id-stmt)
    ((#(package-id))
     ;; Debugging aid: trace the package being inserted.
     (pk 'package package-id package)
     (for-each (lambda (directory)
                 (define (strip file)
                   (string-drop file (+ (string-length directory) 1)))

                 (sqlite-reset directory-stmt)
                 (sqlite-bind-arguments directory-stmt
                                        #:name directory
                                        #:package package-id)
                 (sqlite-fold (const #t) #t directory-stmt)

                 (match (sqlite-fold cons '() last-row-id-stmt)
                   ((#(directory-id))
                    (for-each (lambda (file)
                                ;; If DIRECTORY is a symlink, (find-files
                                ;; DIRECTORY) returns the DIRECTORY singleton.
                                (unless (string=? file directory)
                                  (sqlite-reset file-stmt)
                                  (sqlite-bind-arguments file-stmt
                                                         #:name (strip file)
                                                         #:basename
                                                         (basename file)
                                                         #:directory
                                                         directory-id)
                                  (sqlite-fold (const #t) #t file-stmt)))
                              (find-files directory)))))
               directories)
     (sqlite-exec db "commit;"))))

(define (insert-package db package)
  "Insert all the files of PACKAGE into DB."
  (mlet %store-monad ((drv (package->derivation package #:graft? #f)))
    (match (derivation->output-paths drv)
      (((labels . directories) ...)
       (when (every file-exists? directories)
         (insert-files db (package-name package) (package-version package)
                       directories))
       (return #t)))))

(define (insert-packages db)
  "Insert all the current packages into DB."
  (with-store store
    (parameterize ((%graft? #f))
      (fold-packages (lambda (package _)
                       (run-with-store store
                         (insert-package db package)))
                     #t
                     #:select? (lambda (package)
                                 (and (not (hidden-package? package))
                                      (not (package-superseded package))
                                      (supported-package? package)))))))

(define-record-type <package-match>
  (package-match name version file)
  package-match?
  (name      package-match-name)
  (version   package-match-version)
  (file      package-match-file))

(define (matching-packages db file)
  "Return a list of <package-match> corresponding to packages containing
FILE."
  (define lookup-stmt
    (sqlite-prepare db "\
SELECT Packages.name, Packages.version, Directories.name, Files.name
FROM Packages
INNER JOIN Files, Directories
ON Files.basename = :file AND Directories.id = Files.directory
   AND Packages.id = Directories.package;"))

  (sqlite-bind-arguments lookup-stmt #:file file)
  (sqlite-fold (lambda (result lst)
                 (match result
                   (#(package version directory file)
                    (cons (package-match package version
                                         (string-append directory "/" file))
                          lst))))
               '() lookup-stmt))

;;; Command-line entry point.
(define (file-database . args)
  (match args
    ((_ "populate")
     (call-with-database "/tmp/db"
       (lambda (db)
         (insert-packages db))))
    ((_ "search" file)
     (let ((matches (call-with-database "/tmp/db"
                      (lambda (db)
                        (matching-packages db file)))))
       (for-each (lambda (result)
                   (format #t "~20a ~a~%"
                           (string-append (package-match-name result)
                                          "@" (package-match-version result))
                           (package-match-file result)))
                 matches)
       (exit (pair? matches))))
    (_
     (format (current-error-port)
             "usage: file-database [populate|search] args ...~%")
     (exit 1))))

(apply file-database (command-line))
