all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: "Ludovic Courtès" <ludo@gnu.org>
To: "Antoine R. Dumont (@ardumont)" <antoine.romain.dumont@gmail.com>
Cc: guix-devel@gnu.org
Subject: Re: File search
Date: Sat, 03 Dec 2022 19:19:20 +0100	[thread overview]
Message-ID: <87lenonydz.fsf@gnu.org> (raw)
In-Reply-To: <87mt85r7h8.fsf@gmail.com> (Antoine R. Dumont's message of "Fri,  02 Dec 2022 19:22:27 +0100")

[-- Attachment #1: Type: text/plain, Size: 1541 bytes --]

Hi Antoine,

"Antoine R. Dumont (@ardumont)" <antoine.romain.dumont@gmail.com>
skribis:

> After toying a bit with the initial code, I took the liberty to make it
> a guix extension (we discussed it a bit with @zimoun). It was mostly to
> get started with Guile (I know some lisp implems but not this one so i
> had to familiarize myself with tools and whatnot ;). Anyway, that can be
> reverted if you feel like it can be integrated as a Guix cli directly.
>
> Currently, the implementation scans and indexes whatever package is
> present in the local store of the machine's user. From nix/guix's
> design, it makes sense to do it that way as it's likely that even though
> you don't have all the tools locally, it may be already present as a
> dependency of some high level tools you already use (it's just not
> exposed because not declared in config.scm or home-configuration.scm).
>
> You will find inlines (at the bottom) some cli usage calls [1] and the
> current implementation [2].

Yay, nice work!

I toyed a bit with your code and that gave me an idea: instead of the
costly ‘fold-packages’ + ‘package-derivation’, we can iterate over all
the manifests on the system and index packages they refer to.  That way,
no need to talk to the daemon, computer derivations, etc.  Should be
faster, though of course it still needs to traverse those directories.

Please find attached a modified version that illustrates that.  (We’ll
need version control at some point.  :-))

Thanks,
Ludo’.


[-- Attachment #2: the code --]
[-- Type: text/plain, Size: 14030 bytes --]

;;; GNU Guix --- Functional package management for GNU
;;; Copyright © 2022 Ludovic Courtès <ludo@gnu.org>
;;;
;;; This file is part of GNU Guix.
;;;
;;; GNU Guix is free software; you can redistribute it and/or modify it
;;; under the terms of the GNU General Public License as published by
;;; the Free Software Foundation; either version 3 of the License, or (at
;;; your option) any later version.
;;;
;;; GNU Guix is distributed in the hope that it will be useful, but
;;; WITHOUT ANY WARRANTY; without even the implied warranty of
;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;;; GNU General Public License for more details.
;;;
;;; You should have received a copy of the GNU General Public License
;;; along with GNU Guix.  If not, see <http://www.gnu.org/licenses/>.

(define-module (guix extensions index)
  #:use-module (guix config)  ;; %guix-package-name, ...
  #:use-module (guix ui)      ;; display G_
  #:use-module (guix scripts)
  #:use-module (sqlite3)
  #:use-module (ice-9 match)
  #:use-module (guix describe)
  #:use-module (guix store)
  #:use-module (guix monads)
  #:autoload   (guix combinators) (fold2)
  #:autoload   (guix grafts) (%graft?)
  #:autoload   (guix store roots) (gc-roots)
  #:use-module (guix derivations)
  #:use-module (guix packages)
  #:use-module (guix profiles)
  #:use-module (guix sets)
  #:use-module ((guix utils) #:select (cache-directory))
  #:autoload   (guix build utils) (find-files)
  #:autoload   (gnu packages) (fold-packages)
  #:use-module (srfi srfi-1)
  #:use-module (srfi srfi-9)
  #:use-module (srfi srfi-71)
  #:export     (guix-index))

(define debug #f)

(define schema
  "
create table if not exists Packages (
  id        integer primary key autoincrement not null,
  name      text not null,
  version   text not null,
  unique    (name, version) -- add uniqueness constraint
);

create table if not exists Directories (
  id        integer primary key autoincrement not null,
  name      text not null,
  package   integer not null,
  foreign key (package) references Packages(id) on delete cascade,
  unique (name, package) -- add uniqueness constraint
);

create table if not exists Files (
  name      text not null,
  basename  text not null,
  directory integer not null,
  foreign key (directory) references Directories(id) on delete cascade
  unique (name, basename, directory) -- add uniqueness constraint
);

create index if not exists IndexFiles on Files(basename);")

(define (call-with-database file proc)
  (let ((db (sqlite-open file)))
    (dynamic-wind
      (lambda () #t)
      (lambda ()
        (sqlite-exec db schema)
        (proc db))
      (lambda ()
        (sqlite-close db)))))

(define (insert-files db package version directories)
  "Insert files from DIRECTORIES as belonging to PACKAGE at VERSION."
  (define stmt-select-package
    (sqlite-prepare db "\
SELECT id FROM Packages WHERE name = :name AND version = :version;"
                    #:cache? #t))

  (define stmt-insert-package
    (sqlite-prepare db "\
INSERT OR IGNORE INTO Packages(name, version) -- to avoid spurious writes
VALUES (:name, :version);"
                    #:cache? #t))

  (define stmt-select-directory
    (sqlite-prepare db "\
SELECT id FROM Directories WHERE name = :name AND package = :package;"
                    #:cache? #t))

  (define stmt-insert-directory
    (sqlite-prepare db "\
INSERT OR IGNORE INTO Directories(name, package) -- to avoid spurious writes
VALUES (:name, :package);"
                    #:cache? #t))

  (define stmt-insert-file
    (sqlite-prepare db "\
INSERT OR IGNORE INTO Files(name, basename, directory)
VALUES (:name, :basename, :directory);"
                    #:cache? #t))

  (sqlite-exec db "begin immediate;")
  (sqlite-bind-arguments stmt-insert-package
                         #:name package
                         #:version version)
  (sqlite-fold (const #t) #t stmt-insert-package)

  (sqlite-bind-arguments stmt-select-package
                         #:name package
                         #:version version)
  (match (sqlite-fold cons '() stmt-select-package)
    ((#(package-id))
     (when debug
       (format #t "(pkg, version, pkg-id): (~a, ~a, ~a)"
               package version package-id))
     (pk 'package package-id package)
     (for-each (lambda (directory)
                 (define (strip file)
                   (string-drop file (+ (string-length directory) 1)))

                 (sqlite-reset stmt-insert-directory)
                 (sqlite-bind-arguments stmt-insert-directory
                                        #:name directory
                                        #:package package-id)
                 (sqlite-fold (const #t) #t stmt-insert-directory)

                 (sqlite-reset stmt-select-directory)
                 (sqlite-bind-arguments stmt-select-directory
                                        #:name directory
                                        #:package package-id)
                 (match (sqlite-fold cons '() stmt-select-directory)
                   ((#(directory-id))
                    (when debug
                      (format #t "(name, package, dir-id): (~a, ~a, ~a)\n"
                              directory package-id directory-id))
                    (for-each (lambda (file)
                                ;; If DIRECTORY is a symlink, (find-files
                                ;; DIRECTORY) returns the DIRECTORY singleton.
                                (unless (string=? file directory)
                                  (sqlite-reset stmt-insert-file)
                                  (sqlite-bind-arguments stmt-insert-file
                                                         #:name (strip file)
                                                         #:basename
                                                         (basename file)
                                                         #:directory
                                                         directory-id)
                                  (sqlite-fold (const #t) #t stmt-insert-file)))
                              (find-files directory)))))
               directories)))
  (sqlite-exec db "commit;"))

(define (insert-package db package)
  "Insert all the files of PACKAGE into DB."
  (mlet %store-monad ((drv (package->derivation package #:graft? #f)))
    (match (derivation->output-paths drv)
      (((labels . directories) ...)
       (when (every file-exists? directories)
         (insert-files db (package-name package) (package-version package)
                       directories))
       (return #t)))))

(define (filter-public-current-supported package)
  "Filter supported, not hidden (public) and not superseded (current) package."
  (and (not (hidden-package? package))
       (not (package-superseded package))
       (supported-package? package)))

(define (filter-supported-package package)
  "Filter supported package (package might be hidden or superseded)."
  (and (supported-package? package)))

(define (no-filter package) "No filtering on package" #t)

(define* (insert-packages db #:optional (filter-policy filter-public-current-supported))
  "Insert all current packages matching `filter-package-policy` into DB."
  (with-store store
    (parameterize ((%graft? #f))
      (fold-packages (lambda (package _)
                       (run-with-store store
                         (insert-package db package)))
                     #t
                     #:select? filter-policy))))

\f
;;;
;;; Indexing from local profiles.
;;;

(define (all-profiles)
  "Return the list of profiles on the system."
  (delete-duplicates
   (filter-map (lambda (root)
                 (if (file-exists? (string-append root "/manifest"))
                     root
                     (let ((root (string-append root "/profile")))
                       (and (file-exists? (string-append root "/manifest"))
                            root))))
               (gc-roots))))

(define (profiles->manifest-entries profiles)
  "Return manifest entries for all of PROFILES, without duplicates."
  (let loop ((visited (set))
             (profiles profiles)
             (entries '()))
    (match profiles
      (()
       entries)
      ((profile . rest)
       (let* ((manifest (profile-manifest profile))
              (entries visited
                       (fold2 (lambda (entry lst visited)
                                (let ((item (manifest-entry-item entry)))
                                  (if (set-contains? visited item)
                                      (values lst visited)
                                      (values (cons entry lst)
                                              (set-insert item
                                                          visited)))))
                              entries
                              visited
                              (manifest-transitive-entries manifest))))
         (loop visited rest entries))))))

(define (insert-manifest-entry db entry)
  "Insert ENTRY, a manifest entry, into DB."
  (insert-files db (manifest-entry-name entry)
                (manifest-entry-version entry)
                (list (manifest-entry-item entry)))) ;FIXME: outputs?

(define (index-manifests db-file)
  "Insert into DB-FILE entries for packages that appear in manifests
available on the system."
  (call-with-database db-file
    (lambda (db)
      (for-each (lambda (entry)
                  (insert-manifest-entry db entry))
                (let ((lst (profiles->manifest-entries (all-profiles))))
                  (pk 'entries (length lst))
                  lst)))))

\f
;;;
;;; Search.
;;;

(define-record-type <package-match>
  (package-match name version file)
  package-match?
  (name      package-match-name)
  (version   package-match-version)
  (file      package-match-file))

(define (matching-packages db file)
  "Return unique <package-match> corresponding to packages containing FILE."
  (define lookup-stmt
    (sqlite-prepare db "\
SELECT Packages.name, Packages.version, Directories.name, Files.name
FROM Packages
INNER JOIN Files, Directories
ON files.basename = :file
  AND directories.id = files.directory
  AND packages.id = directories.package;"))

  (sqlite-bind-arguments lookup-stmt #:file file)
  (sqlite-fold (lambda (result lst)
                 (match result
                   (#(package version directory file)
                    (cons (package-match package version
                                         (string-append directory "/" file))
                          lst))))
               '() lookup-stmt))

\f


(define (index-packages-with-db db-pathname)
  "Index packages using db at location DB-PATHNAME."
  (call-with-database db-pathname
    (lambda (db)
      (insert-packages db no-filter))))

(define (matching-packages-with-db db-pathname file)
  "Compute list of packages referencing FILE using db at DB-PATHNAME."
  (call-with-database db-pathname
    (lambda (db)
      (matching-packages db file))))

(define (print-matching-results matches)
  "Print the MATCHES matching results."
  (for-each (lambda (result)
              (format #t "~20a ~a~%"
                      (string-append (package-match-name result)
                                     "@" (package-match-version result))
                      (package-match-file result)))
            matches))

(define default-db-path
  (string-append (cache-directory #:ensure? #f)
                 "/index/db.sqlite"))

(define (show-help)
  (display (G_ "Usage: guix index [OPTIONS...] [search FILE...]
Without FILE, index (package, file) relationships in the local store.
With 'search FILE', search for packages installing FILEx;x.\n
Note: The internal cache is located at ~/.config/guix/locate-db.sqlite.
See --db-path for customization.\n"))
  (newline)
  (display (G_ "The valid values for OPTIONS are:"))
  (newline)
  (display (G_ "
  -h, --help      Display this help and exit"))
  (display (G_ "
  -V, --version   Display version information and exit"))
  (display (G_ "
  --db-path=DIR   Change default location of the cache db"))
  (newline)
  (newline)
  (display (G_ "The valid values for ARGS are:"))
  (newline)
  (display (G_ "
  search FILE     Search for packages installing the FILE (from cache db)"))
  (newline)
  (show-bug-report-information))

(define-command (guix-index . args)
  (category extension)
  (synopsis "Index packages to allow searching package for a given filename")

  (define (parse-db-args args)
    "Parsing of string key=value where we are only interested in  'value'"
    (match (string-split args #\=)
      ((unused db-path)
       db-path)
      (_ #f)))

  (define (display-help-and-exit)
    (show-help)
    (exit 0))

  (match args
    ((or ("-h") ("--help"))
     (display-help-and-exit))
    ((or ("-V") ("--version"))
     (show-version-and-exit "guix locate"))
    ((db-path-args)
     (let ((db-path (parse-db-args db-path-args)))
       (if db-path
           (index-packages-with-db db-path)
           (display-help-and-exit))))
    (("search" file)
     (let ((matches (matching-packages-with-db default-db-path file)))
       (print-matching-results matches)
       (exit (pair? matches))))
    ((db-path-args "search" file)
     (let ((db-path (parse-db-args db-path-args)))
       (if db-path
           (let ((matches (matching-packages-with-db db-path file)))
             (print-matching-results matches)
             (exit (pair? matches)))
           (display-help-and-exit))))
    (_  ;; index by default
     ;; (index-packages-with-db default-db-path)
     (index-manifests default-db-path)
     )))

  reply	other threads:[~2022-12-03 18:19 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-02 17:58 File search antoine.romain.dumont
2022-12-02 18:22 ` Antoine R. Dumont (@ardumont)
2022-12-03 18:19   ` Ludovic Courtès [this message]
2022-12-04 16:35     ` Antoine R. Dumont (@ardumont)
2022-12-06 10:01       ` Ludovic Courtès
2022-12-06 12:59         ` zimoun
2022-12-06 18:27         ` (
2022-12-08 15:41           ` Ludovic Courtès
2022-12-09 10:05         ` Antoine R. Dumont (@ardumont)
2022-12-09 18:05           ` zimoun
2022-12-11 10:22           ` Ludovic Courtès
2022-12-15 17:03             ` Antoine R. Dumont (@ardumont)
2022-12-19 21:25               ` Ludovic Courtès
2022-12-19 22:44                 ` zimoun
2022-12-20 11:13                 ` Antoine R. Dumont (@ardumont)
  -- strict thread matches above, loose matches on Subject: below --
2022-01-21  9:03 Ludovic Courtès
2022-01-21 10:35 ` Mathieu Othacehe
2022-01-22  0:35   ` Ludovic Courtès
2022-01-21 19:00 ` Vagrant Cascadian
2022-01-22  0:37   ` Ludovic Courtès
2022-01-22  2:53     ` Maxim Cournoyer
2022-01-25 11:15       ` Ludovic Courtès
2022-01-25 11:20         ` Oliver Propst
2022-01-25 11:22           ` Oliver Propst
2022-01-22  4:46 ` raingloom
2022-01-22  7:55   ` Ricardo Wurmus
2022-01-24 15:48     ` Ludovic Courtès
2022-01-24 17:03       ` Ricardo Wurmus
2022-02-02 16:14         ` Maxim Cournoyer
2022-02-05 11:15           ` Ludovic Courtès
2022-01-25 23:45 ` Ryan Prior
2022-02-05 11:18   ` Ludovic Courtès
2022-02-06 13:27 ` André A. Gomes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87lenonydz.fsf@gnu.org \
    --to=ludo@gnu.org \
    --cc=antoine.romain.dumont@gmail.com \
    --cc=guix-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.