From: Ludovic Courtès
To: Guix Devel
Subject: File search
X-URL: http://www.fdn.fr/~lcourtes/
X-Revolutionary-Date: 2 Pluviôse an 230 de la Révolution
X-PGP-Key-ID: 0x090B11993D9AEBB5
X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc
X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5
X-OS: x86_64-pc-linux-gnu
Date: Fri, 21 Jan 2022 10:03:43 +0100
Message-ID: <8735lh5ukw.fsf@inria.fr>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
List-Id: "Development of GNU Guix and the GNU System distribution."

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Hello Guix!

Lately I found myself going several times to to look
for packages providing a given file and I thought it’s time to do
something about it.

The script below creates an SQLite database for the current set of
packages, but only for those already in the store:

  guix repl file-database.scm populate

That creates /tmp/db; it took about 25 minutes on berlin, for 18K
packages.  Then you can run, say:

  guix repl file-database.scm search boot-9.scm

to find which packages provide a file named ‘boot-9.scm’.  That part is
instantaneous.

The database for 18K packages is quite big:

--8<---------------cut here---------------start------------->8---
$ du -h /tmp/db*
389M	/tmp/db
82M	/tmp/db.gz
61M	/tmp/db.zst
--8<---------------cut here---------------end--------------->8---

How do we expose that information?  There are several criteria I can
think of: accuracy, freshness, privacy, responsiveness, off-line
operation.

I think accuracy (making sure you get results that correspond precisely
to, say, your current channel revisions and your current system) is not
a high priority: some result is better than no result.  Likewise for
freshness: results for an older version of a given package may still be
valid now.

In terms of privacy, I think it’s better if we can avoid making one
request per file searched for.  Off-line operation would be sweet, and
it comes with responsiveness; fast off-line search is necessary for
things like ‘command-not-found’ (where the shell tells you what package
to install when a command is not found).

Based on that, it is tempting to just distribute a full database from
ci.guix, say, that the client command would regularly fetch.  The
downside is that that’s quite a lot of data to download; if you use the
file search command infrequently, you might find yourself spending more
time downloading the database than actually searching it.

We could have a hybrid solution: distribute a database that contains
only files in /bin and /sbin (it should be much smaller), and for
everything else, resort to a web service (the Data Service could be
extended to include file lists).  That way, we’d have fast
privacy-respecting search for command names, and on-line search for
everything else.

Thoughts?

Ludo’.

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline; filename=file-database.scm
Content-Transfer-Encoding: 8bit
Content-Description: The file database tool

;;; GNU Guix --- Functional package management for GNU
;;; Copyright © 2022 Ludovic Courtès
;;;
;;; This file is part of GNU Guix.
;;;
;;; GNU Guix is free software; you can redistribute it and/or modify it
;;; under the terms of the GNU General Public License as published by
;;; the Free Software Foundation; either version 3 of the License, or (at
;;; your option) any later version.
;;;
;;; GNU Guix is distributed in the hope that it will be useful, but
;;; WITHOUT ANY WARRANTY; without even the implied warranty of
;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;;; GNU General Public License for more details.
;;;
;;; You should have received a copy of the GNU General Public License
;;; along with GNU Guix.  If not, see .

(define-module (file-database)
  #:use-module (sqlite3)
  #:use-module (ice-9 match)
  #:use-module (guix store)
  #:use-module (guix monads)
  #:autoload   (guix grafts) (%graft?)
  #:use-module (guix derivations)
  #:use-module (guix packages)
  #:autoload   (guix build utils) (find-files)
  #:autoload   (gnu packages) (fold-packages)
  #:use-module (srfi srfi-1)
  #:use-module (srfi srfi-9)
  #:export (file-database))

(define schema
  "
create table if not exists Packages (
  id        integer primary key autoincrement not null,
  name      text not null,
  version   text not null
);

create table if not exists Directories (
  id        integer primary key autoincrement not null,
  name      text not null,
  package   integer not null,
  foreign key (package) references Packages(id) on delete cascade
);

create table if not exists Files (
  name      text not null,
  basename  text not null,
  directory integer not null,
  foreign key (directory) references Directories(id) on delete cascade
);

create index if not exists IndexFiles on Files(basename);")

(define (call-with-database file proc)
  (let ((db (sqlite-open file)))
    (dynamic-wind
      (lambda () #t)
      (lambda ()
        (sqlite-exec db schema)
        (proc db))
      (lambda ()
        (sqlite-close db)))))

(define (insert-files db package version directories)
  "Insert the files contained in DIRECTORIES as belonging to PACKAGE at
VERSION."
  (define last-row-id-stmt
    (sqlite-prepare db "SELECT last_insert_rowid();"
                    #:cache? #t))

  (define package-stmt
    (sqlite-prepare db "\
INSERT OR REPLACE INTO Packages(name, version)
VALUES (:name, :version);"
                    #:cache? #t))

  (define directory-stmt
    (sqlite-prepare db "\
INSERT INTO Directories(name, package) VALUES (:name, :package);"
                    #:cache? #t))

  (define file-stmt
    (sqlite-prepare db "\
INSERT INTO Files(name, basename, directory)
VALUES (:name, :basename, :directory);"
                    #:cache? #t))

  (sqlite-exec db "begin immediate;")
  (sqlite-bind-arguments package-stmt
                         #:name package
                         #:version version)
  (sqlite-fold (const #t) #t package-stmt)
  (match (sqlite-fold cons '() last-row-id-stmt)
    ((#(package-id))
     (pk 'package package-id package)
     (for-each (lambda (directory)
                 (define (strip file)
                   (string-drop file (+ (string-length directory) 1)))

                 (sqlite-reset directory-stmt)
                 (sqlite-bind-arguments directory-stmt
                                        #:name directory
                                        #:package package-id)
                 (sqlite-fold (const #t) #t directory-stmt)

                 (match (sqlite-fold cons '() last-row-id-stmt)
                   ((#(directory-id))
                    (for-each (lambda (file)
                                ;; If DIRECTORY is a symlink, (find-files
                                ;; DIRECTORY) returns the DIRECTORY singleton.
                                (unless (string=? file directory)
                                  (sqlite-reset file-stmt)
                                  (sqlite-bind-arguments file-stmt
                                                         #:name (strip file)
                                                         #:basename
                                                         (basename file)
                                                         #:directory
                                                         directory-id)
                                  (sqlite-fold (const #t) #t file-stmt)))
                              (find-files directory)))))
               directories)
     (sqlite-exec db "commit;"))))

(define (insert-package db package)
  "Insert all the files of PACKAGE into DB."
  (mlet %store-monad ((drv (package->derivation package #:graft? #f)))
    (match (derivation->output-paths drv)
      (((labels . directories) ...)
       (when (every file-exists? directories)
         (insert-files db (package-name package) (package-version package)
                       directories))
       (return #t)))))

(define (insert-packages db)
  "Insert all the current packages into DB."
  (with-store store
    (parameterize ((%graft? #f))
      (fold-packages (lambda (package _)
                       (run-with-store store
                         (insert-package db package)))
                     #t
                     #:select? (lambda (package)
                                 (and (not (hidden-package? package))
                                      (not (package-superseded package))
                                      (supported-package? package)))))))

(define-record-type <package-match>
  (package-match name version file)
  package-match?
  (name    package-match-name)
  (version package-match-version)
  (file    package-match-file))

(define (matching-packages db file)
  "Return a list of <package-match> records corresponding to packages
containing FILE."
  (define lookup-stmt
    (sqlite-prepare db "\
SELECT Packages.name, Packages.version, Directories.name, Files.name
FROM Packages
INNER JOIN Files, Directories
ON files.basename = :file
  AND directories.id = files.directory
  AND packages.id = directories.package;"))

  (sqlite-bind-arguments lookup-stmt #:file file)
  (sqlite-fold (lambda (result lst)
                 (match result
                   (#(package version directory file)
                    (cons (package-match package version
                                         (string-append directory "/" file))
                          lst))))
               '() lookup-stmt))

(define (file-database . args)
  (match args
    ((_ "populate")
     (call-with-database "/tmp/db"
       (lambda (db)
         (insert-packages db))))
    ((_ "search" file)
     (let ((matches (call-with-database "/tmp/db"
                      (lambda (db)
                        (matching-packages db file)))))
       (for-each (lambda (result)
                   (format #t "~20a ~a~%"
                           (string-append (package-match-name result)
                                          "@" (package-match-version result))
                           (package-match-file result)))
                 matches)
       (exit (pair? matches))))
    (_
     (format (current-error-port)
             "usage: file-database [populate|search] args ...~%")
     (exit 1))))

(apply file-database (command-line))

--=-=-=--
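
The hybrid proposal in the message (a small database limited to /bin
and /sbin) could start from a filter like the one below.  This is only
a sketch: ‘command-file?’ and ‘command-files’ are hypothetical names
that do not appear in file-database.scm, and it assumes file names are
stored relative to the package output (for example ‘bin/guile’), as the
script does for the ‘name’ column of the Files table.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical helper, not part of file-database.scm: return true if
;; FILE, a name relative to a package output such as "bin/guile", sits
;; directly in bin/ or sbin/.
(define (command-file? file)
  (any (lambda (prefix)
         (and (string-prefix? prefix file)
              ;; Require a non-empty name after the prefix.
              (> (string-length file) (string-length prefix))
              ;; Reject nested entries such as "bin/sub/tool".
              (not (string-index file #\/ (string-length prefix)))))
       '("bin/" "sbin/")))

;; Keep only the entries that would go into the small database.
(define (command-files files)
  (filter command-file? files))

(command-files '("bin/guile"
                 "sbin/ldconfig"
                 "bin/sub/tool"
                 "share/guile/3.0/ice-9/boot-9.scm"))
```

For the sample list, only "bin/guile" and "sbin/ldconfig" are kept.  A
filter of this kind could be applied before the rows are inserted, so
that the distributed database holds command names only and everything
else is left to the on-line service.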