From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [bug#39258] Faster guix search using an sqlite cache
From: Arun Isaac
Date: Fri, 24 Jan 2020 01:21:57 +0530
To: 39258@debbugs.gnu.org
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="==-=-=";
	micalg=pgp-sha256; protocol="application/pgp-signature"

--==-=-=
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Hi,

As discussed on guix-devel at
https://lists.gnu.org/archive/html/guix-devel/2020-01/msg00310.html , I
am working on an sqlite cache to improve guix search performance. I have
attached a highly incomplete WIP patch.

The patch attempts to reimplement the package-cache-file hook in
guix/channels.scm using a sqlite database. To this end, it rewrites most
of the generate-package-cache and cache-lookup functions in
gnu/packages.scm. I am yet to hook this up to guix search.

At the moment, I am having some difficulty populating the sqlite
database. generate-package-cache populates the database correctly when
invoked from a normal guile REPL using geiser, but fails to do so when
run by the guix daemon during guix pull. I ran guix pull using

$ ./pre-inst-env guix pull --url=$PWD --branch=search -p /tmp/test

where search is the branch I am working on. Running

$ ls /tmp/test/lib/guix -lh

shows

total 2.1M
-r--r--r-- 2 root root 2.1M ஜன. 1 1970 package-cache.sqlite
-r--r--r-- 2 root root  26K ஜன. 1 1970 package-cache.sqlite-journal

On examining package-cache.sqlite, I find that no records have been
written. And, there is a lingering journal file that shouldn't be
there. For some reason, populating the sqlite database does not work
with guix pull. sqlite probably crashes and leaves the journal file.

If I try to populate the database with each package record being
inserted in its own transaction, at least some of the insertions
work. But the journal file still lingers. My unverified guess is that
everything except the last transaction was successful.

Any ideas what's going on?

Also, inserting each package in its own transaction is ridiculously slow
and so that is out of the question.
See https://www.sqlite.org/faq.html#q19

--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline; filename=0001-fast-search.patch
Content-Transfer-Encoding: 7bit

From d1305351a90a84eb75e4769284d5e06927eade3e Mon Sep 17 00:00:00 2001
From: Arun Isaac
Date: Tue, 21 Jan 2020 20:45:43 +0530
Subject: [PATCH] fast search

---
 build-aux/build-self.scm |   5 +
 gnu/packages.scm         | 207 +++++++++++++++++++++++----------------
 2 files changed, 128 insertions(+), 84 deletions(-)

diff --git a/build-aux/build-self.scm b/build-aux/build-self.scm
index fc13032b73..c123ad3b11 100644
--- a/build-aux/build-self.scm
+++ b/build-aux/build-self.scm
@@ -264,6 +264,9 @@ interface (FFI) of Guile.")
   (define fake-git
     (scheme-file "git.scm" #~(define-module (git))))
 
+  (define fake-sqlite3
+    (scheme-file "sqlite3.scm" #~(define-module (sqlite3))))
+
   (with-imported-modules `(((guix config) => ,(make-config.scm))
 
@@ -278,6 +281,8 @@ interface (FFI) of Guile.")
                            ;; (git) to placate it.
                            ((git) => ,fake-git)
 
+                           ((sqlite3) => ,fake-sqlite3)
+
                            ,@(source-module-closure `((guix store)
                                                       (guix self)
                                                       (guix derivations)
diff --git a/gnu/packages.scm b/gnu/packages.scm
index d22c992bb1..4e2c52e62d 100644
--- a/gnu/packages.scm
+++ b/gnu/packages.scm
@@ -43,6 +43,7 @@
   #:use-module (srfi srfi-34)
   #:use-module (srfi srfi-35)
   #:use-module (srfi srfi-39)
+  #:use-module (sqlite3)
   #:export (search-patch
             search-patches
             search-auxiliary-file
@@ -204,10 +205,8 @@ PROC is called along these lines:
 PROC can use #:allow-other-keys to ignore the bits it's not interested in.
 When a package cache is available, this procedure does not actually load any
 package module."
-  (define cache
-    (load-package-cache (current-profile)))
-
-  (if (and cache (cache-is-authoritative?))
+  (if (and (cache-is-authoritative?)
+           (current-profile))
       (vhash-fold (lambda (name vector result)
                     (match vector
                       (#(name version module symbol outputs
@@ -220,7 +219,7 @@ package module."
                              #:supported? supported?
                             #:deprecated? deprecated?))))
                   init
-                  cache)
+                  (cache-lookup (current-profile)))
       (fold-packages (lambda (package result)
                        (proc (package-name package)
                              (package-version package)
@@ -252,31 +251,7 @@ is guaranteed to never traverse the same package twice."
 
 (define %package-cache-file
   ;; Location of the package cache.
-  "/lib/guix/package.cache")
-
-(define load-package-cache
-  (mlambda (profile)
-    "Attempt to load the package cache.  On success return a vhash keyed by
-package names.  Return #f on failure."
-    (match profile
-      (#f #f)
-      (profile
-       (catch 'system-error
-         (lambda ()
-           (define lst
-             (load-compiled (string-append profile %package-cache-file)))
-           (fold (lambda (item vhash)
-                   (match item
-                     (#(name version module symbol outputs
-                        supported? deprecated?
-                        file line column)
-                      (vhash-cons name item vhash))))
-                 vlist-null
-                 lst))
-         (lambda args
-           (if (= ENOENT (system-error-errno args))
-               #f
-               (apply throw args))))))))
+  "/lib/guix/package-cache.sqlite")
 
 (define find-packages-by-name/direct              ;bypass the cache
   (let ((packages (delay
@@ -297,25 +272,57 @@ decreasing version order."
                     matching)
             matching)))))
 
-(define (cache-lookup cache name)
+(define* (cache-lookup profile #:optional name)
   "Lookup package NAME in CACHE.  Return a list sorted in increasing version
 order."
   (define (package-version<? v1 v2)
     (version>? (vector-ref v2 1) (vector-ref v1 1)))
 
-  (sort (vhash-fold* cons '() name cache)
-        package-version<?))
+  (define (int->boolean n)
+    (case n
+      ((0) #f)
+      ((1) #t)))
+
+  (define (string->list str)
+    (call-with-input-string str read))
+
+  (define select-statement
+    (string-append
+     "SELECT name, version, module, symbol, outputs, supported, superseded, locationFile, locationLine, locationColumn from packages"
+     (if name " WHERE name = :name" "")))
+
+  (define cache-file
+    (string-append profile %package-cache-file))
+
+  (let* ((db (sqlite-open cache-file SQLITE_OPEN_READONLY))
+         (statement (sqlite-prepare db select-statement)))
+    (when name
+      (sqlite-bind-arguments statement #:name name))
+    (let ((result (sqlite-fold (lambda (v result)
+                                 (match v
+                                   (#(name version module symbol outputs supported superseded file line column)
+                                    (cons
+                                     (vector name
+                                             version
+                                             (string->list module)
+                                             (string->symbol symbol)
+                                             (string->list outputs)
+                                             (int->boolean supported)
+                                             (int->boolean superseded)
+                                             (list file line column))
+                                     result))))
+                               '() statement)))
+      (sqlite-finalize statement)
+      (sqlite-close db)
+      (sort result package-version<? [...] ->string x)
+  (call-with-output-string (cut write x <>)))
+
 (define (generate-package-cache directory)
   "Generate under DIRECTORY a cache of all the available packages.
 
@@ -381,49 +388,81 @@ reducing the memory footprint."
   (define cache-file
     (string-append directory %package-cache-file))
 
-  (define (expand-cache module symbol variable result+seen)
+  (define schema
+    "CREATE TABLE packages (name text,
+version text,
+module text,
+symbol text,
+outputs text,
+supported int,
+superseded int,
+locationFile text,
+locationLine int,
+locationColumn int);
+CREATE VIRTUAL TABLE packageSearch USING fts5(name, searchText);")
+
+  (define insert-statement
+    "INSERT INTO packages(name, version, module, symbol, outputs, supported, superseded, locationFile, locationLine, locationColumn)
+VALUES(:name, :version, :module, :symbol, :outputs, :supported, :superseded, :locationfile, :locationline, :locationcolumn)")
+
+  (define insert-package-search-statement
+    "INSERT INTO packageSearch(name, searchText) VALUES(:name, :searchtext)")
+
+  (define (boolean->int x)
+    (if x 1 0))
+
+  (define (list->string x)
+    (call-with-output-string (cut write x <>)))
+
+  (define (insert-package db module symbol variable seen)
     (match (false-if-exception (variable-ref variable))
       ((? package? package)
-       (match result+seen
-         ((result . seen)
-          (if (or (vhash-assq package seen)
-                  (hidden-package? package))
-              (cons result seen)
-              (cons (cons `#(,(package-name package)
-                             ,(package-version package)
-                             ,(module-name module)
-                             ,symbol
-                             ,(package-outputs package)
-                             ,(->bool (supported-package? package))
-                             ,(->bool (package-superseded package))
-                             ,@(let ((loc (package-location package)))
-                                 (if loc
-                                     `(,(location-file loc)
-                                       ,(location-line loc)
-                                       ,(location-column loc))
-                                     '(#f #f #f))))
-                          result)
-                    (vhash-consq package #t seen))))))
-      (_
-       result+seen)))
-
-  (define exp
-    (first
-     (fold-module-public-variables* expand-cache
-                                    (cons '() vlist-null)
-                                    (all-modules (%package-module-path)
-                                                 #:warn
-                                                 warn-about-load-error))))
+       (cond
+        ((or (vhash-assq package seen)
+             (hidden-package? package))
+         seen)
+        (else
+         (let ((statement (sqlite-prepare db insert-statement)))
+           (sqlite-bind-arguments statement
+                                  #:name (package-name package)
+                                  #:version (package-version package)
+                                  #:module (list->string (module-name module))
+                                  #:symbol (symbol->string symbol)
+                                  #:outputs (list->string (package-outputs package))
+                                  #:supported (boolean->int (supported-package? package))
+                                  #:superseded (boolean->int (package-superseded package))
+                                  #:locationfile (cond
+                                                  ((package-location package) => location-file)
+                                                  (else #f))
+                                  #:locationline (cond
+                                                  ((package-location package) => location-line)
+                                                  (else #f))
+                                  #:locationcolumn (cond
+                                                    ((package-location package) => location-column)
+                                                    (else #f)))
+           (sqlite-fold cons '() statement)
+           (sqlite-finalize statement))
+         (let ((statement (sqlite-prepare db insert-package-search-statement)))
+           (sqlite-bind-arguments statement
+                                  #:name (package-name package)
+                                  #:searchtext (package-description package))
+           (sqlite-fold cons '() statement)
+           (sqlite-finalize statement))
+         (vhash-consq package #t seen))))
+      (_ seen)))
 
   (mkdir-p (dirname cache-file))
-  (call-with-output-file cache-file
-    (lambda (port)
-      ;; Store the cache as a '.go' file.  This makes loading fast and reduces
-      ;; heap usage since some of the static data is directly mmapped.
-      (put-bytevector port
-                      (compile `'(,@exp)
-                               #:to 'bytecode
-                               #:opts '(#:to-file? #t)))))
+  (let ((db (sqlite-open cache-file)))
+    (sqlite-exec db schema)
+    (sqlite-exec db "BEGIN")
+    (fold-module-public-variables* (cut insert-package db <> <> <> <>)
+                                   vlist-null
+                                   (all-modules (%package-module-path)
+                                                #:warn
+                                                warn-about-load-error))
+    (sqlite-exec db "COMMIT;")
+    (sqlite-close db))
+
   cache-file)
 
-- 
2.23.0


--=-=-=--

--==-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl4p+V0ACgkQLiXui2GA
K7Oe/wf/VUd/Gcd+KFyJxXTfML9vGIxR7xUXl6M92mevuADXCxI4JECqMmjIuj9f
ZbC6o/D+XcnJz7XyttQZi/iyjbwZIA0DwUbdAg5BRP8cK6ZkCflPfjamWNQ2RVYu
2S+oITgatidZTLDTFGP6RYeXN27I+fkK5P28XSJHa69aE34bVor0R3bb7Ki57OVS
+cEYu6nlGbADqpFpLT6VjB7ewgr9wt0tQyq721JevZzi3PNb+WVq6Pi2N69nHDdO
kslYfUVi2kWnN9i3gtBnEwVo2cj2uaD2eSYd5YoA2c5kxHVomo5/CiuDUqdCyxd+
/Xbjbug+6C7SeCcBbfVW0RgRo2Dvng==
=HLpI
-----END PGP SIGNATURE-----
--==-=-=--
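
On the transaction question raised in the message: the following is a minimal sketch, not part of the patch, of the single-transaction bulk insert that the SQLite FAQ entry recommends. It uses Python's standard sqlite3 module as a stand-in for guile-sqlite3 so that it can be run without a Guix checkout. The two-column packages table, the populate and demo names, and the pkg-N rows are all made up for illustration. With SQLite's default rollback-journal mode, the -journal file should be gone once COMMIT succeeds and the connection is closed, so a journal left next to the database (as in the ls output in the message) suggests that the final commit never completed.

```python
import os
import sqlite3
import tempfile


def populate(db_path, packages):
    """Insert every row of PACKAGES in one transaction; return the row count."""
    # isolation_level=None puts the module in autocommit mode, so the explicit
    # BEGIN/COMMIT below are the only transaction boundaries.
    db = sqlite3.connect(db_path, isolation_level=None)
    try:
        # Hypothetical, cut-down schema (the patch's table has more columns).
        db.execute("CREATE TABLE packages (name text, version text)")
        db.execute("BEGIN")
        db.executemany(
            "INSERT INTO packages(name, version) VALUES (:name, :version)",
            packages)
        db.execute("COMMIT")
        (count,) = db.execute("SELECT count(*) FROM packages").fetchone()
        return count
    finally:
        db.close()


def demo():
    """Populate a throwaway database and report whether a journal is left."""
    packages = [{"name": f"pkg-{i}", "version": "1.0"} for i in range(1000)]
    with tempfile.TemporaryDirectory() as directory:
        db_path = os.path.join(directory, "package-cache.sqlite")
        count = populate(db_path, packages)
        journal_left = os.path.exists(db_path + "-journal")
        return count, journal_left


if __name__ == "__main__":
    print(demo())
```

Wrapping all inserts in one BEGIN/COMMIT pair means SQLite syncs to disk once rather than once per row, which is the slowdown the FAQ entry describes. If the process dies or the output is captured before COMMIT returns, the rows are not visible and the journal stays behind.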