From mboxrd@z Thu Jan 1 00:00:00 1970 From: Christopher Baines Subject: Re: [PATCH] git-download: Speed up 'git-predicate'. Date: Sun, 16 Jul 2017 11:42:20 +0100 Message-ID: <20170716114220.55d845e9@cbaines.net> References: <87zidk2c4g.fsf@gnu.org> <20170619071443.23122-1-mail@cbaines.net> <87fuetggn9.fsf@gnu.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; boundary="Sig_/Wg841smc2OwTyWl2W.m=2A7"; protocol="application/pgp-signature" Return-path: Received: from eggs.gnu.org ([2001:4830:134:3::10]:45051) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dWh0P-0005le-Ur for guix-devel@gnu.org; Sun, 16 Jul 2017 06:42:31 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dWh0L-00056m-B1 for guix-devel@gnu.org; Sun, 16 Jul 2017 06:42:29 -0400 In-Reply-To: <87fuetggn9.fsf@gnu.org> List-Id: "Development of GNU Guix and the GNU System distribution." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org Sender: "Guix-devel" To: Ludovic =?UTF-8?B?Q291cnTDqHM=?= Cc: guix-devel@gnu.org --Sig_/Wg841smc2OwTyWl2W.m=2A7 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On Wed, 21 Jun 2017 23:44:26 +0200 ludo@gnu.org (Ludovic Court=C3=A8s) wrote: > Hello, >=20 > Christopher Baines skribis: >=20 > > +(define (create-directory-tree files) > > + (define (directory-lists->tree directory-lists) > > + (map (lambda (top-level-dir) > > + (cons top-level-dir > > + (directory-lists->tree > > + (filter-map > > + (lambda (directory-list) > > + (if (eq? (length directory-list) 1) > > + #f > > + (cdr directory-list))) > > + ;; Find all the directory lists under this > > top-level-dir > > + (filter > > + (lambda (directory-list) > > + (equal? (car directory-list) > > + top-level-dir)) > > + directory-lists))))) > > + (delete-duplicates > > + (map car directory-lists)))) > > + > > + (directory-lists->tree > > + (filter-map (lambda (path) > > + (let ((split-path (string-split path #\/))) > > + ;; If this is a file in the top of the > > repository? > > + (if (eq? (length split-path) 1) > > + #f > > + ;; drop-right to remove the filename, as > > it's > > + ;; just the directory tree that's important > > + (drop-right (string-split path #\/) 1)))) > > + files))) > > + > > +(define (directory-in-tree? directory whole-tree) > > + (define directory-list-in-tree? > > + (match-lambda* > > + (((top-directory . rest) tree) > > + (if (null? rest) > > + (list? (member top-directory (map car tree))) > > + (and=3D> (find (match-lambda > > + ((subtree-top-directory . subtree) > > + (equal? subtree-top-directory > > + top-directory))) > > + tree) > > + (match-lambda > > + ((subtree-top-directory . subtree) > > + (directory-list-in-tree? rest subtree)))))))) > > + > > + (directory-list-in-tree? (string-split directory #\/) > > + whole-tree)) =20 >=20 > I tried it to fully understand what was going on. While playing with > it I came up with a variant that is slightly more concise and clearer > (at least to me ;-)): >=20 > --8<---------------cut here---------------start------------->8--- > (define (files->directory-tree files) > "Return a tree of vhashes representing the directory listed in > FILES, a list like '(\"a/b\" \"b/c/d\")." > (fold (lambda (file result) > (let loop ((file (string-split file #\/)) > (result result)) > (match file > ((_) > result) > ((directory children ...) > (match (vhash-assoc directory result) > (#f > (vhash-cons directory (loop children vlist-null) > result)) > ((_ . previous) > ;; XXX: 'vhash-delete' is O(n). > (vhash-cons directory (loop children previous) > (vhash-delete directory result))))) > (() > result)))) > vlist-null > files)) >=20 > (define (directory-in-tree? tree directory) > "Return true if DIRECTORY, a string like \"a/b\", denotes a > directory listed in TREE." > (let loop ((directory (string-split directory #\/)) > (tree tree)) > (match directory > (() > #t) > ((head . tail) > (match (vhash-assoc head tree) > ((_ . sub-tree) (loop tail sub-tree)) > (#f #f)))))) > --8<---------------cut here---------------end--------------->8--- Thanks for looking at this Ludo, and apologies for taking so long to reply. These functions are definately more concise.=20 > The tree is a tree of vhash, which should make lookup (i.e., > =E2=80=98directory-in-tree?=E2=80=99) slightly faster. So it works like = this: >=20 > --8<---------------cut here---------------start------------->8--- > scheme@(guile-user)> (files->directory-tree '("a" "a/b" "a/b/x" "a/c" > "b" "b/c")) $17 =3D # > scheme@(guile-user)> (directory-in-tree? $17 "a/b") > $18 =3D #t > scheme@(guile-user)> (directory-in-tree? $17 "a") > $19 =3D #t > scheme@(guile-user)> (directory-in-tree? $17 "a/b/x") > $21 =3D #f > scheme@(guile-user)> (directory-in-tree? $17 "a/c") > $22 =3D #f > scheme@(guile-user)> (directory-in-tree? $17 "b") > $23 =3D #t > --8<---------------cut here---------------end--------------->8--- >=20 > WDYT? If you like it, take it and adjust as you see fit. We=E2=80=99re = doing > 4-hand coding like whichever fancy methodology recommends. ;-) I've had a little look at the performance, and I think this approach of using a vhash might be slower, but not by much. The biggest difference I saw for the guix repository was 0.008 seconds, and for smart-answers was 0.6 seconds. I'm happy to go ahead with either though, obviously these performance differences don't matter compared to the current performance for large repositories. --Sig_/Wg841smc2OwTyWl2W.m=2A7 Content-Type: application/pgp-signature Content-Description: OpenPGP digital signature -----BEGIN PGP SIGNATURE----- iQKTBAEBCgB9FiEEPonu50WOcg2XVOCyXiijOwuE9XcFAllrQwxfFIAAAAAALgAo aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldDNF ODlFRUU3NDU4RTcyMEQ5NzU0RTBCMjVFMjhBMzNCMEI4NEY1NzcACgkQXiijOwuE 9XfWjQ/+IvouDcVhPPZIHzFhIkiEeDHgGjr8HFZlTJ2sIkmxs+M+vtUMFEUFVjpg 8xnkUg2XwwiqCoXuW1ROrMNG+b/vmbdNYcREGsTMsaHB1UuG3rfmQua0mfZ1CikH VIJY0MiEvSr0/KI2TdWn+ZLIc6PN/0qRTZntXs7MdHzjUqrX6+rVuNXQDqAvIZ+c QroBb7UDtnlrrFnlu0P7DmPChvVVojcuYLcaaEzDlvWIuI4RD8mG4JZcYok9qJY+ 2Y+Apn2lr+qnj1aD4JhN7iGxYNO4BgjNrx9GGfWu17gRsMiW1HkAB4j8FwDV2BzC 3M9zhmuNuZEOpIxzmlpjDp6Z5UbpTVLePbPPPBcQ4mWwT6ObfHb/JAA79GAi65eX +m+W2ElOvt7H1rww7/2c2Q1lgOIebNIvy8KUfDzbjcXwLkbmWYpHyydjtvFH/e70 xbNIda+CPDieLOcUWGO02pMUTh8AWVD+4TaTmjdUVkHOo2CBiyw5JJf+R6Erdz8f VsR+fcAS7RB56l6m8IjJPumcGNU86XwO0chykdunTVx12RLDydfkJMDMezBeNZBE WzYt14Q6XYhefXB+43IdIIsF9bxYRsuIl/K3Yaacy8WpjmP7lb1+fjCydhxvaYIX 9XDZtQVgIYKRmwTOX92jgYzPEE36yq1UIvKoRQBHiOv4Q7uuxXs= =w73a -----END PGP SIGNATURE----- --Sig_/Wg841smc2OwTyWl2W.m=2A7--