From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp2 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id mNElB4tUNF+aFAAA0tVLHw (envelope-from ) for ; Wed, 12 Aug 2020 20:43:55 +0000 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp2 with LMTPS id XM1pAYtUNF+gFwAAB5/wlQ (envelope-from ) for ; Wed, 12 Aug 2020 20:43:55 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id A119A94066A for ; Wed, 12 Aug 2020 20:43:54 +0000 (UTC) Received: from localhost ([::1]:56898 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1k5xbF-0005Hp-Ie for larch@yhetil.org; Wed, 12 Aug 2020 16:43:53 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:46040) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1k5xb6-0005Hc-Dp for guix-devel@gnu.org; Wed, 12 Aug 2020 16:43:44 -0400 Received: from relay3-d.mail.gandi.net ([217.70.183.195]:33351) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1k5xb4-00013j-3C for guix-devel@gnu.org; Wed, 12 Aug 2020 16:43:44 -0400 X-Originating-IP: 86.246.37.13 Received: from mimimi (lfbn-idf2-1-572-13.w86-246.abo.wanadoo.fr [86.246.37.13]) (Authenticated sender: mail@ambrevar.xyz) by relay3-d.mail.gandi.net (Postfix) with ESMTPSA id 95F3560003; Wed, 12 Aug 2020 20:43:38 +0000 (UTC) From: Pierre Neidhardt To: Julien Lepiller , guix-devel@gnu.org, Ricardo Wurmus Subject: Re: File search progress: database review and question on triggers In-Reply-To: <683CAE3A-8B38-4A41-9B3C-18D1284D3EFA@lepiller.eu> References: <87sgcuh8rb.fsf@ambrevar.xyz> <87y2ml429i.fsf@elephly.net> <87364tgja3.fsf@ambrevar.xyz> <87y2mlf4jw.fsf@ambrevar.xyz> <87pn7x3pyw.fsf@elephly.net> <87r1sbel4f.fsf@ambrevar.xyz> <683CAE3A-8B38-4A41-9B3C-18D1284D3EFA@lepiller.eu> Date: Wed, 12 Aug 2020 22:43:37 +0200 Message-ID: <87o8nfy4qu.fsf@ambrevar.xyz> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature" Received-SPF: pass client-ip=217.70.183.195; envelope-from=mail@ambrevar.xyz; helo=relay3-d.mail.gandi.net X-detected-operating-system: by eggs.gnu.org: First seen = 2020/08/12 15:10:10 X-ACL-Warn: Detected OS = Linux 3.11 and newer X-Spam_score_int: -5 X-Spam_score: -0.6 X-Spam_bar: / X-Spam_report: (-0.6 / 5.0 requ) BAYES_00=-1.9, FROM_SUSPICIOUS_NTLD=1, PDS_OTHER_BAD_TLD=1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: guix-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Development of GNU Guix and the GNU System distribution." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-devel-bounces+larch=yhetil.org@gnu.org Sender: "Guix-devel" X-Scanner: scn0 Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=none; spf=pass (aspmx1.migadu.com: domain of guix-devel-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=guix-devel-bounces@gnu.org X-Spam-Score: -0.61 X-TUID: Cxum1Ka//DmA --=-=-= Content-Type: text/plain Content-Transfer-Encoding: quoted-printable Julien Lepiller writes: > Have you tried something more structured? I have some code for creating a= binary search tree and even compressing/decompressing strings with huffman= , as well as code to serialize all that (my deserialization is in Java thou= gh, so not very useful to you): https://framagit.org/nani-project/nani-webs= ite > > See modules/nani/trie.scm for instance. > > The idea is to have a binary search tree whose keys are filenames and val= ue is a pointer in the file to a structure that holds data for this fileram= e (packages and versions of guix for instance). It's super fast to look it = up, because you don't load the whole file, you only seek to the right posit= ion. It's a lot longer to build, and not so easy to update though. Assuming we'd have only 1 Guix generation per file, I'm not sure a trie would help because we _always_ need to search _all_ file paths, so in all cases we've got some 10,000+ entries to load in memory and loop over them. The total number of entries is the bottleneck here, both for the database load and the individual search queries. An obvious optimization is to memoize the database load. This has a significant memory cost though. The trie could be helpful for multiple Guix generations in the same database, but I'm not sure it warrant the increased complexity. Thoughts? =2D-=20 Pierre Neidhardt https://ambrevar.xyz/ --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAl80VHkACgkQm9z0l6S7 zH8OjAf8C+Q2bqujrKO7A0lgyUwr8HqbX/fG6na/kSVYR1R4hvG0EyStvlsDDQQl UdQD9GUAkMaFFgQd5zDnosokIXBzQAGSN6zIygUTdx8PK0eFl1UGwqnZ7+Ex2UEr GwQioXF3LlqtKpP5SFP3Ja0yEZvpiYWp9fuZ9c4+eBStTaGdjUPn7z4jco1+C1ka Wlz0258SJyqlMDgpxj2VsZ9VxtpalveOjKj1Uh1YPx/+hn59N/TRNPLtSZLE9wwu g6HJtI/y0fAUM43hhMWq6i2f8kJk4BlC5+cw6I7MtBspXyCNmSB78GLUQMyYmgbi e7M/pF6q0fVzl7mkTXztGj9WJE9vCg== =SUN6 -----END PGP SIGNATURE----- --=-=-=--