From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp2 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id QKHjHFQQNV87dwAA0tVLHw (envelope-from ) for ; Thu, 13 Aug 2020 10:05:08 +0000 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp2 with LMTPS id aee4GFQQNV9oVgAAB5/wlQ (envelope-from ) for ; Thu, 13 Aug 2020 10:05:08 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id 0A8B894028F for ; Thu, 13 Aug 2020 10:05:08 +0000 (UTC) Received: from localhost ([::1]:46442 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1k6A6c-0001it-Hs for larch@yhetil.org; Thu, 13 Aug 2020 06:05:06 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:44532) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1k6A6S-0001iY-NB for guix-devel@gnu.org; Thu, 13 Aug 2020 06:04:56 -0400 Received: from relay8-d.mail.gandi.net ([217.70.183.201]:39677) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1k6A6Q-0003Vc-B6 for guix-devel@gnu.org; Thu, 13 Aug 2020 06:04:56 -0400 X-Originating-IP: 86.246.37.13 Received: from bababa (lfbn-idf2-1-572-13.w86-246.abo.wanadoo.fr [86.246.37.13]) (Authenticated sender: mail@ambrevar.xyz) by relay8-d.mail.gandi.net (Postfix) with ESMTPSA id 424381BF20B; Thu, 13 Aug 2020 10:04:49 +0000 (UTC) From: Pierre Neidhardt To: Ricardo Wurmus Subject: Re: File search progress: database review and question on triggers In-Reply-To: <87sgcq3n50.fsf@elephly.net> References: <87sgcuh8rb.fsf@ambrevar.xyz> <87y2ml429i.fsf@elephly.net> <87364tgja3.fsf@ambrevar.xyz> <87y2mlf4jw.fsf@ambrevar.xyz> <87pn7x3pyw.fsf@elephly.net> <87r1sbel4f.fsf@ambrevar.xyz> <683CAE3A-8B38-4A41-9B3C-18D1284D3EFA@lepiller.eu> <87o8nfy4qu.fsf@ambrevar.xyz> <72845E8B-35E1-4A27-95E6-452D1D1F626B@lepiller.eu> <87k0y3xcjs.fsf@ambrevar.xyz> <87sgcq3n50.fsf@elephly.net> Date: Thu, 13 Aug 2020 12:04:49 +0200 Message-ID: <87a6yyeu9q.fsf@ambrevar.xyz> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature" Received-SPF: pass client-ip=217.70.183.201; envelope-from=mail@ambrevar.xyz; helo=relay8-d.mail.gandi.net X-detected-operating-system: by eggs.gnu.org: First seen = 2020/08/13 05:20:13 X-ACL-Warn: Detected OS = Linux 3.11 and newer X-Spam_score_int: -15 X-Spam_score: -1.6 X-Spam_bar: - X-Spam_report: (-1.6 / 5.0 requ) BAYES_00=-1.9, FROM_SUSPICIOUS_NTLD=1, PDS_OTHER_BAD_TLD=1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: guix-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Development of GNU Guix and the GNU System distribution." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: guix-devel@gnu.org Errors-To: guix-devel-bounces+larch=yhetil.org@gnu.org Sender: "Guix-devel" X-Scanner: scn0 Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=none; spf=pass (aspmx1.migadu.com: domain of guix-devel-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=guix-devel-bounces@gnu.org X-Spam-Score: -3.11 X-TUID: Gd16Y/p626u3 --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hi Ricardo, See my recent email with the new SQLite benchmark: I can now generate the whole database in under 30 seconds, so about the same order of magnitude than a naive text database (which has a less data). SQLite pattern search queries are extremely fast (<0.1s) and cover all examples named so far: =2D exact basename match =2D partial path match =2D pattern match (e.g. "/include/%foo%") The only thing that's missing is regexp support. I'll see what we can do. Considering this, I find that leveraging SQLite is very attractive at this point: =2D Fast (fastest?). =2D Low on memory. =2D No need to come up with our own data format. =2D Less work, no need to reinvent the wheel. >> - It does not cover the case where I don't know the basename, e.g. if I'm >> looking for a FOO header file my query would look like "/include/.*foo= .*". > > I think this is a rather rare use case, which in my opinion doesn=E2=80= =99t > justify forgoing the use of a smart data structure. I don't find it so rare, actually. I've used filesearch (on other OSes) to find =2D Emacs packages where I didn't know the exact name of the .el. =2D TeXlive packages (every time you get a font a .sty error) =2D Shared libraries (.so of which I didn't know the exact name) =2D include files as above =2D Your favourite programming language package... > It would *still* be possible to use the prefix tree for this kind of > search, but it would certainly not be optimal. (E.g. by searching up > to the first wildcard, and then searching each resulting sub-tree up > to the first non-wildcard, and resuming the search in all remaining > sub-trees.) But it's not a wildcard, it's a regexp. Can we run full-path regexp queries over a trie? >> I believe it's important that the search be as general as possible. > > Search should be cheap and fast. While very important, I would put more priority on exactness. A search which gives unreliable results (e.g. returns nothing while there exists a result) is a search that would deter many users from using it, in my experience. Cheers! =2D-=20 Pierre Neidhardt https://ambrevar.xyz/ --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAl81EEEACgkQm9z0l6S7 zH9lfggAgCBGGqogDIPOi3/8Dglwyhr1qaWQiZ3+iLgx6FBeOeRTMjFzNlvNortY iTRzvwYXrbyjlgHu5lWSJ50ZLWFKcLg+4ZFbic80ezQy61KAu/t0wzpzF1gr/eAx fAT0OiALLRKWPGAJhd1u2ZJOLyk30EJoV+K2WnPg49ieFW/p+OozikEw7hjJOSfR WWuIPEBHhyYXK6vGptOu7w2+cTDL6A5p3abSeLGXkl18alzM3S3hmKGpQr4oJTx+ B39Nq4izcGGB2BV5y98bGIOSmxmQw43j8L1HR2D+q0DAJiQRi0/G3Td6QMKvppUi 59srcwYB19PFrh1ZdL2Lw3gxuPU+pg== =sITF -----END PGP SIGNATURE----- --=-=-=--