From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp2 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id 8OA4GWoJNV8BGAAA0tVLHw (envelope-from ) for ; Thu, 13 Aug 2020 09:35:38 +0000 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp2 with LMTPS id QCQTFWoJNV81QQAAB5/wlQ (envelope-from ) for ; Thu, 13 Aug 2020 09:35:38 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id DA9439403CB for ; Thu, 13 Aug 2020 09:35:37 +0000 (UTC) Received: from localhost ([::1]:47766 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1k69e4-0005mI-IH for larch@yhetil.org; Thu, 13 Aug 2020 05:35:36 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:38148) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1k69dn-0005m9-Nr for guix-devel@gnu.org; Thu, 13 Aug 2020 05:35:21 -0400 Received: from sender4-of-o51.zoho.com ([136.143.188.51]:21172) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.90_1) (envelope-from ) id 1k69dl-0008FW-LC for guix-devel@gnu.org; Thu, 13 Aug 2020 05:35:19 -0400 ARC-Seal: i=1; a=rsa-sha256; t=1597311313; cv=none; d=zohomail.com; s=zohoarc; b=RWoguq+I1OpKs6OF7EkdAUnH3WEBO7xJNzsDL6euaRqxcrgYH/eM+UJ5mgkZJiAsu6jkLnvq0hRp5xwkUQR1Dm9NG3X4cMgmkqQIuk7w/y5xbIyZF9dOWahBoIFIU78LYG7wuQfPNCgN4t82JNdJHhhOMSI34Z1arY9iLyA8XWA= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1597311313; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:MIME-Version:Message-ID:References:Subject:To; bh=gTAABHGyp8tA0AtBp4A1Btcg+ujkoIIvEdNah55dEeM=; b=ALQgpRDPvFxR7Zz/HkBuxRn29xGeM5W+iiFAreuZ0rJTa6JJUox7gEYj9eUCIjuPDjdedIHVrAYlabfPvnGN+6Bw57U7+Khnhh9Ae5ts0ZpB5NknPN+fh2104hbHNL/HHsStcmPUfl+rgsT31Gn4erNdavfoEysEhhGN77aNQGQ= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass header.i=elephly.net; spf=pass smtp.mailfrom=rekado@elephly.net; dmarc=pass header.from= header.from= DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1597311313; s=zoho; d=elephly.net; i=rekado@elephly.net; h=References:From:To:Cc:Subject:In-reply-to:Date:Message-ID:MIME-Version:Content-Type:Content-Transfer-Encoding; bh=gTAABHGyp8tA0AtBp4A1Btcg+ujkoIIvEdNah55dEeM=; b=A+EpZ1d2IVWCrRS+0Bp4UZdy2ARohC3ellAHGBYaI7lmG+MphMS7RZj4//+IB9M+ 6qIgw6QF1bie8NyvmxeqDuVQj3xvhDxILM6pHICIL9DBS1bI6QMTW1kZomxuiLbyD4f oec8b2TNdqQH4MR+lom5Q7moc6C5bOriIIZ3doHk= Received: from localhost (p54ad45d2.dip0.t-ipconnect.de [84.173.69.210]) by mx.zohomail.com with SMTPS id 159731126350965.74478928962446; Thu, 13 Aug 2020 02:34:23 -0700 (PDT) References: <87sgcuh8rb.fsf@ambrevar.xyz> <87y2ml429i.fsf@elephly.net> <87364tgja3.fsf@ambrevar.xyz> <87y2mlf4jw.fsf@ambrevar.xyz> <87pn7x3pyw.fsf@elephly.net> <87r1sbel4f.fsf@ambrevar.xyz> <683CAE3A-8B38-4A41-9B3C-18D1284D3EFA@lepiller.eu> <87o8nfy4qu.fsf@ambrevar.xyz> <72845E8B-35E1-4A27-95E6-452D1D1F626B@lepiller.eu> <87k0y3xcjs.fsf@ambrevar.xyz> User-agent: mu4e 1.4.10; emacs 26.3 From: Ricardo Wurmus To: Pierre Neidhardt Subject: Re: File search progress: database review and question on triggers In-reply-to: <87k0y3xcjs.fsf@ambrevar.xyz> X-URL: https://elephly.net X-PGP-Key: https://elephly.net/rekado.pubkey X-PGP-Fingerprint: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC Date: Thu, 13 Aug 2020 11:34:19 +0200 Message-ID: <87sgcq3n50.fsf@elephly.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-ZohoMailClient: External Received-SPF: pass client-ip=136.143.188.51; envelope-from=rekado@elephly.net; helo=sender4-of-o51.zoho.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/08/13 05:35:15 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -30 X-Spam_score: -3.1 X-Spam_bar: --- X-Spam_report: (-3.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: guix-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Development of GNU Guix and the GNU System distribution." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: guix-devel@gnu.org Errors-To: guix-devel-bounces+larch=yhetil.org@gnu.org Sender: "Guix-devel" X-Scanner: scn0 Authentication-Results: aspmx1.migadu.com; dkim=fail (rsa verify failed) header.d=elephly.net header.s=zoho header.b=A+EpZ1d2; dmarc=none; spf=pass (aspmx1.migadu.com: domain of guix-devel-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=guix-devel-bounces@gnu.org X-Spam-Score: 1.99 X-TUID: Xa35Lwv4VHGy Pierre Neidhardt writes: > Julien Lepiller writes: > >> Why wouldn't it help? Can't you make it a trie from basename -> >> complete name? If I'm looking for "libcord.so" (which is a key in the >> trie), I don't think I need to look for every path. I only need to >> follow the trie until I find a pointer to some structure that contains >> the data I look for (ex: a list of complete filenames). > > Fair enough, but it's a more limited scope: here we assume the user > knows the exact basename. > > It's a bit too limited in my opinion: > > - It's only too common to have shared objects ending with a .X.Y extension > (where X.Y is a version), the version-less file is not always present > which means a lot of trial and error on the user end just to search > the right file. We can always abort the radix trie traversal early and print all leaves that can be reached from the node that we arrived at. This allows us to search for =E2=80=9Clibc=E2=80=9D and find =E2=80=9Clibcord.so.1.2=E2=80=9D= , =E2=80=9Clibc.so=E2=80=9D, =E2=80=9Clibcaca.a=E2=80=9D, etc. > - It does not cover the case where I don't know the basename, e.g. if I'm > looking for a FOO header file my query would look like "/include/.*foo.= *". I think this is a rather rare use case, which in my opinion doesn=E2=80=99t justify forgoing the use of a smart data structure. It would *still* be possible to use the prefix tree for this kind of search, but it would certainly not be optimal. (E.g. by searching up to the first wildcard, and then searching each resulting sub-tree up to the first non-wildcard, and resuming the search in all remaining sub-trees.) > I believe it's important that the search be as general as possible. Search should be cheap and fast. Dependent on how we arrange the data, users can easily filter the results with grep, if necessary. For example, we could have a radix tree of the base names and have the directory prefix as the leaf nodes. The search would then involve two steps: culling the set of directory prefixes by following the base name trie (that=E2=80=99s fast), then do a simple linear search over the results (that=E2=80=99s slow). The trie can easily be serialized as a bunch of offsets that we only need to repeatedly seek to, so we never need to hold it in memory. --=20 Ricardo