From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp1 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id iNq4Ap56Q18jVQAA0tVLHw (envelope-from ) for ; Mon, 24 Aug 2020 08:30:22 +0000 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp1 with LMTPS id CBL8OZ16Q19OcAAAbx9fmQ (envelope-from ) for ; Mon, 24 Aug 2020 08:30:21 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id 7528D9401CB for ; Mon, 24 Aug 2020 08:30:21 +0000 (UTC) Received: from localhost ([::1]:44532 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kA7rw-0001zl-CL for larch@yhetil.org; Mon, 24 Aug 2020 04:30:20 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:55972) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kA7rh-0001zY-SL for guix-devel@gnu.org; Mon, 24 Aug 2020 04:30:05 -0400 Received: from relay12.mail.gandi.net ([217.70.178.232]:59029) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kA7rf-0003B4-58 for guix-devel@gnu.org; Mon, 24 Aug 2020 04:30:05 -0400 Received: from mimimi (91-165-152-159.subs.proxad.net [91.165.152.159]) (Authenticated sender: mail@ambrevar.xyz) by relay12.mail.gandi.net (Postfix) with ESMTPSA id E1D96200005; Mon, 24 Aug 2020 08:29:56 +0000 (UTC) From: Pierre Neidhardt To: Arun Isaac , Ricardo Wurmus Subject: Re: File search progress: database review and question on triggers In-Reply-To: <871rk73dqw.fsf@systemreboot.net> References: <87sgcuh8rb.fsf@ambrevar.xyz> <87y2ml429i.fsf@elephly.net> <87364tgja3.fsf@ambrevar.xyz> <87y2mlf4jw.fsf@ambrevar.xyz> <87pn7x3pyw.fsf@elephly.net> <87r1sbel4f.fsf@ambrevar.xyz> <87eeobh01d.fsf@systemreboot.net> <87d03uevdq.fsf@ambrevar.xyz> <875z9mhh3s.fsf@systemreboot.net> <87tux6d54k.fsf@ambrevar.xyz> <87zh6yfuin.fsf@systemreboot.net> <87r1sad0co.fsf@ambrevar.xyz> <87o8neczen.fsf@ambrevar.xyz> <871rk73dqw.fsf@systemreboot.net> Date: Mon, 24 Aug 2020 10:29:55 +0200 Message-ID: <87blj0sayk.fsf@ambrevar.xyz> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature" Received-SPF: pass client-ip=217.70.178.232; envelope-from=mail@ambrevar.xyz; helo=relay12.mail.gandi.net X-detected-operating-system: by eggs.gnu.org: First seen = 2020/08/24 04:29:58 X-ACL-Warn: Detected OS = Linux 3.11 and newer X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, FROM_SUSPICIOUS_NTLD=0.499, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_PDS_OTHER_BAD_TLD=0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: guix-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Development of GNU Guix and the GNU System distribution." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: guix-devel@gnu.org Errors-To: guix-devel-bounces+larch=yhetil.org@gnu.org Sender: "Guix-devel" X-Scanner: scn0 Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=none; spf=pass (aspmx1.migadu.com: domain of guix-devel-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=guix-devel-bounces@gnu.org X-Spam-Score: -0.61 X-TUID: knHgtW9e4eey --=-=-= Content-Type: text/plain Content-Transfer-Encoding: quoted-printable Hi Arun, thanks for your feedback! 350 seconds does not seem too bad. Let's keep in mind that this operation will mostly be used by the build farm / substitute server, not by the user. > - Maybe, we shouldn't index hidden files, particularly all the .xxx-real > files created by our wrap phases. I thought about this, but for now I don't think it's a good idea: =2D Hidden files are rather rare and won't take up much space in the database. =2D What if the users actually searches for a hidden file? =2D It's easy to exclude hidden files from the search results. > - You should use SQL prepared statements with sqlite-prepare, > sqlite-bind, etc. That would correctly handle escaping special > characters in the search string. Currently, searching for > "transmission-gtk", "libm.so", etc. errors out. Thanks for pointing this out, I'll look into it. > - Searching for "git perl5" works as expected, but searching for "git > perl" returns no results. I think this is due to the tokenizer used by > the full text search indexer. The tokenizer sees the word "perl5" as > one indivisible token and does not realize that "perl" is a prefix of > "perl5". Unfortunately, I think this is a fundamental problem with FTS > -- one that can only be fixed by using simple LIKE patterns. FTS is > meant for natural language search where this kind of thing would be > normal. Indeed, but "git perl*" would have worked I think. We can always pre-process the query to add stars everywhere. At the moment, the only downside I see with FTS is that it seems to be impossible to match words for which we don't know the beginning. Anyways, switching from FTS to LIKE patterns is rather easy. In fact, I could implement both with a tiny abstraction so that we can choose which one we want in the end. > - I guess you are only indexing local packages now, but will include all > packages later by some means. Indeed, I want the substitute server / build farm to expose the database for syncing. I'd need some help to get started. Anyone? =2D-=20 Pierre Neidhardt https://ambrevar.xyz/ --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAl9DeoMACgkQm9z0l6S7 zH9EgQgAkcelw14YpDlbus/g+WvHyYwqw+thB456IQGJEKxstQOyfc3C+KSAbzkK aLGphUBwM7Ljv/WNZxNFuLmVQBiRJeB5IDDYuMqgI3VTcohjPv9b9aWH9zNBwdXG OWwrNDC0grCThPdUbxVT814yBou1F6Jn/aqD/6hAaNUWuKWm6a/SApnbF81Lxxvt DyPr8UMIWu3Ua7knt81lzJ66thJ3OOdLHqzHRy874zZYMcB0W08TMDADuR6ZmqQF 0jUP5Wx/Z35epJDsxg99GtJd4KeMHYVPzZVtHT/+VjhM40hKhq6Boi0fE1NjiRSd kuGzaELz3CsaVVt8DpfD3J/CyrCgew== =N0ao -----END PGP SIGNATURE----- --=-=-=--