From: Arun Isaac
To: Pierre Neidhardt, Ricardo Wurmus
Subject: Re: File search progress: database review and question on triggers
In-Reply-To: <87o8neczen.fsf@ambrevar.xyz>
References: <87sgcuh8rb.fsf@ambrevar.xyz> <87y2ml429i.fsf@elephly.net>
 <87364tgja3.fsf@ambrevar.xyz> <87y2mlf4jw.fsf@ambrevar.xyz>
 <87pn7x3pyw.fsf@elephly.net> <87r1sbel4f.fsf@ambrevar.xyz>
 <87eeobh01d.fsf@systemreboot.net> <87d03uevdq.fsf@ambrevar.xyz>
 <875z9mhh3s.fsf@systemreboot.net> <87tux6d54k.fsf@ambrevar.xyz>
 <87zh6yfuin.fsf@systemreboot.net> <87r1sad0co.fsf@ambrevar.xyz>
 <87o8neczen.fsf@ambrevar.xyz>
Date: Sun, 16 Aug 2020 01:03:59 +0530
Message-ID: <871rk73dqw.fsf@systemreboot.net>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-=";
 micalg=pgp-sha256; protocol="application/pgp-signature"
List-Id: "Development of GNU Guix and the GNU System distribution."
Cc: guix-devel@gnu.org

--=-=-=
Content-Type: text/plain

Hi Pierre,

I tried the wip-filesearch branch. Nice work! :-)
persist-all-local-packages takes around 350 seconds on my machine (a
slow machine with a spinning disk) and the database is 50 MB. Some
other comments follow.

- Maybe we shouldn't index hidden files, particularly all the
  .xxx-real files created by our wrap phases.

- You should use SQL prepared statements with sqlite-prepare,
  sqlite-bind, etc. That would correctly handle escaping special
  characters in the search string. Currently, searching for
  "transmission-gtk", "libm.so", etc. errors out.

- Searching for "git perl5" works as expected, but searching for
  "git perl" returns no results. I think this is due to the tokenizer
  used by the full text search indexer. The tokenizer sees the word
  "perl5" as one indivisible token and does not realize that "perl" is
  a prefix of "perl5". Unfortunately, I think this is a fundamental
  problem with FTS -- one that can only be fixed by using simple LIKE
  patterns. FTS is meant for natural-language search, where matching
  whole words is the normal behaviour.

- I guess you are only indexing local packages for now, and will
  include all packages later by some means.

Cheers!
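
P.S. To make the prepared-statement and LIKE suggestions above more concrete, here is a minimal sketch. It uses Python's sqlite3 module only as a stand-in for the sqlite-prepare/sqlite-bind calls; the `files` table, its columns, the sample rows, and the `escape_like` and `search` helpers are all hypothetical and are not taken from the wip-filesearch branch. Each search term is bound as a parameter, so characters such as "-", "." or "'" never become part of the SQL text, and substring matching with LIKE lets "perl" match "perl5".

```python
import sqlite3

# One LIKE test per search term; the term itself is bound as a parameter.
# In SQL, '\' is a one-character string literal: the LIKE escape character.
LIKE_CLAUSE = "(package || '/' || path) LIKE ? ESCAPE '\\'"


def escape_like(term):
    """Escape LIKE wildcards so that the term is matched literally."""
    return (term.replace("\\", "\\\\")
                .replace("%", "\\%")
                .replace("_", "\\_"))


def search(conn, query):
    """Return the paths whose package/path string contains every term."""
    terms = query.split()
    if not terms:
        return []
    sql = ("SELECT path FROM files WHERE "
           + " AND ".join(LIKE_CLAUSE for _ in terms)
           + " ORDER BY path")
    params = ["%" + escape_like(term) + "%" for term in terms]
    return [row[0] for row in conn.execute(sql, params)]


# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (package TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [("git", "lib/perl5/site_perl/Git.pm"),
     ("transmission", "bin/transmission-gtk"),
     ("glibc", "lib/libm.so")])

print(search(conn, "git perl"))          # substring match reaches "perl5"
print(search(conn, "transmission-gtk"))  # "-" is just data here
print(search(conn, "libm.so"))           # so is "."
```

The trade-off is that a LIKE pattern with a leading "%" cannot use an ordinary index, so every query scans the table; whether that is acceptable for a 50 MB database would need measuring.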

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl84OKgACgkQLiXui2GA
K7PDYggArY9AHW2xrXDAnb/TlPeCn94J7gcxMpGEUh4yg0TS4Bv3Q6JjZcYYRXZz
OfjPoaHoMkCg99ftNyfoGzWv39XjtY4hyHp+ujhhTZ0Mn18bU6ir7ZK3wvWZyNcn
yYxCPK7ngRDKIpnm/iDRULeytfNp9bsdNEBuyrLpRJW0o71HY4XC+Pv4+Mp3c2a6
eSNCWzw92jUwqIONnAQK0JkdquuEndE59Q940ge3RTJVs/AKOZikzpMKBV7beDJu
QApwhKKhFSqBMAiBATIxNeKK2cSZfQopoF309ET/2NhMjHON4DwjrSP4LXXxx8qZ
HOuyWBBVDe/e8cFXXMK+En5Txp4XUg==
=O84L
-----END PGP SIGNATURE-----
--=-=-=--