From: Arun Isaac
To: Pierre Neidhardt, Ricardo Wurmus
Subject: Re: File search progress: database review and question on triggers
In-Reply-To: <87tux6d54k.fsf@ambrevar.xyz>
References: <87sgcuh8rb.fsf@ambrevar.xyz> <87y2ml429i.fsf@elephly.net>
 <87364tgja3.fsf@ambrevar.xyz> <87y2mlf4jw.fsf@ambrevar.xyz>
 <87pn7x3pyw.fsf@elephly.net> <87r1sbel4f.fsf@ambrevar.xyz>
 <87eeobh01d.fsf@systemreboot.net> <87d03uevdq.fsf@ambrevar.xyz>
 <875z9mhh3s.fsf@systemreboot.net> <87tux6d54k.fsf@ambrevar.xyz>
Date: Thu, 13 Aug 2020 20:44:08 +0530
Message-ID: <87zh6yfuin.fsf@systemreboot.net>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature"
List-Id: "Development of GNU Guix and the GNU System distribution."
Cc: guix-devel@gnu.org

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

> Yes, but full text search brings us a few niceties here:

These are nice features, but I don't know if all of them are useful for
file search. Normally, with Arch's pkgfile, I search for some missing
header file, shared library, etc. Usually, I know the exact filename I
am looking for, or I know some prefix or suffix of the exact filename.

> - Case insensitive, diacritic insensitive (e.g. "e" matches "É").

Case insensitivity is quite useful. Most filenames are in lower case,
but there is always that one odd filename out there.

But filenames usually don't have diacritics. So, I'm not sure if
diacritic insensitivity is useful.

> All the above is arguably more powerful and easier to use than regexp.
> But even if no user ever bothers with the logic operators, the default
> behaviour "just works" in the fashion of a search engine.
>
> The main thing I don't know how to do is suffix matches (like "%foo").
> With FTS, looking up "foo*" won't match "libfoo", which is problematic.
> Any clue how to fix that?

This is handled by stemming. We'll need a custom stemmer that normalizes
libfoo to foo. Xapian has a nice page on stemming. See
https://xapian.org/docs/stemming.html

Cheers!
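
The stemming idea above can be sketched in a few lines. This is only an
illustration under stated assumptions, not Xapian's API or the Guix file
search implementation: the helper names (fold, stem, build_index, search)
and the rule "drop a leading lib and any extension" are hypothetical,
chosen to show how indexing by stem lets a query for "foo*" reach
"libfoo". It also folds case and diacritics the way the quoted list
describes.

```python
import unicodedata
from collections import defaultdict


def fold(text: str) -> str:
    """Strip diacritics and fold case, so "É" becomes "e"."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def stem(filename: str) -> str:
    """Hypothetical stemmer: fold, drop a leading "lib", drop extensions."""
    name = fold(filename)
    if name.startswith("lib") and len(name) > 3:
        name = name[3:]
    # Keep the text before the first dot; fall back to the whole name
    # for dotfiles such as ".bashrc".
    return name.split(".", 1)[0] or name


def build_index(paths):
    """Map each stem to the set of paths whose basename produces it."""
    index = defaultdict(set)
    for path in paths:
        basename = path.rsplit("/", 1)[-1]
        index[stem(basename)].add(path)
    return index


def search(index, query: str):
    """Exact stem lookup, or a prefix lookup when the query ends in "*"."""
    if query.endswith("*"):
        prefix = stem(query[:-1])
        return sorted(
            path
            for key, paths in index.items()
            if key.startswith(prefix)
            for path in paths
        )
    return sorted(index.get(stem(query), ()))
```

Because both the indexed names and the query go through the same stem
function, "foo", "Foo.h" and "libfoo.so.1" all land on the key "foo".
The prefix rule is deliberately naive: it would also turn "library.h"
into "rary", so a real stemmer would need a more careful rule set.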

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl81WMAACgkQLiXui2GA
K7NnmwgAo2f+U2xIL9RXTJ9PVXDAFpmTyrTlntgnmG2XIZV/UMyMp9qZa6oNGxVl
ysE8C3CephdU/deBLcfZ6FfIVYUumdPx718hoq/5r1Z5H6X/MqaOxoB/qoUBSEYu
M2cshEH+IKyD/30kEK2KoryNGwriwnN7Srx7R9PjfvM6i3lrFSIY59Vsa5UAWwM+
PT/lVWg8HJ7rGfBzYNUlrYd59Y4kl1cJwE2yZ5nx6TJaGzeGCZp6P5++b+vvyDKF
fZMtpJff9qjbpMrWGGu4Zvt+r+VIGevoKJLubPl+Z3QvwNop+TvlMz2DSrHllAFT
hUejNomIsMkppeoHGuVVwyfT6q5F5Q==
=mdPu
-----END PGP SIGNATURE-----
--=-=-=--