From: Pierre Neidhardt
To: Arun Isaac, Ricardo Wurmus
Subject: Re: File search progress: database review and question on triggers
Date: Thu, 13 Aug 2020 17:36:23 +0200
Cc: guix-devel@gnu.org

Arun Isaac writes:

> But filenames usually don't have diacritics. So, I'm not sure if
> diacritic insensitivity is useful.

Probably not, but if there ever is an odd file name with an accent, it
will be handled and we won't have to worry about it. Better too much
than too little!

> This is handled by stemming.
> We'll need a custom stemmer that normalizes
> libfoo to foo. Xapian has a nice page on stemming. See
> https://xapian.org/docs/stemming.html

I think I gave a confusing example: stemming is too specific. What I
meant is searching for a pattern that does not start the word. If I
search for "bar", I'd like it to match "foobar". The reverse is
possible ("foo" matches "foobar").

Well, I'll share my FTS demo right away; we can always think about
this case later.

-- 
Pierre Neidhardt
https://ambrevar.xyz/
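For illustration only, here is a minimal Python sketch of the two behaviours discussed above: diacritic-insensitive matching, and infix matching where "bar" finds "foobar" rather than only prefix matches. It assumes a plain in-memory index, not the actual Guix file-search database. File names are folded (diacritics stripped, case folded) and indexed by trigrams. `fold`, `trigrams` and `TrigramIndex` are hypothetical names, not part of Guix, SQLite or Xapian.

```python
import unicodedata
from collections import defaultdict


def fold(text: str) -> str:
    """Strip diacritics and case: 'Café' -> 'cafe'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def trigrams(text: str) -> set[str]:
    """Return every 3-character substring of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Hypothetical in-memory index of file names supporting infix search."""

    def __init__(self) -> None:
        self.names: list[str] = []
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, name: str) -> None:
        doc_id = len(self.names)
        self.names.append(name)
        for gram in trigrams(fold(name)):
            self.postings[gram].add(doc_id)

    def search(self, pattern: str) -> list[str]:
        needle = fold(pattern)
        grams = trigrams(needle)
        if grams:
            # A match must contain every trigram of the pattern.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams))
        else:
            # Patterns under 3 characters have no trigrams: check every name.
            candidates = set(range(len(self.names)))
        # Sharing trigrams is necessary but not sufficient, so verify.
        return sorted(self.names[i] for i in candidates
                      if needle in fold(self.names[i]))


index = TrigramIndex()
for name in ["libfoobar.so", "foobar", "bar.h", "Café.txt", "baz"]:
    index.add(name)
print(index.search("bar"))   # ['bar.h', 'foobar', 'libfoobar.so']
print(index.search("cafe"))  # ['Café.txt']
```

A real database would more likely rely on its engine's tokenizer. SQLite's FTS5, for instance, has a `remove_diacritics` option on its `unicode61` tokenizer and, in recent versions, a `trigram` tokenizer. Whether either fits this database is an open question.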