From: Pierre Neidhardt
To: Arun Isaac, Ricardo Wurmus
Cc: guix-devel@gnu.org
Subject: Re: File search progress: database review and question on triggers
Date: Thu, 13 Aug 2020 15:53:15 +0200
Message-ID: <87tux6d54k.fsf@ambrevar.xyz>
In-Reply-To: <875z9mhh3s.fsf@systemreboot.net>

Arun Isaac writes:

>> - Or do you think SQLite patterns (using "%") would do for now?  As
>>   Mathieu pointed out, it's an unfortunate inconsistency with the rest of
>>   Guix.  But maybe regexp support can be added in a second stage.
>
> The inconsistency is unfortunate. Personally, I am in favor of dropping
> regexp support everywhere in Guix, and only having literal string
> search.
> But that is backward incompatible, and may be controversial.
>
> In this specific case of file search, we could use the sqlite like
> patterns, but not expose them to the user. For example, if the search
> query is "", we search for the LIKE pattern "%%". I think
> this addresses how users normally search for files. I don't think
> regexps add much value.

I agree.

> Full text search may not be relevant to file search. Full text search is
> more suited for natural language search involving such things as
> stemming algorithms.

Yes, but full text search brings us a few niceties here:

- Wildcards using the `*` character.  This fixes the unfamiliarity of `%`.

- "Automatic word permutations" (maybe not the right term).  "foo bar" and
  "bar foo" both match the same results!

- Case insensitive, diacritic insensitive (e.g. "e" matches "É").

- Logic: we can do "(OR foo bar) AND (OR qux quuz)".

- Relevance ranking: results can be sorted by relevance, another problem we
  don't have to fix ourselves ;)

All the above is arguably more powerful and easier to use than regexp.  But
even if no user ever bothers with the logic operators, the default
behaviour "just works" in the fashion of a search engine.

The main thing I don't know how to do is suffix matches (like "%foo").
With FTS, looking up "foo*" won't match "libfoo", which is problematic.
Any clue how to fix that?
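
For comparison, here is a minimal sketch of the LIKE approach quoted
above: the user's query is treated as a literal string and wrapped in
"%" behind the scenes, so "foo" does find "libfoo".  It is only an
illustration, not the actual file search code.  The table name `files`,
the column `path`, the sample paths and the helper names are all
hypothetical, and it uses Python's sqlite3 module for brevity.

```python
import sqlite3
from typing import List


def like_pattern(query: str, escape: str = "\\") -> str:
    """Wrap a literal query in % wildcards, escaping LIKE metacharacters."""
    escaped = (
        query.replace(escape, escape + escape)  # escape the escape char first
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )
    return "%" + escaped + "%"


def search_files(conn: sqlite3.Connection, query: str) -> List[str]:
    """Return the paths that contain `query` as a literal substring."""
    # Hypothetical schema: a single table files(path).
    rows = conn.execute(
        "SELECT path FROM files WHERE path LIKE ? ESCAPE '\\' ORDER BY path",
        (like_pattern(query),),
    )
    return [path for (path,) in rows]


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with a few made-up paths."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (path TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO files (path) VALUES (?)",
        [
            ("lib/libfoo.so",),
            ("bin/foo",),
            ("share/doc/bar_baz",),
            ("share/doc/barXbaz",),
        ],
    )
    return conn
```

With this, `search_files(demo_connection(), "foo")` returns both
"bin/foo" and "lib/libfoo.so", and a query containing "_" or "%" only
matches those characters literally.  The obvious cost is that we lose
the FTS niceties listed above (word permutations, ranking, logic).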

-- 
Pierre Neidhardt
https://ambrevar.xyz/