From: Pierre Neidhardt
Subject: Re: Inverted index to accelerate guix package search
Date: Fri, 17 Jan 2020 18:13:51 +0100
Message-ID: <87v9paypio.fsf@ambrevar.xyz>
References: <87a76r68u6.fsf@ambrevar.xyz> <87muaqnmod.fsf@ambrevar.xyz>
To: Arun Isaac, zimoun
Cc: Guix Devel

Arun Isaac writes:

> For the time being, since we don't have xapian bindings, I think we
> should settle for sqlite's full text search capabilities.
>
> https://www.sqlite.org/fts5.html
>
> I have attached a short proof of concept script for an sqlite based
> search. Speedup is around 200x, and populating the database only takes
> around 2.5 seconds. Here is a sample run.
>
> Sqlite database populated in 2.5516340732574463 seconds
> Brute force search took 0.11850595474243164 seconds
> Sqlite search took 5.459785461425781e-4 seconds

This is really cool!  And quite simple too!
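
For readers without the attachment, here is a rough sketch of the quoted
idea. It is not Arun's script (which is not reproduced in this message);
it uses Python's sqlite3 module, and the table name, column names and the
three sample packages are made up for illustration. It also assumes the
local SQLite build has FTS5 enabled.

```python
import sqlite3

# Hypothetical sample data; a real run would load the full Guix package list.
PACKAGES = [
    ("guile", "Scheme implementation intended especially for extensions"),
    ("sqlite", "Self-contained SQL database engine"),
    ("xapian", "Search engine library"),
]


def build_index(packages):
    """Create an in-memory FTS5 table (assumes SQLite was built with FTS5)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE packages USING fts5(name, synopsis)")
    conn.executemany("INSERT INTO packages VALUES (?, ?)", packages)
    return conn


def fts_search(conn, term):
    """Full-text query; FTS5 matches whole tokens unless the term ends in '*'."""
    rows = conn.execute(
        "SELECT name FROM packages WHERE packages MATCH ? ORDER BY rank",
        (term,),
    )
    return [name for (name,) in rows]


def brute_force_search(packages, term):
    """Baseline: case-insensitive substring scan over every package."""
    term = term.lower()
    return [n for n, s in packages if term in n.lower() or term in s.lower()]


if __name__ == "__main__":
    conn = build_index(PACKAGES)
    for query in ("engine", "data", "data*"):
        print(query, fts_search(conn, query), brute_force_search(PACKAGES, query))
```

One thing such a sketch makes visible: the FTS5 query matches whole
tokens, so "data" finds nothing unless written as the prefix query
"data*", whereas the substring scan finds "database" directly.  Comparing
the two result sets on real queries might be one way to judge the index.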

So now I suppose the test would be to try with some real world examples :)
I don't know the kind of tests we can write for this though.

If the results are convincing enough, I'd agree with you: we can first
settle for SQLite before moving to something more sophisticated like
Xapian.

Cheers!

-- 
Pierre Neidhardt
https://ambrevar.xyz/