From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [bug#39258] [PATCH v2 0/3] Xapian for Guix package search
From: Arun Isaac
To: Ludovic Courtès
Cc: mail@ambrevar.xyz, 39258@debbugs.gnu.org, zimon.toutoune@gmail.com
Date: Sun, 08 Mar 2020 14:31:42 +0530
In-Reply-To: <87sgijgb1v.fsf@gnu.org>
References: <20200307133116.11443-1-arunisaac@systemreboot.net> <87sgijgb1v.fsf@gnu.org>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature"

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

>> It turns out that most of the time is spent in printing and texinfo
>> rendering of the search results.

Also, when we put all package metadata into the Xapian index, we don't
have to look up any of the package variables in (gnu packages *) during
`guix search` time. This also contributes substantially to the speedup.

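A minimal sketch of this point, in plain Python rather than Guile. It
is only an illustration under stated assumptions: the dictionaries
stand in for the package definitions and for the index, nothing here is
the Xapian API or code from the patchset, and the names PACKAGES,
build_index and search are hypothetical. Index time copies every field
needed for display into the stored document, so search time never
consults the package table.

```python
# Hypothetical stand-in for the package definitions in (gnu packages *).
# The goal is to avoid touching this table at search time.
PACKAGES = {
    "inkscape": {"version": "0.92.4", "synopsis": "Vector graphics editor"},
    "inklingreader": {
        "version": "0.8",
        "synopsis": "Wacom Inkling sketch format conversion and manipulation",
    },
}


def build_index(packages):
    """Index time: copy every field needed for display into the index."""
    documents = []  # document id -> stored metadata
    postings = {}   # term -> set of document ids containing that term
    for name, fields in packages.items():
        doc_id = len(documents)
        documents.append({"name": name, **fields})
        text = f"{name} {fields['synopsis']}".lower()
        for term in text.split():
            postings.setdefault(term, set()).add(doc_id)
    return {"documents": documents, "postings": postings}


def search(index, query):
    """Search time: answer from the index alone, with no package lookup."""
    matches = None
    for term in query.lower().split():
        ids = index["postings"].get(term, set())
        # Every query term must match (a simple AND query).
        matches = ids if matches is None else matches & ids
    return [index["documents"][i] for i in sorted(matches or ())]


index = build_index(PACKAGES)
results = search(index, "vector graphics")
```

The trade-off is the one discussed in this thread: the index grows and
takes longer to build, in exchange for less work per search.
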
> In general, pre-rendering doesn’t seem practical to me: the output of
> ‘guix search’ is locale-dependent (it speaks the user’s language) and

Note that we already need to index package synopses and descriptions in
all languages. I still haven't implemented this, though.

> adjusts to the terminal width (well, this is temporarily broken on
> Guile 3.0.0, but see ‘%text-width’ in (guix ui)).

This could be accomplished even with pre-rendering. Xapian provides
"slots" to store arbitrary strings with a document. Instead of storing
the pre-rendered document as a whole, we could store pre-rendered fields
in separate slots. Then, during `guix search` time, we can assemble the
result from these pre-rendered fields.

> Also, if the 12K+ descriptions need to be rendered at the time the user
> runs ‘guix pull’, the experience may not be great, because it could take
> a bit of time.

This is a problem, but I would see it as a necessary "compilation"
step. :-P In fact, this whole patchset speeds up `guix search` by doing
part of the work of `guix search` ahead of time. So, some such cost is
unavoidable.

> What I like about the recutils format in this context is that it’s both
> human- and machine-readable. The examples in the manual show how it can
> be useful to select the information displayed or to refine the search
> (info "(guix) Invoking guix package").

Xapian's query language is much more natural (as in natural language)
than the regexp based techniques we need to use with recutils. I have
hardly ever used the regexp based search and I suspect many others
haven't either.

Also, refining the search query should be easier to do with Xapian. We
could even use Xapian's query expansion feature to suggest improved
queries to the user.

That said, if we want the recutils format, we can still keep it in a
simplified form like so.

name: inkscape
version: 0.92.4
synopsis: Vector graphics editor

name: inklingreader
version: 0.8
synopsis: Wacom Inkling sketch format conversion and manipulation

> Also: I’d recommend tackling one thing at a time. :-)

I totally agree, but I'm tempted to say that pre-rendering would be a
lot cheaper with the simplified form of search results. :-)

> Were you able to measure the cost of rendering specifically?

generate-package-search-index takes around 50 seconds. If I modify
generate-package-search-index to not pre-render but simply store the
package description alone, it takes around 20 seconds. That gives us a
rough idea of the cost of pre-rendering.

> I think we should look at a profile of ‘package->recutils’, there’s
> probably room for improvement there.

On quick inspection, most of the time in package->recutils is spent in
texinfo rendering the description. Unless we use the simplified search
results format as discussed above, we cannot avoid it.

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl5ktHYACgkQLiXui2GA
K7NqqQgAnPZx1ilBTQfl8kAoHTyQrdcdhL//PI4LHFMtfYqi+s+AnYvu10jNVM9K
lPX2AL3hDRcDYSJL1r2KQ+uicc/FZwY9qWVPg5axflohUnwa6pzn7LDnJ3U5ClU/
ON5uY7Vh6nixBjsIMwMqklRSWb2Lk/VEYTM9+HYUXCPgpvTzUz1XuJN5IP6YeXMo
EIIMpOUVFz3Qxg3bZqPXSZ29gxAg+KdS3DhtK4zgmeFZTBeT+81Lld6rDy1q68Y3
kwyFKmiLSsg0XWRStK5tai7J0h7snN+/EM4SkORjOGBlEnsUPQSYfeRnAUXy2f5Y
kxw+OnaWgsrF+UtGYrWAg4+SLo0ocw==
=rzzL
-----END PGP SIGNATURE-----
--=-=-=--