From: zimoun
Date: Mon, 9 Mar 2020 13:40:39 +0100
Subject: [bug#39258] [PATCH v2 0/3] Xapian for Guix package search
To: Arun Isaac
Cc: Ludovic Courtès, Pierre Neidhardt, 39258@debbugs.gnu.org
References: <20200307133116.11443-1-arunisaac@systemreboot.net> <87sgijgb1v.fsf@gnu.org>

On Sun, 8 Mar 2020 at 10:02, Arun Isaac wrote:
>
> >> It turns out that most of the time is spent in printing and texinfo
> >> rendering of the search results.
>
> Also, when we put all package metadata into the Xapian index, we don't
> have to look up any of the package variables in (gnu packages *) during
> `guix search` time. This also contributes substantially to the speedup.

Yes, magic power of inverted index.
;-)

> > Also, if the 12K+ descriptions need to be rendered at the time the user
> > runs ‘guix pull’, the experience may not be great, because it could take
> > a bit of time.
>
> This is a problem, but I would see it as a necessary "compilation"
> step. :-P In fact, this whole patchset speeds up `guix search` by doing
> part of the work of `guix search` ahead of time. So, some such cost is
> unavoidable.

Currently "guix pull" takes rather long on my machine. I would accept
a couple of seconds more (even minutes). So this compilation step
could be done at "guix pull" time. Or we could even imagine something
indexing in the background.

> > What I like about the recutils format in this context is that it’s both
> > human- and machine-readable. The examples in the manual show how it can
> > be useful to select the information displayed or to refine the search
> > (info "(guix) Invoking guix package").
>
> Xapian's query language is much more natural (as in natural language)
> than the regexp based techniques we need to use with recutils. I have
> hardly ever used the regexp based search and I suspect many others
> haven't either. Also, refining the search query should be easier to do
> with Xapian. We could even use Xapian's query expansion feature to
> suggest improved queries to the user.
>
> That said, if we want the recutils format, we can still keep it in a
> simplified form like so.
>
> name: inkscape
> version: 0.92.4
> synopsis: Vector graphics editor
>
> name: inklingreader
> version: 0.8
> synopsis: Wacom Inkling sketch format conversion and manipulation
>
> > Also: I’d recommend tackling one thing at a time. :-)
>
> I totally agree, but I'm tempted to say that pre-rendering would be a
> lot cheaper with the simplified form of search results. :-)

IMHO, we "just" need to propose different outputs mimicking "git log
--format". Something like "guix search --format=".

What do you think?

All the best,
simon
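
To make the "git log --format" analogy concrete, here is a minimal sketch in Python of how a format string could be expanded against the simplified records quoted in this message. It is only an illustration under stated assumptions: "guix search" has no such option today, and the placeholders %n, %v and %s, as well as the helper names parse_records and render, are hypothetical.

```python
import re

# Hypothetical placeholders, loosely modelled on "git log --format".
# None of these exist in "guix search"; they are assumptions for this sketch.
PLACEHOLDERS = {"n": "name", "v": "version", "s": "synopsis"}


def parse_records(text):
    """Parse blank-line-separated 'key: value' records into a list of dicts."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        record = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip lines that carry no 'key: value' pair
                record[key.strip()] = value.strip()
        if record:
            records.append(record)
    return records


def render(record, fmt):
    """Expand %n, %v, %s and %% in fmt; unknown placeholders are left as-is."""
    def expand(match):
        code = match.group(1)
        if code == "%":
            return "%"
        field = PLACEHOLDERS.get(code)
        return record.get(field, "") if field else match.group(0)
    return re.sub(r"%(.)", expand, fmt)


SAMPLE = """\
name: inkscape
version: 0.92.4
synopsis: Vector graphics editor

name: inklingreader
version: 0.8
synopsis: Wacom Inkling sketch format conversion and manipulation
"""

if __name__ == "__main__":
    for rec in parse_records(SAMPLE):
        print(render(rec, "%n@%v: %s"))
```

With a one-line format such as "%n@%v: %s", each result fits on a single line, while the full recutils output would simply be one more format among others.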