Subject: [bug#39258] [PATCH v2 0/3] Xapian for Guix package search
From: zimoun
Date: Mon, 9 Mar 2020 13:47:01 +0100
To: Ludovic Courtès
Cc: Arun Isaac, Pierre Neidhardt, 39258@debbugs.gnu.org
In-Reply-To: <875zffcc87.fsf@gnu.org>

On Sun, 8 Mar 2020 at 12:33, Ludovic Courtès wrote:

> Arun Isaac wrote:
>
> > This is a problem, but I would see it as a necessary "compilation"
> > step. :-P In fact, this whole patchset speeds up `guix search` by doing
> > part of the work of `guix search` ahead of time. So, some such cost is
> > unavoidable.
>
> Yeah. I think we need to take the whole user experience into account,
> not just ‘guix search’. ‘guix pull’ already feels very slow, and it’s a
> fairly common operation.
> Conversely, ‘guix search’ takes roughly between 0.5 and 2 seconds and
> is an uncommon operation on a “slow path” (in the sense that when you’re
> searching for software, you’ll probably have to spend more than a couple
> of seconds to find what you’re looking for.)

We could imagine something doing the indexing in the background, using
the daemon or whatever.

> >> What I like about the recutils format in this context is that it’s both
> >> human- and machine-readable. The examples in the manual show how it can
> >> be useful to select the information displayed or to refine the search
> >> (info "(guix) Invoking guix package").
> >
> > Xapian's query language is much more natural (as in natural language)
> > than the regexp based techniques we need to use with recutils. I have
> > hardly ever used the regexp based search and I suspect many others
> > haven't either. Also, refining the search query should be easier to do
> > with Xapian. We could even use Xapian's query expansion feature to
> > suggest improved queries to the user.
>
> I’m not sufficiently familiar with Xapian’s query language. The
> examples I had in mind were:
>
>   guix search malloc | recsel -p name,version,relevance
>
>   guix search | recsel -p name -e 'license ~ "LGPL 3"'
>
>   guix search crypto library | \
>     recsel -e '! (name ~ "^(ghc|perl|python|ruby)")' -p name,synopsis

I think these examples would make good benchmarks for comparing the
different approaches: speed is one thing, accuracy is another. Let's cut
the "slow path" by providing a better search experience. ;-)

> It’s not so much about regexps than it is about selecting individual
> fields.

Actually, the regexp should be passed directly to "guix search", and
'recsel' is only a "filter" that lets you handle the fields differently.
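For readers unfamiliar with recutils, here is a rough Python sketch of that two-stage idea: a regexp search produces records, then a filter selects on individual fields. It only emulates what `recsel -e 'field ~ "re"' -p fields` does. It does not reproduce recsel's implementation or its full selection-expression language. The sample records and the helpers `parse_records` and `select` are hypothetical, made up for illustration.

```python
import re

# Hypothetical sample resembling `guix search` output: recutils records,
# one "field: value" per line, records separated by a blank line.
# The package names and values are made up for illustration.
SAMPLE = """\
name: libfoo
version: 1.0
license: LGPL 3+
synopsis: Example crypto library

name: python-bar
version: 2.1
license: Expat
synopsis: Example Python binding
"""


def parse_records(text):
    """Split recutils-style text into a list of {field: value} dicts.

    Simplification (assumption): ignores '+' continuation lines and
    repeated fields, which real recutils output can contain.
    """
    records = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            field, sep, value = line.partition(": ")
            if sep:
                record[field] = value
        records.append(record)
    return records


def select(records, field, pattern, print_fields, negate=False):
    """Keep records whose `field` matches the regexp `pattern`
    (roughly like recsel's `field ~ "re"`, or `! (...)` when negate=True),
    then keep only `print_fields` (roughly like `recsel -p`)."""
    regex = re.compile(pattern)
    selected = []
    for record in records:
        matched = regex.search(record.get(field, "")) is not None
        if matched != negate:
            selected.append({f: record.get(f) for f in print_fields})
    return selected


records = parse_records(SAMPLE)
print(select(records, "license", "LGPL 3", ["name"]))
print(select(records, "name", "^(ghc|perl|python|ruby)",
             ["name", "synopsis"], negate=True))
```

With the made-up sample, the first call prints `[{'name': 'libfoo'}]`. The second prints `[{'name': 'libfoo', 'synopsis': 'Example crypto library'}]`. The regexp only decides which records match. The field selection is a separate step, which is the distinction being discussed.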
> To me, adding 20–50 seconds on ‘guix pull’ would be undesirable. :-/

OK, at least that is clear. :-)
What about doing the computation in the background?

All the best,
simon