Subject: [bug#39258] [PATCH v2 0/3] Xapian for Guix package search
From: Ludovic Courtès
References: <20200307133116.11443-1-arunisaac@systemreboot.net> <87sgijgb1v.fsf@gnu.org> <875zffcc87.fsf@gnu.org>
Date: Mon, 09 Mar 2020 11:35:35 +0100
In-Reply-To: (Arun Isaac's message of "Mon, 09 Mar 2020 01:57:40 +0530")
Message-ID: <87r1y13jew.fsf@gnu.org>
To: Arun Isaac
Cc: mail@ambrevar.xyz, 39258@debbugs.gnu.org, zimon.toutoune@gmail.com

Hello!

Arun Isaac wrote:

>>> This could be accomplished even with pre-rendering. Xapian provides
>>> "slots" to store arbitrary strings with a document. Instead of storing
>>> the pre-rendered document as a whole, we could store pre-rendered fields
>>> in separate slots. Then, during `guix search` time, we can assemble the
>>> result from these pre-rendered fields.
>>
>> I’m not sure I understand.
>> The index wouldn’t store pre-rendered
>> strings for every possible terminal width, right?
>
> No, it wouldn't. It would store a partially pre-rendered string, that is
> without fill-paragraph. We run fill-paragraph at `guix search` time to
> complete the rendering.

Note that Texinfo rendering doesn’t use (@ (guix ui) fill-paragraph).
It has its own paragraph-filling code.  We cannot use ‘fill-paragraph’
after Texinfo rendering anyway, since Texinfo knows where things can be
filled and where they cannot, e.g., @example.

>> I think we need to take the whole user experience into account, not
>> just ‘guix search’.  ‘guix pull’ already feels very slow, and it’s a
>> fairly common operation.  Conversely, ‘guix search’ takes roughly
>> between 0.5 and 2 seconds and is an uncommon operation on a “slow
>> path” (in the sense that when you’re searching for software, you’ll
>> probably have to spend more than a couple of seconds to find what
>> you’re looking for.)
>
> I agree we can't compromise too much on `guix pull` performance.
>
>> To me, adding 20–50 seconds on ‘guix pull’ would be undesirable.  :-/
>
> Maybe I'm missing something here. guix pull takes around 40 minutes on
> my machine. In comparison to that, is another 20-50 seconds (roughly 1
> minute) a big deal? How much time would it be acceptable to spend on
> building the Xapian index?

On my laptop, in the best case, when all the substitutes are available
(not uncommon), it takes 2 minutes.  Sometimes, when some substitutes
are missing, it takes 15 minutes.

So of course, the 20–50 seconds matter only in the best case.  But they
matter primarily because that index build may not be substitutable: it’s
possibly unique to each profile (see below).  That means we know we’re
often going to pay for it.
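
As a rough sanity check on the figures quoted above, the following
Python sketch expresses the estimated 20–50 second index build as a
percentage of each `guix pull` duration mentioned in this thread (2, 15,
and 40 minutes).  The function name `overhead_percent` and the labels
are illustrative only, and the inputs are the anecdotal numbers from the
discussion, not measurements.

```python
# Relative cost of an extra index-build step on top of `guix pull`.
# All figures are the anecdotal ones quoted in this thread (assumptions,
# not benchmarks).

def overhead_percent(extra_seconds, pull_minutes):
    """Return the extra time as a percentage of the base pull time."""
    return 100.0 * extra_seconds / (pull_minutes * 60)

# Base `guix pull` durations mentioned in the thread, in minutes.
pull_minutes = {
    "all substitutes available": 2,
    "some substitutes missing": 15,
    "Arun's machine": 40,
}

# Estimated index build time range, in seconds.
index_build_seconds = (20, 50)

rows = []
for label, minutes in pull_minutes.items():
    low, high = (overhead_percent(s, minutes) for s in index_build_seconds)
    rows.append((label, low, high))
    print(f"{label:28s} +{low:.0f}% to +{high:.0f}%")
```

With these inputs, the index build adds roughly 17% to 42% in the
2-minute best case, but only about 1% to 2% on a 40-minute pull, which
is why the cost is felt mostly when substitutes are available.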

> Also, is it possible to somehow provide substitutes for the Xapian index
> so that the user does not have to actually build it locally during `guix
> pull` time?

We could provide a substitute for users who use only the official 'guix
channel.  However, as soon as users combine multiple channels, they’ll
have to build the index locally.

>> I’m not sufficiently familiar with Xapian’s query language.  The
>> examples I had in mind were:
>> It’s not so much about regexps than it is about selecting individual
>> fields.
>
> I have totally not tested this, but I imagine that equivalent Xapian
> queries might look something like:
>
>> guix search | recsel -p name -e 'license ~ "LGPL 3"'
>
> guix search license:LGPL3

Nice.

>> guix search crypto library | \
>> recsel -e '! (name ~ "^(ghc|perl|python|ruby)")' -p name,synopsis
>
> guix search crypto library AND (NOT ghc) AND (NOT perl) AND (NOT python)
> AND (NOT ruby)

This one is not quite equivalent I guess, but yeah.  :-)

>> What I meant was that we could use (statprof) to see whether/how Texinfo
>> rendering/parsing can be optimized.
>
> Oh, ok. I'll try this if we decide not to pre-render.

It’d be beneficial anyways.

Thank you!

Ludo’.