From: Ludovic Courtès
To: Arun Isaac
Cc: mail@ambrevar.xyz, 39258@debbugs.gnu.org, zimon.toutoune@gmail.com
Subject: [bug#39258] [PATCH v2 0/3] Xapian for Guix package search
Date: Sat, 07 Mar 2020 21:33:16 +0100
Message-ID: <87sgijgb1v.fsf@gnu.org>
In-Reply-To: <20200307133116.11443-1-arunisaac@systemreboot.net> (Arun Isaac's message of "Sat, 7 Mar 2020 19:01:13 +0530")

Hello,

Arun Isaac writes:

> Here is the second iteration of my Xapian Guix package search patchset. I have
> found the reason the earlier patchset did not show significant speedup. It
> turns out that most of the time is spent in printing and texinfo rendering of
> the search results. So, in this patchset, I pre-render the search results
> while building the Xapian index and stuff them into the Xapian database
> itself.
> Therefore, during `guix search`, I just pull out the pre-rendered
> search results and print it on the screen. This is much faster. See comparison
> below.
>
> With a warm cache,
> $ time guix search inkscape
>
> real    0m1.787s
> user    0m1.745s
> sys     0m0.111s
>
> $ time /tmp/test/bin/guix search inkscape
>
> real    0m0.199s
> user    0m0.182s
> sys     0m0.024s

Nice!

In general, pre-rendering doesn’t seem practical to me: the output of
‘guix search’ is locale-dependent (it speaks the user’s language) and
adjusts to the terminal width (well, this is temporarily broken on Guile
3.0.0, but see ‘%text-width’ in (guix ui)).

Also, if the 12K+ descriptions need to be rendered at the time the user
runs ‘guix pull’, the experience may not be great, because it could take
a bit of time.

WDYT?

> Why not use a simpler package search results format like Arch Linux or Debian
> does? We could just display the package name, version and synopsis like so.
>
> inkscape 0.92.4
>     Vector graphics editor
> inklingreader 0.8
>     Wacom Inkling sketch format conversion and manipulation
>
> Why do we need the entire recutils format? If the user is interested, they can
> always use `guix package --show` to get the full recutils formatted
> info. Having shorter search results will make everything even faster and much
> more readable. WDYT?

What I like about the recutils format in this context is that it’s both
human- and machine-readable. The examples in the manual show how it can
be useful to select the information displayed or to refine the search
(info "(guix) Invoking guix package").

Also: I’d recommend tackling one thing at a time. :-)

> Ludovic Courtès writes:
>
>> Note that ‘guix search’ time is largely dominated by I/O.
>
> Yes, `guix search` is I/O intensive. That is why I expect Xapian to do better
> since it only needs to access matching packages not all packages.
> Also, the
> Xapian index is fast at all times. It is not very dependent on a warm
> filesystem cache.

Yes, indeed.

>> On my laptop,
>> I get (first measurement is cold cache, second one is warm cache):
>>
>> --8<---------------cut here---------------start------------->8---
>> $ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
>> $ time guix search foo >/dev/null
>>
>> real    0m2.631s
>> user    0m1.134s
>> sys     0m0.124s
>> $ time guix search foo >/dev/null
>>
>> real    0m0.836s
>> user    0m1.027s
>> sys     0m0.053s
>> --8<---------------cut here---------------end--------------->8---
>>
>> It’s hard to do better on the warm cache case because at this level,
>> there may be other things to optimize having little to do with searching
>> itself.
>>
>> Note that this is on an SSD; the cold-cache case must be worse on NFS or
>> on a spinning disk, and there we could gain a lot.
>
> My laptop is quite old with a particularly slow HDD. Hence my motivation to
> improve guix search performance!

Were you able to measure the cost of rendering specifically?

Here’s what I see when I turn ‘package->recutils’ into a no-op:

--8<---------------cut here---------------start------------->8---
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time ./pre-inst-env guix search foo

real    0m1.617s
user    0m0.812s
sys     0m0.094s
$ time ./pre-inst-env guix search foo

real    0m0.595s
user    0m0.747s
sys     0m0.043s
--8<---------------cut here---------------end--------------->8---

To compare with:

--8<---------------cut here---------------start------------->8---
$ time ./pre-inst-env guix search foo >/dev/null

real    0m0.829s
user    0m1.026s
sys     0m0.046s
--8<---------------cut here---------------end--------------->8---

I think we should look at a profile of ‘package->recutils’, there’s
probably room for improvement there.

Thoughts?

Ludo’.
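
As a rough reading of the warm-cache numbers above, here is a minimal Python sketch. It is not part of the original message and only does arithmetic on the two "real" timings quoted there. The constant and function names are made up for illustration. It also assumes the two runs are directly comparable, which is only approximately true, since the no-op run was not redirected to /dev/null.

```python
# Warm-cache wall-clock ("real") times quoted in the message, in seconds.
WARM_FULL_S = 0.829  # ./pre-inst-env guix search foo >/dev/null
WARM_NOOP_S = 0.595  # same search with package->recutils turned into a no-op


def rendering_share(full_s: float, noop_s: float) -> tuple[float, float]:
    """Return (seconds attributable to rendering, fraction of the full run).

    Hypothetical helper: it treats the difference between the two runs as
    the cost of rendering, which is an assumption, not a measurement.
    """
    cost = full_s - noop_s
    return cost, cost / full_s


cost_s, share = rendering_share(WARM_FULL_S, WARM_NOOP_S)
print(f"rendering: {cost_s:.3f}s of {WARM_FULL_S:.3f}s ({share:.0%})")
# prints "rendering: 0.234s of 0.829s (28%)"
```

Under those assumptions, rendering accounts for a bit over a quarter of the warm-cache run, so most of the remaining time is spent elsewhere.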