Subject: [bug#39258] [PATCH v2 0/3] Xapian for Guix package search
From: zimoun
Date: Mon, 9 Mar 2020 14:03:06 +0100
To: Ludovic Courtès
Cc: Arun Isaac, Pierre Neidhardt, 39258@debbugs.gnu.org
References: <20200307133116.11443-1-arunisaac@systemreboot.net> <87sgijgb1v.fsf@gnu.org> <875zffcc87.fsf@gnu.org> <877dzuvues.fsf@ambrevar.xyz> <87blp54yag.fsf@gnu.org>
In-Reply-To: <87blp54yag.fsf@gnu.org>

On Mon, 9 Mar 2020 at 11:29, Ludovic Courtès wrote:

> > Back to the topic: I believe that Xapian is a huge win both for the
> > shell and the future GUI :)
>
> It could be, but we need to consider all the aspects of the story,
> including the maintenance cost and overhead moved to ‘guix pull’.
> So it’s not so much about “beliefs” at this point, but rather about
> demonstrating what can be done, and I’m glad Arun is exploring that
> space!

I agree.

What is currently tested with Xapian is:

 1- speeding up (or not) using an inverted index
 2- the accuracy using the state of the art of information retrieval (BM25)

About 1- I do not have a strong opinion, even if I find "guix search"
terribly slow, as I mentioned earlier (one year ago ;-)).

About 2- as I mentioned earlier, the 'relevance' function could be
improved. Currently, the score is computed considering only the package
itself and not the other packages (the words they use, their number
etc.). BM25 is the state of the art and builds on what I tried to
explain some time ago when I showed TF-IDF as an example.

So the question is what the best move is to improve the accuracy. Any
such improvement necessarily uses a global index (of terms, at least).
On the other hand, the improvement might not pay off if the complexity
and burden it adds outweigh the gain in accuracy. Without testing, we
cannot say.

Thank you Arun for pushing forward.

All the best,
simon
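
PS: To make the point about the global index concrete, here is a minimal sketch of BM25 scoring in Python. It is not what Xapian or the current 'relevance' function does internally; the toy package descriptions, the function names and the k1/b defaults are all assumptions chosen for illustration. The point is only that the score of one package depends on statistics collected over all the packages (document frequencies, average description length).

```python
import math
from collections import Counter

# Hypothetical toy corpus: package name -> description (not real Guix data).
PACKAGES = {
    "xapian": "search engine library with probabilistic ranking",
    "recoll": "desktop full text search tool based on xapian",
    "hello": "friendly greeting program",
}


def tokenize(text):
    """Naive whitespace tokenizer; a real index would also stem words."""
    return text.lower().split()


def build_index(packages):
    """Collect the global statistics that BM25 needs."""
    docs = {name: Counter(tokenize(desc)) for name, desc in packages.items()}
    doc_freq = Counter()  # term -> number of descriptions containing it
    for terms in docs.values():
        doc_freq.update(terms.keys())
    avg_len = sum(sum(t.values()) for t in docs.values()) / len(docs)
    return docs, doc_freq, avg_len


def bm25(query, name, docs, doc_freq, avg_len, k1=1.2, b=0.75):
    """Score one package against a query using corpus-wide statistics."""
    terms = docs[name]
    length = sum(terms.values())
    n = len(docs)
    score = 0.0
    for word in tokenize(query):
        tf = terms[word]
        if tf == 0:
            continue
        # Rare terms (low document frequency) get a higher weight.
        idf = math.log(1 + (n - doc_freq[word] + 0.5) / (doc_freq[word] + 0.5))
        # Term frequency saturates and is normalised by description length.
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * length / avg_len))
    return score


docs, doc_freq, avg_len = build_index(PACKAGES)
ranking = sorted(
    PACKAGES,
    key=lambda p: bm25("xapian search", p, docs, doc_freq, avg_len),
    reverse=True,
)
print(ranking)
```

In this toy corpus "xapian" appears in only one description while "search" appears in two, so the rarer word weighs more. That weighting is exactly the part that cannot be computed by looking at a single package in isolation.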