From: zimoun
Subject: Re: Inverted index to accelerate guix package search
Date: Wed, 15 Jan 2020 12:54:52 +0100
References: <87a76r68u6.fsf@ambrevar.xyz>
To: Arun Isaac
Cc: Guix Devel

Hi,

On Wed, 15 Jan 2020 at 06:44, Arun Isaac wrote:

> > Well, the issue is 'scoring' a query.
>
> I think the issue of whether to use an inverted index is orthogonal to
> the quest to improve the relevance of the search results. Implementing
> tf-idf like you suggested could greatly benefit from having fast
> searches.

I do not think it is so orthogonal. In general, a system that is both
fast and good combines a relevant scoring function with a data structure
adapted to it, IMHO.

However, I agree that adding an inverted index (while keeping the current
scoring function) will improve the current state of "guix search" and
ease the end-user experience.

> Pierre Neidhardt writes:
>
> > By the way, what about using Xapian in Guix?
> >
> > https://en.wikipedia.org/wiki/Xapian
> >
> > If it's relevant, maybe we can follow up with a discussion in a new
> > thread.
>
> I feel xapian is too much work (considering that we don't yet have guile
> bindings) compared to our own simple implementation of an inverted
> index. But, of course, I am biased since I wrote the inverted index
> code! :-)

It depends on how long-term we are thinking. :-)
Xapian avoids reinventing the wheel. ;-)

> But, on a more serious note, if we move to xapian, we will not be able
> to support regular expression based search queries that we support
> today.

I am not convinced...

> On the question of whether xapian is too heavy, I think we should make
> it an optional dependency of Guix so that it remains possible to build
> and use a more minimalistic Guix for those interested in such things.

Guix already comes with SQLite and that is OK. The question is: how
minimalist is Xapian? :-) (That needs some investigation.)

All the best,
simon
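To illustrate the claim that scoring and data structure go together, here is a minimal sketch in Python. It is neither Guile nor the inverted index code discussed in this thread. Each term maps to per-package term frequencies, and tf-idf scoring (idf = log(N / df)) only visits the postings lists of the query terms. The tokenizer, the function names, and the package data in the usage note are hypothetical and chosen for illustration only.

```python
import math
import re
from collections import defaultdict

# Hypothetical sketch: not Guix's actual "guix search" implementation.

def tokenize(text):
    # Lowercase and split on runs of non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_index(packages):
    """Inverted index: term -> {package name: term frequency}."""
    index = defaultdict(dict)
    for name, description in packages.items():
        for term in tokenize(name + " " + description):
            index[term][name] = index[term].get(name, 0) + 1
    return index

def search(index, n_docs, query):
    """Rank packages by summed tf-idf over the query terms.

    Only the postings of the query terms are read, so the cost depends
    on how many packages mention those terms, not on the catalog size.
    """
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})  # .get() does not insert new keys
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for name, tf in postings.items():
            scores[name] += tf * idf
    # Highest score first; ties broken by package name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Here is an example with made-up descriptions. `search(build_index(pkgs), len(pkgs), "search engine")` ranks `xapian` first when only its description contains "search". A regular-expression query could be supported in the same structure by matching the pattern against the index's keys (the vocabulary) rather than against every description. That is one possible design, not a claim about Xapian.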