From: zimoun
Subject: Re: NLNet grant "Next Generation Internet -- Search & discovery": I'm in!
Date: Sat, 14 Dec 2019 18:04:13 +0100
In-Reply-To: <874ky4s2m2.fsf@ambrevar.xyz>
To: Pierre Neidhardt
Cc: Guix Devel

Hi Pierre,

Congrats!

On Fri, 13 Dec 2019 at 15:49, Pierre Neidhardt wrote:

> 2. File search
> (Previous discussion: https://lists.gnu.org/archive/html/guix-devel/2019-03/msg00236.html)

Yes, it is really lacking. For example, if one wants to use the 'hg' version control system, one will naively run "guix search hg", and this returns "Human Genome" packages (useful in bioinformatics). Worse, because neither the synopsis nor the description of the package that provides the 'hg' command mentions the term 'hg', it is impossible to find it unless one already knows that it is provided by the mercurial package.
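To make the idea concrete, here is a minimal sketch of the index a file search needs: a map from file names to the packages that ship them. It assumes we already have, for each package, the list of files in its output (how to collect that list is left open). The file lists and the package 'hypothetical-libnameit' are made up for illustration, and none of this is an existing Guix API.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def build_file_index(package_files):
    """Map each file basename to the set of packages shipping a file with that name."""
    index = defaultdict(set)
    for package, files in package_files.items():
        for path in files:
            index[PurePosixPath(path).name].add(package)
    return index


def file_search(index, name):
    """Return the sorted list of packages providing a file called NAME."""
    # .get() does not insert a new key into the defaultdict.
    return sorted(index.get(name, ()))


# Hypothetical data: package name -> files relative to its store output.
package_files = {
    "mercurial": ["bin/hg"],
    "git": ["bin/git"],
    "hypothetical-libnameit": ["include/name-it.h", "lib/libnameit.so"],
}

index = build_file_index(package_files)
print(file_search(index, "hg"))         # ['mercurial']
print(file_search(index, "name-it.h"))  # ['hypothetical-libnameit']
```

Matching on the basename only is a simplification; a real index would probably also match path suffixes, e.g. 'foo/bar.h' for "#include <foo/bar.h>".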
So what I personally do is: search with DuckDuckGo, look at some Debian packages, and hope the package has the same name in Guix. Ouch!

File search is also super useful for finding headers. You have some code with "#include <name-it.h>" and it is hard to know which Guix package provides this very header 'name-it.h'.

However, IMHO, file search should be part of the "search" command rather than yet another command. I mean:

  guix package --file-search=hg

or

  guix search hg --file-search

That seems to me a better UI than adding yet another subcommand. :-)

> 5. Social integration with the Guix catalogue
>
> Previous discussions:
> - Adding wikidata, wikipedia & screenshot-url fields to package-recipes: https://lists.gnu.org/archive/html/guix-devel/2018-11/msg00007.html
> - Re-approaching package tagging: https://lists.gnu.org/archive/html/guix-devel/2018-12/msg00385.html
> - New library: guile-wikidata: https://lists.gnu.org/archive/html/guix-devel/2018-12/msg00107.html
> - Guix <-> Wikidata: https://lists.gnu.org/archive/html/guix-devel/2019-05/msg00017.html
> - Guix Wikidata module - next steps: https://lists.gnu.org/archive/html/guix-devel/2019-01/msg00089.html
>
> There were also a few discussions regarding package search improvements, in
> which has Zimoun participated quite a bit if I recall correctly. Feel free
> to share all your precious links! :)

Firstly, IMHO tagging, i.e., assigning to each package words taken from a predefined set, is not a good approach. My main argument is: the set of words is arbitrary, so in the end it leads to bikeshedding and/or is not really useful, because it is not self-organised by the data themselves. As a Debian user, I do not use their tag system; and I am almost sure they have documented the "usefulness" of their tagging system and the feedback on it (I have in mind talks at DebConf, but I am not able to find them now).

However, grouping packages by similar topic is important for discoverability. The question is: how is the grouping done?
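One possible answer, sketched under strong assumptions: compute a TF-IDF vector from each package's synopsis+description, measure pairwise cosine similarity, and put two packages in the same group when they are similar enough (connected components of the similarity graph, via union-find). This naive threshold grouping only stands in for the real clustering methods worth comparing; the synopses below are made up, and the 0.3 threshold is arbitrary.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return a sparse TF-IDF vector (term -> weight) for each document."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    df = Counter(term for c in counts.values() for term in c)
    # Smoothed inverse document frequency: terms shared by many packages weigh less.
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    return {
        name: {t: (k / sum(c.values())) * idf[t] for t, k in c.items()}
        for name, c in counts.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def group_packages(docs, threshold=0.3):
    """Group packages whose texts have cosine similarity >= THRESHOLD."""
    vectors = tfidf_vectors(docs)
    parent = {name: name for name in docs}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(docs, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in docs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Made-up synopses, not the real Guix ones.
synopses = {
    "mercurial": "Decentralized version control system",
    "git": "Distributed version control system",
    "bwa": "Burrows-Wheeler aligner for short sequence reads",
    "bowtie": "Fast aligner for short sequence reads",
}

print(group_packages(synopses))
# [['bowtie', 'bwa'], ['git', 'mercurial']]
```

The groups emerge from the text itself rather than from a fixed vocabulary, which is the point of preferring this over manual tagging.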
Instead of manual tagging, I propose to first compare clustering methods based on the synopsis+description and Natural Language Processing (NLP). That is what I had in mind when I replied to the thread "Re-approaching package tagging", but life intervened and I did nothing on this front. Well, the Python ecosystem provides nice packages (most of them not yet in Guix last time I checked) to ease a first exploration and see whether it pays off or not.

Not about tagging, but maybe close enough to be relevant:

https://lists.gnu.org/archive/html/guix-devel/2019-07/msg00133.html
https://lists.gnu.org/archive/html/guix-devel/2019-07/msg00252.html

Secondly, instead of manual tagging, I propose to work on relevance scoring. Basically, "guix search" should act as a recommendation system IMHO. Then the questions are: where are the indexing computations done? Locally? By the Guix Data Service, with "guix pull" fetching the index? Can it be merged with other distros or upstreams (CRAN, GitHub) via Wikidata or an API? Etc.

Thirdly, I also think that Guix lacks tools to navigate its Git history. Now that we have "guix time-machine", it appears to me that finding a specific package back in the history is complicated (basically git checkout + git log + grep, ouch! Not user-friendly). I have tried to describe use cases in this message:

https://lists.gnu.org/archive/html/guix-devel/2019-11/msg00513.html

IMO, something similar to "git tag" should be added (in "guix pull"?). But one could also think of integrating such historical information into Wikidata; for example, "guix search emacs --all" would return all the versions and commits present in Guix, and then it is easy to run "guix time-machine --commit=1234 -- install emacs". Kind of such ideas... not fully clear in my mind. ;-)

Last, about UI:

https://lists.gnu.org/archive/html/guix-devel/2019-07/msg00289.html

Currently "guix search" supports regexps, but part of the filtering is done by recsel, and I do not find that handy.

Hope these words make sense.
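The historical lookup suggested above ("guix search emacs --all" followed by "guix time-machine") could be prototyped as an index from each package name to the versions it had and the first commit that provided each version. A minimal sketch, assuming the package set at each commit has already been extracted, oldest commit first; the commit identifiers and snapshots are hypothetical, and "--all" is only a proposal here.

```python
def build_version_index(history):
    """HISTORY is a list of (commit, {package: version}) snapshots, oldest first.
    Return package -> list of (version, first commit providing that version)."""
    index = {}
    for commit, packages in history:
        for name, version in packages.items():
            known = index.setdefault(name, [])
            if all(v != version for v, _ in known):
                known.append((version, commit))
    return index


def time_machine_command(index, package, version):
    """Build the command installing PACKAGE at VERSION, or None if unknown."""
    for v, commit in index.get(package, []):
        if v == version:
            return f"guix time-machine --commit={commit} -- install {package}"
    return None


# Hypothetical snapshots; real ones would come from walking the Guix Git history.
history = [
    ("aaaa111", {"emacs": "26.2", "git": "2.23.0"}),
    ("bbbb222", {"emacs": "26.3", "git": "2.23.0"}),
    ("cccc333", {"emacs": "26.3", "git": "2.24.0"}),
]

index = build_version_index(history)
print(index["emacs"])  # [('26.2', 'aaaa111'), ('26.3', 'bbbb222')]
print(time_machine_command(index, "emacs", "26.2"))
# guix time-machine --commit=aaaa111 -- install emacs
```

Recording only the first commit per version keeps the index small; any later commit still providing that version would work just as well with "guix time-machine".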
Cheers, simon