From: zimoun
Subject: Re: bug#33844: Rename ghc-pandoc to pandoc
Date: Thu, 27 Feb 2020 14:10:15 +0100
In-Reply-To: <87pne0q0wb.fsf@gnu.org>
To: Mike Gerwitz
Cc: Guix Devel, 33844@debbugs.gnu.org

Hi Mike,

On Thu, 27 Feb 2020 at 02:23, Mike Gerwitz wrote:

> Ah, for the record, I had searched for pandoc using `guix package -s
> pandoc` in the past and didn't find what I was looking for, and so fell
> back to a Debian system. It turns out what I wanted was ghc-pandoc
> after all.

Thank you for pointing out the issue.

My remark is *not* about the rename, which seems fine, for the very
same reason that the "git-annex" software is named 'git-annex' and not
'ghc-git-annex'.

Well, your comment points to two things: a) the description is badly
written and b) the 'relevance' score is too rough.

The command "guix search pandoc" returns ghc-pandoc-citeproc as the
highest ranked package, with a relevance score of 17. The package of
interest, 'ghc-pandoc', appears in 6th position with a relevance score
of 8. (That is, after emacs-pandoc-mode, ghc-pandoc-types,
emacs-ox-pandoc and python-pandocfilters; well, less relevant packages,
IMO.)

Why?
Because of the number of occurrences of the term 'pandoc' in
synopsis+description+name:

  ghc-pandoc-citeproc: 1+5+1
  ghc-pandoc:          0+2+1

To be precise, the score uses weights (3 for synopsis, 2 for
description, 4 for name) and so it reads:

  ghc-pandoc-citeproc: 3*1 + 2*5 + 4*1 = 17
  ghc-pandoc:          3*0 + 2*2 + 4*1 = 8

And the rename bumps the score because there is an additional weight
(5) for an exact match (which normally happens only for the 'name'
field):

  ghc-pandoc-citeproc: 3*1 + 2*5 + 4*1   = 17
  pandoc:              3*0 + 2*2 + 4*1*5 = 24

It apparently fixes the issue, and now the package named 'pandoc' will
show up first. But it is an artefact, because it is easy* to find other
weights that invalidate this expected ranking; and the current weights
are a working rule of thumb but not deeply thought out, AFAIK.

*For example, instead of 5, let us choose 2; then the score becomes
3*0 + 2*2 + 4*1*2 = 12, which is less than 17. Well, not so easy,
because 2 is the same as the weight for 'description' and that seems
less natural; i.e., it appears more natural to have a high weight for
an exact match. But the point is: it is possible to find another
working rule of thumb which will not return the expected result for all
the packages.

The real problem is not the non-obvious name (ghc-pandoc instead of
simply pandoc) but rather: a) some descriptions are badly written and
b) the 'relevance' scoring function is not "smart" enough to detect
them.

All the best,
simon
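
For what it's worth, the arithmetic above can be reproduced with a
small Python sketch. It is only a simplified model built from the
numbers quoted in this message (weights 3, 2 and 4 for synopsis,
description and name, and a multiplier for an exact name match). The
function `relevance` and the dictionaries are hypothetical names chosen
for illustration; this is not Guix's actual scoring code.

```python
# Simplified model of the scoring arithmetic described in this message.
# The weights are taken from the message; everything else (function
# name, data layout) is an assumption made for illustration only.
WEIGHTS = {"synopsis": 3, "description": 2, "name": 4}


def relevance(counts, exact_name_match=False, exact_weight=5):
    """Return the weighted sum of per-field occurrence counts.

    counts maps a field name to the number of occurrences of the
    search term in that field. When exact_name_match is true, the
    contribution of the 'name' field is multiplied by exact_weight.
    """
    score = 0
    for field, weight in WEIGHTS.items():
        contribution = weight * counts.get(field, 0)
        if field == "name" and exact_name_match:
            contribution *= exact_weight
        score += contribution
    return score


# Occurrences of 'pandoc' in synopsis, description and name.
citeproc = {"synopsis": 1, "description": 5, "name": 1}
pandoc = {"synopsis": 0, "description": 2, "name": 1}

print(relevance(citeproc))                                       # 17
print(relevance(pandoc))                                         # 8 (as ghc-pandoc)
print(relevance(pandoc, exact_name_match=True))                  # 24 (renamed)
print(relevance(pandoc, exact_name_match=True, exact_weight=2))  # 12
```

Changing `exact_weight` from 5 to 2 is enough to put the renamed
package back below ghc-pandoc-citeproc, which is the sense in which the
new ranking depends on the choice of weights.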