unofficial mirror of bug-guix@gnu.org 
From: zimoun <zimon.toutoune@gmail.com>
To: Mike Gerwitz <mtg@gnu.org>
Cc: Guix Devel <guix-devel@gnu.org>, 33844@debbugs.gnu.org
Subject: bug#33844: Rename ghc-pandoc to pandoc
Date: Thu, 27 Feb 2020 14:10:15 +0100
Message-ID: <CAJ3okZ2-Mz=Bh6dbg7BEcH54Vc0RMRDfCOkqTJzP=cW4S2wAJg__7081.11952278472$1582809096$gmane$org@mail.gmail.com>
In-Reply-To: <87pne0q0wb.fsf@gnu.org>

Hi Mike,

On Thu, 27 Feb 2020 at 02:23, Mike Gerwitz <mtg@gnu.org> wrote:

> Ah, for the record, I had searched for pandoc using `guix package -s
> pandoc` in the past and didn't find what I was looking for, and so fell
> back to a Debian system.  It turns out what I wanted was ghc-pandoc
> after all.

Thank you for pointing out the issue.

My remark is *not* about the rename, which seems fine, for the very
same reason that the "git-annex" software is named 'git-annex' and not
'ghc-git-annex'.


Well, your comment points out that a) the description is badly
written and b) the 'relevance' score is too rough.

The command "guix search pandoc" returns ghc-pandoc-citeproc as the
highest-ranked package, with a relevance score of 17. The package of
interest, 'ghc-pandoc', appears in 6th position with a relevance
score of 8. (It comes after emacs-pandoc-mode, ghc-pandoc-types,
emacs-ox-pandoc and python-pandocfilters, which are much less relevant
packages, IMO.)

Why? Because of the number of occurrences of the term 'pandoc' in
synopsis+description+name:
ghc-pandoc-citeproc: 1+5+1
ghc-pandoc: 0+2+1

To be precise, the score uses weights and so it reads:

ghc-pandoc-citeproc: 3*1 + 2*5 + 4*1 = 17
ghc-pandoc: 3*0 + 2*2 + 4*1 = 8

The rename bumps the score because there is an additional weight (5)
for an exact match (which normally happens only for the 'name'
field):

ghc-pandoc-citeproc: 3*1 + 2*5 + 4*1 = 17
pandoc: 3*0 + 2*2 + 4*1*5 = 24
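
The arithmetic above can be reproduced with a minimal Python sketch.
It is only an illustration of the numbers quoted in this message, not
Guix's actual scoring code; the names WEIGHTS, EXACT_NAME_BONUS and
relevance are hypothetical, and the occurrence counts are the ones
listed above.

```python
# Field weights and exact-match weight as quoted in this message
# (assumed for illustration, not read from the Guix source).
WEIGHTS = {"synopsis": 3, "description": 2, "name": 4}
EXACT_NAME_BONUS = 5


def relevance(counts, exact_name_match=False):
    """Sum weight * occurrences per field; scale the name term on an exact match."""
    score = 0
    for field, weight in WEIGHTS.items():
        term = weight * counts.get(field, 0)
        if field == "name" and exact_name_match:
            term *= EXACT_NAME_BONUS
        score += term
    return score


# Occurrences of 'pandoc' in synopsis, description and name.
citeproc = {"synopsis": 1, "description": 5, "name": 1}
pandoc = {"synopsis": 0, "description": 2, "name": 1}

print(relevance(citeproc))                       # 17
print(relevance(pandoc))                         # 8  (named ghc-pandoc)
print(relevance(pandoc, exact_name_match=True))  # 24 (renamed to pandoc)
```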

It apparently fixes the issue and now the package named 'pandoc' will
show up first. But it is an artefact because it is easy* to find other
weights that invalidate this expected ranking; and the current weights
are a working rule of thumb, not deeply thought out, AFAIK.


*For example, instead of 5, let us choose 2; then the score becomes
3*0+2*2+4*1*2=12, which is less than 17. Well, not so easy, because 2
is the same as the weight for 'description' and that seems less
natural; i.e., it appears more natural to have a high weight for an
exact match. But the point is: it is possible to find another working
rule of thumb which will not return the expected result for all the
packages.
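
To make the footnote concrete, here is a small Python sketch that
varies the exact-match weight and reports which of the two packages
ranks first. It only reuses the counts and field weights quoted in
this message; pandoc_score and CITEPROC_SCORE are hypothetical names,
and this is not how Guix itself computes the ranking.

```python
# Score of ghc-pandoc-citeproc with the weights quoted in this message.
CITEPROC_SCORE = 3 * 1 + 2 * 5 + 4 * 1  # 17


def pandoc_score(exact_weight):
    """Score of the renamed 'pandoc' package for a given exact-match weight."""
    # synopsis: 3*0, description: 2*2, name: 4*1 scaled by the exact-match weight
    return 3 * 0 + 2 * 2 + 4 * 1 * exact_weight


for w in range(1, 6):
    s = pandoc_score(w)
    first = "pandoc" if s > CITEPROC_SCORE else "ghc-pandoc-citeproc"
    print(f"exact-match weight {w}: pandoc scores {s}, first is {first}")
```

With these numbers, 'pandoc' only comes first once the exact-match
weight reaches 4 (score 20); with weights 1 to 3 it stays below 17.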


The real problem is not the non-obvious name (ghc-pandoc instead of
simply pandoc) but rather that a) some descriptions are badly written
and b) the 'relevance' scoring function is not "smart" enough to
detect them.



All the best,
simon
