From: Philip Kaludercic <philipk@posteo.net>
To: Akib Azmain Turja <akib@disroot.org>
Cc: Adam Porter <adam@alphapapa.net>,
	 rms@gnu.org,  yoni@rabkins.net, emacs-devel@gnu.org
Subject: Re: Distribution statistics for ELPA and EMMS
Date: Tue, 19 Sep 2023 22:06:02 +0000
Message-ID: <87h6npakdx.fsf@posteo.net>
In-Reply-To: <87fs3agf84.fsf@disroot.org> (Akib Azmain Turja's message of "Wed, 20 Sep 2023 01:00:59 +0600")

Akib Azmain Turja <akib@disroot.org> writes:

> Philip Kaludercic <philipk@posteo.net> writes:
>
>> Adam Porter <adam@alphapapa.net> writes:
>>
>>> [I just noticed this message from a few months ago.]
>>>
>>> On 7/16/23 21:25, Richard Stallman wrote:
>>>> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
>>>> [[[ whether defending the US Constitution against all enemies,     ]]]
>>>> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>>>> We could have two options for downloading, one which is "for a real
>>>> user" and one which is "for periodic testing".
>>>> The only difference would be that the former increments the user
>>>> download count and the latter does not.
>>>
>>> I like this idea, but it seems like it would be hard to enforce.  It
>>> could even go the other way, i.e. have Emacs send a query string or
>>> header when installing a package manually, which could be logged and
>>> used to filter the download logs later.  But even that might be harder
>>> than it seems, e.g. if I call a command like:
>>>
>>>   emacs --eval "(package-install FOO)"
>>>
>>> ...to non-interactively install a package into a local directory for
>>> testing, how far, and in how many places, would some kind of flag need
>>> to be propagated to end up in the server's logs?
>>
>> There is an inherent unreliability in these kinds of statistics that has
>> to be accepted.  The question is therefore are issues like these
>> significant or would they skew the results.  This has to be considered
>> under a false-positive and a false-negative approach, depending on what
>> we want to measure.
>
> How are these numbers going to be useful?  This can't be a measure of
> "popularity."

Yes, they are at best an indicator.  A malicious person could always
manipulate them, unless considerable effort is put into verifying the
information -- which not only comes at the cost of time but also is
likely to decrease the amount of available information.

> Say, for example, the package "git-commit" is the 11th most downloaded
> package on MELPA.  Is it really popular?  Few people install it
> explicitly.  Only one package depends on it, which is Magit, a super
> popular package.  So git-commit is automatically installed as a
> dependency when Magit is installed.

We should be able to solve that problem by adding a query string to the
request, as Adam suggests:

https://elpa.gnu.org/packages/poker-0.2.tar?selected=yes
https://elpa.gnu.org/packages/seq-2.24.tar?selected=no
https://elpa.gnu.org/packages/project-0.10.0.tar?selected=yes&upgrade=yes
etc.

Given such a request, you know that the user doesn't object to having
this information used (depending on whether this is an opt-in or
opt-out thing), which version is being fetched, whether it is a
dependency or not and whether it was an upgrade.
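
As a rough sketch of how a server-side script could read those fields
back out of a logged request URL (Python is an arbitrary choice here;
the function name, the filename pattern and the treatment of missing
parameters are all assumptions, and only the "selected" and "upgrade"
parameters come from the example URLs above):

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed filename shape "<name>-<version>.tar", where the version starts
# with a digit.  Other file types an archive may serve are ignored here.
_TARBALL = re.compile(r"^(?P<name>.+)-(?P<version>[0-9][^/]*)\.tar$")


def classify_download(url):
    """Describe one download request, or return None if URL is not a tarball."""
    parts = urlsplit(url)
    filename = parts.path.rsplit("/", 1)[-1]
    match = _TARBALL.match(filename)
    if match is None:
        return None
    query = parse_qs(parts.query)

    def flag(key):
        # A missing parameter is reported as None (unknown), not guessed.
        values = query.get(key)
        if not values:
            return None
        return values[0] == "yes"

    return {
        "name": match.group("name"),
        "version": match.group("version"),
        "selected": flag("selected"),
        "upgrade": flag("upgrade"),
    }
```

Requests without any query string would still be parsed, but with both
flags left as None, so they could be kept apart from requests that
explicitly say "no".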

> And also, packages that get more frequent updates are downloaded more
> than those that update less frequently.  So it's indeed possible for a
> less popular but frequently updated package to get more downloads than
> a mature, well-written, more popular package.

We can remember upgrade-counts over the last week, year and all time.
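
A minimal sketch of such windowed counts, assuming the server keeps one
date per recorded upgrade of a package (the function name and the exact
window lengths of 7 and 365 days are assumptions):

```python
from datetime import date, timedelta


def upgrade_counts(upgrade_dates, today):
    """Count upgrades in the last 7 days, the last 365 days and overall."""
    week_start = today - timedelta(days=7)
    year_start = today - timedelta(days=365)
    return {
        # Each window is half-open: (start, today].
        "week": sum(1 for d in upgrade_dates if week_start < d <= today),
        "year": sum(1 for d in upgrade_dates if year_start < d <= today),
        "all_time": len(upgrade_dates),
    }
```

Reporting upgrades separately from first-time installs would let a
frequently updated package be compared with a rarely updated one on
installs alone.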

> And also there are straight.el, Elpaca and Quelpa guys who don't use the
> ELPA at all.

Of course, hence "inherent unreliability", though I would be surprised
if the choice of package manager has a strong causal effect on what
packages one uses (setting aside that from-source package managers can
install unreleased packages that are not distributed in any archive).

>>                      If it is all about dopamine-boosting, I think a
>> false-positive approach would be better ;^)
>
> OK...
>
> (while t
>   (package-install 'eat)
>   (package-delete (cadr (assoc 'eat package-alist))))
>
> Soon: Eat is the most popular terminal emulator.  xD

Good point (though just asynchronously spamming the right URL would be
more efficient).  My idea would be to count an IP address only once per
day, regardless of how many requests it actually sent, and to keep a
list of excluded addresses, such as Tor exit nodes, that are filtered
out of the statistics.
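
This could look roughly like the following sketch.  It assumes the log
has already been reduced to (day, address, package) tuples, and it
counts an address once per day *per package*, which is one possible
reading of the idea; the function and parameter names are made up for
illustration:

```python
from collections import Counter


def count_daily_unique(requests, excluded=frozenset()):
    """Count downloads per package, at most one per address per day.

    REQUESTS is an iterable of (day, address, package) tuples.
    EXCLUDED is a set of addresses (e.g. known Tor exit nodes) to skip.
    """
    seen = set()
    counts = Counter()
    for day, address, package in requests:
        if address in excluded:
            continue
        key = (day, address, package)
        if key in seen:
            # Repeated request from the same address on the same day.
            continue
        seen.add(key)
        counts[package] += 1
    return counts
```

With this, the install/delete loop quoted above would add at most one
download per day to the count, however often it runs.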

This approach, together with the fact that from-source package
managers wouldn't participate unless they are actively instructed to do
so, is a further argument for a false-negative approach.



