unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
@ 2021-09-19 21:13 Stefan Kangas
  2021-09-20  4:35 ` Eli Zaretskii
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Stefan Kangas @ 2021-09-19 21:13 UTC (permalink / raw)
  To: 50686; +Cc: monnier

Severity: wishlist

I think we should show the number of downloads on packages on GNU ELPA
and NonGNU ELPA.  This information should be shown on both the package
page, and in the package listing.

Sorting by downloads is a good way of getting a quick sense of which
packages are worth looking into, as they have many users.

MELPA already has this feature.  Here is the script they use to extract
the data from their webserver:

https://github.com/melpa/melpa/blob/master/docker/logprocessor/process_log.py

Maybe we could just "borrow" that script from them to get the data.
They use an sqlite3 database to save the information over time, which
seems to me like a reasonable approach.






* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
  2021-09-19 21:13 bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA Stefan Kangas
@ 2021-09-20  4:35 ` Eli Zaretskii
  2021-09-20  5:54   ` Stefan Kangas
  2021-09-20  6:22 ` Lars Ingebrigtsen
  2021-10-01 19:58 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 1 reply; 18+ messages in thread
From: Eli Zaretskii @ 2021-09-20  4:35 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: 50686, monnier

> From: Stefan Kangas <stefan@marxist.se>
> Date: Sun, 19 Sep 2021 14:13:08 -0700
> Cc: monnier@iro.umontreal.ca
> 
> I think we should show the number of downloads on packages on GNU ELPA
> and NonGNU ELPA.  This information should be shown on both the package
> page, and in the package listing.
> 
> Sorting by downloads is a good way of getting a quick sense of which
> packages are worth looking into, as they have many users.

I hope you mean sorting by downloads as an optional sorting order?
Because having it by default is IMO an annoyance: it makes it hard to
find a package.






* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
  2021-09-20  4:35 ` Eli Zaretskii
@ 2021-09-20  5:54   ` Stefan Kangas
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Kangas @ 2021-09-20  5:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50686, monnier

Eli Zaretskii <eliz@gnu.org> writes:

> I hope you mean sorting by downloads as an optional sorting order?
> Because having it by default is IMO an annoyance: it makes it hard to
> find a package.

Yes, you should be able to sort by any column but probably sorting by
name is the best default.






* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
  2021-09-19 21:13 bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA Stefan Kangas
  2021-09-20  4:35 ` Eli Zaretskii
@ 2021-09-20  6:22 ` Lars Ingebrigtsen
  2023-09-07 22:05   ` Stefan Kangas
  2021-10-01 19:58 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 1 reply; 18+ messages in thread
From: Lars Ingebrigtsen @ 2021-09-20  6:22 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: 50686, monnier

Stefan Kangas <stefan@marxist.se> writes:

> I think we should show the number of downloads on packages on GNU ELPA
> and NonGNU ELPA.  This information should be shown on both the package
> page, and in the package listing.

I think that's a very good idea.  There's an information disclosure
issue, I guess, but the privacy implications should be pretty much
non-existent.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
  2021-09-19 21:13 bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA Stefan Kangas
  2021-09-20  4:35 ` Eli Zaretskii
  2021-09-20  6:22 ` Lars Ingebrigtsen
@ 2021-10-01 19:58 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-10-02 13:39   ` Stefan Kangas
  2 siblings, 1 reply; 18+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-10-01 19:58 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: 50686

> I think we should show the number of downloads on packages on GNU ELPA
> and NonGNU ELPA.  This information should be shown on both the package
> page, and in the package listing.

Fine by me, but I'd need someone else to do it ;-)


        Stefan







* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
  2021-10-01 19:58 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-10-02 13:39   ` Stefan Kangas
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Kangas @ 2021-10-02 13:39 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 50686

On Fri, 1 Oct 2021 at 21:58, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> > I think we should show the number of downloads on packages on GNU ELPA
> > and NonGNU ELPA.  This information should be shown on both the package
> > page, and in the package listing.
>
> Fine by me, but I'd need someone else to do it ;-)

I've been looking into sorting by columns as well, and here is one
ready-made option that is dual-licensed under GPLv2 and MIT:

https://mottie.github.io/tablesorter/
https://github.com/Mottie/tablesorter

Maybe there exist even better ones, but this one seems to do the job.






* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
  2021-09-20  6:22 ` Lars Ingebrigtsen
@ 2023-09-07 22:05   ` Stefan Kangas
  2023-09-08  8:30     ` Adam Porter
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Kangas @ 2023-09-07 22:05 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 50686, monnier

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Stefan Kangas <stefan@marxist.se> writes:
>
>> I think we should show the number of downloads on packages on GNU ELPA
>> and NonGNU ELPA.  This information should be shown on both the package
>> page, and in the package listing.
>
> I think that's a very good idea.  There's an information disclosure
> issue, I guess, but the privacy implications should be pretty much
> non-existent.

For reference, here are some sample logs from the server that Stefan
Monnier sent me in 2021, with all IPs changed to 127.0.0.1 or ::1.

I apologize in advance if the formatting is messed up; there should be
exactly one log message (starting with the IP address) per line.

127.0.0.1 - - [01/Oct/2021:00:02:27 -0400] "GET /packages/ HTTP/1.1" 200 36385 "-" "curl/7.47.0"
127.0.0.1 - - [01/Oct/2021:00:02:41 -0400] "GET /packages/auto-overlays-0.10.8.tar.lz HTTP/1.1" 200 43410 "-" "CCBot/3.1 (https://commoncrawl.org/faq/; info@commoncrawl.org)"
127.0.0.1 - - [01/Oct/2021:00:02:56 -0400] "GET /packages/auto-overlays-0.10.9.tar.lz HTTP/1.1" 200 43672 "-" "CCBot/3.1 (https://commoncrawl.org/faq/; info@commoncrawl.org)"
127.0.0.1 - - [01/Oct/2021:00:03:00 -0400] "GET /archive-contents HTTP/1.1" 404 491 "-" "URL/Emacs Emacs/27.2 (X11; x86_64-pc-linux-gnu)"
127.0.0.1 - - [01/Oct/2021:00:03:19 -0400] "GET /archive-contents HTTP/1.1" 404 491 "-" "URL/Emacs Emacs/27.2 (X11; x86_64-pc-linux-gnu)"
::1 - - [01/Oct/2021:00:04:09 -0400] "GET / HTTP/1.1" 200 4133 "-" "check_http/v1.5 (nagios-plugins 1.5)"
::1 - - [01/Oct/2021:00:04:12 -0400] "GET /packages/archive-contents HTTP/1.1" 200 95101 "-" "URL/Emacs Emacs/27.1 (Windows-NT; 32bit; i686-w64-mingw32)"
::1 - - [01/Oct/2021:00:04:12 -0400] "GET /packages/archive-contents.sig HTTP/1.1" 200 738 "-" "URL/Emacs Emacs/27.1 (Windows-NT; 32bit; i686-w64-mingw32)"
127.0.0.1 - - [01/Oct/2021:00:04:18 -0400] "GET /org-readme.txt HTTP/1.1" 404 491 "-" "URL/Emacs Emacs/27.2 (X11; x86_64-pc-linux-gnu)"
::1 - - [01/Oct/2021:00:04:47 -0400] "GET / HTTP/1.0" 200 4133 "-" "check_http/v1.5 (nagios-plugins 1.5)"
127.0.0.1 - - [01/Oct/2021:00:05:02 -0400] "GET /packages/javascript/jquery.filtertable.min.js HTTP/1.1" 404 491 "http://elpa.gnu.org/packages//svg-clock.html" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
127.0.0.1 - - [01/Oct/2021:00:05:36 -0400] "GET /packages/ HTTP/1.1" 200 36385 "-" "curl/7.47.0"
127.0.0.1 - - [01/Oct/2021:00:06:00 -0400] "GET /packages/archive-contents HTTP/1.1" 200 95101 "-" "URL/Emacs"
127.0.0.1 - - [01/Oct/2021:00:06:00 -0400] "GET /packages/seq-2.23.tar HTTP/1.1" 200 13095 "-" "URL/Emacs"
127.0.0.1 - - [01/Oct/2021:00:06:01 -0400] "GET /packages/let-alist-1.0.6.el HTTP/1.1" 200 2641 "-" "URL/Emacs"
127.0.0.1 - - [01/Oct/2021:00:06:37 -0400] "GET /packages/archive-contents HTTP/1.1" 200 95064 "-" "URL/Emacs Emacs/27.2 (Windows-NT; 32bit; i686-w64-mingw32)"






* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
  2023-09-07 22:05   ` Stefan Kangas
@ 2023-09-08  8:30     ` Adam Porter
  2024-03-05 23:58       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 18+ messages in thread
From: Adam Porter @ 2023-09-08  8:30 UTC (permalink / raw)
  To: stefan; +Cc: 50686, larsi, monnier

[-- Attachment #1: Type: text/plain, Size: 123 bytes --]

If it helps, here's a first pass at a regexp and function to parse the 
logs into download counts per package file.

--Adam

[-- Attachment #2: elpa-stats.el --]
[-- Type: text/x-emacs-lisp, Size: 1457 bytes --]

(require 'cl-lib)                       ; `cl-loop', `cl-incf'
(require 'map)                          ; `map-into'

(defconst elpa-download-log-line-re
  (rx (group (1+ (or digit "." ":")))   ; client address
      " - - "
      "[" (group (repeat 2 digit) "/" (repeat 3 alpha) "/" (repeat 4 digit)
                 ":" (repeat 2 digit) ":" (repeat 2 digit) ":" (repeat 2 digit)
                 " " (or "+" "-") (repeat 4 digit)) "]" ; timestamp
      ;; HTTP query:
      " \"" (group (1+ alpha))         ; method
      " " (group (1+ (not (any blank)))) ; path
      " " "HTTP/" (1+ (or alnum ".")) "\"" ; protocol
      " " (group (1+ digit))               ; status code
      " " (group (1+ digit))               ; size
      " \"-\" "                            ; referer (only "-" is matched)
      "\"" (group (1+ (not (any "\"")))) "\"") ; user agent
  "Regexp matching one line of the HTTP access log.")

(defun elpa-log-to-package-stats (log-string)
  "Return alist of (FILE . DOWNLOADS) seen in LOG-STRING.
LOG-STRING is an HTTP download log."
  (let ((stats (make-hash-table :test #'equal)))
    (with-temp-buffer
      (insert log-string)
      (goto-char (point-min))
      (cl-loop while (re-search-forward elpa-download-log-line-re nil t)
               for file = (match-string 4)
               ;; Only count package files (tarballs and single-file packages).
               when (string-match-p (rx bos "/packages/" (1+ anything) (or ".tar" ".el") eos)
                                    file)
               do (cl-incf (gethash file stats 0))))
    (map-into stats 'alist)))

;; (("/packages/seq-2.23.tar" . 1) ("/packages/let-alist-1.0.6.el" . 1))


* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
  2023-09-08  8:30     ` Adam Porter
@ 2024-03-05 23:58       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-06  0:22         ` Adam Porter
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-05 23:58 UTC (permalink / raw)
  To: Adam Porter; +Cc: 50686, stefan, larsi

> If it helps, here's a first pass at a regexp and function to parse the logs
> into download counts per package file.

Thanks.  I got some inspiration from it for the code I pushed to
`elpa-admin.el`.  It's still not doing anything, tho.


        Stefan







* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
  2024-03-05 23:58       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-06  0:22         ` Adam Porter
  2024-03-06  2:57           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 18+ messages in thread
From: Adam Porter @ 2024-03-06  0:22 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 50686, stefan, larsi

On 3/5/24 17:58, Stefan Monnier wrote:
>> If it helps, here's a first pass at a regexp and function to parse the logs
>> into download counts per package file.
> 
> Thanks.  I got some inspiration from it for the code I pushed to
> `elpa-admin.el`.  It's still not doing anything, tho.

Thanks.  What would the next steps be?






* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
  2024-03-06  0:22         ` Adam Porter
@ 2024-03-06  2:57           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-06  5:04             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-06  2:57 UTC (permalink / raw)
  To: Adam Porter; +Cc: 50686, stefan, larsi

>>> If it helps, here's a first pass at a regexp and function to parse the logs
>>> into download counts per package file.
>> Thanks.  I got some inspiration from it for the code I pushed to
>> `elpa-admin.el`.  It's still not doing anything, tho.
> Thanks.  What would the next steps be?

- Have a cron job use that code to maintain historical access stats.
  [ almost done.  ]
- Change the HTML-building code to use those stats to "enrich" the HTML
  with "popularity" info.
  [ not started yet.  ]


        Stefan







* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
  2024-03-06  2:57           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-06  5:04             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-08 23:20               ` Adam Porter
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-06  5:04 UTC (permalink / raw)
  To: Adam Porter; +Cc: 50686, stefan, larsi

If you go to http://elpa.gnu.org/packages/ you'll now see a new column
"Rank" which shows a percentile ranking for each package.


        Stefan







* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
  2024-03-06  5:04             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-08 23:20               ` Adam Porter
  2024-03-09 14:37                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 18+ messages in thread
From: Adam Porter @ 2024-03-08 23:20 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 50686, stefan, larsi

Hi Stefan,

On 3/5/24 23:04, Stefan Monnier wrote:
> If you go to http://elpa.gnu.org/packages/ you'll now see a new column
> "Rank" which shows a percentile ranking for each package.

That's very cool.  I guess it's not looking very far back in the 
download data (yet?), because I see, e.g. my recently added Activities 
package being listed at 90%, which couldn't possibly have nearly as many 
downloads as other packages that have been there for much longer and are 
much more widely used.

What are your plans for the stats from here?  e.g. it would be helpful 
to be able to see stats within a time period, maybe a graph over time, a 
list of downloads per version, etc.  (Not that I expect you to do all of 
these things yourself, just curious.)

Thanks,
Adam






* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
  2024-03-08 23:20               ` Adam Porter
@ 2024-03-09 14:37                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-11 20:07                   ` Adam Porter
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-09 14:37 UTC (permalink / raw)
  To: Adam Porter; +Cc: 50686, stefan, larsi

>> If you go to http://elpa.gnu.org/packages/ you'll now see a new column
>> "Rank" which shows a percentile ranking for each package.
> That's very cool.  I guess it's not looking very far back in the download
> data (yet?),

I had the logs only for two weeks or so (plus some old logs from
many years ago, actually), indeed.

> What are your plans for the stats from here?

As the info is extracted from the logs it's added to a file that
accumulates the counts per week per package.

You can see the relevant code in the `elpa--wsl*` functions in:

    https://git.savannah.gnu.org/cgit/emacs/elpa.git/tree/elpa-admin.el?h=elpa-admin
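
For illustration only, accumulating counts per week per package could be sketched as below. This is not the actual `elpa--wsl*` code: the names `my-elpa-week-key` and `my-elpa-record-hit`, the ISO-week key format, and the hash-table layout are all assumptions made for the example.

```elisp
(require 'cl-lib)

(defun my-elpa-week-key (time)
  "Return the ISO 8601 year and week of TIME (in UTC), e.g. \"2021-W39\".
Hypothetical helper; the key format is an assumption."
  (format-time-string "%G-W%V" time t))

(defun my-elpa-record-hit (stats package time)
  "Increment the count for PACKAGE in the week containing TIME.
STATS is a hash table (with `equal' test) mapping (PACKAGE . WEEK)
to a number of hits.  It is modified in place and returned."
  (cl-incf (gethash (cons package (my-elpa-week-key time)) stats 0))
  stats)
```

Keying on the week rather than on the individual request is what keeps such a file small: each package adds at most one entry per week, however many requests it receives.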

> e.g. it would be helpful to be able to see stats within a time period,
> maybe a graph over time,

Indeed.  Patches welcome.

> a list of downloads per version, etc.

Currently I count the "interest" in the package, so I don't distinguish
the version of the package, nor whether the access is for the tarball or
the package's web page, or the package's readme.txt, or the package's badge.
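
A minimal sketch of what counting "interest" per package (rather than per file or per version) might look like, for illustration only. It is not the code in `elpa-admin.el`: the function names are hypothetical, and the file naming patterns (NAME-VERSION.tar, NAME.html, NAME-readme.txt, NAME.svg directly under /packages/) are assumptions.

```elisp
(require 'cl-lib)

(defun my-elpa-path-package (path)
  "Return the name of the package that request PATH refers to, or nil.
Hypothetical helper; the file naming patterns below are assumptions."
  (when (string-match (rx bos "/packages/" (group (1+ (not (any "/")))) eos)
                      path)
    (let ((file (match-string 1 path)))
      (cond
       ;; NAME-readme.txt
       ((string-match (rx bos (group (1+ anything)) "-readme.txt" eos) file)
        (match-string 1 file))
       ;; NAME.html (web page) or NAME.svg (badge)
       ((string-match (rx bos (group (1+ anything)) (or ".html" ".svg") eos)
                      file)
        (match-string 1 file))
       ;; NAME-VERSION.tar, NAME-VERSION.tar.lz or NAME-VERSION.el
       ((string-match (rx bos (group (1+ anything)) "-" (1+ (any digit "."))
                          (or ".tar.lz" ".tar" ".el") eos)
                      file)
        (match-string 1 file))))))

(defun my-elpa-count-interest (paths)
  "Return a hash table mapping package names to the number of hits in PATHS.
All versions and all kinds of files of a package share one counter."
  (let ((counts (make-hash-table :test #'equal)))
    (dolist (path paths counts)
      (let ((pkg (my-elpa-path-package path)))
        (when pkg
          (cl-incf (gethash pkg counts 0)))))))
```

With this layout, requests for `/packages/seq-2.23.tar` and `/packages/seq.html` both increment the single counter for `seq`, while paths such as `/packages/archive-contents` are ignored.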

I'd like to keep the stats database reasonably small (it's currently
around 150kB, and I expect it'll take a year before it reaches 1MB), so
I'd rather not segregate per version.


        Stefan







* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
  2024-03-09 14:37                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-11 20:07                   ` Adam Porter
  2024-03-11 20:28                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 18+ messages in thread
From: Adam Porter @ 2024-03-11 20:07 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 50686, stefan, larsi

Hi Stefan,

On 3/9/24 08:37, Stefan Monnier wrote:
>>> If you go to http://elpa.gnu.org/packages/ you'll now see a new column
>>> "Rank" which shows a percentile ranking for each package.
>> That's very cool.  I guess it's not looking very far back in the download
>> data (yet?),
> 
> I had the logs only for two weeks or so (plus some old logs from
> many years ago, actually), indeed.

I see.  Are the rest of the logs still available on the ELPA server, or 
is that all we have for historical data?

>> a list of downloads per version, etc.
> 
> Currently I count the "interest" in the package, so I don't distinguish
> the version of the package, nor whether the access is for the tarball or
> the package's web page, or the package's readme.txt, or the package's badge.

That seems like a very different kind of data than the number of times a 
package has been downloaded (i.e. by an Emacs instance).  IME a small 
fraction of hits to a package's GitHub repo seem to result in 
installations; "interest" tends to be far more than "interested enough 
to install."

> I'd like to keep the stats database reasonably small (it's currently
> around 150kB,  and I expect it'll take a year before it reaches 1MB), so
> I'd rather not segregate per version.

Is there a way that I could change your mind about that?  Having the 
actual download counts per version would be very useful.

As far as database size, the download counts per version (i.e. per 
tarball filename) could be stored in a table like:

   FILENAME | DOWNLOAD_COUNT | LAST_UPDATED

Which could be updated when the logs are processed (omitting any logged 
download from before the LAST_UPDATED timestamp).  And while that 
wouldn't show when the downloads occurred, it would still be useful to 
get an idea of how many users a package has (i.e. ones that actually 
install updates to it), and it would be a very small amount of data to 
store.
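
A rough sketch of how such a table could be updated, assuming it is held as a hash table mapping FILENAME to (DOWNLOAD-COUNT . LAST-UPDATED) and that timestamps are numbers of seconds. The function name and the data layout are hypothetical, chosen only for the example.

```elisp
(require 'cl-lib)

(defun my-elpa-update-download-table (table entries)
  "Add the downloads in ENTRIES to TABLE and return TABLE.
TABLE is a hash table (with `equal' test) mapping FILENAME to
\(DOWNLOAD-COUNT . LAST-UPDATED).  ENTRIES is a list of
\(FILENAME . TIMESTAMP) for one batch of log lines.  Entries that are
not newer than the LAST-UPDATED stored before this batch are skipped,
so processing the same log twice does not count a download twice."
  (let ((cutoffs (make-hash-table :test #'equal)))
    (dolist (entry entries table)
      (let* ((file (car entry))
             (time (cdr entry))
             (row (gethash file table))
             ;; LAST-UPDATED as it was before this batch started.
             (cutoff (or (gethash file cutoffs)
                         (puthash file (if row (cdr row) -1) cutoffs))))
        (when (> time cutoff)
          (puthash file
                   (cons (1+ (if row (car row) 0))
                         (max time (if row (cdr row) time)))
                   table))))))
```

One limitation of this sketch: a download logged in the same second as the previous batch's last download, but only seen in a later batch, would be skipped.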

--Adam






* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
  2024-03-11 20:07                   ` Adam Porter
@ 2024-03-11 20:28                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-11 20:55                       ` Adam Porter
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-11 20:28 UTC (permalink / raw)
  To: Adam Porter; +Cc: 50686, stefan, larsi

>> I had the logs only for two weeks or so (plus some old logs from
>> many years ago, actually), indeed.
> I see.  Are the rest of the logs still available on the ELPA server, or is
> that all we have for historical data?

That's all we have.

>>> a list of downloads per version, etc.
>> Currently I count the "interest" in the package, so I don't distinguish
>> the version of the package, nor whether the access is for the tarball or
>> the package's web page, or the package's readme.txt, or the package's badge.
> That seems like a very different kind of data than the number of times
> a package has been downloaded (i.e. by an Emacs instance).  IME a small
> fraction of hits to a package's GitHub repo seem to result in installations;
> "interest" tends to be far more than "interested enough to install."

Just because the "interest" tends to be far more than "interested enough
to install" doesn't mean that the two aren't strongly correlated.
Also my impression is that package web pages in `elpa.gnu.org` are not
visited nearly as often as a Github project page.

But it'd be definitely worth checking how the two measures compare.
Patches welcome.

>> I'd like to keep the stats database reasonably small (it's currently
>> around 150kB,  and I expect it'll take a year before it reaches 1MB), so
>> I'd rather not segregate per version.
> Is there a way that I could change your mind about that?  Having the actual
> download counts per version would be very useful.

Maybe if you argue about what kind of use would make it useful?

> As far as database size, the download counts per version (i.e. per tarball
> filename) could be stored in a table like:
>
>   FILENAME | DOWNLOAD_COUNT | LAST_UPDATED

Maybe we could keep that in addition to the current data (not sure how
useful the "last_updated" would be).

Again, tho, the question is "what for?".

My goal was mostly to show relative popularity, so when you search for
packages providing a given feature and you find 4 different options, the
rank percentile can give you an idea of which one is more popular.
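
To make "rank percentile" concrete, here is one possible way to compute it, given purely as an illustration. The definition used (percentage of packages with strictly fewer hits) and the function name are assumptions, not necessarily what the code on elpa.gnu.org does.

```elisp
(require 'cl-lib)

(defun my-elpa-percentile-ranks (counts)
  "Return an alist of (PACKAGE . PERCENTILE) computed from COUNTS.
COUNTS is an alist of (PACKAGE . HITS).  PERCENTILE is the integer
percentage of packages whose HITS are strictly lower than PACKAGE's."
  (let ((total (length counts)))
    (mapcar (lambda (entry)
              (let ((below (cl-count-if
                            (lambda (other) (< (cdr other) (cdr entry)))
                            counts)))
                (cons (car entry) (/ (* 100 below) total))))
            counts)))
```

For example, with four packages having 10, 50, 200 and 1000 hits, this definition ranks them at 0%, 25%, 50% and 75% respectively, regardless of how far apart the raw counts are.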


        Stefan







* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
  2024-03-11 20:28                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-11 20:55                       ` Adam Porter
  2024-03-11 22:13                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 18+ messages in thread
From: Adam Porter @ 2024-03-11 20:55 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 50686, stefan, larsi

On 3/11/24 15:28, Stefan Monnier wrote:
>>> I had the logs only for two weeks or so (plus some old logs from
>>> many years ago, actually), indeed.
>> I see.  Are the rest of the logs still available on the ELPA server, or is
>> that all we have for historical data?
> 
> That's all we have.

Ok.  Going forward, will the logs we have now be preserved, or do they 
get rotated away?

>>>> a list of downloads per version, etc.
>>> Currently I count the "interest" in the package, so I don't distinguish
>>> the version of the package, nor whether the access is for the tarball or
>>> the package's web page, or the package's readme.txt, or the package's badge.
>> That seems like a very different kind of data than the number of times
>> a package has been downloaded (i.e. by an Emacs instance).  IME a small
>> fraction of hits to a package's GitHub repo seem to result in installations;
>> "interest" tends to be far more than "interested enough to install."
> 
> Just because the "interest" tends to be far more than "interested enough
> to install" doesn't mean that the two aren't strongly correlated.
> Also my impression is that package web pages in `elpa.gnu.org` are not
> visited nearly as often as a Github project page.
> 
> But it'd be definitely worth checking how the two measures compare.
> Patches welcome.

Ok, meaning that you'd accept a patch that does...what, exactly, to the 
database?  :)

>>> I'd like to keep the stats database reasonably small (it's currently
>>> around 150kB,  and I expect it'll take a year before it reaches 1MB), so
>>> I'd rather not segregate per version.
>> Is there a way that I could change your mind about that?  Having the actual
>> download counts per version would be very useful.
> 
> Maybe if you argue about what kind of use would make it useful?

For example, if a package at version V has N downloads after 6 months, 
and then the package is updated to version V+1, how many downloads that 
version has after 6 months would give some indication of whether the 
package is growing in popularity, whether initial users are still using 
it and upgrading it, or whether it's falling out of favor.  And, over 
time, that might help determine whether an obsolete package should be 
removed from ELPA.

Also, since a package's minimum Emacs version may increase when its 
version increases, that could provide some useful information (not that 
I'm suggesting to track that in the ELPA code, but some other tool could 
correlate the data).

> My goal was mostly to show relative popularity, so when you search for
> packages providing a given feature and you find 4 different options, the
> rank percentile can give you an idea of which one is more popular.

That's definitely a worthy goal.

Another goal that's relevant to me, as a package author, is to determine 
whether a package of mine is still in use at all.  For example, my 
package org-ql is intended to subsume my older package, org-rifle, but I 
hear now and then about people who still use org-rifle.  Eventually I'd 
like to see that the downloads of org-rifle fall off to the point that I 
could declare it an archived, obsoleted package, but I don't want to do 
that prematurely.  (Those packages are on MELPA, but the principle 
applies regardless.)

--Adam






* bug#50686: Show number of downloads on packages on GNU ELPA/NonGNU ELPA
  2024-03-11 20:55                       ` Adam Porter
@ 2024-03-11 22:13                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-11 22:13 UTC (permalink / raw)
  To: Adam Porter; +Cc: 50686, stefan, larsi

>>>> I had the logs only for two weeks or so (plus some old logs from
>>>> many years ago, actually), indeed.
>>> I see.  Are the rest of the logs still available on the ELPA server, or is
>>> that all we have for historical data?
>> That's all we have.
> Ok.  Going forward, will the logs we have now be preserved, or do they get
> rotated away?

They get rotated away.  We do keep the weekly counts that we accumulate
in our `wsl-stats.eld` file.

>>>>> a list of downloads per version, etc.
>>>> Currently I count the "interest" in the package, so I don't distinguish
>>>> the version of the package, nor whether the access is for the tarball or
>>>> the package's web page, or the package's readme.txt, or the package's badge.
>>> That seems like a very different kind of data than the number of times
>>> a package has been downloaded (i.e. by an Emacs instance).  IME a small
>>> fraction of hits to a package's GitHub repo seem to result in installations;
>>> "interest" tends to be far more than "interested enough to install."
>> Just because the "interest" tends to be far more than "interested enough
>> to install" doesn't mean that the two aren't strongly correlated.
>> Also my impression is that package web pages in `elpa.gnu.org` are not
>> visited nearly as often as a Github project page.
>> But it'd be definitely worth checking how the two measures compare.
>> Patches welcome.
> Ok, meaning that you'd accept a patch that does...what, exactly, to the
> database?  :)

I guess keep separate counts for tarballs and other files, so we can compare?

>>>> I'd like to keep the stats database reasonably small (it's currently
>>>> around 150kB,  and I expect it'll take a year before it reaches 1MB), so
>>>> I'd rather not segregate per version.
>>> Is there a way that I could change your mind about that?  Having the actual
>>> download counts per version would be very useful.
>> Maybe if you argue about what kind of use would make it useful?
>
> For example, if a package at version V has N downloads after 6 months, and
> then the package is updated to version V+1, how many downloads that version
> has after 6 months would give some indication of whether the package is
> growing in popularity, whether initial users are still using it and
> upgrading it, or whether it's falling out of favor.  And, over time, that
> might help determine whether an obsolete package should be removed
> from ELPA.

Ah, so as to factor out the fact that frequently updated packages will
naturally see more downloads?  I guess that would make sense.

Not completely sure how to write the code, tho: I can see how to go and
dig in the numbers to answer "is the new version less/more popular than
the old one", but not how to use that insight to adjust the percentile
ranking of the package.

>> My goal was mostly to show relative popularity, so when you search for
>> packages providing a given feature and you find 4 different options, the
>> rank percentile can give you an idea of which one is more popular.
>
> That's definitely a worthy goal.
>
> Another goal that's relevant to me, as a package author, is to determine
> whether a package of mine is still in use at all.  For example, my package
> org-ql is intended to subsume my older package, org-rifle, but I hear now
> and then about people who still use org-rifle.  Eventually I'd like to see
> that the downloads of org-rifle fall off to the point that I could declare
> it an archived, obsoleted package, but I don't want to do that prematurely.
> (Those packages are on MELPA, but the principle applies regardless.)

Right.  I guess it would be hard to do because of the mirroring-style
downloads, so even the least popular package still gets downloads.

It's not super high on my todo list for now, but if you're interested in
improving this, I'll be happy to take your patches, install them and let
you play with it to see what comes up.

Currently the `wsl-stats.eld` "database" is not exposed on the web
site, partly because it contains some "irrelevant" entries
(accesses to non-existing files, some of them very much on purpose
because their names look like "<RANDOM>_nonexisting") which may contain
information I'd rather not expose.  We should try and sanitize it first
to only keep things which do correspond to existing packages/files
(which will also improve the quality of the rankings).
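
As a sketch of that sanitizing step, assuming the stats can be viewed as an alist keyed by package name and that a list of existing package names is available (both are assumptions, as is the function name):

```elisp
(require 'cl-lib)

(defun my-elpa-sanitize-stats (stats known-packages)
  "Return the entries of STATS that correspond to KNOWN-PACKAGES.
STATS is an alist of (NAME . DATA) and KNOWN-PACKAGES is a list of
package name strings.  Entries for any other name (for example
accesses to non-existing files) are dropped.  STATS is not modified."
  (cl-remove-if-not (lambda (entry)
                      (member (car entry) known-packages))
                    stats))
```

Filtering against a list of known names, rather than trying to recognize the bogus ones, means that unexpected junk entries are dropped by default.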


        Stefan






