From mboxrd@z Thu Jan 1 00:00:00 1970
From: Philip Kaludercic
Newsgroups: gmane.emacs.devel
Subject: Re: Distribution statistics for ELPA and EMMS
Date: Tue, 19 Sep 2023 22:06:02 +0000
Message-ID: <87h6npakdx.fsf@posteo.net>
References: <875y6mzj4n.fsf@rabkins.net> <2f28dcca-3f8b-eb7b-95ec-1867c0d1eaf4@alphapapa.net> <4da4d2f6-2197-3727-674e-034c353207c5@alphapapa.net> <87o7hy9kzz.fsf@posteo.net> <87fs3agf84.fsf@disroot.org>
In-Reply-To: <87fs3agf84.fsf@disroot.org> (Akib Azmain Turja's message of "Wed, 20 Sep 2023 01:00:59 +0600")
To: Akib Azmain Turja
Cc: Adam Porter, rms@gnu.org, yoni@rabkins.net, emacs-devel@gnu.org
List-Id: "Emacs development discussions."
Xref: news.gmane.io gmane.emacs.devel:310797

Akib Azmain Turja writes:

> Philip Kaludercic writes:
>
>> Adam Porter writes:
>>
>>> [I just noticed this message from a few months ago.]
>>>
>>> On 7/16/23 21:25, Richard Stallman wrote:
>>>> [[[ To any NSA and FBI agents reading my email: please consider ]]]
>>>> [[[ whether defending the US Constitution against all enemies, ]]]
>>>> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>>>> We could have two options for downloading, one which is "for a real
>>>> user" and one which is "for periodic testing".
>>>> The only difference would be that the former increments the user
>>>> download count and the latter does not.
>>>
>>> I like this idea, but it seems like it would be hard to enforce. It
>>> could even go the other way, i.e. have Emacs send a query string or
>>> header when installing a package manually, which could be logged and
>>> used to filter the download logs later. But even that might be harder
>>> than it seems, e.g. if I call a command like:
>>>
>>> emacs --eval "(package-install FOO)"
>>>
>>> ...to non-interactively install a package into a local directory for
>>> testing, how far, and in how many places, would some kind of flag need
>>> to be propagated to end up in the server's logs?
>>
>> There is an inherent unreliability in these kinds of statistics that has
>> to be accepted. The question is therefore are issues like these
>> significant or would they skew the results. This has to be considered
>> under a false-positive and a false-negative approach, depending on what
>> we want to measure.
>
> How are these numbers going to be useful? This can't be a measure of
> "popularity."

Yes, they are at best an indicator. A malicious person could always
manipulate them, unless considerable effort is put into verifying the
information -- which not only comes at the cost of time but also is
likely to decrease the amount of available information.

> Say, for example, the package "git-commit" is 11th most downloaded
> package on MELPA. Is it really popular? Few people install it
> explicitly. Only one package depends on it, which is Magit, a super
> popular package. So git-commit is automatically installed as a
> dependency when Magit is installed.

We should be able to solve that problem by adding a query string to the
request, as Adam suggests:

  https://elpa.gnu.org/packages/poker-0.2.tar?selected=yes
  https://elpa.gnu.org/packages/seq-2.24.tar?selected=no
  https://elpa.gnu.org/packages/project-0.10.0.tar?selected=yes&upgrade=yes

etc. Given this information, you know the user doesn't object to having
this information used (depending on whether or not this is an opt-in or
opt-out thing), the version being fetched, whether it is a dependency or
not and whether it was an upgrade.

> And also, packages that get more frequent update are downloaded more
> than whose update less frequently. So its indeed possible for a less
> popular but frequently updated package gets more downloaded than a
> mature well written more popular package.

We can remember upgrade-counts over the last week, year and all time.

> And also there are straight.el, Elpaca and Quelpa guys who don't use the
> ELPA at all.

Of course, hence "inherent unreliability", though I would be surprised
if the choice of package manager has a strong causal effect on what
packages one uses (setting aside that from-source package managers can
install unreleased packages that are not distributed in any archive).

>> If it is all about dopamine-boosting, I think a
>> false-positive approach would be better ;^)
>
> OK...
>
> (while t
>   (package-install 'eat)
>   (package-delete (cadr (assoc 'eat package-alist))))
>
> Soon: Eat is the most popular terminal emulator. xD

Good point (though just asynchronously spamming the right URL would be
more efficient). My idea would be to count an IP address only once per
day, regardless of how many requests it actually sent, and to filter a
list of excluded addresses, such as Tor exit nodes, out of the
statistics. This approach, together with the fact that from-source
package managers wouldn't participate unless they are actively
instructed to do so, is a further argument for a false-negative
approach.
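
To make the counting scheme a bit more concrete, here is a rough Python
sketch of how the server side might tally requests. It is only an
illustration under several assumptions: the (day, ip, url) record shape,
the function name `tally`, the three category names and the choice to
count once per (day, IP, package) are all made up for this example and
do not describe how elpa.gnu.org processes its logs. Requests without a
`selected` parameter are dropped, in line with the false-negative
approach.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Tarball names follow the NAME-VERSION.tar pattern of the example URLs.
PACKAGE_RE = re.compile(r"/packages/(?P<name>[^/]+)-[0-9][0-9.]*\.tar$")


def tally(records, excluded_ips=frozenset()):
    """Tally requests per package, counting each (day, IP, package) once.

    `records` is an iterable of (day, ip, url) tuples; this record shape
    is an assumption, not the real server log format.
    """
    stats = {"install": Counter(), "dependency": Counter(), "upgrade": Counter()}
    seen = set()
    for day, ip, url in records:
        if ip in excluded_ips:  # e.g. known Tor exit nodes
            continue
        parts = urlsplit(url)
        match = PACKAGE_RE.search(parts.path)
        query = parse_qs(parts.query)
        # No `selected` parameter: the client did not take part, so the
        # request is not counted at all.
        if match is None or "selected" not in query:
            continue
        name = match.group("name")
        if (day, ip, name) in seen:  # repeated request on the same day
            continue
        seen.add((day, ip, name))
        if query.get("upgrade") == ["yes"]:
            stats["upgrade"][name] += 1
        elif query["selected"] == ["yes"]:
            stats["install"][name] += 1
        else:
            stats["dependency"][name] += 1
    return stats


# Usage with made-up log records (documentation-range IP addresses).
BASE = "https://elpa.gnu.org/packages/"
records = [
    ("2023-09-19", "192.0.2.1", BASE + "poker-0.2.tar?selected=yes"),
    ("2023-09-19", "192.0.2.1", BASE + "poker-0.2.tar?selected=yes"),  # repeat
    ("2023-09-20", "192.0.2.1", BASE + "poker-0.2.tar?selected=yes"),  # next day
    ("2023-09-19", "192.0.2.2", BASE + "seq-2.24.tar?selected=no"),
    ("2023-09-19", "192.0.2.3", BASE + "project-0.10.0.tar?selected=yes&upgrade=yes"),
    ("2023-09-19", "198.51.100.7", BASE + "poker-0.2.tar?selected=yes"),  # excluded
    ("2023-09-19", "192.0.2.4", BASE + "poker-0.2.tar"),  # no query string
]
stats = tally(records, excluded_ips={"198.51.100.7"})
```

With a tally like this, a `(while t ...)` loop from a single address
would add at most one install per day, and dependencies such as
git-commit would land in their own bucket instead of inflating the
install count.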