From: Pip Cet via "Emacs development discussions."
Newsgroups: gmane.emacs.devel
Subject: Re: New "make benchmark" target
Date: Sat, 04 Jan 2025 16:34:24 +0000
Message-ID: <87pll2fsj7.fsf@protonmail.com>
To: Eli Zaretskii
Cc: acorallo@gnu.org, stefankangas@gmail.com, mattiase@acm.org, eggert@cs.ucla.edu, emacs-devel@gnu.org
In-Reply-To: <86wmfgm3a5.fsf@gnu.org>
"Eli Zaretskii" writes:

>> Date: Mon, 30 Dec 2024 21:34:55 +0000
>> From: Pip Cet
>> Cc: Eli Zaretskii, stefankangas@gmail.com, mattiase@acm.org,
>>   eggert@cs.ucla.edu, emacs-devel@gnu.org
>>
>> > I'm open to patches to elisp-benchmarks (and to its hypothetical copy in
>> > emacs-core). My opinion that something can potentially be improved in
>>
>> What's the best way to report the need for such improvements?
>
> Since you've pushed that to a branch, I suggest to submit bug reports
> about these issues, using "[scratch/elisp-benchmarks]" in the Subject
> of the bug.

I've studied the issues a bit more. This is a bit long, but in summary,
I think improving elisp-benchmarks.el (the specific file, not the
entire package, which I still intend to reuse) would take more time
than starting from ERT, so I'll look into the latter a bit more (maybe
I'll run into unforeseen difficulties and change my mind).

>> > it (why not), but I personally ATM don't understand the need for ERT.
>>
>> Let's focus on the basics right now: people know how to write ERT tests.
>> We have hundreds of them. Some of them could be benchmarks, and we want
>> to make that as easy as possible.
>
> We can later add more benchmarks using ERT. There's no contradiction.

Before describing the issues I found, let me agree with this. If only
for the sake of having a better git history, we should merge the
elisp-benchmarks branch ASAP after changing the directory name as
discussed. I'll force-push a fixed branch after filing the reports; I
still think doing a synchronized rebase-and-merge would be worth it,
since it would result in a cleaner git history than a
merge-with-conflicts of a branch based on a previous commit on the
master branch.

>> It also allows a third class of tests: stress tests which we want to
>> execute more often than once per test run, which identify occasional
>> failures in code that needs to be executed very often to establish
>> stability (think bug#75105: (cl-random 1.0e+INF) produces an incorrect
>> result once every 8 million runs). IIRC, right now ERT uses ad-hoc
>> loops for such tests, but it'd be nicer to expose the repetition count
>> in the framework (I'm not going to run the non-expensive testsuite on
>> FreeDOS if that means waiting for a million iterations on an emulated
>> machine).
>>
>> (I also think we should introduce an ert-how structure that describes how
>> a test is to be run: do we want to inhibit GC or allow it? Run some
>> warm-up test runs or not? What's the expected time, and when should we
>> time out? We can't run the complete matrix for all tests, so we need
>> some hints in the test, and the lack of a test declaration in
>> elisp-benchmarks hurts us there).
>
> These seem to be long-term goals of improving the benchmark suite.
> They are fine by me, but I don't see why they should preclude
> installing the benchmarks we have without first converting them to
> ERT. We can do that later, if we decide it's worth the effort.

I agree again. Please read this message as an explanation for why I,
personally, think that it is worth the effort. It's not meant as an
attack, and it doesn't contradict what you said above in any way.
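(To make the ert-how idea quoted above a bit more concrete, here is a
very rough sketch of what such a declaration might look like. All the
names below, including ert-how itself and the commented-out :how
keyword, are made up for illustration; nothing like this exists in ERT
today, and the cl-random assertion merely stands in for whatever check
bug#75105 actually needs.)

(require 'cl-lib)
(require 'ert)

;; Hypothetical structure describing how a test or benchmark is run.
(cl-defstruct ert-how
  (repetitions 1)      ; how many times to execute the body
  (warmup-runs 0)      ; runs to discard before measuring
  (inhibit-gc nil)     ; raise gc-cons-threshold during the run?
  (expected-time nil)  ; rough expected duration, in seconds
  (timeout nil))       ; give up after this many seconds

;; A stress test along the lines of bug#75105 currently has to hide its
;; repetition count in an ad-hoc loop; with something like ert-how it
;; could be declared instead, and scaled down on slow machines.
(ert-deftest cl-random-infinity-stress ()
  ;; :how (make-ert-how :repetitions 10000000 :timeout 600)
  (dotimes (_ 10000000)
    (should (< (cl-random 1.0e+INF) 1.0e+INF))))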
I'm reporting a small number of elisp-benchmarks "bugs" (I realize the
term is likely to be contentious; I use it because that's what the
mailing list is called). All of them are fixable, and most of them are
easily fixed by moving to ERT. In my opinion, fixing them in
elisp-benchmarks.el is not, as far as Emacs development is concerned,
necessary or helpful: we should spend our time improving ERT rather
than discussing which parts of it need to be reimplemented (the answer,
of course, is that all parts of ERT are needed and none need to be
reimplemented: let's just use it).

I'm not saying elisp-benchmarks.el is bad software: if the goal is to
produce a new benchmarking framework for Emacs without using existing
code, it's a good early start. But continuing that effort would be a
significant time investment, and the remaining distance to a generally
useful benchmarking framework is greater than what we would need to
cover by starting from ERT (and reusing the benchmarks, of course; the
issues are overwhelmingly in elisp-benchmarks.el).

elisp-benchmarks.el does not define the circumstances in which it is
meant to be used: many of the issues can be avoided by running
elisp-benchmarks in a clean session which is terminated immediately
after the benchmark finishes. However, if this limitation is meant to
be permanent, it is inappropriate to declare elisp-benchmarks-run
interactive, and elisp-benchmarks-run should enforce that it is run
only once per session (either by terminating the session or by setting
a flag). In reporting the issues, I worked under the assumption that
elisp-benchmarks can usefully be run in existing Emacs sessions,
interactive or not, as well as in new ones. If that is considered out
of scope for elisp-benchmarks, its usefulness would be limited
massively, and we would still need to state the limitation precisely.

I decided to stop investigating at some point, unfortunately: the
issues, IMHO, need to be addressed before we can even consider the
question of whether the numbers produced by elisp-benchmarks are useful
enough. In particular, as you (Andrea) correctly pointed out, it is
sometimes appropriate to report an average run time (or,
non-equivalently, an average speed) for test results; but the
assumptions required for that are very significant and need to be
spelled out explicitly, and the vast majority of "make benchmark" uses
which I think should happen cannot meet these stringent requirements.

To put it simply, it is better to discard outliers (test runs which
take significantly longer than the rest). Averaging doesn't do that: a
single significant outlier ruins the entire test run. IOW, running the
benchmarks with a large repetition count is very likely to hit an
outlier, wasting the useful data and producing a useless result.

elisp-benchmarks.el makes an attempt to detect outliers by reporting
the (modified) standard deviation of the test times. This is, again,
okay for some use cases, but not for others. In particular, while a
large standard deviation is a sufficient criterion for discarding a
test run, a large repetition count can produce a small standard
deviation while still reporting an unreliable average. IMHO, reporting
the minimum and maximum run times would be more useful than the current
result (the minimum time for a successful benchmark run is a very
useful number; if there was no system malfunction and the repetition
count was large enough, I still think it is almost always the number we
want).
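(A toy illustration, with invented numbers, of why the mean is so
fragile here while the minimum is not: suppose one out of five runs is
hit by unrelated system activity.)

;; Five hypothetical timings in seconds; the last run is an outlier.
(let ((times '(1.02 1.01 1.03 1.01 9.87)))
  (list :mean (/ (apply #'+ times) (length times)) ; ~2.79, dominated by the outlier
        :min (apply #'min times)                   ; 1.01, unaffected by it
        :max (apply #'max times)))                 ; 9.87, flags that something went wrong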
Reporting the minimum and maximum would also mean that increasing the
repetition count improves the data, whereas in the current
implementation it mostly increases the risk of reporting unreliable
data.

My conclusion is that elisp-benchmarks.el (again, the benchmarks
themselves are fine) isn't the right way forward. I'm happy to change
the scratch/elisp-benchmarks branch in the ways we've discussed, and it
should be merged, but if someone decides to solve some of the issues
incrementally, that, while not very harmful, would be an inefficient
use of resources.

Benchmarks need a test framework. The options are reimplementing ERT or
using it. I prefer the second approach, and will investigate it
further.

Pip