all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: acorallo@gnu.org, stefankangas@gmail.com, mattiase@acm.org,
	eggert@cs.ucla.edu, emacs-devel@gnu.org
Subject: Re: New "make benchmark" target
Date: Sat, 04 Jan 2025 16:34:24 +0000	[thread overview]
Message-ID: <87pll2fsj7.fsf@protonmail.com> (raw)
In-Reply-To: <86wmfgm3a5.fsf@gnu.org>

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Mon, 30 Dec 2024 21:34:55 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: Eli Zaretskii <eliz@gnu.org>, stefankangas@gmail.com, mattiase@acm.org, eggert@cs.ucla.edu, emacs-devel@gnu.org
>>
>> > I'm open to patches to elisp-benchmarks (and to its hypothetical copy in
>> > emacs-core).  My opinion that something can potentially be improved in
>>
>> What's the best way to report the need for such improvements?
>
> Since you've pushed that to a branch, I suggest to submit bug reports
> about these issues, using "[scratch/elisp-benchmarks]" in the Subject
> of the bug.

I've studied the issues a bit more.  This is a bit long, but in summary,
I think improving elisp-benchmarks.el (the specific file, not the entire
package, which I still intend to reuse) would take more time than
starting from ERT, so I'll look into the latter a bit more (maybe I'll
run into unforeseen difficulties and change my mind).

>> > it (why not), but I personally ATM don't understand the need for ERT.
>>
>> Let's focus on the basics right now: people know how to write ERT tests.
>> We have hundreds of them.  Some of them could be benchmarks, and we want
>> to make that as easy as possible.
>
> We can later add more benchmarks using ERT.  There's no contradiction.

Before describing the issues I found, let me agree with this.  If only
for the sake of having a better git history, we should merge the
elisp-benchmarks branch ASAP after changing the directory name as
discussed.  I'll force-push a fixed branch after filing the reports; I
still think doing a synchronized rebase-and-merge would be worth it
since it would result in a cleaner git history than a
merge-with-conflicts of a branch based on a previous commit on the
master branch.

>> It also allows a third class of tests: stress tests which we want to
>> execute more often than once per test run, which identify occasional
>> failures in code that needs to be executed very often to establish
>> stability (think bug#75105: (cl-random 1.0e+INF) produces an incorrect
>> result once every 8 million runs).  IIRC, right now ERT uses ad-hoc
>> loops for such tests, but it'd be nicer to expose the repetition count
>> in the framework (I'm not going to run the non-expensive testsuite on
>> FreeDOS if that means waiting for a million iterations on an emulated
>> machine).
>>
>> (I also think we should introduce an ert-how structure that describes how
>> a test is to be run: do we want to inhibit GC or allow it?  Run some
>> warm-up test runs or not? What's the expected time, and when should we
>> time out? We can't run the complete matrix for all tests, so we need
>> some hints in the test, and the lack of a test declaration in
>> elisp-benchmarks hurts us there).
>
> These seem to be long-term goals of improving the benchmark suite.
> They are fine by me, but I don't see why they should preclude
> installing the benchmarks we have without first converting them to
> ERT.  We can do that later, if we decide it's worth the effort.

I agree again.

Please read this message as an explanation for why I, personally, think
that it is worth the effort.  It's not meant as an attack, and it
doesn't contradict what you said above in any way.

I'm reporting a small number of elisp-benchmarks "bugs" (I think the
term is likely to be contentious; I use it because that's what the
mailing list is called).  All of them are fixable.  Most of them are
easily fixable by moving to ERT.

In my opinion, fixing them in elisp-bechmarks.el is not, as far as Emacs
development is concerned, necessary or helpful: we should spend our time
improving ERT rather than discussing which parts of it need to be
reimplemented (the answer, of course, is that all parts of ERT are
needed and none need to be reimplemented: let's just use it).

I'm not saying elisp-benchmarks.el is bad software: if the goal is to
produce a new benchmarking framework, without using existing code, for
use in Emacs, it's a good early start.  Continuing the effort would be a
significant time investment, and the remaining time to reaching the goal
of a generally useful benchmarking framework is greater than what we
need to do if we start with ERT (and reuse the benchmarks, of course;
the issues are overwhelmingly in elisp-benchmarks.el).

elisp-benchmarks.el has not defined the circumstances in which it is
meant to be used: many of the issues can be avoided by running
elisp-benchmarks in a clean session which is terminated immediately
after running the benchmark.  However, if this limitation is meant to be
permanent, it is inappropriate to declare elisp-benchmarks-run
interactive, and it would mean elisp-benchmarks-run should enforce that
it is run just once per session (either by terminating it or by setting
a flag).

In reporting the issues, I worked under the assumption that
elisp-benchmarks can usefully be run in existing Emacs sessions,
interactive or not, as well as new ones, interactive or not.  If this is
considered out of scope for elisp-benchmarks, this would limit its
usefulness massively, and we would still need to declare this limitation
precisely.

Unfortunately, I decided to stop at some point: the issues, IMHO, need
to be addressed before we can consider the question of whether the
numbers produced by elisp-benchmarks are useful enough.

In particular, as you (Andrea) correctly pointed out, it is sometimes
appropriate to use an average run time (or, non-equivalently, an average
speed) for reporting test results; the assumptions needed for this are
very significant and need to be spelled out explicitly.  The vast
majority of "make benchmark" uses which I think should happen cannot
meet these stringent requirements.

To put things simply, it is better to discard outliers (test runs which
take significantly longer than the rest).  Averaging doesn't do that: it
simply ruins your entire test run if there is a significant outlier.
IOW, running the benchmarks with a large repetition count is very likely
to result in useful data being discarded, and a useless result.

elisp-benchmarks.el makes an attempt to detect outliers by reporting the
(modified) standard deviation of test times.  This is, again, okay for
some use cases, but for others, not so much.  In particular, while a
large standard deviation is a sufficient criterion for discarding a
test, a large repetition count can produce a small standard deviation
while reporting an unreliable average.  IMHO, reporting the minimum and
maximum run time would be more useful than the current result (the
minimum time for a successful benchmark run is a very useful number.  If
there was no system malfunction and the repetition count was large
enough, I still thisk this is almost always the number we want).  It
would mean increasing the repetition count would improve the data, while
in the current implementation, it mostly increases the risk of reporting
unreliable data.

My conclusion is that elisp-benchmarks.el (again, the benchmarks are
fine) isn't the right way forward.

I'm happy to change the scratch/elisp-benchmarks branch in the ways
we've discussed, and it should be merged, but if someone decides to
incrementally solve some of the issues, that, while not very harmful,
would be an inefficient use of resources.

Benchmarks need a test framework.  The options are reimplementing ERT or
using it.  I prefer the second approach and will investigate it further.

Pip




  parent reply	other threads:[~2025-01-04 16:34 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-11 22:37 Improving EQ Pip Cet via Emacs development discussions.
2024-12-12  6:36 ` Eli Zaretskii
2024-12-12  8:23   ` Andrea Corallo
2024-12-12  8:36   ` Pip Cet via Emacs development discussions.
2024-12-12  9:18     ` Eli Zaretskii
2024-12-12  9:35     ` Visuwesh
2024-12-12 10:40     ` Andrea Corallo
2024-12-12 17:46       ` Pip Cet via Emacs development discussions.
2024-12-12 19:09         ` Eli Zaretskii
2024-12-12 10:53     ` New "make benchmark" target Stefan Kangas
2024-12-12 10:59       ` Andrea Corallo
2024-12-12 16:53         ` Pip Cet via Emacs development discussions.
2024-12-13  0:49           ` Stefan Kangas
2024-12-13  7:37             ` Andrea Corallo
2024-12-14 12:00               ` Stefan Kangas
2024-12-14 14:06                 ` Stefan Monnier
2024-12-14 11:34             ` Pip Cet via Emacs development discussions.
2024-12-14 11:58               ` Stefan Kangas
2024-12-14 20:07                 ` Pip Cet via Emacs development discussions.
2024-12-14 20:20                   ` João Távora
2024-12-15  0:57                   ` Stefan Kangas
2024-12-22 16:04                     ` Pip Cet via Emacs development discussions.
2024-12-29 10:47                       ` Andrea Corallo
2024-12-30 11:45                         ` Pip Cet via Emacs development discussions.
2024-12-30 14:15                           ` Eli Zaretskii
2024-12-30 15:00                             ` Pip Cet via Emacs development discussions.
2024-12-30 15:21                               ` Eli Zaretskii
2024-12-30 15:49                                 ` Pip Cet via Emacs development discussions.
2024-12-30 15:53                                   ` João Távora
2024-12-30 16:40                                   ` Eli Zaretskii
2024-12-30 17:25                                     ` Pip Cet via Emacs development discussions.
2024-12-30 18:16                                       ` Eli Zaretskii
2024-12-31  4:00                                         ` Pip Cet via Emacs development discussions.
2024-12-31  5:26                                           ` Stefan Kangas
2024-12-31 13:05                                             ` Eli Zaretskii
2024-12-31 14:14                                               ` Pip Cet via Emacs development discussions.
2024-12-31 14:22                                                 ` Eli Zaretskii
2024-12-31 12:53                                           ` Eli Zaretskii
2024-12-31 14:34                                             ` Andrea Corallo
2024-12-30 18:26                                       ` Andrea Corallo
2024-12-30 18:58                                         ` Stefan Kangas
2024-12-30 21:34                                         ` Pip Cet via Emacs development discussions.
2024-12-31  9:55                                           ` Andrea Corallo
2024-12-31 12:43                                           ` Eli Zaretskii
2024-12-31 14:01                                             ` Pip Cet via Emacs development discussions.
2025-01-04 16:34                                             ` Pip Cet via Emacs development discussions. [this message]
2025-01-04 18:33                                               ` Eli Zaretskii
2025-01-05 10:18                                                 ` Pip Cet via Emacs development discussions.
2025-01-15 22:20                                                   ` Stefan Kangas
2025-01-16  6:42                                                     ` Eli Zaretskii
2025-01-17 13:59                                                       ` Andrea Corallo
2025-01-17 14:37                                                         ` Pip Cet via Emacs development discussions.
2025-01-17 20:48                                                           ` Andrea Corallo
2025-01-17 21:00                                                             ` Pip Cet via Emacs development discussions.
2025-01-18 19:54                                                               ` Andrea Corallo
2025-01-18  5:29                                                             ` Pip Cet via Emacs development discussions.
2025-01-18 20:33                                                               ` Andrea Corallo
2025-01-18 20:52                                                                 ` Pip Cet via Emacs development discussions.
2025-01-06 11:23                                               ` Andrea Corallo
2025-01-06 14:46                                                 ` Eli Zaretskii
2025-01-06 18:41                                                   ` Andrea Corallo
2024-12-15  0:58                   ` Stefan Kangas
2024-12-12 10:42 ` Improving EQ Óscar Fuentes
2024-12-12 10:50   ` Andrea Corallo
2024-12-12 11:21     ` Óscar Fuentes
2024-12-13 12:24       ` Pip Cet via Emacs development discussions.
2024-12-12 17:05     ` Pip Cet via Emacs development discussions.
2024-12-12 18:10     ` John ff

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pll2fsj7.fsf@protonmail.com \
    --to=emacs-devel@gnu.org \
    --cc=acorallo@gnu.org \
    --cc=eggert@cs.ucla.edu \
    --cc=eliz@gnu.org \
    --cc=mattiase@acm.org \
    --cc=pipcet@protonmail.com \
    --cc=stefankangas@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.