From: Pip Cet <pipcet@protonmail.com>
Subject: Re: New "make benchmark" target
Date: Mon, 30 Dec 2024 21:34:55 +0000
To: Andrea Corallo
Cc: Eli Zaretskii, stefankangas@gmail.com, mattiase@acm.org, eggert@cs.ucla.edu, emacs-devel@gnu.org
Message-ID: <871pxorh30.fsf@protonmail.com>
"Andrea Corallo" writes:

>> Benchmarking is hard, and I wouldn't have provided this very verbose
>> example if I hadn't seen "paradoxical" results that can only be
>> explained by such mechanisms.  We need to move away from average run
>> times either way, and that requires code changes.
>
> I'm not sure I understand what you mean; if we prefer something like
> the geometric mean in elisp-benchmarks, we can change to that, it
> should be easy.

In such situations (machines that don't allow reasonable benchmarks,
which has become the standard situation for me), I've usually found it
necessary to store a bucket histogram (or the full history) across many
benchmark runs; that makes the different throttling levels clearly
visible as separate peaks.  If we must use a single number, we want the
fastest actual run; in practice, discard the lowest few percentiles to
account for possible rare timing errors.  (I've put rough sketches of
this and the other ideas in a P.S. below.)

> I'm open to patches to elisp-benchmarks (and to its hypothetical copy
> in emacs-core).  My opinion is that something can potentially be
> improved in

What's the best way to report the need for such improvements?  I'm
currently aware of four "bugs" we should definitely fix; one of them,
ideally, before merging.

> it (why not), but I personally ATM don't understand the need for ERT.

Let's focus on the basics right now: people know how to write ERT
tests.  We have hundreds of them.  Some of them could be benchmarks,
and we want to make that as easy as possible.  ERT provides a way to do
that, in the same file if we want to: just add a tag.

It also provides a way to locate and properly identify resources (five
"bugs" now: reusing test A as input for test B means we don't have
separation of tests in elisp-benchmarks, and separation is something we
should strive for).

It also allows a third class of tests: stress tests, which we want to
execute more often than once per test run.  These identify occasional
failures in code that needs to be run very often to establish stability
(think bug#75105: (cl-random 1.0e+INF) produces an incorrect result
once every 8 million runs).  IIRC, right now ERT uses ad-hoc loops for
such tests, but it'd be nicer to expose the repetition count in the
framework (I'm not going to run the non-expensive test suite on FreeDOS
if that means waiting for a million iterations on an emulated machine).

(I also think we should introduce an ert-how structure that describes
how a test is to be run: do we want to inhibit GC or allow it?  Run
some warm-up test runs or not?  What's the expected time, and when
should we time out?  We can't run the complete matrix for all tests, so
we need some hints in the test, and the lack of a test declaration in
elisp-benchmarks hurts us there.)

Pip
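
P.S. A few rough, untested sketches of what I mean above; the helper
names are illustrative, nothing here exists in Emacs yet.  First, the
bucket histogram and the "fastest run, minus a few percentiles" number,
built on the stock `benchmark-run' macro from benchmark.el:

(require 'benchmark)
(require 'cl-lib)

(defun my/bench-histogram (fn runs bucket-width)
  "Call FN RUNS times; return a sorted alist of (BUCKET . COUNT).
Distinct peaks in the result correspond to distinct machine states,
e.g. different CPU throttling levels."
  (let ((buckets (make-hash-table :test #'eql))
        result)
    (dotimes (_ runs)
      (let* ((elapsed (car (benchmark-run 1 (funcall fn))))
             (bucket (* bucket-width (floor elapsed bucket-width))))
        (puthash bucket (1+ (gethash bucket buckets 0)) buckets)))
    (maphash (lambda (k v) (push (cons k v) result)) buckets)
    (sort result (lambda (a b) (< (car a) (car b))))))

(defun my/bench-fastest (fn runs discard-pct)
  "Time FN RUNS times; return a near-minimum run time.
Discards the fastest DISCARD-PCT percent of samples to guard against
rare timing errors, then returns the next-fastest sample."
  (let ((times (sort (cl-loop repeat runs
                              collect (car (benchmark-run 1 (funcall fn))))
                     #'<)))
    (nth (floor (* discard-pct runs) 100) times)))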
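
Second, the "just add a tag" point: a correctness test and a benchmark
for the same function can live in the same file, distinguished only by
an ERT tag (the `benchmark' tag is an assumed convention, not an ERT
builtin):

(require 'ert)
(require 'benchmark)

(defun my/fib (n)
  (if (< n 2) n (+ (my/fib (- n 1)) (my/fib (- n 2)))))

(ert-deftest my/fib-test ()
  (should (= 75025 (my/fib 25))))

(ert-deftest my/fib-benchmark ()
  :tags '(benchmark)
  (message "fib 25 x10: %.6fs" (car (benchmark-run 10 (my/fib 25))))
  (should t))

;; Select only the benchmarks with a standard ERT tag selector:
;;   (ert-run-tests-batch '(tag benchmark))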
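
Third, the stress-test case.  Today this has to be an ad-hoc loop like
the one below (the predicate is my guess at a check for the bug#75105
failure mode); the hard-coded iteration count is precisely what the
framework should own instead:

(require 'cl-lib)
(require 'ert)

(ert-deftest my/cl-random-inf-stress ()
  :tags '(stress)
  ;; The failure shows up roughly once in 8 million runs.
  (dotimes (_ 8000000)
    (let ((x (cl-random 1.0e+INF)))
      ;; Assumption: a correct result is a finite non-negative float.
      (unless (and (<= 0 x) (< x 1.0e+INF))
        (ert-fail (format "bad cl-random result: %S" x))))))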
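
Finally, one possible shape for the (purely hypothetical) ert-how
declaration; none of this exists in ERT today:

(require 'cl-lib)

(cl-defstruct ert-how
  (inhibit-gc nil)     ; bind gc-cons-threshold high during the run?
  (warmup-runs 0)      ; un-timed runs before measurement starts
  (repetitions 1)      ; how many times a stress test should loop
  (expected-time nil)  ; rough expected duration, in seconds
  (timeout nil))       ; give up after this many seconds

;; A benchmark might then declare, e.g.:
;;   (make-ert-how :inhibit-gc t :warmup-runs 3
;;                 :expected-time 0.5 :timeout 30)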