From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrea Corallo
Newsgroups: gmane.emacs.devel
Subject: Re: New "make benchmark" target
Date: Mon, 30 Dec 2024 13:26:28 -0500
Message-ID:
References: <87h679kftn.fsf@protonmail.com> <87y107g0xc.fsf@protonmail.com>
 <87frm51jkr.fsf@protonmail.com> <861pxpp88q.fsf@gnu.org>
 <87frm5z06l.fsf@protonmail.com> <86msgdnqmv.fsf@gnu.org>
 <87wmfhxjce.fsf@protonmail.com> <86jzbhnmzg.fsf@gnu.org>
 <87o70txew4.fsf@protonmail.com>
Mime-Version: 1.0
Content-Type: text/plain
User-Agent: Gnus/5.13 (Gnus v5.13)
Cc: Eli Zaretskii , stefankangas@gmail.com, mattiase@acm.org,
 eggert@cs.ucla.edu, emacs-devel@gnu.org
To: Pip Cet
In-Reply-To: <87o70txew4.fsf@protonmail.com> (Pip Cet's message of "Mon, 30 Dec 2024 17:25:44 +0000")

Pip Cet writes:

> "Eli Zaretskii" writes:
>
> Top-posted TL;DR: let's call Andrea's code "make elisp-benchmarks" and
> include it now?  That would preserve the Git history and importantly
> (to me) reserve the name for now.
>
>>> Date: Mon, 30 Dec 2024 15:49:30 +0000
>>> From: Pip Cet
>>> Cc: acorallo@gnu.org, stefankangas@gmail.com, mattiase@acm.org,
>>>  eggert@cs.ucla.edu, emacs-devel@gnu.org, joaotavora@gmail.com
>>>
>>> >> https://lists.gnu.org/archive/html/emacs-devel/2024-12/msg00595.html
>>> >
>>> > Thanks, but AFAICT this just says that you intended to use/extend ERT
>>> > to run this benchmark suite, but doesn't explain why you think using
>>> > ERT would be an advantage worthy of keeping.
>>>
>>> I think some advantages are stated in that email: the ERT tagging
>>> mechanism is more general, works, and can be extended (I describe one
>>> such extension).
>>> All that isn't currently true for elisp-benchmarks.
>>
>> Unlike the rest of the test suite, where we need a way to be able to
>> run individual tests, a benchmark suite is much more likely to run as
>> a whole, because benchmarking a single kind of jobs in Emacs is much
>> less useful than producing a benchmark of a representative sample of
>> jobs.  So I'm not sure this particular aspect is such a serious
>
> Not my experience.  Running the entire suite is much more likely not to
> produce usable data due to such issues as CPU thermal management (for
> example: the first few tests are run at full clock speed and heat up
> the system so much that thermal throttling is activated; the next few
> tests are run at a reduced rate while the fan is running; eventually we
> run out of amperes that we're allowed to drain the battery by and
> reduce clock speed even further; this results in reduced temperature,
> so the fan speed is reduced, which means we will eventually decide to
> try a higher clock speed again, which will work for a while only before
> repeating the cycle.  The whole thing will appear regular enough we
> won't notice the data is bad, but it will be, until we rerun the test
> on the same system in a different room and get wildly different
> results).
>
> A single-second test run in a loop produces the occasional mid-stream
> result which is actually useful (and promptly lost to the averaging
> mechanism of elisp-benchmarks).

Yes, elisp-benchmarks runs all the selected benchmarks at each
iteration, so that a single one cannot take advantage of the initial
cool CPU state.  If unstable throttling on a specific system is a
problem, it will show up as the computed error for that test.  If a
system is throttling, the right (and only) thing to do is to measure
it; in my experience that is what benchmarks do.
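To be concrete about what I mean by "computed error": it is just the
standard error of the mean across iterations.  A minimal Elisp sketch
(the function name is hypothetical, this is not the actual
elisp-benchmarks code):

```elisp
;; Sketch only; `my-bench-mean-and-error' is a hypothetical name, not
;; the elisp-benchmarks implementation.
(defun my-bench-mean-and-error (times)
  "Return (MEAN . STDERR) for TIMES, per-iteration run times of one test.
Unstable throttling inflates STDERR instead of silently skewing MEAN."
  (let* ((n (float (length times)))
         (mean (/ (apply #'+ times) n))
         ;; Sample variance (divide by n-1).
         (var (/ (apply #'+ (mapcar (lambda (x) (expt (- x mean) 2))
                                    times))
                 (1- n))))
    (cons mean (sqrt (/ var n)))))

;; (my-bench-mean-and-error '(1.0 2.0 3.0)) ;; => (2.0 . ~0.577)
```

A test whose timings swing with the throttling cycle will report a
large error relative to its mean, which is the signal to distrust that
number.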
That said, typically Eli is right: the typical use of a benchmark suite
is to run it as a whole and look at the total results, and this indeed
accounts for average throttling as well.

> Benchmarking is hard, and I wouldn't have provided this very verbose
> example if I hadn't seen "paradoxical" results that can only be
> explained by such mechanisms.  We need to move away from average run
> times either way, and that requires code changes.

I'm not sure I understand what you mean: if we prefer something like
the geometric mean in elisp-benchmarks, we can change to that; it
should be easy.

I'm open to patches to elisp-benchmarks (and to its hypothetical copy
in emacs-core).  My opinion is that something can potentially be
improved in it (why not?), but I personally ATM don't understand the
need for ERT.
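To make the geometric-mean idea concrete, here is a minimal Elisp
sketch (a hypothetical helper, not part of elisp-benchmarks today):

```elisp
;; Hypothetical helper, not part of elisp-benchmarks: aggregate run
;; times with the geometric mean, which is less dominated by the
;; occasional throttled (slow) iteration than the arithmetic mean.
(defun my-bench-geometric-mean (times)
  "Return the geometric mean of TIMES, a list of positive run times."
  (exp (/ (apply #'+ (mapcar #'log times))
          (float (length times)))))

;; (my-bench-geometric-mean '(1.0 1.0 4.0))
;; => ~1.587, vs 2.0 for the arithmetic mean of the same data
```

Swapping this in where the averaging is done should be a small, local
change.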