From: Andrea Corallo
Newsgroups: gmane.emacs.devel
Subject: Re: New "make benchmark" target
Date: Mon, 06 Jan 2025 13:41:55 -0500
References: <87h679kftn.fsf@protonmail.com> <87frm5z06l.fsf@protonmail.com>
 <86msgdnqmv.fsf@gnu.org> <87wmfhxjce.fsf@protonmail.com>
 <86jzbhnmzg.fsf@gnu.org> <87o70txew4.fsf@protonmail.com>
 <871pxorh30.fsf@protonmail.com> <86wmfgm3a5.fsf@gnu.org>
 <87pll2fsj7.fsf@protonmail.com> <86ikqs57bc.fsf@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13)
To: Eli Zaretskii
Cc: pipcet@protonmail.com, stefankangas@gmail.com, mattiase@acm.org,
 eggert@cs.ucla.edu, emacs-devel@gnu.org
In-Reply-To: <86ikqs57bc.fsf@gnu.org> (Eli Zaretskii's message of
 "Mon, 06 Jan 2025 16:46:15 +0200")

Eli Zaretskii writes:

>> From: Andrea Corallo
>> Cc: Eli Zaretskii , stefankangas@gmail.com,
>>   mattiase@acm.org, eggert@cs.ucla.edu, emacs-devel@gnu.org
>> Date: Mon, 06 Jan 2025 06:23:22 -0500
>>
>> Pip Cet writes:
>>
>> > In particular, as you (Andrea) correctly pointed out, it is sometimes
>> > appropriate to use an average run time (or, non-equivalently, an
>> > average speed) for reporting test results; the assumptions needed for
>> > this are very significant and need to be spelled out explicitly.  The
>> > vast majority of "make benchmark" uses which I think should happen
>> > cannot meet these stringent requirements.
>> >
>> > To put things simply, it is better to discard outliers (test runs
>> > which take significantly longer than the rest).  Averaging doesn't do
>> > that: it simply ruins your entire test run if there is a significant
>> > outlier.  IOW, running the benchmarks with a large repetition count is
>> > very likely to result in useful data being discarded, and a useless
>> > result.
>>
>> As mentioned, I disagree with having some logic put in place to
>> arbitrarily decide which value is worth to be considered and which
>> value should be discarded.  If a system is producing noisy measures
>> this has to be reported as error of the measure.  Those numbers are
>> there for some real reason and have to be accounted.
>
> Without too deep understanding of the underlying issue: IME, if some
> sample can include outliers, it is always better to use robust
> estimators, rather than attempt to detect and discard outliers.
> That's because detection of outliers can decide that a valid
> measurement is an outlier, and then the estimation becomes biased.

100% agreed

> In practical terms, for estimating the mean, I can suggest to use the
> sample median instead of the sample average.  The median is very
> robust to outliers, and only slightly less efficient (i.e., converges
> a bit slower) than the sample average.

In my experience benchmarks typically use the geo-mean; there is quite
a bit of information around on why that is, e.g. [1].  The use of the
arithmetic mean in elisp-benchmarks is a youthful mistake (which I'm
responsible for) that I think should be fixed.

Andrea

[1]
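
P.S.  Just to illustrate (toy helpers, nothing from elisp-benchmarks):
with five hypothetical timings where one run happens to hit GC or swap,
the arithmetic mean gets dragged up while the median barely moves.  A
geo-mean helper is included too, though it is normally used to
aggregate normalized scores across different benchmarks rather than
repeated runs of a single one.

(defun my/arith-mean (xs)
  "Arithmetic mean of the list of numbers XS."
  (/ (apply #'+ xs) (float (length xs))))

(defun my/median (xs)
  "Median of the list of numbers XS."
  (let* ((s (sort (copy-sequence xs) #'<))
         (n (length s)))
    (if (zerop (% n 2))
        (/ (+ (nth (1- (/ n 2)) s) (nth (/ n 2) s)) 2.0)
      (nth (/ n 2) s))))

(defun my/geo-mean (xs)
  "Geometric mean of the list of positive numbers XS."
  (exp (my/arith-mean (mapcar #'log xs))))

;; Five hypothetical timings in seconds; pretend the last run hit GC/swap.
(let ((runs '(1.02 0.98 1.01 0.99 5.00)))
  (list (my/arith-mean runs)   ; => 1.8   (dragged up by the outlier)
        (my/median runs)       ; => 1.01  (barely affected)
        (my/geo-mean runs)))   ; => ~1.38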