emacs-devel


Re: New "make benchmark" target


From: Eli Zaretskii
Subject: Re: New "make benchmark" target
Date: Mon, 06 Jan 2025 16:46:15 +0200

> From: Andrea Corallo <acorallo@gnu.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  stefankangas@gmail.com,
>   mattiase@acm.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Mon, 06 Jan 2025 06:23:22 -0500
> 
> Pip Cet <pipcet@protonmail.com> writes:
> 
> > In particular, as you (Andrea) correctly pointed out, it is sometimes
> > appropriate to use an average run time (or, non-equivalently, an average
> > speed) for reporting test results; the assumptions needed for this are
> > very significant and need to be spelled out explicitly.  The vast
> > majority of "make benchmark" uses which I think should happen cannot
> > meet these stringent requirements.
> >
> > To put things simply, it is better to discard outliers (test runs which
> > take significantly longer than the rest).  Averaging doesn't do that: it
> > simply ruins your entire test run if there is a significant outlier.
> > IOW, running the benchmarks with a large repetition count is very likely
> > to result in useful data being discarded, and a useless result.
> 
> As mentioned, I disagree with putting logic in place to arbitrarily
> decide which values are worth considering and which should be
> discarded.  If a system produces noisy measurements, this has to be
> reported as the error of the measurement.  Those numbers are there for
> some real reason and have to be accounted for.

Without a deep understanding of the underlying issue: IME, if a
sample can include outliers, it is always better to use robust
estimators than to attempt to detect and discard outliers.
That's because detection of outliers can decide that a valid
measurement is an outlier, and then the estimation becomes biased.

In practical terms, for estimating the mean, I suggest using the
sample median instead of the sample average.  The median is very
robust to outliers, and only slightly less efficient (i.e., it
converges a bit more slowly) than the sample average.
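To illustrate the difference, here is a minimal sketch (in Python, not part of any Emacs benchmark code) that summarizes a set of benchmark run times with both estimators.  The timings and the `summarize` helper are hypothetical: nine similar runs plus one run assumed to be disturbed by something like background load.  A single slow run pulls the average from about 1.00 s to 1.40 s, while the median only moves from 1.00 s to 1.005 s, and no run had to be classified as an outlier and thrown away.

```python
import statistics


def summarize(run_times):
    """Return (sample average, sample median) of run times in seconds.

    Hypothetical helper for illustration only.
    """
    return statistics.mean(run_times), statistics.median(run_times)


# Hypothetical timings (seconds) of ten runs of the same benchmark.
clean = [1.02, 0.98, 1.01, 0.99, 1.00, 1.03, 0.97, 1.01, 0.99]
# The same runs plus one run slowed down by an external disturbance.
with_outlier = clean + [5.00]

for label, runs in (("clean", clean), ("with outlier", with_outlier)):
    mean, median = summarize(runs)
    # Nothing is discarded: the median simply is not dragged by the slow run.
    print(f"{label:>12}: mean={mean:.3f}s  median={median:.3f}s")
```

The price of this robustness is the lower efficiency mentioned above: for well-behaved (e.g. normally distributed) timings, the median needs somewhat more runs than the average to reach the same precision.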



