emacs-devel
Re: New "make benchmark" target


From: Pip Cet
Subject: Re: New "make benchmark" target
Date: Mon, 30 Dec 2024 21:34:55 +0000

"Andrea Corallo" <acorallo@gnu.org> writes:
>> Benchmarking is hard, and I wouldn't have provided this very verbose
>> example if I hadn't seen "paradoxical" results that can only be
>> explained by such mechanisms.  We need to move away from average run
>> times either way, and that requires code changes.
>
> I'm not sure I understand what you mean, if we prefer something like
> geo-mean in elisp-benchmarks we can change for that, should be easy.

In such situations (machines that don't allow reasonable benchmarks;
this has become the standard situation for me) I've usually found it
necessary to store a bucket histogram (or full history) across many
benchmark runs; this clearly allows you to see the different throttling
levels as separate peaks.  If we must use a single number, we want the
fastest actual run; so, in practice, discard the fastest few percent of
runs to account for possible rare errors, and take the minimum of the
rest.
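
To make that concrete, here is a minimal sketch of both ideas.  The
function names, the bucket width, and the 2% default are my assumptions
for illustration, not anything elisp-benchmarks does today; I also
assume the "rare errors" show up as implausibly fast runs, so those are
the ones dropped.

```elisp
(require 'cl-lib)

(defun my-bench-histogram (times bucket-width)
  "Return an alist of (BUCKET-START . COUNT) for TIMES, sorted by bucket.
TIMES is a list of run times in any unit; BUCKET-WIDTH uses that unit.
Hypothetical helper, not part of elisp-benchmarks."
  (let ((table (make-hash-table :test #'eql))
        result)
    ;; Count how many runs fall into each bucket.
    (dolist (time times)
      (let ((index (floor time bucket-width)))
        (puthash index (1+ (gethash index table 0)) table)))
    (maphash (lambda (index count)
               (push (cons (* index bucket-width) count) result))
             table)
    (sort result (lambda (a b) (< (car a) (car b))))))

(defun my-bench-robust-minimum (times &optional discard-fraction)
  "Return the fastest of TIMES after dropping the lowest DISCARD-FRACTION.
DISCARD-FRACTION defaults to 0.02, an arbitrary choice for this sketch."
  (let* ((sorted (sort (copy-sequence times) #'<))
         (skip (floor (* (or discard-fraction 0.02) (length sorted)))))
    (nth (min skip (1- (length sorted))) sorted)))
```

With run times in milliseconds and a bucket width of 10, two throttling
levels around 100 ms and 150 ms end up in separate buckets, and a single
bogus 1 ms sample is ignored by the robust minimum once the discard
fraction covers it.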

> I'm open to patches to elisp-benchmarks (and to its hypothetical copy in
> emacs-core).  My opinion is that something can potentially be improved in

What's the best way to report the need for such improvements?  I'm
currently aware of four "bugs" we should definitely fix; one of them,
ideally, before merging.

> it (why not), but I personally ATM don't understand the need for ERT.

Let's focus on the basics right now: people know how to write ERT tests.
We have hundreds of them.  Some of them could be benchmarks, and we want
to make that as easy as possible.

ERT provides a way to do that, in the same file if we want to: just add
a tag.
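
As a rough sketch of what I mean (the :benchmark tag name and the my-fib
workload are made up for this example; only `ert-deftest', its :tags
keyword, and the `(tag ...)' selector are existing ERT features):

```elisp
(require 'ert)

(defun my-fib (n)
  "Return the Nth Fibonacci number (naive recursion, used as a workload)."
  (if (< n 2) n (+ (my-fib (1- n)) (my-fib (- n 2)))))

(ert-deftest my-fib-test ()
  "An ordinary correctness test that could also serve as a benchmark."
  ;; The tag name is hypothetical; ERT accepts any list of symbols here.
  :tags '(:benchmark)
  (should (= (my-fib 20) 6765)))

(ert-deftest my-fib-small-test ()
  "A plain test in the same file, without the tag."
  (should (= (my-fib 2) 1)))
```

A benchmark run would then select only the tagged tests, for instance
with (ert-run-tests-batch-and-exit '(tag :benchmark)), while a normal
test run still executes both.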

It provides a way to locate and properly identify resources (which makes
five "bugs": reusing test A as input for test B means we don't have
separation of tests in elisp-benchmarks, and that's something we should
strive for).

It also allows a third class of tests: stress tests which we want to
execute more often than once per test run, which identify occasional
failures in code that needs to be executed very often to establish
stability (think bug#75105: (cl-random 1.0e+INF) produces an incorrect
result once every 8 million runs).  IIRC, right now ERT tests use ad-hoc
loops for such cases, but it'd be nicer to expose the repetition count
in the framework (I'm not going to run the non-expensive testsuite on
FreeDOS if that means waiting for a million iterations on an emulated
machine).
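
A minimal sketch of the idea, assuming a variable that the framework (or
the person running the tests) could set; `my-stress-repetitions' and the
:stress tag are hypothetical names, and the range check is only a
stand-in for a real stress condition:

```elisp
(require 'ert)
(require 'cl-lib)

(defvar my-stress-repetitions 1000
  "Hypothetical knob: how often a stress test repeats its body.
A slow machine can bind this to a small number.")

(ert-deftest my-stress-cl-random-range ()
  "Check repeatedly that `cl-random' stays within [0, 1)."
  :tags '(:stress)
  (dotimes (_ my-stress-repetitions)
    (let ((x (cl-random 1.0)))
      (should (and (<= 0.0 x) (< x 1.0))))))
```

The point is only that the count lives outside the test body, so an
emulated machine can run ten iterations and a fast one ten million
without editing the test.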

(I also think we should introduce an ert-how structure that describes how
a test is to be run: do we want to inhibit GC or allow it?  Run some
warm-up test runs or not? What's the expected time, and when should we
time out? We can't run the complete matrix for all tests, so we need
some hints in the test, and the lack of a test declaration in
elisp-benchmarks hurts us there).
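
To illustrate, such a structure might look roughly like this.
Everything here is hypothetical, including the slot names and the
my-ert-how prefix; nothing like it exists in ERT today.  The time limit
is only checked after the run finishes, so this is not a real timeout.

```elisp
(require 'cl-lib)
(require 'benchmark)

(cl-defstruct my-ert-how
  "Hypothetical description of how to run one test or benchmark."
  (inhibit-gc nil)        ; non-nil: keep GC from running during the test
  (warmup-runs 0)         ; untimed runs before the measured one
  (expected-seconds nil)  ; rough expected duration, for reporting
  (timeout-seconds nil))  ; upper limit, checked after the fact here

(defun my-ert-how-run (how function)
  "Call FUNCTION as described by HOW and return the elapsed seconds.
Signal an error if the measured run exceeds the timeout slot of HOW."
  (let ((gc-cons-threshold (if (my-ert-how-inhibit-gc how)
                               most-positive-fixnum
                             gc-cons-threshold)))
    ;; Warm-up runs are executed but not timed.
    (dotimes (_ (my-ert-how-warmup-runs how))
      (funcall function))
    (let ((elapsed (car (benchmark-run 1 (funcall function))))
          (limit (my-ert-how-timeout-seconds how)))
      (when (and limit (> elapsed limit))
        (error "Run took %.3fs, over the %.3fs limit" elapsed limit))
      elapsed)))
```

A test declaration could then carry one of these as a hint, e.g.
(make-my-ert-how :inhibit-gc t :warmup-runs 2), and the runner would
decide which part of the matrix to actually execute.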

Pip



