[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: New "make benchmark" target
From: |
Pip Cet |
Subject: |
Re: New "make benchmark" target |
Date: |
Sat, 14 Dec 2024 11:34:19 +0000 |
"Stefan Kangas" <stefankangas@gmail.com> writes:
> Pip Cet <pipcet@protonmail.com> writes:
>
>> I agree a full benchmark suite would be even better, I don't recall why
>> I typed "reduced" there. So let's do that?
>
> Please go ahead, thanks.
Here's my proposal:
Expand ERT to handle the benchmark case. Copy ALL benchmarks from
elisp-benchmarks, but use the ERT framework instead of
elisp-benchmarks.el. Keep things minimal for now, try it out, and add
complexity only if we get the impression this would make a useful
addition; otherwise, revert the changes and go back to using
elisp-benchmarks.el.
This is what would work:
1. add a :benchmark tag to ert tests
2. create a new directory test/benchmark/ for benchmarks
3. modify test/Makefile not to run benchmark tests by default
4. add make targets to run the benchmark tests only
I think the mechanism used by elisp-benchmarks.el to select tests is
very ad-hoc and less powerful than the ert tagging mechanism. It also
doesn't work for me: executing
(progn (elisp-benchmarks-run "" t 1)
(elisp-benchmarks-run "bubble" t 1))
means all tests are run twice, but I intended to run all tests once, as
a warm up, then run only the bubble test again. However, I suspect this
is merely a bug which can easily be fixed (maybe it's intentional,
though?). I'm also seeing problems with a "Window size not as
stipulated by the benchmark" error message, but I'll have to investigate
that one...
The mathematical part of elisp-benchmarks.el is questionable: it's built
around the idea of using an arithmetic average of several test runs
(which is equivalent to a single test run with more iterations); I
believe a median/percentile-based approach to selecting a "typical" run
would yield more useful numbers. So I'm not proposing to reuse the
avg-based code.
I tried to resist the temptation of making ert.el overly general; for
example, instead of defining a new defstruct to determine HOW tests are
run, I merely added a benchmark flag. Maybe we can revisit this if the
approach is adopted and the need for more detailed benchmark
specifications (inhibit GC? warmup? iteration counts? interact with
"perf"?) becomes apparent. However, I did fail and give in to the
temptation to allow an inhibit-gc mode specifier, which should probably
be removed again...
The main problems I see are that "make benchmark" starts emacs instances
for all files in test/, which takes a lot of time (but that's a general
ERT problem that should be fixed for pass-or-fail testing, too).
A minor problem is how to copy the elisp-benchmark tests and keep them
in sync. This would very much depend on how much work Andrea is willing
to do.
Finally, elisp-benchmarks has a very useful feature, somewhat hidden,
that this code lacks: while calculating the arithmetic average of
several benchmark runs isn't useful, calculating the standard deviation
from that average is, because it gives us an indication of how scattered
the results are; scattered test results (i.e. high numbers reported in
the "tot avg err" column) are a sufficient, but not a necessary,
condition for knowing when to discard the test results because something
unexpected happened (most likely system load issues or CPU clock games).
Benchmarking is, of course, very hard. I understand Paul Eggert is using
an ancient machine for benchmarking because it avoids many of the issues
that have arisen in the past decade. With a modern machine, we have a
heterogeneous set of CPU cores (energy/performance), each of which can
be configured in different ways (energy-performance preference for each
core) in addition to running at a variable and unpredictable clock speed
(cpufreq/boost); CPU caches are now large enough to persist across
benchmark runs, and the system memory clock is also variable.
This is a very rough start which would allow us to detect many, but not
all, unexpected catastrophic performance reductions due to code changes.
Finally, if Someone is willing to work on this, we should look into
finding a set of benchmarks representative of "typical" Emacs usage so
we can use PGO when building Emacs. While I'd prefer doing neither of
the two, playing with PGO is a much more promising and maintainable
approach than adding __builtin_expect noise to our C source code.
Pip
>From 4217a5b8f990760775709392b24e0205041cfed3 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@protonmail.com>
Date: Sat, 14 Dec 2024 10:45:42 +0000
Subject: [PATCH 1/3] Add a benchmark from elisp-benchmarks
DO NOT MERGE
FIXME BEFORE MERGING: Should we add a link to
https://git.savannah.gnu.org/gitweb/?p=emacs/elpa.git;a=history;\
f=benchmarks/bubble.el;h=d7101b1b99b60a3bd6945909d1f0125215f4ce1c;\
hb=refs/heads/externals/elisp-benchmarks
here? Losing git history because we copy a file from elpa to emacs
seems suboptimal...
* test/benchmark/bubble.el: New file.
---
test/benchmark/bubble.el | 52 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
create mode 100644 test/benchmark/bubble.el
diff --git a/test/benchmark/bubble.el b/test/benchmark/bubble.el
new file mode 100644
index 00000000000..0c38cdbce39
--- /dev/null
+++ b/test/benchmark/bubble.el
@@ -0,0 +1,52 @@
+;; -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2019 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;; From:
+;; https://www.emacswiki.org/emacs/EmacsLispBenchmark
+
+(require 'ert)
+
+(require 'cl-lib)
+
+(defvar elb-bubble-len 1000)
+(defvar elb-bubble-list (mapcar #'random (make-list elb-bubble-len
+ most-positive-fixnum)))
+
+(defun elb-bubble (list)
+ (let ((i (length list)))
+ (while (> i 1)
+ (let ((b list))
+ (while (cdr b)
+ (when (< (cadr b) (car b))
+ (setcar b (prog1 (cadr b)
+ (setcdr b (cons (car b) (cddr b))))))
+ (setq b (cdr b))))
+ (setq i (1- i)))
+ list))
+
+(defun elb-bubble-entry ()
+ (cl-loop repeat 100
+ for l = (copy-sequence elb-bubble-list)
+ do (elb-bubble l)))
+
+(ert-deftest benchmark-bubble ()
+ :tags '(:benchmark)
+ (elb-bubble-entry))
--
2.47.0
>From df31e19452dff0fe804af2fd3c73f4cee84b6d16 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@protonmail.com>
Date: Sat, 14 Dec 2024 10:56:19 +0000
Subject: [PATCH 2/3] Expand the ERT framework to allow for benchmarks
* lisp/emacs-lisp/ert.el (ert-test-result, ert--test-execution-info):
Expand structs to include "benchmark" field.
(ert--run-benchmark-test-internal): New function.
(ert-run-test, ert-run-or-rerun-test):
(ert-run-tests-batch-and-exit):
(ert-run-tests): Add benchmark argument.
(ert-run-tests-batch): Include GC information when running benchmarks.
(ert-summarize-tests-batch-and-exit): Handle 1.0e+INF.
---
lisp/emacs-lisp/ert.el | 91 +++++++++++++++++++++++++++++++++---------
1 file changed, 73 insertions(+), 18 deletions(-)
diff --git a/lisp/emacs-lisp/ert.el b/lisp/emacs-lisp/ert.el
index 97aa233f6e2..76365ed8152 100644
--- a/lisp/emacs-lisp/ert.el
+++ b/lisp/emacs-lisp/ert.el
@@ -698,6 +698,7 @@ ert-test-result
(messages nil)
(should-forms nil)
(duration 0)
+ (benchmark nil)
)
(cl-defstruct (ert-test-passed (:include ert-test-result)))
(cl-defstruct (ert-test-result-with-condition (:include ert-test-result))
@@ -723,7 +724,8 @@ ert--test-execution-info
;; execution of the current test. We store it to avoid being
;; affected by any new bindings the test itself may establish. (I
;; don't remember whether this feature is important.)
- ert-debug-on-error)
+ ert-debug-on-error
+ benchmark)
(defun ert--run-test-debugger (info condition debugfun)
"Error handler used during the test run.
@@ -801,6 +803,39 @@ ert--run-test-internal
(make-ert-test-passed))
nil)
+(defun ert--run-benchmark-test-internal (test-execution-info benchmark)
+ (setf (ert--test-execution-info-ert-debug-on-error test-execution-info)
+ ert-debug-on-error)
+ (catch 'ert--pass
+ ;; For now, each test gets its own temp buffer and its own
+ ;; window excursion, just to be safe. If this turns out to be
+ ;; too expensive, we can remove it.
+ (with-temp-buffer
+ (save-window-excursion
+ (let ((lexical-binding t) ;;FIXME: Why?
+ (ert--infos '())
+ time)
+ (letrec ((debugfun (lambda (err)
+ (ert--run-test-debugger test-execution-info
+ err debugfun))))
+ (handler-bind (((error quit) debugfun))
+ (garbage-collect)
+ (let ((gc-cons-threshold (if (eq benchmark 'inhibit-gc)
+ most-positive-fixnum
+ gc-cons-threshold)))
+ (setq time (benchmark-run nil
+ (funcall (ert-test-body
(ert--test-execution-info-test
+ test-execution-info))))))
+ (and (eq benchmark 'inhibit-gc)
+ (not (= (cadr time) 0))
+ (warn "failed to inhibit gc; time %S" time))
+ (setf (ert--test-execution-info-benchmark test-execution-info)
+ time))))))
+ (ert-pass))
+ (setf (ert--test-execution-info-result test-execution-info)
+ (make-ert-test-passed))
+ nil)
+
(defun ert--force-message-log-buffer-truncation ()
"Immediately truncate *Messages* buffer according to `message-log-max'.
@@ -832,7 +867,7 @@ ert--running-tests
The elements are of type `ert-test'.")
-(defun ert-run-test (ert-test)
+(defun ert-run-test (ert-test &optional benchmark)
"Run ERT-TEST.
Returns the result and stores it in ERT-TEST's `most-recent-result' slot."
@@ -855,8 +890,12 @@ ert-run-test
(push form-description should-form-accu)))
(message-log-max t)
(ert--running-tests (cons ert-test ert--running-tests)))
- (ert--run-test-internal info))
+ (if benchmark
+ (ert--run-benchmark-test-internal info benchmark)
+ (ert--run-test-internal info)))
(let ((result (ert--test-execution-info-result info)))
+ (setf (ert-test-result-benchmark result)
+ (ert--test-execution-info-benchmark info))
(setf (ert-test-result-messages result)
(with-current-buffer (messages-buffer)
(buffer-substring begin-marker (point-max))))
@@ -1206,7 +1245,7 @@ ert--make-stats
:test-start-times (make-vector (length tests) nil)
:test-end-times (make-vector (length tests) nil))))
-(defun ert-run-or-rerun-test (stats test listener)
+(defun ert-run-or-rerun-test (stats test listener &optional benchmark)
;; checkdoc-order: nil
"Run the single test TEST and record the result using STATS and LISTENER."
(let ((ert--current-run-stats stats)
@@ -1221,19 +1260,26 @@ ert-run-or-rerun-test
(setf (ert-test-most-recent-result test) nil)
(setf (aref (ert--stats-test-start-times stats) pos) (current-time))
(unwind-protect
- (ert-run-test test)
+ (ert-run-test test benchmark)
(setf (aref (ert--stats-test-end-times stats) pos) (current-time))
(let ((result (ert-test-most-recent-result test)))
- (setf (ert-test-result-duration result)
- (float-time
- (time-subtract
- (aref (ert--stats-test-end-times stats) pos)
- (aref (ert--stats-test-start-times stats) pos))))
+ (cond ((ert-test-result-benchmark result)
+ (setf (ert-test-result-duration result)
+ (if (memq benchmark '(no-gc inhibit-gc))
+ (- (car (ert-test-result-benchmark result))
+ (caddr (ert-test-result-benchmark result)))
+ (car (ert-test-result-benchmark result)))))
+ (t
+ (setf (ert-test-result-duration result)
+ (float-time
+ (time-subtract
+ (aref (ert--stats-test-end-times stats) pos)
+ (aref (ert--stats-test-start-times stats) pos))))))
(ert--stats-set-test-and-result stats pos test result)
(funcall listener 'test-ended stats test result))
(setf (ert--stats-current-test stats) nil))))
-(defun ert-run-tests (selector listener &optional interactively)
+(defun ert-run-tests (selector listener &optional interactively benchmark)
"Run the tests specified by SELECTOR, sending progress updates to LISTENER."
(let* ((tests (ert-select-tests selector t))
(stats (ert--make-stats tests selector)))
@@ -1245,7 +1291,7 @@ ert-run-tests
(force-mode-line-update)
(unwind-protect
(cl-loop for test in tests do
- (ert-run-or-rerun-test stats test listener)
+ (ert-run-or-rerun-test stats test listener benchmark)
(when (and interactively
(ert-test-quit-p
(ert-test-most-recent-result test))
@@ -1367,7 +1413,7 @@ ert-batch-backtrace-right-margin
"The maximum line length for printing backtraces in `ert-run-tests-batch'.")
;;;###autoload
-(defun ert-run-tests-batch (&optional selector)
+(defun ert-run-tests-batch (&optional selector benchmark)
"Run the tests specified by SELECTOR, printing results to the terminal.
SELECTOR selects which tests to run as described in `ert-select-tests' when
@@ -1493,7 +1539,7 @@ ert-run-tests-batch
(let* ((max (prin1-to-string (length (ert--stats-tests stats))))
(format-string (concat "%9s %"
(prin1-to-string (length max))
- "s/" max " %S (%f sec)%s")))
+ "s/" max " %S (%f sec%s)%s")))
(message format-string
(ert-string-for-test-result result
(ert-test-result-expected-p
@@ -1501,13 +1547,19 @@ ert-run-tests-batch
(1+ (ert--stats-test-pos stats test))
(ert-test-name test)
(ert-test-result-duration result)
+ (if (ert-test-result-benchmark result)
+ (format ", %f sec in GC, %d GCs"
+ (caddr (ert-test-result-benchmark result))
+ (cadr (ert-test-result-benchmark result)))
+ "")
(if (ert-test-result-expected-p test result)
""
(concat " " (ert-test-location test))))))))))
- nil))
+ nil
+ benchmark))
;;;###autoload
-(defun ert-run-tests-batch-and-exit (&optional selector)
+(defun ert-run-tests-batch-and-exit (&optional selector benchmark)
"Like `ert-run-tests-batch', but exits Emacs when done.
The exit status will be 0 if all test results were as expected, 1
@@ -1525,7 +1577,7 @@ ert-run-tests-batch-and-exit
(setq attempt-stack-overflow-recovery nil
attempt-orderly-shutdown-on-fatal-signal nil)
(unwind-protect
- (let ((stats (ert-run-tests-batch selector)))
+ (let ((stats (ert-run-tests-batch selector benchmark)))
(when eln-dir
(ignore-errors
(delete-directory eln-dir t)))
@@ -1726,7 +1778,10 @@ ert-summarize-tests-batch-and-exit
If HIGH is a natural number, the HIGH long lasting tests are summarized."
(or noninteractive
(user-error "This function is only for use in batch mode"))
- (or (natnump high) (setq high 0))
+ (cond
+ ;; FIXME: ntake doesn't allow an infinity argument
+ ((eql high 1.0e+INF) (setq high most-positive-fixnum))
+ ((not (natnump high)) (setq high 0)))
;; Better crash loudly than attempting to recover from undefined
;; behavior.
(setq attempt-stack-overflow-recovery nil
--
2.47.0
>From 8cd4053967a0aa6521039ba887c911daa13b0cf0 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@protonmail.com>
Date: Sat, 14 Dec 2024 10:59:38 +0000
Subject: [PATCH 3/3] Add "make benchmark" rule
* Makefile.in (benchmark): New recipe.
* test/Makefile.in (SELECTOR_BENCHMARK): New selector.
(SELECTOR_ALL, SELECTOR_EXPENSIVE, SELECTOR_DEFAULT): Modify selectors
not to include benchmark tests.
(check-benchmark): New recipe.
---
Makefile.in | 7 +++++++
test/Makefile.in | 25 +++++++++++++++++--------
2 files changed, 24 insertions(+), 8 deletions(-)
diff --git a/Makefile.in b/Makefile.in
index 30a762ed03b..13a55452da2 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -69,6 +69,9 @@
# check-expensive includes additional tests that can be slow.
# check-all runs all tests, including ones that can be slow, or
# fail unpredictably
+#
+# make benchmark
+# Run the Emacs benchmark suite.
SHELL = @SHELL@
@@ -1138,6 +1141,10 @@ $(CHECK_TARGETS):
test/%:
$(MAKE) -C test $*
+BENCHMARK_TARGETS = benchmark
+.PHONY: $(BENCHMARK_TARGETS)
+$(BENCHMARK_TARGETS): all
+ $(MAKE) SUMMARIZE_TESTS="1.0e+INF" BENCHMARKP="t" -C test
check-benchmark
dist:
cd ${srcdir}; ./make-dist
diff --git a/test/Makefile.in b/test/Makefile.in
index 7a3178546a1..18a478b3e6c 100644
--- a/test/Makefile.in
+++ b/test/Makefile.in
@@ -78,9 +78,9 @@ TEST_TIMEOUT =
TEST_INTERACTIVE ?= no
ifeq ($(TEST_INTERACTIVE),yes)
-TEST_RUN_ERT = --eval '(ert (quote ${SELECTOR_ACTUAL}))'
+TEST_RUN_ERT = --eval '(ert (quote ${SELECTOR_ACTUAL}) ${BENCHMARKP})'
else
-TEST_RUN_ERT = --batch --eval '(ert-run-tests-batch-and-exit (quote
${SELECTOR_ACTUAL}))' ${WRITE_LOG}
+TEST_RUN_ERT = --batch --eval '(ert-run-tests-batch-and-exit (quote
${SELECTOR_ACTUAL}) ${BENCHMARKP})' ${WRITE_LOG}
endif
# Whether to run tests from .el files in preference to .elc, we do
@@ -136,13 +136,15 @@ TEST_NATIVE_COMP =
TEST_NATIVE_COMP = no
endif
ifeq ($(TEST_NATIVE_COMP),yes)
-SELECTOR_DEFAULT = (not (or (tag :expensive-test) (tag :unstable)))
-SELECTOR_EXPENSIVE = (not (tag :unstable))
-SELECTOR_ALL = t
+SELECTOR_DEFAULT = (not (or (tag :expensive-test) (tag :unstable) (tag
:benchmark)))
+SELECTOR_EXPENSIVE = (not (or (tag :unstable) (tag :benchmark)))
+SELECTOR_ALL = (not (tag :benchmark))
+SELECTOR_BENCHMARK = (tag :benchmark)
else
-SELECTOR_DEFAULT = (not (or (tag :expensive-test) (tag :unstable) (tag
:nativecomp)))
-SELECTOR_EXPENSIVE = (not (or (tag :unstable) (tag :nativecomp)))
-SELECTOR_ALL = (not (tag :nativecomp))
+SELECTOR_DEFAULT = (not (or (tag :expensive-test) (tag :unstable) (tag
:nativecomp) (tag :benchmark)))
+SELECTOR_EXPENSIVE = (not (or (tag :unstable) (tag :nativecomp) (tag
:benchmark)))
+SELECTOR_ALL = (not (or (tag :nativecomp) (tag :benchmark)))
+SELECTOR_BENCHMARK = (and (tag :benchmark) (not (tag :nativecomp)))
endif
ifdef SELECTOR
SELECTOR_ACTUAL=$(SELECTOR)
@@ -154,6 +156,8 @@ SELECTOR_ACTUAL=
SELECTOR_ACTUAL=$(SELECTOR_DEFAULT)
else ifeq ($(MAKECMDGOALS),check-maybe)
SELECTOR_ACTUAL=$(SELECTOR_DEFAULT)
+else ifeq ($(MAKECMDGOALS),check-benchmark)
+SELECTOR_ACTUAL=$(SELECTOR_BENCHMARK)
else
SELECTOR_ACTUAL=$(SELECTOR_EXPENSIVE)
endif
@@ -323,6 +327,11 @@ .PHONY:
check-all: mostlyclean check-no-automated-subdir
@${MAKE} check-doit SELECTOR="${SELECTOR_ALL}"
+## Run all benchmark tests, regardless of tag.
+.PHONY: check-benchmark
+check-benchmark: mostlyclean check-no-automated-subdir
+ @${MAKE} check-doit SELECTOR="${SELECTOR_BENCHMARK}"
+
## Re-run all tests which are outdated. A test is outdated if its
## logfile is out-of-date with either the test file, or the source
## files that the tests depend on. See test_template.
--
2.47.0
- Re: Improving EQ, (continued)
- New "make benchmark" target, Stefan Kangas, 2024/12/12
- Re: New "make benchmark" target, Andrea Corallo, 2024/12/12
- Re: New "make benchmark" target, Pip Cet, 2024/12/12
- Re: New "make benchmark" target, Stefan Kangas, 2024/12/12
- Re: New "make benchmark" target, Andrea Corallo, 2024/12/13
- Re: New "make benchmark" target, Stefan Kangas, 2024/12/14
- Re: New "make benchmark" target, Stefan Monnier, 2024/12/14
- Re: New "make benchmark" target,
Pip Cet <=
- Re: New "make benchmark" target, Stefan Kangas, 2024/12/14
- Message not available
- Re: New "make benchmark" target, João Távora, 2024/12/14
Re: Improving EQ, Óscar Fuentes, 2024/12/12