Re: [lmi] Robust timings in unit tests
From: Greg Chicares
Subject: Re: [lmi] Robust timings in unit tests
Date: Thu, 11 May 2017 14:57:18 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
On 2017-05-07 22:14, Vadim Zeitlin wrote:
> On Sun, 7 May 2017 17:49:29 +0000 Greg Chicares <address@hidden> wrote:
[...]
> GC> An argument might be made for reporting the lowest measurement rather
> GC> than the mean.
>
> This seems a good argument to me and this is exactly what I do when
> measuring CPU-bound code.
Commit 1a629bf changed AliquotTimer so that it reports the minimum.
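For concreteness, here is a minimal sketch of the idea in standard C++
(an illustration only, not lmi's actual AliquotTimer): run the payload
repeatedly, accumulate the mean, and keep the smallest observed
duration, which is the sample least contaminated by interference from
other processes.

  #include <algorithm>
  #include <chrono>
  #include <cstdio>
  #include <limits>

  template<typename F>
  void time_min_and_mean(F f, int runs)
  {
      using clock = std::chrono::steady_clock;
      double sum   = 0.0;
      double least = std::numeric_limits<double>::max();
      for(int j = 0; j < runs; ++j)
          {
          auto const t0 = clock::now();
          f();
          auto const t1 = clock::now();
          double const dt = std::chrono::duration<double>(t1 - t0).count();
          sum += dt;
          least = std::min(least, dt);
          }
      // Report both: the mean for continuity, the minimum as the
      // more robust statistic.
      std::printf("%.3e s mean; %.3e s least of %d runs\n", sum / runs, least, runs);
  }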
Here are some results. First, 'timer_test' for two architectures,
tabulating the first three lines, which measure (see the sketch after
this list):
- inline void do_nothing() {}
- a hundred logarithm calculations written to volatile storage
- ten calls to that hundred-logarithm function
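A rough reconstruction of those three payloads, for illustration only
(the function names and the particular logarithm are guesses, not the
actual 'timer_test' code):

  #include <cmath>

  inline void do_nothing() {}

  // A hundred logarithms, written to volatile storage so that the
  // compiler cannot elide the work.
  void mete_hundred_logs()
  {
      static double volatile x;
      for(int j = 1; j <= 100; ++j)
          {
          x = std::log(static_cast<double>(j));
          }
  }

  // Ten calls to the hundred-logarithm function above.
  void mete_ten_hundreds()
  {
      for(int j = 0; j < 10; ++j)
          {
          mete_hundred_logs();
          }
  }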
  min  mean    min  mean     min    mean
i686-w64-mingw32
  300   353   8500  8587   41400   64280
  300   336   8500  8579   41400   65340
  300   312   8500  8564   41400   50950
  300   337   8500  8776   82800   83060
  300   338   8500  8581   41400   51390
  300   332   8500  9117   41400   60440
  300   336   8500  8579   41400   77610
  100   175   4200  4288   41400   41770
  100   299   4200  4288   41400   41570
  300   312   8500  8581   82800   83160
x86_64-linux-gnu
    0   353   4000  8587   40000   64280
    0   336   8000  8579   81000   65340
    0   312   4000  8564   40000   50950
    0   337   4000  8776   40000   83060
    0   338   8000  8581   40000   51390
    0   332   8000  9117   40000   60440
    0   336   4000  8579   40000   77610
    0   175   4000  4288   40000   41770
    0   299   4000  4288   40000   41570
    0   312   8000  8581   40000   83160
At first, I questioned whether this change is really an improvement.
First, precision has apparently been lost; but it was false precision,
because nanosecond measurements were being reported for timers with a
resolution of about a millisecond. Second, the middle columns of the
msw results suggest that the minimum and mean aren't really much
different; but the GNU/Linux results don't suggest that. Third, the
left column of the GNU/Linux results is uniformly zero, which just
feels wrong, because the mean seems to report information while the
minimum doesn't; but the question actually posed is how long it takes
to do nothing on a multiple-GHz machine, measured to the nearest
microsecond, and the answer really is zero: the zero minimum is
information, but the reported mean is noise.
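Incidentally, the "false precision" point is easy to demonstrate with
standard C++ alone (an illustration only, not lmi's timer code): a
clock's nominal period can be far finer than the smallest step it can
actually observe.

  #include <chrono>
  #include <cstdio>

  int main()
  {
      using clock = std::chrono::steady_clock;
      // Nominal tick period, often reported as one nanosecond...
      std::printf
          ("nominal period: %lld/%lld s\n"
          ,static_cast<long long>(clock::period::num)
          ,static_cast<long long>(clock::period::den)
          );
      // ...but the smallest nonzero step actually observed may be
      // orders of magnitude coarser.
      auto const t0 = clock::now();
      auto t1 = clock::now();
      while(t1 == t0) {t1 = clock::now();}
      std::printf
          ("smallest observed step: %lld ns\n"
          ,static_cast<long long>
              (std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count())
          );
  }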
I was especially reluctant to give up the "Third" illusion above for
'expression_template_0_test', because I had once set great store by
its results (though that was probably with gcc-3.0 on a one-core
3.5 GHz CPU); but now there's no difference to measure on
x86_64-linux-gnu except for the "STL" methods that are known to be
awful:
Speed tests: array length 1000
C         : 9.877e-07 s mean; 0 us least of 10125 runs
STL plain : 4.006e-06 s mean; 3 us least of  2496 runs
STL fancy : 1.462e-06 s mean; 1 us least of  6840 runs
valarray  : 9.952e-07 s mean; 0 us least of 10048 runs
uBLAS     : 1.020e-06 s mean; 0 us least of  9801 runs
PETE      : 8.727e-07 s mean; 0 us least of 11459 runs
...and the corresponding results for array lengths of {1, 10, 100} just
look ridiculous now: they're all zero. As seen above, the mean is still
reported (it costs nothing) along with the minimum (which is expressed
now in microseconds, not nanoseconds).
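For reference, a line in the format shown above could be produced
along these lines (a sketch only, not the actual AliquotTimer
formatting code), with the mean in seconds and the minimum truncated
to whole microseconds:

  #include <cstdio>

  void report(char const* name, double mean_seconds, double least_seconds, int runs)
  {
      std::printf
          ("%-10s: %.3e s mean; %d us least of %5d runs\n"
          ,name
          ,mean_seconds
          ,static_cast<int>(1.0e6 * least_seconds)
          ,runs
          );
  }

  // E.g., report("PETE", 8.727e-07, 0.0, 11459) prints
  //   "PETE      : 8.727e-07 s mean; 0 us least of 11459 runs"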