Re: [lmi] Robust timings in unit tests
From: Vadim Zeitlin
Subject: Re: [lmi] Robust timings in unit tests
Date: Mon, 8 May 2017 00:14:21 +0200
On Sun, 7 May 2017 17:49:29 +0000 Greg Chicares <address@hidden> wrote:
GC> It would be much more convenient to get a robust measurement in one
GC> step. I summarize my efforts below, mainly to show that everything I
GC> tried doesn't work so that we don't go down this road again.
I'm too drunk with relief currently (sometimes even expected happy
outcomes can be powerfully joyful, especially when the alternative is so
dire) to address all the points, but I'd just like to say one thing:
[...huge snip...]
GC> An argument might be made for reporting the lowest measurement rather
GC> than the mean.
This seems a good argument to me, and it is exactly what I do when
measuring CPU-bound code.
GC> Provided that a function takes much longer than the
GC> clock resolution we're using, it seems reasonable to imagine that the
GC> "true" timing is that minimum, and each observation is that "true"
GC> value plus some positive amount of noise, assuming that "noise" can
GC> never be negative--which seems open to question (what if the high-
GC> resolution timer is "noisy"?).
Sorry, I don't see how this can be open to question. In the very worst
case, the measurement might be off by one high-resolution clock tick, but
that is effectively zero at our time scales anyhow; and if you're measuring
anything where each cycle matters, the only choice is to use something like
Linux perf or the similar Intel/AMD tools on other OSs.
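To make the point concrete, here is a rough sketch (not lmi code; the choice
of std::chrono::high_resolution_clock is just an assumption) that estimates
the clock's granularity by taking the smallest observable difference between
consecutive readings. On any ordinary machine the result is on the order of
nanoseconds, i.e. negligible next to a 1-10 second measurement:

#include <chrono>
#include <iostream>

int main()
{
    using clock = std::chrono::high_resolution_clock;
    // Smallest nonzero difference seen between two consecutive readings.
    auto smallest = clock::duration::max();
    for(int i = 0; i < 1000000; ++i)
        {
        auto const t0 = clock::now();
        auto const t1 = clock::now();
        if(t0 < t1 && t1 - t0 < smallest)
            {
            smallest = t1 - t0;
            }
        }
    std::cout
        << std::chrono::duration_cast<std::chrono::nanoseconds>(smallest).count()
        << " ns\n";
    return 0;
}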
GC> Yet observe that, in the first of the five sets of results reported
GC> above, the minimum on the third line is almost as far from the
GC> long-term average as the mean or the median.
That's bad luck, and it just means that each measurement needs to be repeated
many times. I typically repeat it 20, or at least 10, times; otherwise I know,
from bad past experience, that the results can be misleading. That eventually
costs more time than rerunning the test, because believing that some
optimization resulted in a slowdown instead of a speedup, or vice versa, can
be really costly.
But this doesn't negate the fact that, for something that takes a reasonably
long time (1-10 seconds) to run, looking at the minimal duration over several
runs still seems to me the best way to measure the macro-performance of
CPU-bound code in practice.
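In case it helps, something along these lines is what I mean. It is only a
sketch, not lmi's actual timing code, and min_seconds and work are made-up
names:

#include <algorithm>
#include <chrono>
#include <iostream>
#include <limits>

// Run f() 'repetitions' times and return the minimum duration in seconds,
// on the theory that the minimum is the "true" timing and everything above
// it is noise.
template<typename F>
double min_seconds(F f, int repetitions)
{
    using clock = std::chrono::steady_clock;
    double minimum = std::numeric_limits<double>::max();
    for(int i = 0; i < repetitions; ++i)
        {
        auto const t0 = clock::now();
        f();
        auto const t1 = clock::now();
        minimum = std::min(minimum, std::chrono::duration<double>(t1 - t0).count());
        }
    return minimum;
}

int main()
{
    // 'work' is a hypothetical stand-in for the CPU-bound code under test.
    auto work = []
        {
        volatile double x = 0.0;
        for(long i = 0; i < 100000000L; ++i) {x = x + 1.0 / (i + 1);}
        };
    std::cout << min_seconds(work, 10) << " s (minimum of 10 runs)\n";
    return 0;
}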
Regards,
VZ