[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [lmi] Parallel blues
From: |
Greg Chicares |
Subject: |
Re: [lmi] Parallel blues |
Date: |
Tue, 20 Jun 2017 23:10:03 +0000 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 |
On 2017-06-20 17:09, Vadim Zeitlin wrote:
> On Tue, 20 Jun 2017 16:01:08 +0000 Greg Chicares <address@hidden> wrote:
>
> GC> On 2016-07-18 19:55, Vadim Zeitlin wrote:
> GC> >
> GC> > So I tried to parallelize test_coding_rules. As expected, doing it
> turned
> GC> > out to be very simple and it took me about 30 minutes to write the code
> GC>
> GC> I'd like to do something about calendar_date_test, because it takes
> GC> eleven seconds to run,
>
> This surprised me, as I've never noticed it taking so long. So I've just
> rerun it on my (6+ years old) i7-2600 Linux box and it took 3.4 seconds
> here. I might have an answer explaining why is it so much slower for you,
> even though your machine is supposed to be much faster than mine, below...
At first I thought you must be using a different compiler. But I
happen to have a native ELF version already built:
file ../build/lmi/Linux/gcc/native/calendar_date_test
../build/lmi/Linux/gcc/native/calendar_date_test: ELF 64-bit LSB executable,
x86-64, version 1 (GNU/Linux), dynamically linked, interpreter
/lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32,
BuildID[sha1]=49585e26624344a5864797b78faa16f9edbd9e91, not stripped
time ../build/lmi/Linux/gcc/native/calendar_date_test -a
...
.... 109927 tests succeeded
!!!! no errors detected
../build/lmi/Linux/gcc/native/calendar_date_test -a 3.96s user 0.02s system
99% cpu 3.985 total
...so I guess the timing you report might be for gcc after all.
I'm not sure my machine is at all faster. I have two "E5-2630 v3"
CPUs, but for this measurement it wouldn't matter if I had 2^10
CPUs because we're comparing single-core performance--for which
this page:
http://cpuboss.com/cpus/Intel-Xeon-E5-2630-v3-vs-Intel-Core-i7-2600
rates your machine 9.6, and mine 6.7 .
> GC> while no other i686 lmi unit test takes more
> GC> than six seconds...and, since I run them in parallel, if I could run
> GC> this one test in half the time, then I could run the whole unit-test
> GC> suite in half the time. I'm considering two approaches, and it seems
> GC> like one must be far better than the other, but I'm not sure which
> GC> is which, and hope you have a clearer opinion than I.
>
> If there is one thing I've never lacked, it's clear, strong (and some
> would say mistaken) opinions about a lot of things. And in this case I
> definitely have one: I prefer (1). But see below.
Thanks. Option (2) seemed so 1990s.
> GC> This would be my first attempt at writing multithreaded code
> GC> myself, so it has great educational value.
>
> I started writing that I was afraid it was too simple for this and even
> wrote some trivial code parallelizing this test, here it is with "-w" diff
> option to suppress changes in indentation that would normally go with it:
[...quoting just one part to highlight how easy it is...]
> + std::thread t2([]()
> + {
> TestBirthdateLimitsExhaustively(oe_age_last_birthday);
> + });
> As you can see, the code is indeed trivial and it even compiled from the
> first try.
That's very cool. Thanks.
> Unfortunately, it doesn't work and it took me slightly longer to find out
> why does it crash when I run it. It turns out, that test code calls
> {min,max}imum_birthdate(), which calls birthdate_limit::operator(), which
> calls decimal_root() which outputs some (debugging?) messages to the
> provided iteration_stream which is by default just null_stream(). And
> null_stream() returns a reference to a static object, which ends up being
> used from multiple threads at once -- hence the crash. I'm not sure what
> would be the best way to fix it properly, for now I've simply commented out
> the use of this stream. And I also had to apply another fix to suppress the
> last error from thread sanitizer (how did we live without it before?):
> ---------------------------------- >8 --------------------------------------
> diff --git a/test_main.cpp b/test_main.cpp
> index dca5894b3..79e4cc5b4 100644
> --- a/test_main.cpp
> +++ b/test_main.cpp
> @@ -63,6 +63,7 @@
> #include "miscellany.hpp" // stifle_warning_for_unused_value()
> #include "test_tools.hpp"
>
> +#include <atomic>
> #include <iostream>
> #include <ostream>
> #include <regex>
> @@ -75,8 +76,8 @@
> {
> namespace test
> {
> - int test_tools_errors = 0; // Count of errors detected.
> - int test_tools_successes = 0; // Count of successful tests.
> + std::atomic<int> test_tools_errors{0}; // Count of errors detected.
> + std::atomic<int> test_tools_successes{0}; // Count of successful tests.
>
> class test_tools_exception : public std::runtime_error
> {
> ---------------------------------- >8 --------------------------------------
> which, I think, is pretty self-explanatory: if we run tests from more than
> one thread, we can't increment test_tools_xxx variables if they're plain
> ints, so we need to either use a mutex around them or, simpler and more
> efficiently, make them atomic.
[std::atomic discussed below.] You've diagnosed the real problem:
> After making all these changes, I could finally run the test and it did
> run much faster: 0.7 seconds only. However, it was a bit too early to
> declare victory because I didn't forget that I disabled the use of
> iteration_stream. So I've retried running the test *without* threading, but
> also without iteration_stream and the result was 0.9 seconds. Hence it
> seems that most of the time spent by this test is actually wasted writing
> to this useless stream and, perhaps, it's even worse with MinGW standard
> library implementation, which could explain why it takes so long for you.
gcc-4.9.x, i686, HEAD:
time make $coefficiency unit_tests 2>&1 | tee >(grep '\*\*\*') >(grep '????')
>(grep '!!!!' --count | xargs printf "%d tests succeeded\n") >../log
63 tests succeeded
make $coefficiency unit_tests 2>&1 34.57s user 5.76s system 357% cpu 11.293
total
[command repeated]
make $coefficiency unit_tests 2>&1 34.59s user 5.75s system 356% cpu 11.300
total
same, but with iteration_stream #ifdef'd out in 'calendar_date_test.cpp', three
runs:
make $coefficiency unit_tests 2>&1 27.88s user 5.84s system 484% cpu 6.956
total
make $coefficiency unit_tests 2>&1 27.19s user 5.41s system 529% cpu 6.159
total
make $coefficiency unit_tests 2>&1 26.89s user 5.79s system 507% cpu 6.444
total
...and now each TestBirthdateLimitsExhaustively() call takes about
one-sixth of a second instead of two and a half seconds, so that
function runs fifteen times faster.
> Finally, my recommendation would be to test whether the test runs
> acceptably fast for you without tracing. Maybe it does, and then you don't
> need to bother with threading at all. Also, if null_stream() is really the
> reason for the problem, it would seem better to get rid of it entirely: it
> could be a relatively nice abstraction, but its cost is too high compared
> to the low-tech solution of using "std::ostream* s = nullptr".
I really do want to preserve the possibility of writing "idiosyncrasyT"
in the "Comments" input field in order to see decimal_root()'s iterands.
But I don't want to pay this much for it. I'll find a way that doesn't
involve pointers.
> To summarize, this was quite educational, after all, although for
> completely different reasons -- and I don't think you can extract any more
> educational value from it from the point of view of writing MT code in
> C++11. What would still be very educational and also useful, would be to
> have a testing framework that would run different tests in parallel
> automatically. Unfortunately, neither (the real) Boost.Test nor my personal
> favourite CATCH support this
With a name like that, I thought I'd never find it, but here it is:
https://github.com/philsquared/Catch
At a casual glance, it looks good, though I'm not looking for another
unit-testing framework at the moment.
> and lmi testing framework is too threadbare to
> have any chance of implementing it, because it doesn't even support
> registering tests automatically. If we were serious about speeding up the
> tests, I'd strongly consider implementing test registration and running
> them in parallel using a thread pool. But it would be a big project with
> relatively modest benefits, so I don't think we're actually going to do
> this in the observable future.
Agreed.
> GC> (2) Split calendar_date_test into several separate tests
[...]
> To return to my initially held opinion: (2) is ugly and I'd avoid it if
> possible. (1) is not ideal, but perfectly acceptable, IMO, and fixing
> MT-safety problems uncovered by it is good hygiene anyhow.
Okay, thanks, adding std::atomic to 'test_main.cpp' can't hurt; and,
even if it did, it wouldn't affect the production system.
- Re: [lmi] Parallel blues, Greg Chicares, 2017/06/20
- Re: [lmi] Parallel blues, Vadim Zeitlin, 2017/06/20
- Re: [lmi] Parallel blues,
Greg Chicares <=
- Re: [lmi] Parallel blues, Vadim Zeitlin, 2017/06/20
- Re: [lmi] Parallel blues, Greg Chicares, 2017/06/21
- Re: [lmi] Parallel blues, Vadim Zeitlin, 2017/06/21
- Re: [lmi] Parallel blues, Greg Chicares, 2017/06/22
- Re: [lmi] Parallel blues, Vadim Zeitlin, 2017/06/22
- Re: [lmi] Parallel blues, Greg Chicares, 2017/06/27
- Re: [lmi] Parallel blues, Vadim Zeitlin, 2017/06/27