Re: [lmi] Parallel blues
From: Greg Chicares
Subject: Re: [lmi] Parallel blues
Date: Sun, 24 Jul 2016 15:11:14 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.8.0
On 2016-07-18 19:55, Vadim Zeitlin wrote:
>
[...problems parallelizing test_coding_rules...]
>
> First of them, already seen with MSVC build, is that even though I used a
> mutex to serialize the error messages logged to standard output and error
> streams, the relative order of these messages is not defined any more and
> so test_coding_rules.sh is broken by this change. This could be fixed by
> storing all error messages in a map indexed by the order of the
> corresponding file on the command line, but I'd rather like to propose to
> change test_coding_rules.sh to sort both the expected and observed output
> because it doesn't actually matter in which order it's given, I think, and
> it doesn't seem worth complicating the code just to generate it in the same
> order.
It emits a stream of diagnostics, which have a structure that isn't
quite trivial to parse when different diagnostics are interleaved, e.g.:
  File 'eraseme_log_002.Log' violates seventy-character limit:
  0000000001111111111222222222233333333334444444444555555555566666666667
  1234567890123456789012345678901234567890123456789012345678901234567890
  This line's length is slightly over the limit, so it must be diagnosed.
If we pass the test results through 'sort', and observe one more
'1234567890...' line than expected, then it's hard to guess where it
came from, because the last three lines above don't indicate which file
they pertain to.
However, most diagnostics take a single line and mention the file name:
  Exception--file 'a_nonexistent_file': File not found.
  File 'an_expungible_file.bak' ignored as being expungible.
and the order of lines like that doesn't matter. Perhaps we should just
do something a little different for the special case above, such as
emitting the extra three lines only if a '--verbose' option is specified.
I don't know whether there are other special cases like this, but they
shouldn't be very difficult obstacles to overcome.
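For comparison, the alternative mentioned in the quoted text (keeping each file's messages indexed by its position on the command line) might look roughly like the sketch below. The class name and its interface are hypothetical, not part of lmi, and a vector sized to the number of files stands in for the map. Each worker records under a mutex; output is produced only after all workers have finished, so it follows command-line order no matter which file completes first.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical helper, not part of lmi: buffers diagnostics per file so
// that they can be emitted in command-line order.
class ordered_diagnostics
{
  public:
    explicit ordered_diagnostics(std::size_t file_count)
        :messages_(file_count)
    {}

    // May be called from several threads. Text recorded for the same
    // file is appended, so a multi-line diagnostic stays together.
    void record(std::size_t file_index, std::string const& text)
    {
        std::lock_guard<std::mutex> guard(mutex_);
        assert(file_index < messages_.size());
        messages_[file_index] += text;
    }

    // Call after all workers have finished: concatenates the buffered
    // text in file order, regardless of the order of completion.
    std::string str() const
    {
        std::lock_guard<std::mutex> guard(mutex_);
        std::string result;
        for(auto const& m : messages_)
            {
            result += m;
            }
        return result;
    }

  private:
    mutable std::mutex mutex_;
    std::vector<std::string> messages_;
};
```

The cost is that nothing is printed until the end of the run, and that the checking code has to carry a file index around, which is the extra complication the quoted text wanted to avoid.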
> What was really bad is that I discovered that the compiler we use doesn't
> support C++11 thread library *at all*. It comes with the required headers,
> such as <thread>, <mutex>, ... but they don't actually define std::thread,
> std::mutex etc.
A Potemkin village. Strike one.
> So to use the code I've written we would need to switch to using the POSIX
> threads version and also distribute the POSIX threads emulation library
> libwinpthread-1.dll used by it to implement its (inefficient, but better
> than nothing)
Strike two.
> https://github.com/meganz/mingw-std-threads. As explained there, this
> provides a simple header-only implementation of C++11 threading reusing
> some of the classes provided by libstdc++. This is clearly a hack
Three strikes.
I feel like we're in the bad old days when C++ compilers provided either
no STL or a wretched one, so we used the ObjectSpace or Rogue Wave
implementation, hacking it as necessary to overcome incompatibilities
between it and our compilers. These newly-standardized libraries don't
seem to be ready for production, at least not with MinGW-w64. Maybe other
libraries are production-ready, but not <regex> and obviously not <thread>.
> For completeness, I'd also like to mention that the decision to not go
> with TDM-GCC seems more and more regrettable retroactively as this compiler
> does support C++11 threading out of the box (and even links libwinpthread-1
> statically into its libstdc++, so saving the bother with distributing it)
> and just generally seems to be better thought through. Basically
> whenever I am banging my head against the wall crying "What could MinGW-w64
> developers be possibly thinking?", I discover that the TDM-GCC maintainer
> has made a different, and better, choice, and it's a rather good sign,
> isn't it?
We could, of course, build TDM-GCC ourselves. Just an idea for the future
(and not until we've moved all development to GNU/Linux).
> Anyhow, back to the unexpected problems, knowing that I use POSIX compiler
> version when cross-compiling now and the mingw-std-threads hack when using
> the official build system.
>
> The next discovery is that Boost.Regex is not thread-safe when compiled
> with gcc.
That would be strike four. There is no strike four.
And even if we look past that, the speedup is not as dramatic as we
might have hoped:
> Threads Time (s)
> ----------------
> 1 27.7
> 2 16.4
> 3 11.5
[...]
> 12 8.8
This is a dead end. Threading turns out not to be a silver bullet.
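To put numbers on the quoted timings, here is a minimal sketch of the usual speedup and parallel-efficiency arithmetic. The function names are hypothetical; the only inputs are the times in the table above.

```cpp
#include <cassert>

// Speedup relative to the single-threaded run.
double speedup(double serial_seconds, double parallel_seconds)
{
    assert(0.0 < serial_seconds && 0.0 < parallel_seconds);
    return serial_seconds / parallel_seconds;
}

// Fraction of ideal linear scaling actually achieved: 1.0 would mean
// that N threads run exactly N times as fast as one thread.
double efficiency(double serial_seconds, double parallel_seconds, int threads)
{
    assert(0 < threads);
    return speedup(serial_seconds, parallel_seconds) / threads;
}
```

With the figures above, speedup(27.7, 16.4) is about 1.69 for two threads (roughly 84% efficiency), while speedup(27.7, 8.8) is only about 3.15 for twelve threads (roughly 26% efficiency).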
If we want to make this faster, the most promising way is to avoid
checking files that have already been checked, perhaps using the
same method as the 'check_physical_closure' target...but running
that target when it has nothing to do still takes four seconds:
  /lmi/src/lmi[0]$time make $coefficiency check_physical_closure
  make[1]: Nothing to be done for 'check_physical_closure'.
  make $coefficiency check_physical_closure 1.74s user 2.83s system 126% cpu 3.595 total
so at best that would roughly triple the speed of 'check_concinnity'.
It would also cause hundreds of sentinel files to be created, which
would make it more cumbersome to use in other directories (notably,
those that hold proprietary product files).
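For concreteness, the sentinel idea amounts to a make-style timestamp comparison, sketched below. This is only an illustration under stated assumptions: the function names and the sentinel naming are hypothetical, the actual 'check_physical_closure' rule is not reproduced here, and stat() is used because both GNU/Linux and MinGW provide it.

```cpp
#include <sys/stat.h>

#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper, not part of lmi: returns true if 'source' must be
// checked, i.e. if its sentinel is missing or older than the source.
bool needs_check(std::string const& source, std::string const& sentinel)
{
    assert(!source.empty() && !sentinel.empty());
    struct stat source_info;
    struct stat sentinel_info;
    if(0 != stat(source.c_str(), &source_info))
        {
        return true; // Source unreadable: let the checker diagnose it.
        }
    if(0 != stat(sentinel.c_str(), &sentinel_info))
        {
        return true; // No sentinel: never checked before.
        }
    // As with make, equal timestamps count as up to date. Note that
    // st_mtime has only one-second granularity.
    return sentinel_info.st_mtime < source_info.st_mtime;
}

// Hypothetical helper: (re)writes the sentinel after a successful check,
// which updates its modification time.
void mark_checked(std::string const& sentinel)
{
    std::ofstream os(sentinel.c_str(), std::ios::out | std::ios::trunc);
    os << '\n';
}
```

Every checked file needs its own sentinel for this to work, which is exactly where the hundreds of extra files come from.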
I think we should just put this back on the shelf and reconsider it
when the new standard libraries have matured.