[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [lmi] [PATCH] Fix build with g++5
From: |
Greg Chicares |
Subject: |
Re: [lmi] [PATCH] Fix build with g++5 |
Date: |
Fri, 18 Sep 2015 21:34:34 +0000 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.3.0 |
On 2015-09-18 02:16, Vadim Zeitlin wrote:
> On Thu, 17 Sep 2015 22:20:03 +0000 Greg Chicares <address@hidden> wrote:
[...'const' follows the type it modifies, so 'volatile' should too, globally...]
> But I can live with local consistency as well. OTOH it would be trivial to
> just exchange the order of "volatile" and "T" here and in a few other
> places to achieve uniform consistency...
Sure, but let's wait until after the Halloween release.
> GC> (2) the union trick prevents such optimizations, whereas the compiler
> GC> can recognize memcpy() and replace it with an optimization-friendly
> GC> intrinsic.
>
> Just to confirm: gcc5 does inline memcpy() with "-O2". For example, in
> md5_finish_ctx() the first (existing) memcpy() function call remains in the
> optimized version but the 2 my patch adds appear as just a single "mov"
> instruction.
I was curious about that, but didn't have the time to look into it,
so thanks for sharing this analysis.
> GC> [...] this begins to make the code a little harder
> GC> to understand, and causes me to think we might be better off writing an
> GC> inline (for optimizability) memcpy_cast() function.
>
> I tried to keep things as simple as possible but yes, we surely could have
> such a function and we could even have 2 template parameters for it to
> optionally to return a larger type, e.g. to allow writing
>
> int const index_table_number = memcpy_cast<int32_t, int>(index_record);
>
> Should I make a patch for this?
Yes, please. I'm in a quandary. The original reinterpret_cast lines seem
much clearer to me than the memcpy() replacements; but if their behavior
is undefined, then they have to change. I'm hoping that memcpy_cast<>()
might preserve code clarity while slaying the UB demon.
> GC> We may wish we had done that if g++-6 or -7 gets more aggressive about
> GC> these warnings and forces us to make lots more changes elsewhere.
>
> I don't think there are any other places where we would need this.
I didn't think there were formerly any aliasing problems. Generally, I
expect one surprise to foreshadow more surprises to come. But no matter
how different our personal outlooks, we're scientists, so maybe we can
agree if we can find some data.
To that end, I tried building with
tutelary_flag='-Wstrict-aliasing=1'
(which is appended to any builtin default CFLAGS or CXXFLAGS), and found no
further issues. However, by happy accident, I had first tried
tutelary_flag='-Wstrict-aliasing=n'
and that worked just as smoothly, so I'm guessing that gcc-3.4.5 doesn't
offer different levels for this warning--as the manual seems to confirm:
https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Warning-Options.html#index-Wstrict_002daliasing-227
Perhaps you could try -Wstrict-aliasing=1 or -Wstrict-aliasing=2 and
let me know what you see. AIUI, LLVM doesn't make any attempt to diagnose
aliasing problems. In a quick search, I found no other tool for this.
> GC>
> GC> for(int j = 0; j < number_of_values; ++j)
> GC> {
> GC> is.read(z, sizeof(double));
> GC> - data_[j] = *reinterpret_cast<double*>(z);
> GC> + double d;
> GC> + std::memcpy(&d, z, sizeof(double));
> GC> + data_[j] = d;
> GC> }
> GC>
> GC> I'm sure you don't like that any more than I do. Merely reading the code
> GC> can't be enough to make me feel comfortable accepting it, because this is
> GC> a time-critical loop, and creating a temporary variable just looks like it
> GC> might slow the system down...so let's measure it:
>
> Sorry, I should have mentioned that I did compare the benchmark before and
> after my changes and didn't see any statistically significant differences.
>
> I also didn't realize this loop was time-critical. If it is, I think it
> would be much faster to just read all doubles at once instead of doing it
> one by one. Would you be interested in making the measurements showing it
> it's worth doing this or not?
Okay--results below.
> GC> But still I balk at introducing that temporary. Would something like
> GC> this example (which I even tested, with g++-3.4.5 only):
> GC>
> GC> - char z[sizeof(double)];
> GC> for(int j = 0; j < number_of_values; ++j)
> GC> {
> GC> - is.read(z, sizeof(double));
> GC> - data_[j] = *reinterpret_cast<double*>(z);
> GC> + is.read(reinterpret_cast<char*>(&data_[j]), sizeof(double));
> GC> }
> GC>
> GC> be acceptable to g++-5?
>
> Yes, it would. But if you're willing to write code like this, then IMHO we
> should just directly go to
>
> is.read(static_cast<char*>(static_cast<void*>(&data_[0])),
> number_of_values * sizeof(double));
>
> and remove the loop completely.
Using the same testing command as in my preceding email, here I'll report
only the timings for five trials:
1386664 1383332 1390109 1393764 1383697 median 1386664 HEAD
1389036 1500248 1403038 1406045 1408138 median 1406045 my long read()
statement above
1369771 1322875 1298282 1311516 1292876 median 1311516 your loopless read()
As expected, the loopless read() wins. But was I speaking carelessly when
I said this loop is time critical? I'm sure, from past measurements, that
making this code much slower creates a bottleneck that end users can feel.
But let's test overall system speed in a meaningful way:
for z in a b c d e; do make $coefficiency cli_tests 2>/dev/null | grep
iterations; done
condensing the results as above:
24378031 24497276 24303548 24760823 24346506 median 24378031 my long
read() statement above
24375800 24426109 25250134 26510577 24376345 median 24426109 your loopless
read()
The approach that performed best in the unit test, performed worse in the
overall test. But the differences are slight: we're just measuring noise.
Conclusion: yes, I did misspeak. Thanks for pointing out the loopless read()
technique--I wouldn't have though of leveraging std::vector's contiguous-
storage guarantee that way--but I don't think the gain in performance is
worth the cost in lost clarity. Neither is my long read() statement worth
the obscurity it adds, if memcpy_cast<>() can rescue HEAD from UB without
harming clarity.
- [lmi] [PATCH] Fix build with g++5, Vadim Zeitlin, 2015/09/15
- Re: [lmi] [PATCH] Fix build with g++5, Greg Chicares, 2015/09/17
- Re: [lmi] [PATCH] Fix build with g++5, Vadim Zeitlin, 2015/09/17
- Re: [lmi] [PATCH] Fix build with g++5,
Greg Chicares <=
- Re: [lmi] Working with and around strict aliasing (was: Fix build with g++5), Vadim Zeitlin, 2015/09/18
- Re: [lmi] Working with and around strict aliasing, Greg Chicares, 2015/09/19
- Re: [lmi] Working with and around strict aliasing, Vadim Zeitlin, 2015/09/19
- Re: [lmi] Working with and around strict aliasing, Greg Chicares, 2015/09/20
- Re: [lmi] Working with and around strict aliasing, Vadim Zeitlin, 2015/09/20