Re: [lmi] Class wx_table_generator
From: Vadim Zeitlin
Subject: Re: [lmi] Class wx_table_generator
Date: Tue, 1 May 2018 14:53:04 +0200
On Tue, 1 May 2018 02:28:07 +0000 Greg Chicares <address@hidden> wrote:
GC> I'll take care of everything. I now perceive the unity and conflict of
GC> opposites, so where now we're witnessing the transformation of quantitative
GC> into qualitative changes, soon the negation of the negation will manifest
GC> itself and all will be rendered clear to both of us. We have nothing to
GC> lose but our confusion.
I'm almost crying from revolutionary joyfulness after reading this (and
thinking about writing a blog post about refactoring being the opium of the
developers).
GC> > GC> But you use at() very often. Do you prefer at() to operator[] in all
GC> > GC> cases, because it replaces a segfault with orderly termination?
GC> >
GC> > Yes (except that an exception doesn't necessarily lead to termination,
GC> > although it might). I think the optimization of not checking the index
GC> > offered by operator[] can almost never be really justified.
GC> Interesting. I pushed a simple timing test:
GC> git push origin odd/vector-at-speed
GC> and here are the reformatted results for
GC> -O3, vs.
GC> -O0 with libstdc++ debug mode...
GC>
GC> /opt/lmi/src/lmi[0]$make unit_tests unit_test_targets=sandbox_test.exe
GC>
GC> 39 milliseconds with at()
GC> 12 milliseconds with []
GC> 12 milliseconds with ranged-for
It's a bit surprising that the compiler couldn't pull the index check out
of the loop even in this simplest possible case. But it also means that
this test is probably representative of the true overhead, in general, when
the loop is less trivial and so the compiler can't be expected to hoist the
check out.
But FWIW here are my results for a very similar benchmark (basically I
just put all the code in a single file, as it's more convenient to compile
one file when testing different compilers/options rather than using make)
and I can't reproduce the difference above. My numbers (in seconds, rounded
to centiseconds because the benchmark is not precise enough to use
milliseconds) are:
Compiler + options    []      at()    ranged-for
--------------------------------------------------
g++-7   -O0           0.25    0.32    0.28
g++-7   -O2           0.05    0.05    0.05
g++-7   -O3           0.04    0.04    0.04
g++-8   -O0           0.25    0.32    0.28
g++-8   -O2           0.05    0.05    0.04
g++-8   -O3           0.04    0.04    0.03
clang-7 -O0           0.26    0.32    0.29
clang-7 -O2           0.03    0.03    0.03
clang-7 -O3           0.03    0.03    0.03
Moreover, I looked at the generated code and it's identical between [] and
at() versions when using optimizations, so it's not surprising the times
are the same. Also, I have no idea what clang does because it generates
much more complex code than gcc, but it's clearly worth it as clang somehow
manages to outperform gcc even though the gcc code seems (clearly
erroneously) optimal to me...
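For readers who want to try this themselves, here is a minimal single-file sketch of such a benchmark. It is not the code from the odd/vector-at-speed branch nor the file used for the table above; the function names and the time_seconds() helper are hypothetical, and it assumes a test that, as described in this thread, just accumulates the elements.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Sum the elements using unchecked operator[].
double sum_index(std::vector<double> const& v)
{
    double z = 0.0;
    for(std::size_t j = 0; j < v.size(); ++j) z += v[j];
    return z;
}

// Sum the elements using bounds-checked at().
double sum_at(std::vector<double> const& v)
{
    double z = 0.0;
    for(std::size_t j = 0; j < v.size(); ++j) z += v.at(j);
    return z;
}

// Sum the elements using a ranged for loop (no indices at all).
double sum_ranged(std::vector<double> const& v)
{
    double z = 0.0;
    for(double x : v) z += x;
    return z;
}

// Hypothetical helper: wall-clock seconds taken by `repeats` calls of f(v).
// The volatile sink keeps the optimizer from discarding the calls.
template<typename F>
double time_seconds(F f, std::vector<double> const& v, int repeats)
{
    volatile double sink = 0.0;
    auto const t0 = std::chrono::steady_clock::now();
    for(int i = 0; i < repeats; ++i) sink = f(v);
    auto const t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling, e.g., time_seconds(sum_at, v, 1000) for each of the three functions with a large vector and comparing the results under different compilers and -O levels is the kind of comparison tabulated above; the absolute numbers will of course depend on the machine.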
GC> Optimized results: at() does take longer, which surprised me a little
GC> because I thought the compiler might optimize its overhead away. But
GC> the test does so many indexing operations that the time difference
GC> would probably be slight if we actually did anything interesting with
GC> the indexed elements--this test just accumulates them.
Yes. However, just to be a devil's advocate, I think the difference could
be much more dramatic if using operator[] allows auto-vectorization and
at() does not. However I couldn't find a test case for this and, rather
amazingly, it looks like the compiler is better at auto-vectorizing ranged
for loops than manual loops using either [] or at(). I.e. testing the
following, trivially vectorizable, loops:
for(unsigned j = 0; j < v.size(); ++j) v[j] *= 2;
for(unsigned j = 0; j < v.size(); ++j) v.at(j) *= 2;
for(auto& n: v) n *= 2;
and a loop using raw pointer:
for(unsigned j = 0; j < n; ++j) data[j] *= 2;
I get the following results:
Compiler + options    []      at()    ranged-for    raw
------------------------------------------------------------
g++-8   -O2           0.08    0.07    0.05          0.06
g++-8   -O3           0.07    0.06    0.05          0.04
clang-7 -O2           0.04    0.04    0.04          0.03
clang-7 -O3           0.04    0.04    0.03          0.03
This is probably a bad benchmark as the numbers are too volatile but,
still, there is no doubt that using at() is _not_ slower than using
operator[] when vectorization is used and that using ranged for can somehow
be faster than both of them. Clang results are also nice as there is almost no
abstraction penalty, while with gcc there is still a noticeable difference
between a loop using the raw pointer and the other ones at -O3,
unfortunately.
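The four loops above can be wrapped into self-contained functions for anyone wishing to repeat the comparison. This is only a sketch: the function names are hypothetical, the element type is assumed to be int, and std::size_t replaces the unsigned index of the snippets above.

```cpp
#include <cstddef>
#include <vector>

// Manual loop with unchecked operator[].
void double_index(std::vector<int>& v)
{
    for(std::size_t j = 0; j < v.size(); ++j) v[j] *= 2;
}

// Manual loop with bounds-checked at().
void double_at(std::vector<int>& v)
{
    for(std::size_t j = 0; j < v.size(); ++j) v.at(j) *= 2;
}

// Ranged for loop: no index to get wrong.
void double_ranged(std::vector<int>& v)
{
    for(auto& n : v) n *= 2;
}

// Raw pointer and explicit length, as a baseline without abstraction.
void double_raw(int* data, std::size_t n)
{
    for(std::size_t j = 0; j < n; ++j) data[j] *= 2;
}
```

All four produce the same result, so any timing difference between them comes only from how well the compiler optimizes each form. Whether a given loop was actually vectorized can be checked by inspecting the generated assembly.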
GC> But I won't object to your practice in code you've written, which
GC> generally does complex graphical operations where at()'s cost is
GC> relatively negligible.
I'd say that, more generally, at() overhead is negligible when it's called
once, and not inside the loop. And loops should use operator[] when it's
clear that the index is always valid or, ideally, ranged-for, to avoid
worrying about the indices completely.
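As an illustration of this guideline, here is a minimal sketch (the table_t alias and both function names are hypothetical, not taken from lmi): at() is called once, outside the loop, the loop itself is a ranged for, and the resulting exception is caught rather than terminating the program.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using table_t = std::vector<std::vector<double>>;

// Checked access happens once, outside the loop: at() throws
// std::out_of_range if `row` is invalid. The loop needs no indices.
void scale_row(table_t& table, std::size_t row, double factor)
{
    std::vector<double>& r = table.at(row);
    for(double& x : r) x *= factor;
}

// The exception need not lead to termination: here it is caught and
// reported to the caller as a failure instead.
bool try_scale_row(table_t& table, std::size_t row, double factor)
{
    try
    {
        scale_row(table, row, factor);
        return true;
    }
    catch(std::out_of_range const&)
    {
        return false;
    }
}
```

With a two-row table, try_scale_row(table, 1, 2.0) scales the second row and returns true, while try_scale_row(table, 5, 2.0) returns false and leaves the table untouched.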
Anyhow, considering the results above, I'm afraid the conclusion is the
same unexciting one as always: when you have performance-sensitive code,
you need to benchmark it to find out whether it's really worth it to use
operator[] or not. But, having said this, ranged for seems to be almost as
fast without optimizations and as fast if not faster with them, so IMO it
should still be preferred whenever it can be used because the code
using it is much clearer and less error-prone.
Regards,
VZ