Re: [lmi] Performance of bourn_cast


From: Vadim Zeitlin
Subject: Re: [lmi] Performance of bourn_cast
Date: Sat, 18 Mar 2017 17:39:07 +0100

On Sat, 18 Mar 2017 16:17:01 +0000 Greg Chicares <address@hidden> wrote:

GC> Here are optimized and unoptimized speed measurements for the new cast
GC> added in commit 5e36b0a6bc9fed0788b6f74a9896387c8a729cb4:

 It's going to take me some time to get used to the name of the new cast, as
it's not a word I use daily (and it doesn't help that it's much more common
in French, but spelt differently there). It's probably not a very useful
question, but why couldn't it be called lossless_cast<> or something like
that instead (unfortunately my preferred name, "exact_cast", is already
taken by something else)?

GC> Reverting the patch above, let's try everything we can think of to
GC> hand-optimize bourn_cast:
GC> 
GC> ---------8<--------8<--------8<--------8<--------8<--------8<--------8<-------
GC> diff --git a/bourn_cast.hpp b/bourn_cast.hpp
GC> index 87b61bb..6d4dffe 100644
GC> --- a/bourn_cast.hpp
GC> +++ b/bourn_cast.hpp
GC> @@ -80,12 +80,18 @@ To bourn_cast(From from)
GC>      using from_traits = std::numeric_limits<From>;
GC>      static_assert(  to_traits::is_specialized, "");
GC>      static_assert(from_traits::is_specialized, "");
GC> +    static constexpr bool to_is_unsigned = !to_traits::is_signed;
GC> +    static constexpr bool from_is_signed = from_traits::is_signed;
GC> +    static constexpr To lower_bound = to_traits::lowest();
GC> +    static constexpr To upper_bound = to_traits::max();
GC> +    static constexpr bool must_test_lower = from_traits::lowest() < lower_bound;
GC> +    static constexpr bool must_test_upper = upper_bound < from_traits::max();
GC>  
GC> -    if(! to_traits::is_signed && from < 0)
GC> +    if(to_is_unsigned && from_is_signed && from < 0)
GC>          throw std::runtime_error("Cast would convert negative to unsigned.");
GC> -    if(from_traits::is_signed && from < to_traits::lowest())
GC> +    if(from_is_signed && must_test_lower && from < lower_bound)
GC>          throw std::runtime_error("Cast would transgress lower limit.");
GC> -    if(to_traits::max() < from)
GC> +    if(must_test_upper && upper_bound < from)
GC>          throw std::runtime_error("Cast would transgress upper limit.");
GC>      return static_cast<To>(from);
GC>  #   if defined __GNUC__
GC> --------->8-------->8-------->8-------->8-------->8-------->8-------->8-------
...
GC> Are any of the hand optimizations above worth the complexity they add?
GC> Here's a comparison where each line is the median of five runs, both
GC> using the astonishing speedup technique:
GC> 
GC> '-O2', "hand optimized" code:
GC>   direct: 4.184e-004 s =     418430 ns, mean of 100 iterations
GC>   S to U: 4.214e-004 s =     421380 ns, mean of 100 iterations
GC>   U to S: 4.920e-004 s =     491967 ns, mean of 100 iterations
GC> 
GC> '-O2', original commit:
GC> 
GC>   direct: 4.190e-004 s =     418974 ns, mean of 100 iterations
GC>   S to U: 4.208e-004 s =     420830 ns, mean of 100 iterations
GC>   U to S: 5.151e-004 s =     515096 ns, mean of 100 iterations
GC> 
GC> There does seem to be a slight improvement. I wouldn't suppose that
GC> caching numeric_limits values in static constexpr statements has any
GC> benefit. Perhaps 'must_test_lower' and 'must_test_upper' actually
GC> do help,

 AFAICS they can help only when they are false, i.e. when the corresponding
checks are unnecessary: in that case the compiler really ought to be able to
discard all the code, including both the test and the (dead) throw following
it. But OTOH I don't think we are actually going to use bourn_cast<> in any
case where both of them are false, so there will still be some overhead in
this function and, considering how little difference this optimization makes
in practice, I'm not sure it's worth it.

GC> though the timing differences are so small that they don't
GC> really prove anything; yet they might make a measurable difference in
GC> some case other than those tested. Vadim, what do you think?

 I'd expect this to make a difference when bourn-casting from int to long,
for example, but we're probably never going to do this anyhow, so the point
is moot.
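
 To make the int-to-long case concrete, here is a minimal sketch that
evaluates the two flags from the patch for a given pair of types. The
range_checks<> helper is hypothetical (it is not part of lmi) and only
mirrors the two definitions in the diff; I'm assuming both types have the
same signedness, as mixed-sign comparisons would need more care.

```cpp
#include <limits>

// Hypothetical helper, not part of lmi: exposes the two compile-time
// flags from the patch so they can be inspected for a pair of types.
// Assumes From and To have the same signedness.
template<typename To, typename From>
struct range_checks
{
    using   to_traits = std::numeric_limits<To>;
    using from_traits = std::numeric_limits<From>;
    static constexpr bool must_test_lower =
        from_traits::lowest() < to_traits::lowest();
    static constexpr bool must_test_upper =
        to_traits::max() < from_traits::max();
};

// Widening (int to long): every int fits in a long, so both flags are
// false and a test such as "must_test_upper && upper_bound < from" is
// a compile-time false that the optimizer can drop, throw included.
static_assert(!range_checks<long, int>::must_test_lower, "");
static_assert(!range_checks<long, int>::must_test_upper, "");

// Narrowing (long long to signed char): both flags are true, so both
// run-time tests have to stay.
static_assert(range_checks<signed char, long long>::must_test_lower, "");
static_assert(range_checks<signed char, long long>::must_test_upper, "");
```

 Whether the compiler really removes the dead branches is something that
would have to be checked in the generated code; the sketch only shows that
the conditions are known at compile time.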

GC> SPOILER: The amazing secret? Add 'inline'. I had supposed that these
GC> days that keyword is mainly useful for avoiding ODR violations, and
GC> that the compiler would perform inlining whenever it would help,
GC> without any hint from me--so I was surprised.

 Yes, I admit I was too. But looking at the function, it does seem quite
complicated, with all these multiple "if"s, so I guess it's understandable
that it's not obvious to the compiler that it ought to be inlined.

 BTW, have you tried making the function constexpr? Regardless of the
runtime performance, this would seem to be the right thing to do (although,
admittedly, it risks being a little painful in C++11...).
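
 For illustration, here is roughly what I have in mind: a sketch only, using
the three tests of the original commit as quoted in the diff above. The name
checked_cast is hypothetical, chosen to avoid any confusion with the real
bourn_cast, and I'm assuming integer types. The pain in C++11 is that a
constexpr function body must be a single return statement, so the "if"s have
to become a chain of conditional operators with throw-expressions.

```cpp
#include <limits>
#include <stdexcept>

// Hypothetical C++11 constexpr variant of the original three checks;
// not lmi's actual implementation. A throw-expression is allowed as one
// operand of ?: and makes the call a non-constant expression only if
// that branch is actually taken.
template<typename To, typename From>
constexpr To checked_cast(From from)
{
    using   to_traits = std::numeric_limits<To>;
    using from_traits = std::numeric_limits<From>;
    static_assert(  to_traits::is_specialized, "");
    static_assert(from_traits::is_specialized, "");
    return
          (!to_traits::is_signed && from < 0)
            ? throw std::runtime_error("Cast would convert negative to unsigned.")
        : (from_traits::is_signed && from < to_traits::lowest())
            ? throw std::runtime_error("Cast would transgress lower limit.")
        : (to_traits::max() < from)
            ? throw std::runtime_error("Cast would transgress upper limit.")
        : static_cast<To>(from);
}

// In-range values can now be converted at compile time.
static_assert(checked_cast<signed char>(100) == 100, "");
```

 An out-of-range argument in a constant expression then fails to compile,
while the same call at run time throws as before.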

VZ

