lmi

Re: [lmi] MinGW-w64 anomaly?


From: Greg Chicares
Subject: Re: [lmi] MinGW-w64 anomaly?
Date: Wed, 21 Dec 2016 22:49:50 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.4.0

On 2016-12-21 17:33, Vadim Zeitlin wrote:
> On Wed, 21 Dec 2016 16:37:09 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> On 2016-12-21 14:23, Vadim Zeitlin wrote:
> GC> > On Wed, 21 Dec 2016 02:09:09 +0000 Greg Chicares <address@hidden> wrote:
> GC> [...SSE...]
> GC> > GC> I'd like to measure the effect on our 32-bit msw production release.
> GC> > GC> How might I do that?
> GC> > 
> GC> >  I think the first step would be for me to finish my changes making things
> GC> > work with SSE (or, rather, with any standard-compliant C++11
> GC> > implementation) by getting rid of all x87-specific code "properly".
> GC> 
> GC> How about conditionalizing it instead of "getting rid of" it?
> 
>  In the long term, I don't think there is any benefit in keeping both
> versions. The standard doesn't cover all the functionality of fenv_lmi.hpp
> but fenv_precision() is not really used anywhere (only in the tests) and

The precision setting is implicitly relied on throughout lmi, except
where it's perturbed deliberately and temporarily for testing.

The problem is that the C99 drafters omitted precision control--as
lmi's 'fenv_lmi_x86.hpp' says:

///   1999: C finally gets fesetenv(), but without precision control *
...
/// * "without precision control"
/// According to Goldberg:
///   http://docs.sun.com/source/806-3568/ncg_goldberg.html#4130
/// "fegetprec and fesetprec functions" were recommended in early
/// drafts, but "this recommendation was removed before the changes
/// were made to the C99 standard." The Rationale:
///   www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf
/// suggests that IEC 60559 is "ambivalent" as to whether precision
/// control must be dynamic. Yet IEEE 754r Draft 1.2.5 [G.2] says:
/// "changing the rounding direction or precision during execution may
/// help identify subprograms that are unusually sensitive to roundoff"

I want to set the precision bits to the maximum and make sure they
stay set that way. But std::fesetenv() is not guaranteed to do that.
C99 (and thus C++11) gives us no way of knowing what precision is
specified by FE_DFL_ENV, or whether it's the same across compilers.
I would bet that free-world compilers use the same Intel default
lmi uses, but that msvc sets the FE_DFL_ENV differently. I'm not
even sure the standard guarantees that the precision bits are
included in type fenv_t: ms might decide not to recognize them
as part of its floating-point model.
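For reference, this is how the fields of the x87 control word decode, including the precision-control bits that fenv_t may or may not carry. It is a minimal sketch: the struct and function names are hypothetical (not lmi's), and the bit layout is the one documented in Intel's manuals. 0x037f is the value lmi checks for; 0x027f is shown only as an example of the same word with 53-bit precision.

```cpp
#include <cstdint>

// Hypothetical helper, not part of lmi: decode an x87 FPU control word.
// Layout (Intel SDM):
//   bits 0-5    exception masks, 1 = masked
//   bits 8-9    precision control: 00 = 24, 10 = 53, 11 = 64 significand
//               bits (01 is reserved)
//   bits 10-11  rounding control: 00 = nearest, 01 = down, 10 = up,
//               11 = toward zero
struct x87_fields
{
    unsigned exception_masks; // the six mask bits
    unsigned precision;       // significand bits, or 0 for the reserved value
    unsigned rounding;        // raw two-bit rounding-control field
};

inline x87_fields decode_x87_control_word(std::uint16_t cw)
{
    x87_fields f {};
    f.exception_masks = cw & 0x003fu;
    switch((cw >> 8) & 0x3u)
    {
        case 0x0u: f.precision = 24; break;
        case 0x2u: f.precision = 53; break;
        case 0x3u: f.precision = 64; break;
        default:   f.precision =  0; break; // 01: reserved
    }
    f.rounding = (cw >> 10) & 0x3u;
    return f;
}
```

For example, decode_x87_control_word(0x037f) gives masks 0x3f, 64-bit precision, and rounding field 0, matching the "03 ... 7f" breakdown quoted further down. A word that differed only in bits 8-9 would pass any check based solely on the rounding direction.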

That's why I'm not willing to remove the present implementation.
I am willing to consider allowing another C99 fenv one in parallel,
and to let it be the only one that's used in SSE builds--unless
there's a simpler alternative that accomplishes the same thing
(see below).

> I think a better way to implement fenv_validate() would be by performing
> some computation(s) with known good answer(s) and checking that their
> results match the expected ones.

If fenv_t (and therefore std::fegetenv()) always reliably includes
all the contents of the x87 control word, then comparing whatever
fegetenv() returns to a saved default (which might even be the
same as FE_DFL_ENV) would be equivalent to comparing the x87
control word directly.

Testing some actual computation or computations instead seems like
a strange and roundabout way of performing the same test. Granted,
it has a certain charm--kind of like autoconf in that it probes
the actual capabilities. But designing this would be tricky, and
proving it to be equivalent to testing the x87 control word would be
a challenge. I don't think it's worth the effort.
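For the rounding direction alone, the computational probe Vadim suggests is easy to sketch, because std::nearbyint() must honor the current direction. This is a minimal sketch assuming only C++11 <cmath> and <cfenv>; the enum and function names are hypothetical. It covers only the rounding direction: probing the x87 precision bits the same way is the tricky part mentioned above.

```cpp
#include <cfenv>  // std::fesetround, for callers that change the direction
#include <cmath>

// Hypothetical names; not lmi code.
enum class rounding_probe {to_nearest, upward, downward, toward_zero, unknown};

// Infer the current rounding direction from two computations whose
// correct results differ under each of the four IEEE 754 directions:
//   direction     nearbyint(1.5)  nearbyint(-1.5)
//   to nearest          2              -2        (ties to even)
//   upward              2              -1
//   downward            1              -2
//   toward zero         1              -1
inline rounding_probe probe_rounding_mode()
{
    // volatile keeps the compiler from folding these calls at compile
    // time, where it would assume the default rounding direction.
    volatile double pos =  1.5;
    volatile double neg = -1.5;
    double const p = std::nearbyint(pos);
    double const n = std::nearbyint(neg);
    if(2.0 == p && -2.0 == n) return rounding_probe::to_nearest;
    if(2.0 == p && -1.0 == n) return rounding_probe::upward;
    if(1.0 == p && -2.0 == n) return rounding_probe::downward;
    if(1.0 == p && -1.0 == n) return rounding_probe::toward_zero;
    return rounding_probe::unknown;
}
```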

>  But for now, yes, we could make the choice between the 2 implementations a
> compile-time option for validation/testing purposes.

All right, then we can move forward.

> GC> Okay, so if I revert that, conditional on LMI_X86, that will undo
> GC> any damage? Or should I use something like
> GC>   #if defined __SSE__ || (defined _M_IX86_FP && _M_IX86_FP)
> GC> instead? Or does C++11 offer a standard way of doing this?
> 
>  There is no standard way of checking for this to the best of my knowledge
> (it would be really surprising if there were, seeing how this is entirely
> architecture-specific) and so you would indeed need to test for __SSE__ for
> gcc and, if you're so inclined, for _M_IX86_FP > 0 for MSVC (where this is
> more complicated because this one is only defined if _M_IX86 is defined).

Okay, I believe that's what the snippet above does. Well, actually
it tests whether _M_IX86_FP is defined and nonzero, as opposed to
positive; but if you take the value to be unsigned, then that
condition is the same.

> GC> Can we make everything work with msvc and with 64-bit gcc by
> GC> conditionalizing any x87-specific code?
> 
>  Well, yes, but why? If we replace x87-specific code with the standard
> functions we could (and would) still compile it using x87 instructions in
> 32 bits with gcc if/as you wish to continue doing it, so why keep it?

Because the C99 committee chose to omit the precision bits. I'd say
their mistake is somewhere between ghastly and unconscionable, but
I'm trying to suppress my emotions.

> GC> >  And while the build, per se, is not broken, lmi is broken in the sense
> GC> > that the computations it performs presumably give incorrect results.
> GC> 
> GC> I'm not sure whether you're saying the 32-bit production system is
> GC> "broken" in this sense; I think that would be too strong a statement,
> GC> like saying that all programs were "broken" before SSE.
> 
>  This was because I thought we did use different rounding styles in lmi
> code (which would have made it broken). But I was wrong...
> 
> GC> > Unless the rounding mode is never changed while doing them? But this
> GC> > is probably not the case, otherwise why would we have all the code
> GC> > dealing with it in the first place...
> GC> 
> GC> We check that the control word always remains 0x037f:
> GC>  03 round to nearest, extended precision
> GC>  7f mask all hardware exceptions
> 
>  ... because apparently we don't ever do this. But now I really wonder why
> we have all this code in the first place?

Because if two users run lmi with the same input, we want them to
get the same output. But msvc decided to poison the control word
even for programs that do no floating point calculations, as part
of the startup code for any 'exe' and the initialization code for
any 'dll', and they decided not to virtualize the x87 control
word across task switches, so any program started or any dll
initialized while lmi is running could change lmi's results:
that's why this code exists.
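The defensive pattern described here (check the invariant, and restore it if something else in the process has changed it) can be sketched with only <cfenv>. The function name is hypothetical. The sketch also shows the gap: the standard interface can check and restore the rounding direction, but it has no way to check or restore precision control.

```cpp
#include <cfenv>

// Hypothetical helper, a portable analog of lmi's invariant check.
// It sees only what <cfenv> exposes portably: the rounding direction.
// Precision control is not part of the standard interface, so it can
// neither be checked nor restored here.
//
// Returns true if the invariant held; otherwise restores round-to-nearest
// and returns false, so that the caller can report the perturbation.
inline bool validate_and_restore_rounding()
{
    if(FE_TONEAREST == std::fegetround())
    {
        return true;
    }
    std::fesetround(FE_TONEAREST);
    return false;
}
```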

> GC> >  Anyhow, for me the important question is whether you'd like me to produce
> GC> > a reasonable patch (right now I just have a dirty hack) allowing the
> GC> > rounding tests to pass when not using x87 by replacing x87-specific code
> GC> > with the standard functions (based on/inspired by my ~10 year old IEC 559
> GC> > patch) or if it's not worth doing it, either because it won't get done at
> GC> > all (which would be sad) or because you prefer to do it yourself?
> GC> 
> GC> If you're asking me to decide now to give up extended precision
> GC> unconditionally and forever, then I'd probably have to decline.
> 
>  No, sorry, this is not at all the question. I ask you to only give up
> *explicit* use of x87 instructions. This doesn't preclude us from compiling
> the code to use them -- but it also allows compiling the code to _not_ use
> them, which is impossible now.

Then we do this once, in 'config.hpp':

  #if defined __SSE__ || (defined _M_IX86_FP && _M_IX86_FP)
  #   define LMI_SSE
  #endif //  defined __SSE__ || (defined _M_IX86_FP && _M_IX86_FP)

and wherever we now use x87-specific instructions, we conditionalize
them like this:

  #if !defined LMI_SSE
      asm volatile("fstcw %0" : "=m" (control_word));
  #else // SSE
      std::fe...something();
  #endif

and I suppose we similarly conditionalize compiler-specific functions
like _control87(). (But see below for a suggestion to use STMXCSR
in the #else clause instead of std::fewhatever.)

That's fine with me.
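Filled in for one concrete case, that conditional might look like this. It is a sketch that assumes GCC-style inline asm on x86 or x86_64, reuses the LMI_SSE definition from the 'config.hpp' snippet, and uses STMXCSR in the SSE branch as suggested; the function name is hypothetical. It returns only the rounding-control field, which the x87 control word and MXCSR happen to encode identically. FSTCW and STMXCSR write to memory, so their operands are outputs ("=m").

```cpp
#include <cstdint>

// As in the 'config.hpp' snippet.
#if defined __SSE__ || (defined _M_IX86_FP && _M_IX86_FP)
#   define LMI_SSE
#endif // defined __SSE__ || (defined _M_IX86_FP && _M_IX86_FP)

#if !(defined __GNUC__ && (defined __i386__ || defined __x86_64__))
#   error This sketch assumes x86 or x86_64 and GCC-style inline asm.
#endif

// Hypothetical helper: return the two-bit rounding-control field
// (0 = nearest, 1 = down, 2 = up, 3 = toward zero) of the unit that
// LMI_SSE says is doing the arithmetic.
inline unsigned rounding_control_field()
{
#if !defined LMI_SSE
    std::uint16_t control_word;
    asm volatile("fstcw %0" : "=m" (control_word)); // store x87 control word
    return (control_word >> 10) & 0x3u;             // x87 RC: bits 10-11
#else  // defined LMI_SSE
    std::uint32_t mxcsr;
    asm volatile("stmxcsr %0" : "=m" (mxcsr));      // store MXCSR
    return (mxcsr >> 13) & 0x3u;                    // MXCSR RC: bits 13-14
#endif // defined LMI_SSE
}
```

One caveat, worth verifying against GCC's documentation: GCC defines __SSE__ whenever SSE instructions are enabled, even with -mfpmath=387, whereas __SSE2_MATH__ indicates that double arithmetic is actually done with SSE2.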

>  To be precise, I suggest:
> 
> 1. Add an alternative implementation of all fenv_xxx() functions except for
>    fenv_precision() for which this is impossible, but which is not really
>    used anywhere anyhow, using only the standard functions, with a

It is used for the crucial purposes of setting its initial value and
maintaining that setting as an invariant.

>    condition allowing to select it at compile time (possibly activated
>    automatically when SSE is used, i.e. if __SSE__ or whatever is defined).

Does lmi actually set the rounding mode anywhere? This:
  $grep fenv_rounding *.?pp
suggests we use it only
 - in lmi's full-featured alternative to the C99 stuff; and
 - in the 'round*test.cpp' unit tests.
Significantly, the production system distributed to end users never
uses any rounding mode other than round-to-nearest-even. Thus,
it seems that only those two unit tests would need any of this.
And it is noteworthy that 'round_test' doesn't actually test any
lmi code: instead, it tests the compiler's or RTL's std::round().
So the only present need for any of this is in 'round_to_test'.

> 2. Compare the results and performance of the build using the standard
>    functions (but still using x87 instructions!) with the current version.
> 3. If both are the same, as expected, replace the current code with the
>    alternative implementation and remove the condition added in (1).
> 
>  Notice that at the end of all this, you will still be building lmi using
> x87 instructions in 32 bits, the only change will be that the code will be
> much smaller and cleaner and the tests will now also pass in 64 bit builds.


Now that we've arrived at this point, let's step back and reconsider
what we're trying to accomplish, and how best to accomplish it. We
see that C99's fenv section crucially lacks precision control, which
is a prerequisite for making lmi results reproducible, which is an
imperative. I think we can also conclude that we already have a full
replacement of C99's fenv for x86/x87, which could be extended to
include x86_64 as well in either of two ways:

(A) use C99's fenv for x86_64, and lmi's for x86

(B) extend lmi's implementation, e.g.:

+ #if defined LMI_X86
      asm volatile("fstcw %0" : "=m" (control_word));
+ #elif defined LMI_SSE
+     asm volatile("stmxcsr ... ...
+ #else // !defined LMI_X86 && !defined LMI_SSE
+ #   error Unknown platform
+ #endif // !defined LMI_X86 && !defined LMI_SSE

and likewise for this:
    asm volatile("fldcw %0" : : "m" (control_word));

(A) might work automatically on ARM CPUs, though I'm not sure we
need to care about that. Otherwise, (B) seems much simpler than (A).
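For (B), the store and load instructions pair up the same way on both units: FSTCW/FLDCW for the x87 control word, STMXCSR/LDMXCSR for MXCSR. A sketch under the same GCC-style inline asm assumption, with hypothetical function names, of the four primitives (B) would be built from:

```cpp
#include <cstdint>

#if !(defined __GNUC__ && (defined __i386__ || defined __x86_64__))
#   error This sketch assumes x86 or x86_64 and GCC-style inline asm.
#endif

// x87 control word: FSTCW writes it to memory, FLDCW loads it from memory.
inline std::uint16_t x87_get_control_word()
{
    std::uint16_t w;
    asm volatile("fstcw %0" : "=m" (w));
    return w;
}

inline void x87_set_control_word(std::uint16_t w)
{
    asm volatile("fldcw %0" : : "m" (w));
}

#if defined __SSE__
// MXCSR: STMXCSR and LDMXCSR are the SSE counterparts.
// (LDMXCSR faults if reserved bits are set, so only modify defined fields.)
inline std::uint32_t sse_get_mxcsr()
{
    std::uint32_t w;
    asm volatile("stmxcsr %0" : "=m" (w));
    return w;
}

inline void sse_set_mxcsr(std::uint32_t w)
{
    asm volatile("ldmxcsr %0" : : "m" (w));
}
#endif // defined __SSE__
```

A value written with the load instruction is what the store instruction then reads back, so saving, setting, and restoring either register is symmetric.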

>  If you agree with the above, could you please tell me which tests should
> be used for the checks in (2)?

If we choose (B), then I don't think that question even arises.



