Re: [lmi] MinGW-w64 anomaly?

From:	Greg Chicares
Subject:	Re: [lmi] MinGW-w64 anomaly?
Date:	Thu, 22 Dec 2016 01:24:58 +0000
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.4.0

On 2016-12-21 23:43, Vadim Zeitlin wrote:
> On Wed, 21 Dec 2016 22:49:50 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> > I think a better way to implement fenv_validate() would be by performing
> GC> > some computation(s) with known good answer(s) and comparing that their
> GC> > results match the expected ones.
> GC> 
> GC> If fenv_t (and therefore std::fegetenv()) always reliably includes
> GC> all the contents of the x87 control word
> 
>  I'm not sure if we can count on this but I could check.

On second thought, even if that's true for some set of compilers today,
they might change in the future. Even gcc might change.

> GC> Testing some actual computation or computations instead seems like
> GC> a strange and roundabout way of performing the same test.
> 
>  Really? It seems like a much more direct way to me: after all, we're not
> interested in preserving the x87 control word, we just want to have the correct
> results. And for the code compiled to use SSE instead of x87 instructions
> these are not at all the same thing, which is the source of the problem.

Yet the underlying problem is avoided completely, AIUI, with SSE.
Therefore, the crucial change needed for the production system is
simply this:

 void Skeleton::UponTimer(wxTimerEvent&)
 {
+#if defined LMI_X86 && !defined LMI_SSE
     if(0 == fenv_guard::instance_count())
         {
         if(!fenv_is_valid())
             {
             status() << "Resetting floating-point control word. " << std::flush;
             }
         fenv_validate(e_fenv_indulge_0x027f);
         }
+#endif // defined LMI_X86 && !defined LMI_SSE
 }

and we don't need a parallel implementation that works for SSE too,
because we wouldn't have any reason to use it--in the production
system today.

OTOH, for 'round_to_test' and any future work like that, there is
a case for preferring the C99 IEC_559 stuff.
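
A minimal sketch of what the standard route could look like for
rounding-direction work such as 'round_to_test'. The class name
rounding_guard is hypothetical, not anything in lmi; the only library
calls assumed are std::fegetround() and std::fesetround() from <cfenv>.
It covers the rounding direction only, because precision control is
outside the scope of the standard:

```cpp
#include <cfenv>

// Hypothetical RAII helper (not part of lmi): sets the rounding
// direction on construction and restores the previous one on exit.
class rounding_guard
{
  public:
    explicit rounding_guard(int mode)
        :saved_ {std::fegetround()}
    {
        std::fesetround(mode);
    }
    ~rounding_guard() {std::fesetround(saved_);}

    // Copying would restore the saved mode twice, so forbid it.
    rounding_guard(rounding_guard const&) = delete;
    rounding_guard& operator=(rounding_guard const&) = delete;

  private:
    int saved_; // rounding direction in effect before construction
};
```

A test would declare, say, 'rounding_guard g(FE_UPWARD);' at the top
of a block and rely on the destructor to put things back.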

> If
> we checked the computation results directly we wouldn't have this problem
> in the first place and if we use SSE-specific instruction instead of doing
> it now, we'll just have the same problem again when porting to ARM or even
> the next generation of Intel CPUs which provides some better way of doing
> floating point arithmetic (128 bit doubles in hardware?).

The problem that fenv_validate() addresses isn't an x87 problem:
it's an msw problem. I rather doubt that it would arise on GNU/Linux,
though I haven't checked that. We could probably suppress the msw-
specific workaround above also in the case that the underlying OS
is anything but msw, unless 'wine' has carefully emulated this awful
defect of msw (that's a philosophical point, not a suggestion that
we'd actually spend time doing any such thing).

> GC> But designing this would be tricky, and proving it to be equivalent to
> GC> testing the x87 control word would be a challenge. I don't think it's
> GC> worth the effort.
> 
>  How much effort would it really take? Testing the rounding mode is
> trivial, AFAICS we just need 2 tests using std::rint() for -0.5 and 0.5 to
> ensure that FE_TONEAREST is in effect. And for the precision, we could just
> find something that would be 0 with 64 bit doubles but non-0 with 80 bits.

I'm sure I'm more pessimistic than you in general. But, that aside,
any code that we don't write cannot introduce defects. And I don't
think this code needs to be written at all.
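
For reference, the rounding half of the computation-based check
suggested above might be sketched as follows. The function name is
hypothetical. Note one subtlety: std::rint() maps both -0.5 and 0.5
to zero under FE_TOWARDZERO as well as under FE_TONEAREST, so the
sketch adds a third probe, 1.5, to tell those two modes apart. The
precision half is not attempted here:

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical helper (not part of lmi): true if std::rint() behaves
// as it would under FE_TONEAREST (round to nearest, ties to even).
bool rounding_looks_nearest()
{
    // volatile keeps the compiler from evaluating rint() at compile time.
    volatile double const a = -0.5;
    volatile double const b =  0.5;
    volatile double const c =  1.5;
    return
           0.0 == std::rint(a)  // FE_DOWNWARD would give -1.0
        && 0.0 == std::rint(b)  // FE_UPWARD would give 1.0
        && 2.0 == std::rint(c)  // FE_TOWARDZERO would give 1.0
        ;
}
```

Whether a handful of probes like these is equivalent to inspecting
the control word is exactly the question raised above; the sketch
only shows what the probes would look like.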

> GC> > GC> Okay, so if I revert that, conditional on LMI_X86, that will undo
> GC> > GC> any damage? Or should I use something like
> GC> > GC>   #if defined __SSE__ || (defined _M_IX86_FP && _M_IX86_FP)
> GC> > GC> instead? Or does C++11 offer a standard way of doing this?
> GC> > 
> GC> >  There is no standard way of checking for this to the best of my
> GC> > knowledge (it would be really surprising if there were, seeing how
> GC> > this is entirely architecture-specific) and so you would indeed need
> GC> > to test for __SSE__ for gcc and, if you're so inclined, for
> GC> > _M_IX86_FP > 0 for MSVC (where this is more complicated because this
> GC> > one is only defined if _M_IX86 is defined).
> GC> 
> GC> Okay, I believe that's what the snippet above does.
> 
>  Oops, sorry, I've somehow glossed over the second part of it. However I
> still don't think it's quite correct as the condition above is false for 64
> bit platforms (_M_X64 defined, but not _M_IX86) or, in fact, any other
> architecture. So I'd write it as
> 
>       defined _MSC_VER && (!defined _M_IX86 || _M_IX86_FP)

I'd be happy enough to copy and paste that and let you test it,
because I don't have that compiler. Yet I did have the possibly
mistaken impression from some stackoverflow article that those
*IX* macros were defined, with the value zero, when SSE is not
used.
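
Either way, the test can be written so that it doesn't matter whether
_M_IX86_FP is undefined or defined as zero, because an identifier
that is not defined evaluates to 0 inside #if. A sketch, untested
with msvc; the function name is hypothetical, and __i386__ (a gcc
predefined macro) stands in here for LMI_X86:

```cpp
// Hypothetical helper (not part of lmi): true only if the predefined
// macros suggest a 32-bit x86 build that does not use SSE.
constexpr bool example_targets_x87()
{
#if defined _MSC_VER
    // An undefined _M_IX86_FP evaluates to 0 here, the same as one
    // that is defined with the value zero.
#   if defined _M_IX86 && !_M_IX86_FP
    return true;
#   else
    return false;
#   endif
#elif defined __i386__ && !defined __SSE__
    return true;
#else
    // 64-bit x86, or any other architecture.
    return false;
#endif
}
```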

> GC> Because the C99 committee chose to omit the precision bits. I'd say
> GC> their mistake is somewhere between ghastly and unconscionable, but
> GC> I'm trying to suppress my emotions.
> 
>  I think it's understandable that they didn't want to standardize
> functionality available for only a single processor on the market (or are
> there any other ones like it?), even if it's the most dominant one.

I'm saying they should have let it be set dynamically in conformity
to the IEEE754 vision, regardless of intel's design.

> GC> Because if two users run lmi with the same input, we want them to
> GC> get the same output. But msvc decided to poison the control word
> GC> even for programs that do no floating point calculations, as part
> GC> of the startup code for any 'exe' and the initialization code for
> GC> any 'dll', and they decided not to virtualize the x87 control
> GC> word across task switches,
> 
>  I like blaming Microsoft as much as any other person, but I don't think
> the last part is correct for any version of MSW from this millennium.

We did have at least one actual report of this fenv_validate() stuff
failing in the msw-xp era, though I'm not sure which OS release was
involved. We have no control over what OS end users in remote locations
run, and it is not yet utterly irrational to run msw-xp in a VM with no
network connectivity--there are people on the cygwin mailing list who
do that in 2016 and aren't ashamed to admit it.

Anyway, I'm as willing to remove this code as I am to trust ms, which
is to say not at all. "Fooled me once, shame on you; fooled me twice,
shame on me." They might have failed to virtualize the CW properly in
all cases, or they might decide not to do it again in the future. Or
a new "anti-virus" program could be installed that messes this stuff
up in a novel way.

> GC> so any program started or any dll initialized while lmi is running
> GC> could change lmi's results: that's why this code exists.
> 
>  And, of course, it's impossible for any compiler/CRT to prevent a DLL
> loaded into the process from changing any process-wide parameters, such as
> x87 (or, indeed, SSE) control word. But I think badly behaved shell
> extensions doing this are much more rare nowadays than before. Out of
> curiosity, when was the last time you received a report about the floating
> point environment failure from one of lmi users?

Early in this millennium.

> GC> Then we do this once, in 'config.hpp':
> GC> 
> GC>   #if defined __SSE__ || (defined _M_IX86_FP && _M_IX86_FP)
> GC>   #   define LMI_SSE
> GC>   #endif //  defined __SSE__ || (defined _M_IX86_FP && _M_IX86_FP)
> GC> 
> GC> and wherever we now use x87-specific instructions, we conditionalize
> GC> them like this:
> GC> 
> GC>   #if !defined LMI_SSE
> GC>       asm volatile("fstcw %0" : : "m" (control_word));
> GC>   #else // SSE
> GC>       std::fe...something();
> GC>   #endif
> 
>  I don't like explicitly testing for SSE. I really, really don't understand
> why you consciously lock yourself into the choice between x87 and SSE
> instead of making a much more natural choice between x87 and
> standard-conforming implementation.

Because the standard doesn't do everything I need, but I have an x87
implementation that does; and because the only reason our production
system bothers doing this is to address a problem that the standard
does not. With SSE, I assume, there's no such problem, and no need
for any of this--so we could just conditionally #ifdef it out and
we'd be done, with no other change to the production release.

The only thing that doesn't cover is the single special case of
'round_to_test', and that's not a compelling reason to try to use
the standard approach across the board when we know it's not fit
for the dominant purpose of guarding against msw dlls resetting
the x87 control word.

Well, that's the thesis and the antithesis, but below I'll make
an attempt at synthesis.

>  I propose to have LMI_USE_X87 which would be set like this:
> 
>       #if defined __GNUC__
>           #if defined LMI_X86 && !defined __SSE__
>               #define LMI_USE_X87
>           #endif
>       #elif defined _MSC_VER
>           #if defined _M_IX86 && _M_IX86_FP == 0
>               #define LMI_USE_X87
>           #endif
>       #endif
> 
> and then use it in the following way
> 
>       #ifdef LMI_USE_X87
>           asm volatile("fxxx");
>       #else
>           std::fexxx();
>       #endif
> 
> GC> >  To be precise, I suggest:
> GC> > 
> GC> > 1. Add an alternative implementation of all fenv_xxx() functions
> GC> >    except for fenv_precision() for which this is impossible, but
> GC> >    which is not really used anywhere anyhow, using only the standard
> GC> >    functions, with a
> GC> 
> GC> It is used for the crucial purposes of setting its initial value and
> GC> maintaining that setting as an invariant.
> 
>  Yes, but only as part of fenv_initialize() and fenv_validate().

Exactly. I have validated code for that particular specialized purpose,
and I'm not eager to throw it away. Yet I feel no compulsion to require
code written for any other purpose to follow the same nonstandard path.

> GC> Now that we've arrived at this point, let's step back and reconsider
> GC> what we're trying to accomplish, and how best to accomplish it. We
> GC> see that C99's fenv section crucially lacks precision control, which
> GC> is a prerequisite for making lmi results reproducible, which is an
> GC> imperative. I think we can also conclude that we already have a full
> GC> replacement of C99's fenv for x86/x87, which could be extended to
> GC> include x86_64 as well in either of two ways:
> GC> 
> GC> (A) use C99's fenv for x86_64, and lmi's for x86
> GC> 
> GC> (B) extend lmi's implementation, e.g.:
> GC> 
> GC> + #if defined LMI_X86
> GC>       asm volatile("fstcw %0" : : "m" (control_word));
> GC> + #elif defined LMI_SSE
> GC> +     asm volatile("ldmxcsr ... ...
> GC> + #else // !defined LMI_X86 && !defined LMI_SSE
> GC> + #   error Unknown platform
> GC> + #endif // !defined LMI_X86 && !defined LMI_SSE
> GC> 
> GC> and likewise for this:
> GC>     asm volatile("fldcw %0" : : "m" (control_word));
> GC> 
> GC> (A) might work automatically on ARM CPUs, though I'm not sure we
> GC> need to care about that. Otherwise, (B) seems much simpler than (A).
> 
>  Sorry, but I don't follow at all. Since when is writing inline assembly
> simpler than using a standard function!? It's obviously more difficult (gcc
> asm statement has its own DSL that has nothing to do with the standard C++
> and must be learnt separately), less readable (are all C++ programmers
> supposed to know assembly, including rarely used instructions such as
> those, now?) and not portable, neither between architectures nor compilers
> (gcc asm obviously doesn't work for MSVC nor probably any other compiler
> with the exception of clang). I really see no advantage whatsoever to
> resorting to inline assembly here, but tons of disadvantages.

We have two asm statements that were written probably in the last
millennium. They're probably the two most expensive statements in
lmi, but...sunk costs are sunk.

Adding two more, for the SSE case, would entail considerable cost
(we'd want to make *very* sure they're perfect), but then we'd be
all done. I'm guessing that the cost of replacing proven code with
a new IEC_559 implementation would probably be greater, and the
risk would certainly be greater.

Yet I understand how you would recoil from the idea of writing more
asm, and I can't say I consider it lovely myself.

> GC> >  If you agree with the above, could you please tell me which tests
> GC> > should be used for the checks in (2)?
> GC> 
> GC> If we choose (B), then I don't think that question even arises.
> 
>  Sorry again but how so? The question is completely orthogonal. It would be
> very useful to ensure that nothing got broken after doing any non-trivial
> changes to the code and we still want to compare performance of x87 and SSE
> builds regardless of whether we choose (A) or (B). Let me reproduce the
> "checks in (2)" for reference here:
> 
> GC> > 2. Compare the results and performance of the build using the standard
> GC> >    functions (but still using x87 instructions!) with the current
> GC> >    version.

Oh. I was thinking that if we didn't do (1), then we wouldn't need (2)
to test it. Yet of course any change must be tested. Okay, I would say
that if you found no problems with the (all public) unit tests, and I
found none with the (proprietary) regression tests, that would be
strong evidence.

>  I sincerely hope we're not going to choose the (B) route. It might solve
> the problem of failing tests in 64 bit builds, but it will make the code
> even more complex and less standard-conforming and portable than it is now
> which is certainly exactly the opposite of my intentions.

What if we do this:

- Use IEC_559 for all purposes except fenv_validate(): i.e., for
  'round_to_test' as well as any future work that needs anything in
  the scope of IEC_559, such as twiddling the rounding direction.

- Retain the present code for fenv_validate() purposes only: it's
  proven code, so not touching it means introducing no possible
  error; and what it does is sadly outside the scope of IEC_559.
  Conditionalize all of the 'fenv_lmi*' code, and everything that
  uses it, on LMI_MSW and !LMI_SSE in addition to all conditionals
  already used--then, instead of being an integral part of lmi, it
  becomes merely support code for class MswDllPreloader, visible
  only when building a 32-bit lmi for msw with x87.

Would that make both of us happy?
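
A rough sketch of the conditionalization proposed in the second item
above. LMI_MSW, LMI_X86, and LMI_SSE are the macros named in this
thread; EXAMPLE_GUARD_X87_CW and example_validate() are hypothetical,
and 0x037f is only assumed here to be the intended control-word value:

```cpp
// Hypothetical switch: the x87 guard exists only in a 32-bit msw
// build that does not use SSE.
#if defined LMI_MSW && defined LMI_X86 && !defined LMI_SSE
#   define EXAMPLE_GUARD_X87_CW 1
#else
#   define EXAMPLE_GUARD_X87_CW 0
#endif

// Returns true if the control word is as expected, or if this build
// has no need to check it.
inline bool example_validate()
{
#if EXAMPLE_GUARD_X87_CW
    unsigned short control_word = 0;
    // gcc syntax; fstcw stores the x87 control word to memory.
    asm volatile("fstcw %0" : "=m" (control_word));
    return 0x037f == control_word; // assumed intended value
#else
    return true; // nothing to guard in this configuration
#endif
}
```

In every other configuration the guard compiles away, and anything
else that touches the floating-point environment would go through
<cfenv>.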