[lmi] gcc -fprofile-generate and -fprofile-use [Was: gcc -flto]

From:	Greg Chicares
Subject:	[lmi] gcc -fprofile-generate and -fprofile-use [Was: gcc -flto]
Date:	Tue, 27 Dec 2016 09:34:14 +0000
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.4.0
On 2016-12-24 22:41, Greg Chicares wrote:
> On 2016-12-24 18:37, Vadim Zeitlin wrote:
[...]
>>  If you're experimenting with these options, I wonder if it might be useful
>> to build with -fprofile-generate and then use "make system_test" to
>> generate the data to be used with -fprofile-use. Could this perhaps give
>> some at least slightly more exciting results?
>> 
>>  Probably not, but who knows...
> 
> I haven't yet reverted the experimental makefile changes from earlier
> in this thread, so it's easy to try this.

That may have been unwise, because those experimental changes removed code
that validates regression tests. Reverting those changes and repeating the
same steps, I see some discrepancies [proprietary information redacted]:

REDACTED.000000000.test   Summary: max abs diff: 1.20053e-010 max rel err: 1.09009e-014
REDACTED.000000001.test   Summary: max abs diff: 2.29193e-010 max rel err: 1.08945e-014
REDACTED.000000000.test   Summary: max abs diff: 9.31323e-010 max rel err: 2.47703e-011
REDACTED.000000001.test   Summary: max abs diff: 9.31323e-010 max rel err: 2.47703e-011
REDACTED.000000000.test   Summary: max abs diff: 1.16415e-010 max rel err: 9.55125e-014
REDACTED.000000001.test   Summary: max abs diff: 1.16415e-010 max rel err: 9.55125e-014
REDACTED.000000000.test   Summary: max abs diff: 1.18234e-010 max rel err: 2.18925e-014
REDACTED.000000001.test   Summary: max abs diff: 3.41061e-011 max rel err: 1.35197e-014
REDACTED.000000002.test   Summary: max abs diff: 1.18234e-010 max rel err: 2.18925e-014
Files /opt/lmi/test/md5sums-20161226T2028Z and /opt/lmi/touchstone/md5sums differ
*** System test failed ***
  1507 system-test files compared
  1421 system-test files match
  86 system-test files differ
  0 system-test files missing
...system test completed.

The line-by-line error analysis filters out any file whose greatest
relative error is less than 1e-14 because of our heuristic criterion
that an error of one ulp in an early year can easily become that large
after many years of compound interest. Examining the two files with a
discrepancy greater than 1e-13 (ten times as high), I see that their
highest relative errors occur only with 'ExperienceReserve', e.g.:
  12 3456789A            count of matching significant digits
  37.5983887715874516289 new value
  37.5983887706561290543 old value
That field is a difference between two quantities that happen to become
nearly equal, so this is probably just catastrophic cancellation.
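The arithmetic behind that conclusion can be checked with a short sketch. It uses only the two values quoted above. Measuring the relative error against the old value is an assumption about the test tool, and the operands 'minuend' and 'subtrahend' are hypothetical, chosen only to show how a one-ulp change in a multi-million quantity survives subtraction at full absolute size.

```python
from decimal import Decimal, getcontext
import math

getcontext().prec = 50  # plenty for the 21-digit values quoted above

new = Decimal("37.5983887715874516289")
old = Decimal("37.5983887706561290543")

abs_diff = abs(new - old)
rel_err = abs_diff / abs(old)  # assumption: error measured relative to the old value


def matching_digits(a, b):
    """Count leading significant digits shared by two decimal values."""
    digits_a = str(a).replace(".", "")
    digits_b = str(b).replace(".", "")
    count = 0
    for x, y in zip(digits_a, digits_b):
        if x != y:
            break
        count += 1
    return count


def ulp(x):
    """Spacing between adjacent IEEE 754 doubles near a normal, positive x."""
    return 2.0 ** (math.frexp(x)[1] - 53)


# Hypothetical operands: two multi-million quantities about 37.6 apart.
minuend = 5000037.598388771
subtrahend = 5000000.0
# Nudge the minuend by one ulp and see how far the difference moves.
shift = ((minuend + ulp(minuend)) - subtrahend) - (minuend - subtrahend)

print(f"abs diff        {abs_diff:.5E}")
print(f"rel err         {rel_err:.5E}")
print(f"matching digits {matching_digits(new, old)}")
print(f"one-ulp shift   {shift!r}; 2**-30 = {2.0 ** -30!r}")
```

The absolute difference agrees with 2**-30 to the digits shown, which is the spacing of doubles between 2**22 and 2**23. That is consistent with, though it does not prove, a one-ulp change in an operand of a few million being exposed by the subtraction.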

This leads me to suspect that '-fprofile-generate' and '-fprofile-use'
perform optimizations that change values in a way that we would
generally accept if caused by a source-code modification. However,
these discrepancies are caused by optimizations that AFAICS are not
guaranteed to be identical on different machines. A fundamental
invariant that we have always enforced rigorously is that Kim and I
must observe absolutely identical system-test results; those results
constitute about 120 MB of data, which is inconvenient to share, so
we maintain an 80 KB file of each test's md5sums, which have always
matched perfectly for two decades. I don't think it would be a good
idea to give up that assurance for a ten-percent speedup.
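For readers unfamiliar with the technique, here is a minimal sketch of comparing two result sets through per-file md5sums instead of the files themselves. It is not lmi's actual makefile logic; the function names 'md5_manifest' and 'differing', and the demo file names, are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_manifest(directory):
    """Map each file name in `directory` to the MD5 hex digest of its bytes."""
    return {
        path.name: hashlib.md5(path.read_bytes()).hexdigest()
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    }


def differing(manifest_a, manifest_b):
    """Names whose digests differ, or that appear in only one manifest."""
    names = set(manifest_a) | set(manifest_b)
    return sorted(n for n in names if manifest_a.get(n) != manifest_b.get(n))


# Demo with two throwaway directories standing in for touchstone and test.
with tempfile.TemporaryDirectory() as tmp:
    touchstone = Path(tmp, "touchstone")
    test = Path(tmp, "test")
    touchstone.mkdir()
    test.mkdir()
    (touchstone / "a.test").write_text("100.25\n")
    (test / "a.test").write_text("100.25\n")
    (touchstone / "b.test").write_text("37.5983887706561290543\n")
    (test / "b.test").write_text("37.5983887715874516289\n")
    mismatches = differing(md5_manifest(touchstone), md5_manifest(test))

print(mismatches)
```

A digest comparison of this kind reports any changed byte, so even a difference in the eleventh significant digit counts as a mismatch; that strictness is the point of the invariant.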

There are about 3 MB of '.gcda' files. In theory, we could share them,
and then hope to have the same optimizations on different machines.
However, that's cumbersome, and if, say, Kim creates them in a cygwin
environment, I'm not sure I could just drop them into my debian system:
e.g., if they "know" their own path somehow, then their location on
cygwin isn't likely to match what 'wine' expects here.

We might be able to prevent optimization from changing regression-test
results, perhaps by using SSE, or more certainly by using a currency
class instead of floating point. If we can accomplish that someday,
then we can reconsider these optimizations.
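As a toy illustration of why an integer-based currency type would be more robust: a floating-point result depends on how intermediate values are rounded and grouped, whereas integer arithmetic is exact. The 'Cents' class below is hypothetical, not an lmi class, and the regrouping shown is only one example of an evaluation detail that can move low-order bits; it is not a claim about what '-fprofile-use' actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cents:
    """Hypothetical currency amount stored as a whole number of cents."""
    value: int

    def __add__(self, other):
        return Cents(self.value + other.value)


# Floating point: the same three addends, grouped two ways.
float_left = (0.1 + 0.2) + 0.3
float_right = 0.1 + (0.2 + 0.3)

# Integer cents: grouping cannot matter, because each sum is exact.
cents_left = (Cents(10) + Cents(20)) + Cents(30)
cents_right = Cents(10) + (Cents(20) + Cents(30))

print(float_left == float_right)
print(cents_left == cents_right)
```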

Meanwhile, I question something else. When I repeat these steps now...

[spoiler: resolved toward the bottom]

> /opt/lmi/src/lmi[0]$make clean
> rm --force --recursive /opt/lmi/src/lmi/../build/lmi/Linux/gcc/ship
> /opt/lmi/src/lmi[0]$make debug_flag= gprof_flag="-fprofile-generate" $coefficiency install check_physical_closure >../log 2>&1
> /opt/lmi/src/lmi[0]$make system_test 
> System test:

I get exactly the same results as reported earlier: even the numerical
discrepancies reported are identical. Yet I question the last command:
shouldn't it include the same flags? I.e., I'd suppose it should be:

  make debug_flag= gprof_flag="-fprofile-generate" system_test
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

However, if I run that "improved" command, I run into problems. Here
are consecutive lines copied and pasted in a single operation from a
terminal:

/opt/lmi/src/lmi[0]$make clean
rm --force --recursive /opt/lmi/src/lmi/../build/lmi/Linux/gcc/ship
/opt/lmi/src/lmi[0]$make debug_flag= gprof_flag="-fprofile-generate" $coefficiency install check_physical_closure >../log 2>&1
/opt/lmi/src/lmi[0]$make debug_flag= gprof_flag="-fprofile-generate" $coefficiency system_test 2>&1 |less -S
/opt/lmi/src/lmi[0]$make debug_flag= gprof_flag="-fprofile-generate" system_test 2>&1 |less -S
/opt/lmi/src/lmi[0]$make debug_flag= gprof_flag="-fprofile-generate" system_test >../log 2>&1

Each of the 'system_test' commands gave about 1500 lines of diagnostics,
of which only these three are unique:

profiling:/opt/lmi/src/build/lmi/Linux/gcc/ship/alert_cli.gcda:\
Data file mismatch - some data files may have been concurrently updated without locking support

profiling:/opt/lmi/src/build/lmi/Linux/gcc/ship/calendar_date.gcda:\
Data file mismatch - some data files may have been concurrently updated without locking support

profiling:/opt/lmi/src/build/lmi/Linux/gcc/ship/product_data.gcda:\
Merge mismatch for function 116

I pasted the exact text of those error messages into a search engine:
  gcda "Merge mismatch for function"
but found mostly error logs without discussion, and copies of gcc
source files. Even this:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65313
doesn't seem to help.

Did I follow the advice to use '-fprofile-generate' for both compiling
and linking? Yes, as these excerpts from the build log show:

i686-w64-mingw32-g++ -MMD -MP -MT CgiEnvironment.o ... -fprofile-generate ...
i686-w64-mingw32-g++ -o liblmi.dll -shared ... -fprofile-generate
i686-w64-mingw32-g++ -o lmi_wx_shared.exe ... -fprofile-generate

Did the '.gcda' files get written in some unexpected place because I
didn't explicitly override their location with '-fprofile-dir'? No:
I find them only in my build directory; and after 'make clean' above,
`locate calendar_date.gcda` returns 1 (no file found).

As a last resort, I read the diagnostic carefully:
  data files may have been concurrently updated without locking support
and then looked back and realized that my first 'system_test' run used
parallelism:
  /opt/lmi/src/lmi[0]$echo $coefficiency 
  --jobs=32
It seems plausible that parallelism caused overlapping updates to the
'.gcda' files, and that once they were corrupted, they remained corrupt.
Indeed, this set of commands works without producing those diagnostics:

/opt/lmi/src/lmi[0]$make clean
rm --force --recursive /opt/lmi/src/lmi/../build/lmi/Linux/gcc/ship
/opt/lmi/src/lmi[0]$make debug_flag= gprof_flag="-fprofile-generate" $coefficiency install check_physical_closure >../log 2>&1
/opt/lmi/src/lmi[0]$make debug_flag= gprof_flag="-fprofile-generate" system_test >../log 2>&1

So the problem with making 'system_test' wasn't '-fprofile-generate';
it was parallelism. Maybe '-fprofile-correction' would have fixed the
problem. Maybe '-fprofile-update=atomic' would have prevented it and
let us run regression tests in parallel; I'm not going to look into
that now.
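The suspected failure mode is the classic unlocked read-merge-write race. The sketch below is a deliberately simplified, deterministic model of it, assuming each instrumented process reads the counts already on disk, adds its own, and writes the total back. It does not reproduce the real '.gcda' format or gcc's runtime; 'merge_run' and the counter name "f" are hypothetical.

```python
def merge_run(stored, run_counts):
    """Return stored counts plus this run's counts (read-merge-write, no lock)."""
    keys = set(stored) | set(run_counts)
    return {k: stored.get(k, 0) + run_counts.get(k, 0) for k in keys}


# Serial runs: the second process sees what the first one wrote.
on_disk = {}
on_disk = merge_run(on_disk, {"f": 3})
on_disk = merge_run(on_disk, {"f": 5})
serial = on_disk

# Overlapping runs: both processes read the (empty) file before either writes.
snapshot_a = {}
snapshot_b = {}
written_by_a = merge_run(snapshot_a, {"f": 3})
written_by_b = merge_run(snapshot_b, {"f": 5})
overlapped = written_by_b  # the last writer silently overwrites the first

print("serial:    ", serial)
print("overlapped:", overlapped)
```

In this model the overlapped file is merely missing counts. Partially interleaved writes to a real data file could plausibly leave it internally inconsistent instead, which would fit the observation that the diagnostics persisted on later serial runs until 'make clean'.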