Re: [lmi] Benchmarking: gcc-8 beats gcc-10 soundly?
From: Greg Chicares
Subject: Re: [lmi] Benchmarking: gcc-8 beats gcc-10 soundly?
Date: Sat, 19 Sep 2020 20:37:59 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0
On 2020-09-19 15:48, Vadim Zeitlin wrote:
> On Sat, 19 Sep 2020 15:15:48 +0000 Greg Chicares <gchicares@sbcglobal.net>
> wrote:
>
> GC> It looks like gcc-10 gives us slower lmi binaries. Picking
> GC> the third '--selftest' scenario as an index of performance
> GC> (results in microseconds--less is better):
> GC>
> GC>   gcc-10   gcc-8   ratio
> GC>   ------   -----   -----
> GC>   102659   84947    1.21   32-bit
> GC>    50121   37410    1.34   64-bit
> GC>
> GC> The fourth scenario is even worse:
> GC>
> GC>    33250   20654    1.61   32-bit
> GC>    24616   13009    1.89   64-bit
With -O3, the 64-bit build performs thus on those two scenarios:
naic, ee prem solve : 5.001e-02 s mean; 49710 us least of 20 runs
finra, no solve : 2.483e-02 s mean; 24580 us least of 41 runs
Thus, the -O3 to -O2 speed ratio is
49710 / 50121 = .992
24580 / 24616 = .999
which isn't worth the extra build time (82.89 vs 72.76 seconds).
Data below.
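
The arithmetic above can be rechecked with a minimal sketch like the following. It is written in Python only for illustration and is not part of lmi. The -O3 timings and build times are copied from this message, the -O2 timings are the 64-bit gcc-10 figures from the quoted table, and the names `O2_US`, `O3_US`, `BUILD_S`, and `speed_ratios` are hypothetical.

```python
# Timings in microseconds (least of N runs), 64-bit gcc-10 builds,
# copied from this message.
O2_US = {"naic, ee prem solve": 50121, "finra, no solve": 24616}
O3_US = {"naic, ee prem solve": 49710, "finra, no solve": 24580}

# Wall-clock build times in seconds, from the 'time make' output.
BUILD_S = {"-O2": 72.76, "-O3": 82.89}


def speed_ratios(o3, o2):
    """Return the O3/O2 run-time ratio per scenario (below 1: -O3 is faster)."""
    return {name: o3[name] / o2[name] for name in o2}


ratios = speed_ratios(O3_US, O2_US)

# Fractional increase in build time when switching from -O2 to -O3.
build_overhead = BUILD_S["-O3"] / BUILD_S["-O2"] - 1.0

for name, ratio in ratios.items():
    print(f"{name:20s}: {ratio:.3f}")
print(f"extra build time    : {build_overhead:.1%}")
```

On these figures the run-time gain is under one percent in both scenarios, while the build takes roughly one-seventh longer.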
> I've already seen performance regressions in newer g++ versions, but I
> don't think I've seen anything nearly like 89% slowdown, so it's indeed
> very astonishing. But I have trouble seeing how could it be not true, if
> you consistently obtain such results. And you're not the only one, see e.g.
> this bug report https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337
I had the thought that perhaps this is a MinGW-w64 snafu, which
would explain why they haven't officially released anything
beyond 8.x yet. Yet the bugzilla report doesn't seem to specify
a platform, while the phoronix link in that report specifies:
| Ubuntu 20.04 with the Linux 5.8 kernel
I guess I'd better try the flags phoronix tested:
| "-O3 -march=native", and "-O3 -march=native -flto"
Right now, lmi looks like the "SciMark" benchmark here:
https://www.phoronix.com/scan.php?page=article&item=gcc-10900k-compiler&num=2
so maybe this will resolve the anomaly.
Am I reading that benchmark right? It seems to say that
-O3 -march=native
greatly outperforms
-O3 -march=native -flto
Okay, I am reading it right:
| For the very basic SciMark 2 benchmarks the LTO build hurt the
| performance compared to "-O3 -march=native" but this was another
| test where the -O2 performance is much slower on GCC 10
so maybe LTO is not yet ready for prime time...so I won't even
ask about its "WHOPR" mode, which seems to be an allusion to a
"two-fisted burger" at some US fast-food restaurant.
> Unfortunately there is no clear conclusion there, as gcc developers can't
> reproduce the problem.
It seems really strange that they would say that. I guess
phoronix is just one guy, but he seems to be a serious person
with a serious audience.
> They do say that -O2 has been changed in 10.x, so it
> could be worth using -O3 with it and see if it helps. Should I/we do it or
> will you test this yourself?
We seem to have a test case that should be reproducible,
though it's far from ideally minimal. Here's what I did:
/opt/lmi/src/lmi[0]$grep O2 workhorse.make
optimization_flag := -O2 -fno-omit-frame-pointer
/opt/lmi/src/lmi[0]$sed -i workhorse.make -e's/O2/O3/'
/opt/lmi/src/lmi[0]$grep O2 workhorse.make
/opt/lmi/src/lmi[1]$grep O3 workhorse.make
optimization_flag := -O3 -fno-omit-frame-pointer
/opt/lmi/src/lmi[0]$make clean
rm --force --recursive /opt/lmi/gcc_x86_64-w64-mingw32/build/ship
/opt/lmi/src/lmi[0]$time make $coefficiency --output-sync=recurse install check_physical_closure 2>&1 | tee eraseme | less -SN
make $coefficiency --output-sync=recurse install check_physical_closure 2>&1
1721.58s user 80.19s system 2173% cpu 1:22.89 total
tee eraseme 0.00s user 0.01s system 0% cpu 1:22.89 total
less -SN 0.03s user 0.02s system 0% cpu 1:32.48 total
/opt/lmi/src/lmi[0]$wine /opt/lmi/bin/lmi_cli_shared.exe --accept --data_path=/opt/lmi/data --selftest
Test speed:
naic, no solve : 3.704e-02 s mean; 36788 us least of 27 runs
naic, specamt solve : 5.292e-02 s mean; 52692 us least of 19 runs
naic, ee prem solve : 5.001e-02 s mean; 49710 us least of 20 runs
finra, no solve : 2.483e-02 s mean; 24580 us least of 41 runs
finra, specamt solve: 3.943e-02 s mean; 39101 us least of 26 runs
finra, ee prem solve: 3.769e-02 s mean; 37410 us least of 27 runs
/opt/lmi/src/lmi[0]$git checkout -- workhorse.make
/opt/lmi/src/lmi[0]$make clean
rm --force --recursive /opt/lmi/gcc_x86_64-w64-mingw32/build/ship
/opt/lmi/src/lmi[0]$time make $coefficiency --output-sync=recurse install check_physical_closure 2>&1 | tee eraseme | less -SN
make $coefficiency --output-sync=recurse install check_physical_closure 2>&1
1549.20s user 77.85s system 2236% cpu 1:12.76 total
tee eraseme 0.00s user 0.01s system 0% cpu 1:12.76 total
less -SN 0.02s user 0.01s system 0% cpu 1:14.99 total
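
For comparing runs like these without retyping numbers, here is a small sketch of a parser for the "Test speed" report. It assumes the line layout shown in the transcript above ("NAME : MEAN s mean; LEAST us least of N runs"). The regular expression and the names `LINE_RE`, `parse_selftest`, and `SAMPLE` are hypothetical and not part of lmi.

```python
import re

# One line of the '--selftest' speed report, e.g.
#   "naic, no solve : 3.704e-02 s mean; 36788 us least of 27 runs"
LINE_RE = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*(?P<mean>[0-9.eE+-]+) s mean;"
    r"\s*(?P<least>\d+) us least of\s*(?P<runs>\d+) runs\s*$"
)


def parse_selftest(text):
    """Map scenario name -> (mean seconds, least microseconds, run count)."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:  # header lines such as "Test speed:" do not match and are skipped
            results[m["name"]] = (
                float(m["mean"]),
                int(m["least"]),
                int(m["runs"]),
            )
    return results


# The -O3 report from the transcript above.
SAMPLE = """\
Test speed:
naic, no solve : 3.704e-02 s mean; 36788 us least of 27 runs
naic, specamt solve : 5.292e-02 s mean; 52692 us least of 19 runs
naic, ee prem solve : 5.001e-02 s mean; 49710 us least of 20 runs
finra, no solve : 2.483e-02 s mean; 24580 us least of 41 runs
finra, specamt solve: 3.943e-02 s mean; 39101 us least of 26 runs
finra, ee prem solve: 3.769e-02 s mean; 37410 us least of 27 runs
"""

parsed = parse_selftest(SAMPLE)
for name, (mean_s, least_us, runs) in parsed.items():
    print(f"{name:22s} {least_us:7d} us (least of {runs})")
```

Keying on the "least of N runs" figure follows the choice made earlier in this thread; the mean is kept alongside it in case it is wanted.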