Re: [lmi] Benchmarking: gcc-8 beats gcc-10 soundly?
From: Greg Chicares
Subject: Re: [lmi] Benchmarking: gcc-8 beats gcc-10 soundly?
Date: Sun, 20 Sep 2020 09:59:31 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0
On 2020-09-19 15:15, Greg Chicares wrote:
> It looks like gcc-10 gives us slower lmi binaries. Picking
> the third '--selftest' scenario as an index of performance
> (results in microseconds--less is better):
>
>   gcc-10   gcc-8   ratio
>   ------   -----   -----
>   102659   84947   1.21   32-bit
>    50121   37410   1.34   64-bit
>
> The fourth scenario is even worse:
>
>    33250   20654   1.61   32-bit
>    24616   13009   1.89   64-bit
I've discovered a way to make one of those scenarios
much better and the other simultaneously much worse....
> Vadim--Does this seem so astonishing that it can't be
> true? These results are all observed on the same machine.
> The only real difference I can think of is that one is a
> new debian bullseye within centos within debian buster chroot
> while the other is an
> old debian bullseye within debian buster chroot [my old gcc-8]
> both of which identify themselves as:
> Debian GNU/Linux bullseye/sid
> But I've never noticed any penalty before for nested chroots.
This called for investigation. Below, I build lmi in the nested
chroot and move it to the non-nested one, where one scenario runs
much slower and another runs faster, as mentioned above. Then I
rebuild lmi without '-O3 -march=native' and repeat, and the same
anomaly persists.
So the cause isn't that '-march' detected a different
architecture in the nested chroot, because removing '-march'
doesn't make the problem go away.
And I used 'make fardel', which packages lmi, all libraries,
and the compiler runtime files, so it can't be a mismatch
of those things. For years we've used that makefile target
to prepare distributions, and if it didn't do those things
correctly, surely we'd know by now.
Now, what could possibly explain this? It seems that there's
a big difference between these chroots. The nested-ness is
not a plausible explanation: 'schroot' is a pretty thin
wrapper around chroot(2), in which any serious regression
would have been noticed quickly. Besides, it's the same
version of schroot:
/home/greg[0]$schroot --version
schroot (Debian sbuild) 1.6.10 (04 May 2014)
/home/greg[0]$schroot --chroot=centos7lmi
/home/greg[0]$schroot --version
schroot (Debian sbuild) 1.6.10 (04 May 2014)
In both chroots, `cat /etc/os-release` returns
PRETTY_NAME="Debian GNU/Linux bullseye/sid"
although one is more up to date. But any regression in
the debian 'testing' base system would have been noticed.
They're both on the same drive. One is a 'directory'
chroot, while the other is a 'plain' one, but the only
difference that makes is that things like /dev/pts and
/proc are mounted automatically in the former, but
manually in the latter.
It can hardly be zsh, whose version (`zsh --version`) differs slightly:
zsh 5.7.1 (x86_64-debian-linux-gnu)
zsh 5.8 (x86_64-debian-linux-gnu)
And running a program is just a call to exec(3), which
doesn't depend on the shell, and in which any regression
would have been noticed in 'testing' by now.
Except...'wine'. My older non-nested chroot still has
wine-4.0.3 (Debian 4.0.3-1)
while the up-to-date one has
wine-5.0 (Debian 5.0-4)
Running 'winecfg' in both chroots shows that they use the same
settings, except for DPI (192 vs 96) and "Windows Version"
("Windows XP" vs "Windows 7"). But changing both those settings
to XP and 192 has no effect on the observed anomaly.
So is it time to quote "The Adventure of the Beryl Coronet":
| It is an old maxim of mine that when you have excluded
| the impossible, whatever remains, however improbable,
| must be the truth.
even though this seems like an outlandish 'wine' regression?
All the gory details:
* NESTED: debian bullseye chroot within centos chroot on debian buster host
/opt/lmi/src/lmi[0]$make fardel
Generating product files.
All product files written.
Created 'lmi-20200920T0840Z' archive in '/opt/lmi/fardels'.
/opt/lmi/src/lmi[130]$wine /opt/lmi/bin/lmi_cli_shared.exe --accept --data_path=/opt/lmi/data --selftest
Test speed:
naic, no solve : 6.723e-02 s mean; 66980 us least of 15 runs
naic, specamt solve : 1.112e-01 s mean; 110724 us least of 9 runs
naic, ee prem solve : 1.028e-01 s mean; 102521 us least of 10 runs
finra, no solve : 3.332e-02 s mean; 33180 us least of 31 runs
finra, specamt solve: 7.371e-02 s mean; 73324 us least of 14 runs
finra, ee prem solve: 6.904e-02 s mean; 68846 us least of 15 runs
* FLAGS: that was built with '-O3 -march=native'
/opt/lmi/src/lmi[0]$git --no-pager diff workhorse.make
diff --git a/workhorse.make b/workhorse.make
index 3448bc8c..ddd421ec 100644
--- a/workhorse.make
+++ b/workhorse.make
@@ -687,7 +687,7 @@ else ifeq (safestdlib,$(findstring safestdlib,$(build_type)))
optimization_flag := -O0 -fno-omit-frame-pointer
libstdcxx_warning_macros := $(every_libstdcxx_warning_macro)
else
- optimization_flag := -O2 -fno-omit-frame-pointer
+ optimization_flag := -O3 -march=native -fno-omit-frame-pointer
endif
* TRANSPLANT: command run on host
$mv /srv/chroot/centos7lmi/srv/chroot/lmi_bullseye_3/opt/lmi/fardels/lmi-20200920T0840Z /srv/chroot/bullseye0/tmp
* NOT NESTED: debian bullseye chroot on debian buster host
/tmp/web-cvs/lmi[0]$pushd /tmp/lmi-20200920T0840Z
/tmp/lmi-20200920T0840Z /tmp/web-cvs/lmi /opt/lmi/fardels/lmi-20200820T1335Z
/tmp/lmi-20200920T0840Z[0]$wine /opt/lmi/bin/lmi_cli_shared.exe --accept --data_path=/opt/lmi/data --selftest
Test speed:
naic, no solve : 1.047e-01 s mean; 103639 us least of 10 runs
naic, specamt solve : 1.850e-01 s mean; 183295 us least of 6 runs
naic, ee prem solve : 1.675e-01 s mean; 166175 us least of 6 runs
finra, no solve : 2.705e-02 s mean; 26650 us least of 37 runs
finra, specamt solve: 9.838e-02 s mean; 97834 us least of 11 runs
finra, ee prem solve: 9.079e-02 s mean; 89717 us least of 12 runs
* WEIRD: transplanted third scenario is much worse (166175 vs 102521),
but transplanted fourth scenario is much better (26650 vs 33180).
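To see the whole pattern rather than two hand-picked rows, the two '--selftest' outputs above can be compared scenario by scenario. The following is only a sketch: the regular expression, the helper parse_selftest(), and the variable names are my own assumptions for illustration, not anything lmi provides. The timings are copied from the two runs above.

```python
import re

# Matches selftest lines such as
#   "naic, no solve : 6.723e-02 s mean; 66980 us least of 15 runs"
# (pattern is an assumption based on the output shown in this message).
LINE = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*\S+ s mean;\s*(?P<least>\d+) us least of \d+ runs"
)

def parse_selftest(text):
    """Return {scenario name: least observed time in microseconds}."""
    result = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            result[m.group("name")] = int(m.group("least"))
    return result

# Output of the run inside the nested chroot (wine-5.0).
NESTED = """\
Test speed:
naic, no solve : 6.723e-02 s mean; 66980 us least of 15 runs
naic, specamt solve : 1.112e-01 s mean; 110724 us least of 9 runs
naic, ee prem solve : 1.028e-01 s mean; 102521 us least of 10 runs
finra, no solve : 3.332e-02 s mean; 33180 us least of 31 runs
finra, specamt solve: 7.371e-02 s mean; 73324 us least of 14 runs
finra, ee prem solve: 6.904e-02 s mean; 68846 us least of 15 runs
"""

# Output of the same fardel transplanted to the non-nested chroot (wine-4.0.3).
TRANSPLANTED = """\
Test speed:
naic, no solve : 1.047e-01 s mean; 103639 us least of 10 runs
naic, specamt solve : 1.850e-01 s mean; 183295 us least of 6 runs
naic, ee prem solve : 1.675e-01 s mean; 166175 us least of 6 runs
finra, no solve : 2.705e-02 s mean; 26650 us least of 37 runs
finra, specamt solve: 9.838e-02 s mean; 97834 us least of 11 runs
finra, ee prem solve: 9.079e-02 s mean; 89717 us least of 12 runs
"""

nested = parse_selftest(NESTED)
transplanted = parse_selftest(TRANSPLANTED)

# Ratio above 1 means the transplanted run was slower than the nested run.
ratios = {name: transplanted[name] / nested[name] for name in nested}

for name, ratio in ratios.items():
    print(f"{name:22s} {nested[name]:7d} {transplanted[name]:7d} {ratio:5.2f}")
```

With these numbers, 'finra, no solve' is the only scenario whose ratio falls below 1 (about 0.80); the third scenario comes out near 1.62.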
*** REPEAT EVERYTHING WITHOUT UNUSUAL FLAGS
* NESTED: debian bullseye chroot within centos chroot on debian buster host
/opt/lmi/src/lmi[0]$git checkout -- workhorse.make
/opt/lmi/src/lmi[0]$git --no-pager diff
/opt/lmi/src/lmi[0]$env |grep LMI_
LMI_COMPILER=gcc
LMI_TRIPLET=i686-w64-mingw32
/opt/lmi/src/lmi[0]$make clean
rm --force --recursive /opt/lmi/gcc_i686-w64-mingw32/build/ship
/opt/lmi/src/lmi[0]$time make $coefficiency --output-sync=recurse install check_physical_closure 2>&1 | tee eraseme | less -SN
make $coefficiency --output-sync=recurse install check_physical_closure 2>&1 1609.64s user 77.79s system 1566% cpu 1:47.73 total
tee eraseme 0.00s user 0.01s system 0% cpu 1:47.78 total
less -SN 0.04s user 0.00s system 0% cpu 1:53.84 total
/opt/lmi/src/lmi[0]$file /opt/lmi/bin/lmi_cli_shared.exe
/opt/lmi/bin/lmi_cli_shared.exe: PE32 executable (console) Intel 80386, for MS Windows
/opt/lmi/src/lmi[0]$ls -l /opt/lmi/bin/lmi_cli_shared.exe
-rwxrwxr-x 1 greg lmi 1816244 Sep 20 08:56 /opt/lmi/bin/lmi_cli_shared.exe
/opt/lmi/src/lmi[0]$date
Sun Sep 20 08:57:46 UTC 2020
/opt/lmi/src/lmi[0]$wine /opt/lmi/bin/lmi_cli_shared.exe --accept --data_path=/opt/lmi/data --selftest
Test speed:
naic, no solve : 6.714e-02 s mean; 66724 us least of 15 runs
naic, specamt solve : 1.106e-01 s mean; 110242 us least of 10 runs
naic, ee prem solve : 1.025e-01 s mean; 102042 us least of 10 runs
finra, no solve : 3.351e-02 s mean; 33213 us least of 30 runs
finra, specamt solve: 7.334e-02 s mean; 72956 us least of 14 runs
finra, ee prem solve: 6.888e-02 s mean; 68256 us least of 15 runs
/opt/lmi/src/lmi[0]$make fardel
Generating product files.
All product files written.
Created 'lmi-20200920T0858Z' archive in '/opt/lmi/fardels'.
/opt/lmi/src/lmi[0]$wine --version
wine-5.0 (Debian 5.0-4)
* TRANSPLANT: command run on host
$mv /srv/chroot/centos7lmi/srv/chroot/lmi_bullseye_3/opt/lmi/fardels/lmi-20200920T0858Z /srv/chroot/bullseye0/tmp
* NOT NESTED: debian bullseye chroot on debian buster host
/tmp/lmi-20200920T0840Z[0]$cd ../lmi-20200920T0858Z
/tmp/lmi-20200920T0858Z[0]$wine /opt/lmi/bin/lmi_cli_shared.exe --accept --data_path=/opt/lmi/data --selftest
Test speed:
naic, no solve : 1.044e-01 s mean; 103404 us least of 10 runs
naic, specamt solve : 1.837e-01 s mean; 182003 us least of 6 runs
naic, ee prem solve : 1.669e-01 s mean; 166075 us least of 6 runs
finra, no solve : 2.708e-02 s mean; 26698 us least of 37 runs
finra, specamt solve: 9.852e-02 s mean; 97226 us least of 11 runs
finra, ee prem solve: 9.022e-02 s mean; 89106 us least of 12 runs
/tmp/lmi-20200920T0858Z[128]$wine --version
wine-4.0.3 (Debian 4.0.3-1)
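As a cross-check on the claim that the compiler flags are not the cause, the "least of N runs" timings from all four runs above can be compared directly. This is a minimal sketch. The list names and the helper max_rel_diff() are hypothetical, chosen only for this illustration, and the numbers are copied from the transcripts in this message.

```python
# Scenario order matches the selftest output.
SCENARIOS = [
    "naic, no solve", "naic, specamt solve", "naic, ee prem solve",
    "finra, no solve", "finra, specamt solve", "finra, ee prem solve",
]

# Least observed times in microseconds, copied from the four runs above.
nested_flags = [66980, 110724, 102521, 33180, 73324, 68846]   # nested, -O3 -march=native
nested_plain = [66724, 110242, 102042, 33213, 72956, 68256]   # nested, default flags
direct_flags = [103639, 183295, 166175, 26650, 97834, 89717]  # not nested, -O3 -march=native
direct_plain = [103404, 182003, 166075, 26698, 97226, 89106]  # not nested, default flags

def max_rel_diff(a, b):
    """Largest relative difference |a_i - b_i| / b_i over all scenarios."""
    return max(abs(x - y) / y for x, y in zip(a, b))

# Effect of the flags within each environment.
flags_effect_nested = max_rel_diff(nested_flags, nested_plain)
flags_effect_direct = max_rel_diff(direct_flags, direct_plain)

# Effect of the environment with the flags held fixed.
env_effect_flags = max_rel_diff(direct_flags, nested_flags)
env_effect_plain = max_rel_diff(direct_plain, nested_plain)

print(f"flags, nested chroot:     {flags_effect_nested:.2%}")
print(f"flags, non-nested chroot: {flags_effect_direct:.2%}")
print(f"environment, with flags:  {env_effect_flags:.2%}")
print(f"environment, default:     {env_effect_plain:.2%}")
```

On these figures the flags change no scenario by as much as 1% in either chroot, while moving the same binary between chroots changes some scenarios by more than 50%, which fits the conclusion that the difference lies in the environment rather than in the build.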