Re: [lmi] gcc -flto


From: Greg Chicares
Subject: Re: [lmi] gcc -flto
Date: Sat, 24 Dec 2016 22:41:02 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.4.0

On 2016-12-24 18:37, Vadim Zeitlin wrote:
> On Sat, 24 Dec 2016 14:14:31 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> On 2016-12-19 15:10, Vadim Zeitlin wrote:
> GC> [...]
> GC> >  BTW, another thing that I thought about while discussing this: the -flto
> GC> > option also came up and I wrote that it indeed allowed the compiler to
> GC> > compute the result at compile-time in this simple example, but that this
> GC> > wouldn't work in the real program. However now I'm not so sure: if you're
> GC> > using pow() just to build the cache of the powers of 10 for not too many
> GC> > exponents, wouldn't gcc be indeed smart enough to precompute all of them at
> GC> > compile-time? Of course, lmi doesn't use LTO currently, but perhaps it
> GC> > could be worth testing turning it on and checking how it affects the
> GC> > performance? We can clearly see that it allows for impressive optimizations
> GC> > in simple examples and while nothing guarantees that it would be also the
> GC> > case in real code, it might be worth trying it out.
> GC> 
> GC> It seemed simple enough to try.
> ... snip ...
> GC> and without any "$coefficiency" parallelism, with '-flto' we get:
> GC> 
> GC> /opt/lmi/src/lmi[0]$time make system_test 
> GC> System test:
> GC> make system_test  119.40s user 17.83s system 93% cpu 2:27.30 total
> GC> 
> GC> while without '-flto' it's:
> GC> 
> GC> /opt/lmi/src/lmi[0]$time make system_test
> GC> System test:
> GC> make system_test  120.00s user 17.86s system 93% cpu 2:28.21 total
> GC> 
> GC> Improvement: (148.21 - 147.30) / 148.21 = six tenths of a percent, which
> GC> doesn't justify significantly slower builds and giving up '-ggdb'. Alas:
> GC> I really hoped to put those idle cores to good use when linking.
> 
>  Yes, thanks for doing this but the results are very underwhelming. Being
> optimistic, this could indicate that lmi code is already modularized so
> well that there is nothing to be gained by using LTO.
> 
>  If you're experimenting with these options, I wonder if it might be useful
> to build with -fprofile-generate and then use "make system_test" to
> generate the data to be used with -fprofile-use. Could this perhaps give
> some at least slightly more exciting results?
> 
>  Probably not, but who knows...

I haven't yet reverted the experimental makefile changes from earlier in
this thread, so it's easy to try this.

/opt/lmi/src/lmi[0]$make clean
rm --force --recursive /opt/lmi/src/lmi/../build/lmi/Linux/gcc/ship
/opt/lmi/src/lmi[0]$make debug_flag= gprof_flag="-fprofile-generate" $coefficiency install check_physical_closure >../log 2>&1
/opt/lmi/src/lmi[0]$make system_test 
System test:

Now I manually remove everything in the build directory except the
116 '.gcda' files that total 2.7 MB (a find one-liner for this is sketched
after the timings below), and...

/opt/lmi/src/lmi[0]$time make gprof_flag="-fprofile-use" $coefficiency install check_physical_closure >../log 2>&1
make gprof_flag="-fprofile-use" $coefficiency install check_physical_closure  1165.13s user 60.12s system 2309% cpu 53.050 total
/opt/lmi/src/lmi[0]$time make system_test
System test:
make system_test  107.10s user 16.91s system 92% cpu 2:14.13 total
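
Incidentally, that manual pruning could presumably be done with a single GNU
find command, along these lines (untested; the path is the build directory
shown in the 'make clean' output above):

  # delete every regular file that is not '.gcda' profile data
  find /opt/lmi/src/lmi/../build/lmi/Linux/gcc/ship -type f ! -name '*.gcda' -delete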

Compared to the result above without any novel optimization:
> GC> make system_test  120.00s user 17.86s system 93% cpu 2:28.21 total

(148.21 - 134.13) / 148.21 = 9.5% faster

That seems worthwhile. I'll try to work out a way to use this for
regular distribution.
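
A first cut might simply string the steps above together in a small wrapper
script, along these lines (a sketch only, untested; it assumes $coefficiency
is exported in the environment as usual, e.g. as '--jobs=4', and reuses the
make invocations and build directory from the transcript above):

  #!/bin/sh
  # Two-pass PGO build; run from /opt/lmi/src/lmi with $coefficiency exported.
  # Pass 1: build instrumented binaries and exercise them to produce profile data.
  make clean
  make debug_flag= gprof_flag="-fprofile-generate" $coefficiency install check_physical_closure >../log 2>&1
  make system_test
  # Prune the instrumented build, keeping only the '.gcda' profile data.
  find /opt/lmi/src/lmi/../build/lmi/Linux/gcc/ship -type f ! -name '*.gcda' -delete
  # Pass 2: rebuild using the collected profiles, then rerun the system test.
  make gprof_flag="-fprofile-use" $coefficiency install check_physical_closure >../log 2>&1
  make system_test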



