Re: [lmi] gcc -flto


From: Vadim Zeitlin
Subject: Re: [lmi] gcc -flto
Date: Sat, 24 Dec 2016 19:37:02 +0100

On Sat, 24 Dec 2016 14:14:31 +0000 Greg Chicares <address@hidden> wrote:

GC> On 2016-12-19 15:10, Vadim Zeitlin wrote:
GC> [...]
GC> >  BTW, another thing that I thought about while discussing this: the -flto
GC> > option also came up and I wrote that it indeed allowed the compiler to
GC> > compute the result at compile-time in this simple example, but that this
GC> > wouldn't work in the real program. However now I'm not so sure: if you're
GC> > using pow() just to build the cache of the powers of 10 for not too many
GC> > exponents, wouldn't gcc be indeed smart enough to precompute all of them at
GC> > compile-time? Of course, lmi doesn't use LTO currently, but perhaps it
GC> > could be worth testing turning it on and checking how it affects the
GC> > performance? We can clearly see that it allows for impressive optimizations
GC> > in simple examples and while nothing guarantees that it would be also the
GC> > case in real code, it might be worth trying it out.
GC> 
GC> It seemed simple enough to try.
... snip ...
GC> and without any "$coefficiency" parallelism, with '-flto' we get:
GC> 
GC> /opt/lmi/src/lmi[0]$time make system_test 
GC> System test:
GC> make system_test  119.40s user 17.83s system 93% cpu 2:27.30 total
GC> 
GC> while without '-flto' it's:
GC> 
GC> /opt/lmi/src/lmi[0]$time make system_test
GC> System test:
GC> make system_test  120.00s user 17.86s system 93% cpu 2:28.21 total
GC> 
GC> Improvement: (148.21 - 147.30) / 148.21 = six tenths of a percent, which
GC> doesn't justify significantly slower builds and giving up '-ggdb'. Alas:
GC> I really hoped to put those idle cores to good use when linking.
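
The improvement figure quoted above can be rechecked from the two wall-clock
totals. This is only a small sketch: the helper names to_seconds and
improvement_pct are made up for illustration and are not part of lmi, and
the only inputs are the two "total" values reported by time.

```shell
#!/bin/sh
# Recompute the relative improvement from the two elapsed times quoted above.
# to_seconds and improvement_pct are hypothetical helpers, not lmi tools.

# Convert an "M:SS.ss" elapsed time into seconds.
to_seconds() {
    echo "$1" | awk -F: '{ printf "%.2f\n", $1 * 60 + $2 }'
}

# Relative improvement, in percent, of "new" over "base" (both in seconds).
improvement_pct() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.2f\n", (base - new) / base * 100 }'
}

without_lto=$(to_seconds "2:28.21")   # 148.21 seconds
with_lto=$(to_seconds "2:27.30")      # 147.30 seconds

echo "improvement: $(improvement_pct "$without_lto" "$with_lto")%"
# prints "improvement: 0.61%"
```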

 Yes, thanks for doing this, but the results are very underwhelming. Being
optimistic, this could indicate that the lmi code is already modularized so
well that there is nothing to be gained by using LTO.

 If you're experimenting with these options, I wonder if it might be useful
to build with -fprofile-generate and then run "make system_test" to
generate the profile data to be used by a second build with -fprofile-use.
Could this perhaps give at least slightly more exciting results?

 Probably not, but who knows...
VZ
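
For reference, the two-pass -fprofile-generate / -fprofile-use workflow
suggested above could look roughly like this on a toy program. This is a
sketch under assumptions, not the lmi build: demo.c, pgo_flags and
run_pgo_demo are hypothetical names, the compiler is taken from $CC (gcc by
default), and running the toy binary stands in for "make system_test" as
the training workload.

```shell
#!/bin/sh
# Sketch of a two-pass profile-guided build on a toy program.
# demo.c, pgo_flags and run_pgo_demo are hypothetical; none of this is lmi.

# Print the extra compiler flag needed for each pass.
pgo_flags() {
    case "$1" in
        generate) echo "-fprofile-generate" ;;
        use)      echo "-fprofile-use" ;;
        *)        echo "usage: pgo_flags generate|use" >&2; return 1 ;;
    esac
}

# Build, train and rebuild a toy program in a temporary directory.
# The body runs in a subshell so that cd and trap do not leak out.
run_pgo_demo() (
    cc=${CC:-gcc}
    workdir=$(mktemp -d) || exit 1
    trap 'rm -rf "$workdir"' EXIT
    cd "$workdir" || exit 1

    cat > demo.c <<'EOF'
#include <stdio.h>
int main(void)
{
    double sum = 0.0;
    for (int i = 0; i < 1000000; ++i)
        sum += (i % 3 == 0) ? 1.5 : 0.5;
    printf("%.1f\n", sum);
    return 0;
}
EOF

    # Pass 1: instrumented build, then a training run that writes profile data.
    "$cc" -O2 $(pgo_flags generate) demo.c -o demo || exit 1
    ./demo > /dev/null || exit 1

    # Pass 2: rebuild from the same directory so the profile data is found.
    "$cc" -O2 $(pgo_flags use) demo.c -o demo || exit 1
    ./demo
)

if command -v "${CC:-gcc}" > /dev/null 2>&1; then
    run_pgo_demo || echo "PGO demo failed" >&2
else
    echo "no C compiler found; skipping demo" >&2
fi
```

In a real project the same flag would normally have to be passed at both
compile and link time in the first pass, and the training run should
resemble the workload whose performance matters.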

