m4-patches

performance of m4-1.9a (was: popdef(undefined), __m4_version__)

From:	Ralf Wildenhues
Subject:	performance of m4-1.9a (was: popdef(undefined), __m4_version__)
Date:	Mon, 11 Aug 2008 21:44:17 +0200
User-agent:	Mutt/1.5.18 (2008-05-17)

Hello again, and apologies for breaking the threading,

Ralf Wildenhues writes:
> Eric Blake writes:
> > 
> > ~                    autoconf 2.62.45-65ff0  +[a]    +[b]    +[a] and [b]
> > m4-1.4.11            20.171                  19.858  20.858  19.357
> > patched branch-1.6   21.294                  20.154  22.202  20.250
> > argv_ref branch      19.013                  17.750  19.219  17.266
> > master branch        29.966                  28.948  32.417  27.904
> > 
> > The differences in base compile times are due somewhat to differing
> > compiler options, but it is also a bit depressing that the master branch
> > is 50% slower than 1.6.  Someday I'd like to profile that, and figure out
> > why the discrepancy.
> 
> Position-independent code.  I haven't measured, but would bet on it.

I've done a wee bit of measuring now.  Time for running autoconf in OpenMPI
is 15s with branch-1_4 and branch-1.6, 27s with master, and 23s when master
is configured --disable-shared.

Then, a gprof comparison between 1.6 and master shows that another significant
part of the slowdown comes from master doing an indirect function call for
every character in next_char.  Can't the module interface work on larger units
than a single character, like reading a whole token or so?  I mean, we're
talking about roughly 140M function calls here.
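
To make the per-character cost concrete, here is a minimal sketch of the two
interface shapes.  Everything in it (input_source, next_char, peek_span,
consume, word_len_*) is a hypothetical name invented for illustration; it is
not the real m4 module interface.  It only counts how many indirect calls each
style needs to scan one word.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical input source; NOT the actual m4 module interface. */
typedef struct input_source input_source;
struct input_source {
    const char *data;     /* unread bytes */
    size_t len;           /* number of unread bytes */
    unsigned long calls;  /* indirect calls made so far */
    /* Per-character style: one indirect call per byte. */
    int (*next_char)(input_source *src);
    /* Bulk style: one call exposes all buffered bytes, one consumes. */
    const char *(*peek_span)(input_source *src, size_t *avail);
    void (*consume)(input_source *src, size_t n);
};

static int string_next_char(input_source *src)
{
    src->calls++;
    if (src->len == 0)
        return -1;                      /* end of input */
    src->len--;
    return (unsigned char) *src->data++;
}

static const char *string_peek_span(input_source *src, size_t *avail)
{
    src->calls++;
    *avail = src->len;
    return src->data;
}

static void string_consume(input_source *src, size_t n)
{
    src->calls++;
    assert(n <= src->len);
    src->data += n;
    src->len -= n;
}

static input_source string_source(const char *s)
{
    input_source src = { s, strlen(s), 0,
                         string_next_char, string_peek_span, string_consume };
    return src;
}

/* Length of the leading word, one indirect call per character. */
static size_t word_len_per_char(input_source *src)
{
    size_t n = 0;
    int c;
    while ((c = src->next_char(src)) != -1 && c != ' ')
        n++;
    return n;
}

/* Same result, scanning the exposed buffer with a plain loop. */
static size_t word_len_per_span(input_source *src)
{
    size_t avail, n = 0;
    const char *p = src->peek_span(src, &avail);
    while (n < avail && p[n] != ' ')
        n++;
    src->consume(src, n);
    return n;
}
```

In this sketch the per-character variant makes one indirect call per byte
scanned, while the span variant makes two per word regardless of length.
Whether a span-style hook fits the real input stack is a separate question
that the sketch does not address.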

Then, I saw that debug stuff like m4_set_current_{file,line} was called veeery
often (more than once per character).  Rebuilding optimized with -DNDEBUG got
master to 18s (with --disable-shared).
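
One way -DNDEBUG can remove this kind of overhead is when accessors are
checked out-of-line functions in debug builds and collapse to plain macros
otherwise.  The sketch below illustrates that pattern only; the context type
and function names are hypothetical, and it is an assumption, not something
measured here, that master's m4_set_current_{file,line} are built this way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context; only loosely modelled on the accessor names
   mentioned in this message. */
typedef struct {
    const char *current_file;
    int current_line;
} context;

/* Debug flavour: a real function call with sanity checks. */
void set_current_line_checked(context *ctx, int line)
{
    assert(ctx != NULL);
    assert(line >= 0);
    ctx->current_line = line;
}

#ifdef NDEBUG
/* Optimized flavour: a plain store, no call and no checks. */
# define set_current_line(C, V) ((void) ((C)->current_line = (V)))
#else
# define set_current_line(C, V) set_current_line_checked((C), (V))
#endif

/* Walk the text and update the line number on every newline, the kind of
   bookkeeping a scanner does in its inner loop.  Returns the final line. */
int track_lines(context *ctx, const char *text)
{
    int line = 1;
    set_current_line(ctx, line);
    for (; *text; text++)
        if (*text == '\n')
            set_current_line(ctx, ++line);
    return ctx->current_line;
}
```

Both flavours give the same result; only the cost per update differs, which
matters when the update sits on a path executed once per character or more.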

The gprof output files seem to indicate that next_char is called much more
often from m4__next_token in master than next_char_1 is from next_token in
branch-1.6.  However, gcov output does not confirm this, so I guess this is
an artifact of finite sampling density (and of how much faster next_char_1
is), or of inlining.

All tests done with -O2 -g.
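
For reference, the wall-clock figures quoted in this message can be set
against the 15s baseline of branch-1_4 and branch-1.6.  This is only
arithmetic on the rounded numbers above, not a new measurement:

```latex
\begin{align*}
\text{master, shared} &: \frac{27\,\mathrm{s}}{15\,\mathrm{s}} = 1.80 \\
\text{master, \texttt{--disable-shared}} &: \frac{23\,\mathrm{s}}{15\,\mathrm{s}} \approx 1.53 \\
\text{master, \texttt{--disable-shared}, \texttt{-DNDEBUG}} &: \frac{18\,\mathrm{s}}{15\,\mathrm{s}} = 1.20
\end{align*}
```

Of the 12s gap between master and the baseline, 4s goes away with the static
build and another 5s with -DNDEBUG, leaving about 3s unaccounted for by those
two changes.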

Cheers,
Ralf

profile-m4-branch-1.6.gz
Description: branch-1.6 profile

profile-m4-master-nonshared.gz
Description: master profile --disable-shared

profile-m4-master-nonshared-NDEBUG.gz
Description: master profile NDEBUG --disable-shared

input.c.gcov.gz
Description: branch-1.6 input.c gcov

input.c.gcov.gz
Description: master input.c gcov
