m4-patches

performance of m4-1.9a (was: popdef(undefined), __m4_version__)


From: Ralf Wildenhues
Subject: performance of m4-1.9a (was: popdef(undefined), __m4_version__)
Date: Mon, 11 Aug 2008 21:44:17 +0200
User-agent: Mutt/1.5.18 (2008-05-17)

Hello again, and apologies for breaking the threading,

Ralf Wildenhues writes:
> Eric Blake writes:
> > 
> > ~                   autoconf 2.62.45-65ff0  +[a]    +[b]    +[a] and [b]
> > m4-1.4.11           20.171                  19.858  20.858  19.357
> > patched branch-1.6  21.294                  20.154  22.202  20.250
> > argv_ref branch     19.013                  17.750  19.219  17.266
> > master branch       29.966                  28.948  32.417  27.904
> > 
> > The differences in base compile times are due somewhat to differing
> > compiler options, but it is also a bit depressing that the master branch
> > is 50% slower than 1.6.  Someday I'd like to profile that, and figure out
> > why the discrepancy.
> 
> Position-independent code.  I haven't measured, but would bet on it.

I've done a wee bit of measuring now.  Time for running autoconf in OpenMPI
is 15s with branch-1_4 and branch-1.6, 27s with master, and 23s when master
is configured --disable-shared.

Then, a gprof comparison between 1.6 and master shows that another significant
part of the slowdown is that master has to do an indirect function call for
every character in next_char.  Can't the module interface use a larger unit
than a single character, like reading a whole token or so?  I mean, we're
talking about roughly 140M function calls here.
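
To make the granularity point concrete, here is a minimal sketch in C.  It is
not m4's actual module interface: the struct input_source and the read_char
and read_span callbacks are hypothetical names invented for illustration.  It
contrasts one indirect call per character with one indirect call per run of
characters, and counts the calls in each case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical input source; NOT the real m4 input or module API. */
struct input_source {
    const char *buf;
    size_t pos, len;
    /* Fine-grained callback: returns one character, or -1 at end. */
    int (*read_char)(struct input_source *);
    /* Coarse callback: hands back all remaining bytes in one call. */
    size_t (*read_span)(struct input_source *, const char **start);
    unsigned long calls;   /* number of indirect calls made so far */
};

int src_read_char(struct input_source *s)
{
    s->calls++;
    return s->pos < s->len ? (unsigned char) s->buf[s->pos++] : -1;
}

size_t src_read_span(struct input_source *s, const char **start)
{
    size_t n = s->len - s->pos;
    s->calls++;
    *start = s->buf + s->pos;
    s->pos = s->len;
    return n;
}

void source_init(struct input_source *s, const char *text)
{
    s->buf = text;
    s->pos = 0;
    s->len = strlen(text);
    s->read_char = src_read_char;
    s->read_span = src_read_span;
    s->calls = 0;
}

/* Count spaces with one indirect call per character (plus one for EOF). */
size_t count_spaces_per_char(struct input_source *s)
{
    size_t n = 0;
    int c;
    while ((c = s->read_char(s)) != -1)
        if (c == ' ')
            n++;
    return n;
}

/* Count spaces with one indirect call per span; the inner loop is a
   plain pointer walk with no function calls at all. */
size_t count_spaces_per_span(struct input_source *s)
{
    size_t n = 0, len, i;
    const char *p;
    while ((len = s->read_span(s, &p)) > 0)
        for (i = 0; i < len; i++)
            if (p[i] == ' ')
                n++;
    return n;
}
```

Both functions return the same count, but the per-character variant makes a
number of indirect calls proportional to the input size, while the span
variant makes a number proportional to the number of spans.  A real scanner
would of course need the span boundaries to respect things like quote and
comment delimiters, which this sketch ignores.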

Then, I saw that debug stuff like m4_set_current_{file,line} was called veeery
often (more than once per character).  Rebuilding optimized with -DNDEBUG got
master to 18s (with --disable-shared).
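
For readers unfamiliar with why -DNDEBUG can matter this much, here is a
sketch of one common C pattern, written from scratch rather than copied from
the m4 sources.  The names struct context, ctx_set_current_line and
scan_lines are hypothetical.  The accessor is a real out-of-line function in
debug builds and collapses to a plain field store when NDEBUG is defined.

```c
#include <stddef.h>

/* Hypothetical context struct; names are illustrative only. */
struct context {
    const char *current_file;
    int current_line;
};

/* Out-of-line accessor: every use is a real call in debug builds. */
int ctx_set_current_line(struct context *ctx, int line)
{
    return ctx->current_line = line;
}

#ifdef NDEBUG
/* With -DNDEBUG the accessor becomes a direct field store, so
   per-character bookkeeping no longer costs a function call. */
# define ctx_set_current_line(C, L) ((C)->current_line = (L))
#endif

/* Track the line number while walking the text, updating the context
   on every character the way a scanner might. */
int scan_lines(struct context *ctx, const char *text)
{
    const char *p;
    ctx_set_current_line(ctx, 1);
    for (p = text; *p; p++)
        ctx_set_current_line(ctx, ctx->current_line + (*p == '\n'));
    return ctx->current_line;
}
```

The result is the same either way; only the cost per update changes, which
adds up when the update happens at least once per input character.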

The gprof output files seem to indicate that next_char is called much more
often from m4__next_token in master than next_char_1 is from next_token in
branch-1.6.  However, gcov output does not confirm this, so I guess this is
an artifact of finite sampling density (and of how much faster next_char_1
is) or of inlining.

All tests done with -O2 -g.

Cheers,
Ralf

Attachment: profile-m4-branch-1.6.gz
Description: branch-1.6 profile

Attachment: profile-m4-master-nonshared.gz
Description: master profile --disable-shared

Attachment: profile-m4-master-nonshared-NDEBUG.gz
Description: master profile NDEBUG --disable-shared

Attachment: input.c.gcov.gz
Description: branch-1.6 input.c gcov

Attachment: input.c.gcov.gz
Description: master input.c gcov

