
From: Kyle Sallee
Subject: Re: Fwd: curious...
Date: Sat, 21 Oct 2006 10:52:17 -0700

Okay, the 26M trace was probably typical of almost any
of the libtool invocations during the long parts of compiling koffice.
I simply took one of the libtool invocations I found in the compile log
and ran it with the same shell as shown, /bin/sh,
but with -x added immediately after the shell
and &>/tmp/read.me appended at the very end.
I was surprised by the very large trace.
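To be clear, the invocation looked roughly like the following; the link
command itself is illustrative, not the actual one taken from the koffice log:

# Illustrative reconstruction only; the real command line came from the
# koffice compile log.
/bin/sh -x ./libtool --mode=link g++ -o libfoo.la foo.lo bar.lo -rpath /usr/lib &> /tmp/read.me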

The huge savings from bzip2 compressing the trace file
suggests repetitiveness that can be eliminated.
The 123K bzip2 compressed log is available for download at
http://ppr.aakin.net/files/.misc/libtool.trace.26m.txt.bz2

What I commented about first was the overall snowballing effect
as temp_deplibs is assigned repeatedly inside of "for; do" loops.
Secondly, I commented that the case statements which check whether
a word has already occurred in a string are very CPU
intensive in bash.
Thirdly, I commented on tests that are done repeatedly inside of loops
when the outcome of the test would be the same every time.
Lastly, I noticed 2432 lines containing /bin/sed.
That alone suggests that attempts to avoid forking on
complex libtool tasks actually yield huge amounts of forking.
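To illustrate the first two points, here is the kind of pattern I mean.
This is a sketch of the general shape, not code copied from ltmain.sh:

# Sketch of the pattern, not actual ltmain.sh code: for every deplib we scan
# the whole growing list with a case statement, then append to it, so both
# the matching work and the variable size snowball as the loop runs.
for deplib in $deplibs; do
  case " $temp_deplibs " in
  *" $deplib "*) ;;                              # word already seen, skip it
  *) temp_deplibs="$temp_deplibs $deplib" ;;     # otherwise snowball the list
  esac
done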

> On some platforms, such as cygwin,
> where you can only fork about 100 times
> per second, it is true in ALL instances.
> Sticking to shell builtins IS
> faster, unless the shell has buggy algorithms.

Only 100 times a second?
Then the invocations of /bin/sed alone during that
run of libtool would take 24 seconds (2432 forks at 100 per second)!

Here is my perspective.
The use of bash code to avoid forking to external utilities
is definitely optimal when handling
the simplest and least complex libtool requests.
In those fastest-case scenarios libtool operates as fast as possible.
I agree that this is good.

However, because libtool is optimized for top performance on the
least complex cases, it also has horrific performance on the more
complex and lengthy requests.  The disparity is horrible.
Where libtool execution should take less than a second, I am seeing
several seconds of near-full CPU usage by libtool, which is rather
high considering the large number of forks it also makes to external
programs like sed.

The way most of the libtool algorithms are designed permits
between zero and an unbounded number of forks during the execution of a loop.
The no-fork cases also execute very quickly, due in part
to the low number of iterations of the loops
and therefore very little snowballing of variables.
But when the iterations are many, the loops are slow,
the snowballing is heavy, and the forks become many.

So the current optimization is for the best-case scenario.
What that means is that between zero and an unbounded number of forks
happen, and between a little and a lot of variable snowballing happens.
What I suggest is that we take those points of code and nail them down.
Write them to use a constant three to five forks by using invocations
of shell utilities such as sort, uniq, grep, and sed.
Completely eliminate whittling and snowballing of variables.
That way there will be no 2400+ forks to sed.
Yes, optimizing for the worst-case scenario would make the optimal
small libtool tasks almost immeasurably slower.
However, it would make the complex and lengthy libtool tasks,
such as compiling koffice, run at nearly the same speed as if libtool
were operating on the simplest-case scenario.

Basically, optimizing for the worst-case scenario in a manner
that explicitly uses a specific and unvarying number of forks
will gain far more CPU savings on lengthy tasks
than it will cost in additional CPU usage on simple tasks.

Building the specific variable lists as it is done now can use
a lot of forks to sed...
Just look at the function func_... oh, the one that strips off the leading
-R from a variable... sorry, I forgot to add the name of that one to my notes.
Every time it executes it forks sed.
It also executes inside of a loop.
If nothing else, the variable list could be created first with
each member keeping its leading -R, and then a single invocation
of sed could strip the -R off of every item in the list at once.
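A minimal sketch of that idea, using a hypothetical dep_rpaths list rather
than the real libtool variable names:

# Hypothetical example: the list keeps its -R prefixes while it is built,
# then one sed pass strips them all, instead of one sed fork per item.
dep_rpaths="-R/usr/lib -R/opt/kde3/lib -R/usr/local/lib"
stripped=`echo "$dep_rpaths" | tr ' ' '\n' | sed -e 's/^-R//' | tr '\n' ' '`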

But more to the point, lists of specific variables can be created
quickly inside of a subshell with a small and specific number of forks...
something="` echo "$raw" | grep ... | sed ... `"
Yes, in that instance you do use 3 forks.
But no matter how long $raw is, you only
use 3 forks and there is no variable snowballing.
Or it could be done using just two forks by redirecting
the output to a file with >.
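For example, a sketch of the two-fork variant, with made-up names and an
assumed temporary file:

# Hypothetical sketch, assuming $raw already holds one entry per line as
# suggested earlier: grep and sed fork once each (echo is a shell builtin),
# and the result lands in a temporary file instead of a growing variable.
echo "$raw" | grep '^-L' | sed 's/^-L//' > /tmp/libtool.$$.Ldirs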

I understand how many may be against writing explicit code that
uses forks where slower-executing bash code could be used instead.
If only dealing with 2 to 5 iterations in a loop,
then the bash code might execute faster than 2 forks.
In theory that is good.
But in reality the trace log shows something different.
It shows more than 2000 invocations of sed alone,
much of which probably happens inside of loops.
It demonstrates that while the intention of the libtool authors
has been to minimize forking during the execution of
the libtool script, they have, to the contrary, created a script
that is super forky when used for compiling sources like koffice.

All I am saying is that if optimized for the worst-case scenario, then
the best-case scenario runs, what, 0.1 seconds slower, and the
worst-case scenario runs minutes faster?
If we are willing to concede a little, we stand to gain a lot.
libtool can be made to execute 2, perhaps 3, orders of magnitude
faster when compiling object files for koffice.
But to do so, variable snowballing must be eliminated and
the construction of variable lists must be written to use a
dedicated and decided number of forks instead of algorithms
that permit between zero and an unbounded number of forks.

Even on platforms that permit merely 100 forks per second,
I would definitely rather spend the cost of between 2 and
5 forks per construction of a variable list than take the
gamble on between 0 and 100 forks.

Here is another way of looking at it.
Currently, when libtool executions last less than a second,
nobody notices; nobody thinks,
"wow, I wonder why libtool is taking so long."
Even if the fastest execution cases of libtool
became a little slower, still nobody would notice.
But when I am watching a compilation and seeing
libtool nearly peg the CPU for seconds at a time,
I am thinking to myself,
"Hmmm, I bet libtool could be made to run a lot faster."
And consequently, here I am on the list asking if the libtool
authors have tried this and that and considered this and that.
If those lengthy invocations of libtool could be eliminated
entirely, then I would not be here and nobody would ever have
cause to think that libtool is slow.

If the changes I propose are made, then libtool will not run
faster in all cases, nor will it run faster on all platforms.
But the huge disparity in resource usage and speed
between the very simple libtool invocations and the
very complex libtool invocations would be eliminated.
libtool would no longer run very fast in some cases
and yet very slow in other cases.
Overall, I would expect that most people who notice
the change would state, "Wow, libtool works so much faster now."
That is because they would not notice the insignificant slowdown
during the simple invocations of libtool.
However, they would definitely notice the speed gained
when libtool is run during compilation of complex sources such as koffice.
libtool improvements could reduce the compile time
for koffice to 1/2, perhaps 1/3, of what it is now.

If we were willing to abandon the simplicity of having just one way to do it,
then a more complex approach could be used: the number of parameters
libtool is invoked with is multiplied by some value
specific to the forking performance of the platform and then compared
to a threshold that decides whether the invocation of libtool should
use the historic, slow, bash-coded algorithms that attempt to avoid
forking, or the newer algorithms that create the variable lists but do
so with a low, static, and predictable cost in forking.
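A rough sketch of what I mean; the names and numbers here are invented
purely for illustration:

# All names and numbers below are hypothetical, purely to show the shape
# of the idea: scale the argument count by a per-platform fork-cost factor
# and compare it against a cutoff to pick an algorithm.
fork_cost=10        # hypothetical per-platform tuning value
cutoff=500          # hypothetical threshold
if test `expr $# \* $fork_cost` -gt $cutoff; then
  list_method=piped     # few, fixed forks through sort/uniq/grep/sed
else
  list_method=builtin   # traditional fork-avoiding shell loops
fi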

However, I do not expect it would be necessary to have it both ways.
I do not know how many major variable lists are constructed.
But if the construction of each one cost a static average of 3 forks,
then would that not be an acceptable cost in performance even on
platforms that can only fork 100 times per second?
If that is not convincing enough, think of running the 2000+
forks of /bin/sed on cygwin right now, just for a single invocation of libtool.
I would rather pay the average cost of 3 definite forks than gamble
on between 0 and 100.

However, the proposed change goes against the tradition.
The tradition is to use bash code, no matter how slow,
if it can be written in a way that accomplishes the task
with the possibility of not forking to external utilities.
However, those tasks could also be written with 2 to 3
intentional forks, eliminating the potential for
variable snowballing, slow case statements that hunt for duplicates,
and a series of forks to programs like /bin/sed inside of loops.
If it were perceived that the proposed changes, compared to staying
with tradition, would produce an overall beneficial change,
then the contributions would be accepted.
However, I would not expect that just yet.
If I submitted a patch that noticeably built variable lists
using backticks and pipes and obvious intentional forks,
then I would expect such a patch to be immediately thrown
out without ever testing to see how much faster it is when
used for compiling koffice, for example.
Someone would evaluate the patch and say,
"I see 3 forks here.  This is slower than the bash method.  Patch rejected."
But that obvious assessment of inefficiency would be incorrect
for the more complex tasks that are required
when compiling koffice.

That is why I have opened discussion about this,
if it is not already ongoing.
Obviously the libtool shell script has a certain
feel to it, a certain style, a certain obviousness
about how the authors believe tasks should be accomplished.
I am proposing an almost 100% opposite solution.
Therefore, I cannot simply submit such a contribution and
expect it to be accepted without sufficient debate and explanation
of why 3 definite forks are better than an unknown possibility ranging
between zero and an unbounded amount.

If we are not open to the possibility of libtool using more temporary files,
and obvious use of pipes and forks to external utilities in places
where complex bash "for do case esac done" constructs were used before,
then libtool is not going to be able to achieve fast and equivalent performance
on both simple and complex invocations.

If the use of forks is evil,
then I am proposing the lesser of two evils.
The way it is now, with some 2000+ invocations of /bin/sed,
I am supposing that the libtool script can be considerably evil
when invoked during koffice compilation.
If the suggested changes are made, then the minimal amount of
forking libtool does will increase, but the average amount
of forking will decrease, because it will not have forks
executing inside of for loops.  Also, the amount of forking
done by the most complex invocations of libtool, compared
to the least complex invocations of libtool, would probably
differ by only 5% to 10%.  Therefore, the performance of libtool
would be about the same in all circumstances.

Please consider it.
The changes to the libtool script are trivial enough that I need not be the one to make them.
They would make the libtool shell script smaller and less complex.
Instead of requiring several lines, maybe close to 20,
to build each variable list, the lists could be laid out like...

var1="` ... | ... | ... `"
var2="` ... | ... `"
var3="` ... | ... `"
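
As a purely made-up illustration of what one such line might look like
(not an actual libtool variable):

# Hypothetical only; the variable and flag names are invented for illustration.
rpath_dirs="` echo "$deplibs" | tr ' ' '\n' | grep '^-R' | sed 's/^-R//' | sort | uniq `"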

It would be very simple to understand for a
person who understands the syntax of sed, grep, sort, and uniq.
Overall it could make libtool easier to understand and hack,
because it could eliminate the whittling down of variable
lists where first the -R entries are extracted in one loop and then in a
following loop the -L entries are extracted.
If, for example, a list of the words that do not start with hyphens is desired,
then a single invocation of sed can eliminate all of the words that
begin with a hyphen.
That would be faster than whittling down long variable lists
and snowballing up new variable lists.
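A sketch of that single-pass filter, again with made-up variable names:

# Hypothetical example: one sed pass drops every word beginning with a hyphen,
# instead of rebuilding the list word by word in a shell loop.
plain_words=`echo "$deplibs" | tr ' ' '\n' | sed '/^-/d' | tr '\n' ' '`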

So, as I was saying, if we can gain consensus that an
intentional and specific minimal amount of forking for the
construction of variable lists in libtool is the lesser
of two evils, then we can get started on hacking the changes.
But if we cannot get past the feeling that intentional forking and piping of
commands goes against tradition in a bad way,
then there is no sense in hacking changes that will not be accepted
into libtool.

I do not pretend that I have presented an unbiased opinion.
So I welcome counter discussion, please.
Even for a gain of significant speed, I am not inclined to favor
my proposed changes without a clear consensus to break with tradition.
If the consensus does not desire more balanced libtool performance,
then I am all for abandoning the new ideas until consensus is achieved.

P.S.

I do not propose destabilizing libtool before the 2.0 release.
I expect the revisions I suggest, if implemented, would cause
such a large change to the libtool shell script that they would have
to be tested and postponed to the 3.0 release of libtool.
After all, I am suggesting changes that could yield between a
2 and 3 order-of-magnitude increase in speed, and possibly 4 orders
of magnitude on the most extremely complex libtool invocations/tasks.
Every one of those loops that whittles or snowballs variables would
be subject to change, and every one of those loops that contains
case statements, and so forth...
I am guessing that it would be a significant amount of change.



On 10/21/06, Ralf Wildenhues <address@hidden> wrote:
Hello Kyle,

* Kyle Sallee wrote on Sat, Oct 21, 2006 at 12:48:24AM CEST:
> There was a significant delay between when
> I sent a request to join the list and the confirmation email.

I think there was a server outage sometime in the last couple of days.
I've discarded your earlier, pending message now.

> I was looking at a 26M sh -x trace of an invocation of libtool
> while compiling koffice to see why so many CPU cycles were
> being spent on running the libtool shell script.

Please post the link command line that caused this.  If you have web
space, put up the --debug log somewhere, bzip2'ed.  Post the URL of the
package (or how to get it).  How long did the script take?

> Is it out of the question to keep lists separated by line feeds
> instead of spaces and then use the sort and uniq instead
> of case statements?

You cannot reorder the arguments in general.  $libs: No go.
That $special_deplibs may be reordered (at least I *think*
they may) is a rather subtle point.

> There are also some parts that look obviously slow...

Possibly moving a test from inside to outside may help things.
But have you measured that it helps?  Let's avoid optimization
that isn't relevant in practice (for at least some case), it's
not worth risking a regression.

> Does the use of backticks or $( subshells break portability of libtool?

Portability is explained quite well in the POSIX standard with most
additional restrictions described in the Shell Portability section
of the Autoconf 2.60 manual.  The list of allowed tools is in the GCS.

(Although there are some bits in ltmain of the form: "this comment was
added because otherwise shell xyz coredumps"[1].)

> However, some modification could probably yield
> several magnitudes of execution speed increase.

I posted some patches last year on the libtool-patches list (one issue
of which has since been applied) that would reduce complexity for
linking libjava.  They were not concerned with a large number of deplibs
though, rather a large number of objects.
http://lists.gnu.org/archive/html/libtool-patches/2005-05/msg00153.html

The ensuing discussion helps to show some obstacles.

> Would you like me to hack a little on libtool just to
> increase it's execution speed or are such changes
> trivial to accomplish now that I mentioned it?

Foremost we would like to release 2.0 eventually, and thus avoid any
destabilizing of the code.  Other than that, we welcome contributions,
of course.

Cheers,
Ralf

[1] speaking of which: some ksh variants dump core with CVS HEAD libtool
currently, but only in some occasions.  :-/




