[avr-libc-dev] -O3? -Os?


From: Joerg Wunsch
Subject: [avr-libc-dev] -O3? -Os?
Date: Mon, 16 Dec 2002 16:06:49 +0100
User-agent: Mutt/1.2.5i

I've always been curious what we actually gain by using -O3 for the
`larger' AVR devices when compiling the library.  So I finally wrote a
test case and ran it on an ATmega128.  To create a test job that would
benefit as much as possible from any speed improvement made inside
avr-libc, I decided that sorting strings would serve well: it calls
library functions that are optimizable and that take a bit of CPU time
to execute (qsort()).  I used qsort() to sort an array of strings
(boldly borrowed the first lines from the famous "Bastard Operator
from Hell" for it :), once using the normal strcmp() function, and
once using a function that effectively sorts the array by string
length.
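
For readers without the attachment, the two sort variants can be
sketched roughly as follows.  This is not the attached test.c: the
names cmp_by_strcmp, cmp_by_strlen and sort_lines are made up for
illustration (the timing output only shows the label "strlencmp"), the
sample strings are placeholders rather than the BOFH lines, and the
timer code used on the ATmega128 is left out entirely.

```c
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort(): plain lexicographic order via strcmp().
 * qsort() hands us pointers to the array elements, i.e. char ** here. */
static int cmp_by_strcmp(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Comparator for qsort(): order by string length only.
 * Hypothetical stand-in for the "strlencmp" variant. */
static int cmp_by_strlen(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    size_t la = strlen(*sa);
    size_t lb = strlen(*sb);
    return (la > lb) - (la < lb);   /* -1, 0 or 1 without overflow */
}

/* Placeholder data, not the strings used in the original test. */
static const char *lines[] = { "pear", "fig", "banana", "kiwi" };
#define NLINES (sizeof lines / sizeof lines[0])

/* Sort the array in place with the given comparator. */
static void sort_lines(int (*cmp)(const void *, const void *))
{
    qsort(lines, NLINES, sizeof lines[0], cmp);
}
```

Calling sort_lines(cmp_by_strcmp) and then sort_lines(cmp_by_strlen)
corresponds to the two timed runs; the length-based comparator calls
strlen() twice per comparison, which is one plausible reason that run
takes so much longer.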

The resulting object file has been linked against a current avr-libc,
where the library was configured and compiled with different
optimization options (avrlib_opt_speed in configure.in).

Here are the results:

-O3:

% avr-size test.out
   text    data     bss     dec     hex filename
   6898    1980      10    8888    22b8 test.out

time for qsort(strcmp): 0.000903 seconds.
time for qsort(strlencmp): 0.019705 seconds.
done.

-mcall-prologues -Os:

% avr-size test.out
   text    data     bss     dec     hex filename
   6474    1980      10    8464    2110 test.out

time for qsort(strcmp): 0.000972 seconds.
time for qsort(strlencmp): 0.020069 seconds.
done.

-Os:

% avr-size test.out
   text    data     bss     dec     hex filename
   6618    1980      10    8608    21a0 test.out

time for qsort(strcmp): 0.000955 seconds.
time for qsort(strlencmp): 0.020069 seconds.
done.

-O2:

% avr-size test.out
   text    data     bss     dec     hex filename
   6666    1980      10    8656    21d0 test.out

time for qsort(strcmp): 0.000972 seconds.
time for qsort(strlencmp): 0.020069 seconds.


It's interesting to note that none of the flag combinations other than
-O3 makes any real difference in terms of speed, with
-mcall-prologues -Os (our default for the `small' AVR devices)
yielding the smallest code size.  (The difference between 955 µs and
972 µs is just a single timer tick, so take that with a grain of
salt.)

For -O3, the code size is ~ 6 % larger (even more bloat if you
consider that vfprintf() & Co. take up about 25 % of the text segment
and are unaffected by the global -O settings since they use private,
hand-crafted optimization flags).  The speed gain is between 2 and 6 %.
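
The percentages can be rechecked from the figures listed above with a
few lines of C.  This is only a sketch: the helper names percent_diff
and print_comparison are made up, and which build serves as the
baseline is an assumption (text size against -mcall-prologues -Os,
timings against plain -Os), since the message does not say.

```c
#include <stdio.h>

/* Relative change of `value` against `base`, in percent. */
static double percent_diff(double base, double value)
{
    return (value - base) / base * 100.0;
}

/* Print the comparisons, using numbers copied from the results above.
 * The choice of baseline build is an assumption, see the lead-in. */
static void print_comparison(void)
{
    /* text segment sizes in bytes, from the avr-size output */
    printf("text size, -O3 vs -mcall-prologues -Os: %+.1f %%\n",
           percent_diff(6474.0, 6898.0));

    /* run times in microseconds */
    printf("qsort(strcmp),    -O3 vs -Os: %+.1f %%\n",
           percent_diff(955.0, 903.0));
    printf("qsort(strlencmp), -O3 vs -Os: %+.1f %%\n",
           percent_diff(20069.0, 19705.0));
}
```

A negative result for the timings means -O3 was faster.  Picking a
different baseline build shifts the numbers somewhat, so the output is
best read as a rough cross-check of the figures quoted here.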


My vote would be to use -mcall-prologues -Os for any of our targets.

-- 
J"org Wunsch                                           Unix support engineer
address@hidden        http://www.interface-systems.de/~j/

Attachment: test.c
Description: Text document
