[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[avr-gcc-list] Re: Missed optimization or am *I* missing something?
From: |
David Brown |
Subject: |
[avr-gcc-list] Re: Missed optimization or am *I* missing something? |
Date: |
Thu, 23 Sep 2010 17:24:57 +0200 |
User-agent: |
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4 |
On 23/09/2010 15:57, Paulo Marques wrote:
Weddington, Eric wrote:
Hi Johannes,
Hi, Eric
My best guess is that this is a missed optimization. The compiler is
probably selecting different patterns based on the different code. It
recognizes the redundant adding of zero in the operations below when
WEIRD is 0 and rightly throws them out. When WEIRD is 1, those
operations have to be kept in, and so different patterns are selected,
which are either more optimal, or triggers different kinds of
optimizations to take place afterwards, hence producing better code even
though there are more operations. I noticed that the longer code
generates a lot more code with the pointer registers (Z and X in this case).
Yes, it seems that in the WEIRD=0 case, the compiler thinks the loop
gets simple enough to be unrolled. If you look at the long version,
there is no loop at all in it.
That's what is happening here.
If this is performance-critical code, it could be very much faster with
some "hand-optimised" C code. I don't like doing this sort of thing -
it's the compiler's job, not the programmer's. But compiler writers are
only human, and there are while the compiler could /theoretically/
generate better code, there is a limit to how much the compiler can
figure out in practice.
Normally it's best to access arrays as arrays, rather than manually
convert them into pointers. But we can get better code here by being a
little bit smarter about the array offset calculations. This is
especially true on the 8-bit avr, since gcc promotes all array offset
arithmetic to "int" at an early stage, making it particularly inefficient.
uint16_t FOO(void) {
uint16_t boundedBufferValueSum = 0;
uint8_t offset = DMA_CH0_TRFCNT;
uint16_t *p = &fooBoundedBuffer[DMA_CH0_TRFCNT];
for (uint8_t i = 0; i < 4; i++) {
boundedBufferValueSum += *p++;
if (++offset >= FOOBUFSIZE) {
p -= FOOBUFSIZE;
}
}
return boundedBufferValueSum;
}
I noticed too that the more optimal case below is still not the best
code: it looks like the index variable 'i' in your for loop is being
treated as a 16-bit value, when you explicitly declare it as an 8-bit
unsigned char. Fixing that could save another 3 instructions.
One thing I noticed a long time ago with avr-gcc (I haven't checked more
recent versions for this) is that splitting expressions into simpler
ones helps the compiler detect that it doesn't need to upgrade the
variables to 16 bit int's.
For instance, if you write the code as:
unsigned char tmp;
...
offset = DMA.CH0.TRFCNT;
offset -= WEIRD;
boundedBufferValueSum = 0;
for (i = 0; i< 4; i++) {
tmp = offset;
tmp += i;
tmp += WEIRD;
tmp %= FOOBUFSIZE;
boundedBufferValueSum += fooBoundedBuffer[tmp];
}
you'll probably get smaller code. Yes, it's ugly, but if that function
is in some hot path, it might be worth it...