(this is gcc-optimize-bug.txt)
I have this relatively straightforward implementation of a couple of pins'
worth of software PWM:
void pwmcycle(void)
{
    unsigned char pwm1, pwm2, pwm3, pwm4, pwm5, level_delay;
    unsigned char pwm_delay;

    getbright();
    pwm1 = bright1;
    pwm2 = bright2;
    pwm3 = bright3;
    pwm4 = bright4;
    pwm5 = bright5;
    led_all_on();
    for (pwm_delay = 128; pwm_delay != 0; pwm_delay--) {
        /*
         * Rather standard software PWM loop.
         */
        if (--pwm1 == 0) {
            led1_off();
        }
        if (--pwm2 == 0) {
            led2_off();
        }
        if (--pwm3 == 0) {
            led3_off();
        }
        if (--pwm4 == 0) {
            led4_off();
        }
        if (--pwm5 == 0) {
            led5_off();
        }
    }
}
When compiled with avr-gcc 4.6.2, it produces rather strange (but correct) code
for the loop:
/usr/local/CrossPack-AVR-20121207/bin/avr-gcc -c -mmcu=atmega8 -g -Os \
gcc-optimize-bug.c -save-temps=obj -o gcc-optimize-bug-Os.o
c: 00 d0 rcall .+0 ; 0xe <pwmcycle+0xe>
e: c0 91 00 00 lds r28, 0x0000 ;;pwm1
12: f0 90 00 00 lds r15, 0x0000 ;;pwm2
16: 00 91 00 00 lds r16, 0x0000 ;;pwm3
1a: 10 91 00 00 lds r17, 0x0000 ;;pwm4
1e: d0 91 00 00 lds r29, 0x0000 ;;pwm5
22: 00 d0 rcall .+0 ; 0x24 <pwmcycle+0x24>
24: 80 e8 ldi r24, 0x80 ; 128
26: e8 2e mov r14, r24
28: fc 1a sub r15, r28
2a: 0c 1b sub r16, r28
2c: 1c 1b sub r17, r28
2e: dc 1b sub r29, r28
30: c1 50 subi r28, 0x01 ; 1
32: 01 f4 brne .+0 ; 0x34 <pwmcycle+0x34>
34: 00 d0 rcall .+0 ; 0x36 <pwmcycle+0x36>
36: 8f 2d mov r24, r15
38: 8c 0f add r24, r28
3a: 01 f4 brne .+0 ; 0x3c <pwmcycle+0x3c>
3c: 00 d0 rcall .+0 ; 0x3e <pwmcycle+0x3e>
3e: 80 2f mov r24, r16
40: 8c 0f add r24, r28
42: 01 f4 brne .+0 ; 0x44 <pwmcycle+0x44>
44: 00 d0 rcall .+0 ; 0x46 <pwmcycle+0x46>
:
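Reading the listing back into C may make the transformation easier to see. The following is a host-side sketch, not the compiler's actual intermediate form: pwm2 through pwm5 are stored as offsets from pwm1 (the four `sub` instructions before the loop), only pwm1 is decremented, and each later test re-adds the offset (`mov`/`add`/`brne`). This resembles what the GCC documentation calls induction variable merging, but that attribution is an assumption and has not been confirmed from the pass dumps. The names `pwm_plain`, `pwm_relative`, `same_schedule`, and `off_at` are hypothetical; they record the iteration at which each LED would be switched off instead of touching real pins.

```c
/* Hypothetical host-side model of the two loop shapes. */
enum { CHANNELS = 5, STEPS = 128 };

/* Original form: every channel has its own down-counter. */
static void pwm_plain(const unsigned char bright[CHANNELS], int off_at[CHANNELS])
{
    unsigned char pwm[CHANNELS];
    unsigned char step;
    int ch;

    for (ch = 0; ch < CHANNELS; ch++) {
        pwm[ch] = bright[ch];
        off_at[ch] = -1;                /* -1: never switched off */
    }
    for (step = STEPS; step != 0; step--) {
        for (ch = 0; ch < CHANNELS; ch++) {
            if (--pwm[ch] == 0)
                off_at[ch] = STEPS - step;
        }
    }
}

/* Form suggested by the -Os listing: one counter plus fixed offsets. */
static void pwm_relative(const unsigned char bright[CHANNELS], int off_at[CHANNELS])
{
    unsigned char base = bright[0];     /* r28 in the listing */
    unsigned char delta[CHANNELS];      /* r15, r16, r17, r29 */
    unsigned char step;
    int ch;

    for (ch = 0; ch < CHANNELS; ch++) {
        delta[ch] = (unsigned char)(bright[ch] - base);   /* sub rN, r28 */
        off_at[ch] = -1;
    }
    for (step = STEPS; step != 0; step--) {
        base--;                                           /* subi r28, 1 */
        for (ch = 0; ch < CHANNELS; ch++) {
            /* mov r24, rN / add r24, r28 / brne (8-bit wraparound) */
            if ((unsigned char)(delta[ch] + base) == 0)
                off_at[ch] = STEPS - step;
        }
    }
}

/* Returns 1 when both forms switch every LED off at the same iteration. */
static int same_schedule(const unsigned char bright[CHANNELS])
{
    int a[CHANNELS], b[CHANNELS], ch;

    pwm_plain(bright, a);
    pwm_relative(bright, b);
    for (ch = 0; ch < CHANNELS; ch++) {
        if (a[ch] != b[ch])
            return 0;
    }
    return 1;
}
```

Calling `same_schedule` with a few brightness sets is a quick way to convince yourself that the odd-looking code is indeed equivalent. It also shows the cost: each test after the first needs an extra register copy and add.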
I guess this is some sort of loop optimization. I don't like that it's so
far removed from the original source, and it's also not very "good": each
test after the first needs a mov and an add where a single decrement would
do. I get more obvious, and significantly smaller and faster, code by turning
off -ftree-loop-optimize (which is turned ON by default starting at -O1):
/usr/local/CrossPack-AVR-20121207/bin/avr-gcc -c -mmcu=atmega8 -g -Os \
gcc-optimize-bug.c -fno-tree-loop-optimize -save-temps=obj \
-o gcc-optimize-bug-notree.o
c: 00 d0 rcall .+0 ; 0xe <pwmcycle+0xe>
e: e0 90 00 00 lds r14, 0x0000
12: f0 90 00 00 lds r15, 0x0000
16: 00 91 00 00 lds r16, 0x0000
1a: 10 91 00 00 lds r17, 0x0000
1e: d0 91 00 00 lds r29, 0x0000
22: 00 d0 rcall .+0 ; 0x24 <pwmcycle+0x24>
24: c0 e8 ldi r28, 0x80 ; 128
26: ea 94 dec r14
28: 01 f4 brne .+0 ; 0x2a <pwmcycle+0x2a>
2a: 00 d0 rcall .+0 ; 0x2c <pwmcycle+0x2c>
2c: fa 94 dec r15
2e: 01 f4 brne .+0 ; 0x30 <pwmcycle+0x30>
30: 00 d0 rcall .+0 ; 0x32 <pwmcycle+0x32>
32: 01 50 subi r16, 0x01 ; 1
34: 01 f4 brne .+0 ; 0x36 <pwmcycle+0x36>
36: 00 d0 rcall .+0 ; 0x38 <pwmcycle+0x38>
:
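If changing the flags for the whole file is undesirable, GCC's `optimize` function attribute might allow the same switch to be applied to one function. The sketch below assumes that `optimize("no-tree-loop-optimize")` has the same effect as `-fno-tree-loop-optimize` for that function; this has not been checked against avr-gcc 4.6.2, and the GCC documentation describes the attribute as intended mainly for debugging. The macro `NO_LOOP_OPT`, the two-channel function `pwmcycle_two`, and the `off_mask`/`led_off` stand-ins are hypothetical and only exist to keep the example self-contained.

```c
/* Assumption: only real GCC understands this attribute; elsewhere it is
   compiled out so the code still builds. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_OPT __attribute__((optimize("no-tree-loop-optimize")))
#else
#define NO_LOOP_OPT
#endif

/* Hypothetical stand-ins for the real globals and pin macros. */
static unsigned char bright1 = 10, bright2 = 20;
static unsigned char off_mask;      /* bit n set: LED n+1 was switched off */

static void led_off(unsigned char n)
{
    off_mask |= (unsigned char)(1u << n);
}

/* Two-channel version of the PWM loop, with the tree loop optimizations
   requested off for this function only. */
NO_LOOP_OPT
void pwmcycle_two(void)
{
    unsigned char pwm1 = bright1;
    unsigned char pwm2 = bright2;
    unsigned char pwm_delay;

    off_mask = 0;
    for (pwm_delay = 128; pwm_delay != 0; pwm_delay--) {
        if (--pwm1 == 0) {
            led_off(0);
        }
        if (--pwm2 == 0) {
            led_off(1);
        }
    }
}
```

Whether this reproduces the second listing would have to be confirmed by disassembling the result, as was done above with the command-line flag.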
I found
http://gcc.gnu.org/onlinedocs/gccint/Tree-SSA-passes.html, which describes
the optimizations done in tree-ssa-loop.c; I assume those are what this flag
controls. Some of them sound useful. But this also looks like a case where
high-level optimizations aimed at processors with vectorization
capabilities (?) make it difficult for the code generators for smaller
processors, with the usual instruction sets, to generate good code. Is there
anything that can be done? Can the vectorizing optimizations (if they turn
out to be guilty) be turned off for targets that don't have any vectorization
ability?
Full source, intermediate, object, and list files are on Google Docs:
https://docs.google.com/file/d/0B6dMB5dovDUZRlhzdlZWTk9mTWc/edit?usp=sharing
(FWIW, I get the same sort of non-optimal obfuscation using the msp430-gcc compiler,