Re: [avr-libc-dev] [bug #21623] boot.h: Use the "z" register constraint
From: Wouter van Gulik
Subject: Re: [avr-libc-dev] [bug #21623] boot.h: Use the "z" register constraint
Date: Wed, 21 Nov 2007 22:47:58 +0100
User-agent: Thunderbird 1.5.0.14pre (X11/20071023)
Shaun Jackman wrote:
> I have also noticed that a series of
>   p = buf; *p++; *p++; *p++;
> gets optimized to
>   buf[0]; buf[1]; buf[2];
> which may be faster on some architectures, but loading constants is
> quite expensive on the AVR.
Phew, I just tried this:
================================
extern unsigned char foo2(char *);

unsigned char bar2(char *p)
{
    unsigned char tmp;
    tmp = foo2(p++);
    tmp += foo2(p++);
    tmp += foo2(p++);
    tmp += foo2(p++);
    return tmp;
}
================================
Note: this was compiled with avr-gcc 4.2.2 using:
avr-gcc -Wall -Os -mmcu=atmega16 -dp -S
================================
bar2:
/* prologue: frame size=0 */
push r13
push r14
push r15
push r16
push r17
/* prologue end (size=5) */
movw r16,r24 ; 52 *movhi/1 [length = 1]
subi r16,lo8(-(1)) ; 11 *addhi3/4 [length = 2]
sbci r17,hi8(-(1))
call foo2 ; 13 call_value_insn/3 [length = 2]
mov r13,r24 ; 14 *movqi/1 [length = 1]
movw r14,r16 ; 53 *movhi/1 [length = 1]
sec ; 16 *addhi3/5 [length = 3]
adc r14,__zero_reg__
adc r15,__zero_reg__
movw r24,r16 ; 17 *movhi/1 [length = 1]
call foo2 ; 18 call_value_insn/3 [length = 2]
mov r17,r24 ; 19 *movqi/1 [length = 1]
movw r24,r14 ; 21 *movhi/1 [length = 1]
call foo2 ; 22 call_value_insn/3 [length = 2]
mov r16,r24 ; 23 *movqi/1 [length = 1]
movw r24,r14 ; 54 *movhi/1 [length = 1]
adiw r24,1 ; 26 *addhi3/2 [length = 1]
call foo2 ; 27 call_value_insn/3 [length = 2]
add r17,r13 ; 30 addqi3/1 [length = 1]
add r17,r16 ; 32 addqi3/1 [length = 1]
add r17,r24 ; 33 addqi3/1 [length = 1]
mov r24,r17 ; 41 zero_extendqihi2/2 [length = 2]
clr r25
/* epilogue: frame size=0 */
pop r17
pop r16
pop r15
pop r14
pop r13
ret
================================
What is going on here? I can imagine gcc not finding the optimal
(register-allocation-wise) pattern:
  movw r24, rtmp
  adiw r24, 1
  movw rtmp, r24
But now it keeps the pointer twice, in r17:r16 and in r15:r14! Why?!?
It gets it right for the last call, but there the ++ is useless. Note
that the functionally equivalent version using ++p compiles slightly
better.
It would of course be optimal to keep the pointer in Y; then each call
would simply be:
  adiw r28, 1
  movw r24, r28
  call foo2
  add r17, r24
  adiw r28, 1
  movw r24, r28
  call foo2
This saves both stack and code, but it is probably hard for gcc to
figure out, since r28:r29 is normally the frame pointer. So only when
the frame pointer is unused could gcc allocate Y to a 16-bit variable
or pointer (and in that case it is probably always the best choice).
Maybe this idea is worth a look?
> I don't know a terrible lot about GCC
> optimisations, but I suspect it would be related to the constant pool
> management, to realise that we already have a 2 in the constant pool,
> and we can best introduce a 3 to the constant pool by incrementing 2.
> This could also be the avr implementation not being open enough about
> the movhi insn for constants.
Since incrementing a 16-bit value is quite expensive in registers that
cannot take an immediate operand, loading the constant is actually not
such a bad idea. But then again, we are talking about r30:r31 here, and
adiw can increment that pair in a single instruction... never mind...
HTH,
Wouter