Re: [avr-libc-dev] [bug #21623] boot.h: Use the "z" register constraint
From: Shaun Jackman
Date: Wed, 21 Nov 2007 11:04:33 -0700
(cc'ing address@hidden)
On Nov 21, 2007 2:38 AM, Wouter van Gulik <address@hidden> wrote:
> Also consider the fuse bit get routine. This scheme gives more knowledge
> to the compiler; unfortunately, gcc fails to see that the loading of r31
> can be done once:
>
> using this:
>
> =========================================================================
> static inline uint8_t boot_lock_fuse_bits_new(uint16_t address)
> {
> uint8_t result;
> // make sure it's in the Z register, aka r30:r31
> register uint16_t adr asm("r30") = address;
>
> asm volatile(
> "sts %1, %2\n\t"
> "lpm %0, Z"
> : "=r" (result)
> : "i" (_SFR_MEM_ADDR(__SPM_REG)),
> "r" ((uint8_t)__BOOT_LOCK_BITS_SET),
> "z" (adr)
> : "r0"
> );
> return result;
> }
>
> uint8_t bar(void)
> {
> uint8_t temp;
> uint16_t adr = 0;
> temp = boot_lock_fuse_bits_new(adr++);
> temp += boot_lock_fuse_bits_new(adr++);
> temp += boot_lock_fuse_bits_new(adr++);
> temp += boot_lock_fuse_bits_new(adr++);
> return temp;
> }
>
> =========================================================================
>
> It gives this assembler output:
> .global bar
> .type bar, @function
> bar:
> /* prologue: frame size=0 */
> /* prologue end (size=0) */
> ldi r30,lo8(0) ; 8 *movhi/4 [length = 2]
> ldi r31,hi8(0)
> ldi r25,lo8(9) ; 10 *movqi/2 [length = 1]
> /* #APP */
> sts 87, r25
> lpm r24, Z
> /* #NOAPP */
> ldi r30,lo8(1) ; 16 *movhi/4 [length = 2]
> ldi r31,hi8(1)
> /* #APP */
> sts 87, r25
> lpm r30, Z
> /* #NOAPP */
> add r24,r30 ; 22 addqi3/1 [length = 1]
> ldi r30,lo8(2) ; 24 *movhi/4 [length = 2]
> ldi r31,hi8(2)
> /* #APP */
> sts 87, r25
> lpm r18, Z
> /* #NOAPP */
> ldi r30,lo8(3) ; 29 *movhi/4 [length = 2]
> ldi r31,hi8(3)
> /* #APP */
> sts 87, r25
> lpm r25, Z
> /* #NOAPP */
> add r25,r18 ; 36 addqi3/1 [length = 1]
> add r24,r25 ; 37 addqi3/1 [length = 1]
> clr r25 ; 45 zero_extendqihi2/1 [length = 1]
> /* epilogue: frame size=0 */
> ret
> /* epilogue end (size=1) */
> /* function bar size 30 (29) */
> .size bar, .-bar
>
>
> This is neither smaller nor faster, but it could have been if gcc would
> leave r31 alone, or do an adiw.
> I tried against 4.1.2 using -Wall -Os -mmcu=atmega16. Maybe 4.2.2 or
> 4.3.0 is better?
>
> It does however use r30 as output which could save some speed and code
> when no other register is available.
>
> HTH,
>
> Wouter
I have also noticed that a series of
p = buf; *p++; *p++; *p++;
gets optimized to
buf[0]; buf[1]; buf[2];
which may be faster on some architectures, but loading constants is
quite expensive on the AVR. I don't know a great deal about GCC
optimisations, but I suspect a fix would be related to constant pool
management: GCC would need to realise that we already have a 2 in the
constant pool, and that the cheapest way to introduce a 3 is to
increment that 2.
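
To make the two forms concrete, here is a minimal sketch in plain C. The
function names sum_incr and sum_index are made up for illustration and do
not come from avr-libc. Both functions read the same three bytes and return
the same result; the only difference is whether the source walks a pointer
or uses constant offsets, which is the rewrite I believe GCC is performing.

```c
#include <stdint.h>

/* Hypothetical helper: walks the buffer with a post-incremented pointer,
 * as in "p = buf; *p++; *p++; *p++;". */
static uint8_t sum_incr(const uint8_t *buf)
{
    const uint8_t *p = buf;
    uint8_t sum = 0;

    sum += *p++;    /* reads buf[0], then advances p */
    sum += *p++;    /* reads buf[1], then advances p */
    sum += *p++;    /* reads buf[2], then advances p */
    return sum;
}

/* Hypothetical helper: the same three reads written as constant offsets,
 * as in "buf[0]; buf[1]; buf[2];". */
static uint8_t sum_index(const uint8_t *buf)
{
    return (uint8_t)(buf[0] + buf[1] + buf[2]);
}
```

Comparing the output of avr-gcc -Os -S for the two functions should show
whether the compiler emits an increment or a fresh constant load for each
access; I have not checked which versions behave which way.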
Cheers,
Shaun