avr-libc-dev

Re: [avr-libc-dev] Can pgmspace.h __LPM_xxx__ macros become inlinefn's?


From: Bill Somerville
Subject: Re: [avr-libc-dev] Can pgmspace.h __LPM_xxx__ macros become inlinefn's?
Date: Tue, 05 Oct 2004 14:57:51 +0100

"Theodore A. Roth" wrote:
> 
> On Fri, 1 Oct 2004, Bill Somerville wrote:
> 
> > > I don't see anywhere that using static is not recommended. Do you have a
> > > reference for that?
> >
> > The penultimate para of the gcc man page "5.34 An Inline Function is As
> > Fast As a Macro" seemed to imply this, but after comments from Geoffrey
> > Wossum and some tests it seems that static __inline__ or extern
> > __inline__ are the only options in header files otherwise multiple
> > definitions occur.
> >
> > Unfortunately this has become academic as I cannot get the inline fn's
> > to generate the same code as the macros, also the inline versions
> > sometimes are bigger. This seems to be an optimiser problem where the
> > register choices made around inlined fn's are not as smart as they might
> > be. I suspect this is a quite obscure gcc bug/feature. The gcc man page
> > says that inlines may generate different code from macros (both larger
> > and smaller).
> >
> > Since I haven't found an example that generates smaller code, I suspect
> > that a community of embedded programmers are not going to be happy with
> > this change!
> 
> <snip>
> 
> >
> > In the inline version the second LPM result does an unnecessary register
> > shuffle that the macro version avoids. Note that the first LPM is OK so
> > the compiler can get it right sometimes.
> 
> Does changing "__asm__" to "__asm__ __volatile__" affect your results?

Yes, but it's not an improvement!

Here are the dumps of the relevant bits (same test code and compiler
switches as in the previous mail):

Current macro version:-
=======================
00000056 <main>:
  56:   cf e5           ldi     r28, 0x5F       ; 95
  58:   d2 e0           ldi     r29, 0x02       ; 2
  5a:   de bf           out     0x3e, r29       ; 62
  5c:   cd bf           out     0x3d, r28       ; 61
  5e:   ea e1           ldi     r30, 0x1A       ; 26
  60:   f0 e0           ldi     r31, 0x00       ; 0
  62:   c8 95           lpm
  64:   40 2d           mov     r20, r0
  66:   31 96           adiw    r30, 0x01       ; 1
  68:   c8 95           lpm
  6a:   80 2d           mov     r24, r0
  6c:   28 2f           mov     r18, r24
  6e:   33 27           eor     r19, r19
  70:   82 2f           mov     r24, r18
  72:   99 27           eor     r25, r25
  74:   26 95           lsr     r18
  76:   26 95           lsr     r18
  78:   82 1b           sub     r24, r18
  7a:   91 09           sbc     r25, r1
  7c:   84 0f           add     r24, r20
  7e:   91 1d           adc     r25, r1
  80:   00 c0           rjmp    .+0             ; 0x82

Inline version with __asm__:-
=============================
00000056 <main>:
  56:   cf e5           ldi     r28, 0x5F       ; 95
  58:   d2 e0           ldi     r29, 0x02       ; 2
  5a:   de bf           out     0x3e, r29       ; 62
  5c:   cd bf           out     0x3d, r28       ; 61
  5e:   ea e1           ldi     r30, 0x1A       ; 26
  60:   f0 e0           ldi     r31, 0x00       ; 0
  62:   c8 95           lpm
  64:   30 2d           mov     r19, r0
  66:   31 96           adiw    r30, 0x01       ; 1
  68:   c8 95           lpm
  6a:   20 2d           mov     r18, r0
  6c:   82 2f           mov     r24, r18
  6e:   99 27           eor     r25, r25
  70:   26 95           lsr     r18
  72:   26 95           lsr     r18
  74:   82 1b           sub     r24, r18
  76:   91 09           sbc     r25, r1
  78:   83 0f           add     r24, r19
  7a:   91 1d           adc     r25, r1
  7c:   00 c0           rjmp    .+0             ; 0x7e

Inline version with __asm__ __volatile__:-
==========================================
00000056 <main>:
  56:   cf e5           ldi     r28, 0x5F       ; 95
  58:   d2 e0           ldi     r29, 0x02       ; 2
  5a:   de bf           out     0x3e, r29       ; 62
  5c:   cd bf           out     0x3d, r28       ; 61
  5e:   ea e1           ldi     r30, 0x1A       ; 26
  60:   f0 e0           ldi     r31, 0x00       ; 0
  62:   c8 95           lpm
  64:   80 2d           mov     r24, r0
  66:   48 2f           mov     r20, r24
  68:   55 27           eor     r21, r21
  6a:   31 96           adiw    r30, 0x01       ; 1
  6c:   c8 95           lpm
  6e:   80 2d           mov     r24, r0
  70:   99 27           eor     r25, r25
  72:   c8 95           lpm
  74:   20 2d           mov     r18, r0
  76:   26 95           lsr     r18
  78:   26 95           lsr     r18
  7a:   82 1b           sub     r24, r18
  7c:   91 09           sbc     r25, r1
  7e:   84 0f           add     r24, r20
  80:   95 1f           adc     r25, r21
  82:   00 c0           rjmp    .+0             ; 0x84

Note that the non-volatile version uses a poor register choice for the
second lpm; the volatile version uses a poor register choice for the
first lpm and manages to use an extra instruction over the non-volatile
version as well.
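
For anyone reading this without the earlier mails, the two forms being
compared look roughly like the sketch below. This is not the actual
pgmspace.h source or my test program; the DEMO_* and demo_* names, the
table contents and the two-byte sum are made up for illustration, and
the host fallback branch is only there so the file also builds off-target.

```c
#include <stdint.h>

#ifdef __AVR__
#define DEMO_PROGMEM __attribute__((__progmem__))

/* Macro form: a GCC statement expression around the classic
   "lpm / mov Rd, r0" sequence, with the address pinned to Z. */
#define DEMO_LPM_MACRO(addr)                          \
    (__extension__({                                  \
        uint16_t addr16_ = (uint16_t)(addr);          \
        uint8_t result_;                              \
        __asm__ ("lpm" "\n\t"                         \
                 "mov %0, r0" "\n\t"                  \
                 : "=r" (result_)                     \
                 : "z" (addr16_)                      \
                 : "r0");                             \
        result_;                                      \
    }))

/* Inline-function form: same asm body, but the result now travels
   through a function return value before the caller sees it. */
static __inline__ uint8_t demo_lpm_inline(const uint8_t *addr)
{
    uint8_t result;
    __asm__ ("lpm" "\n\t"
             "mov %0, r0" "\n\t"
             : "=r" (result)
             : "z" ((uint16_t)addr)
             : "r0");
    return result;
}
#else
/* Host fallback (assumption for illustration): plain memory reads. */
#define DEMO_PROGMEM
#define DEMO_LPM_MACRO(addr) (*(const uint8_t *)(addr))

static __inline__ uint8_t demo_lpm_inline(const uint8_t *addr)
{
    return *addr;
}
#endif

/* Hypothetical two-byte table placed in program memory on AVR. */
static const uint8_t demo_table[2] DEMO_PROGMEM = { 10, 40 };

/* Two consecutive reads combined, once via each form. */
static uint16_t demo_sum_macro(const uint8_t *p)
{
    return (uint16_t)(DEMO_LPM_MACRO(p) + DEMO_LPM_MACRO(p + 1));
}

static uint16_t demo_sum_inline(const uint8_t *p)
{
    return (uint16_t)(demo_lpm_inline(p) + demo_lpm_inline(p + 1));
}
```

Both functions should return the same value; the interesting part is
only the register allocation around the two lpm sequences, which you
can inspect with avr-objdump -d after building each variant with the
same switches. Swapping __asm__ for __asm__ __volatile__ in the inline
function gives the third variant shown above.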

> 
> ---
> Ted Roth
> PGP Key ID: 0x18F846E9
> Jabber ID: address@hidden

Bill Somerville



