Re: [avr-libc-dev] [patch #7909] Adding __volatile__ to __asm__ within pgmspace header


From: Georg-Johann Lay
Subject: Re: [avr-libc-dev] [patch #7909] Adding __volatile__ to __asm__ within pgmspace header
Date: Wed, 02 Jan 2013 23:41:30 +0100
User-agent: Thunderbird 2.0.0.24 (Windows/20100228)

Alex Crawford wrote:
> Here is a boiled down example of the issue I was seeing:
>
> while (pgm_read_word_far(0x00000000) != 0xFFFF) {
>     avr109_process();
> }

Just to make sure everybody has a valid test case (here with a byte access for simplicity):


#include <avr/pgmspace.h>

extern void avr109_process (void);

void foo (void)
{
  while (pgm_read_byte_far (0) != 0xFF)
    avr109_process();
}


and compile with


$ avr-gcc-4.7.2 -S -Os -mmcu=atmega128 foo.c


foo:
        push r28
        ldi r24,0
        ldi r25,0
        movw r26,r24
        out 59, r26
        movw r30, r24
        elpm r28, Z+
        rjmp .L2
.L3:
        call avr109_process
.L2:
        cpi r28,lo8(-1)
        brne .L3
        pop r28
        ret


> As seen in the assembly, the compiler chooses to elpm only one time before
> the loop (because it is not volatile). The avr109_process() function in the
> example is a bootloader (surprise!) and will write to flash location 0x00.
> Maybe I misunderstood the purpose of the pgm_read macros, but I feel like
> these should be marked volatile since the data they are reading is subject
> to change without their knowing. Please correct me if I am wrong (I'm
> pretty new to this stuff).

It basically boils down to what avr-libc specifies...

Run through cpp, the above code basically yields:

typedef unsigned long uint32_t;
typedef unsigned int uint16_t;
typedef unsigned char uint8_t;

#define __SFR_OFFSET 0x20
#define _SFR_MEM_ADDR(sfr) ((uint16_t) &(sfr))
#define _SFR_IO_ADDR(sfr) (_SFR_MEM_ADDR(sfr) - __SFR_OFFSET)

#define RAMPZ (*(uint8_t volatile*)0x5b)

#define __ELPM_enhanced__(addr)                 \
    (__extension__({                            \
        uint32_t __addr32 = (uint32_t)(addr);   \
        uint8_t __result;                       \
        __asm__ ("out %2, %C1"  "\n\t"          \
                 "movw r30, %1" "\n\t"          \
                 "elpm %0, Z+"  "\n\t"          \
                 : "=r" (__result)              \
                 : "r" (__addr32), "I" (_SFR_IO_ADDR(RAMPZ)) \
                 : "r30", "r31");\
        __result;\
    }))

#define __ELPM(addr) __ELPM_enhanced__(addr)
#define pgm_read_byte_far(address_long) __ELPM((uint32_t)(address_long))

extern void avr109_process (void);

void foo (void)
{
  while (pgm_read_byte_far (0) != 0xFF)
    avr109_process();
}

The core problem is that these pgm_ macros are neither volatile nor do they describe the memory access. The first conclusion is that a memory barrier in the application won't help, as the sketch below illustrates.
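
For illustration, a minimal sketch of such an attempt (the function name foo_with_barrier is made up; it reuses the test case from above):


#include <avr/pgmspace.h>

extern void avr109_process (void);

void foo_with_barrier (void)
{
  while (pgm_read_byte_far (0) != 0xFF)
    {
      avr109_process ();
      /* Explicit compiler memory barrier.  It clobbers memory, but the
         pgm_read asm has no memory operand and no memory clobber, so the
         compiler may still hoist the ELPM out of the loop.  */
      __asm__ __volatile__ ("" ::: "memory");
    }
}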

> David, you mentioned that flash memory is not volatile (and is treated as
> such by avr-libc). Can you explain why this assumption is made? I agree
> that flash memory will remain constant in the vast majority of cases, but
> in the few cases that it does change, the program is probably checking
> itself (via LPM calls) for changes. I may be way off base, but it seems
> that these pgm_read macros would lend themselves most useful for self
> modifying programs (where the inline assembly would need to be marked
> volatile in order to work properly).

Sounds reasonable. I don't expect a significant performance loss (if any) from a volatile asm. The macros could also take a memory input operand to describe the memory access, shown here with an inline function:


#include <avr/io.h>

static __inline__ __attribute__((always_inline))
uint8_t __ELPM_enhanced__ (const uint32_t __addr32)
{
    uint8_t volatile * __addr16 = (uint8_t volatile*)(uint16_t) __addr32;
    uint8_t __result;
    __asm__ ("out %2, %C1"  "\n\t"
             "movw r30, %1" "\n\t"
             "elpm %0, Z"
             : "=r" (__result)
             : "r" (__addr32), "I" (_SFR_IO_ADDR(RAMPZ)),
               "m" (*__addr16)
             : "r30", "r31", "memory");
    return __result;
}

#define __ELPM(addr) __ELPM_enhanced__(addr)
#define pgm_read_byte_far(address_long) __ELPM((uint32_t)(address_long))

extern void avr109_process (void);

void foo (void)
{
  while (pgm_read_byte_far (0) != 0xFF)
    avr109_process();
}


Result is:


foo:
        push r12
        push r13
        push r14
        push r15
        mov r12,__zero_reg__
        mov r13,__zero_reg__
        movw r14,r12
        rjmp .L2
.L3:
        call avr109_process
.L2:
        out 59, r14
        movw r30, r12
        elpm r24, Z
        cpi r24,lo8(-1)
        brne .L3
        pop r15
        pop r14
        pop r13
        pop r12
        ret


Also notice that the original macro is not correct because it changes memory (RAMPZ) but does not describe that change; the new asm has a memory clobber for that reason. A volatile asm instead of the new input operand %3 (the "m" operand) would have a similar effect; a sketch of that variant follows.
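
Such a variant could look like this (a sketch only; the name __ELPM_enhanced_volatile__ is made up, and the exact operand set is an assumption):


#include <avr/io.h>

static __inline__ __attribute__((always_inline))
uint8_t __ELPM_enhanced_volatile__ (const uint32_t __addr32)
{
    uint8_t __result;
    /* volatile keeps the asm from being CSE'd or hoisted out of loops;
       the memory clobber still accounts for the write to RAMPZ.  */
    __asm__ __volatile__ ("out %2, %C1"  "\n\t"
                          "movw r30, %1" "\n\t"
                          "elpm %0, Z"
                          : "=r" (__result)
                          : "r" (__addr32), "I" (_SFR_IO_ADDR(RAMPZ))
                          : "r30", "r31", "memory");
    return __result;
}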

An appropriate way to supply such functionality would be a gcc built-in function that could describe all side effects.

Johann


FYI, if you use avr-gcc 4.7.1 or later, the following code will do:


#include <stdint.h>

extern void avr109_process (void);

void foo (void)
{
  const __memx volatile uint16_t *p = (const __memx volatile uint16_t*) 0;

  while (*p != 0xFFFF)
    avr109_process();
}


This compiles to


foo:
        push r12
        push r13
        push r14
        mov r12,__zero_reg__
        mov r13,__zero_reg__
        mov r14,__zero_reg__
        rjmp .L2
.L3:
        call avr109_process
.L2:
        movw r30,r12
        mov r21,r14
        call __xload_2
        cpi r22,-1
        sbci r23,-1
        brne .L3
        pop r14
        pop r13
        pop r12
        ret


__xload_2 is a support function from libgcc. But the read is from segment 0, so you can use the address space __flash instead of __memx, which yields:

foo:
        rjmp .L2
.L3:
        call avr109_process
.L2:
        ldi r30,0
        ldi r31,0
        lpm r24,Z+
        lpm r25,Z
        adiw r24,1
        brne .L3
        ret
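
For completeness, a sketch of what the __flash variant used above might look like (an assumption; the exact source was not spelled out, and this simply mirrors the __memx version with the address space swapped):


#include <stdint.h>

extern void avr109_process (void);

void foo (void)
{
  /* __flash addresses the lower 64 KiB of program memory (segment 0),
     so the compiler can use plain LPM instead of the __xload_2 helper.  */
  const __flash volatile uint16_t *p = (const __flash volatile uint16_t*) 0;

  while (*p != 0xFFFF)
    avr109_process();
}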






