[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [avr-libc-dev] [patch #7909] Adding __volatile__ to __asm__ within p
From: |
Georg-Johann Lay |
Subject: |
Re: [avr-libc-dev] [patch #7909] Adding __volatile__ to __asm__ within pgmspace header |
Date: |
Wed, 02 Jan 2013 23:41:30 +0100 |
User-agent: |
Thunderbird 2.0.0.24 (Windows/20100228) |
Alex Crawford schrieb:
Here is a boiled down example of the issue I was seeing:
while (pgm_read_word_far(0x00000000) != 0xFFFF) {
avr109_process();
}
Just to make sure everybody has a valid test case (here witha byte
access for simplicity):
#include <avr/pgmspace.h>
extern void avr109_process (void);
void foo (void)
{
while (pgm_read_byte_far (0) != 0xFF)
avr109_process();
}
and compile with
$ avr-gcc-4.7.2 -S -Os -mmcu=atmega128 foo.c
foo:
push r28
ldi r24,0
ldi r25,0
movw r26,r24
out 59, r26
movw r30, r24
elpm r28, Z+
rjmp .L2
.L3:
call avr109_process
.L2:
cpi r28,lo8(-1)
brne .L3
pop r28
ret
As seen in the assembly, the compiler chooses to elpm only one time before
the loop (because it is not volatile). The avr109_process() function in the
example is a bootloader (surprise!) and will write to flash location 0x00.
Maybe I misunderstood the purpose of the pgm_read macros, but I feel like
these should be marked volatile since the data they are reading is subject
to change without their knowing. Please correct me if I am wrong (I'm
pretty new to this stuff).
It basically boils down to what avr-libc specifies...
Running through cpp, the above code basically yields:
typedef unsigned long uint32_t;
typedef unsigned int uint16_t;
typedef unsigned char uint8_t;
#define __SFR_OFFSET 0x20
#define _SFR_MEM_ADDR(sfr) ((uint16_t) &(sfr))
#define _SFR_IO_ADDR(sfr) (_SFR_MEM_ADDR(sfr) - __SFR_OFFSET)
#define RAMPZ (*(uint8_t volatile*)0x5b)
#define __ELPM_enhanced__(addr) \
(__extension__({ \
uint32_t __addr32 = (uint32_t)(addr); \
uint8_t __result; \
__asm__ ("out %2, %C1" "\n\t" \
"movw r30, %1" "\n\t" \
"elpm %0, Z+" "\n\t" \
: "=r" (__result) \
: "r" (__addr32), "I" (_SFR_IO_ADDR(RAMPZ)) \
: "r30", "r31");\
__result;\
}))
#define __ELPM(addr) __ELPM_enhanced__(addr)
#define pgm_read_byte_far(address_long) __ELPM((uint32_t)(address_long))
extern void avr109_process (void);
void foo (void)
{
while (pgm_read_byte_far (0) != 0xFF)
avr109_process();
}
The very problem is that these pgm_ macros are neither volatile nor do
they describe the memory access. First conclusion is that a memory
barrier in the application won't help.
David, you mentioned that flash memory is not volatile (and is treated as
such by avr-libc). Can you explain why this assumption is made? I agree
that flash memory will remain constant in the vast majority of cases, but
in the few cases that it does change, the program is probably checking
itself (via LPM calls) for changes. I may be way off base, but it seems
that these pgm_read macros would lend themselves most useful for self
modifying programs (where the inline assembly would need to be marked
volatile in order to work properly).
Sounds reasonable. I don't expect a gross performance loss (if at all)
with volatile asm. The macros also could get a memory input to describe
the memory action, here with an inline function:
#include <avr/io.h>
static __inline__ __attribute__((always_inline))
uint8_t __ELPM_enhanced__ (const uint32_t __addr32)
{
uint8_t volatile * __addr16 = (uint8_t volatile*)(uint16_t) __addr32;
uint8_t __result;
__asm__ ("out %2, %C1" "\n\t"
"movw r30, %1" "\n\t"
"elpm %0, Z"
: "=r" (__result)
: "r" (__addr32), "I" (_SFR_IO_ADDR(RAMPZ)),
"m" (*__addr16)
: "r30", "r31", "memory");
return __result;
}
#define __ELPM(addr) __ELPM_enhanced__(addr)
#define pgm_read_byte_far(address_long) __ELPM((uint32_t)(address_long))
extern void avr109_process (void);
void foo (void)
{
while (pgm_read_byte_far (0) != 0xFF)
avr109_process();
}
Result is:
foo:
push r12
push r13
push r14
push r15
mov r12,__zero_reg__
mov r13,__zero_reg__
movw r14,r12
rjmp .L2
.L3:
call avr109_process
.L2:
out 59, r14
movw r30, r12
elpm r24, Z
cpi r24,lo8(-1)
brne .L3
pop r15
pop r14
pop r13
pop r12
ret
Also notice that the original macro is not correct because it changes
memory (RAMPZ) but does not describe that change. The new asm has a
memory clobber for that reason. A volatile asm instead of the new input
operand %3 would have a similar effect.
An appropriate way to supply such functionality would be a gcc built-in
function that could describe all side effects.
Johann
FYI, in the case you use avr-gcc 4.7.1+ the following code will do:
#include <stdint.h>
extern void avr109_process (void);
void foo (void)
{
const __memx volatile uint16_t *p = (const __memx volatile uint16_t*) 0;
while (*p != 0xFFFF)
avr109_process();
}
This compiles to
foo:
push r12
push r13
push r14
mov r12,__zero_reg__
mov r13,__zero_reg__
mov r14,__zero_reg__
rjmp .L2
.L3:
call avr109_process
.L2:
movw r30,r12
mov r21,r14
call __xload_2
cpi r22,-1
sbci r23,-1
brne .L3
pop r14
pop r13
pop r12
ret
__xload_2 is a support function from libgcc. But the read is from
segment 0 and thus you can use address space __flash instead of __memx
which yields:
foo:
rjmp .L2
.L3:
call avr109_process
.L2:
ldi r30,0
ldi r31,0
lpm r24,Z+
lpm r25,Z
adiw r24,1
brne .L3
ret