[avr-libc-dev] Smaller boot.h macros for AVRs with >64K Flash


From: Mike Perks
Subject: [avr-libc-dev] Smaller boot.h macros for AVRs with >64K Flash
Date: Thu, 30 Apr 2009 08:38:17 -0500
User-agent: Thunderbird 2.0.0.21 (Windows/20090302)

Elvind suggested I post this idea to this mailing list for possible inclusion in a future release of avr-libc.

I have a 512-word bootloader that I use for my om128 <http://www.avrfreaks.net/address@hidden&func=viewItem&item_id=822> and om644p <http://www.avrfreaks.net/address@hidden&func=viewItem&item_id=906> microcontroller devices. These bootloaders were compiled with an older version of GCC (3.4.5) and avr-libc (1.4.4) using WinAVR 20060125, which resulted in very compact code. A further constraint is that the bootloader is actually in two parts: a 128-word "stub" and a 384-word main loader. The stub can be used to provide a self-updating feature.

Later versions of GCC (i.e. v4) produced much larger code that would not fit. This hasn't been a problem up to now because I simply used the older version of GCC. However, I am now working on some new devices that require a later compiler to support the underlying AVRs (e.g. mega328p and mega1284p).

On further examination, the stub was too big to fit in 128 words for the 128K-byte flash devices (mega128, mega1284), and the main culprits were the boot.h macros that require 32-bit addresses:

   * boot_page_erase(address)
   * boot_page_fill(address, data)
   * boot_page_write(address)

The problem is that the 32-bit address needs 4 registers, plus another 4 for pointer addition, plus all the pushes and pops needed for the additional registers: a net addition of around 50 words.

I resisted the temptation to completely rewrite the stub in assembly and looked at the boot.h macro definitions. For the 128K devices, the RAMPZ register may need to be set. It seemed like a good idea to separate this out from the address calculation and set the RAMPZ register separately. This idea should also work for devices with >128K flash.
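As a rough host-side illustration of that separation (not code from the bootloader itself), the sketch below splits a flash page number into the 16-bit value destined for Z and the byte destined for RAMPZ, assuming 256-byte pages. The names flash_addr_t, split_page, join_addr and self_check are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: the two pieces handed to the hardware instead of
 * one 32-bit byte address. */
typedef struct {
    uint16_t z;      /* low 16 bits of the byte address, for the Z pointer */
    uint8_t  rampz;  /* bits 16 and up of the byte address, for RAMPZ */
} flash_addr_t;

/* Assumes 256-byte (128-word) pages, so the page number is simply the
 * byte address shifted right by 8. */
flash_addr_t split_page(uint16_t page)
{
    flash_addr_t a;
    a.z     = (uint16_t)(page << 8);  /* low byte of page -> high byte of Z */
    a.rampz = (uint8_t)(page >> 8);   /* high byte of page -> RAMPZ */
    return a;
}

/* Rebuild the full byte address; only needed for checking on a host. */
uint32_t join_addr(flash_addr_t a)
{
    return ((uint32_t)a.rampz << 16) | a.z;
}

/* Page 0x1FF is the last 256-byte page of a 128K-byte device. */
void self_check(void)
{
    flash_addr_t a = split_page(0x1FF);
    assert(a.z == 0xFF00);
    assert(a.rampz == 0x01);
    assert(join_addr(a) == 0x1FF00UL);
}
```

Only 8-bit and 16-bit quantities are involved in the split, which is the point: the 32-bit value appears here only in the host-side check.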

Here is a snippet of the bootloader that shows the calculation of the address and RAMPZ register from a flash page number:
  /* 256 byte page size */
  uint16_t address = Page << 8;

  /* erase page and wait for completion */
  #if defined(RAMPZ)
     #if defined(__AVR_ATmega128__) || defined(__AVR_ATmega1284P__)
     /* 256 byte page size */
     boot_page_erase_extended(address, (Page >> 8));
     #endif
  #elif defined(__AVR_ATmega644__) || defined(__AVR_ATmega644P__)
     boot_page_erase(address);
  #endif
  boot_spm_busy_wait();

Obviously the generated code is quite nice for 128 word pages (shifting left or right by 8 bits is easy). The new macro boot_page_erase_extended is defined as follows.

#define boot_page_erase_extended(address, ramp)        \
(__extension__({                                 \
   __asm__ __volatile__                         \
   (                                            \
       "sts    %3, %4\n\t"                      \
       "sts    %0, %1\n\t"                      \
       "spm\n\t"                                \
       :                                        \
       : "i" (_SFR_MEM_ADDR(__SPM_REG)),        \
         "r" ((uint8_t)__BOOT_PAGE_ERASE),      \
         "z" ((uint16_t)address),               \
         "i" (_SFR_MEM_ADDR(RAMPZ)),            \
         "r" ((uint8_t)ramp)                    \
   );                                           \
}))

The "extended" boot_page_erase macro is very similar to the normal macro for 16-bit addresses except for setting the RAMPZ register, i.e. the generated code is only a few words larger. The corresponding macros for the boot_page_fill and boot_page_write functions are:

#define boot_page_fill_extended(address, data, ramp) \
(__extension__({                                 \
   __asm__ __volatile__                         \
   (                                            \
       "movw   r0, %3\n\t"                      \
       "sts    %4, %5\n\t"                      \
       "sts    %0, %1\n\t"                      \
       "spm    \n\t"                            \
       "clr    r1\n\t"                          \
       :                                        \
       : "i" (_SFR_MEM_ADDR(__SPM_REG)),        \
         "r" ((uint8_t)__BOOT_PAGE_FILL),       \
         "z" ((uint16_t)address),               \
         "r" ((uint16_t)data),                  \
         "i" (_SFR_MEM_ADDR(RAMPZ)),            \
         "r" ((uint8_t)ramp)                    \
       : "r0"                                   \
   );                                           \
}))


#define boot_page_write_extended(address, ramp)        \
(__extension__({                                 \
   __asm__ __volatile__                         \
   (                                            \
       "sts    %3, %4\n\t"                      \
       "sts    %0, %1\n\t"                      \
       "spm\n\t"                                \
       :                                        \
       : "i" (_SFR_MEM_ADDR(__SPM_REG)),        \
         "r" ((uint8_t)__BOOT_PAGE_WRITE),      \
         "z" ((uint16_t)address),               \
         "i" (_SFR_MEM_ADDR(RAMPZ)),            \
         "r" ((uint8_t)ramp)                    \
   );                                           \
}))
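
For illustration only, here is a sketch of how the three macros might be combined to program one page. It is written so it can be tried on a host: the functions named boot_page_erase_extended, boot_page_fill_extended, boot_page_write_extended and boot_spm_busy_wait below are hypothetical stand-ins that record what they were asked to do, and program_page, page_image and the counters are names invented for this example. On a real target the stand-ins would be dropped in favour of the macros above and <avr/boot.h>. The 256-byte page size is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 256u   /* assumed: 128-word pages (mega128, mega1284p) */

/* Host stand-ins (hypothetical): record calls instead of touching flash. */
unsigned erase_calls, fill_calls, write_calls, wait_calls;
uint8_t  last_ramp;
uint8_t  page_image[PAGE_BYTES];   /* models the temporary page buffer */

void boot_page_erase_extended(uint16_t address, uint8_t ramp)
{
    (void)address;
    last_ramp = ramp;
    erase_calls++;
}

void boot_page_fill_extended(uint16_t address, uint16_t data, uint8_t ramp)
{
    uint16_t off = (uint16_t)(address & (PAGE_BYTES - 1u));
    assert((address & 1u) == 0);             /* the buffer is filled a word at a time */
    page_image[off]     = (uint8_t)(data & 0xFF);
    page_image[off + 1] = (uint8_t)(data >> 8);
    last_ramp = ramp;
    fill_calls++;
}

void boot_page_write_extended(uint16_t address, uint8_t ramp)
{
    (void)address;
    last_ramp = ramp;
    write_calls++;
}

void boot_spm_busy_wait(void)
{
    wait_calls++;
}

/* Program one page from buf (PAGE_BYTES bytes, low byte of each word first).
 * Only 8- and 16-bit values are passed around; RAMPZ travels separately. */
void program_page(uint16_t page, const uint8_t *buf)
{
    uint16_t address = (uint16_t)(page << 8);   /* low 16 bits, for Z */
    uint8_t  ramp    = (uint8_t)(page >> 8);    /* upper bits, for RAMPZ */
    uint16_t i;

    boot_page_erase_extended(address, ramp);
    boot_spm_busy_wait();

    for (i = 0; i < PAGE_BYTES; i += 2) {
        uint16_t word = (uint16_t)(buf[i] | ((uint16_t)buf[i + 1] << 8));
        boot_page_fill_extended((uint16_t)(address + i), word, ramp);
    }

    boot_page_write_extended(address, ramp);
    boot_spm_busy_wait();
}
```

A real bootloader would typically also run this with interrupts disabled and, on devices with a read-while-write section, re-enable that section afterwards (for example with boot_rww_enable() from <avr/boot.h>); those steps are left out of the sketch.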

Here are the resultant code sizes for just the stub using the latest GCC compiler:

   * mega1284p - 108 words
   * mega128 - 96 words
   * mega644p - 99 words

The mega644p is 9 words smaller (3 per RAMPZ use) than the mega1284p. The mega128 is 12 words smaller than the mega1284p because it can use in/out instructions rather than the longer lds/sts instructions for I/O registers. This shows one of the benefits of keeping to C code as much as possible: the compiler can do a much better optimization job, although, as shown in this post, it sometimes needs some help.

Regards,
Mike


