From: Thomas Watson
Subject: [avr-gcc-list] 16 Bit Store Optimizations
Date: Mon, 19 Dec 2016 15:24:03 -0600

Hello all,

A frequent need in my code is to combine two 8-bit variables into a 16-bit variable. I am trying to determine the optimal way to do this. The naïve way and a more clever way both generate extra instructions that could be optimized away. I include a test case with comments that explain the setup and the issue in more detail. It seems this is a missed optimization opportunity in the compiler.

Thomas

#include <avr/io.h>
#include <inttypes.h>
#include <avr/interrupt.h>

/*
$ avr-gcc -v
Using built-in specs.
COLLECT_GCC=avr-gcc
COLLECT_LTO_WRAPPER=/usr/local/Cellar/avr-gcc/4.9.2/libexec/gcc/avr/4.9.2/lto-wrapper
Target: avr
Configured with: ../configure --enable-languages=c,c++ --target=avr --disable-libssp --disable-nls --with-dwarf2 --prefix=/usr/local/Cellar/avr-gcc/4.9.2 --with-gmp=/usr/local/Cellar/gmp/6.0.0a --with-mpfr=/usr/local/Cellar/mpfr/3.1.2-p10 --with-mpc=/usr/local/Cellar/libmpc/1.0.2 --datarootdir=/usr/local/Cellar/avr-gcc/4.9.2/share --bindir=/usr/local/Cellar/avr-gcc/4.9.2/bin --with-as=/usr/local/bin/avr-as --with-ld=/usr/local/bin/avr-ld
Thread model: single
gcc version 4.9.2 (GCC)
*/

// compile like:
// avr-gcc -mmcu=atmega328p -std=gnu99 -Os -Wall -DF_CPU=16000000 -Wa,-ahlmsd=sixteenbit_test.lst -o sixteenbit_test.elf sixteenbit_test.c

// data storage memory (might be used in ISR for example)
volatile uint16_t data;

// read and return a byte from the serial port
uint8_t read_byte() {
    while (!(UCSR0A & _BV(RXC0)));
    return (uint8_t)UDR0;
}

int main() {
    cli();
    // init serial port
    // etc
    sei();

    uint8_t temp_hi, temp_lo;

    // receive a word with |
    temp_hi = read_byte();
    temp_lo = read_byte();
    cli(); // not okay to get interrupted while assigning
           // in case an ISR comes and tries to read 'data'
    /* compiles to
      47 0012 282F         mov r18,r24
      48 0014 30E0         ldi r19,0
      49 0016 C901         movw r24,r18
      50 0018 9C2B         or r25,r28
      51 001a 9093 0000    sts data+1,r25
      52 001e 8093 0000    sts data,r24
    where r28 is temp_hi and r24 is temp_lo
    8 bytes and 4 cycles worse than the good solution */
    data = (temp_hi << 8) | temp_lo;
    sei(); // can get interrupted again

    // receive a word with pointer assignment
    // ugly and still does not compile how I want
    temp_hi = read_byte();
    temp_lo = read_byte();
    cli(); // not okay to get interrupted while assigning
           // in case an ISR comes and tries to read 'data'
    /* compiles to
      66 0030 E0E0         ldi r30,lo8(data)
      67 0032 F0E0         ldi r31,hi8(data)
      68 0034 8083         st Z,r24
      69 0036 C183         std Z+1,r28
    2 cycles worse than the good solution */
    *((uint8_t*)&data) = temp_lo;
    *((uint8_t*)&data+1) = temp_hi;
    sei(); // can get interrupted again

    /* Why won't the compiler compile it as
         sts data, r24
         sts data+1, r28
    It will, in the second case, if it feels Z and Y are otherwise
    occupied. But I couldn't get it to think that for a reasonably
    short sample. It also will work correctly in this case for -O2. */
}
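
For comparison, a third way to express the same combination is to assemble the word in a union and then assign it to 'data' in one statement. The following is only a minimal sketch: the names word_bytes_t, combine_bytes and combine_bytes_shift are hypothetical and do not appear in the test case above. The byte layout is chosen by a compile-time check that assumes GCC-style __BYTE_ORDER__ macros (AVR is little-endian, matching the pointer version above). Whether avr-gcc 4.9.2 reduces this to the two sts instructions has not been verified here, so the .lst output would need to be checked.

```c
#include <assert.h>   /* for static_assert (C11) */
#include <stdint.h>

/* Hypothetical helper type: overlays a 16-bit word with its two bytes. */
typedef union {
    uint16_t word;
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    struct { uint8_t hi; uint8_t lo; } bytes;  /* big-endian hosts */
#else
    struct { uint8_t lo; uint8_t hi; } bytes;  /* little-endian, as on AVR */
#endif
} word_bytes_t;

/* The overlay only works if the two bytes exactly cover the word. */
static_assert(sizeof(word_bytes_t) == sizeof(uint16_t),
              "word_bytes_t must be exactly 16 bits");

/* Combine two bytes by writing each half of the union separately. */
static inline uint16_t combine_bytes(uint8_t hi, uint8_t lo)
{
    word_bytes_t u;
    u.bytes.lo = lo;
    u.bytes.hi = hi;
    return u.word;
}

/* Portable reference: shift and OR, with an explicit unsigned cast so the
   shift never happens in a signed 16-bit int. */
static inline uint16_t combine_bytes_shift(uint8_t hi, uint8_t lo)
{
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```

In the test case this would be used as data = combine_bytes(temp_hi, temp_lo); between cli() and sei(). Both helpers return the same value; the only question is which form the optimizer handles better.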