RE: [avr-gcc-list] optimizer
From: Bernard Fouché
Subject: RE: [avr-gcc-list] optimizer
Date: Wed, 24 Nov 2004 15:08:25 +0100
Hi Björn.
Thanks for your answer. I started looking at the generated code mainly
because of the cost of 32-bit variables. For instance, if you apply bitmasks
to 32-bit variables, the generated code can be very large:
C:
uint32_t bswap32(uint32_t x)
{
return ( (((x) & 0xff000000) >> 24)
| (((x) & 0x00ff0000) >> 8)
| (((x) & 0x0000ff00) << 8)
| (((x) & 0x000000ff) << 24));
}
Generated:
uint32_t bswap32(uint32_t x)
{
ca: ef 92 push r14
cc: ff 92 push r15
ce: 0f 93 push r16
d0: 1f 93 push r17
d2: 7b 01 movw r14, r22
d4: 8c 01 movw r16, r24
return ( (((x) & 0xff000000) >> 24)
d6: 89 2f mov r24, r25
d8: 99 27 eor r25, r25
da: aa 27 eor r26, r26
dc: bb 27 eor r27, r27
de: a8 01 movw r20, r16
e0: 97 01 movw r18, r14
e2: 20 70 andi r18, 0x00 ; 0
e4: 30 70 andi r19, 0x00 ; 0
e6: 50 70 andi r21, 0x00 ; 0
e8: 23 2f mov r18, r19
ea: 34 2f mov r19, r20
ec: 45 2f mov r20, r21
ee: 55 27 eor r21, r21
f0: 82 2b or r24, r18
f2: 93 2b or r25, r19
f4: a4 2b or r26, r20
f6: b5 2b or r27, r21
f8: a8 01 movw r20, r16
fa: 97 01 movw r18, r14
fc: 20 70 andi r18, 0x00 ; 0
fe: 40 70 andi r20, 0x00 ; 0
100: 50 70 andi r21, 0x00 ; 0
102: 54 2f mov r21, r20
104: 43 2f mov r20, r19
106: 32 2f mov r19, r18
108: 22 27 eor r18, r18
10a: 82 2b or r24, r18
10c: 93 2b or r25, r19
10e: a4 2b or r26, r20
110: b5 2b or r27, r21
112: 5e 2d mov r21, r14
114: 44 27 eor r20, r20
116: 33 27 eor r19, r19
118: 22 27 eor r18, r18
11a: 82 2b or r24, r18
11c: 93 2b or r25, r19
11e: a4 2b or r26, r20
120: b5 2b or r27, r21
| (((x) & 0x00ff0000) >> 8)
| (((x) & 0x0000ff00) << 8)
| (((x) & 0x000000ff) << 24));
}
122: bc 01 movw r22, r24
124: cd 01 movw r24, r26
126: 1f 91 pop r17
128: 0f 91 pop r16
12a: ff 90 pop r15
12c: ef 90 pop r14
12e: 08 95 ret
Of course, I ended up using the 32-bit swap shown as an assembly-language
example in the avr-libc documentation instead :-)
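For comparison, here is a plain C sketch of the same swap that moves whole
bytes instead of masking and shifting. It is not the avr-libc assembly
version, the name bswap32_bytes is made up for this example, and I have not
checked what avr-gcc 3.4.2 generates for it:

```c
#include <stdint.h>

/* Hypothetical byte-wise swap: reverse the four bytes through a union. */
uint32_t bswap32_bytes(uint32_t x)
{
    union {
        uint32_t u32;
        uint8_t  b[4];
    } in, out;

    in.u32 = x;
    out.b[0] = in.b[3];   /* last byte in memory becomes the first */
    out.b[1] = in.b[2];
    out.b[2] = in.b[1];
    out.b[3] = in.b[0];
    return out.u32;
}
```

Reversing the bytes in memory gives the same result whatever the byte order
of the target, so the function does not depend on the AVR being little-endian.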
I know nothing of the compiler internals. I just see that 32-bit variables
can be really expensive and should be avoided as much as possible, but
sometimes you don't have any choice. In my own C code and the resulting
assembly, I didn't see many effective optimizations for 32-bit variables,
but rather situations where the cost of using them was very high.
I have reached the point where it saves more space to write a function that
performs 32-bit additions in a single place than to let the compiler
generate the code for a 32-bit addition at each use.
Likewise, instead of comparing all 32 bits (I only want to know whether the
value has changed by one), I use an 8-bit pointer to the lowest byte.
That leads to C code that is difficult to read and designed just for gcc. In
a few months someone else will read it, think I had too much beer when I
wrote this kind of thing, and rewrite it, only to see the object size
explode. [depressive mode off]
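To make the two workarounds above concrete, here is roughly what they look
like. The names add32 and low_byte_changed are invented for this sketch, and
the low-byte check assumes a little-endian layout (as on the AVR), where the
lowest byte sits at the lowest address:

```c
#include <stdint.h>

/* Hypothetical shared helper: all 32-bit additions go through this one
 * function so the add/adc sequence exists only once in the image.
 * noinline keeps gcc from expanding it again at each call site. */
uint32_t add32(uint32_t a, uint32_t b) __attribute__((noinline));
uint32_t add32(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Hypothetical change check: look only at the lowest byte through an
 * 8-bit pointer. Assumes little-endian storage. Only valid when the
 * value changes in small steps (here: by one), so the low byte always
 * differs after a change. */
uint8_t low_byte_changed(const uint32_t *counter, uint8_t last_low)
{
    return (uint8_t)(*(const uint8_t *)counter != last_low);
}
```

The caller keeps last_low in a single byte, so the comparison costs one
load and one compare instead of four of each.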
Another optimization I saw in the ICC compiler (I think) was that, when
asked to optimize for space, the compiler reused the end of another function
whenever the code was identical. For instance, many functions end with a
series of 'pop' instructions, and since the register allocation order seems
to be designed for this purpose, it was possible to branch to the end of
another function to perform the same pops. (The same goes for stack
manipulation: once the new stack value is calculated, the code can branch to
somewhere that already updates SPL/SPH/SREG.)
Finally, I again ran into a situation where I have no .data segment, but the
linker brings in the code to initialize this segment anyway.
Bernard
-----Original Message-----
From: Haase Bjoern (PT-BEU/MKP5) * [mailto:address@hidden
Sent: Wednesday, 24 November 2004 13:56
To: address@hidden; Bernard Fouché; address@hidden
Subject: AW: [avr-gcc-list] optimizer
Hi,
I have observed similar situations where the optimized code could have been
realized with far fewer registers, mainly when dealing with global variables
wider than 8 bits.
I have also been thinking about improving the compiler. I came to the
conclusion that this problem is probably difficult to solve.
The core problem seems to be that the compiler internally considers
r24:r25:r26:r27 to be one single logical 32-bit register, r24. It seems that
this logical 32-bit register is broken down into 4x8-bit objects only at the
very last step, i.e. when the assembler instructions are emitted.
In order to implement your suggested optimizations, it would probably be
necessary to convert all the 32-bit objects to 8-bit objects at an earlier
stage of compilation, i.e. at the RTL level. This, however, would probably
make it almost impossible to generate object code that could be used in a
debugger. It might also prevent a lot of other useful optimization steps
that require the variables to be treated as monolithic 32-bit quantities.
I have also concluded that the possible benefit of an early 32-bit ->
4x8-bit splitting mainly affects code that uses global variables, and does
not help much in the more common case where variables are held in registers.
Possibly your code could be improved if you try to avoid global variables.
IMHO the possible benefit of a 32 -> 4x8 splitting at the RTL level does not
really justify the amount of changes required in the compiler.
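As an illustration only, the 32 -> 4x8 split can be mimicked in C rather
than in RTL. The helper name and32_bytewise is invented for this sketch and
says nothing about how gcc is actually structured. It performs a 32-bit AND
as four independent byte operations, which is the level at which a constant
mask byte of 0x00 or 0xff becomes visible as "always zero" or "no operation
needed":

```c
#include <stdint.h>

/* Hypothetical helper: a 32-bit AND expressed as four separate 8-bit ANDs.
 * For a constant mask such as 0x00ff0000, three of the four byte results
 * are known to be zero and the fourth is just the input byte. */
uint32_t and32_bytewise(uint32_t x, uint32_t mask)
{
    uint32_t result = 0;
    uint8_t i;

    for (i = 0; i < 4; i++) {
        uint8_t xb = (uint8_t)(x >> (8 * i));     /* byte i of the value */
        uint8_t mb = (uint8_t)(mask >> (8 * i));  /* byte i of the mask  */
        result |= (uint32_t)(xb & mb) << (8 * i);
    }
    return result;
}
```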
Björn
-----Original Message-----
From: address@hidden [mailto:address@hidden
On Behalf Of Bernard Fouché
Sent: Wednesday, 24 November 2004 7:18 PM
To: address@hidden
Subject: [avr-gcc-list] optimizer
Hi.
I'm compiling with -Os for atmega64 with avr-gcc 3.4.2. When I have
uint32_t var;
var=(uint32_t)function_returning_an_int8_t();
the generated code is, for instance:
var=(uint32_t)eeprom_read_byte((uint8_t *)EEPROM_PARM);
ldi r24, 0x36 ; 54
ldi r25, 0x00 ; 0
call 0xf9c0
eor r25, r25
eor r26, r26
eor r27, r27
sts 0x046B, r24
sts 0x046C, r25
sts 0x046D, r26
sts 0x046E, r27
Could it be instead:
ldi r24, 0x36 ; 54
ldi r25, 0x00 ; 0
call 0xf9c0
sts 0x046B, r24
sts 0x046C, r1
sts 0x046D, r1
sts 0x046E, r1
That would save 6 bytes (the three 2-byte eor instructions)...
Bernard
_______________________________________________
avr-gcc-list mailing list
address@hidden http://www.avr1.org/mailman/listinfo/avr-gcc-list