
Re: [avr-libc-dev] Request: gcrt1.S with empty section .init9


From: Georg-Johann Lay
Subject: Re: [avr-libc-dev] Request: gcrt1.S with empty section .init9
Date: Sat, 07 Jan 2017 13:05:44 +0100
User-agent: Thunderbird 2.0.0.24 (Windows/20100228)

Marko Mäkelä wrote:
Hello Johann,

When you need optimizations at a level where 2 instructions matter, it is very likely that you need project-specific start-up code and a linker description anyway. For example, you might truncate the vector table after the last vector used by the application.

Good idea, thanks! I did think about the interrupt vector table already, and that approach would allow me to trim it too.

For an easy fix, you can

1) Set up own start-up code omitting .init9

2) Provide own linker description file without input section .init9

3) Or, as a quick fix: 3a) Link with -Wl,--unique=.init9 so that
  .init9 becomes an output section, and then 3b) drop it by means
  of avr-objcopy --remove-section=.init9 foo.elf

All of these approaches require main in .init8 or earlier.
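Something along these lines should satisfy that requirement (untested sketch; adjust the init section and attributes to taste):

/* Put main() into .init8 so that execution simply falls into it at the
   end of the init sequence.  The default .init9 stub ("call main / jmp
   exit") then never runs and can be dropped by one of the approaches
   above.  main must never return, because nothing useful follows once
   .init9 is gone. */
int main(void) __attribute__((OS_main, section(".init8")));

int main(void)
{
    for (;;) {
        /* application loop */
    }
}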

Right. I had already put main in .init3 successfully before posting.

The quick fix 3) works; it shortens the program by 4 words and reduces stack usage by 2 bytes (the call main / jmp exit stub is 4 words, and the call pushes a 2-byte return address). The .init9 section is emitted at the end of the ELF binary and indeed omitted from the avr-objcopy output.

[snip]
gcrt1.S would need yet another #if __GNUC__ >= 7 or so, and because toolchain distributors are usually not aware of such subtleties, you will observe complaints of "brain dead misoptimization" à la

 CALL main
 JMP exit
 CALL main
 JMP exit

throughout avr forums all over the net if someone bundles an avr-gcc that has the new feature with an avr-libc that lacks the conditional removal.

Right, so the risk could be greater than the savings.

One last note: As you are coming straight from asm programming, you will have a hard time reading the compiler generated code.

Actually, I write C and C++ code for bigger systems for a living.

Ya, but on such systems you won't dive into generated assembly and
propose library changes when you come across a pair of instructions
you don't need in your specific context :-P

The 8-bit processors are just a hobby, and my ‘first love’ is the 6502, not the AVR. I was happy to learn that the avr-llvm changes were recently merged into the upstream repository. The experimental AVR target for clang generates some code, but it still needs work. I am hoping that one day clang will generate code similar to avr-gcc's. Also, clang++ works, which is nice if you have watched the CppCon 2016 talks touting zero-overhead abstraction, such as these:
https://www.youtube.com/watch?v=zBkNBP00wJE
https://www.youtube.com/watch?v=uzF4u9KgUWI
https://www.youtube.com/watch?v=D7Sd8A6_fYU

Maybe you are shocked enough to jump into contributing to GCC :-)

Not an impossible idea, but I find the idea of LLVM more promising,

Many developers find llvm more attractive than gcc because it is
not GPL and the newer code is "pure doctrine" C++, whereas gcc might
deliberately use macros (for host performance), which many developers
find disgusting.  When you are coding a backend, it hardly matters
whether you write XX_FOO (y) or xx.foo (y); what's paramount is that
you are able to express what you want to express and get your job
done w.r.t. target features, target code performance, etc.

As gcc supports way more hosts and targets, in particular in the realm
of embedded, I cannot tell what the best choice is.  But yes, llvm
appears to be much more attractive and appealing these days.

because it could be easier to add other 8-bit processor targets there.

ymmv

So far I found the generated code surprisingly good. I feared that GCC would target a ‘virtual machine’ with 32-bit registers, but that does

GCC targets a target, and the description should match the real hardware
as closely as it can :-)

not seem to be the case; or perhaps there are good peephole optimizations in place and my input is just that simple. I am using the Debian package gcc-avr 1:4.9.2+Atmel3.5.3-1.

avr-gcc implements some peepholes, but imho peepholes are a last resort
optimization to clean up mess from other passes which didn't perform as
expected.

My only complaint is that avr-gcc does not allow me to assign a pointer to the Z register even when my simple program does not need that register for anything else:

register const __flash char* usart_tx_next asm("r28"); // better: r30:31

This is not the Z pointer; R28 is the Y register, which in turn might be
the frame pointer.  Even if avr-gcc allowed reserving it globally, you
would get non-functional code.  Same with reserving Z: __flash will
try to access via Z, and if you take that register away by fixing it,
the compiler will no longer be able to use Z for its job.

My strong impression is that you are inventing hacks to push the
compiler into generating the exact same sequence as you would write
down as a smart assembler programmer.  Don't do it.  You will come
up with code that is cluttered with hard-to-maintain kludges,
or it might even be non-functional (as with globally reserving
registers indispensable to the compiler).
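For example, just drop the asm("r28") clause, declare the pointer as an
ordinary object, and let the register allocator do its job (untested
sketch, same variable as in your ISR below):

const __flash char *usart_tx_next;  /* compiler picks registers; __flash
                                       loads will use Z as needed */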


ISR(USART_TX_vect)
{
 char c;
 if (!usart_tx_next);               /* no transmission in progress */
 else if ((c = *usart_tx_next++))   /* fetch next byte from flash */
   UDR0 = c;                        /* send it */
 else
   usart_tx_next = 0;               /* hit the terminating NUL: stop */
}

In its current form, this program generates quite a few push/pop instructions to preserve the value of the Z register while copying the Y register into it.

ISRs come with some overhead, which adds a performance drop that is
noticeable in particular with small ISRs.  Part of the overhead is that
R0 and R1 are fixed and the compiler doesn't track their contents, hence
they will be saved / restored; same for SREG.

You could write that ISR and avoid push / pop SREG by using CPSE, but
that needs asm, of course. Cf. PR20296.  Everybody familiar with avr
is aware of this, but also aware that it will be quite some work to
come up with optimal code.  The general recommendation is to use
assembler if you need specific instruction sequences.

Even if the compiler generated code that's close to optimal, it
would be very hard to force CPSE and block any other comparison
instructions provided respective code exists.

I got the impression that LLVM is a 16-bit (or wider) virtual machine. It could be an acceptable design choice, given that 8-bit processors usually have a 16-bit or wider address space. But currently llc (the LLVM-to-AVR translator) is lacking optimizations, generating very bloated code.

Best regards,

    Marko




