[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [avr-libc-dev] Interested in 64-bit printf support?

From: George Spelvin
Subject: Re: [avr-libc-dev] Interested in 64-bit printf support?
Date: 6 Dec 2016 10:50:29 -0500

Thanks for the comments!  May of your comments have been obviated by my
later work (particularly the assembler version), but I really appreciate
them anyway.

I should mention that the code you commented on was a proof of concept.
It's something that *could be adapated* to avr-libc, so a couple of your
code style comments are appreciated, but premature.

When writing the first version that you saw, I was trying to keep the asm()
blocks as small as practicable and let gcc handle the parts that could
be expressed well in C.

Then I had a look at the generated asm and was very sad. :-(
Gcc *could not* be convinced to put the output and input pointers in
the X and Z register pairs where they belonged, but kept copying
the values in and out.

So I rewrote the whole thing in asm and produced a massive improvement
in code size.

> It it possible to do it with the same code and not special-casing 
> different radices?  Even if the code runs slower (which is not really 
> important for output routines) the code size might drop.

That's certainly possible, but that uses many calls to a general 16/8->8+8
bit division algorithm and is *hugely* slower.  Still, I could try coding
it up for a "tiny" version.

> Usually, we are after code-size, not speed; in particular for such 
> output and conversion routines.

That part, I understand, although I'm so used to coding the other way
around that it may take me a while to unlearn bad habits. :-)
Thank you very much for pointing out my mistakes in this area!

>> #if __AVR_ARCH__

> In the library you'll have to protect the code against AVR_TINY, i.e.
> #ifndef __AVR_TINY__

Thank you!  This is one of those "premature" comments.  Helpful to me for
the future, but I wasn't trying to fit the library code style *at all* with
this code.

>> typedef _Bool bool;
>> typedef unsigned char uint8_t;
>> typedef unsigned short uint16_t;
>> #define false (bool)0
>> #define true (bool)1

> Dunno how libc's modules are compiled, but they are using stdint.h all 
> over the place, so stdbool.h and stdint.h should be in order, no?

At the time, I *only* had avr-gcc and not avr-libc installed, and *no*
#include directives worked (not even the compiler-supplied ones like
those), so I just stubbed them manually as you see.  For a "release"
version, you're absolutely right, of course.

> Again, we can safe code size by slightly slowing things down, e.g.
> mod5 (uint8_t x)
> {
> #if __AVR_ARCH__
>      asm ("0: $ subi %0,%1 $ brcc 0b $ subi %0,%n1" : "+d" (x) : "n" (35));
>      asm ("0: $ subi %0,%1 $ brcc 0b $ subi %0,%n1" : "+d" (x) : "n" (5));
>      return x;
> #else
>      ...
> The intermediate step via 35 is not essential, it's just a speed-up.

Damn, nice spotting!

When writing that, I though od the "subtract/skip-if-no-carry/add-back"
sequence, but it was the same three instructions as the "compare/skip/suntract"
sequence.  I totally missed that the former makes a *loop* possible!

(And the "n" modifier isn't documented either!  Damn gcc.)

And good call on "35".  Being the geometric average of 5 and 255, it's
obviously the "middle" value, but I did a more careful measurement.  I summed
the total number of backward branches for all x from 1..255, and tried
various values of the intermediate threshold.  It comes out:

20   1890
25   1686
30   1586
35   1554
40   1554
45   1586
50   1656
55   1686
60   1762
65   1890

>>              digit = 255;    /* Could also be zero */

> If 0 is just as fine, use 0.  I am getting better code size and less 
> stack usage (using -mcall-prologues).

It's the math geek in me liking the property that 1 <= digit <= 255.
That's preserved by the summation loop (overflow to 256 gets wrapped to
1), so it's "natural" to start the loop with that invariant even though
it doesn't really matter.

As long as it ends up in a caller-saved register, it's in the high registers
and there's no advantage to "clr" over "ldi".

(I wonder if I initialize it to 1, I could reduce to 1..5, which corresponds to
the desired digits 0..4, replacing some reduction with an unconditional
drcrement.  Could I fold the offset into the constant '0' or something?
Doesn't look like it would work, unfortunately...)

>>              asm(""
>>              "\n1:   ld __tmp_reg__,-%4"
>>              "\n     lsr %1"         /* lsbit to carry bit */
>>              "\n     ror __tmp_reg__"
>>              "\n     st %4,__tmp_reg__"
>>              "\n     rol %1"         /* carry bit to lsbit */
>>              "\n     add %0,__tmp_reg__"
>>              "\n     adc %0,__zero_reg__"    /* End-around carry */
>>              "\n     dec %2"
>>              "\n     brne 1b"
>>              "\n.ifnc %A3,%E4"
>>              "\n  .error \"GCC constraints not working as expected.\""
>>              "\n.endif"
>>              "\n.ifnc %B3,%F4"
>>              "\n  .error \"GCC constraints not working as expected.\""
>>              "\n.endif"
>>                      : "+&r" (digit), "+&r" (lsbit), "+&r" (i), "+e" (bin)
>>                      : "m" (*bin)
>>                      : "r0");

> I don't quite get this, shouldn't this be something like
>      asm ("\n1:  ld __tmp_reg__,-%a3"
>           "\n    lsr %1"                 /* lsbit to carry bit */
>           "\n    ror __tmp_reg__"
>           "\n    st %a3,__tmp_reg__"
>           "\n    rol %1"                 /* carry bit to lsbit */
>           "\n    add %0,__tmp_reg__"
>           "\n    adc %0,__zero_reg__"    /* End-around carry */
>           "\n    dec %2"
>           "\n    brne 1b"
>           : "+r" (digit), "+r" (lsbit), "+r" (i), "+e" (bin)
>           :
>           : "memory");
> IMO it's not worth the hassle to describe explicitly which portion of 
> memory gets changed, hence "memory".

Thank you!  I went to all that trouble because the 'a' modifier isn't 
documented in
https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/avr/avr.md?view=markup or
https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/avr/avr.c?view=markup and I 
know about it!  I could go from the X/Y/Z name to the rXX name with %E and %F, 
I didn't know any way to do the opposite!  The only way I could get X/Y/Z was 
to use
an 'm" constraint and then do all that Evil Hackery to try to tell gcc that I
was modifying the pointer.

Your way of expressing it is clearly much better.

> Try to avoid "m" constraints in your code, except for making side 
> effects on memory explicit, i.e. don't use the respective %-Operand in 
> the asm template as "m" allows much more address modes than any AVR 
> instruction can chew.

Um, really?  I know that it allows offset addressing modes as well,
but I thought that it matched exactly what the LD and ST instructions
could chew.

That is, X, -X, X+, Y, -Y, Y+, Y+{0..63}, Z, -Z, Z+, Z+{0..63}.

I know I was abusing it in my code (which wanted X, Y or Z only), but I thought
it *could* be used safely.

>>              if (digit > 4)
>>                      __builtin_unreachable();

> What's the purpose of this test? An optimization hint?

Exactly.  I knew the multiplication was easier because of the restricted
range of the input (in particular, since it's less than 15, you could
implement "<< 4" using a swap instruction) and wanted to tell gcc
about it.

> FYI, avrtest comes with such functions for you, you just have to link 
> against, say, exit-atmega128.o
> https://sourceforge.net/p/winavr/code/HEAD/tree/trunk/avrtest/
> avrtest is just a core simulator, i.e. has reduced functionality 
> compared to other simulators -- but it's /fast/ and well suited for such 
> algorithmic torture tests.

Ooh, thank you!  I hadn't looked at winavr at all, so I've been doing
everything with simulavr and it's definitely a pain.  I'm having to
count line numbers in the trace file.

Although since the existing avr-libc test suite is designed for simulavr,
I may end up sticking with it anyway.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]