Re: [avr-libc-dev] Interested in 64-bit printf support?

From: Georg-Johann Lay
Subject: Re: [avr-libc-dev] Interested in 64-bit printf support?
Date: Thu, 8 Dec 2016 14:16:17 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0

On 08.12.2016 00:34, George Spelvin wrote:
Georg-Johann Lay <address@hidden> wrote:
The algo is rather slow because it always iterates over all
digits, i.e. it won't run faster for small numbers.

Have fun!

Code size is ~140 bytes.

Well, it's bigger (140 > 124), slower, and doesn't handle sizes *other*
than 64 bits, so that's not terribly useful.

I think you could shrink it a bit, replacing these 16 instructions of messy
digit output code (why are you looping incrementing DIGIT2 when you know it
is never more than 1?):

It's a transcription of antique C code for 32-bit values, which
don't have this nice property.  I shouldn't write code too late in
the evening.

And I neither intended nor expected to beat your code; I was just
interested in how far it can be pushed...

        clr     DIGIT2
        inc     DIGIT2
        subi    DIGIT, 10
        brcc    1b

        brts    2f
        ;; T = 0 is the first round.  Output the high digit if it's not '0'.
        subi    DIGIT2, 1-'0'
        ;; Initialize nonZero.  We only output digits if we saw a digit != '0'.
        mov     nonZero, DIGIT2
        cpi     nonZero, '0'
        breq    2f
        st      X+, DIGIT2
        ;; Output digits except the highest (except that for 10^19).
        subi    DIGIT, -10-'0'
        or      nonZero, DIGIT
        ;; We only output digits if we saw a digit != '0', i.e. strip leading 
        cpi     nonZero, '0'
        breq    3f
        st      X+, DIGIT

With these 9 instructions:
        cpi     DIGIT, 10       ;; First "digit" can be as high as 18
        brcs    2f
        ldi     nonZero, '1'    ;; '1' is non-zero, which is perfect
        st      X+, nonZero
        subi    DIGIT, 10
        or      nonZero, DIGIT
        breq    3f              ;; Don't print leading zeros
        subi    DIGIT, -10-'0'
        st      X+, DIGIT

With this, you can also delete the leading clt.  It eliminates DIGIT2,
but unfortunately that doesn't save a spill.  You also have to adapt
the final "lone zero" printing code to print if nonZero == 0, but that's
the same size.
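
To make the control flow of the 9-instruction version easier to follow, here is a rough C model of it. This is only a sketch: the names emit_digit and format_digits are invented for illustration and do not appear in put64.S, and it assumes the digits arrive as small integers (0..9, except that the first one may be as high as 18) rather than as the biased value the assembly adds '0' to.

```c
#include <stddef.h>

/* Hypothetical helper (not from put64.S): append one "digit" to the
   output while suppressing leading zeros.  Values 10..18 can only occur
   on the first round; they are split into a '1' and a digit 0..8. */
static char *emit_digit(char *p, unsigned digit, unsigned char *non_zero)
{
    if (digit >= 10) {                  /* cpi DIGIT, 10 / brcs 2f    */
        *non_zero = '1';                /* ldi nonZero, '1'           */
        *p++ = '1';                     /* st  X+, nonZero            */
        digit -= 10;                    /* subi DIGIT, 10             */
    }
    *non_zero |= (unsigned char)digit;  /* or  nonZero, DIGIT         */
    if (*non_zero != 0)                 /* breq 3f: still all zeros   */
        *p++ = (char)('0' + digit);     /* st  X+, DIGIT              */
    return p;
}

/* Hypothetical driver: format n digits into out, NUL-terminate, and
   return the string length.  A lone '0' is printed when every digit
   was zero, i.e. when non_zero is still 0 at the end. */
size_t format_digits(const unsigned char *digits, size_t n, char *out)
{
    unsigned char non_zero = 0;
    char *p = out;

    for (size_t i = 0; i < n; ++i)
        p = emit_digit(p, digits[i], &non_zero);
    if (non_zero == 0)
        *p++ = '0';
    *p = '\0';
    return (size_t)(p - out);
}
```

With the digit sequence 18, 0, 5 this writes "1805"; with 0, 0, 0 it writes the lone "0". The output buffer needs room for n + 2 bytes (one extra digit for the leading '1', plus the terminator).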

Also, this is just silly:
    dec     Count
    cpse    Count, Zero
    rjmp    .Loop

"dec" sets the zero flag, so that can just be "dec Count $ brne .Loop".

And finally, your multiply loop is wasting two instructions:

    mul A0,Ten  $  mov A0,r0  $  add A0,Cy  $  mov Cy,r1  $  adc Cy,Zero
    mov __tmp_reg__,A0
    mov A0,A1   $  mov A1,A2  $  mov A2,A3  $  mov A3,A4
    mov A4,A5   $  mov A5,A6  $  mov A6,A7  $  mov A7,__tmp_reg__

"mov A0,r0" and "mov __tmp_reg__,A0" are cancelling each other out

Nice spotting!

and should both be deleted (with the "A0 += Cy" adjusted to add to r0,
of course).  Just make it:

    mul A0,Ten  $  add r0,Cy  $  adc r1,Zero  $  mov Cy,r1
    mov A0,A1   $  mov A1,A2  $  mov A2,A3  $  mov A3,A4
    mov A4,A5   $  mov A5,A6  $  mov A6,A7  $  mov A7,r0
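
For readers who do not want to trace the register shuffling, the multiply step can be modelled in C roughly as follows. This is a sketch under assumptions: the function name mul10_rotating is made up, the array a[] stands in for registers A0..A7 (least significant byte first), and it assumes the step is repeated eight times per multiplication so that the rotated bytes end up back in their original positions.

```c
#include <stdint.h>

/* Illustrative model (not taken from put64.S): multiply the 64-bit
   little-endian value in a[0..7] by 10 in place.  Returns the byte
   that overflows past bit 63, which is always in the range 0..9. */
uint8_t mul10_rotating(uint8_t a[8])
{
    uint8_t cy = 0;

    for (int count = 8; count != 0; --count) {
        /* mul A0,Ten $ add r0,Cy $ adc r1,Zero
           At most 255 * 10 + 255 = 2805, so 16 bits are enough. */
        uint16_t prod = (uint16_t)(a[0] * 10u + cy);

        cy = (uint8_t)(prod >> 8);      /* mov Cy,r1                */

        for (int i = 0; i < 7; ++i)     /* mov A0,A1 ... mov A6,A7  */
            a[i] = a[i + 1];
        a[7] = (uint8_t)prod;           /* mov A7,r0                */
    }
    return cy;
}
```

Each pass handles the current lowest byte and parks its result at the top, so after eight passes every byte has been multiplied exactly once and sits where it started. Writing the low product byte straight from r0 into A7 is what makes the temporary copy unnecessary.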

That saves 22 bytes, leaving it 6 bytes smaller than mine.  Nice to have 

The reworked version comes up with 110 bytes (still asserting MUL).

But I like your approach more, as it does not rely on special properties
of the numbers and comes with some nice ideas.

Perf-metering with avrtest reveals a run time from ~3100 up to < 4800 ticks; high, as expected.


Attachment: put64.S
Description: Text document
