avr-libc-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [avr-libc-dev] Interested in 64-bit printf support?


From: Georg-Johann Lay
Subject: Re: [avr-libc-dev] Interested in 64-bit printf support?
Date: Tue, 6 Dec 2016 12:38:22 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

On 04.12.2016 23:53, George Spelvin wrote:
The developers of the TICC time interval counter are having
a problem printing 64-bit integers:
https://github.com/TAPR/TICC/blob/master/TO-DO.txt

I went to work figuring out how to do that conversion, and after some
false starts, came up with some quite efficient arbitrary-precision
printing code.  It's small, fast, and works for any number of bytes.
(Implementation currently limited to 255 bytes.)

The code, in its current state, is appended; it's been tested extensively
on x86 and more lightly on simulavr.  Any comments from experienced AVR
programmers is definitely appreciated.  Like the current __ultoa_invert,
it formats the number backward in a caller-provided buffer.

It's (currently) slightly larger, but slightly faster than __ultoa_invert.
Timing in cycles for a simulated atmega1280:

Input           genprint        __ultoa_invert
0                               160
0xff            316             480
0xffff          584             792
0xffffff        1005            1260
0xffffffff      1434            1572
0xffffffffff    2024
48 ones         2626
56 ones         3286
64 ones         4103

I could just pass this over to the TICC project, but is there any interest
in me making the necessary overhauls to vfprintf to incorporate this?

I'd have to change vfprintf to pass around a pointer and length internally
rather than copying the value.  The printing code requires the input
in a modifiable little-endian buffer, but that's how varargs works on
AVR anyway.  (Adding the hex/octal and negative number handling is quite
simple.)

It it possible to do it with the same code and not special-casing different radices? Even if the code runs slower (which is not really important for output routines) the code size might drop.

If I get really sneaky, I could eliminate the output buffer by having
it push the formatted number on the stack and call down to an output
routine that supplies the necessary prefix padding once the length is
known. But that might not be doable in C unless there's a way to force
allocation of a stack frame.


(If anyone cares, I have some other code for 16- and 32-bit numbers based
on the ideas at http://homepage.divms.uiowa.edu/~jones/bcd/decimal.html.
It converts 2 digits at a time, using a 100-byte binary-to-BCD lookup
table, but is a lot larger, extending it to 64 bits is a mess, and doing
without a hardware multiplier bloats it with shift-and-add sequences.)

Usually, we are after code-size, not speed; in particular for such output and conversion routines.

#if __AVR_ARCH__

In the library you'll have to protect the code against AVR_TINY, i.e.

#ifndef __AVR_TINY__

typedef _Bool bool;
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
#define false (bool)0
#define true (bool)1

Dunno how libc's modules are compiled, but they are using stdint.h all over the place, so stdbool.h and stdint.h should be in order, no?

/* Reduce a byte mod 5. */
static uint8_t __attribute__((const))
mod5(uint8_t x)
{
#if __AVR_ARCH__
        uint8_t tmp;
        /* Reduce x mod 15.  We add the high halves, to get a carry. */
        asm(    "mov __tmp_reg__,%0"
        "\n        swap __tmp_reg__"
        "\n        cbr %0,15"
        "\n        add %0,__tmp_reg__"
        "\n        cbr %0,15"
        "\n        swap %0"   /* Swap and add carry bit to low */
        "\n        adc %0,__zero_reg__"       /* End-around carry */
                : "+d" (x) :: "r0");
#else
        x = (x >> 4) + (x & 0xf);     /* 1 <= x <= 30 */
        x = (x >> 4) + (x & 0xf);     /* 1 <= x <= 15 */
#endif
        /* 1 <= x <= 15; now for final reduction mod 5 */
        if (x >= 10)
                x -= 10;        /* 0 <= x <= 9 */
        if (x >= 5)
                x -= 5;         /* 0 <= x <= 4 */
        return x;
}

Again, we can safe code size by slightly slowing things down, e.g.

mod5 (uint8_t x)
{
#if __AVR_ARCH__
    asm ("0: $ subi %0,%1 $ brcc 0b $ subi %0,%n1" : "+d" (x) : "n" (35));
    asm ("0: $ subi %0,%1 $ brcc 0b $ subi %0,%n1" : "+d" (x) : "n" (5));
    return x;
#else
    ...

The intermediate step via 35 is not essential, it's just a speed-up.

        do {
                uint16_t accum;
                uint8_t i = len;
                bool lsbit = false;

                /*
                 * Pass 1, msb-to-lsb: divide bin[] by 2, and return the
                 * value mod 10 (the new value mod 5, combined with the
                 * previous lsbit).
                 */
                digit = 255;    /* Could also be zero */

If 0 is just as fine, use 0. I am getting better code size and less stack usage (using -mcall-prologues).

#if __AVR_ARCH__
                /*
                 * The one implementation hassle is the need for two
                 * carry bits.  One for the shift, and the other for
                 * the end-around carries modulo 255.
                 *
                 * Unfamiliar AVR inline asm feature: given a memory
                 * constraint, %2 prints X, Y or Z.  %E2 and %F2 are the
                 * low and high registers of the pointer, respectively.
                 * So if %2 = Z, then %E2 = r30 and %F2 = r31.
                 */
                asm(""
                "\n1:      ld __tmp_reg__,-%4"
                "\n        lsr %1"            /* lsbit to carry bit */
                "\n        ror __tmp_reg__"
                "\n        st %4,__tmp_reg__"
                "\n        rol %1"            /* carry bit to lsbit */
                "\n        add %0,__tmp_reg__"
                "\n        adc %0,__zero_reg__"       /* End-around carry */
                "\n        dec %2"
                "\n        brne 1b"
                "\n.ifnc %A3,%E4"
                "\n  .error \"GCC constraints not working as expected.\""
                "\n.endif"
                "\n.ifnc %B3,%F4"
                "\n  .error \"GCC constraints not working as expected.\""
                "\n.endif"
                        : "+&r" (digit), "+&r" (lsbit), "+&r" (i), "+e" (bin)
                        : "m" (*bin)
                        : "r0");

I don't quite get this, shouldn't this be something like

    asm ("\n1:  ld __tmp_reg__,-%a3"
         "\n    lsr %1"                 /* lsbit to carry bit */
         "\n    ror __tmp_reg__"
         "\n    st %a3,__tmp_reg__"
         "\n    rol %1"                 /* carry bit to lsbit */
         "\n    add %0,__tmp_reg__"
         "\n    adc %0,__zero_reg__"    /* End-around carry */
         "\n    dec %2"
         "\n    brne 1b"
         : "+r" (digit), "+r" (lsbit), "+r" (i), "+e" (bin)
         :
         : "memory");

IMO it's not worth the hassle to describe explicitly which portion of memory gets changed, hence "memory".

Try to avoid "m" constraints in your code, except for making side effects on memory explicit, i.e. don't use the respective %-Operand in the asm template as "m" allows much more address modes than any AVR instruction can chew.


                if (digit > 4)
                        __builtin_unreachable();

What's the purpose of this test? An optimization hint?


#if __AVR_ARCH__
#define READ_PORT 0x38
#define WRITE_PORT 0x39
#define EXIT_PORT 0x3a

static uint8_t inline __attribute__((always_inline))
inb(uint8_t port)
{
        uint8_t x;
        asm("in %0,%1" : "=r" (x) : "I" (port));
        return x;
}

static void inline __attribute__((always_inline))
outb(uint8_t port, uint8_t x)
{
        asm("out %0,%1" :: "I" (port), "r" (x));
}

static void
putch(char c)
{
        outb(WRITE_PORT, c);
}

static void
revwrite(char const *p, uint8_t len)
{
        while (len--)
                putch(*--p);
}

static void
revput(char *buf, uint8_t *bin, uint8_t len)
{
        char const *p = genprint(buf, bin, len);
        revwrite(p, p-buf);
        putch('\n');
}

static void
putst(char const __flash *p)
{
        char c;
        while ((c = *p++) != 0)
                putch(c);
        putch('\n');
}

static void __attribute__((noreturn))
my_exit(uint8_t status)
{
        outb(EXIT_PORT, status);
        __builtin_unreachable();
}

FYI, avrtest comes with such functions for you, you just have to link against, say, exit-atmega128.o

https://sourceforge.net/p/winavr/code/HEAD/tree/trunk/avrtest/

avrtest is just a core simulator, i.e. has reduced functionality compared to other simulators -- but it's /fast/ and well suited for such algorithmic torture tests.


Johann





reply via email to

[Prev in Thread] Current Thread [Next in Thread]