Re: [Tinycc-devel] Minimizing libtcc memory use


From: Eric Raible
Subject: Re: [Tinycc-devel] Minimizing libtcc memory use
Date: Wed, 6 Mar 2024 16:45:24 -0600

I'm sorry for this delayed response.  I've been camping, no wifi for a minute...

On Mon, Mar 4, 2024 at 3:47 AM grischka via Tinycc-devel <tinycc-devel@nongnu.org> wrote:
On 03.03.2024 21:26, Eric Raible wrote:
>  > isn't there a garbage collecting done at the end to remove all the unused stuff
>  > to produce a binary that contains only the necessary parts ?
>
> That very well might be the case, but given that tcc_get_symbol()
> can be used at any time between tcc_relocate() and tcc_delete(),
> it follows that _at least_ symbols are resident in the TCCState.
> What I'm wondering about is the feasibility of keeping just code and
> data, and flushing everything else.  This would require a new API -
> something like tcc_finalize(TCCState *) or perhaps
> tcc_finalize(TCCState *, flags), where flags specify what to flush.

Methinks this discussion is going around in circles.

I'm sorry for that.  tcc as it stands has been tremendously useful to me,
and all I'm trying to do is help.  Btw, I was not advocating for any changes
to tcc_relocate() except that once it became a 1-arg function I suggested
that it could be eliminated and tcc_add_symbol() could call it on demand.

Your tcc_finalize() subject of obsession was what we already had
all the time since 0.9.25, that is the option to allocate the
executable code separately from the state and therefore the option
to delete (finalize) the state and still keep the code.

I understand that now.  I had previously used only automatic allocation
in tcc_relocate(), and since it worked I never explored further.  My main
interest was for tcc_set_realloc(), and I am happy that it was "approved".

...
At which point I found that maintaining two options for running
code might not be to the best for both sides, neither for tcc wrt.
maintainability nor for users wrt. avoidance of inconvenience.
...
Where simple plus complicated is more complicated than just
complicated btw.

I understand the frustration and agree that maintainability is a
crucial consideration.
 
> I don't know enough about the internals, but if I'm willing to run with
> CONFIG_RUNMEM_RO, it seems like the per TCCState memory use in my case
> could be decreased from something like 29K to 1K or 2K.
>
> I should mention that the memory usage in my case is 29K regardless
> of whether CONFIG_RUNMEM_RO is 0 or 1.

How do you know this?

Because I installed a custom allocator with tcc_set_realloc().
My exact case is hard for anyone but me to reproduce, but the
following standalone example illustrates that mem_cur_size
might be incorrect:

#if 0

Everything here assumes the following patch:

diff --git a/libtcc.c b/libtcc.c
index 6d720e74..332ec13d 100644
--- a/libtcc.c
+++ b/libtcc.c
@@ -844,6 +844,12 @@ LIBTCCAPI TCCState *tcc_new(void)
 
 LIBTCCAPI void tcc_delete(TCCState *s1)
 {
+    if (s1->do_bench) {
+        fprintf(stderr, "Top of tcc_delete()\n");
+        tcc_print_stats(s1, 0);
+        fprintf(stderr, "Early return from tcc_delete() - expect leaks!\n");
+        return;
+    }
     /* free sections */
     tccelf_delete(s1);
 
@@ -2209,5 +2215,6 @@ PUB_FUNC void tcc_print_stats(TCCState *s1, unsigned total_time)
     }
 #endif
     fprintf(stderr, " %d max (bytes)\n", mem_max_size);
+    fprintf(stderr, "mem_cur_size=%d (bytes)\n", mem_cur_size);
 #endif
 }
 

Running this program under valgrind on my ARM Debian system with either of
these configs seems to show that mem_max_size is undercounting.

./configure --extra-cflags="-DMEM_DEBUG"
./configure --extra-cflags="-DMEM_DEBUG -DCONFIG_RUNMEM_RO=0"

#endif

#include <stdio.h>
#include <stdlib.h>    /* realloc, free, exit; the non-standard <malloc.h> is not needed */
#include "libtcc.h"

typedef struct block {
    unsigned long size;
    char data[];
} block;

#define _B2S(block)        ((block) ? (block)->size : 0)
#define _B2D(block)        ((block) ? (block)->data : 0)
#define _D2B(data)        ((data)  ? (block *)(data) - 1 : 0)
#define _D2S(data)        _B2S(_D2B(data))

static unsigned int blocks, mcalls, rcalls, fcalls, bytes, max;

static void *reallocator(void *data, unsigned long want)
{
    block *b = _D2B(data);
    unsigned long had = _D2S(data);

    if (want) {
        // some accounting; each block is charged for its size header too
        if (had) {
            ++rcalls;
            bytes -= had + sizeof *b;
        } else {
            ++blocks;
            ++mcalls;
        }
        bytes += want + sizeof *b;
        if (max < bytes) max = bytes;

        // sizeof *b is the header size (sizeof b was only the pointer size)
        b = realloc(b, want + sizeof *b);
        if (!b) {
            fprintf(stderr, "out of memory\n");
            exit(1);
        }
        b->size = want;
    } else if (had) {
        --blocks;
        ++fcalls;
        bytes -= had + sizeof *b;
        free(b);
        b = 0;
    }
    return _B2D(b);
}

static void fatal(char *msg)
{
    fprintf(stderr, "%s\n", msg);
    exit(1);
}

int main()
{
    tcc_set_realloc(reallocator);
    TCCState *state = tcc_new();
    if (!state)
        fatal("excuse me");
    tcc_set_options(state, "-bench");
    tcc_set_output_type(state, TCC_OUTPUT_MEMORY);
    if (tcc_compile_string(state, "int x, y, z=1;\n"
                                  "int foo(int arg) { return arg + z; }"))
        fatal("sorry");
    if (tcc_relocate(state))
        fatal("so sorry");

    int (*foo)(int) = tcc_get_symbol(state, "foo");
    if (!foo || foo(41) != 42)
        fatal("I am truly sorry");

    printf("Custom 1: %u bytes, %u max, mcalls %u, rcalls %u, fcalls %u, blocks %u\n", bytes, max, mcalls, rcalls, fcalls, blocks);
    tcc_delete(state);

    // This one is only useful if the above patch is NOT applied
    printf("Custom 2: %u bytes, %u max, mcalls %u, rcalls %u, fcalls %u, blocks %u\n", bytes, max, mcalls, rcalls, fcalls, blocks);
    return 0;
}
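
In case it helps to sanity-check the counters in isolation, here is a
minimal sketch of the same size-header accounting that runs without libtcc.
All of the names (hdr, counting_realloc, accounting_balances) are made up
for illustration and are not part of libtcc. It only assumes the hook
convention used above: a NULL pointer means allocate, a size of 0 means free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; nothing here is part of libtcc. */
typedef struct hdr {
    size_t size;                /* payload bytes requested by the caller */
} hdr;

static size_t cur_bytes, peak_bytes, live_blocks;

/* realloc-style hook: data == NULL allocates, want == 0 frees.
 * As in the program above, the payload is only aligned for size_t. */
static void *counting_realloc(void *data, unsigned long want)
{
    hdr *h = data ? (hdr *)data - 1 : NULL;

    if (want == 0) {
        if (h) {
            cur_bytes -= h->size + sizeof *h;
            --live_blocks;
            free(h);
        }
        return NULL;
    }
    if (h)
        cur_bytes -= h->size + sizeof *h;   /* resize: retire the old size */
    else
        ++live_blocks;

    h = realloc(h, want + sizeof *h);
    if (!h)
        abort();
    h->size = want;
    cur_bytes += want + sizeof *h;
    if (cur_bytes > peak_bytes)
        peak_bytes = cur_bytes;
    return h + 1;
}

/* Allocate two blocks, grow one, free both; returns 1 if the books balance. */
static int accounting_balances(void)
{
    void *p = counting_realloc(NULL, 100);
    void *q = counting_realloc(NULL, 50);
    assert(live_blocks == 2);
    assert(cur_bytes == 150 + 2 * sizeof(hdr));

    p = counting_realloc(p, 400);           /* grow the first block */
    assert(cur_bytes == 450 + 2 * sizeof(hdr));

    counting_realloc(p, 0);
    counting_realloc(q, 0);
    return cur_bytes == 0 && live_blocks == 0
        && peak_bytes == 450 + 2 * sizeof(hdr);
}
```

Calling accounting_balances() from a small main() should return 1. Note that
a peak measured this way counts only what the caller asked for plus headers,
not any transient copy realloc() may make internally.
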

That's it for now, sorry for the long message.
- Eric
