There is nothing wrong here: 11742 + 32 blocks * 96 = 14814
Where 96 is the size of tcc's mem_debug extra header.
If you want to see the 11742 from valgrind then you just need to
run the same example with a normal tcc compiled without MEM_DEBUG.
Which makes sense I would think.
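
For anyone who wants to re-check that sum, here is a minimal sketch. It
assumes every allocated block carries exactly one fixed-size debug header
(96 bytes, per the figure above). The function name is made up for
illustration and is not part of tcc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of tcc: bytes an outside observer such as
   valgrind would see when each block carries one debug header. */
size_t total_with_debug_headers(size_t payload_bytes,
                                size_t blocks,
                                size_t header_size)
{
    assert(header_size > 0);
    return payload_bytes + blocks * header_size;
}
```

Called with the numbers from this thread (11742 payload bytes, 32 blocks,
96-byte header) it gives 14814, which matches the valgrind figure.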
100%
But when showing the example with MEM_DEBUG and -bench -vv I
did not expect you to doubt the numbers in the first place.
I shouldn't have. But once my allocator and valgrind agreed
I went down the wrong path, especially considering I was parsing the
output incorrectly.
Rather, I was just trying to show how you could get some numbers
for your own real case instead. Which, as you suspect, could be
minimized from 29kB down to 1-2 kB. Most likely impossible, but
if we had some numbers we could also tell why.
And showing me was helpful, I use it below to get some numbers.
I don't know enough about the internals of tcc to really even be having
this conversation. But I do know that my interactive application can
have many TCCStates, and with:
./configure --extra-cflags="-DMEM_DEBUG"
and a call to: tcc_set_options(state, "-Werror -vv -bench");
a simple hello world (single-TCCState) example reports:
---------------------------------------------
0: .text 0x14b57000 len 001bc align 1000
1: .data.ro 0x14b571c0 len 00030 align 0008
2: .data 0x14b571f0 len 00078 align 0008
2: .bss 0x14b57268 len 00050 align 0008
2: .got 0x14b572b8 len 00030 align 0008
---------------------------------------------
protect rwx 0x14b57000 len 01000
---------------------------------------------
That looks great!
But tcc has actually allocated 21248 bytes, according to both my
(more sophisticated) custom allocator and valgrind
(for instance, if I _don't_ delete the state).
So ~80% of the bytes are unaccounted for.
This scales exactly with the number of states, so it's not ideal.
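
To make the ~80% concrete, here is a rough sketch of the arithmetic. It
assumes the only memory "accounted for" is the single 0x1000-byte page
from the "protect rwx" line, and that every state costs the same. The
helper names are hypothetical, not anything from libtcc.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to a multiple of page, which must be a power of two. */
size_t round_up_to_page(size_t n, size_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return (n + page - 1) & ~(page - 1);
}

/* Share of total_bytes not covered by accounted_bytes, as a whole
   percent (rounded down). */
unsigned unaccounted_percent(size_t total_bytes, size_t accounted_bytes)
{
    assert(total_bytes > 0 && accounted_bytes <= total_bytes);
    return (unsigned)(((total_bytes - accounted_bytes) * 100) / total_bytes);
}

/* Extra bytes held across many states, assuming identical states. */
size_t overhead_for_states(size_t states,
                           size_t per_state_total,
                           size_t per_state_needed)
{
    assert(per_state_needed <= per_state_total);
    return states * (per_state_total - per_state_needed);
}
```

The five section lengths above add up to 0x2e4 (740) bytes, which rounds
up to one 4096-byte page. Against 21248 bytes total that leaves 17152
bytes, or about 80%, per state.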
My loaded C has a callback to register all of its functions
with the main application. After that, I have no need for
tcc_get_symbol() support, or in fact anything from libtcc
except for tcc_delete().
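
Roughly, the registration pattern looks like the sketch below. Every name
in it is hypothetical and simplified (this is not my real code and not
libtcc API), and the "module" is an ordinary function standing in for the
code that tcc would compile. In the real setup the host would presumably
still need to locate the module's entry point once, for example with
tcc_get_symbol(), before calling it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side registry; names are illustrative only. */
typedef int (*app_fn)(int);

#define MAX_FNS 16
static struct { const char *name; app_fn fn; } registry[MAX_FNS];
static size_t registry_len;

/* Host side: the loaded code calls this once per exported function. */
int app_register(const char *name, app_fn fn)
{
    assert(name != NULL && fn != NULL);
    if (registry_len == MAX_FNS)
        return -1;                      /* registry is full */
    registry[registry_len].name = name;
    registry[registry_len].fn = fn;
    registry_len++;
    return 0;
}

/* Host side: later calls go through the registry, not the compiler. */
app_fn app_lookup(const char *name)
{
    for (size_t i = 0; i < registry_len; i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].fn;
    return NULL;
}

/* Stand-in for the loaded C code. */
static int twice(int x) { return 2 * x; }

/* Entry point the host calls once after the code is loaded. */
void module_init(void)
{
    app_register("twice", twice);
}
```

Once module_init() has run, the host holds plain function pointers, so
nothing further is asked of the compiler state until it is deleted.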
So I'm trying to pre-delete() all of the unneeded stuff that
tcc_delete() will eventually free anyway. I was calling that
tcc_finalize(). The goal is to reduce the minimal TCCState
size from ~21 kB to the 4 kB required for PROT_EXEC.
- Eric