bug#43389: 28.0.50; Emacs memory leaks using hard disk all time


From: Carlos O'Donell
Subject: bug#43389: 28.0.50; Emacs memory leaks using hard disk all time
Date: Wed, 25 Nov 2020 13:57:34 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0

On 11/25/20 1:03 PM, Eli Zaretskii wrote:
>> Cc: bugs@gnu.support, fweimer@redhat.com, 43389@debbugs.gnu.org,
>>  dj@redhat.com, michael_heerdegen@web.de
>> From: Carlos O'Donell <carlos@redhat.com>
>> Date: Wed, 25 Nov 2020 12:45:04 -0500
>>
>> On 11/24/20 11:07 AM, Eli Zaretskii wrote:
>>> Look at the large chunks in the tail of this.  Together, they do
>>> account for ~2GB.
>>>
>>> Carlos, are these chunks in use (i.e. allocated and not freed), or are
>>> they the free chunks that are available for allocation, but not
>>> released to the OS?  If the former, then it sounds like this session
>>> does have around 2GB of allocated heap data, so either there's some
>>> allocated memory we don't account for, or there is indeed a memory
>>> leak in Emacs.  If these are the free chunks, then the way glibc
>>> manages free'd memory is indeed an issue.
>>
>> These chunks are all free and mapped for use by the algorithm to satisfy
>> a request by the application.
> 
> So we have more than 1.5GB free memory available for allocation, is
> that right?

There are 3 malloc_info traces in the log.

1. Lines 47-219. Day 1: 1100MiB of RSS.
2. Lines 386-556. Day 4: 2.3GiB of RSS.
3. Lines 744-792. Day 5: 4.2GiB of RSS.

Line numbers refer to the log, starting at 1.
 
> To make sure there are no misunderstandings, I'm talking about this
> part of the log:

Your analysis is for trace #2, lines 386-556.

My analysis was for trace #3, lines 744-792.

>   <heap nr="0">
>   <sizes>
>     [...]
>     <size from="10753" to="12273" total="11387550" count="990"/>
>     <size from="12289" to="16369" total="32661229" count="2317"/>
>     <size from="16385" to="20465" total="36652437" count="2037"/>
>     <size from="20481" to="24561" total="21272131" count="947"/>
>     <size from="24577" to="28657" total="25462302" count="958"/>
>     <size from="28673" to="32753" total="28087234" count="914"/>
>     <size from="32769" to="36849" total="39080113" count="1121"/>
>     <size from="36865" to="40945" total="30141527" count="775"/>
>     <size from="40961" to="65521" total="166092799" count="3119"/>
>     <size from="65537" to="98289" total="218425380" count="2692"/>
>     <size from="98321" to="131057" total="178383171" count="1555"/>
>     <size from="131089" to="163825" total="167800886" count="1142"/>
>     <size from="163841" to="262065" total="367649915" count="1819"/>
>     <size from="262161" to="522673" total="185347984" count="560"/>
>     <size from="525729" to="30878897" total="113322865" count="97"/>
>     <unsorted from="33" to="33" total="33" count="1"/>
>   </sizes>
> 
> If I sum up the "total=" parts of these large numbers, I get 1.6GB.
> Is this free memory, given back to glibc for future allocations from
> this arena, and if so, are those 1.6GB part of the 4.2GB total?

In trace #2 we have these final statistics:

549 <total type="fast" count="39" size="2656"/>
550 <total type="rest" count="44013" size="1755953515"/>
551 <total type="mmap" count="6" size="121565184"/>
552 <system type="current" size="2246778880"/>
553 <system type="max" size="2246778880"/>
554 <aspace type="total" size="2246778880"/>
555 <aspace type="mprotect" size="2246778880"/>
556 </malloc>

This shows ~1.7GiB of unused free chunks. Keep in mind that glibc malloc is a
heap-based allocator, so with a FIFO usage pattern the kernel heap cannot
shrink until the most recently allocated chunk at the top of the heap is freed.
In trace #3 we *do* see application demand consume all of these free chunks
again, so something in the application is using them; there are none left
reported in the malloc_info statistics (which could also indicate chunk
corruption).
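
To make the FIFO point concrete, here is a minimal standalone sketch (not
Emacs code; sizes and counts are arbitrary) showing that freeing everything
except the most recently allocated chunk leaves the brk heap at its
high-water mark:

/* Minimal sketch, not Emacs code: a FIFO-style pattern on the main
   (brk) heap.  Everything allocated before the most recent chunk is
   freed, yet the break cannot move because the newest live chunk sits
   at the top of the heap. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main (void)
{
  enum { N = 1024 };
  void *p[N];

  setvbuf (stdout, NULL, _IONBF, 0);   /* keep stdio from allocating */

  for (int i = 0; i < N; i++)
    p[i] = malloc (64 * 1024);         /* grows the main arena via brk */

  void *before = sbrk (0);

  for (int i = 0; i < N - 1; i++)      /* FIFO: free the oldest first */
    free (p[i]);

  printf ("brk before frees %p, after frees %p\n", before, sbrk (0));
  /* Typically unchanged: ~64MiB of free chunks stay mapped below the
     still-live p[N-1]. */

  free (p[N - 1]);                     /* free the topmost allocation */
  printf ("brk after freeing the newest chunk %p\n", sbrk (0));
  /* Now glibc can trim the top of the heap back toward its start. */
  return 0;
}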

At the point of trace #2, the only way to release some of the ~1.7GiB held by
the allocator is to call malloc_trim(), which gives unused pages back to the
kernel (it requires walking the free/unsorted chunks and issuing madvise()
calls, plus shrinking the top of the heap, to reduce RSS). Calling
malloc_trim() is expensive, particularly if you're just going to use the
chunks again, as appears to happen the next day.
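
For reference, a minimal sketch of such a trim from C; malloc_trim(3) and
malloc_info(3) are the real glibc interfaces, the surrounding program is
illustrative only:

/* Minimal sketch: dump the allocator's statistics, trim, dump again. */
#include <malloc.h>
#include <stdio.h>

int
main (void)
{
  /* ... imagine the application has just freed a large amount of
     heap memory here ... */

  malloc_info (0, stdout);          /* same XML report as (malloc-info) */

  int released = malloc_trim (0);   /* pad of 0: release every free page */
  printf ("malloc_trim released memory: %s\n", released ? "yes" : "no");

  malloc_info (0, stdout);          /* in a real run, the free-chunk
                                       totals and RSS would drop */
  return 0;
}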

In trace #3, where RSS has reached 4.2GiB, we see the following:

742 ;; malloc-info
743 (malloc-info)
744 <malloc version="1">
745 <heap nr="0">
746 <sizes>
747 </sizes>
748 <total type="fast" count="0" size="0"/>
749 <total type="rest" count="1" size="112688"/>

a. Arena 0 (the kernel/brk heap) shows no free fast-bin chunks and only
   ~112KiB of other free space in a single chunk (probably the top chunk).

750 <system type="current" size="4243079168"/>
751 <system type="max" size="4243079168"/>
752 <aspace type="total" size="4243079168"/>
753 <aspace type="mprotect" size="4243079168"/>

b. Arena 0 (the kernel/brk heap) shows 4.2GiB "current", which means the
   sbrk-extended kernel heap is in use up to 4.2GiB.
   WARNING: glibc counts "foreign" uses of sbrk/brk (calls made outside of
   malloc) as brk space in these statistics, so it is worth checking whether
   anything other than malloc moves the break (see the sketch after the
   trace below).

754 </heap>
755 <heap nr="1">
756 <sizes>
757   <size from="17" to="32" total="32" count="1"/>
758   <size from="33" to="48" total="240" count="5"/>
759   <size from="49" to="64" total="256" count="4"/>
760   <size from="65" to="80" total="160" count="2"/>
761   <size from="97" to="112" total="224" count="2"/>
762   <size from="33" to="33" total="231" count="7"/>
763   <size from="49" to="49" total="294" count="6"/>
764   <size from="65" to="65" total="390" count="6"/>
765   <size from="81" to="81" total="162" count="2"/>
766   <size from="97" to="97" total="97" count="1"/>
767   <size from="129" to="129" total="516" count="4"/>
768   <size from="161" to="161" total="644" count="4"/>
769   <size from="209" to="209" total="1254" count="6"/>
770   <size from="241" to="241" total="241" count="1"/>
771   <size from="257" to="257" total="257" count="1"/>
772   <size from="305" to="305" total="610" count="2"/>
773   <size from="32209" to="32209" total="32209" count="1"/>
774   <size from="3982129" to="8059889" total="28065174" count="6"/>
775   <unsorted from="209" to="4020593" total="4047069" count="13"/>
776 </sizes>
777 <total type="fast" count="14" size="912"/>
778 <total type="rest" count="61" size="42357420"/>
779 <system type="current" size="42426368"/>
780 <system type="max" size="42426368"/>
781 <aspace type="total" size="42426368"/>
782 <aspace type="mprotect" size="42426368"/>
783 <aspace type="subheaps" size="1"/>

c. Arena 1 has ~42MB of free'd chunks available for reuse.

784 </heap>
785 <total type="fast" count="14" size="912"/>
786 <total type="rest" count="62" size="42470108"/>
787 <total type="mmap" count="9" size="208683008"/>

d. Across all arenas we have:
   - 912 bytes of free fast-bin chunks.
   - ~42MB of free chunks in the regular bins.
   - ~200MB of mmap'd large chunks (these are allocated, not free).

788 <system type="current" size="4285505536"/>
789 <system type="max" size="4285505536"/>
790 <aspace type="total" size="4285505536"/>

e. Total allocated space is 4.2GiB.

791 <aspace type="mprotect" size="4285505536"/>
792 </malloc>

The arena 0 brk heap accounts for ~4.24GiB with only ~112KiB of free chunks
left, so either something in the application is genuinely using those kernel
heap chunks, or something is calling sbrk/brk directly (foreign brks are
counted in these statistics).
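
As a crude cross-check against the foreign-sbrk possibility (see the WARNING
in note b. above), the brk heap can also be measured from outside malloc via
/proc/self/maps; a sketch, purely illustrative, which does not by itself
prove who moved the break:

/* Minimal sketch: report the size of the brk heap as the kernel sees
   it ("[heap]" in /proc/self/maps) next to glibc's own report, so the
   two can be compared over time. */
#include <malloc.h>
#include <stdio.h>
#include <string.h>

int
main (void)
{
  unsigned long start = 0, end = 0, heap_bytes = 0;
  char line[512];
  FILE *maps = fopen ("/proc/self/maps", "r");

  if (maps == NULL)
    return 1;
  while (fgets (line, sizeof line, maps))
    if (strstr (line, "[heap]")
        && sscanf (line, "%lx-%lx", &start, &end) == 2)
      heap_bytes = end - start;
  fclose (maps);

  printf ("[heap] mapping: %lu bytes\n", heap_bytes);
  malloc_info (0, stdout);   /* compare with <system type="current">
                                of <heap nr="0"> */
  return 0;
}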

>> This shows the application is USING memory on the main system heap.
>>
>> It might not be "leaked" memory since the application might be using it.
>>
>> You want visibility into what is USING that memory.
>>
>> With glibc-malloc-trace-utils you can try to do that with:
>>
>> LD_PRELOAD=libmtrace.so \
>> MTRACE_CTL_FILE=/home/user/app.mtr \
>> MTRACE_CTL_BACKTRACE=1 \
>> ./app
>>
>> This will use libgcc's unwinder to get a copy of the malloc caller
>> address and then we'll have to decode that based on a /proc/self/maps.
>>
>> Next steps:
>> - Get a glibc-malloc-trace-utils trace of the application ratcheting.
>> - Get a copy of /proc/$PID/maps for the application (shorter version of 
>> smaps).
>>
>> Then we might be able to correlate where all the kernel heap data went?
> 
> Thanks for the instructions.  Would people please try that and report
> the results?
> 


-- 
Cheers,
Carlos.
