libunwind-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

profiling libunwind


From: Mark Nelson
Subject: profiling libunwind
Date: Wed, 13 Oct 2021 15:33:10 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0

Hi list,


Recently I decided to re-implement my old gdb based poorman's (ie wallclock) profiler in c++ using libunwind directly since we've started having problems profiling Ceph using my hacky gdb version.  I've got a small prototype here:

https://github.com/markhpc/uwpmp


As soon as I got it working I took a stab at profiling a multithreaded application (ceph-osd) with about 80 threads.  It more or less worked fine and was somewhat faster than using gdb to grab callstacks, but I was hoping for more.  I decided to profile the profiler while it profiled.  Initially (see profile 1) I saw a fairly large amount of time spent lzma_decode for minidebuginfo along with time spent in find_proc_info due to lack of caching.  I disabled minidebuginfo and set a global cache with size 1024 and that did wonders, with ~95% of wallclock time now spent in get_proc_name (see profile 2).  I'm now trying to take this a bit further but wanted feedback from the list.  In the most recent trace, I see time in get_proc_name split mostly between get_proc_name_in_image and get_elf_image:

    ||+ 94.50% _Uelf64_get_proc_name
    || + 51.60% _Uelf64_get_proc_name_in_image
    || |+ 2.60% __strncpy_avx2
    || + 33.60% _Ux86_64_get_elf_image
    || |+ 26.50% __read
    || |+ 2.30% __munmap
    || |+ 1.10% __open64
    || |+ 0.40% close
    || |+ 0.20% __fxstat64
    || |+ 0.20% mmap64
    || + 7.30% __munmap
    || + 2.00% _Uelf64_load_debuglink
    ||  + 1.00% _Uelf64_load_debuglink
    ||  |+ 1.00% __open64
    ||  + 1.00% _Uelf64_find_section

Looking at the code, it appears that at least for get_proc_name_in_image, someone anticipated that this could be slow:

https://github.com/libunwind/libunwind/blob/master/src/elfxx.c#L271-L274

Most of the time in get_elf_image appears to be in __read.  Seems like another opportunity for gains?  Does anyone have any advice before I dig much deeper into this?  Especially anyone else that has started going down this path?

Thanks,

Mark

Attachment: uwpmp_self_profile_2.txt
Description: Text document

Attachment: uwpmp_self_profile_1.txt
Description: Text document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]