
[Libunwind-devel] Backtrace performance on x86-64 based on Dwarf info


From: Milian Wolff
Subject: [Libunwind-devel] Backtrace performance on x86-64 based on Dwarf info
Date: Wed, 28 May 2014 14:40:41 +0200
User-agent: KMail/4.13.1 (Linux/3.14.4-1-ARCH; KDE/4.13.1; x86_64; ; )

Hello list!

tl;dr: At the end of a backtrace, I encounter an IP=0x0 which triggers a very
slow path in libunwind.

Long version:

I'm experimenting with libunwind and it's working really well for me so far. My
code can be found in [1] if you're interested. The one issue I have so far is
performance. I do local unwinding on an x86-64 machine, i.e. no frame pointers,
only DWARF info available. From libunwind, I just want to extract the backtrace
as a list of instruction pointers; see the void trace() function in [1],
starting at line 87.
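
For reference, the core of trace() follows the standard libunwind
local-unwinding pattern. A minimal sketch of that pattern (simplified, with
hypothetical names, not the exact code from [1]) looks like this:

  #define UNW_LOCAL_ONLY
  #include <libunwind.h>

  // Collect the instruction pointers of the current call stack into ips,
  // returning the number of frames found.
  static int backtraceIps(unw_word_t* ips, int maxFrames)
  {
    unw_context_t context;
    unw_cursor_t cursor;
    int depth = 0;

    unw_getcontext(&context);
    unw_init_local(&cursor, &context);

    while (depth < maxFrames && unw_step(&cursor) > 0)
      {
        unw_word_t ip = 0;
        unw_get_reg(&cursor, UNW_REG_IP, &ip);
        ips[depth++] = ip;
      }

    return depth;
  }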

When I run my tracer on any bigger application with many allocations,
potentially across multiple threads, the performance is bad. Worse than running
the application under Valgrind's massif, even! I did not expect that, since I
assumed Valgrind's CPU emulation and thread synchronization would be far slower.

Profiling my tracer with perf shows tons of time spent in
_ULx86_64_dwarf_find_proc_info and below. Especially the syscalls related to
sigprocmask are a hotspot:

# Samples: 20K of event 'syscalls:sys_enter_rt_sigprocmask'
# Event count (approx.): 20153
#
# Overhead   Command       Shared Object                            Symbol
# ........  ........  ..................  ................................
#
    99.49%    test_c  libc-2.19.so        [.] __sigprocmask               
              |
              --- __sigprocmask
                 |          
                 |--99.83%-- _ULx86_64_dwarf_find_proc_info
                 |          fetch_proc_info.lto_priv.18
                 |          _ULx86_64_dwarf_find_save_locs
                 |          _ULx86_64_step
                 |          (anonymous namespace)::trace(int) [clone .constprop.250]
                 |          (anonymous namespace)::Data::handleMalloc(void*, unsigned long)
                 |          malloc
                 |          main
                 |          __libc_start_main
                 |          _start
                 |          0x0
                  --0.17%-- [...]

I then tried to figure out what's going on in libunwind; keep in mind that I
have no experience with the code base whatsoever. I recompiled libunwind with
debug mode enabled and then ran my trace script with UNW_DEBUG_LEVEL=14. At the
end of every backtrace, I then see something like this:

 >_ULx86_64_init_local: (cursor=0x7fff0c813870)
 >_ULx86_64_step: (cursor=0x7fff0c813870, ip=0x00007f5034e46b2d, cfa=0x00007fff0c8134b0)
     >_ULx86_64_reuse_frame: reuse frame ip=0x7f5034e46b2d cfa=0x7fff0c8134b0 format=0 addr=0x0 offset=+0
  >_ULx86_64_step: returning 1
 >_ULx86_64_step: (cursor=0x7fff0c813870, ip=0x00007f5034e47c1e, cfa=0x00007fff0c813ca0)
     >_ULx86_64_reuse_frame: reuse frame ip=0x7f5034e47c1e cfa=0x7fff0c813ca0 format=0 addr=0x0 offset=+0
  >_ULx86_64_step: returning 1
 >_ULx86_64_step: (cursor=0x7fff0c813870, ip=0x00007f5034e484cd, cfa=0x00007fff0c813d50)
     >_ULx86_64_reuse_frame: reuse frame ip=0x7f5034e484cd cfa=0x7fff0c813d50 format=0 addr=0x0 offset=+0
  >_ULx86_64_step: returning 1
 >_ULx86_64_step: (cursor=0x7fff0c813870, ip=0x000000000040094c, cfa=0x00007fff0c813d70)
     >_ULx86_64_reuse_frame: reuse frame ip=0x40094c cfa=0x7fff0c813d70 format=0 addr=0x0 offset=+0
  >_ULx86_64_step: returning 1
 >_ULx86_64_step: (cursor=0x7fff0c813870, ip=0x00007f50348a1000, cfa=0x00007fff0c813d90)
     >_ULx86_64_reuse_frame: reuse frame ip=0x7f50348a1000 cfa=0x7fff0c813d90 format=0 addr=0x0 offset=+0
  >_ULx86_64_step: returning 1
 >_ULx86_64_step: (cursor=0x7fff0c813870, ip=0x0000000000400829, cfa=0x00007fff0c813e50)
     >_ULx86_64_reuse_frame: reuse frame ip=0x400829 cfa=0x7fff0c813e50 format=0 addr=0x0 offset=+0
  >_ULx86_64_step: returning 1
 >_ULx86_64_step: (cursor=0x7fff0c813870, ip=0x0000000000000000, cfa=0x00007fff0c813e58)
              >_ULx86_64_dwarf_find_proc_info: looking for IP=0xffffffffffffffff
              >_ULx86_64_dwarf_find_proc_info: IP=0xffffffffffffffff not found
             >_ULx86_64_step: dwarf_step() failed (ret=-10), trying frame-chain
  >_ULx86_64_step: NULL %rbp loc, returning 0

So, unless I'm misunderstanding that output, it seems as if the very last
frame triggers a call to dwarf_find_proc_info, which leads to the performance
issue.

Looking into it, I added debug output to the DWARF_IS_NULL_LOC branch in
Gstep.c's unw_step. That branch is never taken. Yet somewhere, the IP is still
set to 0 before that.

Indeed, if I add a check like this:

  if (!c->dwarf.ip)
    {
      return 0;
    }

to the beginning of unw_step, the excessive calls to sigprocmask etc. are gone.
Cool! But is this a valid approach? And more generally, why is there a zero IP
in the first place? The good thing is that I can now check for that value in my
trace() function and break the loop, which avoids the final costly call to
unw_step.
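
I.e. the loop from the sketch above would gain a check roughly like this
(again just a sketch):

  while (depth < maxFrames && unw_step(&cursor) > 0)
    {
      unw_word_t ip = 0;
      unw_get_reg(&cursor, UNW_REG_IP, &ip);
      if (!ip)
        {
          // A zero IP marks the end of the frame chain; stepping once more
          // would only trigger the costly dwarf_find_proc_info lookup.
          break;
        }
      ips[depth++] = ip;
    }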

[1]: 
http://quickgit.kde.org/?p=scratch%2Fmwolff%2Fmallocinfo.git&a=blob&h=46a785aa694c5f38bee56a5d78cba0016fb16fc1&hb=3f36142ac44cd43bac34b1d466c334b4fcb7e728&f=malloctrace.cpp

-- 
Milian Wolff
address@hidden
http://milianw.de


