[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Libunwind-devel] Mixed stack unwinding bug on ppc64le
From: |
Stephen Roberts |
Subject: |
[Libunwind-devel] Mixed stack unwinding bug on ppc64le |
Date: |
Thu, 16 Aug 2018 09:26:17 +0000 |
Hi libunwind developers,
I'd like some advice about fixing a bug on ppc64 which prevents libunwind from
unwinding through mixed C/C++ stacks. Unwinding fails on this platform whenever
libunwind tries to unwind from a callee stack frame without Call Frame
Information (CFI) into caller frames with CFI information. This is a common
scenario on ppc64 because gcc does not produce CFI by default on this platform
while g++ still produces CFI to support exception handling. Any C++ application
which calls in to C library code compiled with the default flags can trigger
this issue.
This bug comes from the unw_step function in ppc64/Gstep.c. This function first
tries to unwind using CFI via dwarf_step and if this fails with -UNW_ENOINFO it
falls back to following the backchain and clobbers all registers in the
process. Any dwarf_step failure other than -UNW_ENOINFO causes unw_step to bomb
out immediately.
The mixed frame bug happens because when libunwind unwinds through the non-CFI
frame it will fall back to the backchain and clobber all registers. Then, when
it encounters a frame with CFI it will try to use this information to unwind
further, which promptly fails because all of the registers have been clobbered.
The return value of dwarf_step in this case is not -UNW_ENOINFO because
information was found, so unwinding is abandoned. This bug doesn't impact other
platforms because they make some attempt to guess the correct values of certain
registers after they are clobbered, which in most (but not all) cases will be
sufficient to allow CFI dwarf unwinding to resume.
I have two candidate solutions, both of which solve the issue but make
different compromises. My first solution is to include an "ignore_cfi" flag
inside the struct cursor from tdep-ppc64/libunwind_i.h.
This flag is initially false but set to true whenever dwarf_step fails with
-UNW_ENOINFO. It has the effect of skipping the dwarf_step logic entirely,
causing libunwind to stick to using back chain unwinding after it encounters
its first frame without CFI. This is a robust and simple solution, but I'm not
sure if this is the libunwind way to approach the issue.
My second solution is a little more invasive, but has the potential to provide
more information. The dwarf_step function uses dwarf_apply_registers internally
to 1) compute the new Canonical Frame Address (CFA) and 2) update the register
values. My fix effectively breaks the logic in dwarf_apply_registers out into
two helper functions, something like "dwarf_advance_cfa" and
"dwarf_update_registers".
The dwarf_step function still works exactly as before, but we can now use
"dwarf_update_registers" to update registers for a user-specified CFA. Given we
have a valid CFA from backchain unwinding, we call "dwarf_update_registers"
with our backchain-derived CFA to leave us with some register values even when
we've unwound from a non-CFA frame. The next time we unwind then we will
*probably* have the registers we need to continue to unwind successfully using
CFI. The caveat with this approach is it is possible for compilers to save
volatile registers in non-volatile registers rather than on the stack.
In this case, dwarf_update_registers would fail to retrieve the register-saved
value and would fail. This is rare in practice, and is still much better than
never being able to unwind. If we wanted we could be extra-safe and use both
solutions, with the ignore_cfi flag being set only when dwarf_update_registers
failed.
My first question is this: which approach is more in-keeping with the libunwind
philosophy? Would either (or both) of them be accepted upstream? I've tested
both approaches and found they both solve my issues on ppc64le, so either way
I'm happy.
A second question involves writing test cases. I have a test case that I run
outside the libunwind infrastructure, but it's proving difficult to make it run
within libunwind's own testing infrastructure. This bug only manifests when a
frame has no CFI, however the libunwind make system turns on -fexceptions
everywhere so this can't be tested easily. I know there are test-specific
CFLAGS, however these come first in the compile command so they are clobbered
by the global options. I think it makes sense to be able to vary the
exception/unwind compiler flags as part of testing as these control which
libunwind code paths are executed. Did I miss something? Is there some other
mechanism to set these flags for test cases?
Many thanks in advance,
Stephen Roberts
- [Libunwind-devel] Mixed stack unwinding bug on ppc64le,
Stephen Roberts <=