bug-gdb

Solaris Signal Trampolines and sigstep.exp failures


From: Steve Williams
Subject: Solaris Signal Trampolines and sigstep.exp failures
Date: Mon, 5 Dec 2005 18:32:42 -0800

Configuration:

sparc-sun-solaris10
gdb-6.4
gcc-3.4.3

R500.ramses.267> ./gdb --nx
GNU gdb 6.4
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10".
(gdb)

Problem:

The sigstep.exp tests test the interaction of various forms of single
stepping and signal handling. For the tests to run completely successfully
the following two conditions must be true:

1. A signal can be delivered to a process during a single step operation.
2. The signal trampoline frame detection code can accurately detect the
entry to a trampoline and the exit from the trampoline.

Neither condition holds on Solaris, which leads to multiple failures in
sigstep.exp (and in other tests, for example sigbpt.exp).

The first issue is:

The Solaris single stepping function is implemented using the /proc
filesystem and the PCRUN command with a PRSTEP flag.

All gdb tests that try to deliver a signal while single stepping hang
indefinitely. The reason is that signals pending against the process are
never delivered when single stepping. Investigation shows that if a command
that does not rely on single stepping, such as "continue", is used, the
signal is delivered as expected. Use the following gdb command to see the
problem:

./gdb --nx --command=gdb.cmd testsuite/gdb.base/sigstep

Where gdb.cmd contains:
br main
r
set done = 1
set itimer = itimer_real
break 66
continue
advance 65
break handler
step


Further investigation identified the specific scenario. If a PCRUN command
is issued with a flag of PRSTEP when the process is in the PR_FAULTED state,
any signals pending against the process are not delivered. If the process is
first transitioned to the PR_REQUESTED state, and a PCRUN command with the
PRSTEP flag is then issued, the pending signals are delivered as expected.
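
The ordering described above can be modelled as a small decision function.
This is only a sketch: the enum names and plan_single_step() are
hypothetical stand-ins, not the Solaris <sys/procfs.h> definitions or gdb
code, and how the transition to PR_REQUESTED is actually performed is not
shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the /proc stop reasons; these are NOT the
   real PR_* constants and their values are arbitrary. */
enum stop_why { WHY_REQUESTED, WHY_FAULTED, WHY_OTHER };

/* Hypothetical control actions the debugger would issue, in order. */
enum ctl_action {
    ACT_MAKE_REQUESTED,  /* move the process to the PR_REQUESTED state */
    ACT_RUN_STEP         /* issue PCRUN with the PRSTEP flag */
};

/* Fill plan[] with the actions needed to single-step a process that is
   currently stopped for reason `why`.  Returns the number of actions. */
size_t plan_single_step(enum stop_why why, enum ctl_action plan[2])
{
    size_t n = 0;

    assert(plan != NULL);

    /* Stepping directly from the faulted state loses pending signals,
       so transition to the requested state first. */
    if (why == WHY_FAULTED)
        plan[n++] = ACT_MAKE_REQUESTED;

    plan[n++] = ACT_RUN_STEP;
    return n;
}
```

For a process stopped on a fault the plan has two actions, with the state
transition first; for any other stop reason it contains only the step.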

I have a patch to implement the above fix.

The second issue is:

The Solaris Signal Trampoline detection code in sparc-sol2-tdep.c detects
the signal trampoline by looking for the functions sigacthandler,
ucbsigvechandler or __sighndlr in the next frame.

This is fine for detecting when you are in a stack frame reached via a
signal trampoline, but it does not work to provide accurate detection of the
beginning and end of the trampoline.

The Solaris10 signal trampoline looks something like this:

  sigacthandler
    call_user_handler
      unsleep_self
        setup_schedctl
          __schedctl
      set_parking_flag
      lmutex_lock
      lmutex_unlock
      sigaddset
        sigvalid
          __sigfillset
      __lwp_sigmask
        __systemcall6
      __sighndlr
        <user handler code called>
      setcontext
        __setcontext_syscall
          _syscall6

This represents only one path through the trampoline. Depending on the
signal number and on critical sections, the control flow can change or be
deferred. As such, it is very difficult to track whether the current PC is
inside a signal trampoline using the function names of the implementation.

To make matters worse:

1. In the last two patch cluster updates, the signal trampoline mechanism
has changed: functions have been added and then removed.

2. The call to call_user_handler reuses the frame of sigacthandler,
therefore sigacthandler cannot be detected on the stack.

Because of issue 2 above, handle_inferior_event incorrectly identifies a
call to call_user_handler in a signal trampoline at infrun.c:2364 as a
subroutine call, i.e. the sigacthandler frame is trashed and replaced with
the call_user_handler frame, which is identified as a subroutine call of
the current frame.

Using the same test as above (for issue 1), but turning on "set debug
infrun 1", will show that a call to call_user_handler is incorrectly
identified as a subroutine call.

This actually enables the stepping mechanism to step over signal handlers as
if they were subroutines. It works, but not as intended.

If the signal trampoline detection code is corrected so that it can fully
detect a signal trampoline from beginning to end, the test fails again, but
now at infrun.c:2557. There, gdb detects that single stepping has reached a
different line and therefore stops stepping. It is correct that stepping is
on a different line, but according to the test the expected outcome
involves continuing to step through the user handler and out through the
signal trampoline until we return to the faulting instruction (and continue
stepping at that point if required).

The problems I see are:

1. The mechanism I implemented to identify the complete signal trampoline
includes the names of all possibly invoked functions and a backtrace
mechanism to ensure they were called from a signal handling function, i.e.
sigacthandler or call_user_handler. When the C library implementation
changes, this mechanism will break.

2. The logic in handle_inferior_event seems to be wrong for user signal
handling functions. If it is detected that we are at a different line, it
should then be determined whether this point was reached due to signal
handling. If it was, stepping should continue through the signal handler
and any subsequently called functions. I think this would require unwinding
the frame stack looking for a SIGTRAMP frame. The test at infrun.c:2348
could be modified to look for a SIGTRAMP_FRAME not only in the current
frame, but in any previous frame too.
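
The frame walk suggested in item 2 might look roughly like the sketch
below. It uses a hypothetical, self-contained frame model (struct
model_frame and reached_via_sigtramp() are made up for illustration); in
gdb itself the walk would go through the frame unwinder API rather than a
plain linked list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame model, standing in for gdb's frame types. */
enum model_frame_type { MODEL_NORMAL_FRAME, MODEL_SIGTRAMP_FRAME };

struct model_frame {
    enum model_frame_type type;
    const struct model_frame *prev;  /* caller frame, NULL at outermost */
};

/* Return true if `frame`, or any frame that (transitively) called it,
   is a signal trampoline frame. */
bool reached_via_sigtramp(const struct model_frame *frame)
{
    for (; frame != NULL; frame = frame->prev)
        if (frame->type == MODEL_SIGTRAMP_FRAME)
            return true;
    return false;
}
```

With a check of this kind, a function called from the user handler would
still be recognised as running on behalf of a signal, even though its own
frame and its immediate caller are ordinary frames.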

Alternative sigtramp detection:

Any fix depends on a reliable way to detect the signal trampoline. I think a
better way to detect the trampoline would be to use the proc filesystem. The
lwpstatus_t for the current lwp, or for the representative lwp of the
process, contains a member "pr_oldcontext". If the process or lwp is currently
handling a signal, this member will be non-null and will be the address of
the first ucontext_t on the inferior process stack. (If the process is
handling multiple nested signals the member uc_link in the ucontext_t will
be the address of the next context structure).

A signal trampoline could be reliably detected by just checking for the
presence of a pr_oldcontext in the lwpstatus. The correct ucontext could be
selected by comparing the frame stack pointer passed to the signal
trampoline detection code with the stack pointers saved in the ucontext.
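
A minimal sketch of that selection follows. It assumes a stack that grows
downward and no alternate signal stack, and it uses a hypothetical
in-memory model (struct model_ucontext, select_context()) in place of
ucontext_t structures read from the inferior. The comparison rule is one
possible interpretation, not a tested implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a saved context on the inferior stack. */
struct model_ucontext {
    uintptr_t saved_sp;                    /* SP of the interrupted code */
    const struct model_ucontext *uc_link;  /* next (older) context */
};

/* Starting from the newest context (what pr_oldcontext would point at),
   return the first context whose interrupted stack pointer lies above
   frame_sp.  That context belongs to the innermost signal being handled
   by the frame.  NULL means the frame is not inside any handler. */
const struct model_ucontext *
select_context(const struct model_ucontext *oldcontext, uintptr_t frame_sp)
{
    const struct model_ucontext *uc;

    for (uc = oldcontext; uc != NULL; uc = uc->uc_link)
        if (frame_sp < uc->saved_sp)
            return uc;
    return NULL;
}
```

A NULL oldcontext (no signal being handled) naturally yields NULL, which
corresponds to "not in a signal trampoline".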

I currently have no satisfactory patch for this problem. Any additional
feedback regarding the way signal trampolines currently work in gdb for
Solaris, and on any change to use the /proc filesystem, would be appreciated.

Regards,
Steve Williams

      

------------------------------------
UTStarcom Canada Co.
Stephen J Williams
Director System Development
address@hidden
4600 Jacombs Road
Richmond, British Columbia  V6V 3B1
Canada
tel: +1 (604) 720-2309
fax: +1 (604) 276-0501
mobile: +1 (604) 720-3325
------------------------------------




