testsuite/2033: sigbpt.exp fails on Solaris10 and Solaris9 (possibly others)

Steve Williams steve.williams@utstar.com
Tue Jan 3 22:28:00 GMT 2006


The following reply was made to PR testsuite/2033; it has been noted by GNATS.

From: "Steve Williams" <steve.williams@utstar.com>
To: <gdb-gnats@sources.redhat.com>
Cc:  
Subject: Re: testsuite/2033: sigbpt.exp fails on Solaris10 and Solaris9 (possibly others)
Date: Tue, 3 Jan 2006 14:24:24 -0800

 A related issue:
 
 Configuration:
 
 sparc-sun-solaris10
 gdb-6.4
 gcc-3.4.3
 
 R500.ramses.267> ./gdb --nx
 GNU gdb 6.4
 Copyright 2005 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain
 conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "sparc-sun-solaris2.10".
 (gdb)
 
 Problem:
 
 The sigstep.exp tests exercise the interaction between various forms of
 single stepping and signal handling. For the tests to pass completely, the
 following two conditions must hold:
 
 1. A signal can be delivered to a process during a single step operation.
 2. The signal trampoline frame detection code can accurately detect the
 entry to a trampoline and the exit from the trampoline.
 
 Both of the above conditions fail on Solaris, leading to multiple failures
 in sigstep.exp (and in other tests, for example sigbpt.exp).
 
 The first issue is:
 
 The Solaris single stepping function is implemented using the /proc
 filesystem and the PCRUN command with a PRSTEP flag.
 
 All gdb tests that try to deliver a signal while single stepping hang
 indefinitely, because signals pending against the process are not delivered
 during a single step. Investigation shows that if a command that does not
 single step, such as "continue", is used instead, the signal is delivered as
 expected. Use the following gdb command to reproduce the problem:
 
 ./gdb --nx --command=gdb.cmd testsuite/gdb.base/sigstep
 
 Where gdb.cmd contains:
   br main
   r
   set done = 1
   set itimer = itimer_real
   break 66
   continue
   advance 65
   break handler
   step
 
 Further investigation identified the specific scenario. If a PCRUN command
 is issued with the PRSTEP flag while the process is in the PR_FAULTED state,
 any signals pending against the process are not delivered. If the process is
 first transitioned to the PR_REQUESTED state and a PCRUN command with the
 PRSTEP flag is then issued, the pending signals are delivered as expected.
 
 I have a patch to implement the above fix.
 
 The second issue is:
 
 The Solaris signal trampoline detection code in sparc-sol2-tdep.c detects
 the signal trampoline by looking for the functions sigacthandler,
 ucbsigvechandler or __sighndlr in the next frame.
 
 This is fine for detecting when you are in a stack frame reached via a
 signal trampoline, but it cannot accurately detect the beginning and end of
 the trampoline.
 
 The Solaris10 signal trampoline looks something like this:
 
   sigacthandler
     call_user_handler
       unsleep_self
         setup_schedctl
           __schedctl
       set_parking_flag
       lmutex_lock
       lmutex_unlock
       sigaddset
         sigvalid
           __sigfillset
       __lwp_sigmask
         __systemcall6
       __sighndlr
         <user handler code called>
       setcontext
         __setcontext_syscall
           _syscall6
 
 This represents only one path through the trampoline. Depending on the
 signal number and on critical sections, the control flow can change or be
 deferred. As a result, it is very difficult to track whether the current PC
 is inside a signal trampoline using the function names of the
 implementation.
 
 To make matters worse:
 
 1. In the last two patch cluster updates, the signal trampoline mechanism
 has changed; functions have been added and then removed.
 
 2. The call to call_user_handler reuses the frame of sigacthandler, so
 sigacthandler cannot be detected on the stack.
 
 Because of issue 2 above, handle_inferior_event incorrectly identifies a
 call to call_user_handler in a signal trampoline (infrun.c:2364) as a
 subroutine call: the sigacthandler frame is overwritten by the
 call_user_handler frame, which is then identified as a subroutine call of
 the current frame.
 
 Running the same test as above (for issue 1) with "set debug infrun 1"
 shows that the call to call_user_handler is incorrectly identified as a
 subroutine call.
 
 As a side effect, the stepping mechanism steps over signal handlers as if
 they were subroutines. This works, but not as intended.
 
 If the signal trampoline detection code is corrected so that it can detect a
 signal trampoline fully from beginning to end, the test still fails, but now
 at infrun.c:2557: gdb detects that single stepping has reached a different
 line and therefore stops stepping. Stepping has indeed reached a different
 line, but according to the test, the expected outcome is that stepping
 continues through the user handler and back out through the signal
 trampoline until control returns to the faulting instruction.
 
 The problems I see are:
 
 1. A mechanism based on function names to identify the complete signal
 trampoline is prone to break when the C library implementation changes.
 
 2. The logic in handle_inferior_event seems to be wrong for user signal
 handling functions. If it detects that we are at a different line, it
 should then determine whether this point was reached due to signal
 handling; if so, it should continue stepping through the signal handler and
 any functions it calls. I think this would require unwinding the frame
 stack looking for a SIGTRAMP frame. The test at infrun.c:2348 could be
 modified to look for a SIGTRAMP_FRAME not only in the current frame, but in
 any previous frame too.
 
 An alternative sigtramp detection mechanism could use the /proc filesystem.
 The lwpstatus_t for the current lwp (or the representative lwp for the
 process) contains a member "pr_oldcontext". If the process or lwp is currently
 handling a signal, this member will be non-null and will be the address of
 the first ucontext_t on the inferior process stack. (If the process is
 handling multiple nested signals the member uc_link in the ucontext_t will
 be the address of the next context structure).
 
 A signal trampoline could be reliably detected by just checking for the
 presence of a pr_oldcontext in the lwpstatus. The correct ucontext could be
 selected by comparing the frame stack pointer passed to the signal
 trampoline detection code with the stack pointers saved in the ucontext.
 