Bug 15573 - Decode fatal signals to show faulting address, access type, etc.
Status: NEW
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb
Version: 7.5
Importance: P2 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-06-04 18:37 UTC by Andy Lutomirski
Modified: 2013-06-15 04:44 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:


Description Andy Lutomirski 2013-06-04 18:37:33 UTC
I have a (buggy) program that segfaulted while running in gdb.  gdb said:

Program received signal SIGSEGV, Segmentation fault.

followed by a stacktrace.  If I weren't using gdb, my program's signal handler would have run and displayed a far more useful error message:

Caught fatal signal: Segmentation fault (Address not mapped to object [0x28])
Dying due to fatal signal Segmentation fault in pid 14030 / tid 14030
The error was "not mapped" at address 28. The CPU reported page not present reading from 28.

This is on x86_64.  That information comes from psiginfo (the first line) and a custom decoder that reads SEGV_MAPERR as "not mapped" and pulls the number 28 from siginfo (the first time) and cr2 (the second time).  The "page not present" part is the low bit of the error code (from ucontext); the alternative is "protection violation", which is a different error.  The "reading from" part is really quite handy when debugging; it distinguishes read faults from write faults.  The alternatives are "executing from" and "writing to".
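The bit tests described above can be sketched as a small standalone function. This is only an illustration of the decoding, not code from the reporter's program: the name DescribePageFault and the kPf* constants are hypothetical, and the bit meanings are the ones given in the Intel and AMD manuals for the x86 page-fault error code.

```cpp
#include <cstdint>
#include <string>

// Bits of the x86 page-fault error code (names are hypothetical).
constexpr std::uint64_t kPfPresent = 1u << 0;  // 0: page not present, 1: protection violation
constexpr std::uint64_t kPfWrite   = 1u << 1;  // 0: read access,      1: write access
constexpr std::uint64_t kPfUser    = 1u << 2;  // 0: kernel mode,      1: user mode (not decoded below)
constexpr std::uint64_t kPfInstr   = 1u << 4;  // 1: instruction fetch

// Hypothetical helper: turn an error code into the phrase used in the
// handler output quoted above.
std::string DescribePageFault(std::uint64_t err)
{
	std::string out = (err & kPfPresent) ? "protection violation"
	                                     : "page not present";
	if (err & kPfInstr)
		out += " executing from";
	else if (err & kPfWrite)
		out += " writing to";
	else
		out += " reading from";
	return out;
}
```

A user-mode read of an unmapped page has only the user bit set (error code 4), so DescribePageFault(4) yields "page not present reading from", the same phrase as in the output above.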

Having gdb decode this information would save a lot of time tracking down bugs.
Comment 1 Pedro Alves 2013-06-05 09:08:43 UTC
Note you can get at the siginfo with "p $_siginfo".

cr2 is in mcontext_t, which is in ucontext.  I don't know how GDB could get at that info before the signal is actually delivered.
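For reference, this is roughly how a signal handler running inside the program can read those two values once the signal has been delivered. It is a minimal sketch assuming x86_64 Linux with glibc, where REG_ERR and REG_CR2 are indices into uc_mcontext.gregs; the FaultInfo struct and ReadFaultInfo name are hypothetical. It does not address how GDB could obtain the same data from outside the process before delivery.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes REG_ERR / REG_CR2 in <sys/ucontext.h> (g++ defines this by default)
#endif
#include <signal.h>
#include <ucontext.h>
#include <cstdint>

// Hypothetical container for the two values of interest.
struct FaultInfo {
	std::uint64_t err;  // CPU page-fault error code
	std::uint64_t cr2;  // faulting address as reported by the CPU
};

// `context` is the third argument passed to an SA_SIGINFO handler.
// Assumption: x86_64 Linux with glibc; other targets lay this out differently.
FaultInfo ReadFaultInfo(const void *context)
{
	const ucontext_t *uc = static_cast<const ucontext_t *>(context);
	FaultInfo fi;
	fi.err = static_cast<std::uint64_t>(uc->uc_mcontext.gregs[REG_ERR]);
	fi.cr2 = static_cast<std::uint64_t>(uc->uc_mcontext.gregs[REG_CR2]);
	return fi;
}
```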
Comment 2 Pedro Alves 2013-06-05 09:44:43 UTC
BTW, OOC, is the code for that signal handler of yours something you could share?
Comment 3 Andy Lutomirski 2013-06-15 04:44:01 UTC
It looks more or less like this.  I can provide some kind of license if it'll be useful.

static void HandleFatalSignal(int sig, siginfo_t *info, void *context)
{
	/*
	 * This is x86-specific and insanely poorly (wrongly?) documented.
	 * I figured it out by reading the kernel source.  --luto
	 */
	/* ucontext_t rather than struct ucontext: newer glibc dropped that tag. */
	ucontext_t *uc = (ucontext_t *)context;
	struct sigcontext *sc = (struct sigcontext *)&uc->uc_mcontext;

	psiginfo(info, "Caught fatal signal");

	std::cerr << "Dying due to fatal signal " << strsignal(sig)
			  << " in pid " << getpid() << " / tid "
			  << syscall(SYS_gettid) << std::endl;

	char causebuf[128];
	snprintf(causebuf, sizeof(causebuf), "code %d", info->si_code);

	const char *cause = causebuf;
	if (info->si_code == SI_USER)
		cause = "kill/raise";
	else if (info->si_code == SI_KERNEL)
		cause = "generic error from kernel";
	else if (info->si_code == SI_QUEUE)
		cause = "sigqueue";
	else if (info->si_code == SI_TKILL)
		cause = "tkill/tgkill";

	if (sig == SIGSEGV || sig == SIGBUS) {
		if (sig == SIGSEGV && info->si_code == SEGV_MAPERR)
			cause = "not mapped";
		else if (sig == SIGSEGV && info->si_code == SEGV_ACCERR)
			cause = "access error";
		else if (sig == SIGBUS && info->si_code == BUS_ADRALN)
			cause = "alignment error";
		else if (sig == SIGBUS && info->si_code == BUS_ADRERR)
			cause = "bad physical address";
		else if (sig == SIGBUS && info->si_code == BUS_OBJERR)
			cause = "object error";
		/* damnit, glibc
		else if (sig == SIGBUS && info->si_code == BUS_MCEERR_AR)
			cause = "mce; action required";
		else if (sig == SIGBUS && info->si_code == BUS_MCEERR_AO)
			cause = "mce; action optional";
		*/

		void *cr2 = (void *)sc->cr2;

		// Decode the CPU error code (see Intel or AMD manual)
		const char *hw_reason = (sc->err & 1)
			? "protection violation"
			: "page not present";
		const char *access_type;
		if (sc->err & 0x10)
			access_type = "executing from";
		else if (sc->err & 0x2)
			access_type = "writing to";
		else
			access_type = "reading from";

		std::cerr << "The error was \"" << cause << "\" at address "
				  << (void *)info->si_addr << ". The CPU reported "
				  << hw_reason << ' ' << access_type << ' '
				  << cr2 << '.' << std::endl;
	} else if (sig == SIGTRAP) {
		if (info->si_code == TRAP_BRKPT)
			cause = "breakpoint";
		else if (info->si_code == TRAP_TRACE)
			cause = "trace trap";
		/* damnit, glibc
		else if (info->si_code == TRAP_BRANCH)
			cause = "process taken branch trap";  // whatever that is...
		else if (info->si_code == TRAP_HWBKPT)
			cause = "hw breakpoint/watchpoint";
		*/

		std::cerr << "The error was " << cause << std::endl;
	} else if (sig == SIGILL) {
		if (info->si_code == ILL_ILLOPC)
			cause = "illegal opcode";
		else if (info->si_code == ILL_ILLOPN)
			cause = "illegal operand";
		else if (info->si_code == ILL_ILLADR)
			cause = "illegal addressing mode";
		else if (info->si_code == ILL_ILLTRP)
			cause = "illegal trap";
		else if (info->si_code == ILL_PRVOPC)
			cause = "privileged opcode";
		else if (info->si_code == ILL_PRVREG)
			cause = "privileged register";  // not on x86...
		else if (info->si_code == ILL_COPROC)
			cause = "coprocessor error";  // yay '80s
		else if (info->si_code == ILL_BADSTK)
			cause = "internal stack error";

		std::cerr << "The error was " << cause << std::endl;
	} else {
		// TODO: We could also decode SIGFPE.

		std::cerr << "The error was " << cause << std::endl;
	}

#define SC(x) " " #x " = " << (void *)(uintptr_t)sc->x
	std::cerr << "Signal context:" << SC(rip) << '\n'
			  << SC(rax) << SC(rbx) << SC(rcx) << SC(rdx) << '\n'
			  << SC(rsi) << SC(rdi) << SC(rbp) << SC(rsp) << '\n'
			  << SC(r8) << SC(r9) << SC(r10) << SC(r11) << '\n'
			  << SC(r12) << SC(r13) << SC(r14) << SC(r15) << '\n'
			  << SC(eflags) << SC(cs) << SC(gs) << SC(fs) << std::endl;
#undef SC
}
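
The handler above has the three-argument signature, so it has to be registered with sigaction() and SA_SIGINFO. The comment does not show that step; the following is a minimal sketch of one way to do it. InstallSiginfoHandler and RecordSignal are hypothetical names, and RecordSignal is only a stand-in with the same signature as HandleFatalSignal so that the sketch is self-contained.

```cpp
#include <signal.h>
#include <cstring>

// Hypothetical stand-in for HandleFatalSignal: it only records the signal
// number, which is safe to do from a signal handler.
static volatile sig_atomic_t g_last_signal = 0;

static void RecordSignal(int sig, siginfo_t *, void *)
{
	g_last_signal = sig;
}

// Install `handler` for `sig` so that it receives the siginfo_t and the
// ucontext pointer.  Returns true on success.
bool InstallSiginfoHandler(int sig, void (*handler)(int, siginfo_t *, void *))
{
	struct sigaction sa;
	std::memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = handler;
	sa.sa_flags = SA_SIGINFO;  // request the three-argument handler form
	sigemptyset(&sa.sa_mask);
	return sigaction(sig, &sa, nullptr) == 0;
}
```

A program would call something like InstallSiginfoHandler(SIGSEGV, HandleFatalSignal) early in main(). A real fatal-signal handler would probably also want SA_RESETHAND (or to re-raise the signal after reporting) so that the process still dies with the original signal; that is left out here to keep the sketch short.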