This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: [PATCH 2/6] Introduce throw_ptrace_error
- From: Mark Kettenis <mark dot kettenis at xs4all dot nl>
- To: palves at redhat dot com
- Cc: gdb-patches at sourceware dot org
- Date: Tue, 10 Mar 2015 15:53:04 +0100 (CET)
- Subject: Re: [PATCH 2/6] Introduce throw_ptrace_error
- References: <1425671886-7798-1-git-send-email-palves at redhat dot com> <1425671886-7798-3-git-send-email-palves at redhat dot com> <201503062103 dot t26L3tef004332 at glazunov dot sibelius dot xs4all dot nl> <54FA1EB3 dot 2050706 at redhat dot com> <201503082029 dot t28KToYr022852 at glazunov dot sibelius dot xs4all dot nl> <54FCC39C dot 6090302 at redhat dot com>
> Date: Sun, 08 Mar 2015 21:48:12 +0000
> From: Pedro Alves <palves@redhat.com>
>
> On 03/08/2015 08:29 PM, Mark Kettenis wrote:
>
> > I think your interpretation of ESRCH is too Linux-centric. You're
> > once again duct-taping around the Linux kernel's woefully
> > insufficient threads debugging capabilities.
>
> Nice.
>
> > It really should not be
> > possible for a thread to just disappear without the debugger being
> > notified. Do I sound like a broken record?
>
> Sorry, but yes, you do. ;-)
>
> The debugger is notified. It's just a fact that a process can
> die (and become zombie) even while it was _stopped_ under
> ptrace control. That's a race you can't prevent, only cope with.

A process yes, but a thread no.

> I found NetBSD 5.1 in the GCC compile farm, and I see ESRCH
> there too:
>
> -bash-4.2$ uname -a
> NetBSD gcc70.fsffrance.org 5.1 NetBSD 5.1 (GENERIC) #0: Sat Nov 6 13:19:33 UTC 2010 builds@b6.netbsd.org:/home/builds/ab/netbsd-5-1-RELEASE/amd64/201011061943Z-obj/home/builds/ab/netbsd-5-1-RELEASE/src/sys/arch/amd64/compile/GENERIC amd64
>
> -bash-4.2$ gdb ./foo
> GNU gdb 6.5
> ...
> (gdb) start
> Breakpoint 1 at 0x400894: file foo.c, line 5.
> Starting program: /home/palves/foo
> main () at foo.c:5
> 5 return 0;
> (gdb) p getpid ()
> $1 = 24557
> (gdb) shell kill -9 24557
> (gdb) c
> Continuing.
> ptrace: No such process.
> (gdb)

That's an ancient GDB though.

> But even if some ptrace-based OS uses a different errno
> for that (which I doubt), we can just tweak throw_ptrace_error
> (a centralized place, yay!) to look for a different
> errno value. So what does OpenBSD's ptrace return
> in the test above?

(gdb) p getpid()
$1 = 24737
(gdb) shell kill -9 24747
ksh: kill: 24747: No such process
(gdb) shell kill -9 24737
(gdb) c
Continuing.
Program received signal SIGKILL, Killed.
main () at ../../../../src/binutils-gdb/gdb/testsuite/gdb.base/wchar.c:29
29 wchar_t narrow = 97;
(gdb)

Which strikes me as the proper behaviour in this case. Not sure if
the NetBSD behaviour you're seeing is the result of different GDB
code, or really different kernel behaviour.

OpenBSD's ptrace(2) will set errno to ESRCH if you pass it a
nonexistent process ID or thread ID. This can be seen when using the
"attach" command:

(gdb) attach 666
Attaching to program: /home/kettenis/obj/binutils-gdb/gdb/testsuite/gdb.base/wchar, process 666
ptrace: No such process

Perhaps the error message here could be improved, but that doesn't
require us to add a throw with more details here.

In the end the meaning of ESRCH is dependent on the context. You
can't just interpret it as a missing thread.
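
To make the point about context concrete, here is a minimal sketch in C.
The names `ptrace_context` and `describe_ptrace_error` are invented for
this illustration and are not part of GDB. The sketch only assumes that
the caller knows which operation was in progress when ptrace failed, and
shows how the same ESRCH can then warrant different messages.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: the operation in progress when ptrace failed.  */
enum ptrace_context
{
  CTX_ATTACH,   /* attaching to a user-supplied process ID */
  CTX_RESUME    /* resuming a thread we were already tracing */
};

/* Hypothetical helper: pick a message from the context and the errno
   value.  Anything other than ESRCH falls back to strerror.  */
const char *
describe_ptrace_error (enum ptrace_context ctx, int err)
{
  if (err != ESRCH)
    return strerror (err);

  switch (ctx)
    {
    case CTX_ATTACH:
      return "no process with that ID exists";
    case CTX_RESUME:
      return "the traced process or thread is gone";
    }

  /* Not reached for valid enum values.  */
  assert (0 && "unknown ptrace_context");
  return NULL;
}
```

With only the errno value and no context, the two ESRCH cases above
could not be told apart.
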
> > I think at this point the right approach is to make
> > linux_resume_one_lwp() call ptrace() directly instead of calling down
> > into the inf_ptrace_resume(). That way you can simply check errno in
> > the place where it matters.
>
> No, your "simply" is not as simple as you imply. There can be any number
> of ptrace calls that fail before the PT_CONTINUE in inf_ptrace_resume
> is reached. And whether to ignore the error should be left to some
> caller higher up on the call chain. That was the _whole point_ of this
> fuller fix, as I explained throughout the series.
> E.g., the ptrace call that fails can be the one that tries to write
> debug registers to the inferior, normal registers, reading the auxv,
> any memory read/write, whatever. Any ptrace error that throws ends up
> in the generic perror_with_name today; after the series, they'll
> end up in throw_ptrace_error instead, a single place where we can add
> more context info to the error thrown. How is that a bad thing?

The problem is that you'll always end up losing some context by using
a throw/catch model. I think that should be avoided whenever
possible, and I got the impression that in the specific race you were
trying to fix here it could be avoided.
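
As a rough sketch of what checking errno at the call site could look
like, consider the C fragment below. It is not the code of
linux_resume_one_lwp. The names `resume_result`, `resume_fn`,
`resume_one_thread` and the `stub_*` functions are all hypothetical, and
the function pointer merely stands in for the real ptrace call so that
the sketch can be run without a traced process.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical result of trying to resume one thread.  */
enum resume_result
{
  RESUME_OK,
  RESUME_THREAD_GONE,   /* ESRCH: the caller decides whether to ignore it */
  RESUME_FAILED         /* any other error; errno is left intact */
};

/* Hypothetical stand-in for the ptrace call.  Like ptrace(2), it
   returns -1 and sets errno on failure.  */
typedef long (*resume_fn) (int pid);

/* Resume PID and classify a failure right where it happens, while we
   still know which request failed and for which thread.  */
enum resume_result
resume_one_thread (resume_fn do_resume, int pid)
{
  assert (do_resume != NULL);

  errno = 0;
  if (do_resume (pid) == -1)
    return errno == ESRCH ? RESUME_THREAD_GONE : RESUME_FAILED;

  return RESUME_OK;
}

/* Stubs used only to exercise the sketch.  */
long stub_ok (int pid) { (void) pid; return 0; }
long stub_gone (int pid) { (void) pid; errno = ESRCH; return -1; }
long stub_denied (int pid) { (void) pid; errno = EPERM; return -1; }
```

The caller gets a return value it can act on directly, and nothing has
to be reconstructed from an exception further up the call chain.
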

It also strikes me as undesirable to have to add a new and fairly
generic file to the list in config/*.mh.