I filed this bug on the Homebrew issue tracker, so the relevant information can be found there (https://github.com/Homebrew/homebrew-core/issues/25172). If anyone wants me to copy it into a post here for convenience, I'd be happy to. This may actually not be the same as bug #20266 here, which was filed two years ago, since the issue occurs only with gdb 8.1 from Homebrew on macOS, *not* with gdb 8.0.1 from Homebrew. In 8.0.1, gdb actually works correctly (assuming codesigning is done correctly, which is unrelated but has caused many users trouble), with only a dyld version warning.
Same problem with me.
Same problem with me. I have to wonder how Apple gets lldb to work with no problems. It appears to be neither codesigned nor setuid/setgid, nor anything special. Whatever Apple does clearly works, and they should share it with the open-source community. They have wasted a lot of my time.
If gdb 8.0 works, but gdb 8.1 doesn't, then that suggests doing a git bisect to find the exact change in gdb that caused the problem. Any takers?
I think I've found the culprit:

$ git bisect run ../gdb-bisect.sh
running /Users/saagarjha/Git/bisect-test.sh
[Snip]
f6ac5f3d63e03a81c4ff3749aba234961cc9090e is the first bad commit
commit f6ac5f3d63e03a81c4ff3749aba234961cc9090e
Author: Pedro Alves <palves@redhat.com>
Date:   Thu May 3 00:37:22 2018 +0100

    Convert struct target_ops to C++

[Snip]
bisect run success

Could someone confirm this for me? Commits before this one can successfully follow the debuggee to completion without incident, but this one and the ones after it crash with a null pointer dereference in gdb`push_target(struct target_ops *) at target.c:653. At a cursory glance, it seems a little fishy that darwin-nat.c doesn't have any sort of add_target call in it, but I can't understand the code in the C/C++ Frankenstein state it's in right now, so I wasn't able to come up with a fix. (I did find a bunch of undefined behavior being hit, though, which I *do* have patches for. Let me know if you're interested in seeing them.)

<rant>
Just as an FYI, confirming this particular commit took well over two days and testing over two hundred revisions, which I, as an outside observer, find truly horrible. Does GDB have *no* automated testing or continuous integration whatsoever? Putting aside the fact that any such infrastructure would catch simple, easy-to-reproduce bugs like this one, it would also have made bisecting a lot easier for me. Many intermediate commits are broken, as in they *literally don't build on macOS*, because someone forgot a header file or messed up a Makefile. Others dereference null pointers or overflow ints during startup, which really threw off my bisect script with false positives: I had to restart the bisect from the beginning at least half a dozen times because it homed in on the wrong bug. I'm aghast that it's possible for such clearly broken patches to land in the master branch.
I do apologize for the vitriolic tone here, but I'm extremely frustrated at the amount of time I had to spend finding this when it should have been a rather trivial task. I do hope none of you take it personally, but if you're looking for things to improve, this is one thing I think you should focus on.
</rant>
(In reply to Ray Seyfarth from comment #2)
> Same problem with me.
>
> I have to wonder how Apple gets lldb to work with no problems. It appears
> to not be codesigned nor setuid/gid or anything special. There is an option
> which works which Apple should share with the open source community. They
> have wasted a lot of my time.

LLDB is signed with Apple's certificate:

$ codesign -dvv `xcrun -find lldb`
Executable=/Applications/Xcode-beta.app/Contents/Developer/usr/bin/lldb
Identifier=com.apple.lldb
Format=Mach-O thin (x86_64)
CodeDirectory v=20200 size=622 flags=0x0(none) hashes=15+2 location=embedded
Signature size=4535
Authority=Software Signing
Authority=Apple Code Signing Certification Authority
Authority=Apple Root CA
Info.plist entries=6
TeamIdentifier=59GAB85EFG
Sealed Resources=none
Internal requirements count=1 size=64
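If you want to compare signatures programmatically (say, lldb's against your own gdb build), here is a small sketch that runs `codesign -dvv` and pulls out the `Authority=` chain. It assumes the `Key=Value` line format shown above and that codesign prints this report on stderr (stdout is captured too, to be safe); the helper names are made up for illustration.

```python
import subprocess
import sys


def parse_authorities(report: str) -> list:
    """Extract the Authority= entries (leaf certificate first) from a codesign -dvv report."""
    return [line.split("=", 1)[1]
            for line in report.splitlines()
            if line.startswith("Authority=")]


def signing_authorities(path: str) -> list:
    """Run codesign on `path` (macOS only) and return its signing authorities.

    Assumption: codesign writes the -d report to stderr; stdout is included
    as well in case that differs between versions. Unsigned binaries give [].
    """
    proc = subprocess.run(["codesign", "-dvv", path],
                          capture_output=True, text=True)
    return parse_authorities("\n".join([proc.stderr, proc.stdout]))


if __name__ == "__main__":
    # Usage: python3 authorities.py "$(xcrun -find lldb)" /path/to/gdb
    for path in sys.argv[1:]:
        print(path, signing_authorities(path))
```

For the lldb binary above, the chain should come out as Software Signing, Apple Code Signing Certification Authority, Apple Root CA; a locally signed gdb will instead show whatever certificate you created for it.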
(In reply to Saagar Jha from comment #4)
> I think I've found the culprit:
>
> $ git bisect run ../gdb-bisect.sh
> running /Users/saagarjha/Git/bisect-test.sh
>
> [Snip]
>
> f6ac5f3d63e03a81c4ff3749aba234961cc9090e is the first bad commit
> commit f6ac5f3d63e03a81c4ff3749aba234961cc9090e
> Author: Pedro Alves <palves@redhat.com>
> Date: Thu May 3 00:37:22 2018 +0100
>
> Convert struct target_ops to C++
>
> [Snip]
>
> bisect run success

That commit can't be the culprit for the issue reported in this bug, because that commit is recent: it is only in master, not in 8.1. If it caused some breakage, it's something else. A separate bug report would have been better.

>
> Could someone confirm this for me? Commits before this one can successfully
> follow the debuggee to completion without incident, but the ones after and
> including this one crash with a null pointer dereference in
> gdb`push_target(struct target_ops *) at target.c:653. From a cursory glance,
> it seems a little fishy that darwin-nat.c doesn't have any sort of
> add_target call in it,

The add_target call is in i386-darwin-nat.c:_initialize_i386_darwin_nat:

  add_inf_child_target (&darwin_target);

> but I can't understand the code in the C/C++
> frankenstein state it's in right now,

Yeah. Anything in particular you'd like to point out?

> so I wasn't able to come up with a
> fix. (I did find a bunch of undefined behavior being hit, though, which I
> *do* have patches for. Let me know if you're curious in seeing them.)

Yes please. If you could contribute fixes, it'd be awesome:

  https://sourceware.org/gdb/wiki/ContributionChecklist

In case it isn't obvious, the macOS port is in real need of someone motivated to maintain it. I'm afraid that none of the day-to-day maintainers uses macOS, AFAIK. You can see it as an opportunity.
>
> <rant>
> Just as a FYI, confirming this particular commit took well over two days and
> testing over two hundred revisions, which is something that I find as an
> outside observer to be truly horrible.

Wow. Sorry about that. Two hundred revisions sounds way too many for a git bisect? How could that have happened?

> Does GDB have *no* automated testing
> or continuous integration whatsoever?

It does; see <https://sourceware.org/gdb/wiki/BuildBot>. The problem is that nobody ever contributed a macOS buildslave.

> Putting aside the fact that any such
> infrastructure would catch simple bugs like this one, which are easy to
> reproduce, it would have also made my life bisecting a lot easier. Many
> intermediate commits are broken, as in they *literally don't build on
> macOS*, because someone forgot a header file or messed up a Makefile. Others
> dereference null pointers or overflow ints during startup, which really
> threw off my bisect script with false positives: I had to restart the bisect
> from the beginning at least half a dozen times because it homed in on the
> wrong bug.

:-( Sounds like maybe "git bisect skip" would have helped?

> I'm aghast that it's possible for such clearly broken patches to
> land in the master branch. I do apologize for the vitriolic tone here, but
> I'm extremely frustrated at the amount of time I had to spend finding this
> when it should have been a rather trivial task. I do hope none of you take
> it personally–but if you're looking for things to improve, this is one thing
> I think you should focus on.

Nope, sorry. The thing to improve is _getting someone that actually cares about the port to step up and help maintain it_. That could be you. Otherwise, I fear that at some point the port will just end up deprecated and removed.
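For scale on "two hundred revisions": each good/bad verdict halves the candidate range, so a clean bisect over N commits needs about ceil(log2 N) test runs. The hypothetical helper below just does that arithmetic; skipped commits and restarts, as described in this thread, are what push the real count far above it.

```python
def bisect_steps(n_commits: int) -> int:
    """Test runs a clean git bisect needs to isolate one bad commit among n candidates.

    Each verdict halves the range, giving ceil(log2(n)); (n - 1).bit_length()
    computes that exactly for n >= 1 without floating-point rounding.
    """
    if n_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (n_commits - 1).bit_length()


for n in (200, 1000, 4096):
    print(n, "commits ->", bisect_steps(n), "test runs")
# 200 commits -> 8 test runs
# 1000 commits -> 10 test runs
# 4096 commits -> 12 test runs
```

So roughly a dozen test runs is enough for a range of up to 4,096 commits.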
> including this one crash with a null pointer dereference in
> gdb`push_target(struct target_ops *) at target.c:653.

I think I see what is going on here. I'll send a patch.
Sorry, I took a break from this because I couldn't figure it out: my bisect kept ending up on 4bbd4ef219c5b4c7d437618ba8937af86dd1032e, which has a one-character diff. My guess is that this commit changes which methods get called, so I might be able to work out what it changes if I could log every method call, but I don't know how to do that in gdb.

> Yeah. Anything in particular you'd like to point out?

The darwin-nat/i386-darwin-nat thing was kind of confusing to me, since I thought darwin-nat was for x86_64 and i386-darwin-nat was for, well, i386. Plus this one didn't really follow the example set by other platforms, so I didn't have much to go off of. Just my thoughts.

> Yes please. If you could contribute fixes, it'd be awesome

I have a couple of clumsy patches for issues I found up here (as well as yours), if you find them useful: https://github.com/saagarjha/binutils-gdb. If they're useful, I could try to format them to follow the guidelines.

> Two hundred revisions sounds way too many for a git bisect? How could that have happened?

Well, each bisect should ideally have taken around a dozen commits to test, but I kept needing to run it again because my bisect script, having no real way of testing whether the current commit was good, ended up doing something along the lines of checking "echo r | gdb -return-child-result a.out". But a lot of commits had problems. Some didn't build: my script "git bisect skip"ped any commit that didn't build, so it got mired in a 90-odd-commit block where none of the commits built, jumping around randomly to find one that compiled. Many others had some sort of undefined behavior that wasn't caught until much later, which meant I had to manually backport fixes to rule out the false positives it found.
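`git bisect run` decides from the script's exit status: 0 marks the commit good, 125 skips it, any other code from 1 to 127 marks it bad, and anything else aborts the bisect. A sketch of a test script along the lines described above; the `make -j8 all-gdb` build step, the `./gdb/gdb` and `./a.out` paths, the `--run` flag and the 60-second timeout are placeholders to adapt. It treats a failed build as "skip", and a crash, an unexpected exit status, or a hang as "bad".

```python
import subprocess
import sys

GOOD, BAD, SKIP = 0, 1, 125  # exit codes that `git bisect run` understands


def verdict(build_ok: bool, gdb_rc, expected_rc: int = 0) -> int:
    """Map one build/run attempt to a bisect exit code.

    gdb_rc is None when the run timed out (treated as a hang, i.e. bad).
    A negative gdb_rc means gdb itself died from a signal (e.g. a segfault).
    """
    if not build_ok:
        return SKIP  # untestable commit: let bisect choose a neighbour
    if gdb_rc is None or gdb_rc != expected_rc:
        return BAD
    return GOOD


def main() -> int:
    # Placeholder build step; adjust to your configure/make setup.
    if subprocess.run(["make", "-j8", "all-gdb"]).returncode != 0:
        return verdict(False, None)
    try:
        # With -return-child-result, gdb exits with the inferior's exit status.
        run = subprocess.run(
            ["./gdb/gdb", "-batch", "-ex", "run", "-return-child-result", "./a.out"],
            timeout=60)
        rc = run.returncode
    except subprocess.TimeoutExpired:
        rc = None
    return verdict(True, rc)


if __name__ == "__main__" and "--run" in sys.argv[1:]:
    # Usage: git bisect run python3 gdb_bisect_test.py --run
    sys.exit(main())
```

Mapping build failures to 125 keeps unbuildable stretches from being blamed, but it can't tell this bug apart from an unrelated crash in the same window; that still needs a sharper check (for example, matching the specific failure) or manually backported fixes.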
(In reply to Saagar Jha from comment #8)
> > Yes please. If you could contribute fixes, it'd be awesome
>
> I have a couple of clumsy patches for issues I found up here (as well as
> yours), if you find them useful: https://github.com/saagarjha/binutils-gdb.
> If they're useful I could try to format them to follow the guidelines.

They do seem to point at real issues that should be fixed somehow. If you send the fixes to the list, they can be discussed there.
> > Yeah. Anything in particular you'd like to point out?
>
> The darwin-nat/i386-darwin-nat thing was kind of confusing to me, since I
> thought darwin-nat was for x86_64 and i386-darwin-nat was for, well, i386.
> Plus this one didn't really follow the example set by other platforms so I
> didn't have much to go off of. Just my thoughts.

OK. There are exceptions for single-arch ports, but $OS-nat.c is usually _not_ architecture-specific. E.g., linux-nat.c is for all Linux architectures, and then we have i386-linux-nat.c/amd64-linux-nat.c. Same with fbsd-nat.c, etc. Maybe we should rename i386-darwin-nat to x86-darwin-nat, though, as that's the convention we follow almost everywhere (i386 => 32-bit, amd64 => 64-bit, x86 => both).

Please do feel free to pop in to #gdb on freenode, where several maintainers hang out. I'd be happy to help you get around the codebase a bit more, if you're interested.
Back to the original topic:

(In reply to Saagar Jha from comment #8)
> Sorry, I took a break from because I couldn't figure it out: my bisect kept
> ending up on 4bbd4ef219c5b4c7d437618ba8937af86dd1032e, with a one character
> diff. My guess is that this commit changes what methods get called, so it
> might be able to discover what this changes if I could log every method
> call, but I don't know how to do that in gdb.

Hmm, at least that is indeed changing something in the darwin-related code. The original patch was submitted here, but it didn't come with any sort of detail:

  <https://sourceware.org/ml/gdb-patches/2017-07/msg00447.html>

Did you try reverting that commit on top of current master, see if it makes a difference?
> They do seem to point at real issues that should be fixed somehow. If you send the fixes to the list, they can be discussed there.

Sure, I'll make sure to stop by after we get this figured out so I can make a clean set of patches.

> Did you try reverting that commit on top of current master, see if it makes a difference?

Yup, the issue (mostly) goes away if I do that. There are other latent issues hiding out somewhere that we can get to later, but at least I can get GDB to run my program to successful termination, as it did in 8.0.1.
I built gdb 8.0 from git and that did not work for me on macOS 10.13.5. Neither did git master. I also tried the 8.0.1 from brew. They both fail in the same way, with "Unknown signal". I tend to think this is a dup of 20266.
Tom, does reverting the offending commit work for you?
(In reply to Pedro Alves from comment #14)
> Tom, does reverting the offending commit work for you?

No.

What happens for me is that darwin_decode_message gets a MACH_NOTIFY_DEAD_NAME (the "== 0x48" case). Then the subsequent wait4() call returns with wstatus=5. wstatus=5 is a strange response: it is not WIFEXITED, but neither is it WIFSIGNALED. So far I haven't found any documentation about what it might be.

One wild guess is that maybe this Mach message actually does carry the name of the new port, and it could be extracted via darwin_find_new_inferior. But that seems like a long shot.

I looked at the lldb patch that Jason Molenda posted (see https://sourceware.org/bugzilla/show_bug.cgi?id=20266#c6), but lldb seems to work in a completely different way here, I guess by hooking into some low-level Mach mechanism? Those functions aren't obviously called from anywhere.

So I do wonder whether the answer is a bigger rewrite of darwin-nat.c, to use Mach facilities everywhere instead of ptrace or wait. However, this experience has shown me that even minor revisions of macOS can come with big changes, so modifying this code seems somewhat tricky.
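Since the bit layout of wait statuses is platform-defined, one quick check is to ask the platform's own <sys/wait.h> macros how they classify a raw value such as wstatus=5. A small sketch using Python's `os` wrappers for those macros; the script and function names are hypothetical.

```python
import os
import sys


def describe_wait_status(status: int) -> tuple:
    """Classify a raw wait4()/waitpid() status with the platform's W* macros."""
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSTOPPED(status):
        return ("stopped", os.WSTOPSIG(status))
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("unrecognized", status)


if __name__ == "__main__":
    # Usage: python3 wstatus.py 5 0x57f   (values may be decimal or 0x-prefixed)
    for arg in sys.argv[1:]:
        print(arg, describe_wait_status(int(arg, 0)))
```

Feeding it the statuses gdb logs makes it easy to compare the macros' verdict with how darwin-nat.c interprets the same value.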
Are you testing with "set startup-with-shell off", perhaps? Or maybe Saagar was? I could see that affecting whether you see the SIGTRAP, since this all seems to be exec-event related.
(In reply to Pedro Alves from comment #16)
> Are you testing with "set startup-with-shell off", perhaps? Or maybe Saagar
> was? I could see that impacting whether affecting whether you see the
> SIGTRAP, since this all seems to be exec-event related.

I have tried it both ways to no avail.
In my case, the first problem was that I was trying "gdb /bin/ls", but that is subject to System Integrity Protection. Using my own test executable gives a different problem.

I'll file a separate bug about detecting SIP; gdb could at least tell the user what is going on.
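One way a wrapper (or gdb itself) might warn up front: files protected by System Integrity Protection carry the `restricted` file flag, which `ls -lO` displays. A minimal sketch that checks that flag through `st_flags`. It assumes the SF_RESTRICTED value 0x00080000 from macOS's <sys/stat.h>, and it is only a heuristic: a missing flag doesn't prove the target can be debugged.

```python
import os
import sys

# Assumption: value of SF_RESTRICTED ("restricted" in `ls -lO`) from macOS <sys/stat.h>.
SF_RESTRICTED = 0x00080000


def flags_indicate_restricted(st_flags: int) -> bool:
    """True if the BSD file flags include the SIP 'restricted' flag."""
    return bool(st_flags & SF_RESTRICTED)


def is_sip_restricted(path: str) -> bool:
    """Best-effort check; st_flags only exists on BSD-derived systems such as macOS."""
    st = os.stat(path)  # follows symlinks, so this checks the real target
    return flags_indicate_restricted(getattr(st, "st_flags", 0))


if __name__ == "__main__":
    # Usage: python3 sipcheck.py /bin/ls ./a.out
    for path in sys.argv[1:]:
        if is_sip_restricted(path):
            print(f"{path}: has the SIP 'restricted' flag; gdb probably cannot debug it")
        else:
            print(f"{path}: no 'restricted' flag found")
```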
I'd just like to confirm that I am seeing the exact same error on Mac OS 10.13.6 using GDB 8.2 installed via Homebrew. Downgrading to 8.0.1 via Homebrew gives me a working version of GDB.
Try git master gdb. There have been a few High Sierra fixes there. I used to get this problem there but now I no longer do.
Ooh, it's nice to see that the underlying issue has been fixed. Can confirm that this works on macOS Mojave with a small patch to deal with new load commands. I'll look into the process of getting this merged in so we can extend support to 10.14 as well.
(In reply to Saagar Jha from comment #21)
> Ooh, it's nice to see that the underlying issue has been fixed. Can confirm
> that this works on macOS Mojave with a small patch to deal with new load
> commands. I'll look into the process of getting this merged in so we can
> extend support to 10.14 as well.

Looking forward to that. See also bug #23728, bug #23742, and bug #23746.
I've taken the time to clean up my patches and submit them to the gdb-patches mailing list (though I don't see them in the archives yet; is this just the usual delay, or did I mess up somewhere?)
(In reply to Saagar Jha from comment #23)
> I've taken the time to clean up my patches and submit them to the
> gdb-patches mailing list (though, I don't see them in the archives. Is this
> just a standard delay, or did I mess up somewhere?)

I didn't see them either, so maybe try re-sending.
The patches should be on the list now. Turns out gdb-patches is extremely strict about emails that contain any kind of HTML ;P
OK, back to the original topic, now that the patches have been merged: I'm still intermittently seeing the original issue about half the time. Tom, for me it seems like wait4 is giving me WIFSTOPPED with SIGTRAP. Does this mean we need to refresh our task port?
(In reply to Saagar Jha from comment #26)
> Ok, back to the original topic, now that the patches have been merged: I'm
> still intermittently seeing the original issue about half the time. Tom, for
> me it seems like wait4 is giving me WIFSTOPPED with SIGTRAP. Does this mean
> we need to refresh out task port?

I don't really know. Actually I'm surprised to hear that this is still a problem, as I would have thought the earlier round of macOS changes would have fixed this. However, I don't have Mojave, only High Sierra, so I can't really try it. I haven't been able to reproduce this bug there.
Tom, Saagar,

I'm not seeing the SIGTRAP on master, but it exists in the latest stable gdb from Homebrew (8.2_1). Do you know which commits could resolve the issue?

Thank you,
Roman
This issue is intermittent for me. I'm building straight off of master on macOS Mojave 10.14.3 Beta (18D21c), configured with:

./configure --disable-werror CFLAGS="-g -fsanitize=address -fsanitize=undefined" CXXFLAGS="-g -fsanitize=address -fsanitize=undefined" LDFLAGS="-g -fsanitize=address -fsanitize=undefined"

I'll try gdb a couple of times on a toy binary and it'll work, and then it will start to randomly hang until I SIGKILL it. This is similar to the behavior I had on High Sierra back in May, so I'm guessing that the underlying issue is still there somewhere. As to what it is, I have no idea…
So, there seem to be two different issues:
* the program quits shortly after starting, with SIGTRAP (the issue reported here). I haven't seen this one on master.
* the program doesn't quit at all (I don't know if there's a bug number for it), but I could easily catch it by running gdb in a loop:

for i in $(seq 1 100); do sudo /usr/local/Cellar/gdb/HEAD-750b258_1/bin/gdb -ex 'r' -ex 'quit' ./a.out; done

Sampling shows that gdb hangs in darwin_decode_message:

Analysis of sampling gdb (pid 746) every 1 millisecond
Process: gdb [746]
Path: /usr/local/Cellar/gdb/HEAD-750b258_1/bin/gdb
Load Address: 0x100000000
Identifier: gdb
Version: 0
Code Type: X86-64
Parent Process: sudo [745]

Date/Time: 2018-12-12 03:07:55.146 +0300
Launch Time: 2018-12-12 03:07:30.741 +0300
OS Version: Mac OS X 10.14.1 (18B75)
Report Version: 7
Analysis Tool: /usr/bin/sample

Physical footprint: 5268K
Physical footprint (peak): 5272K
----

Call graph:
  2826 Thread_21543800 DispatchQueue_1: com.apple.main-thread (serial)
  2826 start (in libdyld.dylib) + 1 [0x7fff63cd508d]
  2826 main (in gdb) + 44 [0x1000039dc]
  2826 gdb_main(captured_main_args*) (in gdb) + 3701 [0x10019399c]
  2826 catch_command_errors(void (*)(char const*, int), char const*, int) (in gdb) + 53 [0x1001941af]
  2826 execute_command(char const*, int) (in gdb) + 489 [0x1002836e4]
  2826 cmd_func(cmd_list_element*, char const*, int) (in gdb) + 104 [0x100079296]
  2826 run_command_1(char const*, int, run_how) (in gdb) + 594 [0x100162c6a]
  2826 darwin_nat_target::create_inferior(char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, char**, int) (in gdb) + 939 [0x1000bf50b]
  2826 fork_inferior(char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, char**, void (*)(), void (*)(int), void (*)(), char const*, void (*)(char const*, char* const*, char* const*)) (in gdb) + 376 [0x100126069]
  2826 darwin_ptrace_him(int) (in gdb) + 99 [0x1000bf96f]
  2826 gdb_startup_inferior(int, int) (in gdb) + 22 [0x100125727]
  2826 startup_inferior(int, int, target_waitstatus*, ptid_t*) (in gdb) + 205 [0x1001262d4]
  2826 target_wait(ptid_t, target_waitstatus*, int) (in gdb) + 67 [0x10026901c]
  2826 darwin_nat_target::wait(ptid_t, target_waitstatus*, int) (in gdb) + 39 [0x1000be861]
  2826 darwin_wait(ptid_t, target_waitstatus*) (in gdb) + 290 [0x1000be98d]
  2826 darwin_decode_message(mach_msg_header_t*, darwin_thread_info**, inferior**, target_waitstatus*) (in gdb) + 1091 [0x1000c155d]
  2826 __wait4_nocancel (in libsystem_kernel.dylib) + 10 [0x7fff63e15e72]

Total number in stack (recursive counted multiple, when >=5):

Sort by top of stack, same collapsed (when >= 5):
  __wait4_nocancel (in libsystem_kernel.dylib) 2826
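The loop above relies on noticing a hang by eye. A variant that flags hangs automatically by giving each run a timeout; the 10-second default, the 100 runs, and the gdb command line are placeholders (prepend `sudo` if your gdb needs it, as in the loop above).

```python
import subprocess
import sys


def run_once(cmd: list, timeout: float):
    """Run cmd once; return its exit code, or None if it hung and had to be killed."""
    try:
        return subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL,
                              timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return None  # subprocess.run has already killed and reaped the child


def stress(cmd: list, runs: int = 100, timeout: float = 10.0) -> tuple:
    """Run cmd repeatedly; return (number of hangs, list of non-zero exit codes)."""
    hangs, failures = 0, []
    for _ in range(runs):
        rc = run_once(cmd, timeout)
        if rc is None:
            hangs += 1
        elif rc != 0:
            failures.append(rc)
    return hangs, failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 stress.py gdb -batch -ex run ./a.out
    hangs, failures = stress(sys.argv[1:])
    print(f"{hangs} hung runs, {len(failures)} non-zero exits: {failures}")
```

A hung run still has to be sampled separately to see where it is stuck, as above; the script only counts how often it happens.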
I dug into this problem, and the issue is that gdb hangs in darwin_decode_message. I had a look at the most current version from the FTP server, gdb-8.2.50.20190105.tar.xz.

In darwin-nat.c, in darwin_decode_message(...), wait4 is called for the first time at line 1131 to check whether and how the thread exited. At line 1154, wait4 is called a second time on the now potentially terminated thread:

darwin-nat.c line 1154:  wait4 (inf->pid, &wstatus, 0, NULL);

Changing this to

darwin-nat.c line 1154:  wait4 (inf->pid, &wstatus, WNOHANG, NULL);

tells wait4 to return immediately instead of waiting for a process that won't report anything more. This makes the frequent hangs of gdb under Mojave go away. I've tested this with C++ and Fortran, from the console and from Eclipse, under Mojave and had no problems.
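To see what the one-flag change does in isolation: wait4 with options 0 blocks until the process changes state, while WNOHANG returns a pid of 0 right away if there is nothing to report. A minimal POSIX sketch using Python's `os.wait4` (the helper and demo names are hypothetical). It only shows the difference between the two calls; it doesn't establish that WNOHANG is the right fix for darwin-nat.c, since a status that arrives later is then left for some other wait call to collect.

```python
import os
import signal
import time


def try_reap(pid: int):
    """Non-blocking wait4(): return the raw status, or None if pid has nothing to report yet."""
    reaped, status, _rusage = os.wait4(pid, os.WNOHANG)
    return None if reaped == 0 else status


def demo() -> tuple:
    pid = os.fork()
    if pid == 0:  # child: stay alive, so there is no status to report for a while
        time.sleep(30)
        os._exit(0)
    early = try_reap(pid)  # returns immediately; wait4(pid, 0) would block here for ~30 s
    os.kill(pid, signal.SIGKILL)
    _, status, _ = os.wait4(pid, 0)  # blocking wait is fine now: the child is dying
    return early, status


if __name__ == "__main__":
    early, status = demo()
    print("WNOHANG while child alive:", early)
    print("after SIGKILL, terminated by signal", os.WTERMSIG(status))
```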
I think the problem(s) happening on High Sierra that are mentioned in this bug report eventually got fixed, if I read the history of this bug correctly. The issue discussed later, starting at https://sourceware.org/bugzilla/show_bug.cgi?id=22960#c21 where Mojave is mentioned, seems to be a duplicate of https://sourceware.org/bugzilla/show_bug.cgi?id=24069, which should be fixed in master.