(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: break break_fn
continue
Continuing.
[LWP 13327 exited] [LWP 13324 exited] [LWP 13295 exited] [LWP 13301 exited] [LWP 13300 exited] [LWP 13286 exited]
[LWP 13249 exited] [LWP 13248 exited] [LWP 13236 exited] [LWP 13233 exited] [LWP 13231 exited] [LWP 13242 exited]
[LWP 13240 exited] [LWP 13221 exited] [LWP 13215 exited] [LWP 13213 exited] [LWP 13210 exited] [LWP 13161 exited]
[LWP 13155 exited] [LWP 13124 exited] [LWP 13120 exited] [LWP 13117 exited] [LWP 13115 exited] [LWP 13113 exited]
[LWP 13111 exited] [LWP 13110 exited] [LWP 13108 exited] [LWP 13105 exited] [LWP 13104 exited] [LWP 13143 exited]
[LWP 13140 exited] [LWP 13137 exited] [LWP 13136 exited] [LWP 13133 exited] [LWP 13131 exited] [LWP 13128 exited]
[LWP 13127 exited] [LWP 13125 exited] [LWP 13099 exited] [LWP 13091 exited] [LWP 13089 exited] [LWP 13085 exited]
[LWP 13083 exited] [LWP 13081 exited] [LWP 13079 exited] [LWP 13078 exited] [LWP 13076 exited] [LWP 13073 exited]
[LWP 13071 exited] [LWP 13070 exited] [LWP 13065 exited] [LWP 12948 exited] [LWP 12946 exited] [LWP 12945 exited]
[LWP 12943 exited] [LWP 12940 exited] [LWP 12937 exited] [LWP 12934 exited] [LWP 12931 exited] [LWP 12930 exited]
[LWP 12927 exited] [LWP 12923 exited] [LWP 12921 exited] [LWP 12918 exited] [LWP 12912 exited] [LWP 12909 exited]
[LWP 12906 exited] [LWP 12903 exited] [LWP 12900 exited] [LWP 12886 exited] [LWP 12823 exited] [LWP 12820 exited]
[LWP 12816 exited] [LWP 12813 exited] [LWP 12811 exited] [LWP 12808 exited] [LWP 12798 exited] [LWP 12737 exited]
[LWP 12735 exited] [LWP 12733 exited] [LWP 12727 exited] [LWP 12724 exited] [LWP 12720 exited] [LWP 12717 exited]
[LWP 12606 exited] [LWP 12532 exited] [LWP 12522 exited] [LWP 12518 exited] [LWP 12509 exited] [LWP 12503 exited]
[LWP 12500 exited] [LWP 12496 exited] [LWP 12493 exited] [LWP 12490 exited] [LWP 12485 exited] [LWP 12482 exited]
[LWP 12477 exited] [LWP 12475 exited] [LWP 12473 exited] [LWP 12469 exited] [LWP 12468 exited] [LWP 12465 exited]
[LWP 12464 exited] [LWP 12460 exited] [LWP 12457 exited] [LWP 12455 exited] [LWP 12453 exited] [LWP 12451 exited]
[LWP 12445 exited] [LWP 12443 exited] [LWP 12440 exited] [LWP 12438 exited] [LWP 12433 exited] [LWP 12418 exited]
[LWP 12409 exited] [LWP 12387 exited] [LWP 12348 exited] [LWP 12011 exited]
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
The program no longer exists.
(gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 1: break at break_fn: 1
Created attachment 12718: gdb.log
Reproduced today on master, so it's not a fluke. FTR: this was on an openSUSE Leap 15.2 laptop.
This is not a bug but a security restriction. Due to ptrace protection, attaching to another process without a parent-child relationship is not allowed.

There are two ways to work around this:
1. Run GDB as root.
2. Set "/proc/sys/kernel/yama/ptrace_scope" to 0.
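For illustration only (not part of the original report), here is a minimal standalone C program showing the failure mode the Yama restriction produces: with kernel.yama.ptrace_scope >= 1, PTRACE_ATTACH to a process that is not a descendant of the tracer (here, the tracer's own parent shell) is expected to fail with EPERM, while it should succeed with ptrace_scope = 0 or when running as root:

  /* Sketch: demonstrate the Yama ptrace_scope restriction by trying to
     attach to a process that is not our descendant (our own parent).  */
  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/ptrace.h>
  #include <sys/types.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int
  main (void)
  {
    pid_t parent = getppid ();

    if (ptrace (PTRACE_ATTACH, parent, NULL, NULL) == -1)
      {
        /* With ptrace_scope >= 1 this typically reports
           "Operation not permitted".  */
        fprintf (stderr, "PTRACE_ATTACH to %d failed: %s\n",
                 (int) parent, strerror (errno));
        return 1;
      }

    /* The attach succeeded; wait for the attach stop, then detach so
       the parent keeps running.  */
    waitpid (parent, NULL, 0);
    ptrace (PTRACE_DETACH, parent, NULL, 0);
    printf ("PTRACE_ATTACH to %d succeeded\n", (int) parent);
    return 0;
  }

Compile it with gcc and run it from a shell with ptrace_scope set to 1 and then 0 to see the difference.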
(In reply to Chungyi Chi from comment #3)
> This is not a bug but a security restriction. Due to ptrace protection,
> attaching to another process without a parent-child relationship is not
> allowed.
>
> There are two ways to work around this:
> 1. Run GDB as root.
> 2. Set "/proc/sys/kernel/yama/ptrace_scope" to 0.

On my system, there's no yama:
...
$ cat /sys/kernel/security/lsm
lockdown,capability,apparmor
$
...

Also, I don't understand how yama would cause the specific failure reported in this PR. If yama were active, wouldn't things fail much earlier, and in many more tests?
I also encountered this issue with the current master branch on 3 machines: two x86_64-linux and one aarch64-linux. Carl Love also reported in bug #31312 that he encountered the issue on a powerpc64le-linux system. The aarch64 and powerpc64le machines had a patch to fix bug #31312 applied.

In all cases it's necessary to keep running attach-many-short-lived-threads.exp in a loop to reproduce the problem. On one of the x86_64-linux machines it takes anywhere from 30 to 500 iterations to hit the problem, while on the other it took between 120 and 900 iterations. The aarch64-linux machine took ~2500 iterations. On powerpc64le-linux, the problem happened in 3 iterations out of 500.

I do have Yama present on those machines, but it is disabled on all of them:

$ sysctl kernel.yama.ptrace_scope
kernel.yama.ptrace_scope = 0

I also agree with Tom that if Yama were the problem, it would affect the testcase in a different way.

The issue arises from this loop in linux_proc_attach_tgid_threads ():

  /* Scan the task list for existing threads.  While we go through the
     threads, new threads may be spawned.  Cycle through the list of
     threads until we have done two iterations without finding new
     threads.  */
  for (iterations = 0; iterations < 2; iterations++)
    {
      struct dirent *dp;

      new_threads_found = 0;
      while ((dp = readdir (dir.get ())) != NULL)
        {
          unsigned long lwp;

          /* Fetch one lwp.  */
          lwp = strtoul (dp->d_name, NULL, 10);

          if (lwp != 0)
            {
              ptid_t ptid = ptid_t (pid, lwp);

              if (attach_lwp (ptid))
                new_threads_found = 1;
            }
        }

      if (new_threads_found)
        {
          /* Start over.  */
          iterations = -1;
        }

      rewinddir (dir.get ());
    }

What happens is that two iterations without seeing new threads in linux_proc_attach_tgid_threads () isn't always enough for GDB to know that it has attached to all inferior threads. So sometimes, after this function returns, an unattached inferior thread trips on the breakpoint instruction that GDB put in the inferior.

I don't know if I would consider this a bug; it's more an issue that arises from the way attach-many-short-lived-threads.c behaves: since it's constantly creating new threads, it's impossible for GDB to know when it has attached to all of them and can stop looking for new threads to attach to.

The only way I can see to improve GDB's behaviour is to increase the number of iterations of the loop that checks for new threads. I suspected that the inferior's ability to create new threads was proportional to the number of CPUs in the system, so I was going to make the number of iterations in linux_proc_attach_tgid_threads () proportional to the number of CPUs. But on the machines I have at hand, the one where it takes longest to reproduce the problem has the most CPUs (160, vs. 8 CPUs on the other machines), so maybe we just have to find a magic iteration count that works well for everybody who can reproduce the issue?
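To make the CPU-scaling idea concrete, here is a rough sketch, not a patch; the helper name is made up, and only sysconf (_SC_NPROCESSORS_ONLN) is a real (glibc) call:

  #include <unistd.h>

  /* Hypothetical helper: how many consecutive scans of /proc/PID/task
     without finding a new thread we would require before declaring the
     attach complete.  The current code hard-codes 2; the idea is to
     scale that with the number of online CPUs, on the theory that more
     CPUs let the inferior spawn threads faster.  The factor of 2 per
     CPU is made up.  */
  static int
  required_quiet_iterations (void)
  {
    long ncpus = sysconf (_SC_NPROCESSORS_ONLN);

    if (ncpus < 1)
      ncpus = 1;

    long iterations = 2 * ncpus;

    /* Never go below the existing minimum of two quiet passes.  */
    return iterations < 2 ? 2 : (int) iterations;
  }

The loop quoted above would then keep scanning until it had gone required_quiet_iterations () passes without finding a new thread, instead of the hard-coded two. As noted, though, the reproduction counts above don't clearly correlate with CPU count, so a fixed but larger magic number may end up being the simpler option.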