Bug 26286 - FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 1: break at break_fn: 1 (SIGTRAP)
Status: NEW
Alias: None
Product: gdb
Classification: Unclassified
Component: threads
Version: HEAD
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2020-07-22 19:08 UTC by Tom de Vries
Modified: 2024-03-19 17:44 UTC (History)

Attachments
gdb.log (4.34 KB, text/x-log)
2020-07-22 19:08 UTC, Tom de Vries

Description Tom de Vries 2020-07-22 19:08:07 UTC
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: break break_fn
continue
Continuing.
[LWP 13327 exited]
[LWP 13324 exited]
[LWP 13295 exited]
[LWP 13301 exited]
[LWP 13300 exited]
[LWP 13286 exited]
[LWP 13249 exited]
[LWP 13248 exited]
[LWP 13236 exited]
[LWP 13233 exited]
[LWP 13231 exited]
[LWP 13242 exited]
[LWP 13240 exited]
[LWP 13221 exited]
[LWP 13215 exited]
[LWP 13213 exited]
[LWP 13210 exited]
[LWP 13161 exited]
[LWP 13155 exited]
[LWP 13124 exited]
[LWP 13120 exited]
[LWP 13117 exited]
[LWP 13115 exited]
[LWP 13113 exited]
[LWP 13111 exited]
[LWP 13110 exited]
[LWP 13108 exited]
[LWP 13105 exited]
[LWP 13104 exited]
[LWP 13143 exited]
[LWP 13140 exited]
[LWP 13137 exited]
[LWP 13136 exited]
[LWP 13133 exited]
[LWP 13131 exited]
[LWP 13128 exited]
[LWP 13127 exited]
[LWP 13125 exited]
[LWP 13099 exited]
[LWP 13091 exited]
[LWP 13089 exited]
[LWP 13085 exited]
[LWP 13083 exited]
[LWP 13081 exited]
[LWP 13079 exited]
[LWP 13078 exited]
[LWP 13076 exited]
[LWP 13073 exited]
[LWP 13071 exited]
[LWP 13070 exited]
[LWP 13065 exited]
[LWP 12948 exited]
[LWP 12946 exited]
[LWP 12945 exited]
[LWP 12943 exited]
[LWP 12940 exited]
[LWP 12937 exited]
[LWP 12934 exited]
[LWP 12931 exited]
[LWP 12930 exited]
[LWP 12927 exited]
[LWP 12923 exited]
[LWP 12921 exited]
[LWP 12918 exited]
[LWP 12912 exited]
[LWP 12909 exited]
[LWP 12906 exited]
[LWP 12903 exited]
[LWP 12900 exited]
[LWP 12886 exited]
[LWP 12823 exited]
[LWP 12820 exited]
[LWP 12816 exited]
[LWP 12813 exited]
[LWP 12811 exited]
[LWP 12808 exited]
[LWP 12798 exited]
[LWP 12737 exited]
[LWP 12735 exited]
[LWP 12733 exited]
[LWP 12727 exited]
[LWP 12724 exited]
[LWP 12720 exited]
[LWP 12717 exited]
[LWP 12606 exited]
[LWP 12532 exited]
[LWP 12522 exited]
[LWP 12518 exited]
[LWP 12509 exited]
[LWP 12503 exited]
[LWP 12500 exited]
[LWP 12496 exited]
[LWP 12493 exited]
[LWP 12490 exited]
[LWP 12485 exited]
[LWP 12482 exited]
[LWP 12477 exited]
[LWP 12475 exited]
[LWP 12473 exited]
[LWP 12469 exited]
[LWP 12468 exited]
[LWP 12465 exited]
[LWP 12464 exited]
[LWP 12460 exited]
[LWP 12457 exited]
[LWP 12455 exited]
[LWP 12453 exited]
[LWP 12451 exited]
[LWP 12445 exited]
[LWP 12443 exited]
[LWP 12440 exited]
[LWP 12438 exited]
[LWP 12433 exited]
[LWP 12418 exited]
[LWP 12409 exited]
[LWP 12387 exited]
[LWP 12348 exited]
[LWP 12011 exited]

Program terminated with signal SIGTRAP, Trace/breakpoint trap.
The program no longer exists.
(gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 1: break at break_fn: 1
Comment 1 Tom de Vries 2020-07-22 19:08:25 UTC
Created attachment 12718
gdb.log
Comment 2 Tom de Vries 2020-07-27 14:17:10 UTC
Reproduced today on master, so it's not a fluke.

FTR: on openSUSE Leap 15.2 laptop.
Comment 3 Chungyi Chi 2020-08-31 07:54:19 UTC
This is not a bug but a security restriction. Because of ptrace protection, attaching to another process that has no parent-child relationship with the debugger is not permitted.

There are two ways to work around this:
1. Run as root.
2. Set "/proc/sys/kernel/yama/ptrace_scope" to 0.
Comment 4 Tom de Vries 2020-08-31 08:30:39 UTC
(In reply to Chungyi Chi from comment #3)
> This is not a bug but a security restriction. Because of ptrace
> protection, attaching to another process that has no parent-child
> relationship with the debugger is not permitted.
> 
> There are two ways to work around this:
> 1. Run as root.
> 2. Set "/proc/sys/kernel/yama/ptrace_scope" to 0.

On my system, there's no yama:
...
$ cat /sys/kernel/security/lsm 
lockdown,capability,apparmor
$
...

Also, I don't understand how yama would cause the specific failure reported in this PR.  If yama were active, wouldn't things fail much earlier, and in many more tests?
Comment 5 Thiago Jung Bauermann 2024-03-19 17:44:59 UTC
I also encountered this issue with current master branch on 3 machines: two
x86_64-linux and one aarch64-linux. Carl Love also reported in bug #31312
that he encountered the issue on a powerpc64le-linux system. The aarch64
and powerpc64le machines had a patch to fix bug #31312 applied.

In all cases it's necessary to keep running
attach-many-short-lived-threads.exp in a loop to reproduce the problem. On
one of the x86_64-linux machines it takes anywhere from 30 to 500
iterations to hit the problem, while on the other it took between 120 and
900 iterations. The aarch64-linux machine took ~2500 iterations. On
powerpc64le-linux, the problem happened in 3 iterations out of 500.

I do have Yama present in those machines, but it is disabled in all of
them:

  $ sysctl kernel.yama.ptrace_scope
  kernel.yama.ptrace_scope = 0

I also agree with Tom that if Yama were the problem, it would affect the
testcase in a different way.

The issue arises from this loop in linux_proc_attach_tgid_threads ():

  /* Scan the task list for existing threads.  While we go through the
     threads, new threads may be spawned.  Cycle through the list of
     threads until we have done two iterations without finding new
     threads.  */
  for (iterations = 0; iterations < 2; iterations++)
    {
      struct dirent *dp;

      new_threads_found = 0;
      while ((dp = readdir (dir.get ())) != NULL)
	{
	  unsigned long lwp;

	  /* Fetch one lwp.  */
	  lwp = strtoul (dp->d_name, NULL, 10);
	  if (lwp != 0)
	    {
	      ptid_t ptid = ptid_t (pid, lwp);

	      if (attach_lwp (ptid))
		new_threads_found = 1;
	    }
	}

      if (new_threads_found)
	{
	  /* Start over.  */
	  iterations = -1;
	}

      rewinddir (dir.get ());
    }

What happens is that two iterations without seeing new threads in
linux_proc_attach_tgid_threads () isn't always enough for GDB to know that
it has attached to all inferior threads. So sometimes after this function
returns, an unattached inferior thread trips on the breakpoint instruction
that GDB put in the inferior.
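
To make the failure mode concrete, here is a minimal, self-contained sketch
of the "stop after N quiet passes" rule. It is not GDB code: attach_state,
list_tasks_fn, attach_once, scan_until_quiet and scripted_tasks are all
hypothetical names, the directory scan is replaced by a scripted callback,
and threads never exit in this model (in the real testcase they do).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED 64

/* Hypothetical record of the LWPs the "debugger" has attached to.  */
struct attach_state
{
  unsigned long lwps[MAX_TRACKED];
  size_t count;
};

/* Hypothetical stand-in for one readdir () pass over /proc/PID/task:
   store the LWP ids visible during pass number PASS in OUT (at most
   CAP of them) and return how many were stored.  */
typedef size_t (*list_tasks_fn) (int pass, unsigned long *out, size_t cap);

/* Stand-in for attach_lwp (): true only for an LWP not seen before.  */
static bool
attach_once (struct attach_state *st, unsigned long lwp)
{
  for (size_t i = 0; i < st->count; i++)
    if (st->lwps[i] == lwp)
      return false;
  if (st->count == MAX_TRACKED)
    return false;
  st->lwps[st->count++] = lwp;
  return true;
}

/* Rescan until QUIET_NEEDED consecutive passes find nothing new.
   Return the total number of passes performed.  */
static int
scan_until_quiet (struct attach_state *st, list_tasks_fn list_tasks,
                  int quiet_needed)
{
  int quiet = 0, pass = 0;

  assert (quiet_needed > 0);
  while (quiet < quiet_needed)
    {
      unsigned long buf[MAX_TRACKED];
      size_t n = list_tasks (pass, buf, MAX_TRACKED);
      bool found = false;

      for (size_t i = 0; i < n; i++)
        if (attach_once (st, buf[i]))
          found = true;

      /* Any new thread restarts the count of quiet passes.  */
      quiet = found ? 0 : quiet + 1;
      pass++;
    }
  return pass;
}

/* Scripted inferior: LWPs 100 and 101 are visible from the start;
   LWP 102 only becomes visible from pass index 3 onwards.  */
static size_t
scripted_tasks (int pass, unsigned long *out, size_t cap)
{
  static const unsigned long ids[] = { 100, 101, 102 };
  size_t visible = pass >= 3 ? 3 : 2;
  size_t n = visible < cap ? visible : cap;

  for (size_t i = 0; i < n; i++)
    out[i] = ids[i];
  return n;
}
```

With this script, scan_until_quiet (&st, scripted_tasks, 2) stops after
three passes having attached only to LWPs 100 and 101, so LWP 102 is left
unattached. A threshold of 3 happens to catch it here, but a later-arriving
thread would defeat any fixed threshold in the same way.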

I'm not sure I would consider this a bug. Rather, it is an issue that
arises from the way attach-many-short-lived-threads.c behaves: since it's
constantly creating new threads, it's impossible for GDB to know when it
has attached to all of them and can stop looking for new threads to
attach.

The only way I can see to improve GDB's behaviour is to increase the number
of iterations of the loop that checks for new threads.

I suspected that the inferior's ability to create new threads was
proportional to the number of CPUs in the system, so I was going to make
the number of iterations in linux_proc_attach_tgid_threads () proportional
to the number of CPUs. But of the machines I have at hand, the one where it
takes longest to reproduce the problem has the most CPUs (160, vs 8 CPUs on
the other machines). So maybe we just have to find a magic iteration number
that works well for everybody who can reproduce the issue?
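
For reference, the CPU-proportional idea could be sketched roughly as
follows. This is only an illustration of the approach that the timings
above argue against, not a proposed patch: quiet_passes_for_cpus,
quiet_passes_for_host and the QUIET_* constants are hypothetical and their
values are arbitrary. sysconf (_SC_NPROCESSORS_ONLN) is the usual
glibc/Linux way to ask for the number of online CPUs.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical tuning knobs; the values are arbitrary.  */
#define QUIET_BASE 2      /* what the current loop uses */
#define QUIET_PER_CPUS 4  /* one extra quiet pass per this many CPUs */
#define QUIET_MAX 32      /* upper bound, so attach time stays bounded */

/* Map a CPU count to the number of consecutive quiet passes to require.  */
static int
quiet_passes_for_cpus (long ncpus)
{
  if (ncpus < 1)
    ncpus = 1;   /* sysconf () may fail and return -1 */

  long passes = QUIET_BASE + ncpus / QUIET_PER_CPUS;

  if (passes > QUIET_MAX)
    passes = QUIET_MAX;

  assert (passes >= QUIET_BASE && passes <= QUIET_MAX);
  return (int) passes;
}

/* Same policy, using the number of CPUs currently online.  */
static int
quiet_passes_for_host (void)
{
  return quiet_passes_for_cpus (sysconf (_SC_NPROCESSORS_ONLN));
}
```

Under these made-up constants an 8-CPU machine would require 4 quiet
passes and the 160-CPU machine would hit the cap of 32, which is the
opposite of what the reproduction rates above suggest is needed.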