This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



fix "info os processes" race -> crash (ext-run.exp racy FAIL)


I'm seeing ext-run.exp randomly fail with:

gdb.sum:

Running ../../../src/gdb/testsuite/gdb.server/ext-run.exp ...
FAIL: gdb.server/ext-run.exp: get process list (pattern 1)
FAIL: gdb.server/ext-run.exp: load new file without any gdbserver inferior
FAIL: gdb.server/ext-run.exp: monitor exit

gdb.log:

(gdb) PASS: gdb.server/ext-run.exp: continue to main
gdb_expect_list pattern: /pid +user +command/
info os processes
Remote connection closed
(gdb) FAIL: gdb.server/ext-run.exp: get process list (pattern 1)

This is gdbserver crashing:

$ ./gdb gdbserver/gdbserver ./testsuite/core.27095
...
Program terminated with signal 11, Segmentation fault.
...
(top-gdb) bt
#0  0x00002ae59c23a3f6 in __readdir (dirp=0x0) at ../sysdeps/unix/readdir.c:45
#1  0x000000000042613b in get_cores_used_by_process (pid=27135, cores=0xafe7e0) at ../../../src/gdb/gdbserver/../common/linux-osdata.c:263
#2  0x0000000000426312 in linux_xfer_osdata_processes (readbuf=0xafd7d0 "", offset=0, len=4096) at ../../../src/gdb/gdbserver/../common/linux-osdata.c:338
#3  0x0000000000426b91 in linux_common_xfer_osdata (annex=0xaf5202 "processes", readbuf=0xafd7d0 "", offset=0, len=4096)
    at ../../../src/gdb/gdbserver/../common/linux-osdata.c:579
#4  0x0000000000424cb7 in linux_qxfer_osdata (annex=0xaf5202 "processes", readbuf=0xafd7d0 "", writebuf=0x0, offset=0, len=4096)
    at ../../../src/gdb/gdbserver/linux-low.c:4467
#5  0x000000000040812a in handle_qxfer_osdata (annex=0xaf5202 "processes", readbuf=0xafd7d0 "", writebuf=0x0, offset=0, len=4096)
    at ../../../src/gdb/gdbserver/server.c:981
#6  0x00000000004088ac in handle_qxfer (own_buf=0xaf51f0 "qXfer:osdata", packet_len=33, new_packet_len_p=0x7fff8e2ecdd4)
    at ../../../src/gdb/gdbserver/server.c:1254
#7  0x0000000000409dce in handle_query (own_buf=0xaf51f0 "qXfer:osdata", packet_len=33, new_packet_len_p=0x7fff8e2ecdd4)
    at ../../../src/gdb/gdbserver/server.c:1749
#8  0x000000000040bda0 in process_serial_event () at ../../../src/gdb/gdbserver/server.c:2778
#9  0x000000000040ce3f in handle_serial_event (err=0, client_data=0x0) at ../../../src/gdb/gdbserver/server.c:3194
#10 0x000000000041164b in handle_file_event (event_file_desc=6) at ../../../src/gdb/gdbserver/event-loop.c:489
#11 0x0000000000410dfc in process_event () at ../../../src/gdb/gdbserver/event-loop.c:244
#12 0x0000000000411bbd in start_event_loop () at ../../../src/gdb/gdbserver/event-loop.c:607
#13 0x000000000040bc21 in main (argc=4, argv=0x7fff8e2ed008) at ../../../src/gdb/gdbserver/server.c:2689

The problem is that get_cores_used_by_process assumes that opening
/proc/PID/task always succeeds.  Since we're listing all the
processes running on the system, the open can fail if PID happens
to exit after we've seen it exist (by listing the contents of
/proc), but just before we open /proc/PID/task.  In that case
opendir returns NULL, and the readdir call that follows crashes on
the null pointer (dirp=0x0 in frame #0 above).
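
To illustrate the pattern outside of GDB, here is a minimal standalone
sketch of a /proc/PID/task walk that tolerates the process vanishing.
The function name count_tasks is hypothetical and is not part of
linux-osdata.c; it only shows the null check on opendir's result.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper, for illustration only.  Count the numeric
   entries under /proc/PID/task.  Return 0 if the directory cannot
   be opened, e.g. because PID exited after it was first seen.  */

static int
count_tasks (pid_t pid)
{
  char taskdir[64];
  DIR *dir;
  struct dirent *dp;
  int count = 0;

  snprintf (taskdir, sizeof taskdir, "/proc/%d/task", (int) pid);
  dir = opendir (taskdir);
  if (dir == NULL)
    return 0;	/* The process is gone; nothing to count.  */

  while ((dp = readdir (dir)) != NULL)
    if (isdigit ((unsigned char) dp->d_name[0]))
      ++count;

  closedir (dir);
  return count;
}
```

With this shape, a PID that has no /proc entry yields a count of 0
instead of passing a null DIR pointer to readdir.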

This is easier to trip on if you run the testsuite
in parallel mode (make check -jN).

All other places that open a /proc... file or directory handle
open failure carefully; this is the only one that doesn't.

I've applied the obvious fix.

(fixes both native gdb and gdbserver, hurray for code sharing!)

-- 
Pedro Alves

2011-08-26  Pedro Alves  <pedro@codesourcery.com>

	gdb/
	* common/linux-osdata.c (get_cores_used_by_process): Don't assume
	opening /proc/PID/task always succeeds.

---
 gdb/common/linux-osdata.c |   36 +++++++++++++++++++-----------------
 1 file changed, 19 insertions(+), 17 deletions(-)

Index: src/gdb/common/linux-osdata.c
===================================================================
--- src.orig/gdb/common/linux-osdata.c	2011-08-26 19:41:37.255883141 +0100
+++ src/gdb/common/linux-osdata.c	2011-08-26 19:45:18.515883179 +0100
@@ -259,27 +259,29 @@ get_cores_used_by_process (pid_t pid, in
 
   sprintf (taskdir, "/proc/%d/task", pid);
   dir = opendir (taskdir);
-
-  while ((dp = readdir (dir)) != NULL)
+  if (dir)
     {
-      pid_t tid;
-      int core;
-
-      if (!isdigit (dp->d_name[0])
-	  || NAMELEN (dp) > sizeof ("4294967295") - 1)
-	continue;
-
-      tid = atoi (dp->d_name);
-      core = linux_common_core_of_thread (ptid_build (pid, tid, 0));
-
-      if (core >= 0)
+      while ((dp = readdir (dir)) != NULL)
 	{
-	  ++cores[core];
-	  ++task_count;
+	  pid_t tid;
+	  int core;
+
+	  if (!isdigit (dp->d_name[0])
+	      || NAMELEN (dp) > sizeof ("4294967295") - 1)
+	    continue;
+
+	  tid = atoi (dp->d_name);
+	  core = linux_common_core_of_thread (ptid_build (pid, tid, 0));
+
+	  if (core >= 0)
+	    {
+	      ++cores[core];
+	      ++task_count;
+	    }
 	}
-    }
 
-  closedir (dir);
+      closedir (dir);
+    }
 
   return task_count;
 }

