This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
fix "info os processes" race -> crash (ext-run.exp racy FAIL)
- From: Pedro Alves <pedro at codesourcery dot com>
- To: gdb-patches at sourceware dot org
- Date: Fri, 26 Aug 2011 20:06:20 +0100
- Subject: fix "info os processes" race -> crash (ext-run.exp racy FAIL)
I'm seeing ext-run.exp randomly fail with:
gdb.sum:
Running ../../../src/gdb/testsuite/gdb.server/ext-run.exp ...
FAIL: gdb.server/ext-run.exp: get process list (pattern 1)
FAIL: gdb.server/ext-run.exp: load new file without any gdbserver inferior
FAIL: gdb.server/ext-run.exp: monitor exit
gdb.log:
(gdb) PASS: gdb.server/ext-run.exp: continue to main
gdb_expect_list pattern: /pid +user +command/
info os processes
Remote connection closed
(gdb) FAIL: gdb.server/ext-run.exp: get process list (pattern 1)
This is gdbserver crashing:
$ ./gdb gdbserver/gdbserver ./testsuite/core.27095
...
Program terminated with signal 11, Segmentation fault.
...
(top-gdb) bt
#0 0x00002ae59c23a3f6 in __readdir (dirp=0x0) at ../sysdeps/unix/readdir.c:45
#1 0x000000000042613b in get_cores_used_by_process (pid=27135, cores=0xafe7e0) at ../../../src/gdb/gdbserver/../common/linux-osdata.c:263
#2 0x0000000000426312 in linux_xfer_osdata_processes (readbuf=0xafd7d0 "", offset=0, len=4096) at ../../../src/gdb/gdbserver/../common/linux-osdata.c:338
#3 0x0000000000426b91 in linux_common_xfer_osdata (annex=0xaf5202 "processes", readbuf=0xafd7d0 "", offset=0, len=4096)
    at ../../../src/gdb/gdbserver/../common/linux-osdata.c:579
#4 0x0000000000424cb7 in linux_qxfer_osdata (annex=0xaf5202 "processes", readbuf=0xafd7d0 "", writebuf=0x0, offset=0, len=4096)
    at ../../../src/gdb/gdbserver/linux-low.c:4467
#5 0x000000000040812a in handle_qxfer_osdata (annex=0xaf5202 "processes", readbuf=0xafd7d0 "", writebuf=0x0, offset=0, len=4096)
    at ../../../src/gdb/gdbserver/server.c:981
#6 0x00000000004088ac in handle_qxfer (own_buf=0xaf51f0 "qXfer:osdata", packet_len=33, new_packet_len_p=0x7fff8e2ecdd4)
    at ../../../src/gdb/gdbserver/server.c:1254
#7 0x0000000000409dce in handle_query (own_buf=0xaf51f0 "qXfer:osdata", packet_len=33, new_packet_len_p=0x7fff8e2ecdd4)
    at ../../../src/gdb/gdbserver/server.c:1749
#8 0x000000000040bda0 in process_serial_event () at ../../../src/gdb/gdbserver/server.c:2778
#9 0x000000000040ce3f in handle_serial_event (err=0, client_data=0x0) at ../../../src/gdb/gdbserver/server.c:3194
#10 0x000000000041164b in handle_file_event (event_file_desc=6) at ../../../src/gdb/gdbserver/event-loop.c:489
#11 0x0000000000410dfc in process_event () at ../../../src/gdb/gdbserver/event-loop.c:244
#12 0x0000000000411bbd in start_event_loop () at ../../../src/gdb/gdbserver/event-loop.c:607
#13 0x000000000040bc21 in main (argc=4, argv=0x7fff8e2ed008) at ../../../src/gdb/gdbserver/server.c:2689
The problem is that get_cores_used_by_process assumes opening
/proc/PID/task always succeeds.  Since we're listing all the
processes running on the system, opendir can fail and return NULL if
PID happens to exit after we've seen it exist (by listing
/proc contents), but just before we open /proc/PID/task.  readdir is
then called with a NULL DIR pointer, which is the crash in frame #0
of the backtrace above.
This is easier to trip on if you run the testsuite
in parallel mode (make check -jN).
All other places carefully handle failure to open a /proc... file
or directory; this was the only one that didn't.
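To illustrate the pattern outside of GDB, here is a minimal standalone sketch of the same guard.  It is not the GDB code: count_tasks is a hypothetical helper that only counts the numeric entries under /proc/PID/task, and it assumes a Linux /proc layout.  The point is just that a NULL return from opendir is treated as "the process is already gone" instead of being passed on to readdir.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper, not part of GDB: count the entries in
   /proc/PID/task whose names start with a digit.  Returns 0 when the
   directory cannot be opened, e.g. because PID has already exited.  */
int
count_tasks (pid_t pid)
{
  char taskdir[64];
  DIR *dir;
  struct dirent *dp;
  int task_count = 0;

  assert (pid > 0);
  snprintf (taskdir, sizeof (taskdir), "/proc/%d/task", (int) pid);

  dir = opendir (taskdir);
  if (dir == NULL)
    return 0;  /* The process vanished between listing and opening.  */

  while ((dp = readdir (dir)) != NULL)
    {
      /* Skip ".", ".." and anything else that is not a thread id.  */
      if (isdigit ((unsigned char) dp->d_name[0]))
        ++task_count;
    }

  closedir (dir);
  return task_count;
}
```

With this shape, calling count_tasks on a PID that no longer exists simply yields 0, which matches what the patched function does by returning an untouched task_count.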
I've applied the obvious fix.
(fixes both native gdb and gdbserver, hurray for code sharing!)
--
Pedro Alves
2011-08-26  Pedro Alves  <pedro@codesourcery.com>

	gdb/
	* common/linux-osdata.c (get_cores_used_by_process): Don't assume
	opening /proc/PID/task always succeeds.
---
gdb/common/linux-osdata.c | 36 +++++++++++++++++++-----------------
1 file changed, 19 insertions(+), 17 deletions(-)
Index: src/gdb/common/linux-osdata.c
===================================================================
--- src.orig/gdb/common/linux-osdata.c 2011-08-26 19:41:37.255883141 +0100
+++ src/gdb/common/linux-osdata.c 2011-08-26 19:45:18.515883179 +0100
@@ -259,27 +259,29 @@ get_cores_used_by_process (pid_t pid, in

   sprintf (taskdir, "/proc/%d/task", pid);
   dir = opendir (taskdir);
-
-  while ((dp = readdir (dir)) != NULL)
+  if (dir)
     {
-      pid_t tid;
-      int core;
-
-      if (!isdigit (dp->d_name[0])
-          || NAMELEN (dp) > sizeof ("4294967295") - 1)
-        continue;
-
-      tid = atoi (dp->d_name);
-      core = linux_common_core_of_thread (ptid_build (pid, tid, 0));
-
-      if (core >= 0)
+      while ((dp = readdir (dir)) != NULL)
         {
-          ++cores[core];
-          ++task_count;
+          pid_t tid;
+          int core;
+
+          if (!isdigit (dp->d_name[0])
+              || NAMELEN (dp) > sizeof ("4294967295") - 1)
+            continue;
+
+          tid = atoi (dp->d_name);
+          core = linux_common_core_of_thread (ptid_build (pid, tid, 0));
+
+          if (core >= 0)
+            {
+              ++cores[core];
+              ++task_count;
+            }
         }
-    }

-  closedir (dir);
+      closedir (dir);
+    }

   return task_count;
 }