[PATCH] gdb: Fix instability in thread groups test
Andrew Burgess
andrew.burgess@embecosm.com
Mon Aug 13 13:01:00 GMT 2018
* Pedro Alves <palves@redhat.com> [2018-08-13 13:03:47 +0100]:
> On 08/13/2018 12:41 PM, Andrew Burgess wrote:
> > * Pedro Alves <palves@redhat.com> [2018-08-13 10:51:44 +0100]:
> >
> >> But shouldn't we make GDB handle this better? Make the output
> >> more "atomic" in the sense that we either show a valid complete
> >> entry, or no entry? There's an inherent race
> >> here, since we use multiple /proc accesses to fill up a process
> >> entry. If we start fetching process info for a process, and the process
> >> disappears midway, I'd think it better to discard that process's entry,
> >> as-if we had not even seen it, i.e., as if we had listed the set of
> >> processes a tiny moment later.
> >
> > I agree.
> >
> > We also need to think about process reuse. So with multiple accesses
> > to /proc we might start with one process, and end up with a completely
> > new process.
> >
> > I might be overthinking it, but my first guess at a reliable strategy
> > would be:
> >
> > 1. Find each /proc/PID directory.
> > 2. Read /proc/PID/stat and extract the start time. Failure to read
> > this causes the process to be abandoned.
> > 3. Read all of the other /proc/PID/XXX files as needed. Any failure
> > results in the process being abandoned.
> > 4. Reread /proc/PID/stat and confirm the start time hasn't changed;
> > a change would indicate a new process having slipped in.
> >
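A minimal sketch of the four steps above, in Python for brevity. This is
not GDB code: the function names, the proc_root parameter, and the choice
of "cmdline" as the only extra file are assumptions made for illustration.
The start time is taken to be field 22 (starttime) of /proc/PID/stat, and
parsing begins after the last ')' because the command name in field 2 may
itself contain spaces or parentheses.

```python
import os


def read_start_time(proc_root, pid):
    """Return the starttime field of <proc_root>/<pid>/stat, or None on failure."""
    try:
        with open(os.path.join(proc_root, pid, "stat")) as f:
            data = f.read()
        # Field 2 (comm) may contain spaces or ')', so split after the last ')'.
        fields = data[data.rindex(")") + 1:].split()
        # fields[0] is field 3 (state), so field 22 (starttime) is fields[19].
        return int(fields[19])
    except (OSError, ValueError, IndexError):
        return None


def snapshot_process(proc_root, pid, wanted=("cmdline",)):
    """Return a dict of file contents for one process, or None if abandoned."""
    # Step 2: record the start time; abandon the process if it can't be read.
    start = read_start_time(proc_root, pid)
    if start is None:
        return None

    # Step 3: read the other files; any failure abandons the process.
    info = {}
    try:
        for name in wanted:
            with open(os.path.join(proc_root, pid, name), "rb") as f:
                info[name] = f.read()
    except OSError:
        return None

    # Step 4: reread the start time; a change (or a failure) means the
    # process went away or the PID was reused while we were reading.
    if read_start_time(proc_root, pid) != start:
        return None
    return info


def list_processes(proc_root="/proc", wanted=("cmdline",)):
    """Return {pid: info} for every process that was read consistently."""
    result = {}
    # Step 1: find each numeric /proc/PID directory.
    for entry in os.listdir(proc_root):
        if entry.isdigit():
            info = snapshot_process(proc_root, entry, wanted)
            if info is not None:
                result[int(entry)] = info
    return result
```

Any process that fails a step is simply left out of the result, as if the
listing had been taken a moment later.
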
>
> My initial quick thought was just to drop the process entry if
> it turns out we end up with an empty core set.
>
> I wonder whether we can prevent PID reuse by keeping a descriptor
> for /proc/PID/ open while we open the other files. Probably not.
That was my first thought. I tried:
- chdir /proc/PID
- opendir for /proc/PID
- Kill process PID
- Read from the opendir handle, find nothing there.
Which didn't really surprise me, but was worth a try...
> Otherwise, your scheme sounds like the next best.
>
> > Given the system is still running, we can never be sure that we have
> > "all" processes, so throwing out anything that looks wrong seems like
> > the right strategy.
> >
> > Also in step #4 we know we've just missed a process - something new
> > has started, but we ignore it. I think this is fine though given the
> > racy nature of this sort of thing...
> >
> > The only question is, could these thoughts be dropped into a bug
> > report,
>
>
> Sure.
>
>
> > and the original patch to remove the unstable result applied?
> > Or maybe the test updated to either PASS or KFAIL?
>
> I'd prefer the KFAIL option. At the very least, a comment in
> the .exp file.
I'll put something together...
Thanks,
Andrew