[PATCH] gdb: Fix instability in thread groups test

Mon Aug 13 13:01:00 GMT 2018

* Pedro Alves <palves@redhat.com> [2018-08-13 13:03:47 +0100]:

> On 08/13/2018 12:41 PM, Andrew Burgess wrote:
> > * Pedro Alves <palves@redhat.com> [2018-08-13 10:51:44 +0100]:
> > 
> >> But shouldn't we make GDB handle this better?  Make the output
> >> more "atomic" in the sense that we either show a valid complete
> >> entry, or no entry?  There's an inherent race
> >> here, since we use multiple /proc accesses to fill up a process
> >> entry.  If we start fetching process info for a process, and the process
> >> disappears midway, I'd think it better to discard that process's entry,
> >> as-if we had not even seen it, i.e., as if we had listed the set of
> >> processes a tiny moment later.
> > 
> > I agree.
> > 
> > We also need to think about process reuse.  So with multiple accesses
> > to /proc we might start with one process, and end up with a completely
> > new process.
> > 
> > I might be overthinking it, but my first guess at a reliable strategy
> > would be:
> > 
> >   1. Find each /proc/PID directory.
> >   2. Read /proc/PID/stat and extract the start time.  Failure to read
> >      this causes the process to be abandoned.
> >   3. Read all of the other /proc/PID/XXX files as needed.  Any failure
> >      results in the process being abandoned.
> >   4. Reread /proc/PID/stat and confirm the start time hasn't changed;
> >      a change would indicate a new process having slipped in.
> > 
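
For concreteness, a rough Python sketch of those four steps.  It's only
an illustration, not what GDB does today: the function names, the
proc_root parameter, and the choice of cmdline and status as the
"other" files are all made up for the example.  The start time is
taken to be field 22 of /proc/PID/stat, as documented in proc(5).

```python
import os


def read_start_time(proc_root, pid):
    """Return the starttime field (field 22 of /proc/PID/stat) as a string."""
    with open(os.path.join(proc_root, str(pid), "stat")) as f:
        data = f.read()
    # Field 2 (comm) is parenthesised and may contain spaces or ')', so
    # split on the last ')'.  What remains starts at field 3 (state).
    rest = data.rsplit(")", 1)[1].split()
    return rest[19]  # field 22 -> index 22 - 3


def read_entry(proc_root, pid, wanted=("cmdline", "status")):
    """Steps 2-4: return a dict for PID, or None if it must be abandoned."""
    try:
        start = read_start_time(proc_root, pid)          # step 2
        entry = {"pid": pid, "start_time": start}
        for name in wanted:                              # step 3
            with open(os.path.join(proc_root, str(pid), name), "rb") as f:
                entry[name] = f.read()
        if read_start_time(proc_root, pid) != start:     # step 4
            return None  # PID was reused while we were reading
    except (OSError, IndexError):
        return None  # process vanished, or stat was unparseable
    return entry


def list_processes(proc_root="/proc"):
    """Step 1: visit each numeric directory, keep only consistent entries."""
    entries = []
    for name in os.listdir(proc_root):
        if name.isdigit():
            entry = read_entry(proc_root, int(name))
            if entry is not None:
                entries.append(entry)
    return entries
```

Passing the root directory in means the logic can be exercised against
a fake tree instead of the live /proc.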
> 
> My initial quick thought was just to drop the process entry if
> it turns out we end up with an empty core set.  
> 
> I wonder whether we can prevent PID reuse by keeping a descriptor
> for /proc/PID/ open while we open the other files.  Probably not.

That was my first thought.  I tried:

  - chdir /proc/PID
  - opendir for /proc/PID

  - Kill process PID

  - Read from the opendir handle, find nothing there.

Which didn't really surprise me, but was worth a try...
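
Roughly what that experiment looks like in Python, for anyone who
wants to repeat it.  This is a sketch of the opendir variant only; the
helper name is made up, and it assumes a Linux /proc and that the
child has been reaped before the second read.

```python
import os
import subprocess
import sys


def list_proc_dir_around_exit():
    """List /proc/PID through one open descriptor, before and after PID dies."""
    # Throwaway child process that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    # Hold a directory descriptor for /proc/PID across the kill.
    fd = os.open("/proc/%d" % child.pid, os.O_RDONLY | os.O_DIRECTORY)
    try:
        before = os.listdir(fd)
        child.kill()
        child.wait()  # reap the child so the process is really gone
        try:
            after = os.listdir(fd)
        except OSError:
            # A failed read is treated the same as an empty listing.
            after = []
    finally:
        os.close(fd)
    return before, after


if __name__ == "__main__":
    before, after = list_proc_dir_around_exit()
    print(len(before), "entries before the kill,", len(after), "after")
```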

> Otherwise, your scheme sounds like the next best.
> 
> > Given the system is still running, we can never be sure that we have
> > "all" processes, so throwing out anything that looks wrong seems like
> > the right strategy.
> > 
> > Also in step #4 we know we've just missed a process - something new
> > has started, but we ignore it.  I think this is fine though given the
> > racy nature of this sort of thing...
> > 
> > The only question is, could these thoughts be dropped into a bug
> > report, 
> 
> 
> Sure.
> 
> 
> > and the original patch to remove the unstable result applied?
> > Or maybe the test updated to either PASS or KFAIL?
> 
> I'd prefer the KFAIL option.  At the very least, a comment in
> the .exp file.

I'll put something together...

Thanks,
Andrew