This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PR25394 patch: debuginfod groom vs. scan race condition


Hi Frank,

On Mon, 2020-01-20 at 12:10 -0500, Frank Ch. Eigler wrote:
> > Nice work. And I think I almost understand it. I am just a little
> > confused about thread_main_fts_source_paths also being an "idler":
> 
> Yup, it's a bit subtle.
> 
> > >   while (! interrupted)
> > >      {
> > > -      set_metric("thread_timer", "role","traverse", rescan_timer);
> > > -      // set_metric("thread_forced_total", "role","traverse", forced_rescan_count);
> > > -      if (rescan_s && rescan_timer > rescan_s)
> > > -        rescan_timer = 0;
> > > +      sleep (1);
> > > +      scanq.wait_idle(); // don't start a new traversal while scanners haven't finished the job
> > > +      scanq.done_idle(); // release the hounds
> > > +      if (interrupted) break;
> > I guess the comment "release the hounds" gets me confused. I don't
> > know precisely what that means. 
> 
> https://www.youtube.com/watch?time_continue=28&v=ORvkElqw0QM

Aha, so it is as if we are going to start to run (chasing files),
except there is also the rescan timeout which might still stop the
hounds from really being released :)

> > Is this needed for correctness, or is it simply to have an hint that
> > the scanners have ran because you register as an idler and so at
> > this point you could only run if no scanner is running?
> 
> It's because we don't want to start a new fts traversal pass while the
> scanners haven't even finished the last one.  I've seen a case where
> the rescan interval was so short that the scanner backlog just kept
> growing and growing and growing.  (Changing the work queue to a set
> from a dequeue also helps with duplicate elimination.)  So in this
> way, it just returns to the timing behaviour of the pre-workqueue
> implementation: we wait a certain amount of time AFTER the scanning is
> ALL DONE, before starting a new scan.

That all makes sense. Thanks.

> > But given you immediately tell them you are done a scanner could in
> > theory start their work up again while you are doing your own
> > traversal?
> 
> Yes, because that's ideal for concurrency: while one thread fts(3)'s
> and enumerates the results, the scanner threads can process the
> already enumerated files.

OK. And if any idlers (grooming) would wake up that is no problem,
because that simply means the work queue was empty and adding work to
the queue can be done whether either the grooming thread (x)or scanner
threads are running concurrently.

Please do push if you feel confident about it being correct. I think it
is now that I understand how it actually works.

Thanks,

Mark


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]