[PATCH] gdb: infrun: consume multiple events at each pass in stop_all_threads

Thu Apr 16 17:51:25 GMT 2020

On 2/24/20 7:36 PM, Simon Marchi wrote:
> From: Laurent Morichetti <Laurent.Morichetti@amd.com>
> 
> [Simon: I'm sending this patch on behalf of Laurent Morichetti; I added the
>  commit message and performance measurement stuff.
> 
>  Also, this patch is better viewed with "git show -w".]

Indeed it is!

> 
> stop_all_threads, in infrun.c, is used to stop all running threads on
> targets that are always non-stop.  It's used, for example, when the
> program hits a breakpoint while GDB is set to "non-stop off".  It sends
> a stop request for each running thread, then collects one wait event for
> each.
> 
> Since new threads can spawn while we are stopping the threads, it's
> written in a way where it makes multiple such "send stop requests to
> running threads & collect wait events" passes.  The function completes
> when it has made two passes where it hasn't seen any running threads.
> 
> The way it's written right now, it iterates on the thread list,
> sending a stop request for each running thread.  It then waits for a
> single event, after which it iterates through the thread list again.  It
> sends stop requests for any running threads that have been created since
> the last iteration.  It then consumes another single wait event.
> 
> This makes it so we iterate on O(n^2) threads in total, where n is the
> number of threads.  This patch changes the function to reduce it to
> O(n).  This starts to have an impact when dealing with multiple
> thousands of threads (see numbers below).  At each pass, we know the
> number of outstanding stop requests we have sent, for which we need to
> collect a stop event.  We can therefore loop to collect this many stop
> events before proceeding to the next pass and iterate on the thread list
> again.
> 
> To check the performance improvements with this patch, I made an
> x86/Linux program with a large number of idle threads (varying from 1000
> to 10000).  The program's main thread hits a breakpoint once all these
> threads have started, which causes stop_all_threads to be called to stop
> all these threads.  I measured (by patching stop_all_threads):
> 
> - the execution time of stop_all_threads
> - the total number of threads we iterate on during the complete
>   execution of the function (the total number of times we execute the
>   "for (thread_info *t : all_non_exited_threads ())" loop)
> 
> These are the execution times, in milliseconds:
> 
>     # threads  before  after
>          1000     226    106
>          2000     997    919
>          3000    3461   2323
>          4000    4330   3570
>          5000    8642   6600
>          6000    9918   8039
>          7000   12662  10930
>          8000   16652  11222
>          9000   21561  15875
>         10000   26613  20019
> 
> Note that I very unscientifically executed each case only once.
> 
> These are the number of loop executions:
> 
>     # threads     before  after
>          1000    1003002   3003
>          2000    4006002   6003
>          3000    9009002   9003
>          4000   16012002  12003
>          5000   25015002  15003
>          6000   36018002  18003
>          7000   49021002  21003
>          8000   64024002  24003
>          9000   81027002  27003
>         10000  100030002  30003
> 
> This last table shows pretty well the O(n^2) vs O(n) behaviors.

Wow!
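
For what it's worth, the two columns match simple closed forms exactly:
(n + 1) * (n + 2) before and 3 * (n + 1) after, where n is the number of
idle threads (the extra thread presumably being the main thread, which
is my assumption).  Below is a toy model of the two loop shapes that
reproduces those counts.  It is only a sketch, not GDB code: toy_counts
and simulate_stop_all are made-up names, no new threads appear during
the stop, and every wait is assumed to stop exactly one thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model, not GDB code: each thread is reduced to a "running" flag.
struct toy_counts
{
  std::uint64_t thread_visits;  // times the per-thread loop body ran
  std::uint64_t waits;          // wait events consumed
};

// batch == false: consume one event per walk of the thread list (old shape).
// batch == true: consume waits_needed events per walk (patched shape).
toy_counts
simulate_stop_all (int n_idle, bool batch)
{
  assert (n_idle >= 0);

  // Index 0 stands for the main thread, already stopped at the breakpoint.
  std::vector<bool> running (n_idle + 1, true);
  running[0] = false;
  std::size_t next_to_stop = 1;  // in this toy, stops arrive in thread order

  toy_counts c {0, 0};
  for (int pass = 0; pass < 2; pass++)
    {
      while (true)
        {
          int waits_needed = 0;
          for (std::size_t t = 0; t < running.size (); t++)
            {
              c.thread_visits++;
              if (running[t])
                waits_needed++;
            }

          if (waits_needed == 0)
            break;

          // Running threads seen on the second pass: start over.
          if (pass > 0)
            pass = -1;

          int to_collect = batch ? waits_needed : 1;
          for (int i = 0; i < to_collect; i++)
            {
              running[next_to_stop++] = false;
              c.waits++;
            }
        }
    }
  return c;
}
```

With n_idle = 1000, thread_visits comes out as 1003002 for the
one-event-per-walk shape and 3003 for the batched shape, matching the
first row of the table; the number of waits is 1000 in both cases.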

> @@ -4758,110 +4758,114 @@ stop_all_threads (void)
>  	  if (pass > 0)
>  	    pass = -1;
>  
> -	  wait_one_event event = wait_one ();
> -
> -	  if (debug_infrun)
> +	  for (int i = 0; i < waits_needed; i++)

This makes sense to me, but can you try locally to check whether,
if you do _more_ waits than waits_needed, like, say:

    for (int i = 0; i < (waits_needed * 2); i++)

... GDB still works correctly?  In theory, wait_one will end up
returning TARGET_WAITKIND_NO_RESUMED once you get to waits_needed,
and things will all work out.

The reason I'm asking is that if a process exits, or execs, while
we're trying to stop it, I think it's possible that we won't see
an exit event for each and every thread of that exiting process.
Particularly execs -- see follow_exec's delete_thread calls.
This is somewhat related to Tankut's patch, here:

 https://sourceware.org/pipermail/gdb-patches/2020-April/167416.html
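
To make the concern concrete, here is a toy sketch of a wait loop that
asks for more events than will ever arrive and stops early once nothing
is left running.  Everything in it (toy_kind, toy_target,
collect_stops) is hypothetical and only stands in for wait_one and
TARGET_WAITKIND_NO_RESUMED; whether the patched loop should break out
like this is an assumption on my part, not something the patch does
as posted.

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-ins, not GDB's types.
enum class toy_kind { stopped, no_resumed };

struct toy_target
{
  std::deque<int> pending;  // ids of threads that will still report a stop

  // Toy analogue of wait_one: hand out one pending stop, or report
  // that no resumed thread is left to wait for.
  toy_kind
  wait_one ()
  {
    if (pending.empty ())
      return toy_kind::no_resumed;
    pending.pop_front ();
    return toy_kind::stopped;
  }
};

// Ask for up to max_waits events, but stop as soon as the target says
// nothing is running, so a thread that vanished without reporting an
// event cannot keep the loop waiting.
int
collect_stops (toy_target &target, int max_waits)
{
  assert (max_waits >= 0);

  int collected = 0;
  for (int i = 0; i < max_waits; i++)
    {
      if (target.wait_one () == toy_kind::no_resumed)
        break;
      collected++;
    }
  return collected;
}
```

If four stop requests were sent but only three threads ever report
(the fourth was deleted by an exec, say), collect_stops (target, 4)
returns 3 instead of blocking on the fourth wait.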

Thanks,
Pedro Alves