[RFC][gdbsupport] Improve thread scheduling in parallel_for_each
Tom de Vries
tdevries@suse.de
Fri Jul 15 09:40:35 GMT 2022
Hi,
In https://sourceware.org/pipermail/gdb-patches/2022-July/190757.html I added
some debug statements to the thread scheduling in parallel_for_each and
noticed that only 3 out of 4 worker threads were used:
...
Parallel for: n_elements: 7271
Parallel for: minimum elements per thread: 10
Parallel for: elts_per_thread: 1817
Parallel for: worker threads: 4
Parallel for: worker threads used: 3
Parallel for: elements 0-1816 on thread 0: 1817
Parallel for: elements 1817-3633 on thread 1: 1817
Parallel for: elements 3634-5450 on thread 2: 1817
Parallel for: elements 5451-7270 on main thread: 1820
...
This simple fix:
...
- size_t count = n_threads == 0 ? 0 : n_threads - 1;
+ size_t count = n_threads == 0 ? 0 : n_threads;
...
would give thread 3 a share of the work, such that we have instead:
...
Parallel for: worker threads used: 4
Parallel for: elements 0-1816 on thread 0: 1817
Parallel for: elements 1817-3633 on thread 1: 1817
Parallel for: elements 3634-5450 on thread 2: 1817
Parallel for: elements 5451-7267 on thread 3: 1817
Parallel for: elements 7268-7270 on main thread: 3
...
but that would leave the main thread with very few iterations (always less
than the number of worker threads).
Fix this instead by also counting the main thread for distributing work to,
such that we have instead:
...
Parallel for: elts_per_thread: 1454
Parallel for: worker threads: 4
Parallel for: worker threads used: 4
Parallel for: elements 0-1453 on thread 0: 1454
Parallel for: elements 1454-2907 on thread 1: 1454
Parallel for: elements 2908-4361 on thread 2: 1454
Parallel for: elements 4362-5815 on thread 3: 1454
Parallel for: elements 5816-7270 on main thread: 1455
...
This introduces a performance regression on a particular test-case I happened
to use:
...
$ for n in $(seq 1 10); do \
time gdb -q -batch libxul.so.debug 2>&1 | grep real:; \
done
...
so revert to the original schedule by reducing the worker threads:
...
n_threads = std::thread::hardware_concurrency ();
+ if (n_threads > 0)
+ /* Account for main thread. */
+ n_threads--;
...
So, it looks like the only thing we've achieved is not have an idle worker
thread anymore.
Redoing the performance experiment still yields a slight performance loss.
Try to improve things a tiny bit by spreading the left-over elements over
all threads rather than just the main thread:
...
Parallel for: elements 0-1817 on thread 0: 1818
Parallel for: elements 1818-3635 on thread 1: 1818
Parallel for: elements 3636-5453 on thread 2: 1818
Parallel for: elements 5454-7270 on main thread: 1817
...
Still, the performance experiment yields a slight performance loss.
Tested on x86_64-linux.
Any comments?
Thanks,
- Tom
[gdbsupport] Improve thread scheduling in parallel_for_each
---
gdb/maint.c | 7 ++++++-
gdbsupport/parallel-for.h | 27 ++++++++++++++++-----------
2 files changed, 22 insertions(+), 12 deletions(-)
diff --git a/gdb/maint.c b/gdb/maint.c
index 289560957f2..cc3be7b7d82 100644
--- a/gdb/maint.c
+++ b/gdb/maint.c
@@ -850,7 +850,12 @@ update_thread_pool_size ()
int n_threads = n_worker_threads;
if (n_threads < 0)
- n_threads = std::thread::hardware_concurrency ();
+ {
+ n_threads = std::thread::hardware_concurrency ();
+ if (n_threads > 0)
+ /* Account for main thread. */
+ n_threads--;
+ }
gdb::thread_pool::g_thread_pool->set_thread_count (n_threads);
#endif
diff --git a/gdbsupport/parallel-for.h b/gdbsupport/parallel-for.h
index a614fc35766..2bf8de25bc9 100644
--- a/gdbsupport/parallel-for.h
+++ b/gdbsupport/parallel-for.h
@@ -139,25 +139,30 @@ parallel_for_each (unsigned n, RandomIt first, RandomIt last,
using result_type
= typename std::result_of<RangeFunction (RandomIt, RandomIt)>::type;
- size_t n_threads = thread_pool::g_thread_pool->thread_count ();
+ size_t n_threads
+ = (thread_pool::g_thread_pool->thread_count ()
+ /* Also count the main thread as a thread to distribute work over. */
+ + 1);
size_t n_elements = last - first;
size_t elts_per_thread = 0;
- if (n_threads > 1)
- {
- /* Require that there should be at least N elements in a
- thread. */
- gdb_assert (n > 0);
- if (n_elements / n_threads < n)
- n_threads = std::max (n_elements / n, (size_t) 1);
- elts_per_thread = n_elements / n_threads;
- }
- size_t count = n_threads == 0 ? 0 : n_threads - 1;
+ /* Require that there should be at least N elements in a
+ thread. */
+ gdb_assert (n > 0);
+ if (n_elements / n_threads < n)
+ n_threads = std::max (n_elements / n, (size_t) 1);
+ elts_per_thread = n_elements / n_threads;
+ gdb_assert (elts_per_thread >= n || n_threads == 1);
+ size_t left_over = n_elements % n_threads;
+
+ size_t count = n_threads - 1;
gdb::detail::par_for_accumulator<result_type> results (count);
for (int i = 0; i < count; ++i)
{
RandomIt end = first + elts_per_thread;
+ if (i < left_over)
+ end++;
results.post (i, [=] ()
{
return callback (first, end);
More information about the Gdb-patches
mailing list