This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH RFC] introduce dl_iterate_phdr_parallel
- From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- To: Florian Weimer <fweimer at redhat dot com>, Torvald Riegel <triegel at redhat dot com>, Gleb Natapov <gleb at scylladb dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Wed, 3 Aug 2016 11:00:39 -0300
- Subject: Re: [PATCH RFC] introduce dl_iterate_phdr_parallel
- References: <20160725142326.GM1018@scylladb.com> <579A6F54.2080709@linaro.org> <20160731091642.GF2502@scylladb.com> <579F8FA8.9060009@linaro.org> <20160801184946.GL17903@scylladb.com> <1470080795.19224.101.camel@localhost.localdomain> <113b9545-292b-e089-c00c-072da711c7ec@redhat.com>
On 03/08/2016 07:53, Florian Weimer wrote:
> On 08/01/2016 09:46 PM, Torvald Riegel wrote:
>> The new rwlock is built so that it supports process-shared usage, which
>> means that we have to put everything into struct pthread_rwlock_t. This
>> will lead to contention if you rdlock it frequently from many threads.
>> There is potential for tuning there because we haven't looked closely at
>> adding back-off in the CAS loop (and if you tested on an arch without
>> direct HW support for fetch-add, the CAS loop used instead of that might
>> also be suboptimal).
>
> The rwlock doesn't eliminate the contention at the hardware level.
>
> If that causes a performance issue, we could reuse Ingo Molnar's brlock approach: per-thread, readers acquire their own lock, writers acquire the locks of all threads. This is fairly efficient in the read case (and I suspect you can't get much better than that in a non-managed runtime), but the write case is obviously extremely costly. This could be the right trade-off here, though.
>
> Florian
The only difference is that lglocks/brlocks are per-CPU in the kernel, not
per-thread. My concern is how much writers would degrade in a highly threaded
workload (for instance, a threaded C++ program that throws some exceptions and
also tries to load a plugin).
It may be that a constant-size write lock array, as in the initial proposal,
is a better starting point.
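
To make the trade-off concrete, here is a rough sketch of the constant-size lock array idea: a reader locks a single slot, a writer locks every slot. This is not the code from the patch. The names (NLOCKS, reader_lock, writer_lock, gen_a/gen_b), the array size, the 64-byte cache line, and the use of plain mutexes are all assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>

#define NLOCKS 8              /* hypothetical array size, fixed at build time */
#define NREADERS 4
#define READS_PER_THREAD 1000
#define WRITES 100

/* One lock per slot, each on its own cache line (64 bytes assumed) so that
   readers using different slots do not bounce the same line. */
struct slot {
  _Alignas(64) pthread_mutex_t lock;
};

static struct slot slots[NLOCKS];

/* Stand-in for the shared state: a writer always updates both together. */
static long gen_a, gen_b;

static void lock_array_init(void)
{
  for (int i = 0; i < NLOCKS; i++)
    pthread_mutex_init(&slots[i].lock, NULL);
}

/* Reader side: map a caller-supplied id to one slot and lock only that. */
static unsigned reader_lock(unsigned id)
{
  unsigned idx = id % NLOCKS;
  pthread_mutex_lock(&slots[idx].lock);
  return idx;
}

static void reader_unlock(unsigned idx)
{
  assert(idx < NLOCKS);
  pthread_mutex_unlock(&slots[idx].lock);
}

/* Writer side: take every slot in ascending order, so two writers cannot
   deadlock. The cost is bounded by NLOCKS, not by the number of threads. */
static void writer_lock(void)
{
  for (int i = 0; i < NLOCKS; i++)
    pthread_mutex_lock(&slots[i].lock);
}

static void writer_unlock(void)
{
  for (int i = NLOCKS - 1; i >= 0; i--)
    pthread_mutex_unlock(&slots[i].lock);
}

struct reader_arg { unsigned id; long mismatches; };

static void *reader_main(void *p)
{
  struct reader_arg *arg = p;
  for (int i = 0; i < READS_PER_THREAD; i++) {
    unsigned idx = reader_lock(arg->id);
    if (gen_a != gen_b)       /* would mean a writer ran concurrently */
      arg->mismatches++;
    reader_unlock(idx);
  }
  return NULL;
}

static void *writer_main(void *p)
{
  (void)p;
  for (int i = 0; i < WRITES; i++) {
    writer_lock();
    gen_a++;
    gen_b++;
    writer_unlock();
  }
  return NULL;
}

/* Runs one writer against NREADERS readers; returns the number of
   inconsistent reads observed (0 if exclusion holds). */
long run_demo(void)
{
  pthread_t readers[NREADERS], writer;
  struct reader_arg args[NREADERS];
  long mismatches = 0;

  lock_array_init();
  gen_a = gen_b = 0;
  pthread_create(&writer, NULL, writer_main, NULL);
  for (unsigned i = 0; i < NREADERS; i++) {
    args[i].id = i;
    args[i].mismatches = 0;
    pthread_create(&readers[i], NULL, reader_main, &args[i]);
  }
  pthread_join(writer, NULL);
  for (unsigned i = 0; i < NREADERS; i++) {
    pthread_join(readers[i], NULL);
    mismatches += args[i].mismatches;
  }
  return mismatches;
}
```

With a fixed NLOCKS the writer always takes the same number of locks no matter how many threads exist, which is the property that matters for the plugin-loading case. The price is that readers whose ids map to the same slot serialize against each other, which a per-thread scheme would avoid.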