This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH RFC] introduce dl_iterate_phdr_parallel


On Mon, Aug 01, 2016 at 09:46:35PM +0200, Torvald Riegel wrote:
> On Mon, 2016-08-01 at 21:49 +0300, Gleb Natapov wrote:
> > On Mon, Aug 01, 2016 at 03:06:32PM -0300, Adhemerval Zanella wrote:
> > > 2. Another option is to push a cleanup of the dl_iterate_phdr interface to *not*
> > >    require callback serialization and use an rdlock while accessing the
> > >    maps list.  With the current rwlock implementation performance won't change,
> > >    as you noticed; however, since we are reworking and changing to a more
> > >    scalable one [1] the read-only pass should be *much* better: the
> > >    __pthread_rwlock_rdlock_full should issue just an atomic_fetch_add_acquire
> > >    for the uncontended read case.
> > I saw the new rwlock implementation and tested it. It is much better than the
> > current rwlock, but still almost twice as slow as multiple locks.
> > Profiling shows that the exchange in unlock takes a lot of CPU. No wonder,
> > since a contended locking operation is very expensive. IMO the ideal
> > solution is an array of rwlocks.
> 
> The new rwlock is built so that it supports process-shared usage, which
> means that we have to put everything into struct pthread_rwlock_t.  This
> will lead to contention if you rdlock it frequently from many threads.
> There is potential for tuning there because we haven't looked closely at
> adding back-off in the CAS loop (and if you tested on an arch without
> direct HW support for fetch-add, the CAS loop used instead of that might
> also be suboptimal).
> Which machine did you test this on?
> 
x86_64

> If we built something custom for this and are willing to make the
> wrlock / exclusive-access case much more costly, we can decrease this
> overhead.  This could be roughly similar to one lock per thread or a set
> of rwlocks as you mentioned, but with less space overhead.
> 
IMO the space overhead is negligible. A more efficient rwlock is, of course,
better and could be useful in many more places. If you have something to
test I am willing to do so, but if a custom rwlock will take time to
materialize we could start with a lock array and change it later. The lock
array is not part of the interface, only an implementation detail. What I
would like to avoid is stalling the effort while waiting for something
better. Exception scalability is a very pressing issue for us.

--
			Gleb.

