This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH RFC] introduce dl_iterate_phdr_parallel


On Tue, Aug 02, 2016 at 11:16:33AM -0300, Adhemerval Zanella wrote:
> 
> 
> On 02/08/2016 07:46, Gleb Natapov wrote:
> > On Mon, Aug 01, 2016 at 05:23:47PM -0300, Adhemerval Zanella wrote:
> >>>> Now, I see we have two options here:
> >>>>
> >>>> 1. Work on your proposal for dl_iterate_phdr_parallel and *document* that
> >>>>    current dl_iterate_phdr serializes callback execution and programs that
> >>>>    would like to scale its execution should move to dl_iterate_phdr_parallel.
> >>>>    We would need to further document that the scalability relies on an
> >>>>    internal array of 64 locks, and that depending on the number of threads
> >>>>    and processors this limit can be hit.
> >>> 64 locks is an implementation detail. It could be changed to a lock per
> >>> thread, where a writer has to lock the thread list plus all per-thread locks.
> >>
> >> My point here is more to document current behaviour and performance
> >> assumptions than to actually discuss the choice of 64 for the number of locks.
> >>
> > OK.
> > 
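To make the per-thread-lock alternative above concrete, here is a rough sketch
(hypothetical code, not the patch itself): every thread owns one mutex, a
reader takes only its own, and a writer takes the thread-list lock plus every
per-thread lock.

#include <pthread.h>

struct thread_lock
{
  pthread_mutex_t lock;
  struct thread_lock *next;
};

static pthread_mutex_t thread_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct thread_lock *thread_list;        /* all registered threads */
static __thread struct thread_lock self_lock;  /* this thread's entry */
static __thread int self_registered;

static void
reader_lock (void)          /* e.g. on entry to dl_iterate_phdr */
{
  if (!self_registered)     /* lazily add this thread to the global list */
    {
      pthread_mutex_init (&self_lock.lock, NULL);
      pthread_mutex_lock (&thread_list_lock);
      self_lock.next = thread_list;
      thread_list = &self_lock;
      pthread_mutex_unlock (&thread_list_lock);
      self_registered = 1;
    }
  pthread_mutex_lock (&self_lock.lock);  /* uncontended unless a writer runs */
}

static void
reader_unlock (void)
{
  pthread_mutex_unlock (&self_lock.lock);
}

static void
writer_lock (void)          /* e.g. dlopen/dlclose updating the map list */
{
  pthread_mutex_lock (&thread_list_lock);
  for (struct thread_lock *t = thread_list; t != NULL; t = t->next)
    pthread_mutex_lock (&t->lock);
}

static void
writer_unlock (void)
{
  for (struct thread_lock *t = thread_list; t != NULL; t = t->next)
    pthread_mutex_unlock (&t->lock);
  pthread_mutex_unlock (&thread_list_lock);
}

(A real implementation would also have to unregister a thread's entry when the
thread exits.)
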
> >>>
> >>>>
> >>>> 2. Another option is to push to cleanup dl_iterate_phdr interface to *not*
> >>>>    require callback serialization and use a rdlock while accessing the
> >>>>    maps list.  With the current rwlock implementation performance won't
> >>>>    change, as you noticed; however, since we are reworking it and changing
> >>>>    to a more scalable one [1], the read-only pass should perform *much*
> >>>>    better: __pthread_rwlock_rdlock_full should issue just an
> >>>>    atomic_fetch_add_acquire in the uncontended read case.
> >>> I saw the new rwlock implementation and tested it. It is much better than
> >>> the current rwlock, but still almost twice as slow as multiple locks.
> >>> Profiling shows that the exchange in unlock takes a lot of CPU. No wonder,
> >>> since a contended locking operation is very expensive. IMO the ideal
> >>> solution is an array of rwlocks.
> >>
> >> I would expect so: since the unlock phase in the new algorithm issues an
> >> 'atomic_compare_exchange_weak_release' in a loop, it will potentially be
> >> slower than just an 'lll_unlock' (which issues a lock decl on x86_64).
> >>
> >> Performance-wise an array of rdlocks should be the best approach.  What I am
> >> not comfortable with is adding another glibc-specific interface to work
> >> around a bad interface specification.
> >>
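For illustration, a minimal sketch of the array-of-rdlocks idea (hypothetical
names, not the code from the patch): readers spread themselves over a fixed set
of rwlocks so they rarely contend with each other, while a writer has to take
every slot.

#include <pthread.h>
#include <stdint.h>

#define PHDR_NLOCKS 64   /* the lock count being debated above */

/* GNU range-designator initializer; fine for code built with GCC.  */
static pthread_rwlock_t phdr_locks[PHDR_NLOCKS] =
  { [0 ... PHDR_NLOCKS - 1] = PTHREAD_RWLOCK_INITIALIZER };

/* A reader hashes itself onto one slot, so concurrent dl_iterate_phdr calls
   from different threads usually take different locks.  */
static unsigned int
phdr_reader_lock (void)
{
  unsigned int slot = (uintptr_t) pthread_self () % PHDR_NLOCKS;
  pthread_rwlock_rdlock (&phdr_locks[slot]);
  return slot;
}

static void
phdr_reader_unlock (unsigned int slot)
{
  pthread_rwlock_unlock (&phdr_locks[slot]);
}

/* A writer (dlopen/dlclose modifying the link map) takes all slots in a
   fixed order, excluding every reader.  */
static void
phdr_writer_lock (void)
{
  for (int i = 0; i < PHDR_NLOCKS; i++)
    pthread_rwlock_wrlock (&phdr_locks[i]);
}

static void
phdr_writer_unlock (void)
{
  for (int i = PHDR_NLOCKS - 1; i >= 0; i--)
    pthread_rwlock_unlock (&phdr_locks[i]);
}

The writer cost grows with the number of slots, which is why the choice of 64
(versus, say, the per-thread variant) is worth documenting.
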
> > I proposed the new interface as an RFC to be on the safe side; I am not
> > against fixing the existing one. What are the consequences of modifying the
> > current behaviour, though? Old libgcc will stop working with new glibc, but
> > as far as I understand that can be fixed with symbol versioning (how?). What
> > about other applications, if any?
> > 
> 
> First we need to decide whether we should change the dl_iterate_phdr interface
> to drop the requirement that the callback runs mutually exclusively.  If the
> idea is to keep the current behaviour, I think we need to make this behaviour
> explicit in the documentation and check whether it is possible to add the
> lock-array mechanism without adding the parallel version.
>
What is the procedure for deciding that? Both options are OK with me. You guys
have more experience with glibc compatibility problems/requirements.

> If we decide that locking around the callback call is not meant to be an
> interface requirement, we can add a new symbol version (2.25 for instance)
> that does not have this behaviour, while still providing the older one (2.2.5)
> that still does the locking.  The main problem is that building an old
> libstdc++ against a newer glibc might not work, so I am not sure if this is
> the best course of action.
I have been reading about symbol versioning and it looks like we can define the
old interface (2.2.5) to be the default one and use the new one explicitly in a
new libstdc++. What I have not figured out yet is whether a new libstdc++ can
check at runtime that the 2.25 interface is available and use it if it is. For
the _parallel variant I can do it like this:

extern int dl_iterate_phdr_parallel (int (*__callback) (struct dl_phdr_info *, size_t, void *),
                                     void *__data) __attribute__((weak));

  if (dl_iterate_phdr_parallel) {
    if (dl_iterate_phdr_parallel (_Unwind_IteratePhdrCallback, &data) < 0)
      return NULL;
  } else {
    if (dl_iterate_phdr (_Unwind_IteratePhdrCallback, &data) < 0)
      return NULL;
  }

This will use dl_iterate_phdr_parallel if it is available in glibc and
dl_iterate_phdr otherwise.
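
If dl_iterate_phdr itself gets a new symbol version rather than a new name, the
runtime check could instead use dlvsym; here is a sketch, assuming the
unserialized version ends up exported as GLIBC_2.25 (a new libstdc++ could also
bind to it explicitly at link time with a .symver directive):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stddef.h>

typedef int (*phdr_iter_fn) (int (*) (struct dl_phdr_info *, size_t, void *),
                             void *);

static int
iterate_phdr_prefer_unserialized (int (*cb) (struct dl_phdr_info *,
                                             size_t, void *),
                                  void *data)
{
  /* Ask the dynamic linker for the (hypothetical) GLIBC_2.25 version and
     fall back to the default, serializing dl_iterate_phdr if the running
     glibc does not provide it.  */
  phdr_iter_fn fn
    = (phdr_iter_fn) dlvsym (RTLD_DEFAULT, "dl_iterate_phdr", "GLIBC_2.25");

  if (fn == NULL)
    fn = dl_iterate_phdr;

  return fn (cb, data);
}

Unlike the weak-symbol test, dlvsym can tell two versions of the same symbol
name apart; the downside is that it requires linking against libdl.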

--
			Gleb.

