[PATCH] libdw: add thread-safety to dwarf_getabbrev()

Jonathon Anderson jma14@rice.edu
Fri Aug 23 22:36:00 GMT 2019



On Fri, Aug 23, 2019 at 1:25 PM, Mark Wielaard <mark@klomp.org> wrote:
> Hi,
> 
> On Wed, 2019-08-21 at 09:08 -0500, Jonathon Anderson wrote:
>>  On Wed, Aug 21, 2019 at 6:16 AM, Mark Wielaard <mark@klomp.org>
>>  wrote:On Fri, 2019-08-16 at 14:24 -0500, Jonathon Anderson wrote:
>>  > >  For parallel applications that need the information in the 
>> DIEs, the
>>  > >  Dwarf_Abbrev hash table et al. become a massive data race. This
>>  > > fixes
>>  > >  that by:
>>  > >
>>  > >  1. Adding atomics & locks to the hash table to manage 
>> concurrency
>>  > >     (lib/dynamicsizehash_concurrent.{c,h})
>>  > >  2. Adding a lock & array structure to the memory manager
>>  > > (pseudo-TLS)
>>  > >     (libdwP.h, libdw_alloc.c)
>>  > >  3. Adding extra configure options for Helgrind/DRD annotations
>>  > >     (configure.ac)
>>  > >  4. Including "stdatomic.h" from FreeBSD, to support C11-style
>>  > > atomics.
>>  > >     (lib/stdatomic.h)
>>  >
>>  > This looks like really nice work. Thanks!
>>  >
>>  > I am splitting review in some smaller parts if you don't mind.
>>  > Simply because it is large and I cannot keep everything in my 
>> head at
>>  > once :)
> 
> BTW. I would prefer to handle this as 4 separate additions, probably 
> in
> this order:
> 
> 1) configure stuff for valgrind annotations.
> 2) add support for stdatomic.h functions.
> 3) thread-safe obstack memory handling
> 4) concurrent dynamic hash table.

Sure thing, I can split the patch into bits over the weekend. I may 
take your advice and just use git request-pull though.

> 
>>  > If the compiler provides stdatomic.h then I think it would be good
>>  > to
>>  > use that instead of our own implementation. The copyright isn't a
>>  > problem. But do you have a reference/URL to the upstream version? 
>> I
>>  > like to add that somewhere, so we can sync with it in the future. 
>> I
>>  > see
>>  > various commented out parts. Was that already upstream? Should we 
>> just
>>  > remove those parts?
>> 
>>  It would definitely be preferable to use the compiler's 
>> implementation
>>  if possible, we used this in case GCC 4.7 and 4.8 (RHEL7) 
>> compatibility
>>  was needed. If those versions are old enough the file can be removed
>>  entirely.
>> 
>>  The upstream is at
>>  https://github.com/freebsd/freebsd/blob/master/sys/sys/stdatomic.h,
>>  although the version here appears to be slightly modified (we used 
>> the
>>  version that HPCToolkit ships). The components we use don't seem
>>  affected, so a resync shouldn't make a difference.
> 
> OK, then we should come up with some kind of configure test to see if
> we can use the standard stdatomic.h and otherwise use our own. I am
> surprised I cannot find other projects doing this. Would be nice to
> "steal" something standard for this.

At least OpenSSL does it: 
https://github.com/openssl/openssl/blob/master/include/internal/refcount.h, 
the interesting note being that it has a series of fallbacks (various 
compiler builtins and then locks). The other projects I skimmed just 
have the fallbacks and don't check for C11, given that Elfutils only 
supports GCC that might be a valid (and more compact) approach.

> 
>>  > >   - Currently the concurrent hash table is always enabled,
>>  > >     performance-wise there is no known difference between it
>>  > >     and the non-concurrent  version.
>>  > >     This can be changed to toggle with --enable-thread-safety
>>  > >     if preferred.
>>  >
>>  > I would prefer it always enabled, unless there is a massive 
>> slowdown
>>  > of
>>  > the single-threaded case. The problem with --enable-thread-safety 
>> is
>>  > that it is a) known broken (sigh) and b) it basically introduces 
>> two
>>  > separate libraries that behave subtly differently. I would very 
>> much
>>  > like to get rid of --enable-thread-safety by fixing the broken 
>> locking
>>  > and simply making it the default.
>> 
>>  I haven't noticed any slowdown in the single-threaded case, 
>> although I
>>  haven't stressed it hard enough to find out for certain. From a
>>  theoretical standpoint it shouldn't, atomics (with the proper memory
>>  orders) are usually (on x86 at least) as cheap as normal accesses 
>> when
>>  used by a single thread, and the algorithm is otherwise effectively 
>> the
>>  same as the original hash table.
>> 
>>  How difficult would it be to fix the locking (or rather, what's
>>  "broken")? We would definitely benefit from having thread-safety at
>>  least for getters, which would only need locks around the internal
>>  management.
> 
> To be honest I don't know how badly it is broken.
> It is only implemented for libelf.
> If you configure --enable-thread-safety and make check you will see
> several tests fail because they abort with Unexpected error: Resource
> deadlock avoided.
> 
> I think it is mainly that nobody maintained the locks and now some are
> just wrongly placed. Ideally we switch --enable-thread-safety on by
> default, identify which locks are wrongly placed, run all tests with
> valgrind/hellgrind and fix any issues found.
> 
> It really has not been a priority. Sorry.

No worries, its not a priority on our end either. Elfutils' codebase is 
significantly simpler (IMHO) than ours, so if it ever comes up we'll 
just submit another patch.

> 
>>  > >   - Another implementation of #2 above might use dynamic TLS
>>  > >     (pthread_key_*),
>>  > >     we chose this implementation to reduce the overall 
>> complexity.
>>  >
>>  > Are there any other trade-offs?
>> 
>>  If the application spawns N threads that all use a Dwarf object 
>> (same
>>  or different) enough to cause an allocation, and then joins them 
>> all,
>>  any Dwarf objects allocated after will allocate N unusable slots in 
>> the
>>  mem_tails array. Thus long-running applications (that don't use 
>> thread
>>  pools) would start experiencing effects similar to a memory leak, 
>> of 1
>>  pointer's worth (8 bytes on 64-bit) per dead thread.
>> 
>>  The alternative is to use dynamic TLS so that only threads that are
>>  currently active use the extra memory, assuming libpthread is
>>  sufficiently proactive about reclaiming unused key values. I think 
>> if
>>  we assume `dwarf_end` happens-after any memory management (which 
>> would
>>  make sense for a destructor), there should be a simple atomic 
>> pattern
>>  to handle the freeing, but I would need to sit down for a while to 
>> work
>>  out a sufficient proof.
>> 
>>  I was also under the impression that dynamic TLS was particularly
>>  expensive performance-wise, although I haven't experimented with it
>>  myself enough to know. The compiler can be a little smarter about
>>  static TLS, and the result is easier to reason about, which is why 
>> we
>>  chose this implementation for initial patch. If the alternative 
>> would
>>  be preferable we can change it.
> 
> I thought a bit about this one and although I am slightly worried 
> about
> the possible indefinite growing of the thread_ids, I don't think the
> "memory leak" is an issue. Concurrent usage of the Dwarf object 
> already
> costs a bit more memory (since each thread gets its own memory block 
> if
> they allocate at the same time) which is probably larger than any 
> extra
> created by reserving space for all possible thread_ids. This is only
> really a problem for a program that doesn't use thread pools and keeps
> opening and concurrently accessing new Dwarf objects (because at a
> certain point the whole Dwarf will have been read and no new
> allocations happen). Although it would be nice if we could somehow
> reset the next_id to zero in dwarf_end (), when this is the last 
> thread
> or Dwarf object.

The memory overhead is a little worse than that (each thread allocates 
its own memory blocks, period), but that would be present in both 
implementations. I can't think of a simple way to increase the memory 
efficiency past that (although I can think of some ridiculously complex 
ways).

I suppose it would be possible to use a sort of free-list for the IDs, 
although that requires a hook at thread exit (doable with PThreads, not 
with C11) and cleaning up would be a bit of a nightmare. At some point 
dynamic TLS is more robust against weird situations (and, if my 
thought-proof is correct, simpler).

> 
> Cheers,
> 
> Mark



More information about the Elfutils-devel mailing list