[PATCH] libdw: add thread-safety to dwarf_getabbrev()

Jonathon Anderson jma14@rice.edu
Wed Aug 21 14:21:00 GMT 2019

First message failed to send, hopefully this one works...

On Wed, Aug 21, 2019 at 6:16 AM, Mark Wielaard <mark@klomp.org> wrote:
> Hi Jonathon and Srđan,
> On Fri, 2019-08-16 at 14:24 -0500, Jonathon Anderson wrote:
>>  For parallel applications that need the information in the DIEs, the
>>  Dwarf_Abbrev hash table et al. become a massive data race. This 
>> fixes
>>  that by:
>>  1. Adding atomics & locks to the hash table to manage concurrency
>>     (lib/dynamicsizehash_concurrent.{c,h})
>>  2. Adding a lock & array structure to the memory manager 
>> (pseudo-TLS)
>>     (libdwP.h, libdw_alloc.c)
>>  3. Adding extra configure options for Helgrind/DRD annotations
>>     (configure.ac)
>>  4. Including "stdatomic.h" from FreeBSD, to support C11-style   
>> atomics.
>>     (lib/stdatomic.h)
> This looks like really nice work. Thanks!
> I am splitting review in some smaller parts if you don't mind.
> Simply because it is large and I cannot keep everything in my head at
> once :) But here some initial overall comments.

Thank you for looking over it! Comments on comments below.

>>  Notes:
>>   - GCC >= 4.9 provides <stdatomic.h> natively; for those versions
>>     lib/stdatomic.h could be removed or disabled. We can also 
>> rewrite the
>>     file if the copyright becomes an issue.
> If the compiler provides stdatomic.h then I think it would be good to
> use that instead of our own implementation. The copyright isn't a
> problem. But do you have a reference/URL to the upstream version? I
> like to add that somewhere, so we can sync with it in the future. I 
> see
> various commented out parts. Was that already upstream? Should we just
> remove those parts?

It would definitely be preferable to use the compiler's implementation
if possible, we used this in case GCC 4.7 and 4.8 (RHEL7) compatibility
was needed. If those versions are old enough the file can be removed

The upstream is at
although the version here appears to be slightly modified (we used the
version that HPCToolkit ships). The components we use don't seem
affected, so a resync shouldn't make a difference.

>>   - Currently the concurrent hash table is always enabled,
>>     performance-wise there is no known difference between it
>>     and the non-concurrent  version.
>>     This can be changed to toggle with --enable-thread-safety
>>     if preferred.
> I would prefer it always enabled, unless there is a massive slowdown 
> of
> the single-threaded case. The problem with --enable-thread-safety is
> that it is a) known broken (sigh) and b) it basically introduces two
> separate libraries that behave subtly differently. I would very much
> like to get rid of --enable-thread-safety by fixing the broken locking
> and simply making it the default.

I haven't noticed any slowdown in the single-threaded case, although I
haven't stressed it hard enough to find out for certain. From a
theoretical standpoint it shouldn't, atomics (with the proper memory
orders) are usually (on x86 at least) as cheap as normal accesses when
used by a single thread, and the algorithm is otherwise effectively the
same as the original hash table.

How difficult would it be to fix the locking (or rather, what's
"broken")? We would definitely benefit from having thread-safety at
least for getters, which would only need locks around the internal

>>   - Another implementation of #2 above might use dynamic TLS
>>     (pthread_key_*),
>>     we chose this implementation to reduce the overall complexity.
> Are there any other trade-offs?

If the application spawns N threads that all use a Dwarf object (same
or different) enough to cause an allocation, and then joins them all,
any Dwarf objects allocated after will allocate N unusable slots in the
mem_tails array. Thus long-running applications (that don't use thread
pools) would start experiencing effects similar to a memory leak, of 1
pointer's worth (8 bytes on 64-bit) per dead thread.

The alternative is to use dynamic TLS so that only threads that are
currently active use the extra memory, assuming libpthread is
sufficiently proactive about reclaiming unused key values. I think if
we assume `dwarf_end` happens-after any memory management (which would
make sense for a destructor), there should be a simple atomic pattern
to handle the freeing, but I would need to sit down for a while to work
out a sufficient proof.

I was also under the impression that dynamic TLS was particularly
expensive performance-wise, although I haven't experimented with it
myself enough to know. The compiler can be a little smarter about
static TLS, and the result is easier to reason about, which is why we
chose this implementation for initial patch. If the alternative would
be preferable we can change it.

> Thanks,
> Mark

More information about the Elfutils-devel mailing list