[PATCH] libdw: add thread-safety to dwarf_getabbrev()
Jonathon Anderson
jma14@rice.edu
Fri Aug 23 22:36:00 GMT 2019
On Fri, Aug 23, 2019 at 1:25 PM, Mark Wielaard <mark@klomp.org> wrote:
> Hi,
>
> On Wed, 2019-08-21 at 09:08 -0500, Jonathon Anderson wrote:
>> On Wed, Aug 21, 2019 at 6:16 AM, Mark Wielaard <mark@klomp.org> wrote:
>> > On Fri, 2019-08-16 at 14:24 -0500, Jonathon Anderson wrote:
>> > > For parallel applications that need the information in the DIEs,
>> > > the Dwarf_Abbrev hash table et al. become a massive data race.
>> > > This fixes that by:
>> > >
>> > > 1. Adding atomics & locks to the hash table to manage concurrency
>> > >    (lib/dynamicsizehash_concurrent.{c,h})
>> > > 2. Adding a lock & array structure to the memory manager (pseudo-TLS)
>> > >    (libdwP.h, libdw_alloc.c)
>> > > 3. Adding extra configure options for Helgrind/DRD annotations
>> > >    (configure.ac)
>> > > 4. Including "stdatomic.h" from FreeBSD, to support C11-style atomics.
>> > >    (lib/stdatomic.h)
>> >
>> > This looks like really nice work. Thanks!
>> >
>> > I am splitting review in some smaller parts if you don't mind.
>> > Simply because it is large and I cannot keep everything in my head
>> > at once :)
>
> BTW. I would prefer to handle this as 4 separate additions, probably
> in this order:
>
> 1) configure stuff for valgrind annotations.
> 2) add support for stdatomic.h functions.
> 3) thread-safe obstack memory handling
> 4) concurrent dynamic hash table.
Sure thing, I can split the patch into bits over the weekend. I may
take your advice and just use git request-pull though.
>
>> > If the compiler provides stdatomic.h then I think it would be good
>> > to use that instead of our own implementation. The copyright isn't a
>> > problem. But do you have a reference/URL to the upstream version? I'd
>> > like to add that somewhere, so we can sync with it in the future. I
>> > see various commented out parts. Was that already upstream? Should we
>> > just remove those parts?
>>
>> It would definitely be preferable to use the compiler's implementation
>> if possible; we used this in case GCC 4.7 and 4.8 (RHEL7) compatibility
>> was needed. If those versions are old enough the file can be removed
>> entirely.
>>
>> The upstream is at
>> https://github.com/freebsd/freebsd/blob/master/sys/sys/stdatomic.h,
>> although the version here appears to be slightly modified (we used the
>> version that HPCToolkit ships). The components we use don't seem
>> affected, so a resync shouldn't make a difference.
>
> OK, then we should come up with some kind of configure test to see if
> we can use the standard stdatomic.h and otherwise use our own. I am
> surprised I cannot find other projects doing this. Would be nice to
> "steal" something standard for this.
At least OpenSSL does it:
https://github.com/openssl/openssl/blob/master/include/internal/refcount.h,
the interesting note being that it has a series of fallbacks (various
compiler builtins and then locks). The other projects I skimmed just
have the fallbacks and don't check for C11; given that Elfutils only
supports GCC, that might be a valid (and more compact) approach.
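For illustration, a minimal sketch of that kind of fallback chain (my
own example, not code from the patch; the ew_* names are made up, and a
configure-time header check would be more reliable on GCC 4.7/4.8,
which accept -std=c11 but don't ship <stdatomic.h>):

/* Prefer the compiler's <stdatomic.h>, otherwise wrap the GCC
   __atomic builtins that have been available since GCC 4.7.  */
#if !defined (__STDC_NO_ATOMICS__) && defined (__STDC_VERSION__) \
    && __STDC_VERSION__ >= 201112L
# include <stdatomic.h>
typedef atomic_int ew_atomic_int;
# define ew_fetch_add(p, v) \
    atomic_fetch_add_explicit ((p), (v), memory_order_relaxed)
#else
typedef int ew_atomic_int;
# define ew_fetch_add(p, v) __atomic_fetch_add ((p), (v), __ATOMIC_RELAXED)
#endif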
>
>> > > - Currently the concurrent hash table is always enabled,
>> > >   performance-wise there is no known difference between it
>> > >   and the non-concurrent version.
>> > >   This can be changed to toggle with --enable-thread-safety
>> > >   if preferred.
>> >
>> > I would prefer it always enabled, unless there is a massive slowdown
>> > of the single-threaded case. The problem with --enable-thread-safety
>> > is that it is a) known broken (sigh) and b) it basically introduces
>> > two separate libraries that behave subtly differently. I would very
>> > much like to get rid of --enable-thread-safety by fixing the broken
>> > locking and simply making it the default.
>>
>> I haven't noticed any slowdown in the single-threaded case, although I
>> haven't stressed it hard enough to find out for certain. From a
>> theoretical standpoint it shouldn't: atomics (with the proper memory
>> orders) are usually (on x86 at least) as cheap as normal accesses when
>> used by a single thread, and the algorithm is otherwise effectively the
>> same as the original hash table.
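(As a toy illustration of that point, my own example rather than code
from the patch: a relaxed atomic load of a table slot compiles to the
same ordinary load the non-concurrent table used, so a single-threaded
reader pays nothing extra on x86.)

#include <stdatomic.h>
#include <stdint.h>

struct slot { _Atomic uintptr_t val; };

uintptr_t
read_slot_atomic (struct slot *s)
{
  /* Relaxed load: no fences emitted; on x86 this is a plain mov.  */
  return atomic_load_explicit (&s->val, memory_order_relaxed);
}

uintptr_t
read_slot_plain (uintptr_t *p)
{
  return *p;   /* what the original, non-concurrent table effectively did */
}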
>>
>> How difficult would it be to fix the locking (or rather, what's
>> "broken")? We would definitely benefit from having thread-safety at
>> least for getters, which would only need locks around the internal
>> management.
>
> To be honest I don't know how badly it is broken.
> It is only implemented for libelf.
> If you configure --enable-thread-safety and make check you will see
> several tests fail because they abort with Unexpected error: Resource
> deadlock avoided.
>
> I think it is mainly that nobody maintained the locks and now some are
> just wrongly placed. Ideally we switch --enable-thread-safety on by
> default, identify which locks are wrongly placed, run all tests with
> valgrind/helgrind and fix any issues found.
>
> It really has not been a priority. Sorry.
No worries, it's not a priority on our end either. Elfutils' codebase is
significantly simpler (IMHO) than ours, so if it ever comes up we'll
just submit another patch.
>
>> > > - Another implementation of #2 above might use dynamic TLS
>> > >   (pthread_key_*); we chose this implementation to reduce the
>> > >   overall complexity.
>> >
>> > Are there any other trade-offs?
>>
>> If the application spawns N threads that all use a Dwarf object (same
>> or different) enough to cause an allocation, and then joins them all,
>> any Dwarf objects allocated after will allocate N unusable slots in
>> the mem_tails array. Thus long-running applications (that don't use
>> thread pools) would start experiencing effects similar to a memory
>> leak, of 1 pointer's worth (8 bytes on 64-bit) per dead thread.
>>
>> The alternative is to use dynamic TLS so that only threads that are
>> currently active use the extra memory, assuming libpthread is
>> sufficiently proactive about reclaiming unused key values. I think if
>> we assume `dwarf_end` happens-after any memory management (which would
>> make sense for a destructor), there should be a simple atomic pattern
>> to handle the freeing, but I would need to sit down for a while to
>> work out a sufficient proof.
>>
>> I was also under the impression that dynamic TLS was particularly
>> expensive performance-wise, although I haven't experimented with it
>> myself enough to know. The compiler can be a little smarter about
>> static TLS, and the result is easier to reason about, which is why we
>> chose this implementation for the initial patch. If the alternative
>> would be preferable we can change it.
>
> I thought a bit about this one and although I am slightly worried
> about the possible indefinite growing of the thread_ids, I don't think
> the "memory leak" is an issue. Concurrent usage of the Dwarf object
> already costs a bit more memory (since each thread gets its own memory
> block if they allocate at the same time) which is probably larger than
> any extra created by reserving space for all possible thread_ids. This
> is only really a problem for a program that doesn't use thread pools
> and keeps opening and concurrently accessing new Dwarf objects
> (because at a certain point the whole Dwarf will have been read and no
> new allocations happen). Although it would be nice if we could somehow
> reset the next_id to zero in dwarf_end (), when this is the last
> thread or Dwarf object.
The memory overhead is a little worse than that (each thread allocates
its own memory blocks, period), but that would be present in both
implementations. I can't think of a simple way to increase the memory
efficiency past that (although I can think of some ridiculously complex
ways).
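(For reference, here is a rough sketch of the static-ID scheme we're
talking about; the shape follows the discussion above (next_id,
thread_id, mem_tails) but the details are simplified and the helper
names are mine, so don't read it as the actual code:)

#include <stdatomic.h>
#include <stdlib.h>

static atomic_size_t next_id;          /* process-wide ID counter */
static _Thread_local size_t thread_id; /* this thread's ID + 1, 0 = unset */

struct dwarf_mem
{
  size_t nslots;     /* number of IDs known when this Dwarf grew the array */
  void **mem_tails;  /* one current allocation block per thread ID */
};

static size_t
get_thread_id (void)
{
  if (thread_id == 0)   /* first allocation from this thread */
    thread_id = atomic_fetch_add_explicit (&next_id, 1,
                                           memory_order_relaxed) + 1;
  return thread_id - 1; /* 0-based index into mem_tails */
}

void *
dwarf_mem_tail (struct dwarf_mem *d)
{
  size_t id = get_thread_id ();
  /* The real code grows mem_tails under a lock when id >= nslots;
     omitted here.  Dead threads never return their ID, which is the
     ~8 bytes per dead thread mentioned above.  */
  return id < d->nslots ? d->mem_tails[id] : NULL;
}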
I suppose it would be possible to use a sort of free-list for the IDs,
although that requires a hook at thread exit (doable with PThreads, not
with C11) and cleaning up would be a bit of a nightmare. At some point
dynamic TLS is more robust against weird situations (and, if my
thought-proof is correct, simpler).
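To make the dynamic-TLS alternative concrete, a rough sketch (again my
illustration, not proposed code, and process-global where the real
thing would hang off the Dwarf): a pthread_key destructor runs at
thread exit, so a dead thread's block is reclaimed without any extra
bookkeeping in dwarf_end.

#include <pthread.h>
#include <stdlib.h>

static pthread_key_t mem_key;
static pthread_once_t mem_key_once = PTHREAD_ONCE_INIT;

static void
free_thread_block (void *block)
{
  /* Called automatically when a thread exits with a non-NULL value.  */
  free (block);
}

static void
make_key (void)
{
  pthread_key_create (&mem_key, free_thread_block);
}

void *
get_thread_block (size_t size)
{
  pthread_once (&mem_key_once, make_key);
  void *block = pthread_getspecific (mem_key);
  if (block == NULL)
    {
      block = malloc (size);
      pthread_setspecific (mem_key, block);
    }
  return block;
}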
>
> Cheers,
>
> Mark