[PATCH] libdw: add thread-safety to dwarf_getabbrev()

Fri Aug 23 18:25:00 GMT 2019

Hi,

On Wed, 2019-08-21 at 09:08 -0500, Jonathon Anderson wrote:
> On Wed, Aug 21, 2019 at 6:16 AM, Mark Wielaard <mark@klomp.org>
> wrote:On Fri, 2019-08-16 at 14:24 -0500, Jonathon Anderson wrote:
> > >  For parallel applications that need the information in the DIEs, the
> > >  Dwarf_Abbrev hash table et al. become a massive data race. This 
> > > fixes
> > >  that by:
> > > 
> > >  1. Adding atomics & locks to the hash table to manage concurrency
> > >     (lib/dynamicsizehash_concurrent.{c,h})
> > >  2. Adding a lock & array structure to the memory manager 
> > > (pseudo-TLS)
> > >     (libdwP.h, libdw_alloc.c)
> > >  3. Adding extra configure options for Helgrind/DRD annotations
> > >     (configure.ac)
> > >  4. Including "stdatomic.h" from FreeBSD, to support C11-style   
> > > atomics.
> > >     (lib/stdatomic.h)
> > 
> > This looks like really nice work. Thanks!
> > 
> > I am splitting review in some smaller parts if you don't mind.
> > Simply because it is large and I cannot keep everything in my head at
> > once :)

BTW. I would prefer to handle this as 4 separate additions, probably in
this order:

1) configure stuff for valgrind annotations.
2) add support for stdatomic.h functions.
3) thread-safe obstack memory handling
4) concurrent dynamic hash table.

> > If the compiler provides stdatomic.h then I think it would be good
> > to
> > use that instead of our own implementation. The copyright isn't a
> > problem. But do you have a reference/URL to the upstream version? I
> > like to add that somewhere, so we can sync with it in the future. I 
> > see
> > various commented out parts. Was that already upstream? Should we just
> > remove those parts?
> 
> It would definitely be preferable to use the compiler's implementation 
> if possible, we used this in case GCC 4.7 and 4.8 (RHEL7) compatibility 
> was needed. If those versions are old enough the file can be removed 
> entirely.
> 
> The upstream is at 
> https://github.com/freebsd/freebsd/blob/master/sys/sys/stdatomic.h, 
> although the version here appears to be slightly modified (we used the 
> version that HPCToolkit ships). The components we use don't seem 
> affected, so a resync shouldn't make a difference.

OK, then we should come up with some kind of configure test to see if
we can use the standard stdatomic.h and otherwise use our own. I am
surprised I cannot find other projects doing this. Would be nice to
"steal" something standard for this.

> > >   - Currently the concurrent hash table is always enabled,
> > >     performance-wise there is no known difference between it
> > >     and the non-concurrent  version.
> > >     This can be changed to toggle with --enable-thread-safety
> > >     if preferred.
> > 
> > I would prefer it always enabled, unless there is a massive slowdown 
> > of
> > the single-threaded case. The problem with --enable-thread-safety is
> > that it is a) known broken (sigh) and b) it basically introduces two
> > separate libraries that behave subtly differently. I would very much
> > like to get rid of --enable-thread-safety by fixing the broken locking
> > and simply making it the default.
> 
> I haven't noticed any slowdown in the single-threaded case, although I 
> haven't stressed it hard enough to find out for certain. From a 
> theoretical standpoint it shouldn't, atomics (with the proper memory 
> orders) are usually (on x86 at least) as cheap as normal accesses when 
> used by a single thread, and the algorithm is otherwise effectively the 
> same as the original hash table.
> 
> How difficult would it be to fix the locking (or rather, what's 
> "broken")? We would definitely benefit from having thread-safety at 
> least for getters, which would only need locks around the internal 
> management.

To be honest I don't know how badly it is broken.
It is only implemented for libelf.
If you configure --enable-thread-safety and make check you will see
several tests fail because they abort with Unexpected error: Resource
deadlock avoided.

I think it is mainly that nobody maintained the locks and now some are
just wrongly placed. Ideally we switch --enable-thread-safety on by
default, identify which locks are wrongly placed, run all tests with
valgrind/hellgrind and fix any issues found.

It really has not been a priority. Sorry.

> > >   - Another implementation of #2 above might use dynamic TLS
> > >     (pthread_key_*),
> > >     we chose this implementation to reduce the overall complexity.
> > 
> > Are there any other trade-offs?
> 
> If the application spawns N threads that all use a Dwarf object (same 
> or different) enough to cause an allocation, and then joins them all, 
> any Dwarf objects allocated after will allocate N unusable slots in the 
> mem_tails array. Thus long-running applications (that don't use thread 
> pools) would start experiencing effects similar to a memory leak, of 1 
> pointer's worth (8 bytes on 64-bit) per dead thread.
> 
> The alternative is to use dynamic TLS so that only threads that are 
> currently active use the extra memory, assuming libpthread is 
> sufficiently proactive about reclaiming unused key values. I think if 
> we assume `dwarf_end` happens-after any memory management (which would 
> make sense for a destructor), there should be a simple atomic pattern 
> to handle the freeing, but I would need to sit down for a while to work 
> out a sufficient proof.
> 
> I was also under the impression that dynamic TLS was particularly 
> expensive performance-wise, although I haven't experimented with it 
> myself enough to know. The compiler can be a little smarter about 
> static TLS, and the result is easier to reason about, which is why we 
> chose this implementation for initial patch. If the alternative would 
> be preferable we can change it.

I thought a bit about this one and although I am slightly worried about
the possible indefinite growing of the thread_ids, I don't think the
"memory leak" is an issue. Concurrent usage of the Dwarf object already
costs a bit more memory (since each thread gets its own memory block if
they allocate at the same time) which is probably larger than any extra
created by reserving space for all possible thread_ids. This is only
really a problem for a program that doesn't use thread pools and keeps
opening and concurrently accessing new Dwarf objects (because at a
certain point the whole Dwarf will have been read and no new
allocations happen). Although it would be nice if we could somehow
reset the next_id to zero in dwarf_end (), when this is the last thread
or Dwarf object.

Cheers,

Mark