This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()




On Fri, Aug 23, 2019 at 1:25 PM, Mark Wielaard <mark@klomp.org> wrote:
Hi,

On Wed, 2019-08-21 at 09:08 -0500, Jonathon Anderson wrote:
 On Wed, Aug 21, 2019 at 6:16 AM, Mark Wielaard <mark@klomp.org>
 wrote:On Fri, 2019-08-16 at 14:24 -0500, Jonathon Anderson wrote:
> > For parallel applications that need the information in the DIEs, the
 > >  Dwarf_Abbrev hash table et al. become a massive data race. This
 > > fixes
 > >  that by:
 > >
> > 1. Adding atomics & locks to the hash table to manage concurrency
 > >     (lib/dynamicsizehash_concurrent.{c,h})
 > >  2. Adding a lock & array structure to the memory manager
 > > (pseudo-TLS)
 > >     (libdwP.h, libdw_alloc.c)
 > >  3. Adding extra configure options for Helgrind/DRD annotations
 > >     (configure.ac)
 > >  4. Including "stdatomic.h" from FreeBSD, to support C11-style
 > > atomics.
 > >     (lib/stdatomic.h)
 >
 > This looks like really nice work. Thanks!
 >
 > I am splitting review in some smaller parts if you don't mind.
> Simply because it is large and I cannot keep everything in my head at
 > once :)

BTW. I would prefer to handle this as 4 separate additions, probably in
this order:

1) configure stuff for valgrind annotations.
2) add support for stdatomic.h functions.
3) thread-safe obstack memory handling
4) concurrent dynamic hash table.

Sure thing, I can split the patch into bits over the weekend. I may take your advice and just use git request-pull though.


 > If the compiler provides stdatomic.h then I think it would be good
 > to
 > use that instead of our own implementation. The copyright isn't a
> problem. But do you have a reference/URL to the upstream version? I > like to add that somewhere, so we can sync with it in the future. I
 > see
> various commented out parts. Was that already upstream? Should we just
 > remove those parts?

It would definitely be preferable to use the compiler's implementation if possible, we used this in case GCC 4.7 and 4.8 (RHEL7) compatibility
 was needed. If those versions are old enough the file can be removed
 entirely.

 The upstream is at
 https://github.com/freebsd/freebsd/blob/master/sys/sys/stdatomic.h,
although the version here appears to be slightly modified (we used the
 version that HPCToolkit ships). The components we use don't seem
 affected, so a resync shouldn't make a difference.

OK, then we should come up with some kind of configure test to see if
we can use the standard stdatomic.h and otherwise use our own. I am
surprised I cannot find other projects doing this. Would be nice to
"steal" something standard for this.

At least OpenSSL does it: https://github.com/openssl/openssl/blob/master/include/internal/refcount.h, the interesting note being that it has a series of fallbacks (various compiler builtins and then locks). The other projects I skimmed just have the fallbacks and don't check for C11, given that Elfutils only supports GCC that might be a valid (and more compact) approach.


 > >   - Currently the concurrent hash table is always enabled,
 > >     performance-wise there is no known difference between it
 > >     and the non-concurrent  version.
 > >     This can be changed to toggle with --enable-thread-safety
 > >     if preferred.
 >
> I would prefer it always enabled, unless there is a massive slowdown
 > of
> the single-threaded case. The problem with --enable-thread-safety is > that it is a) known broken (sigh) and b) it basically introduces two > separate libraries that behave subtly differently. I would very much > like to get rid of --enable-thread-safety by fixing the broken locking
 > and simply making it the default.

I haven't noticed any slowdown in the single-threaded case, although I
 haven't stressed it hard enough to find out for certain. From a
 theoretical standpoint it shouldn't, atomics (with the proper memory
orders) are usually (on x86 at least) as cheap as normal accesses when used by a single thread, and the algorithm is otherwise effectively the
 same as the original hash table.

 How difficult would it be to fix the locking (or rather, what's
 "broken")? We would definitely benefit from having thread-safety at
 least for getters, which would only need locks around the internal
 management.

To be honest I don't know how badly it is broken.
It is only implemented for libelf.
If you configure --enable-thread-safety and make check you will see
several tests fail because they abort with Unexpected error: Resource
deadlock avoided.

I think it is mainly that nobody maintained the locks and now some are
just wrongly placed. Ideally we switch --enable-thread-safety on by
default, identify which locks are wrongly placed, run all tests with
valgrind/hellgrind and fix any issues found.

It really has not been a priority. Sorry.

No worries, its not a priority on our end either. Elfutils' codebase is significantly simpler (IMHO) than ours, so if it ever comes up we'll just submit another patch.


 > >   - Another implementation of #2 above might use dynamic TLS
 > >     (pthread_key_*),
> > we chose this implementation to reduce the overall complexity.
 >
 > Are there any other trade-offs?

If the application spawns N threads that all use a Dwarf object (same or different) enough to cause an allocation, and then joins them all, any Dwarf objects allocated after will allocate N unusable slots in the mem_tails array. Thus long-running applications (that don't use thread pools) would start experiencing effects similar to a memory leak, of 1
 pointer's worth (8 bytes on 64-bit) per dead thread.

 The alternative is to use dynamic TLS so that only threads that are
 currently active use the extra memory, assuming libpthread is
sufficiently proactive about reclaiming unused key values. I think if we assume `dwarf_end` happens-after any memory management (which would make sense for a destructor), there should be a simple atomic pattern to handle the freeing, but I would need to sit down for a while to work
 out a sufficient proof.

 I was also under the impression that dynamic TLS was particularly
 expensive performance-wise, although I haven't experimented with it
 myself enough to know. The compiler can be a little smarter about
static TLS, and the result is easier to reason about, which is why we chose this implementation for initial patch. If the alternative would
 be preferable we can change it.

I thought a bit about this one and although I am slightly worried about
the possible indefinite growing of the thread_ids, I don't think the
"memory leak" is an issue. Concurrent usage of the Dwarf object already costs a bit more memory (since each thread gets its own memory block if they allocate at the same time) which is probably larger than any extra
created by reserving space for all possible thread_ids. This is only
really a problem for a program that doesn't use thread pools and keeps
opening and concurrently accessing new Dwarf objects (because at a
certain point the whole Dwarf will have been read and no new
allocations happen). Although it would be nice if we could somehow
reset the next_id to zero in dwarf_end (), when this is the last thread
or Dwarf object.

The memory overhead is a little worse than that (each thread allocates its own memory blocks, period), but that would be present in both implementations. I can't think of a simple way to increase the memory efficiency past that (although I can think of some ridiculously complex ways).

I suppose it would be possible to use a sort of free-list for the IDs, although that requires a hook at thread exit (doable with PThreads, not with C11) and cleaning up would be a bit of a nightmare. At some point dynamic TLS is more robust against weird situations (and, if my thought-proof is correct, simpler).


Cheers,

Mark


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]