This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
On Fri, Aug 23, 2019 at 1:25 PM, Mark Wielaard <mark@klomp.org> wrote:
Hi, On Wed, 2019-08-21 at 09:08 -0500, Jonathon Anderson wrote:On Wed, Aug 21, 2019 at 6:16 AM, Mark Wielaard <mark@klomp.org> wrote:On Fri, 2019-08-16 at 14:24 -0500, Jonathon Anderson wrote:> > For parallel applications that need the information in the DIEs, the> > Dwarf_Abbrev hash table et al. become a massive data race. This > > fixes > > that by: > >> > 1. Adding atomics & locks to the hash table to manage concurrency> > (lib/dynamicsizehash_concurrent.{c,h}) > > 2. Adding a lock & array structure to the memory manager > > (pseudo-TLS) > > (libdwP.h, libdw_alloc.c) > > 3. Adding extra configure options for Helgrind/DRD annotations > > (configure.ac) > > 4. Including "stdatomic.h" from FreeBSD, to support C11-style > > atomics. > > (lib/stdatomic.h) > > This looks like really nice work. Thanks! > > I am splitting review in some smaller parts if you don't mind.> Simply because it is large and I cannot keep everything in my head at> once :)BTW. I would prefer to handle this as 4 separate additions, probably inthis order: 1) configure stuff for valgrind annotations. 2) add support for stdatomic.h functions. 3) thread-safe obstack memory handling 4) concurrent dynamic hash table.
Sure thing, I can split the patch into bits over the weekend. I may take your advice and just use git request-pull though.
> If the compiler provides stdatomic.h then I think it would be good > to > use that instead of our own implementation. The copyright isn't a> problem. But do you have a reference/URL to the upstream version? I > like to add that somewhere, so we can sync with it in the future. I> see> various commented out parts. Was that already upstream? Should we just> remove those parts?It would definitely be preferable to use the compiler's implementation if possible, we used this in case GCC 4.7 and 4.8 (RHEL7) compatibilitywas needed. If those versions are old enough the file can be removed entirely. The upstream is at https://github.com/freebsd/freebsd/blob/master/sys/sys/stdatomic.h,although the version here appears to be slightly modified (we used theversion that HPCToolkit ships). The components we use don't seem affected, so a resync shouldn't make a difference.OK, then we should come up with some kind of configure test to see if we can use the standard stdatomic.h and otherwise use our own. I am surprised I cannot find other projects doing this. Would be nice to "steal" something standard for this.
At least OpenSSL does it: https://github.com/openssl/openssl/blob/master/include/internal/refcount.h, the interesting note being that it has a series of fallbacks (various compiler builtins and then locks). The other projects I skimmed just have the fallbacks and don't check for C11, given that Elfutils only supports GCC that might be a valid (and more compact) approach.
> > - Currently the concurrent hash table is always enabled, > > performance-wise there is no known difference between it > > and the non-concurrent version. > > This can be changed to toggle with --enable-thread-safety > > if preferred. >> I would prefer it always enabled, unless there is a massive slowdown> of> the single-threaded case. The problem with --enable-thread-safety is > that it is a) known broken (sigh) and b) it basically introduces two > separate libraries that behave subtly differently. I would very much > like to get rid of --enable-thread-safety by fixing the broken locking> and simply making it the default.I haven't noticed any slowdown in the single-threaded case, although Ihaven't stressed it hard enough to find out for certain. From a theoretical standpoint it shouldn't, atomics (with the proper memoryorders) are usually (on x86 at least) as cheap as normal accesses when used by a single thread, and the algorithm is otherwise effectively thesame as the original hash table. How difficult would it be to fix the locking (or rather, what's "broken")? We would definitely benefit from having thread-safety at least for getters, which would only need locks around the internal management.To be honest I don't know how badly it is broken. It is only implemented for libelf. If you configure --enable-thread-safety and make check you will see several tests fail because they abort with Unexpected error: Resource deadlock avoided. I think it is mainly that nobody maintained the locks and now some are just wrongly placed. Ideally we switch --enable-thread-safety on by default, identify which locks are wrongly placed, run all tests with valgrind/hellgrind and fix any issues found. It really has not been a priority. Sorry.
No worries, its not a priority on our end either. Elfutils' codebase is significantly simpler (IMHO) than ours, so if it ever comes up we'll just submit another patch.
> > - Another implementation of #2 above might use dynamic TLS > > (pthread_key_*),> > we chose this implementation to reduce the overall complexity.> > Are there any other trade-offs?If the application spawns N threads that all use a Dwarf object (same or different) enough to cause an allocation, and then joins them all, any Dwarf objects allocated after will allocate N unusable slots in the mem_tails array. Thus long-running applications (that don't use thread pools) would start experiencing effects similar to a memory leak, of 1pointer's worth (8 bytes on 64-bit) per dead thread. The alternative is to use dynamic TLS so that only threads that are currently active use the extra memory, assuming libpthread issufficiently proactive about reclaiming unused key values. I think if we assume `dwarf_end` happens-after any memory management (which would make sense for a destructor), there should be a simple atomic pattern to handle the freeing, but I would need to sit down for a while to workout a sufficient proof. I was also under the impression that dynamic TLS was particularly expensive performance-wise, although I haven't experimented with it myself enough to know. The compiler can be a little smarter aboutstatic TLS, and the result is easier to reason about, which is why we chose this implementation for initial patch. If the alternative wouldbe preferable we can change it.I thought a bit about this one and although I am slightly worried aboutthe possible indefinite growing of the thread_ids, I don't think the"memory leak" is an issue. Concurrent usage of the Dwarf object already costs a bit more memory (since each thread gets its own memory block if they allocate at the same time) which is probably larger than any extracreated by reserving space for all possible thread_ids. This is only really a problem for a program that doesn't use thread pools and keeps opening and concurrently accessing new Dwarf objects (because at a certain point the whole Dwarf will have been read and no new allocations happen). Although it would be nice if we could somehowreset the next_id to zero in dwarf_end (), when this is the last threador Dwarf object.
The memory overhead is a little worse than that (each thread allocates its own memory blocks, period), but that would be present in both implementations. I can't think of a simple way to increase the memory efficiency past that (although I can think of some ridiculously complex ways).
I suppose it would be possible to use a sort of free-list for the IDs, although that requires a hook at thread exit (doable with PThreads, not with C11) and cleaning up would be a bit of a nightmare. At some point dynamic TLS is more robust against weird situations (and, if my thought-proof is correct, simpler).
Cheers, Mark
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |