Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()


Hello Florian Weimer,

I'm the original author of this patch, so I'll try to answer what I can.

For some overall perspective, this patch replaces the original libdw allocator with a thread-safe variant. The original acts both as a suballocator (to keep from paying the malloc tax on frequent small allocations) and as a garbage-collection list (to free all the internal structures on dwarf_end). The patch attempts to replicate the same overall behavior in the more volatile parallel case.
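To make that concrete, here is a rough sketch of the single-threaded pattern being replaced (illustrative names only, not the actual libdw code): the handle bump-allocates out of a chain of blocks and frees the whole chain in one sweep when the handle is closed.

#include <stdlib.h>

/* Illustrative only -- not the actual libdw structures.  */
struct mem_block
{
  struct mem_block *prev;   /* previously filled block */
  size_t remaining;         /* bytes left in this block */
  char *next_byte;          /* next free byte */
  char data[];
};

struct handle
{
  struct mem_block *top;    /* current block to allocate from */
};

#define BLOCK_SIZE (64 * 1024)

static void *
suballoc (struct handle *h, size_t size)
{
  /* Real code would also round SIZE up for alignment and assumes
     SIZE <= BLOCK_SIZE.  */
  if (h->top == NULL || h->top->remaining < size)
    {
      struct mem_block *b = malloc (sizeof *b + BLOCK_SIZE);
      if (b == NULL)
        return NULL;
      b->prev = h->top;
      b->remaining = BLOCK_SIZE;
      b->next_byte = b->data;
      h->top = b;
    }
  void *p = h->top->next_byte;
  h->top->next_byte += size;
  h->top->remaining -= size;
  return p;
}

static void
handle_end (struct handle *h)
{
  /* The block chain doubles as the garbage-collection list: everything
     ever returned by suballoc is released here in one sweep.  */
  while (h->top != NULL)
    {
      struct mem_block *prev = h->top->prev;
      free (h->top);
      h->top = prev;
    }
}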

On Sat, Oct 26, 2019 at 18:14, Florian Weimer <fw@deneb.enyo.de> wrote:
> * Mark Wielaard:
>
>> I'll see if I can create a case where that is a problem. Then we can
>> see how to adjust things to use less pthread_keys. Is there a different
>> pattern we can use?
>
> It's unclear what purpose thread-local storage serves in this context.

The thread-local storage provides the suballocator side: for each Dwarf, each thread has its own "top block" to allocate from. To keep this simple, each Dwarf has its own pthread key, giving threads local storage specific to that Dwarf. Or at least that was the intent; I didn't think to consider the limit on the number of pthread keys, since we never ran into it in our use cases.
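Roughly, the per-Dwarf key works like this (an illustrative sketch, reusing the mem_block idea from above; new_block is a hypothetical helper, and the real patch differs in the details):

#include <pthread.h>

struct mem_block;

struct handle
{
  pthread_key_t mem_key;   /* per-thread "top block" is stored under this key */
  /* ... plus a shared list of every block, so dwarf_end can free them ... */
};

/* Hypothetical helper, not shown: allocate a fresh block and record it
   in the handle's shared block list for later cleanup.  */
static struct mem_block *new_block (struct handle *h);

static int
handle_init (struct handle *h)
{
  /* One key per handle: open enough handles at once and this is where
     the limit on pthread keys bites.  */
  return pthread_key_create (&h->mem_key, NULL);
}

static struct mem_block *
thread_top_block (struct handle *h)
{
  struct mem_block *top = pthread_getspecific (h->mem_key);
  if (top == NULL)
    {
      top = new_block (h);
      pthread_setspecific (h->mem_key, top);
    }
  /* Each thread bump-allocates from its own top block, so no two threads
     ever touch the same block concurrently.  */
  return top;
}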

There may be other ways to handle this; I'm considering potential alternatives at a high level (with more atomics, of course), one rough direction is sketched below. The difficulty is mostly in providing the same performance in the single-threaded case.
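For example, the garbage-collection half could drop the per-Dwarf key and register blocks on a lock-free list with C11 atomics. This is illustrative only, not what the current patch does:

#include <stdatomic.h>

struct mem_block
{
  struct mem_block *prev;
  /* ... bump-allocation fields as in the earlier sketch ... */
};

struct handle
{
  _Atomic (struct mem_block *) all_blocks;  /* every block handed out so far */
};

static void
register_block (struct handle *h, struct mem_block *b)
{
  /* Lock-free push: no pthread key and no mutex, just a CAS loop.
     dwarf_end-style cleanup can later walk all_blocks and free each one.  */
  struct mem_block *old = atomic_load (&h->all_blocks);
  do
    b->prev = old;
  while (!atomic_compare_exchange_weak (&h->all_blocks, &old, b));
}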

> You already have a Dwarf *.  I would consider adding some sort of
> clone function which creates a shallow Dwarf * with its own embedded
> allocator or something like that.

The downside with this is that it's an API addition, which we (the Dyninst + HPCToolkit projects) would need to adopt and enforce. That isn't a huge deal for us, but I will need to make a case to those teams to make the shift.

On the upside, it does provide very understandable semantics for the parallel case. For an API without synchronization clauses, this would put our work back into the realm of "reasonably correct" (up from "technically incorrect but works").
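For illustration, this is roughly the shape of interface I read into that suggestion; the names here are purely hypothetical, not a concrete proposal:

#include <elfutils/libdw.h>

/* Hypothetical declarations, for illustration only.  */
extern Dwarf *dwarf_clone (Dwarf *parent);  /* shares the DWARF data, owns its own allocator */
extern int dwarf_clone_end (Dwarf *clone);  /* frees only clone-local allocations */

/* One clone per worker thread; no synchronization needed between them.  */
static void *
worker (void *arg)
{
  Dwarf *dw = dwarf_clone ((Dwarf *) arg);
  if (dw == NULL)
    return NULL;

  /* ... walk CUs/DIEs/abbrevs through `dw' as usual ... */

  dwarf_clone_end (dw);
  return NULL;
}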

> This assumes that memory allocation
> is actually a performance problem, otherwise you could let malloc
> handle the details.

In our (Dyninst + HPCToolkit) work, we have found that malloc tends to be slow in the multithreaded case, particularly with many small allocations. The glibc implementation (which most of our clients use) takes a per-arena mutex to provide thread-safety. While we could do a lot better in our own projects with regard to memory management, the fact remains that malloc alone is a notable factor in the performance of libdw.
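As a rough illustration (a toy benchmark, not data from our projects), something along these lines shows the effect; the numbers depend heavily on the allocator, its arena/tcache behavior, the thread count, and the machine:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NTHREADS 8
#define NALLOCS  (1L << 20)

/* Toy benchmark only: hammer malloc/free with small allocations from
   several threads and report the wall-clock time.  */
static void *
worker (void *arg)
{
  (void) arg;
  void *live[64] = { NULL };
  for (long i = 0; i < NALLOCS; ++i)
    {
      free (live[i % 64]);
      live[i % 64] = malloc (32);   /* small, frequent allocation */
    }
  for (int j = 0; j < 64; ++j)
    free (live[j]);
  return NULL;
}

int
main (void)
{
  pthread_t tids[NTHREADS];
  struct timespec t0, t1;

  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (int i = 0; i < NTHREADS; ++i)
    pthread_create (&tids[i], NULL, worker, NULL);
  for (int i = 0; i < NTHREADS; ++i)
    pthread_join (tids[i], NULL);
  clock_gettime (CLOCK_MONOTONIC, &t1);

  printf ("%d threads: %.3f s\n", NTHREADS,
          (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
  return 0;
}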

Hopefully this helps shed a little light on the issue.

-Jonathon
