[PATCH] libdw: add thread-safety to dwarf_getabbrev()

Jonathon Anderson jma14@rice.edu
Mon Oct 28 15:32:00 GMT 2019



On Mon, Oct 28, 2019 at 14:26, Mark Wielaard <mark@klomp.org> wrote:
> On Sat, 2019-10-26 at 19:56 -0500, Jonathon Anderson wrote:
>>  On Sun, Oct 27, 2019 at 00:50, Mark Wielaard <mark@klomp.org> wrote:
>>  >
>>  > I see that getconf PTHREAD_KEYS_MAX gives 1024 on my machine.
>>  > Is this tunable in any way?
>> 
>>  From what I can tell, no. A quick Google search indicates as much,
>>  and it's even #defined as 1024 on my machine.
> 
> I see, it is a hardcoded constant per architecture, but it seems every
> architecture simply uses 1024. I am afraid that kind of rules out
> having a pthread_key per Dwarf object. It is not that large a number.
> 
> Programs are sometimes linked against 50 to 100 shared libraries; if
> they use dwz/alt-files the number of potentially open Dwarf objects
> is already several hundred. It wouldn't be that crazy to have all of
> them open at the same time. That might not reach the limit yet, but I
> think in practice you could come close to half very easily. And with
> split-dwarf every CU basically turns into a Dwarf object, which can
> easily go past 1024.
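
For what it's worth, the limit can also be queried at runtime through
sysconf; a trivial check (in case some libc ever makes it dynamic), just
as an illustration:

/* Quick check of the per-process pthread key limit; on glibc this just
   reports the same hardcoded 1024.  */
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int
main (void)
{
#ifdef PTHREAD_KEYS_MAX
  printf ("PTHREAD_KEYS_MAX (compile time): %d\n", PTHREAD_KEYS_MAX);
#endif
  printf ("_SC_THREAD_KEYS_MAX (runtime):   %ld\n",
          sysconf (_SC_THREAD_KEYS_MAX));
  return 0;
}
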
> 
>>  > >  There may be other ways to handle this, I'm high-level
>>  > >  considering potential alternatives (with more atomics, of
>>  > >  course). The difficulty is mostly in providing the same
>>  > >  performance in the single-threaded case.
>>  > >
>>  > >  > You already have a Dwarf *.  I would consider adding some
>>  > >  > sort of clone function which creates a shallow Dwarf * with
>>  > >  > its own embedded allocator or something like that.
>>  > >
>>  > >  The downside with this is that it's an API addition, which we
>>  > >  (the Dyninst + HPCToolkit projects) would need to enforce.
>>  > >  Which isn't a huge deal for us, but I will need to make a case
>>  > >  to those teams to make the shift.
>>  > >
>>  > >  On the upside, it does provide a very understandable semantic
>>  > >  in the case of parallelism. For an API without synchronization
>>  > >  clauses, this would put our work back into the realm of
>>  > >  "reasonably correct" (from "technically incorrect but works").
>>  >
>>  > Could someone give an example of this pattern?
>>  > I don't fully understand what is being proposed and how the
>>  > interface would look exactly.
>> 
>>  An application would do something along these lines:
>> 
>>  Dwarf* dbg = dwarf_begin(...);
>>  Dwarf* dbg2 = dwarf_clone(dbg, ...);
>>  pthread_create(worker, ...);
>>  // ...
>>  dwarf_get_units(dbg, ...);
>>  // ...
>>  pthread_join(worker);
>>  dwarf_end(dbg);
>> 
>>  // worker:
>>  // ...
>>  dwarf_getabbrev(...);
>>  // ...
>>  dwarf_end(dbg2);
>> 
>>  The idea being that dbg2 and dbg share most of the same internal
>>  state, but concurrent access to said state is handled between
>>  Dwarfs (or "Dwarf_Views", maybe?), and the state is cleaned up on
>>  the original's dwarf_end. I suppose in database terms the Dwarfs
>>  are acting like separate "cursors" for the internal DWARF data. For
>>  this particular instance, the "top of stack" pointers would be in
>>  dbg and dbg2 (the non-shared state), while the atomic mem_tail
>>  would be part of the internal (shared) state.
>> 
>>  I'm not sure how viable implementing this sort of thing would be;
>>  it might end up overhauling a lot of internals, and I'm not
>>  familiar enough with all the components of the API to know whether
>>  there would be some quirks with this style.
> 
> So they would have separate lazy DWARF DIE/abbrev readers and
> separate allocators? And any abbrevs read in the clone would just be
> thrown away after a dwarf_end?

Separate allocators but the same lazy DIE/abbrev readers, most likely. 
So a Dwarf would be split into the concurrent Dwarf_Shared and 
non-concurrent Dwarf "View", maybe something like:

struct Dwarf_Shared
{
  pthread_rwlock_t rwl; /* For all currently non-concurrent internals */
  Elf *elf;
  char *debugdir;
  Dwarf *alt_dwarf;
  Elf_Data *sectiondata[IDX_last];
  bool other_byte_order;
  bool free_elf;
  int alt_fd;
  struct pubnames_s
  {
    Dwarf_Off cu_offset;
    Dwarf_Off set_start;
    unsigned int cu_header_size;
    int address_len;
  } *pubnames_sets;
  size_t pubnames_nsets;
  void *cu_tree;
  /* Dwarf_Off next_cu_offset; // Moved to Dwarf View */
  void *tu_tree;
  /* Dwarf_Off next_tu_offset; // Moved to Dwarf View */
  Dwarf_Sig8_Hash sig8_hash;
  void *split_tree;
  void *macro_ops;
  void *files_lines;
  Dwarf_Aranges *aranges;
  struct Dwarf_CFI_s *cfi;
  struct Dwarf_CU *fake_loc_cu;
  struct Dwarf_CU *fake_loclists_cu;
  struct Dwarf_CU *fake_addr_cu;
  /* pthread_key_t mem_key; // Implemented differently */
  atomic_uintptr_t shared_mem_tail;
  size_t mem_default_size;
  /* Dwarf_OOM oom_handler; // Moved to Dwarf View, maybe */
};
struct Dwarf /* View */
{
  bool free_shared; /* If true, we handle cleaning up on dwarf_end.  */
  struct Dwarf_Shared *shared; /* All cloned Views share the same
                                  Shared in the back.  */
  Dwarf_Off next_cu_offset;
  Dwarf_Off next_tu_offset;
  struct libdw_memblock *mem_tail;
  Dwarf_OOM oom_handler;
};

So most everything lives in Dwarf_Shared, and the bits that really only 
make sense when done serially are part of the View. Allocations are 
then served from a View-local stack, while every block is also pushed, 
as it is now, onto the Shared stack for deallocation.
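
Roughly, the allocation path could then look like the sketch below. 
This is only a sketch against the structs above; view_alloc and the 
all_next field on libdw_memblock are hypothetical names, not current 
libdw code, and alignment/OOM handling are largely elided:

#include <stdatomic.h>
#include <stdlib.h>

struct libdw_memblock
{
  size_t size;
  size_t remaining;
  struct libdw_memblock *prev;  /* View-local chain (top of stack).  */
  atomic_uintptr_t all_next;    /* Shared chain, only walked by the final
                                   dwarf_end.  (Hypothetical field.)  */
  char mem[0];
};

/* Allocate SIZE bytes on behalf of the calling View.  Only the push onto
   the Shared list needs to be atomic; the View-local top-of-stack pointer
   is only ever touched by one thread.  */
static void *
view_alloc (Dwarf *view, size_t size)
{
  struct libdw_memblock *top = view->mem_tail;
  if (top == NULL || top->remaining < size)
    {
      size_t blocksize = view->shared->mem_default_size;
      if (blocksize < size)
        blocksize = size;
      top = malloc (sizeof *top + blocksize);
      if (top == NULL)
        return NULL;  /* Real code would call the OOM handler.  */
      top->size = blocksize;
      top->remaining = blocksize;
      top->prev = view->mem_tail;
      view->mem_tail = top;

      /* Publish the block on the Shared list so the final dwarf_end can
         free everything, no matter which View allocated it.  */
      uintptr_t old = atomic_load (&view->shared->shared_mem_tail);
      do
        atomic_store (&top->all_next, old);
      while (!atomic_compare_exchange_weak (&view->shared->shared_mem_tail,
                                            &old, (uintptr_t) top));
    }
  void *ptr = &top->mem[top->size - top->remaining];
  top->remaining -= size;
  return ptr;
}

The single-threaded fast path stays a plain pointer bump; the CAS only 
happens when a new block is carved out.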

> 
> In general I think this style is not as nice as having a shared state
> of the Dwarf object that is only manipulated in a thread-safe manner.

I tend to prefer styles like this, if only because they give a clean 
split defining what can and cannot be concurrent, even logically (what 
makes sense when iterating CUs from multiple threads? does the OOM 
handler have to be thread-safe?). Otherwise you end up answering an N^2 
problem of how all the moving pieces interact (and it often requires 
knowledge of the code to know which pieces interact at all).
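
To make that split concrete, the clone/end lifecycle might look roughly 
like the sketch below; dwarf_clone is hypothetical, the cleanup reuses 
the all_next field from the allocation sketch above, and the rest of the 
Dwarf_Shared teardown is elided:

/* Hypothetical: create a new View over the same Shared state.  */
Dwarf *
dwarf_clone (Dwarf *view)
{
  Dwarf *clone = calloc (1, sizeof *clone);
  if (clone == NULL)
    return NULL;
  clone->free_shared = false;     /* Only the original tears down Shared.  */
  clone->shared = view->shared;
  clone->mem_tail = NULL;         /* Each View gets its own allocation stack  */
  clone->next_cu_offset = 0;      /* ...and its own CU/TU iteration cursors.  */
  clone->next_tu_offset = 0;
  clone->oom_handler = view->oom_handler;
  return clone;
}

/* Sketch of how dwarf_end could behave under this scheme: clones drop
   only their View, the original additionally frees everything on the
   Shared list.  Not the current libdw implementation.  */
int
dwarf_end (Dwarf *view)
{
  if (view->free_shared)
    {
      struct libdw_memblock *p = (struct libdw_memblock *)
        atomic_load (&view->shared->shared_mem_tail);
      while (p != NULL)
        {
          struct libdw_memblock *next = (struct libdw_memblock *)
            atomic_load (&p->all_next);
          free (p);
          p = next;
        }
      /* ... the rest of the Dwarf_Shared cleanup (elf, trees, locks)
         would go here ... */
      free (view->shared);
    }
  free (view);
  return 0;
}

That keeps the "who frees what" question out of the hot paths entirely; 
the only cross-View coordination is the atomic list push.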

> You did have an earlier implementation that didn't use pthread_keys.
> Maybe we should just fall back to that one?

I can revive and rebase it; it's on my list to do ASAP after my other, 
more pressing duties. It still has quirks that bother me a little (like 
not doing well without a thread pool), but it'll work fine.

> 
> Thanks,
> 
> Mark


