This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.



Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()




On Mon, Oct 28, 2019 at 14:26, Mark Wielaard <mark@klomp.org> wrote:
> On Sat, 2019-10-26 at 19:56 -0500, Jonathon Anderson wrote:
> > On Sun, Oct 27, 2019 at 00:50, Mark Wielaard <mark@klomp.org> wrote:
> > > I see that getconf PTHREAD_KEYS_MAX gives 1024 on my machine.
> > > Is this tunable in any way?
> >
> > From what I can tell, no. A quick google search suggests as much,
> > and it's even #defined as 1024 on my machine.

> I see, it is a hardcoded constant per architecture, but it seems every
> architecture simply uses 1024. I am afraid that kind of rules out
> having a pthread_key per Dwarf object. It is not that large a number.
>
> Programs are sometimes linked against 50 to 100 shared libraries; if
> they use dwz/alt-files, that already means several hundred potentially
> open Dwarf objects. It wouldn't be that crazy to have all of them
> open at the same time. That might not reach the limit yet, but I think
> in practice you could come close to half very easily. And with split-
> dwarf every CU basically turns into a Dwarf object, which can easily go
> past 1024.
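
For reference, the limit can also be checked at runtime rather than via
getconf; a minimal standalone sketch (not part of the patch, plain POSIX
only, compile with -pthread) that also shows what happens when one key
per Dwarf runs into it:

#include <limits.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

int
main (void)
{
  /* Compile-time constant; glibc hardcodes this to 1024.  */
  printf ("PTHREAD_KEYS_MAX:    %d\n", PTHREAD_KEYS_MAX);
  /* Runtime value; -1 means indeterminate.  */
  printf ("_SC_THREAD_KEYS_MAX: %ld\n", sysconf (_SC_THREAD_KEYS_MAX));

  /* With one pthread_key_t per Dwarf, opening more Dwarfs than this
     makes pthread_key_create fail with EAGAIN.  */
  pthread_key_t key;
  unsigned created = 0;
  while (pthread_key_create (&key, NULL) == 0)
    created++;
  printf ("keys available here: %u\n", created);
  return 0;
}

A process that already uses some keys of its own has correspondingly
less headroom than the full 1024.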

> > There may be other ways to handle this, I'm high-level considering
> > potential alternatives (with more atomics, of course). The
> > difficulty is mostly in providing the same performance in the
> > single-threaded case.
> >
> > > You already have a Dwarf *. I would consider adding some sort of
> > > clone function which creates a shallow Dwarf * with its own
> > > embedded allocator or something like that.
> >
> > The downside with this is that it's an API addition, which we (the
> > Dyninst + HPCToolkit projects) would need to enforce. Which isn't a
> > huge deal for us, but I will need to make a case to those teams to
> > make the shift.
> >
> > On the upside, it does provide a very understandable semantic in the
> > case of parallelism. For an API without synchronization clauses,
> > this would put our work back into the realm of "reasonably correct"
> > (from "technically incorrect but works").
> >
> > > Could someone give an example of this pattern?
> > > I don't fully understand what is being proposed and how the
> > > interface would look exactly.

> > An application would do something along these lines:
> >
> >   Dwarf* dbg = dwarf_begin(...);
> >   Dwarf* dbg2 = dwarf_clone(dbg, ...);
> >   pthread_create(worker, ...);
> >   // ...
> >   dwarf_get_units(dbg, ...);
> >   // ...
> >   pthread_join(worker);
> >   dwarf_end(dbg);
> >
> >   // worker:
> >   // ...
> >   dwarf_getabbrev(...);
> >   // ...
> >   dwarf_end(dbg2);
> >
> > The idea being that dbg2 and dbg share most of the same internal
> > state, but concurrent access to said state is between Dwarfs (or
> > "Dwarf_Views", maybe?), and the state is cleaned up on the
> > original's dwarf_end. I suppose in database terms the Dwarfs are
> > acting like separate "cursors" for the internal DWARF data. For this
> > particular instance, the "top of stack" pointers would be in dbg and
> > dbg2 (the non-shared state), while the atomic mem_tail would be part
> > of the internal (shared) state.
> >
> > I'm not sure how viable implementing this sort of thing would be; it
> > might end up overhauling a lot of internals, and I'm not familiar
> > enough with all the components of the API to know whether there
> > would be some quirks with this style.
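
To make the sketch above a little more concrete, here is roughly what a
caller could look like with real pthread boilerplate. dwarf_clone and
its single-argument signature are hypothetical (only dwarf_begin,
dwarf_nextcu and dwarf_end below are existing libdw API), so this is
just an illustration of the intended semantics, with error checking
omitted:

#include <elfutils/libdw.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical API under discussion; not part of libdw today.  */
extern Dwarf *dwarf_clone (Dwarf *dbg);

static void *
worker (void *arg)
{
  Dwarf *dbg2 = arg;            /* The clone/"View" owned by this thread.  */
  Dwarf_Off off = 0, next;
  size_t hsize;
  /* Abbrevs and DIEs read here would go through dbg2's non-shared
     allocator state.  */
  while (dwarf_nextcu (dbg2, off, &next, &hsize, NULL, NULL, NULL) == 0)
    off = next;
  dwarf_end (dbg2);             /* Releases only the View, not the shared state.  */
  return NULL;
}

int
main (int argc, char **argv)
{
  (void) argc;                  /* Error checking omitted throughout.  */
  int fd = open (argv[1], O_RDONLY);
  Dwarf *dbg = dwarf_begin (fd, DWARF_C_READ);

  Dwarf *dbg2 = dwarf_clone (dbg);          /* Shares dbg's internal state.  */
  pthread_t tid;
  pthread_create (&tid, NULL, worker, dbg2);

  /* The main thread keeps using the original handle concurrently.  */
  Dwarf_Off off = 0, next;
  size_t hsize;
  while (dwarf_nextcu (dbg, off, &next, &hsize, NULL, NULL, NULL) == 0)
    off = next;

  pthread_join (tid, NULL);
  dwarf_end (dbg);                          /* Tears down the shared state.  */
  close (fd);
  return 0;
}

The point being that dwarf_end on a clone only drops its view-local
bookkeeping, while the original's dwarf_end frees everything, matching
the "cleaned up on the original's dwarf_end" semantics quoted above.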

> So they would have separate lazy DWARF DIE/abbrev readers and separate
> allocators? And any abbrevs read in the clone would just be thrown away
> after a dwarf_end?

Separate allocators but the same lazy DIE/abbrev readers, most likely. So a Dwarf would be split into the concurrent Dwarf_Shared and non-concurrent Dwarf "View", maybe something like:

struct Dwarf_Shared
{
 pthread_rwlock_t rwl; /* For all currently non-concurrent internals */
 Elf *elf;
 char *debugdir;
 Dwarf *alt_dwarf;
 Elf_Data *sectiondata[IDX_last];
 bool other_byte_order;
 bool free_elf;
 int alt_fd;
 struct pubnames_s
 {
   Dwarf_Off cu_offset;
   Dwarf_Off set_start;
   unsigned int cu_header_size;
   int address_len;
 } *pubnames_sets;
 size_t pubnames_nsets;
 void *cu_tree;
 /* Dwarf_Off next_cu_offset; // Moved to Dwarf View */
 void *tu_tree;
 /* Dwarf_Off next_tu_offset; // Moved to Dwarf View */
 Dwarf_Sig8_Hash sig8_hash;
 void *split_tree;
 void *macro_ops;
 void *files_lines;
 Dwarf_Aranges *aranges;
 struct Dwarf_CFI_s *cfi;
 struct Dwarf_CU *fake_loc_cu;
 struct Dwarf_CU *fake_loclists_cu;
 struct Dwarf_CU *fake_addr_cu;
 /* pthread_key_t mem_key; // Implemented differently */
 atomic_uintptr_t shared_mem_tail;
 size_t mem_default_size;
 /* Dwarf_OOM oom_handler; // Moved to Dwarf View, maybe */
};
struct Dwarf /*View*/
{
 bool free_shared; /* If true, we handle cleaning up on dwarf_end */
 struct Dwarf_Shared *shared; /* All cloned() Views share the same Shared in the back */
 Dwarf_Off next_cu_offset;
 Dwarf_Off next_tu_offset;
 struct libdw_memblock *mem_tail;
 Dwarf_OOM oom_handler;
};

So almost everything lives in Dwarf_Shared, and the bits that really only make sense when used serially are part of the View. Allocations are then served from a View-local stack, while every block is still pushed (as it is now) onto the Shared stack so it can all be freed at the original's dwarf_end.
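
To illustrate that split, here is a rough, untested sketch of what
dwarf_clone and the View-local allocation path could look like, reusing
the field names from the structs above (struct libdw_memblock has the
same shape as the existing one in libdwP.h). None of this is
implemented; it is just the "local bump allocation, shared deallocation
list" idea spelled out:

#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Same shape as the existing libdw_memblock in libdwP.h.  */
struct libdw_memblock
{
  size_t size;
  size_t remaining;
  struct libdw_memblock *prev;  /* Link in the shared deallocation list.  */
  char mem[];
};

/* Create a new View sharing DBG's internal state.  */
Dwarf *
dwarf_clone (Dwarf *dbg)
{
  Dwarf *view = calloc (1, sizeof (Dwarf));
  if (view == NULL)
    return NULL;
  view->shared = dbg->shared;     /* Same Dwarf_Shared behind both.  */
  view->free_shared = false;      /* Only the original tears it down.  */
  view->mem_tail = NULL;          /* Private bump-allocator state.  */
  view->oom_handler = dbg->oom_handler;
  return view;
}

/* Allocate SIZE bytes from the View-local block; new blocks are also
   published on the shared list so the original's dwarf_end can free
   everything.  Alignment and OOM handling omitted for brevity.  */
void *
libdw_alloc_view (Dwarf *view, size_t size)
{
  struct libdw_memblock *tail = view->mem_tail;
  if (tail == NULL || tail->remaining < size)
    {
      size_t block_size = view->shared->mem_default_size;
      tail = malloc (sizeof *tail + block_size);
      tail->size = block_size;
      tail->remaining = block_size;

      /* Lock-free push onto the shared deallocation list; this CAS is
         the only cross-thread traffic on the allocation path.  */
      uintptr_t old = atomic_load (&view->shared->shared_mem_tail);
      do
        tail->prev = (struct libdw_memblock *) old;
      while (!atomic_compare_exchange_weak (&view->shared->shared_mem_tail,
                                            &old, (uintptr_t) tail));

      view->mem_tail = tail;      /* Only this View bump-allocates from it.  */
    }
  void *result = tail->mem + (tail->size - tail->remaining);
  tail->remaining -= size;
  return result;
}

The single-threaded fast path stays a plain pointer bump on the View's
own block; threads only contend on the compare-and-swap when a fresh
block is published.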


> In general I think this style is not as nice as having a shared state
> of the Dwarf object that is only manipulated in a thread-safe manner.

I tend to prefer styles like this only because they give a clean split defining what can and cannot be concurrent, even logically (what makes sense when iterating CUs from multiple threads? does the OOM handler have to be thread-safe?). Otherwise you end up trying to solve an N^2 problem of how all the moving pieces interact (and it often requires knowledge of the code to know which pieces interact at all).

> You did have an earlier implementation that didn't use pthread_keys.
> Maybe we should just fall back to that one?

I can revive and rebase it; it's on my list to do ASAP after my other more pressing duties. It still has quirks that bother me a little (like not doing well without a thread pool), but it'll work fine.


> Thanks,
>
> Mark

