This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()

From: Jonathon Anderson <jma14 at rice dot edu>
To: Mark Wielaard <mark at klomp dot org>
Cc: Florian Weimer <fw at deneb dot enyo dot de>, elfutils-devel at sourceware dot org, Srdan Milakovic <sm108 at rice dot edu>
Date: Mon, 28 Oct 2019 10:32:15 -0500
Subject: Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()
References: <1565983469.1826.0@smtp.mail.rice.edu> <20190824232438.GA2622@wildebeest.org> <1566695452.979.1@smtp.mail.rice.edu> <5ba06557703ee363e19488c994cbddd92ade25be.camel@klomp.org> <1566782688.5803.0@smtp.mail.rice.edu> <ba2c453b09ddac529a3aa122402380f5d7f82b2a.camel@klomp.org> <1566826627.5246.0@smtp.mail.rice.edu> <1566877968.10901.0@smtp.mail.rice.edu> <87tv7vg4l0.fsf@mid.deneb.enyo.de> <8fde453a2c570fa150aa39b0dabd0f925c7b0970.camel@klomp.org> <Mark Wielaard's message of "Sat, 26 Oct 2019 14:06:40 +0200">



On Mon, Oct 28, 2019 at 14:26, Mark Wielaard <mark@klomp.org> wrote:

On Sat, 2019-10-26 at 19:56 -0500, Jonathon Anderson wrote:
On Sun, Oct 27, 2019 at 00:50, Mark Wielaard <mark@klomp.org<mailto:mark@klomp.org>> wrote:
 >
 > I see that getconf PTHREAD_KEYS_MAX gives 1024 on my machine.
 > Is this tunable in any way?

  From what I can tell, no. A quick google search indicates as such,
 and its even #defined as 1024 on my machine.
I see, it is a hardcoded constant per architecture, but it seems every
architecture simply uses 1024. I am afraid that kind of rules out
having a pthread_key per Dwarf object. It is not that large a number.

Programs are sometimes linked against 50 till 100 shared libraries, if
they use dwz/alt-files that means the potential open Dwarf objects is
several hundred already. It wouldn't be that crazy to have all of them
open at the same time. That might not reach the limit yet, but I think
in practice you could come close to half very easily. And with split-
dwarf every CU basically turns into a Dwarf object, which can easilygo
past 1024.
> There may be other ways to handle this, I'm high-levelconsidering
 > >  potential alternatives (with more atomics, of course). The
 > > difficulty
> > is mostly in providing the same performance in thesingle-threaded
 > > case.
 > >
> > > You already have a Dwarf *. I would consider adding somesort of
 > >  > clone function which creates a shallow Dwarf * with its own
 > > embedded
 > >  > allocator or something like that.
 > >
> > The downside with this is that its an API addition, which we(the> > Dyninst + HPCToolkit projects) would need to enforce. Whichisn't a> > huge deal for us, but I will need to make a case to thoseteams to
 > > make
 > >  the shift.
 > >
> > On the upside, it does provide a very understandable semanticin the> > case of parallelism. For an API without synchronizationclauses,
 > > this
> > would put our work back into the realm of "reasonably correct"(from
 > >  "technically incorrect but works.")
 >
 > Could someone give an example of this pattern?
> I don't fully understand what is being proposed and how theinterface
 > would look exactly.

 An application would do something along these lines:

 Dwarf* dbg = dwarf_begin(...);
 Dwarf* dbg2 = dwarf_clone(dbg, ...);
 pthread_create(worker, ...);
 // ...
 dwarf_get_units(dbg, ...);
 // ...
 pthread_join(worker);
 dwarf_end(dbg);

 // worker:
 // ...
 dwarf_getabbrev(...);
 // ...
 dwarf_end(dbg2);
The idea being that dbg2 and dbg share most of the same internalstate,
 but concurrent access to said state is between Dwarfs (or
"Dwarf_Views", maybe?), and the state is cleaned up on theoriginal's
 dwarf_end. I suppose in database terms the Dwarfs are acting like
 separate "cursors" for the internal DWARF data. For this particular
 instance, the "top of stack" pointers would be in dbg and dbg2 (the
 non-shared state), while the atomic mem_tail would be part of the
 internal (shared) state.

 I'm not sure how viable implementing this sort of thing would be, it
 might end up overhauling a lot of internals, and I'm not familiar
enough with all the components of the API to know whether therewould
 be some quirks with this style.
So they would have separate lazy DWARF DIE/abbrev readers and separate
allocators? And any abbrevs read in the clone would just be thrownaway
after a dwarf_end?

Separate allocators but the same lazy DIE/abbrev readers, most likely.So a Dwarf would be split into the concurrent Dwarf_Shared andnon-concurrent Dwarf "View", maybe something like:


struct Dwarf_Shared
{
 pthread_rwlock_t rwl; /* For all currently non-concurrent internals */
 Elf *elf;
 char *debugdir;
 Dwarf *alt_dwarf;
 Elf_Data *sectiondata[IDX_last];
 bool other_byte_order;
 bool free_elf;
 int alt_fd;
 struct pubnames_s
 {
   Dwarf_Off cu_offset;
   Dwarf_Off set_start;
   unsigned int cu_header_size;
   int address_len;
 } *pubnames_sets;
 size_t pubnames_nsets;
 void *cu_tree;
 /* Dwarf_Off next_cu_offset; // Moved to Dwarf View */
 void *tu_tree;
 /* Dwarf_Off next_tu_offset; // Moved to Dwarf View */
 Dwarf_Sig8_Hash sig8_hash;
 void *split_tree;
 void *macro_ops;
 void *files_lines;
 Dwarf_Aranges *aranges;
 struct Dwarf_CFI_s *cfi;
 struct Dwarf_CU *fake_loc_cu;
 struct Dwarf_CU *fake_loclists_cu;
 struct Dwarf_CU *fake_addr_cu;
 /* pthread_key_t mem_key; // Implemented differently */
 atomic_uintptr_t shared_mem_tail;
 size_t mem_default_size;
 /* Dwarf_OOM oom_handler; // Moved to Dwarf View, maybe */
};
struct Dwarf /*View*/ {
 bool free_shared; /* If true, we handle cleaning up on dwarf_end */

struct Dwarf_Shared *shared; /* All cloned() Views share the sameShared in the back */

 Dwarf_Off next_cu_offset;
 Dwarf_Off next_tu_offset;
 struct libdw_memblock *mem_tail;
 Dwarf_OOM oom_handler;
};

So most everything is in Dwarf_Shared, and the bits that really onlymake sense when done in serial are part of the View. And thenallocations are done from a View-local stack, while everything ispushed as is now onto the Shared stack for deallocation.


In general I think this style is not as nice as having a shared state
of the Dwarf object that is only manipulated in a thread-safe manner.

I tend to prefer styles like this only because they have a clean splitto define what can and cannot be concurrent, even logically (What makessense when iterating CUs from multiple threads? Does the OOM handlerhave to be thread-safe?). Otherwise you end up trying to answer an N^2problem for how all the moving pieces interact (and it often requiresknowledge of the code to know which pieces do interact).

You did have an earlier implementation that didn't use pthread_keys.
Maybe we should just fall back to that one?

I can revive and rebase it, its on my list for doing ASAP after myother more pressing duties. It still has quirks that bother me a little(like not doing well without a thread pool), but it'll work fine.


Thanks,

Mark

References:
- Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()
  - From: Florian Weimer
- Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()
  - From: Mark Wielaard
- Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()
  - From: Mark Wielaard

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]