On Sat, 2019-10-26 at 19:56 -0500, Jonathon Anderson wrote:
> On Sun, Oct 27, 2019 at 00:50, Mark Wielaard <mark@klomp.org> wrote:
> > I see that getconf PTHREAD_KEYS_MAX gives 1024 on my machine.
> > Is this tunable in any way?
>
> From what I can tell, no. A quick google search suggests as much,
> and it's even #defined as 1024 on my machine.
I see, it is a hardcoded constant per architecture, but it seems every
architecture simply uses 1024. I am afraid that kind of rules out
having a pthread_key per Dwarf object. It is not that large a number.
Programs are sometimes linked against 50 to 100 shared libraries; if
they use dwz/alt-files, that means several hundred potentially open
Dwarf objects already. It wouldn't be that crazy to have all of them
open at the same time. That might not reach the limit yet, but I think
in practice you could come close to half very easily. And with
split-dwarf every CU basically turns into a Dwarf object, which can
easily go past 1024.
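(For reference, the compile-time and runtime values can be compared with a
quick check; this is just a sketch assuming a POSIX system, where sysconf
may also return -1 if the limit is indeterminate:)

```c
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

/* Report the pthread key limit, compile time vs. runtime.  On glibc
   both come from the same hardcoded per-architecture constant.  */
static long
thread_keys_max (void)
{
#ifdef PTHREAD_KEYS_MAX
  /* Per-architecture constant from <limits.h>.  */
  printf ("PTHREAD_KEYS_MAX (compile time): %d\n", PTHREAD_KEYS_MAX);
#endif
  /* Runtime value; -1 means the limit is indeterminate.  */
  return sysconf (_SC_THREAD_KEYS_MAX);
}
```

POSIX only guarantees _POSIX_THREAD_KEYS_MAX (128), so even the 1024 we
see is implementation-defined.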
> > > There may be other ways to handle this, I'm high-level
> > > considering potential alternatives (with more atomics, of
> > > course). The difficulty is mostly in providing the same
> > > performance in the single-threaded case.
> > >
> > > > You already have a Dwarf *. I would consider adding some sort
> > > > of clone function which creates a shallow Dwarf * with its own
> > > > embedded allocator or something like that.
> > >
> > > The downside with this is that it's an API addition, which we
> > > (the Dyninst + HPCToolkit projects) would need to enforce. Which
> > > isn't a huge deal for us, but I will need to make a case to
> > > those teams to make the shift.
> > >
> > > On the upside, it does provide a very understandable semantic in
> > > the case of parallelism. For an API without synchronization
> > > clauses, this would put our work back into the realm of
> > > "reasonably correct" (from "technically incorrect but works").
> >
> > Could someone give an example of this pattern?
> > I don't fully understand what is being proposed and how the
> > interface would look exactly.
>
> An application would do something along these lines:
>
>   Dwarf *dbg = dwarf_begin (...);
>   Dwarf *dbg2 = dwarf_clone (dbg, ...);
>   pthread_create (&worker, ...);
>   // ...
>   dwarf_get_units (dbg, ...);
>   // ...
>   pthread_join (worker, NULL);
>   dwarf_end (dbg);
>
>   // worker:
>   // ...
>   dwarf_getabbrev (...);
>   // ...
>   dwarf_end (dbg2);
>
> The idea being that dbg2 and dbg share most of the same internal
> state, but concurrent access to said state is between Dwarfs (or
> "Dwarf_Views", maybe?), and the state is cleaned up on the
> original's dwarf_end. I suppose in database terms the Dwarfs are
> acting like separate "cursors" for the internal DWARF data. For this
> particular instance, the "top of stack" pointers would be in dbg and
> dbg2 (the non-shared state), while the atomic mem_tail would be part
> of the internal (shared) state.
>
> I'm not sure how viable implementing this sort of thing would be; it
> might end up overhauling a lot of internals, and I'm not familiar
> enough with all the components of the API to know whether there
> would be some quirks with this style.
So they would have separate lazy DWARF DIE/abbrev readers and separate
allocators? And any abbrevs read in the clone would just be thrown
away after a dwarf_end?
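The shared/non-shared split described above could look something like
the following rough sketch; all names here are hypothetical, not actual
libdw internals:

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical sketch of the proposed split: the shared state owns
   the atomic allocation tail, while each view ("cursor") keeps its
   own non-shared top-of-stack pointer.  */

struct mem_block
{
  struct mem_block *prev;
  size_t size;
};

struct dwarf_shared
{
  _Atomic (struct mem_block *) mem_tail;  /* shared, lock-free list */
  int refcount;                           /* views still open */
};

struct dwarf_view
{
  struct dwarf_shared *shared;  /* common to all clones */
  struct mem_block *mem_top;    /* private per-view fast path */
};

/* Allocate a block and publish it on the shared tail list.  */
static struct mem_block *
view_alloc (struct dwarf_view *v, size_t size)
{
  struct mem_block *b = malloc (sizeof *b + size);
  b->size = size;
  /* Push onto the shared list with a CAS loop; on failure the CAS
     reloads b->prev with the current tail, so the loop retries.  */
  b->prev = atomic_load (&v->shared->mem_tail);
  while (!atomic_compare_exchange_weak (&v->shared->mem_tail,
                                        &b->prev, b))
    ;
  v->mem_top = b;  /* only this view ever reads mem_top */
  return b;
}
```

With this shape, clones only contend on the shared tail during
allocation; the per-view mem_top never needs synchronization, which is
what keeps the single-threaded fast path cheap.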