This is the mail archive of the libc-hacker@sources.redhat.com mailing list for the glibc project.

Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.



second thoughts on using dl_iterate_phdr() for cache-validation


Doing some more careful performance analysis with libunwind, I'm
finding that the dl_iterate_phdr() call needed to verify that the
phdr-list hasn't changed is rather expensive.  Specifically, the time
needed to initialize an "unwind cursor" (via unw_init_local()) is as
follows:

  without dl_iterate_phdr() callback (unsafe):	140 ns
  with dl_iterate_phdr(), without -lpthread:	200 ns
  with dl_iterate_phdr(), with -lpthread:	300 ns

This is rather more than I expected, and a slowdown of more than a
factor of 2 for multi-threaded apps is a bit more than I'm willing to
bear, since it could really affect the usability of libunwind for
things such as allocation tracking or stack-trace-based sampling.
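
For anyone who wants to get a rough feel for the per-call cost on
their own machine, a minimal timing sketch could look like the one
below.  It is not the libunwind measurement itself: the callback
count_objects() and the helper ns_per_dl_iterate_phdr() are
hypothetical names made up for illustration, the callback walks the
whole list instead of doing real validation work, and the use of
clock_gettime(CLOCK_MONOTONIC) is an assumption about how to time it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical trivial callback: count each loaded object and keep
   iterating (returning 0 tells dl_iterate_phdr() to continue).  */
static int count_objects(struct dl_phdr_info *info, size_t size, void *data)
{
    (void) info;
    (void) size;
    ++*(unsigned long *) data;
    return 0;
}

/* Hypothetical helper: average wall-clock cost, in nanoseconds, of one
   dl_iterate_phdr() call with the trivial callback above.  */
static double ns_per_dl_iterate_phdr(unsigned long iterations)
{
    struct timespec start, end;
    unsigned long objects = 0;
    unsigned long i;

    assert(iterations > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iterations; ++i)
        dl_iterate_phdr(count_objects, &objects);
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* Every iteration reports at least the main program.  */
    assert(objects >= iterations);

    return ((end.tv_sec - start.tv_sec) * 1e9
            + (end.tv_nsec - start.tv_nsec)) / (double) iterations;
}
```

Calling ns_per_dl_iterate_phdr() with a large iteration count, once in
a binary linked without -lpthread and once with it, should show
whether the locking overhead is visible on a given setup.  The
absolute numbers will differ from the ones above.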

The profile for the "without -lpthread" case looks like this:

% time   self  cumul  calls self/call tot/call name
 60.44  13.05  13.05   101M      129n     204n _ULia64_init_local
 17.19   3.71  16.76  99.6M     37.3n    66.7n dl_iterate_phdr
  4.65   1.00  17.76   101M     9.97n    9.97n rtld_lock_default_lock_recursive
  4.64   1.00  18.77  99.0M     10.1n    10.1n rtld_lock_default_unlock_recursive

The profile for the "with -lpthread" case looks like this (this was
measured on a different machine, so the total time of 223 ns is not
comparable to the 300 ns mentioned above; the relative times are fine,
though):

% time   self  cumul  calls self/call tot/call name
 47.93  11.25  11.25  99.6M      113n     223n _ULia64_init_local
 18.35   4.31  15.56   100M     43.0n    43.0n pthread_mutex_lock
 11.81   2.77  18.33   100M     27.7n    27.7n __pthread_mutex_unlock_usercnt
 11.65   2.73  21.06   100M     27.3n     103n __GI___dl_iterate_phdr

For brevity, I didn't include the call-graphs, but they are pretty
easy: all calls to dl_iterate_phdr() are indirectly due to the
cache-validation done by _ULia64_init_local() and almost all
lock-related calls are due to dl_iterate_phdr().
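
To make the pattern concrete, here is a minimal sketch of this kind of
cache-validation.  It is not libunwind's actual code: struct
phdr_signature, add_object(), current_signature() and
cache_still_valid() are hypothetical names, and the signature (object
count plus a mix of load addresses) is only an assumed stand-in for
whatever check the real implementation performs.  Such a signature can
also miss a change if one object is unloaded and another is loaded at
exactly the same address.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the currently loaded objects.  */
struct phdr_signature {
    unsigned long count;   /* number of loaded objects */
    uintptr_t     mix;     /* order-sensitive mix of load addresses */
};

/* Callback: fold one loaded object into the signature.  */
static int add_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct phdr_signature *sig = data;

    (void) size;
    sig->count++;
    sig->mix = sig->mix * 31 + (uintptr_t) info->dlpi_addr
               + (uintptr_t) info->dlpi_phdr;
    return 0;              /* 0 means: continue with the next object */
}

/* Walk the phdr-list once and compute its signature.  */
static struct phdr_signature current_signature(void)
{
    struct phdr_signature sig = { 0, 0 };

    dl_iterate_phdr(add_object, &sig);
    return sig;
}

/* Returns 1 if the saved signature still matches the phdr-list.
   Otherwise stores the new signature in *saved and returns 0, so the
   caller knows to flush its cache.  */
static int cache_still_valid(struct phdr_signature *saved)
{
    struct phdr_signature now;

    assert(saved != NULL);
    now = current_signature();
    if (now.count == saved->count && now.mix == saved->mix)
        return 1;
    *saved = now;
    return 0;
}
```

The point of the sketch is that every validation pays for a full
dl_iterate_phdr() call, including its locking, even when nothing has
been loaded or unloaded since the last check.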

I suppose I could add a libunwind-hack to disable cache-validation,
but that seems like a step backward since it would make caching unsafe
again.

In case it matters, the first profile was obtained with libc v2.3.2
and the second profile was obtained with the CVS libc (as of a few
days ago).

Can this be improved?

	--david

