This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Malloc improvements


Hi DJ,

> Hmmm... not sure why that test case is worse in my branch, the whole
> point of my work is to add a lockless fast path.  I'll have to
> investigate that some more.  Conveniently, I have a trace feature in
> there that I'm also working on ;-)

The trace function looks great, I've started playing with it on ppc64.

> But yeah, we know there's a lot of unneeded overhead in glibc's
> malloc, and that other mallocs can do better.  Glibc's malloc itself
> states that it's not trying to be the best at one task, but the best
> usable compromise.  Part of what I'm doing is understanding where
> these compromises cause significant performance problems, and seeing
> if we can find other ways to solve them without just replacing it
> with something else.

Sounds good.

> Also, of course, obligatory note... your test case is not a typical
> application.  We're trying to come up with a way of modelling real
> applications and benchmarking those, instead of relying on trivial
> test cases to "represent" the real world.  A change that is worse in
> a test case might be better for real apps, and vice versa.

Yeah, it is, and we can help gather traces.

I just used the trace feature on omnetpp from SPECint2006 (something
glibc malloc does pretty badly at) and it shows a semi-regular repeating
pattern of ~4000 small mallocs (32, 192, 224 bytes), followed by
the freeing of all of them.
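
For anyone who wants to poke at that pattern outside of omnetpp, here is
a minimal sketch of it in C. It is only an approximation of what the
trace shows: the names burst_once and run_bursts are made up for this
example, and cycling evenly through the three sizes is an assumption,
since the real mix and ordering come from the application.

```c
#include <stdlib.h>
#include <string.h>

#define BATCH 4000  /* approximate burst length seen in the trace */

/* Sizes seen in the trace; using them in equal proportion is an assumption. */
static const size_t sizes[] = { 32, 192, 224 };

/* Allocate BATCH small blocks, touch each one, then free them all.
   Returns the number of allocations that succeeded. */
size_t burst_once(void)
{
    void *ptrs[BATCH];
    size_t ok = 0;

    for (size_t i = 0; i < BATCH; i++) {
        size_t sz = sizes[i % (sizeof sizes / sizeof sizes[0])];
        ptrs[i] = malloc(sz);
        if (ptrs[i] != NULL) {
            memset(ptrs[i], 0xa5, sz);  /* write to the block so it is really used */
            ok++;
        }
    }

    /* Free everything in allocation order; free(NULL) is a no-op. */
    for (size_t i = 0; i < BATCH; i++)
        free(ptrs[i]);

    return ok;
}

/* Repeat the burst `rounds` times; returns the total successful allocations. */
size_t run_bursts(size_t rounds)
{
    size_t total = 0;

    for (size_t r = 0; r < rounds; r++)
        total += burst_once();

    return total;
}
```

Calling run_bursts() with a large round count from main() and timing it
gives a rough stand-in for the repeating pattern, though it is still a
trivial test case rather than a model of the real application.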

One potential issue - I struggled to capture the entire run.
Even after bumping the buffer a bunch, I only traced a fraction of the
run:

749055656 out of 10000000 events captured

And the output file was almost 1GB. Having the intermediate ASCII output
is nice though, so I'm not arguing for getting rid of it. After
processing, the binary file size ended up at 30MB.

I'm not sure if I am running the tools correctly (or if I need to add
anything for ppc64 other than the rdtsc* functions), but trace_run
spends most of its time in pthread mutexes on POWER8:

Overhead  Command    Shared Object       Symbol
  33.37%  trace_run  libpthread-2.23.so  [.] __lll_unlock_elision
  24.23%  trace_run  libpthread-2.23.so  [.] __lll_lock_elision
  10.86%  trace_run  trace_run           [.] free_wipe
  10.74%  trace_run  trace_run           [.] thread_common
  10.51%  trace_run  libpthread-2.23.so  [.] pthread_mutex_lock
   2.84%  trace_run  libpthread-2.23.so  [.] pthread_mutex_unlock
   2.27%  trace_run  libc-2.23.so        [.] _int_free
   1.61%  trace_run  libc-2.23.so        [.] malloc
   1.06%  trace_run  libc-2.23.so        [.] _int_malloc

Anton

