This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: inline _int_free into free
- From: "Joseph S. Myers" <joseph at codesourcery dot com>
- To: Tom de Vries <Tom_deVries at mentor dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Fri, 3 Feb 2012 13:16:24 +0000 (UTC)
- Subject: Re: inline _int_free into free
- References: <4F258B7D.4050307@mentor.com>
On Sun, 29 Jan 2012, Tom de Vries wrote:
> The purpose of this patch is to inline the call to _int_free in
> public_fREe, but not (necessarily) the other calls to _int_free.
Well, the *purpose* is presumably to improve allocator performance, and
the way the patch achieves that is by this selective inlining ... so did
some benchmarking show up the function call overhead for this call as
being significant in allocator performance?
> this patch improves performance for the sip parser benchmark with 2 to
What is "the sip parser benchmark"? Unless a benchmark is something as
famous as SPEC (among people concerned with performance in the relevant
area - so here, allocation performance) it's a good idea to give a URL
with more details (with a download if the benchmark is freely available) -
and to give the version number unless there is only one version of the
benchmark.
> 3.5% for n32 and o32 on for the xlp processor, while increasing code
> size of malloc.o with 1512 bytes.
What about on other architectures, e.g. x86_64? Since this is
architecture-independent code, it's important to have evidence that it is
an architecture-independent improvement - or if the performance
improvements are architecture-dependent, it would be desirable to avoid
the size increase on architectures not seeing a performance improvement.
What about other allocation benchmarks? Allocator performance may depend
on the allocation patterns of a particular workload, so it's important to
test a range of cases reflecting typical GNU/Linux load (so far as such a
concept is meaningful) to make sure a change to improve one benchmark
doesn't make things worse on average. I don't know what benchmarks have
been used for the glibc allocator in the past, but there are probably
references to such benchmarks in the libc-alpha or libc-hacker archives or
Bugzilla; it would be useful to summarize what you find about past
allocator benchmarking for glibc (and anything from outside the glibc
world about what are generally considered good ways to benchmark
allocators), as well as giving figures for performance changes on whatever
other benchmarks seem to be relevant.
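
To illustrate the simplest kind of measurement, here is a minimal sketch of a
synthetic malloc/free timing loop. The helper name time_alloc_free, the batch
size, and the size distribution are all assumptions made for illustration;
none of them comes from glibc or from the patch. A loop like this exercises
only one allocation pattern, so it could complement, but not replace, figures
from real workloads.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not part of glibc): time ROUNDS batches of
   malloc/free calls with pseudo-random sizes in [1, MAX_SIZE].
   MAX_SIZE must be nonzero.  Returns CPU seconds consumed, or a
   negative value if an allocation fails.  */
double
time_alloc_free (size_t rounds, size_t max_size, unsigned int seed)
{
  enum { BATCH = 256 };
  void *slots[BATCH];
  unsigned int state = seed;

  clock_t start = clock ();
  for (size_t r = 0; r < rounds; r++)
    {
      for (size_t i = 0; i < BATCH; i++)
        {
          /* Small linear congruential generator, so the sequence of
             sizes is reproducible for a given seed.  */
          state = state * 1103515245u + 12345u;
          size_t size = 1 + (state >> 8) % max_size;

          slots[i] = malloc (size);
          if (slots[i] == NULL)
            {
              /* Release what was allocated so far in this batch.  */
              while (i-- > 0)
                free (slots[i]);
              return -1.0;
            }
          /* Touch the block so the allocation cannot be elided.  */
          *(volatile char *) slots[i] = (char) i;
        }
      for (size_t i = 0; i < BATCH; i++)
        free (slots[i]);
    }
  return (double) (clock () - start) / CLOCKS_PER_SEC;
}
```

Running the same call, for example time_alloc_free (100000, 512, 1), against
the unpatched and patched libraries on each architecture would give one data
point per configuration; the absolute numbers depend on the machine.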
> I tested the patch on MIPS qemu (mips-linux-gnu) with host i686-pc-linux-gnu.
> On that target, code size of malloc.o increased with 1976 bytes.
I'm confused about what exactly the size increases refer to. You say
above 1512 bytes for "n32 and o32 on for the xlp processor" (a type of
MIPS) - but as those are two different ABIs, I'd expect them to have two
different figures. You then say "1976 bytes", again for MIPS - what ABI?
Is the difference because of optimizing for a different (unspecified) MIPS
processor in those tests?
For comparison, what is the total (before or after) text size of libc.so
on the platforms for which you give these size increase figures?
Chris Metcalf's patch to support glibc builds with -Os (for at least some
architectures) recently went into glibc. It would seem a good idea for
this sort of patch to avoid causing code size increases with -Os, so maybe
parts of it should have __OPTIMIZE_SIZE__ conditionals.
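
A minimal sketch of what such a conditional could look like, assuming GCC
(which defines __OPTIMIZE_SIZE__ when building with -Os). The macro
FREE_INLINE and the functions int_free_sketch, public_free_sketch and
forced_inline_enabled are hypothetical stand-ins, not the names or
signatures used in malloc.c or in the patch.

```c
#include <stdlib.h>

/* Hypothetical macro: force inlining only when not optimizing for
   size.  Under -Os the inlining decision is left to the compiler.  */
#ifdef __OPTIMIZE_SIZE__
# define FREE_INLINE static
# define FORCED_INLINE 0
#else
# define FREE_INLINE static inline __attribute__ ((always_inline))
# define FORCED_INLINE 1
#endif

/* Stand-in for _int_free; the real function has a different
   signature and does the actual chunk handling.  */
FREE_INLINE void
int_free_sketch (void *mem)
{
  free (mem);
}

/* Stand-in for the public free wrapper, the one call site where the
   patch wants the callee inlined.  */
void
public_free_sketch (void *mem)
{
  if (mem != NULL)
    int_free_sketch (mem);
}

/* Reports which variant was compiled in: 1 if inlining is forced,
   0 if the build is size-optimized.  */
int
forced_inline_enabled (void)
{
  return FORCED_INLINE;
}
```

With something along these lines, a -Os build would keep a single
out-of-line copy of the callee unless the compiler itself chooses to inline
it, so the size increase would be limited to builds optimizing for speed.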
--
Joseph S. Myers
joseph@codesourcery.com