This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: inline _int_free into free

From: "Joseph S. Myers" <joseph at codesourcery dot com>
To: Tom de Vries <Tom_deVries at mentor dot com>
Cc: libc-alpha at sourceware dot org
Date: Fri, 3 Feb 2012 13:16:24 +0000 (UTC)
Subject: Re: inline _int_free into free
References: <4F258B7D.4050307@mentor.com>

On Sun, 29 Jan 2012, Tom de Vries wrote:

> The purpose of this patch is to inline the call to _int_free in 
> public_fREe, but not (necessarily) the other calls to _int_free.

Well, the *purpose* is presumably to improve allocator performance, and 
the way the patch achieves that is by this selective inlining ... so did 
some benchmarking show up the function call overhead for this call as 
being significant in allocator performance?

> this patch improves performance for the sip parser benchmark with 2 to 

What is "the sip parser benchmark"?  Unless a benchmark is something as 
famous as SPEC (among people concerned with performance in the relevant 
area - so here, allocation performance) it's a good idea to give a URL 
with more details (with a download if the benchmark is freely available) - 
and to give the version number unless there is only one version of the 
benchmark.

> 3.5% for n32 and o32 on for the xlp processor, while increasing code 
> size of malloc.o with 1512 bytes.

What about on other architectures, e.g. x86_64?  Since this is 
architecture-independent code, it's important to have evidence that it is 
an architecture-independent improvement - or if the performance 
improvements are architecture-dependent, it would be desirable to avoid 
the size increase on architectures not seeing a performance improvement.

What about other allocation benchmarks?  Allocator performance may depend 
on the allocation patterns of a particular workload, so it's important to 
test a range of cases reflecting typical GNU/Linux load (so far as such a 
concept is meaningful) to make sure a change to improve one benchmark 
doesn't make things worse on average.  I don't know what benchmarks have 
been used for the glibc allocator in the past, but there are probably 
references to such benchmarks in the libc-alpha or libc-hacker archives or 
Bugzilla; it would be useful to summarize what you find about past 
allocator benchmarking for glibc (and anything from outside the glibc 
world about what are generally considered good ways to benchmark 
allocators), as well as giving figures for performance changes on whatever 
other benchmarks seem to be relevant.
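
To make the point about allocation patterns concrete, here is a minimal 
sketch of a timing harness that runs the same malloc/free loop under two 
different request-size patterns.  It is not an established glibc 
benchmark; the names (time_pattern, small_fixed, mixed_sizes, LIVE_SLOTS) 
and the chosen sizes are assumptions made for illustration only, and 
clock () measures CPU time rather than wall-clock time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Number of blocks kept live at once (an arbitrary, hypothetical choice).  */
#define LIVE_SLOTS 1024

/* A size generator returns the next request size for a workload.  */
typedef size_t (*size_gen) (unsigned int *state);

/* Pattern 1: many small requests of one fixed size.  */
static size_t
small_fixed (unsigned int *state)
{
  (void) state;
  return 32;
}

/* Pattern 2: mixed sizes from 1 to 4096 bytes, drawn from a simple
   linear congruential generator so that runs are repeatable.  */
static size_t
mixed_sizes (unsigned int *state)
{
  *state = *state * 1103515245u + 12345u;
  return (size_t) ((*state >> 16) % 4096u) + 1;
}

/* Perform ITERATIONS steps, each of which frees the block held in one
   slot and allocates a replacement.  Return the CPU seconds used.  */
static double
time_pattern (size_gen next_size, size_t iterations)
{
  static void *slots[LIVE_SLOTS];
  unsigned int state = 1;
  clock_t start = clock ();

  for (size_t i = 0; i < iterations; i++)
    {
      size_t slot = i % LIVE_SLOTS;
      free (slots[slot]);   /* free (NULL) is a no-op on the first pass.  */
      slots[slot] = malloc (next_size (&state));
      assert (slots[slot] != NULL);
      /* Touch the block so the allocation cannot be optimized away.  */
      *(volatile char *) slots[slot] = (char) i;
    }

  clock_t end = clock ();

  /* Release everything so that the next run starts from empty slots.  */
  for (size_t i = 0; i < LIVE_SLOTS; i++)
    {
      free (slots[i]);
      slots[i] = NULL;
    }
  return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Calling time_pattern (small_fixed, n) and time_pattern (mixed_sizes, n) 
against a patched and an unpatched libc, over several runs each, would 
give a first indication of whether a change helps one pattern at the 
expense of another.  A real evaluation would need workloads far closer to 
actual applications than these two loops.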

> I tested the patch on MIPS qemu (mips-linux-gnu) with host i686-pc-linux-gnu.
> On that target, code size of malloc.o increased with 1976 bytes.

I'm confused about what exactly the size increases refer to.  You say 
above 1512 bytes for "n32 and o32 on for the xlp processor" (a type of 
MIPS) - but as those are two different ABIs, I'd expect them to have two 
different figures.  You then say "1976 bytes", again for MIPS - what ABI?  
Is the difference because of optimizing for a different (unspecified) MIPS 
processor in those tests?

For a comparison, what is the total (before or after) text size of libc.so 
on the platforms for which you give these size increase figures?

Chris Metcalf's patch to support glibc builds with -Os (for at least some 
architectures) recently went into glibc.  It would seem a good idea for 
this sort of patch to avoid causing code size increases with -Os, so maybe 
parts of it should have __OPTIMIZE_SIZE__ conditionals.
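
A minimal sketch of what such a conditional could look like follows.  It 
is not taken from the patch or from malloc.c: all function and macro 
names here are hypothetical stand-ins, and it assumes a GCC-compatible 
compiler, which predefines __OPTIMIZE_SIZE__ under -Os and accepts the 
always_inline and noinline attributes.  The hot public entry point gets 
the inlined body only when not optimizing for size; every other caller 
goes through one out-of-line copy.

```c
#include <stddef.h>
#include <stdlib.h>

/* Body of the internal release routine (a stand-in for something like
   _int_free).  It is always inlined into whatever calls it directly.  */
static inline __attribute__ ((always_inline)) void
release_body (void *mem, size_t *release_count)
{
  if (mem != NULL)
    ++*release_count;
  free (mem);
}

/* A single out-of-line copy, shared by the callers that are not
   performance-critical.  */
static __attribute__ ((noinline)) void
release_outofline (void *mem, size_t *release_count)
{
  release_body (mem, release_count);
}

/* Under -Os the hot path also uses the shared copy, so the selective
   inlining costs no extra code size there.  */
#ifdef __OPTIMIZE_SIZE__
# define release_fast release_outofline
#else
# define release_fast release_body
#endif

/* Hot public entry point (a stand-in for the public free wrapper).  */
void
public_release (void *mem, size_t *release_count)
{
  release_fast (mem, release_count);
}

/* Some other, colder caller: always uses the out-of-line copy.  */
void
cold_release (void *mem, size_t *release_count)
{
  release_outofline (mem, release_count);
}
```

With this arrangement the body is duplicated only in builds that are not 
optimizing for size, and even then only at the one call site where the 
call overhead is claimed to matter.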

-- 
Joseph S. Myers
joseph@codesourcery.com
