This is the mail archive of the mailing list for the libc-ports project.


Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.

On Wed, Sep 04, 2013 at 12:37:33PM -0500, Ryan S. Arnold wrote:
> On Wed, Sep 4, 2013 at 6:03 AM, Ondřej Bílka <> wrote:
> > On Wed, Sep 04, 2013 at 01:00:09PM +0530, Siddhesh Poyarekar wrote:
> >> 4. Measure the effect of dcache pressure on function performance
> >> 5. Measure effect of icache pressure on function performance.
> >>
> > Here you really need to base weights on function usage patterns.
> > A bigger code size is acceptable for functions that are called more
> > often. You need to see the distribution of how calls are clustered to get
> > the full picture. A strcmp is least sensitive to icache concerns, as when it
> > is called it's mostly 100 times over in a tight loop, so size is not a big issue.
> > If the same number of calls is uniformly spread through the program, we need
> > stricter criteria.
> Icache pressure is probably one of the more difficult things to
> measure with a benchmark.  I suppose it'd be easier with a pipeline
> analyzer.
> Can you explain how usage pattern analysis might reveal icache pressure?
With a profiler it's simple. I profiled Firefox for a while; results are here:

Now when you look at the 'Delays between calls' graph you will see a peak,
which is likely caused by strcmp being called in a loop.

From the graph, about 2/3 of calls happen less than 128 cycles after the
previous one. As there is a limited number of cache lines that you can
access in 128 cycles, the per-call impact is smaller.
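
For illustration, a figure like "2/3 of calls within 128 cycles" could be
derived from a trace of call timestamps roughly as follows. This is a minimal
sketch, not the profiler used for the graph: the function name
fraction_within and the idea of a pre-recorded, sorted array of cycle
counter readings are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given cycle-counter readings taken at each call
 * (assumed sorted in increasing order), return the fraction of calls
 * that arrive less than `limit` cycles after the previous call.
 */
double fraction_within(const uint64_t *stamps, size_t n, uint64_t limit)
{
    size_t close = 0;

    if (n < 2)
        return 0.0;             /* no inter-call delay to measure */

    for (size_t i = 1; i < n; i++) {
        if (stamps[i] - stamps[i - 1] < limit)
            close++;
    }
    /* n timestamps give n - 1 delays */
    return (double)close / (double)(n - 1);
}
```

A high value for a small limit suggests calls are clustered in loops, so
the function's code is likely still in the icache on most calls.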

> I'm not sure how useful 'usage patterns' are when considering dcache
> pressure.  On Power we have data-cache prefetch instructions and since
> we know that dcache pressure is a reality, we will prefetch if our
> data sizes are large enough to out-weigh the overhead of prefetching,
> e.g., when the data size exceeds the cacheline size.
Very useful, as the overhead of prefetching is determined by the usage pattern.
You can have two applications that often call memset with size 16000.

The first one uses memset to refresh one static array that is entirely in
L1 cache, so prefetching is harmful.

The second one does random access over 1GB of memory, and prefetching would
help.

Switching to prefetching when you exceed the cache size has the advantage of
certainty that it will help.
The real threshold is lower, as it is unlikely that a large array passed as
an argument is the only thing that occupies the cache.
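
A size-based switch of this kind might look like the sketch below. It is
only an illustration, not the glibc implementation: memset_sketch,
should_prefetch and all three constants are hypothetical, and the 32 KiB
threshold is an assumed L1d size that a real port would tune (and, per the
above, set lower than the actual cache size). __builtin_prefetch is the
GCC/Clang builtin. Note that a size check alone cannot tell the two
16000-byte applications apart; it only covers the case where the buffer
cannot fit in cache at all.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_LINE         64            /* assumed cache line size */
#define PREFETCH_DISTANCE  256           /* assumed: how far ahead to hint */
#define PREFETCH_THRESHOLD (32 * 1024)   /* assumed: roughly L1d size */

/* Prefetch only when the buffer cannot be cache-resident anyway. */
int should_prefetch(size_t n)
{
    return n >= PREFETCH_THRESHOLD;
}

void *memset_sketch(void *dst, int c, size_t n)
{
    unsigned char *p = dst;

    if (!should_prefetch(n))
        return memset(dst, c, n);        /* small case: no prefetch overhead */

    for (size_t off = 0; off < n; off += CACHE_LINE) {
        size_t chunk = n - off < CACHE_LINE ? n - off : CACHE_LINE;

        /* Hint a line ahead for writing, staying inside the buffer. */
        if (n - off > PREFETCH_DISTANCE)
            __builtin_prefetch(p + off + PREFETCH_DISTANCE, 1, 0);

        memset(p + off, c, chunk);
    }
    return dst;
}
```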
