This is the mail archive of the libc-ports mailing list for the libc-ports project.
Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
- From: Siddhesh Poyarekar <siddhesh at redhat dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: "Carlos O'Donell" <carlos at redhat dot com>, Will Newton <will dot newton at linaro dot org>, "libc-ports at sourceware dot org" <libc-ports at sourceware dot org>, Patch Tracking <patches at linaro dot org>
- Date: Wed, 4 Sep 2013 17:15:29 +0530
- Subject: Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
- Authentication-results: sourceware.org; auth=none
- References: <CANu=DmiXLL9v1Z1KS0sBOs-pL8csEUGc9YE829_-tidKd-GruQ at mail dot gmail dot com> <5220F1F0 dot 80501 at redhat dot com> <CANu=DmhA9QvSe6RS72Db2P=yyjC72fsE8d4QZKHEcNiwqxNMvw at mail dot gmail dot com> <52260BD0 dot 6090805 at redhat dot com> <20130903173710 dot GA2028 at domone dot kolej dot mff dot cuni dot cz> <522621E2 dot 6020903 at redhat dot com> <20130903185721 dot GA3876 at domone dot kolej dot mff dot cuni dot cz> <5226354D dot 8000006 at redhat dot com> <20130904073008 dot GA4306 at spoyarek dot pnq dot redhat dot com> <20130904110333 dot GA6216 at domone dot kolej dot mff dot cuni dot cz>
On Wed, Sep 04, 2013 at 01:03:33PM +0200, Ondřej Bílka wrote:
> > 1. Assume aligned input. Nothing should take (any noticeable)
> > performance away from aligned copies/moves
> Not very useful, as this is extremely dependent on the function measured. For
> functions like strcmp and strlen, alignments are mostly random, so the aligned
> case does not say much. At the opposite end of the spectrum is memset, which is
> almost always 8-byte aligned, so measuring its unaligned performance does not
> make a lot of sense.
Agreed. So for functions like memset/memcpy/memmove we heavily favour
aligned inputs. For strlen/strchr/memchr we strive for acceptable
average-case performance, i.e. less variance in performance.
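Which case dominates for a given function can be checked rather than assumed by bucketing the pointer arguments seen at its call sites by address modulo 8. The sketch below is illustrative only: `struct align_hist`, `record_alignment` and `aligned_fraction` are hypothetical names, and it assumes a wrapper around the function of interest (e.g. strlen or memset) that calls `record_alignment` with its pointer argument.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bucket pointer arguments by address % 8. */
#define ALIGN_BUCKETS 8

struct align_hist {
    unsigned long bucket[ALIGN_BUCKETS]; /* bucket[k]: addr % 8 == k */
    unsigned long total;                 /* number of pointers recorded */
};

static void align_hist_init(struct align_hist *h)
{
    memset(h, 0, sizeof *h);
}

/* Meant to be called from a (hypothetical) wrapper around strlen,
   memset, etc., passing the pointer argument of each call. */
static void record_alignment(struct align_hist *h, const void *p)
{
    h->bucket[(uintptr_t) p % ALIGN_BUCKETS]++;
    h->total++;
}

/* Fraction of recorded pointers that were 8-byte aligned
   (0.0 when nothing has been recorded yet). */
static double aligned_fraction(const struct align_hist *h)
{
    return h->total ? (double) h->bucket[0] / (double) h->total : 0.0;
}
```

Going by the quoted claim, strlen-style callers would be expected to spread across all eight buckets, while memset callers would pile up in bucket 0.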
> > 2. Scale with size
> Not very important, for several reasons. One is that big sizes are cold
> (just look at oprofile output: the loops show up less often than the header).
> The second reason is that if we look at callers, large sizes are unlikely
I did not mean that we should optimize for larger sizes; I meant that, as a
general principle, the algorithm should scale reasonably for larger
sizes. A quadratic algorithm is bad even if it gives acceptable
performance for smaller sizes. I would consider that a pretty
important trait to monitor in the benchmark, even if we won't really
get such implementations in practice.
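One rough way to monitor scaling is to time the same routine at size n and 2n and look at the ratio: a linear routine should come out near 2, a quadratic one near 4. The helpers below (`time_copy`, `doubling_ratio`, `looks_superlinear`) and the threshold of 3 are hypothetical, and `clock()` is far too coarse for a real benchmark; this is only a sketch of the check, not a proposed benchmark.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Same shape as memcpy/memmove. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Written after each copy to discourage the compiler from
   discarding the copies as dead stores. */
static volatile unsigned char sink;

/* CPU seconds for `reps` calls of fn on n-byte buffers (n > 0),
   or -1.0 if allocation fails.  clock() is coarse: rough sketch only. */
static double time_copy(copy_fn fn, size_t n, int reps)
{
    unsigned char *src = malloc(n), *dst = malloc(n);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5a, n);
    clock_t start = clock();
    for (int i = 0; i < reps; i++) {
        fn(dst, src, n);
        sink = dst[n - 1];
    }
    clock_t end = clock();
    free(src);
    free(dst);
    return (double) (end - start) / CLOCKS_PER_SEC;
}

/* t(2n) / t(n): about 2 for linear cost, about 4 for quadratic. */
static double doubling_ratio(double t_n, double t_2n)
{
    return t_n > 0.0 ? t_2n / t_n : 0.0;
}

/* Heuristic: flag anything closer to quadratic than to linear. */
static int looks_superlinear(double t_n, double t_2n)
{
    return doubling_ratio(t_n, t_2n) >= 3.0;
}
```

For example, time `time_copy(memcpy, 1 << 20, 100)` and `time_copy(memcpy, 1 << 21, 100)` and pass both results to `looks_superlinear`; a result of 1 on repeated runs would warrant a closer look.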
> > 3. Provide acceptable performance for unaligned sizes without
> > penalizing the aligned case
> This is quite an important case. It should be measured correctly; what is
> important is that alignment varies. Varying alignment can be slower than
> picking one fixed alignment, and in reality alignment does vary.
I agree that we need to measure unaligned cases correctly.
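A minimal sketch of measuring with varying alignment: instead of timing every call at one fixed offset, pre-generate a list of random source/destination misalignments so that the branches inside the copy routine cannot settle on a single path. The names, the 0..15 offset range and the xorshift32 generator are assumptions for illustration, and the timing code around `run_cases` is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_MISALIGN 16 /* offsets drawn from 0..15 (assumed range) */

/* Small deterministic PRNG (xorshift32) so runs are reproducible. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

struct copy_case {
    unsigned src_off; /* byte offset added to the source buffer */
    unsigned dst_off; /* byte offset added to the destination buffer */
};

/* Random src/dst misalignments; seed must be nonzero. */
static void make_varied_cases(struct copy_case *cases, size_t count,
                              uint32_t seed)
{
    uint32_t state = seed;
    for (size_t i = 0; i < count; i++) {
        cases[i].src_off = xorshift32(&state) % MAX_MISALIGN;
        cases[i].dst_off = xorshift32(&state) % MAX_MISALIGN;
    }
}

/* Fixed-alignment baseline: every call uses the same offsets. */
static void make_fixed_cases(struct copy_case *cases, size_t count,
                             unsigned off)
{
    for (size_t i = 0; i < count; i++) {
        cases[i].src_off = off;
        cases[i].dst_off = off;
    }
}

/* One pass of memcpy over the case list; src and dst must each hold
   at least len + MAX_MISALIGN bytes.  Time this call externally. */
static void run_cases(unsigned char *dst, const unsigned char *src,
                      size_t len, const struct copy_case *cases,
                      size_t count)
{
    for (size_t i = 0; i < count; i++)
        memcpy(dst + cases[i].dst_off, src + cases[i].src_off, len);
}
```

Timing one pass over the varied list against one pass over a fixed list of the same length gives the fixed-versus-varying comparison discussed above.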
> > 4. Measure the effect of dcache pressure on function performance
> > 5. Measure effect of icache pressure on function performance.
> Here you really need to base weights on function usage patterns.
> A bigger code size is acceptable for functions that are called more
> often. You need to see the distribution of how calls are clustered to get
> the full picture. strcmp is the least sensitive to icache concerns: when it
> is called, it is mostly called 100 times over in a tight loop, so size is
> not a big issue. If the same number of calls is spread uniformly through
> the program, we need stricter criteria.
That's not necessarily true. It may be true for specific applications,
but I don't think strcmp is always called in a tight loop. Do you
have a qualitative argument to prove that statement, or is it just
based on dry runs?
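For item 4, one common approach is to compare a function called back to back in a tight loop (warm caches) with the same calls made after the data cache has been thrashed. A minimal sketch of the thrashing step follows; the 8 MiB buffer size and 64-byte line size are hypothetical, machine-dependent assumptions, and the helper names are illustrative. It covers dcache pressure only. Measuring icache pressure (item 5) would instead require executing a large amount of unrelated code between calls, which this sketch does not attempt.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed to exceed the last-level cache on the target machine;
   hypothetical value, adjust per system. */
#define EVICT_BYTES (8u * 1024u * 1024u)
#define LINE_BYTES 64u /* assumed cache-line size */

/* Allocate the eviction buffer once; the caller frees it. */
static unsigned char *alloc_evict_buffer(void)
{
    return calloc(EVICT_BYTES, 1);
}

/* Read-modify-write one byte per cache line of buf so that data used
   by the function under test is likely evicted before the next timed
   call.  Returns a sum of the bytes read, which the caller can keep
   to show the loop did real work. */
static unsigned long thrash_dcache(volatile unsigned char *buf, size_t len)
{
    unsigned long acc = 0;
    for (size_t i = 0; i < len; i += LINE_BYTES) {
        buf[i]++;       /* the write also leaves the line dirty */
        acc += buf[i];
    }
    return acc;
}
```

For the cold measurement, call `thrash_dcache(evict, EVICT_BYTES)` before each timed call; for the warm measurement, skip it. Weighting the two results by how clustered real calls are is exactly the data the question above asks for.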