This is the mail archive of the libc-ports at sourceware dot org
mailing list for the libc-ports project.
Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
- From: "Joseph S. Myers" <joseph at codesourcery dot com>
- To: "Ryan S. Arnold" <ryan dot arnold at gmail dot com>
- Cc: Siddhesh Poyarekar <siddhesh at redhat dot com>, Carlos O'Donell <carlos at redhat dot com>, Ondřej Bílka <neleai at seznam dot cz>, Will Newton <will dot newton at linaro dot org>, "libc-ports at sourceware dot org" <libc-ports at sourceware dot org>, Patch Tracking <patches at linaro dot org>
- Date: Thu, 5 Sep 2013 11:54:16 +0000
- Subject: Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
- Authentication-results: sourceware.org; auth=none
- References: <CANu=DmiBHoymFKTvaW_VsdhWZEYwkfViz1tTeRgj7H80f0FntA at mail dot gmail dot com> <5220D30B dot 9080306 at redhat dot com> <CANu=DmiXLL9v1Z1KS0sBOs-pL8csEUGc9YE829_-tidKd-GruQ at mail dot gmail dot com> <5220F1F0 dot 80501 at redhat dot com> <CANu=DmhA9QvSe6RS72Db2P=yyjC72fsE8d4QZKHEcNiwqxNMvw at mail dot gmail dot com> <52260BD0 dot 6090805 at redhat dot com> <20130903173710 dot GA2028 at domone dot kolej dot mff dot cuni dot cz> <522621E2 dot 6020903 at redhat dot com> <20130903185721 dot GA3876 at domone dot kolej dot mff dot cuni dot cz> <5226354D dot 8000006 at redhat dot com> <20130904073008 dot GA4306 at spoyarek dot pnq dot redhat dot com> <CAAKybw87cyx67bpX=qjedrfjKxDmtgOfi_zCiaCfHGgx328Bsw at mail dot gmail dot com>
On Wed, 4 Sep 2013, Ryan S. Arnold wrote:
> Simply testing for alignment (not presuming aligned data) itself slows
> down the processing of aligned-data, but that's an unavoidable
> reality. I've chatted with some compiler folks about the possibility
> of branching directly to aligned case labels in string routines if the
> compiler is able to detect aligned data.. but was informed that this
> suggestion might get me burned at the stake.
See the ARM EABI __aeabi_mem* functions. At present the glibc versions
just wrap or alias the generic ones, so don't take advantage of the extra
alignment information (and the constraints on register clobbers also mean
plain ARM memcpy gets used for them rather than the NEON version). And
GCC doesn't detect alignment and generate calls to those functions (which
would be a pessimization anyway without glibc implementing these
functions more optimally). But the principle makes sense, subject to not
pessimizing real use cases by expanding libc code size unduly (different
entry points to functions may be better than duplicating large amounts of
code for each alignment case).
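
To make the "different entry points" idea concrete, here is a minimal C
sketch, not glibc code. The sketch_* names are hypothetical; the real EABI
entry points are __aeabi_memcpy, __aeabi_memcpy4 and __aeabi_memcpy8, and an
optimized version would be written in assembly. The sketch only shows the
shape: one shared copy body, a general entry that tests alignment, and an
aligned entry that skips the test.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shared body (hypothetical helper): assumes dest and src are both
   4-byte aligned.  The fixed-size memcpy calls are the portable C idiom
   for a word load/store; compilers normally lower each to a single
   instruction. */
static void sketch_copy_aligned4(unsigned char *d, const unsigned char *s,
                                 size_t n)
{
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += 4;
        s += 4;
        n -= 4;
    }
    while (n > 0) {             /* trailing 0-3 bytes */
        *d++ = *s++;
        n--;
    }
}

/* Aligned entry point (hypothetical name, modelled on __aeabi_memcpy4):
   the caller promises 4-byte alignment, so no alignment test is done. */
void sketch_memcpy4(void *dest, const void *src, size_t n)
{
    sketch_copy_aligned4(dest, src, n);
}

/* General entry point (hypothetical name): pays for the alignment test,
   then reuses the same body instead of duplicating it. */
void sketch_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0) {
        sketch_copy_aligned4(d, s, n);
        return;
    }
    while (n > 0) {             /* simple byte copy for unaligned data */
        *d++ = *s++;
        n--;
    }
}
```

A compiler that could prove both pointers are 4-byte aligned would call
sketch_memcpy4 directly; every other caller goes through sketch_memcpy. The
added code size is one small extra entry point, not a second copy loop.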
Joseph S. Myers