This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Inline mempcpy
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Wilco Dijkstra <wdijkstr at arm dot com>
- Cc: 'Joseph Myers' <joseph at codesourcery dot com>, libc-alpha at sourceware dot org, 'Carlos O'Donell' <carlos at redhat dot com>, munroesj at linux dot vnet dot ibm dot com
- Date: Sun, 24 May 2015 17:57:46 +0200
- Subject: Re: [PATCH] Inline mempcpy
On Wed, May 20, 2015 at 12:53:24PM +0100, Wilco Dijkstra wrote:
> > Joseph Myers wrote:
> > On Mon, 18 May 2015, Wilco Dijkstra wrote:
> >
> > This seems plausible, subject to getting per-architecture agreement (for
> > each architecture with mempcpy.S) on whether to define
> > _HAVE_STRING_ARCH_mempcpy. Although there may be the question of whether
> > __extern_always_inline should be defined at all for !__GNUC_PREREQ (3,2)
> > (i.e. when the always_inline attribute isn't supported).
>
> It would be good to fix the *always_inline defines, but for now I've added
> an extra check for __GNUC_PREREQ (3,2) to be sure we don't fail to inline on
> really old GCCs. So here's the actual patch - I've disabled inlining for SPARC,
> and from previous comments it seems people prefer the inline mempcpy on x64/x86
> and PPC (I've included the maintainers for those arches to agree/veto).
>
> OK for commit?
>
> Wilco
>
Adhemerval already acked this for powerpc in this thread.
For x64 I obviously agree. I added an optimized memcpy; for mempcpy I
submitted a patch, then forgot about it after about ten pings.
So on x64 on Sandy Bridge this will improve performance by around 50% on
larger strings, as you can see even in the benchtest:
                               builtin_memcpy  simple_memcpy  __memcpy_avx_unaligned  __memcpy_ssse3_back  __memcpy_ssse3  __memcpy_sse2_unaligned  __memcpy_sse2
Length  448, alignment  0/ 0:         39.0625        791.812                 35.9844              30.8438         34.7344                  34.2344        53.5469
Length  448, alignment 28/ 0:         47.0625            791                 57.9531              34.6562         43.4219                  46.7344        67.4219
Length  448, alignment  0/28:         53.5781        790.766                      79              37.1719         46.1875                  57.1719        87.7812
Length  448, alignment 28/28:            37.5        790.719                 47.2812              31.7188         35.0625                  36.0781        60.9375
Length  464, alignment  0/ 0:         37.3125        817.516                 37.9688              31.7188         33.7812                  33.9062        56.2969
Length  464, alignment 29/ 0:          47.375        817.641                 59.2812              35.4375         50.8281                  46.9219        72.4219
Length  464, alignment  0/29:         56.5625        817.781                  78.625              38.9688         47.5625                  56.4688             96
Length  464, alignment 29/29:         39.4219        817.594                 45.2656              31.3906         35.2031                  36.7188        62.5469
Length  480, alignment  0/ 0:         38.2344        844.891                      38              31.7188         34.2344                    35.25        57.2969
Length  480, alignment 30/ 0:         48.5781        844.406                 60.3438              35.9844         47.5625                  47.4375        69.9531
Length  480, alignment  0/30:         55.4688         844.25                 81.7031              37.2656         49.6406                  55.1562        98.2188
Length  480, alignment 30/30:         37.1719        844.812                 42.4219                 32.5         35.4844                  34.9688        65.7188
Length  496, alignment  0/ 0:         38.3281        870.906                 38.9688              32.7656         36.5312                  33.2656          64.25
Length  496, alignment 31/ 0:         43.8906        870.906                 56.8906              35.6719         51.7969                    43.75        72.1406
Length  496, alignment  0/31:         61.9531        871.094                 79.1875              39.1094         47.7031                  61.2188        104.359
Length  496, alignment 31/31:         42.6875        870.906                      46              34.2344         34.9688                  41.2656        63.7969
Length 4096, alignment  0/ 0:         244.297        6870.81                 3477.34              238.188         227.344                      241        457.766
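
For readers following the thread, the general idea of inlining mempcpy on top of memcpy can be sketched as below. This is only an illustration, not the patch under review: the name inline_mempcpy_sketch is hypothetical, and the guards discussed above (__GNUC_PREREQ (3,2) and _HAVE_STRING_ARCH_mempcpy) are left out. It assumes that the call to memcpy reaches the tuned memcpy implementation, which is where the benefit would come from.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the glibc definition: mempcpy expressed in
   terms of memcpy.  memcpy returns DEST, while mempcpy returns a pointer
   to the byte following the last byte written, so we add N.  */
static inline void *
inline_mempcpy_sketch (void *dest, const void *src, size_t n)
{
  return (char *) memcpy (dest, src, n) + n;
}
```

With this shape, an architecture that has a fast memcpy but no tuned mempcpy gets the fast path for free, at the cost of one extra addition in the caller.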