This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH 2/2] Improve 64bit memcpy/memmove for Corei7 with avx2 instruction

From: Ling Ma <ling dot ma dot program at gmail dot com>
To: Andreas Jaeger <aj at suse dot com>
Cc: Ondřej Bílka <neleai at seznam dot cz>, Nix <nix at esperi dot org dot uk>, libc-alpha at sourceware dot org, hongjiu dot lu at intel dot com, ling dot ml at alibaba-inc dot com
Date: Mon, 10 Jun 2013 21:28:30 +0800
Subject: Re: [PATCH 2/2] Improve 64bit memcpy/memmove for Corei7 with avx2 instruction
References: <CAOGi=dMiD=_Qf1EJ=F3hfyQDtQubDEC5pjpXKDCHrUQwhr=vzg at mail dot gmail dot com> <20130605161954 dot GA26401 at domone dot kolej dot mff dot cuni dot cz> <CAOGi=dPWPaX5prcL-uAaqS6=_ehzKeBmAFMdwV6aU34jZ0eHtQ at mail dot gmail dot com> <20130606125511 dot GA28565 at domone dot kolej dot mff dot cuni dot cz> <CAOGi=dPs9geCtrWhU1L_0DEfOWOknpzFSLmYs4gbYzGX8Zn5Hg at mail dot gmail dot com> <20130607104613 dot GA6343 at domone dot kolej dot mff dot cuni dot cz> <8761xqru5w dot fsf at spindle dot srvr dot nix> <CAOGi=dMV5jaS2597cksd0mW84UDd06SovsBkL5=WPez-jZWg4g at mail dot gmail dot com> <20130607160749 dot GA28961 at domone dot kolej dot mff dot cuni dot cz> <CAOGi=dP2s4k2rg8TdKwj6V9-VzbOORGzeBmh-G=Fr1eM_OyDoA at mail dot gmail dot com> <20130607184550 dot GA9683 at domone dot kolej dot mff dot cuni dot cz> <CAOGi=dN2TMG8wO97Wd1qFBUOZ7LrjTO21qP-fCPty6Mp3aOcHw at mail dot gmail dot com> <51B576CC dot 5000001 at suse dot com>

CPU2006 scores are very hard to improve: a single-core improvement above
5% may be the goal for a next-generation CPU, and the achievable number is
much smaller for the specjbb benchmark. We can hardly expect more than a
1% improvement on those industry benchmarks from an optimized memcpy_avx2
alone, even if it is the fastest.

We presented the results for 2 reasons:
1) The Haswell CPU is fully capable of handling the indirect jump
instruction in memcpy_avx2 in a real-world scenario.
2) If we keep repeating the benchmark, we will find which one is better.
For example, we can run memcpy_avx2 and memcpy_new more than 3 times
each; if one of them wins more often, then even though the difference is
very small, the stable results can give us the right answer.
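
As a rough illustration of point 2, the win counting could be sketched as
below. The scores are the ones from the quoted message; pairing run i of one
variant with run i of the other is an arbitrary assumption, and the helper
names count_wins and sign_test_p are hypothetical. The sign test is a
swapped-in standard technique, not something used in the measurements.

```python
from math import comb

# Scores copied from the quoted message below (bigger is better).
memcpy_new = [67.63718, 66.899156, 66.982456]
memcpy_avx2 = [66.805236, 67.29362, 67.63718]

def count_wins(a, b):
    """Return (wins for a, wins for b) over runs paired by index."""
    wins_a = sum(x > y for x, y in zip(a, b))
    wins_b = sum(y > x for x, y in zip(a, b))
    return wins_a, wins_b

def sign_test_p(wins, n):
    """Two-sided exact sign-test p-value for `wins` out of n untied pairs."""
    k = max(wins, n - wins)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

wins_new, wins_avx2 = count_wins(memcpy_new, memcpy_avx2)
n = wins_new + wins_avx2
print("wins:", wins_new, wins_avx2, "p-value:", sign_test_p(wins_avx2, n))

# Smallest number of paired runs at which winning every single run
# would reach p < 0.05 under this test.
min_runs = next(m for m in range(1, 50) if sign_test_p(m, m) < 0.05)
print("runs needed for a clean sweep to be significant:", min_runs)
```

Under these assumptions, three paired runs cannot separate the two variants
whatever the outcome, which is why more repetitions are needed.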

Thanks
Ling

2013/6/10, Andreas Jaeger <aj@suse.com>:
> On 06/10/2013 08:17 AM, Ling Ma wrote:
>> Last week, we separated 403.gcc from the CPU2006 benchmark and compiled
>> it with the additional option -mstringop-strategy=libcall to avoid
>> rep_4byte, rep_8byte, and rep_byte, which use rep movs instructions.
>> 403.gcc has plenty of branch instructions and is very sensitive to the
>> branch misprediction rate. We are currently concerned about whether
>> memcpy_avx2 causes more branch mispredictions than the benefit it brings
>> in a real-world scenario, so 403.gcc will help us verify this.
>>
>> We tested 403.gcc linked with memcpy_new and 403.gcc linked with
>> memcpy_avx2, 3 times each:
>>
>> 403.gcc for memcpy_new results are below: (bigger is better)
>> 1) 67.63718
>> 2) 66.899156
>> 3) 66.982456
>>
>> 403.gcc for memcpy_avx2 results are below:
>>
>> 1) 66.805236
>> 2) 67.29362
>> 3) 67.63718
>>
>> The above comparison indicates that memcpy_avx2 seems to be better,
>> and we would like to do more experiments.
>
>
> If I take the arithmetic mean of these I get:
> 67.17293066666666666666 vs 67.24534866666666666666
>
> That's far less than 1 percent - so not conclusive at all,
>
> Andreas
> --
>   Andreas Jaeger aj@{suse.com,opensuse.org} Twitter/Identica: jaegerandi
>    SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
>     GF: Jeff Hawn,Jennifer Guild,Felix Imendörffer,HRB16746 (AG Nürnberg)
>      GPG fingerprint = 93A3 365E CE47 B889 DF7F  FED1 389A 563C C272 A126
>
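
The comparison of means in the quoted message can be checked with a short
sketch like the one below. It uses only the six scores listed above; the
"spread" figure (max minus min, relative to the mean) is an added,
informal measure of run-to-run noise, not something computed in the thread.

```python
from statistics import mean

# Scores copied from the quoted message (bigger is better).
memcpy_new = [67.63718, 66.899156, 66.982456]
memcpy_avx2 = [66.805236, 67.29362, 67.63718]

mean_new = mean(memcpy_new)
mean_avx2 = mean(memcpy_avx2)

# Relative difference between the two means.
rel_diff = (mean_avx2 - mean_new) / mean_new

# Run-to-run spread of each variant, relative to its own mean.
spread_new = (max(memcpy_new) - min(memcpy_new)) / mean_new
spread_avx2 = (max(memcpy_avx2) - min(memcpy_avx2)) / mean_avx2

print(f"mean memcpy_new  = {mean_new:.4f}")
print(f"mean memcpy_avx2 = {mean_avx2:.4f}")
print(f"relative difference of means = {rel_diff:.3%}")
print(f"spread memcpy_new  = {spread_new:.3%}")
print(f"spread memcpy_avx2 = {spread_avx2:.3%}")
```

With these numbers the difference between the means is far below 1%, and it
is also smaller than the spread within either set of runs, which is
consistent with calling the result inconclusive.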

Follow-Ups:
- Re: [PATCH 2/2] Improve 64bit memcpy/memmove for Corei7 with avx2 instruction
  - From: Ondřej Bílka
