This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Intel's new rte_memcpy()

From: Ondřej Bílka <neleai at seznam dot cz>
To: Luke Gorrie <luke at snabb dot co>
Cc: "H.J. Lu" <hjl dot tools at gmail dot com>, éå(åå) <ling dot ml at alibaba-inc dot com>, GNU C Library <libc-alpha at sourceware dot org>
Date: Mon, 2 Feb 2015 14:27:58 +0100
Subject: Re: Intel's new rte_memcpy()
Authentication-results: sourceware.org; auth=none
References: <CAA2XHbendDcfydewf2nrpPQkSsDWPdEH0SMsnqZAFsLF9q4Fzg at mail dot gmail dot com> <CAMe9rOpELuXQLvHQLLAeZitTTcz-xeg=ROoDm0dHe-fg4m-Jew at mail dot gmail dot com> <20150131184837 dot GA3539 at domone> <CAA2XHbcvDWjrshkdyo3++PnViOs5MdOC_+8qZoEb5UXJ21F1zA at mail dot gmail dot com>

On Mon, Feb 02, 2015 at 10:00:13AM +0100, Luke Gorrie wrote:
> On 31 January 2015 at 19:48, Ondřej Bílka <neleai@seznam.cz> wrote:
> >
> > On Fri, Jan 30, 2015 at 09:03:50AM -0800, H.J. Lu wrote:
> > > On Fri, Jan 30, 2015 at 5:52 AM, Luke Gorrie <luke@snabb.co> wrote:
> > > > Should networking application developers adopt Intel's custom
> > > > implementation if (like me) they are absolutely dependent on good and
> > > > consistent performance of memcpy on all recent hardware (>= Sandy
> > > > Bridge) and Linux distributions? (and then -- what to do about
> > > > memmove?)
> > >
> > Definitely not.
> 
> 
> Thank you for the detailed feedback!
> 
> Questions... :-)
> 
> Is there a simple way that I can reproduce these benchmarks? (I am
> curious in general and I would also like to run this on the two-socket
> Xeon E5 machines that I test with.)
>
Download the tarball I mentioned before:

http://kam.mff.cuni.cz/~ondra/benchmark_string/memcpy_profile310115.tar.bz2

Then compile it with
make

That also prints the LD_PRELOAD=... setting that is used for profiling.

For the profiling itself, run this sequence:

make reset # to clean previous profiling results
LD_PRELOAD=... bash
# now execute command that you want to profile
make rep

That creates a result directory with graphs like the ones I showed.

There is also a shortcut, ./benchmark, that runs the benchmarks I showed
earlier and moves the results to result* directories.

> I would like to create relatively portable binaries that don't depend
> on recent glibc releases. For this purpose I am tempted to reference
> an older memcpy in my symbol table with this trick:
> 
> __asm__(".symver memcpy,memcpy@GLIBC_2.2.5");
> 
> Is that a reasonable idea? Is it likely to have a significant
> performance cost on some platforms? (in practice will this make memcpy
> act like memmove?)
> 
It depends on the workload: if you do a lot of large copies, then the AVX2
improvement is significant. I cannot predict the effect in general, so test it.
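
As a rough illustration of what "test it" could look like, here is a minimal
timing sketch. It is not part of the benchmark tarball above. The helper name
time_memcpy_ns and the USE_OLD_MEMCPY macro are hypothetical names made up for
this example, and the .symver line assumes an x86-64 glibc where
memcpy@GLIBC_2.2.5 exists. A tight loop like this only measures hot-cache
behaviour, so treat the numbers as a first hint and not as a replacement for
profiling the real workload.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Assumption: x86-64 glibc. Build with -DUSE_OLD_MEMCPY to bind memcpy to
 * the old symbol version quoted above; leave it off for the default one. */
#ifdef USE_OLD_MEMCPY
__asm__(".symver memcpy,memcpy@GLIBC_2.2.5");
#endif

/* Hypothetical helper: average nanoseconds per memcpy() call when copying
 * `size` bytes `iters` times. Returns a negative value on failure. */
double time_memcpy_ns(size_t size, long iters)
{
    assert(iters > 0);

    unsigned char *src = malloc(size ? size : 1);
    unsigned char *dst = malloc(size ? size : 1);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, size);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        memcpy(dst, src, size);
        /* Compiler barrier so the copy is not optimized away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Sanity check that the copy really happened. */
    int ok = (memcmp(dst, src, size) == 0);
    free(src);
    free(dst);
    if (!ok)
        return -1.0;

    double elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                      + (double)(t1.tv_nsec - t0.tv_nsec);
    return elapsed_ns / (double)iters;
}
```

Call it from a small main() for the copy sizes your application actually
uses, build once with and once without -DUSE_OLD_MEMCPY (adding
-fno-builtin-memcpy so that the compiler does not inline the copy), and
compare the two sets of numbers on each machine you care about.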

Follow-Ups:
- Re: Intel's new rte_memcpy()
  - From: Luke Gorrie

References:
- Re: Intel's new rte_memcpy()
  - From: Luke Gorrie
