This is the mail archive of the
mailing list for the glibc project.
Re: Intel's new rte_memcpy()
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Luke Gorrie <luke at snabb dot co>
- Cc: "H.J. Lu" <hjl dot tools at gmail dot com>, éå(åå) <ling dot ml at alibaba-inc dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Mon, 2 Feb 2015 14:27:58 +0100
- Subject: Re: Intel's new rte_memcpy()
- Authentication-results: sourceware.org; auth=none
- References: <CAA2XHbendDcfydewf2nrpPQkSsDWPdEH0SMsnqZAFsLF9q4Fzg at mail dot gmail dot com> <CAMe9rOpELuXQLvHQLLAeZitTTcz-xeg=ROoDm0dHe-fg4m-Jew at mail dot gmail dot com> <20150131184837 dot GA3539 at domone> <CAA2XHbcvDWjrshkdyo3++PnViOs5MdOC_+8qZoEb5UXJ21F1zA at mail dot gmail dot com>
On Mon, Feb 02, 2015 at 10:00:13AM +0100, Luke Gorrie wrote:
> On 31 January 2015 at 19:48, Ondřej Bílka <firstname.lastname@example.org> wrote:
> > On Fri, Jan 30, 2015 at 09:03:50AM -0800, H.J. Lu wrote:
> > > On Fri, Jan 30, 2015 at 5:52 AM, Luke Gorrie <email@example.com> wrote:
> > > > Should networking application developers adopt Intel's custom
> > > > implementation if (like me) they are absolutely dependent on good and
> > > > consistent performance of memcpy on all recent hardware (>= Sandy
> > > > Bridge) and Linux distributions? (and then -- what to do about
> > > > memmove?)
> > >
> > Definitely not.
> Thank you for the detailed feedback!
> Questions... :-)
> Is there a simple way that I can reproduce these benchmarks? (I am
> curious in general and I would also like to run this on the two-socket
> Xeon E5 machines that I test with.)
Download the tarball I mentioned before.
Then compile it with
That also prints the LD_PRELOAD=... setting that is used for profiling.
For the profiling itself, run this sequence:
make reset # to clean previous profiling results
# now execute the command that you want to profile
That creates a result directory with the graphs that I showed.
There is also a shortcut, ./benchmark, that runs the benchmarks I showed
earlier and moves them to result* directories.
> I would like to create relatively portable binaries that don't depend
> on recent glibc releases. For this purpose I am tempted to reference
> an older memcpy in my symbol table with this trick:
> __asm__(".symver memcpy,memcpy@GLIBC_2.2.5");
> Is that a reasonable idea? Is it likely to have a significant
> performance cost on some platforms? (in practice will this make memcpy
> act like memmove?)
It depends on the workload: if you do a lot of large copies, then the AVX2
improvement is significant. I cannot predict the effect in general, so test it.
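
A minimal sketch of such a test, assuming GCC on x86-64 Linux with glibc. The
helper name bench_memcpy is hypothetical and is not part of the tarball
mentioned above. It only measures the average time per copy for one size, so it
says nothing about cache effects or the size distribution of a real workload.

```c
/* Sketch: average time of memcpy for a given copy size.
   Build it once as-is and once with the .symver line enabled,
   then compare the numbers on your own hardware. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Uncomment to bind to the older symbol version (x86-64 glibc only). */
/* __asm__(".symver memcpy,memcpy@GLIBC_2.2.5"); */

/* Hypothetical helper: average nanoseconds per memcpy call of `size` bytes.
   `size` and `iterations` are assumed to be nonzero. */
double bench_memcpy(size_t size, size_t iterations)
{
    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    assert(src != NULL && dst != NULL);
    memset(src, 0xAB, size);

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < iterations; i++) {
        memcpy(dst, src, size);
        /* Compiler barrier so the copy is not optimized away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* Sanity check: the destination must match the source. */
    assert(memcmp(dst, src, size) == 0);
    free(src);
    free(dst);

    double ns = (double)(end.tv_sec - start.tv_sec) * 1e9
              + (double)(end.tv_nsec - start.tv_nsec);
    return ns / (double)iterations;
}
```

Call bench_memcpy from a small main for the sizes your application actually
copies (for example 64, 4096 and 1 << 20 bytes) and print the results. Building
with -fno-builtin-memcpy keeps GCC from expanding small copies inline, so that
the library routine is the one being measured.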