This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Intel's new rte_memcpy()
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Luke Gorrie <luke at snabb dot co>, <ling dot ml at alibaba-inc dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Fri, 30 Jan 2015 09:03:50 -0800
- Subject: Re: Intel's new rte_memcpy()
- References: <CAA2XHbendDcfydewf2nrpPQkSsDWPdEH0SMsnqZAFsLF9q4Fzg at mail dot gmail dot com>
On Fri, Jan 30, 2015 at 5:52 AM, Luke Gorrie <luke@snabb.co> wrote:
> Howdy!
>
> I am hoping for some feedback and advice for me as an application developer.
>
> Intel have recently posted a couple of memcpy() implementations and
> suggested that these have significant advantages for networking
> applications. There is one for Sandy Bridge and one for Haswell. The
> proposal is that networking application developers would statically
> link one or both of these into their applications instead of
> dynamically linking with glibc. The proposal is part of their Data
> Plane Development Kit (dpdk.org).
>
> They explain it much better than I do:
> http://dpdk.org/ml/archives/dev/2014-November/008158.html
>
> and their code is here:
> https://gist.github.com/lukego/efc82a15bde5ec83cb1b
>
> My question to the list is this:
>
> Should networking application developers adopt Intel's custom
> implementation if (like me) they are absolutely dependent on good and
> consistent performance of memcpy on all recent hardware (>= Sandy
> Bridge) and Linux distributions? (and then -- what to do about
> memmove?)
>
> I have done some cursory benchmarks with cachebench:
> http://dpdk.org/ml/archives/dev/2015-January/011574.html
>
> ... with a correction to the rte_memcpy on Haswell results:
> http://dpdk.org/ml/archives/dev/2015-January/011691.html
>
I imported it into the hjl/memcpy branch at
https://sourceware.org/git/?p=glibc.git;a=summary
Here is the bench-memcpy comparison against __memcpy_avx_unaligned
on Haswell:
Columns: __memcpy_rte_avx, then __memcpy_avx_unaligned (timings, lower is faster)
Length 1, alignment 0/ 0: 9.64062 10.5625
Length 1, alignment 0/ 0: 9.26562 9.54688
Length 1, alignment 0/ 0: 8.75 9.45312
Length 1, alignment 0/ 0: 8.65625 9.125
Length 2, alignment 0/ 0: 10.7969 10
Length 2, alignment 1/ 0: 9.48438 8.98438
Length 2, alignment 0/ 1: 9.82812 8.89062
Length 2, alignment 1/ 1: 9.125 8.89062
Length 4, alignment 0/ 0: 11.3594 9.73438
Length 4, alignment 2/ 0: 9.64062 9.125
Length 4, alignment 0/ 2: 9.35938 8.79688
Length 4, alignment 2/ 2: 9.3125 8.20312
Length 8, alignment 0/ 0: 10.4375 9.17188
Length 8, alignment 3/ 0: 9.07812 7.59375
Length 8, alignment 0/ 3: 9.53125 7.95312
Length 8, alignment 3/ 3: 9.53125 7.95312
Length 16, alignment 0/ 0: 9.45312 10.7969
Length 16, alignment 4/ 0: 8.28125 10
Length 16, alignment 0/ 4: 8.51562 10.0469
Length 16, alignment 4/ 4: 8.09375 9.67188
Length 32, alignment 0/ 0: 7.40625 10.9375
Length 32, alignment 5/ 0: 8.32812 12.9844
Length 32, alignment 0/ 5: 7.40625 11.9219
Length 32, alignment 5/ 5: 7.21875 12
Length 64, alignment 0/ 0: 9.92188 14.0156
Length 64, alignment 6/ 0: 8.75 21.4062
Length 64, alignment 0/ 6: 8.98438 18.9531
Length 64, alignment 6/ 6: 8.46875 18.75
Length 128, alignment 0/ 0: 13.2188 24.75
Length 128, alignment 7/ 0: 13.2188 32.4844
Length 128, alignment 0/ 7: 15.7344 32.5312
Length 128, alignment 7/ 7: 13.4062 36.9531
Length 256, alignment 0/ 0: 14.9844 19.7812
Length 256, alignment 8/ 0: 17.6875 24.1094
Length 256, alignment 0/ 8: 37.7969 22.2969
Length 256, alignment 8/ 8: 17.5469 20.5781
Length 512, alignment 0/ 0: 20.6094 24.5312
Length 512, alignment 9/ 0: 22.6562 27.9219
Length 512, alignment 0/ 9: 71.6719 27.7031
Length 512, alignment 9/ 9: 30.8125 26.9062
Length 1024, alignment 0/ 0: 39.4219 36.3125
Length 1024, alignment 10/ 0: 37.6562 42.0312
Length 1024, alignment 0/10: 44.875 41.9375
Length 1024, alignment 10/10: 39.4219 43.3281
Length 2048, alignment 0/ 0: 64.6562 97.0469
Length 2048, alignment 11/ 0: 65.25 97.0781
Length 2048, alignment 0/11: 82.7969 607.281
Length 2048, alignment 11/11: 69.4844 138.047
Length 4096, alignment 0/ 0: 122.453 153.781
Length 4096, alignment 12/ 0: 158.016 181.328
Length 4096, alignment 0/12: 218.609 1104.64
Length 4096, alignment 12/12: 174.172 300.797
Length 8192, alignment 0/ 0: 243.156 275.859
Length 8192, alignment 13/ 0: 311.406 330.312
Length 8192, alignment 0/13: 394.359 1802.88
Length 8192, alignment 13/13: 289.312 532.922
Length 16384, alignment 0/ 0: 568.938 553.203
Length 16384, alignment 14/ 0: 683.812 671.844
Length 16384, alignment 0/14: 859.125 3364.7
Length 16384, alignment 14/14: 611.469 1001.5
Length 32768, alignment 0/ 0: 3704.61 3683.7
Length 32768, alignment 15/ 0: 3793.03 3845.58
Length 32768, alignment 0/15: 3776.97 5321.25
Length 32768, alignment 15/15: 3742.62 3986.92
Length 65536, alignment 0/ 0: 7831.95 7480.41
Length 65536, alignment 16/ 0: 8018.58 7784.28
Length 65536, alignment 0/16: 7914.61 10203.6
Length 65536, alignment 16/16: 7902.78 8019.75
Length 0, alignment 0/ 0: 11.8594 12.0938
Length 0, alignment 0/ 0: 10.0938 11.4062
Length 0, alignment 0/ 0: 9.5 11.3125
Length 0, alignment 0/ 0: 9.8125 11.0781
Length 1, alignment 0/ 0: 10.0938 10.2969
Length 1, alignment 1/ 0: 8.79688 9.07812
Length 1, alignment 0/ 1: 8.65625 9.17188
Length 1, alignment 1/ 1: 8.65625 9.21875
Length 2, alignment 0/ 0: 10.8438 10.2344
Length 2, alignment 2/ 0: 9.78125 8.75
Length 2, alignment 0/ 2: 9.03125 9.03125
Length 2, alignment 2/ 2: 9.03125 8.51562
Length 3, alignment 0/ 0: 9.5 9.5
Length 3, alignment 3/ 0: 8.9375 8.32812
Length 3, alignment 0/ 3: 8.98438 8.28125
Length 3, alignment 3/ 3: 8.9375 8.60938
Length 4, alignment 0/ 0: 12.7031 8.84375
Length 4, alignment 4/ 0: 10.1406 8.70312
Length 4, alignment 0/ 4: 9.40625 8.28125
Length 4, alignment 4/ 4: 9.71875 8.51562
Length 5, alignment 0/ 0: 9.59375 9.21875
Length 5, alignment 5/ 0: 9.03125 8.23438
Length 5, alignment 0/ 5: 8.70312 8.28125
Length 5, alignment 5/ 5: 8.75 8.51562
Length 6, alignment 0/ 0: 10.6562 9.03125
Length 6, alignment 6/ 0: 8.84375 8.23438
Length 6, alignment 0/ 6: 8.84375 8.28125
Length 6, alignment 6/ 6: 9.125 8.46875
Length 7, alignment 0/ 0: 10.6562 8.98438
Length 7, alignment 7/ 0: 8.70312 8.79688
Length 7, alignment 0/ 7: 8.84375 8.28125
Length 7, alignment 7/ 7: 9.07812 8.46875
Length 8, alignment 0/ 0: 11.0781 8.5625
Length 8, alignment 8/ 0: 9.07812 7.48438
Length 8, alignment 0/ 8: 9.07812 7.17188
Length 8, alignment 8/ 8: 9.07812 8.04688
Length 9, alignment 0/ 0: 10.2344 8.65625
Length 9, alignment 9/ 0: 8.84375 8.04688
Length 9, alignment 0/ 9: 8.375 7.53125
Length 9, alignment 9/ 9: 8.46875 8.04688
Length 10, alignment 0/ 0: 11.125 8.46875
Length 10, alignment 10/ 0: 9.5 7.07812
Length 10, alignment 0/10: 9.03125 7.53125
Length 10, alignment 10/10: 9.07812 7.90625
Length 11, alignment 0/ 0: 11.3594 8.375
Length 11, alignment 11/ 0: 8.79688 7.17188
Length 11, alignment 0/11: 8.79688 7.17188
Length 11, alignment 11/11: 9.71875 7.90625
Length 12, alignment 0/ 0: 10.9844 8.65625
Length 12, alignment 12/ 0: 8.79688 7.17188
Length 12, alignment 0/12: 8.89062 7.17188
Length 12, alignment 12/12: 8.10938 7.90625
Length 13, alignment 0/ 0: 10.8906 8.75
Length 13, alignment 13/ 0: 8.65625 8.04688
Length 13, alignment 0/13: 8.9375 7.21875
Length 13, alignment 13/13: 9.71875 7.45312
Length 14, alignment 0/ 0: 10.8906 8.5625
Length 14, alignment 14/ 0: 8.98438 7.625
Length 14, alignment 0/14: 8.84375 7.21875
Length 14, alignment 14/14: 8.79688 7.45312
Length 15, alignment 0/ 0: 11.3125 8.46875
Length 15, alignment 15/ 0: 9.03125 7.625
Length 15, alignment 0/15: 9.07812 7.57812
Length 15, alignment 15/15: 9.5 8.375
Length 16, alignment 0/ 0: 9.625 10.0469
Length 16, alignment 16/ 0: 6.9375 9.76562
Length 16, alignment 0/16: 6.46875 9.07812
Length 16, alignment 16/16: 8.0625 9.67188
Length 17, alignment 0/ 0: 8.75 10.0625
Length 17, alignment 17/ 0: 6.51562 9.59375
Length 17, alignment 0/17: 6.46875 9.03125
Length 17, alignment 17/17: 8.1875 9.67188
Length 18, alignment 0/ 0: 8.46875 10.0156
Length 18, alignment 18/ 0: 7.03125 9.54688
Length 18, alignment 0/18: 6.46875 9.125
Length 18, alignment 18/18: 7.92188 9.35938
Length 19, alignment 0/ 0: 8.20312 10.1406
Length 19, alignment 19/ 0: 6.51562 9.90625
Length 19, alignment 0/19: 6.46875 9.07812
Length 19, alignment 19/19: 8.09375 9.26562
Length 20, alignment 0/ 0: 8.79688 10.4219
Length 20, alignment 20/ 0: 6.51562 9.54688
Length 20, alignment 0/20: 6.51562 9.39062
Length 20, alignment 20/20: 8.1875 9.625
Length 21, alignment 0/ 0: 8.375 9.26562
Length 21, alignment 21/ 0: 7.07812 9.03125
Length 21, alignment 0/21: 7.07812 9.07812
Length 21, alignment 21/21: 8.09375 9.67188
Length 22, alignment 0/ 0: 8.28125 9.6875
Length 22, alignment 22/ 0: 6.46875 9.07812
Length 22, alignment 0/22: 6.46875 9.90625
Length 22, alignment 22/22: 8.09375 9.67188
Length 23, alignment 0/ 0: 8.375 10.2344
Length 23, alignment 23/ 0: 7.34375 9.95312
Length 23, alignment 0/23: 6.46875 9.125
Length 23, alignment 23/23: 8.28125 9.26562
Length 24, alignment 0/ 0: 8.89062 9.73438
Length 24, alignment 24/ 0: 6.46875 9.59375
Length 24, alignment 0/24: 6.42188 9.07812
Length 24, alignment 24/24: 8.1875 9.26562
Length 25, alignment 0/ 0: 8.15625 10.4219
Length 25, alignment 25/ 0: 6.46875 10.2344
Length 25, alignment 0/25: 6.42188 9.5
Length 25, alignment 25/25: 8.23438 10.1406
Length 26, alignment 0/ 0: 8.5625 9.64062
Length 26, alignment 26/ 0: 6.42188 9.90625
Length 26, alignment 0/26: 6.46875 9.4375
Length 26, alignment 26/26: 8.14062 9.21875
Length 27, alignment 0/ 0: 9.25 9.82812
Length 27, alignment 27/ 0: 6.5625 9.59375
Length 27, alignment 0/27: 6.51562 9.07812
Length 27, alignment 27/27: 8.09375 9.625
Length 28, alignment 0/ 0: 8.5625 9.59375
Length 28, alignment 28/ 0: 6.89062 9.5
Length 28, alignment 0/28: 6.46875 9.53125
Length 28, alignment 28/28: 7.73438 9.71875
Length 29, alignment 0/ 0: 8.375 10.375
Length 29, alignment 29/ 0: 6.46875 9.90625
Length 29, alignment 0/29: 6.42188 9.95312
Length 29, alignment 29/29: 8.04688 9.54688
Length 30, alignment 0/ 0: 8.5625 9.78125
Length 30, alignment 30/ 0: 7.03125 9.125
Length 30, alignment 0/30: 6.46875 9.125
Length 30, alignment 30/30: 7.78125 9.59375
Length 31, alignment 0/ 0: 8.60938 9.78125
Length 31, alignment 31/ 0: 7.03125 9.54688
Length 31, alignment 0/31: 6.51562 9.125
Length 31, alignment 31/31: 8.23438 9.3125
Length 48, alignment 0/ 0: 10.8906 10.2969
Length 48, alignment 3/ 0: 9.48438 11.0312
Length 48, alignment 0/ 3: 8.84375 11.0312
Length 48, alignment 3/ 3: 8.65625 10.4688
Length 80, alignment 0/ 0: 16.8906 13.9219
Length 80, alignment 5/ 0: 14.1875 22.2969
Length 80, alignment 0/ 5: 21.1719 18.4375
Length 80, alignment 5/ 5: 17.5469 15.6875
Length 96, alignment 0/ 0: 12.1406 13.6719
Length 96, alignment 6/ 0: 12.0625 21.3594
Length 96, alignment 0/ 6: 14.6562 19.0781
Length 96, alignment 6/ 6: 12.1406 19.3594
Length 112, alignment 0/ 0: 12.8438 12.75
Length 112, alignment 7/ 0: 14.2812 18.7969
Length 112, alignment 0/ 7: 17.125 17.875
Length 112, alignment 7/ 7: 14 17.3594
Length 144, alignment 0/ 0: 15.0312 25.2812
Length 144, alignment 9/ 0: 15.3125 32.5781
Length 144, alignment 0/ 9: 16.75 30.7188
Length 144, alignment 9/ 9: 15.5938 30.7188
Length 160, alignment 0/ 0: 12.8438 23.9688
Length 160, alignment 10/ 0: 12.9844 30.1562
Length 160, alignment 0/10: 20.6562 32.5312
Length 160, alignment 10/10: 13.0312 34.3438
Length 176, alignment 0/ 0: 14.1094 23.7812
Length 176, alignment 11/ 0: 16.5781 29.3281
Length 176, alignment 0/11: 25.5625 31.3281
Length 176, alignment 11/11: 16.0625 30.4844
Length 192, alignment 0/ 0: 14.8906 22.3438
Length 192, alignment 12/ 0: 17.7344 30.4375
Length 192, alignment 0/12: 25.4688 29.8281
Length 192, alignment 12/12: 14.8906 29.4219
Length 208, alignment 0/ 0: 15.875 21.4062
Length 208, alignment 13/ 0: 19.1719 29.9688
Length 208, alignment 0/13: 20.25 29.6875
Length 208, alignment 13/13: 17.1719 26.9844
Length 224, alignment 0/ 0: 14.9844 20.6562
Length 224, alignment 14/ 0: 16.0625 28.625
Length 224, alignment 0/14: 33.1875 27.375
Length 224, alignment 14/14: 13.7344 29.375
Length 240, alignment 0/ 0: 15.7344 19.4062
Length 240, alignment 15/ 0: 21.5 29.75
Length 240, alignment 0/15: 40.0625 27.6406
Length 240, alignment 15/15: 17.7812 24.25
Length 272, alignment 0/ 0: 17.2656 19.4531
Length 272, alignment 17/ 0: 20.2031 23.0938
Length 272, alignment 0/17: 20.2969 30.5781
Length 272, alignment 17/17: 19.0781 28.4844
Length 288, alignment 0/ 0: 14.25 24.5781
Length 288, alignment 18/ 0: 19.6875 31.1406
Length 288, alignment 0/18: 22.1562 28.5312
Length 288, alignment 18/18: 17.2188 26.4375
Length 304, alignment 0/ 0: 16.8906 23.7812
Length 304, alignment 19/ 0: 19.2656 30.3438
Length 304, alignment 0/19: 43.375 28.4844
Length 304, alignment 19/19: 20.5781 25.7812
Length 320, alignment 0/ 0: 18 23.5938
Length 320, alignment 20/ 0: 19.5469 30.0312
Length 320, alignment 0/20: 43.6562 27.2812
Length 320, alignment 20/20: 20.3906 25.0469
Length 336, alignment 0/ 0: 19.3594 22.5781
Length 336, alignment 21/ 0: 21.0312 28.8594
Length 336, alignment 0/21: 50.9688 26.3438
Length 336, alignment 21/21: 31.1406 24.9531
Length 352, alignment 0/ 0: 15.5469 21.9219
Length 352, alignment 22/ 0: 20.2031 29.0312
Length 352, alignment 0/22: 51.5781 27
Length 352, alignment 22/22: 20.5312 25.125
Length 368, alignment 0/ 0: 18.2969 21.5469
Length 368, alignment 23/ 0: 20.7188 25.375
Length 368, alignment 0/23: 56.3281 24.5781
Length 368, alignment 23/23: 24.0625 22.0625
Length 384, alignment 0/ 0: 18.0156 21.3594
Length 384, alignment 24/ 0: 21.8281 25.875
Length 384, alignment 0/24: 49.8438 24.25
Length 384, alignment 24/24: 24.1094 22.2031
Length 400, alignment 0/ 0: 20.3281 20.1094
Length 400, alignment 25/ 0: 22.4219 23.8281
Length 400, alignment 0/25: 51.0938 32.25
Length 400, alignment 25/25: 30.4844 32.7656
Length 416, alignment 0/ 0: 16.7969 25.875
Length 416, alignment 26/ 0: 21.7812 32.3438
Length 416, alignment 0/26: 52.2188 30.5312
Length 416, alignment 26/26: 24.5312 32.6719
Length 432, alignment 0/ 0: 19.5938 25.5938
Length 432, alignment 27/ 0: 22.7656 34.2031
Length 432, alignment 0/27: 67.5312 30.2031
Length 432, alignment 27/27: 26.8594 29.6406
Length 448, alignment 0/ 0: 18.625 24.5312
Length 448, alignment 28/ 0: 23.125 31.5156
Length 448, alignment 0/28: 66.6094 29.0781
Length 448, alignment 28/28: 27.0938 27.6562
Length 464, alignment 0/ 0: 21.0469 24.3438
Length 464, alignment 29/ 0: 22.0625 31.2188
Length 464, alignment 0/29: 63.7656 28.5781
Length 464, alignment 29/29: 36.5781 29.5
Length 480, alignment 0/ 0: 17.6875 24.1094
Length 480, alignment 30/ 0: 21.3125 31.1875
Length 480, alignment 0/30: 68.1875 28.2969
Length 480, alignment 30/30: 27.875 28.8594
Length 496, alignment 0/ 0: 21.2812 24.0625
Length 496, alignment 31/ 0: 22.1562 28.0625
Length 496, alignment 0/31: 72.2344 26.625
Length 496, alignment 31/31: 31.0469 27.6875
Length 4096, alignment 0/ 0: 123.391 154.516
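For readers who want to try a comparison of this kind on their own machine, here is a minimal sketch of a timing loop over one length and one pair of alignments. It is not the bench-memcpy code: time_copy and copy_fn are hypothetical names, it uses clock_gettime and reports nanoseconds, so its numbers are not comparable with the figures above. It assumes a POSIX system for posix_memalign and clock_gettime.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Signature shared by memcpy and any drop-in replacement. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t len);

/* Hypothetical helper: average nanoseconds per call when copying
   len bytes, with source and destination offset by src_off and
   dst_off bytes from 64-byte-aligned buffers.
   Returns -1.0 if iters is 0 or allocation fails. */
double time_copy(copy_fn fn, size_t len, size_t src_off,
                 size_t dst_off, unsigned iters)
{
    void *src_base = NULL, *dst_base = NULL;
    struct timespec t0, t1;
    /* Volatile pointer: forces a real call on every iteration. */
    copy_fn volatile call = fn;

    if (iters == 0)
        return -1.0;
    if (posix_memalign(&src_base, 64, len + src_off + 1) != 0)
        return -1.0;
    if (posix_memalign(&dst_base, 64, len + dst_off + 1) != 0) {
        free(src_base);
        return -1.0;
    }

    unsigned char *src = (unsigned char *)src_base + src_off;
    unsigned char *dst = (unsigned char *)dst_base + dst_off;
    for (size_t i = 0; i < len; i++)
        src[i] = (unsigned char)(i * 31 + 7);
    memset(dst, 0, len);

    call(dst, src, len);   /* warm-up: touch pages and caches */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < iters; i++)
        call(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* The copy must be correct for the timing to mean anything. */
    assert(memcmp(dst, src, len) == 0);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    free(src_base);
    free(dst_base);
    return ns / iters;
}
```

Calling time_copy(memcpy, 64, 6, 0, 100000) from a main function times the libc memcpy for 64 bytes with offsets 6 and 0. Passing a statically linked replacement with the same signature in place of memcpy gives the other column. A single run is noisy, so repeat the call and take the minimum or median.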
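__memcpy_rte_avx is faster in most cases. The main exceptions are
lengths 2 to 15, where __memcpy_avx_unaligned is slightly faster, and
many of the 0/N alignment rows from length 224 to 512, where
__memcpy_rte_avx is more than twice as slow in the worst rows
(e.g. length 496, alignment 0/31: 72.2344 vs. 26.625).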
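To check a summary like this against the raw rows, each result line can be parsed and the two timings compared. The sketch below assumes the two-column line format shown in this message. compare_row is a hypothetical helper, not part of glibc or its benchtests.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: classify one result line such as
   "Length 64, alignment 6/ 0: 8.75 21.4062".
   Returns -1 if the first timing is lower (first column faster),
            1 if the second timing is lower,
            0 if the two timings are equal,
            2 if the line is not a result row (e.g. the header). */
int compare_row(const char *line)
{
    unsigned long len;
    int align1, align2;
    double first, second;

    assert(line != NULL);

    /* %d skips leading blanks, so "6/ 0" and "10/10" both parse. */
    if (sscanf(line, "Length %lu, alignment %d/%d: %lf %lf",
               &len, &align1, &align2, &first, &second) != 5)
        return 2;

    if (first < second)
        return -1;
    if (first > second)
        return 1;
    return 0;
}
```

Feeding every line of the output through compare_row and counting the -1, 1 and 0 results gives the number of rows each implementation wins. Grouping the counts by length or by alignment pair shows where the exceptions cluster.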
--
H.J.