This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
[PATCH 0/7] [BZ #19776] Improve x86-64 memcpy-sse2-unaligned.S
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: libc-alpha at sourceware dot org
- Cc: Ondrej Bilka <neleai at seznam dot cz>
- Date: Mon, 7 Mar 2016 09:36:23 -0800
- Subject: [PATCH 0/7] [BZ #19776] Improve x86-64 memcpy-sse2-unaligned.S
- Authentication-results: sourceware.org; auth=none
This set of patches improves x86-64 memcpy-sse2-unaligned.S by:
1. Removing dead code.
2. Setting RAX to the return value on entry, so that RAX is no longer
   used as a scratch register.
3. Removing the unnecessary L(overlapping) check.
4. Adding entry points for __mempcpy_chk_sse2_unaligned,
   __mempcpy_sse2_unaligned and __memcpy_chk_sse2_unaligned
   (see the C sketch after this list).
5. Enabling __mempcpy_chk_sse2_unaligned, __mempcpy_sse2_unaligned and
   __memcpy_chk_sse2_unaligned.
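
For reference, here is a rough C sketch of how the three entry points
relate (illustration only, with made-up "my_" names; the real code is
x86-64 assembly).  Since memcpy and mempcpy differ only in the value
they leave in RAX, computing the return value on entry is what lets the
added entry points fall through into one shared unaligned SSE2 copy body:

  #include <stddef.h>
  #include <stdlib.h>

  /* memcpy: copy n bytes and return the destination pointer.  The real
     implementation uses unaligned SSE2 loads/stores; a byte loop stands
     in for that here.  */
  void *
  my_memcpy (void *dst, const void *src, size_t n)
  {
    char *d = dst;
    const char *s = src;
    while (n--)
      *d++ = *s++;
    return dst;
  }

  /* mempcpy (GNU extension): same copy, but return dst + n.  */
  void *
  my_mempcpy (void *dst, const void *src, size_t n)
  {
    my_memcpy (dst, src, n);
    return (char *) dst + n;
  }

  /* __memcpy_chk (_FORTIFY_SOURCE helper): fail if the copy would
     overflow the destination object of size dstlen, then copy as usual.
     The real helper calls __chk_fail; abort stands in for it here.  */
  void *
  my_memcpy_chk (void *dst, const void *src, size_t n, size_t dstlen)
  {
    if (n > dstlen)
      abort ();
    return my_memcpy (dst, src, n);
  }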
bench-mempcpy shows (one column per implementation; lower is better):
Ivy Bridge:
simple_mempcpy __mempcpy_avx_unaligned __mempcpy_ssse3_back __mempcpy_ssse3 __mempcpy_sse2_unaligned __mempcpy_sse2
Length 432, alignment 27/ 0: 1628.16 98.3906 73.5625 94.7344 67.1719 139.531
Length 432, alignment 0/27: 1627.84 148.891 80.625 98.8281 104.766 142.625
Length 432, alignment 27/27: 1631.03 90.5469 70.0312 69.5938 76.5469 123.969
Length 448, alignment 0/ 0: 1685.95 79.1875 65.1719 72.9062 70.1406 116.688
Length 448, alignment 28/ 0: 1685.84 89.4531 73.0156 99.5938 86.3594 138.203
Length 448, alignment 0/28: 1684.52 148.016 82.2812 94.8438 103.344 147.578
Length 448, alignment 28/28: 1684.42 86.4688 65.4062 70.0469 72.4688 123.422
Length 464, alignment 0/ 0: 1740.77 70.1406 66.2812 69.2656 71.25 118.234
Length 464, alignment 29/ 0: 1742.31 100.141 75.875 98.9219 83.375 145.594
Length 464, alignment 0/29: 1742.31 148.016 80.2969 107.766 102.031 154.531
Length 464, alignment 29/29: 1740.98 91.5469 64.8438 72.4531 71.4688 127.062
Length 480, alignment 0/ 0: 1967.2 76.875 66.625 71.1406 71.25 123.641
Length 480, alignment 30/ 0: 1799.02 94.3125 72.9062 103.797 80.2969 144.484
Length 480, alignment 0/30: 1797.47 148.453 82.6094 102.906 102.25 158.062
Length 480, alignment 30/30: 1799.02 90.8906 68.0469 69.5938 71.3594 124.844
Length 496, alignment 0/ 0: 1853.83 71.25 68.3906 71.9219 69.1406 123.422
Length 496, alignment 31/ 0: 1855.38 94.8438 74.3438 104.672 73.2344 148.125
Length 496, alignment 0/31: 1853.59 148.906 80.2969 109.297 114.703 163.016
Length 496, alignment 31/31: 1855.27 93.0781 71.4688 72.3438 84.2656 127.953
Length 4096, alignment 0/ 0: 14559.7 509.891 506.469 474.156 508.344 591.062
Nehalem:
simple_mempcpy __mempcpy_ssse3_back __mempcpy_ssse3 __mempcpy_sse2_unaligned __mempcpy_sse2
Length 432, alignment 27/ 0: 113.25 50.9531 64.0312 39.1406 77.6719
Length 432, alignment 0/27: 130.688 45.7969 63.9844 89.1562 133.078
Length 432, alignment 27/27: 118.266 34.4531 36 40.9688 70.7812
Length 448, alignment 0/ 0: 98.2969 34.3594 37.5 39.5156 56.2969
Length 448, alignment 28/ 0: 115.641 51.7969 64.6406 44.6719 77.2031
Length 448, alignment 0/28: 143.297 46.7812 64.9688 88.1719 137.25
Length 448, alignment 28/28: 118.453 34.4531 36.7969 40.0312 70.3125
Length 464, alignment 0/ 0: 101.156 36.0938 37.125 39.4688 63.6562
Length 464, alignment 29/ 0: 118.594 52.6875 69.1406 43.6875 79.9219
Length 464, alignment 0/29: 133.922 46.4062 71.0156 88.5 142.922
Length 464, alignment 29/29: 126.047 36.1406 39.375 39.3281 71.3438
Length 480, alignment 0/ 0: 104.203 36.1406 38.2969 39.2812 59.5312
Length 480, alignment 30/ 0: 120.375 53.2969 69.8438 47.25 80.5312
Length 480, alignment 0/30: 150 47.0625 69.9844 87.4219 148.125
Length 480, alignment 30/30: 126.375 37.9219 37.6875 39.2812 70.8281
Length 496, alignment 0/ 0: 107.016 37.5 39.0938 39.5156 67.2656
Length 496, alignment 31/ 0: 119.719 169.078 71.4375 45.6562 79.4531
Length 496, alignment 0/31: 139.641 47.25 71.2969 101.953 155.062
Length 496, alignment 31/31: 123.844 39.8438 40.6406 45.75 70.5469
Length 4096, alignment 0/ 0: 749.203 245.859 249.609 253.172 292.078
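
The columns above are the per-CPU variants that glibc selects among at
run time.  As background only (the actual x86-64 selector for mempcpy is
assembly in sysdeps/x86_64/multiarch/mempcpy.S, driven by CPU feature
checks, which this series updates along with mempcpy_chk.S and
memcpy_chk.S), here is a minimal sketch of the underlying IFUNC pattern
with hypothetical names:

  #include <stddef.h>

  /* Baseline byte-copy variant; returns dst + n like mempcpy.  */
  static void *
  my_mempcpy_generic (void *dst, const void *src, size_t n)
  {
    char *d = dst;
    const char *s = src;
    while (n--)
      *d++ = *s++;
    return d;
  }

  /* Stand-in for an optimized variant such as the SSE2 unaligned copy.  */
  static void *
  my_mempcpy_fast (void *dst, const void *src, size_t n)
  {
    return my_mempcpy_generic (dst, src, n);
  }

  /* Stand-in for a real CPU feature check.  */
  static int
  cpu_has_fast_unaligned_copy (void)
  {
    return 1;
  }

  typedef void *(*mempcpy_fn) (void *, const void *, size_t);

  /* The resolver runs once, when the dynamic linker resolves
     my_mempcpy, and returns the variant used for the rest of the
     process.  */
  static mempcpy_fn
  resolve_my_mempcpy (void)
  {
    return (cpu_has_fast_unaligned_copy ()
            ? my_mempcpy_fast : my_mempcpy_generic);
  }

  void *my_mempcpy (void *dst, const void *src, size_t n)
       __attribute__ ((ifunc ("resolve_my_mempcpy")));

The ifunc-impl-list.c change in this series additionally registers the
new __mempcpy_sse2_unaligned and _chk entry points so that the string
tests and benchmarks (such as the columns above) can enumerate them.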
H.J. Lu (7):
Remove dead code from memcpy-sse2-unaligned.S
Don't use RAX as scratch register
Remove L(overlapping) from memcpy-sse2-unaligned.S
Add entry points for __mempcpy_sse2_unaligned and _chk functions
Enable __mempcpy_sse2_unaligned
Enable __mempcpy_chk_sse2_unaligned
Enable __memcpy_chk_sse2_unaligned
sysdeps/x86_64/multiarch/ifunc-impl-list.c | 6 ++
sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S | 125 ++++++++---------------
sysdeps/x86_64/multiarch/memcpy_chk.S | 23 +++--
sysdeps/x86_64/multiarch/mempcpy.S | 19 ++--
sysdeps/x86_64/multiarch/mempcpy_chk.S | 19 ++--
5 files changed, 85 insertions(+), 107 deletions(-)
--
2.5.0