This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



[PATCH 0/7] [BZ #19776] Improve x86-64 memcpy-sse2-unaligned.S


This set of patches improves x86-64 memcpy-sse2-unaligned.S by:

1. Removing dead code.
2. Setting RAX to the return value on entry, so RAX is no longer used
   as a scratch register.
3. Removing unnecessary code (the L(overlapping) path).
4. Adding entry points for __mempcpy_chk_sse2_unaligned,
   __mempcpy_sse2_unaligned and __memcpy_chk_sse2_unaligned.
5. Enabling __mempcpy_chk_sse2_unaligned, __mempcpy_sse2_unaligned and
   __memcpy_chk_sse2_unaligned.
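As a rough illustration of items 2, 4 and 5, the C sketch below shows how several entry points can share one copy body when each entry computes its own return value first: memcpy returns the destination, mempcpy returns the destination plus the length, and the _chk variants first compare the length with the size of the destination object. This is presumably why RAX can no longer serve as a scratch register once it holds the return value from entry. All sketch_* names and copy_body are hypothetical; the real code is assembly with the return value in RAX, and glibc's checking variants call __chk_fail rather than abort ().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared copy loop.  It must not touch
   the return value that the entry point has already computed (RAX in
   the assembly version).  */
static void
copy_body (char *dst, const char *src, size_t len)
{
  for (size_t i = 0; i < len; i++)
    dst[i] = src[i];
}

/* memcpy entry: the return value is DST, fixed before copying.  */
void *
sketch_memcpy (void *dst, const void *src, size_t len)
{
  void *ret = dst;
  copy_body (dst, src, len);
  return ret;
}

/* mempcpy entry: same body, but returns DST + LEN.  */
void *
sketch_mempcpy (void *dst, const void *src, size_t len)
{
  void *ret = (char *) dst + len;
  copy_body (dst, src, len);
  return ret;
}

/* _chk entries: DSTLEN is the size of the destination object.
   glibc calls __chk_fail on overflow; abort () is a stand-in.  */
void *
sketch_memcpy_chk (void *dst, const void *src, size_t len, size_t dstlen)
{
  if (dstlen < len)
    abort ();
  return sketch_memcpy (dst, src, len);
}

void *
sketch_mempcpy_chk (void *dst, const void *src, size_t len, size_t dstlen)
{
  if (dstlen < len)
    abort ();
  return sketch_mempcpy (dst, src, len);
}
```

For example, sketch_mempcpy (dst, src, 4) returns dst + 4, which lets a caller append the next piece without recomputing the end pointer.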

bench-mempcpy shows (lower is better):

Ivy Bridge:

                               simple_mempcpy  __mempcpy_avx_unaligned  __mempcpy_ssse3_back  __mempcpy_ssse3  __mempcpy_sse2_unaligned  __mempcpy_sse2
Length  432, alignment 27/ 0:         1628.16                  98.3906               73.5625          94.7344                   67.1719         139.531
Length  432, alignment  0/27:         1627.84                  148.891                80.625          98.8281                   104.766         142.625
Length  432, alignment 27/27:         1631.03                  90.5469               70.0312          69.5938                   76.5469         123.969
Length  448, alignment  0/ 0:         1685.95                  79.1875               65.1719          72.9062                   70.1406         116.688
Length  448, alignment 28/ 0:         1685.84                  89.4531               73.0156          99.5938                   86.3594         138.203
Length  448, alignment  0/28:         1684.52                  148.016               82.2812          94.8438                   103.344         147.578
Length  448, alignment 28/28:         1684.42                  86.4688               65.4062          70.0469                   72.4688         123.422
Length  464, alignment  0/ 0:         1740.77                  70.1406               66.2812          69.2656                     71.25         118.234
Length  464, alignment 29/ 0:         1742.31                  100.141                75.875          98.9219                    83.375         145.594
Length  464, alignment  0/29:         1742.31                  148.016               80.2969          107.766                   102.031         154.531
Length  464, alignment 29/29:         1740.98                  91.5469               64.8438          72.4531                   71.4688         127.062
Length  480, alignment  0/ 0:          1967.2                   76.875                66.625          71.1406                     71.25         123.641
Length  480, alignment 30/ 0:         1799.02                  94.3125               72.9062          103.797                   80.2969         144.484
Length  480, alignment  0/30:         1797.47                  148.453               82.6094          102.906                    102.25         158.062
Length  480, alignment 30/30:         1799.02                  90.8906               68.0469          69.5938                   71.3594         124.844
Length  496, alignment  0/ 0:         1853.83                    71.25               68.3906          71.9219                   69.1406         123.422
Length  496, alignment 31/ 0:         1855.38                  94.8438               74.3438          104.672                   73.2344         148.125
Length  496, alignment  0/31:         1853.59                  148.906               80.2969          109.297                   114.703         163.016
Length  496, alignment 31/31:         1855.27                  93.0781               71.4688          72.3438                   84.2656         127.953
Length 4096, alignment  0/ 0:         14559.7                  509.891               506.469          474.156                   508.344         591.062

Nehalem:

                               simple_mempcpy  __mempcpy_ssse3_back  __mempcpy_ssse3  __mempcpy_sse2_unaligned  __mempcpy_sse2
Length  432, alignment 27/ 0:          113.25               50.9531          64.0312                   39.1406         77.6719
Length  432, alignment  0/27:         130.688               45.7969          63.9844                   89.1562         133.078
Length  432, alignment 27/27:         118.266               34.4531               36                   40.9688         70.7812
Length  448, alignment  0/ 0:         98.2969               34.3594             37.5                   39.5156         56.2969
Length  448, alignment 28/ 0:         115.641               51.7969          64.6406                   44.6719         77.2031
Length  448, alignment  0/28:         143.297               46.7812          64.9688                   88.1719          137.25
Length  448, alignment 28/28:         118.453               34.4531          36.7969                   40.0312         70.3125
Length  464, alignment  0/ 0:         101.156               36.0938           37.125                   39.4688         63.6562
Length  464, alignment 29/ 0:         118.594               52.6875          69.1406                   43.6875         79.9219
Length  464, alignment  0/29:         133.922               46.4062          71.0156                      88.5         142.922
Length  464, alignment 29/29:         126.047               36.1406           39.375                   39.3281         71.3438
Length  480, alignment  0/ 0:         104.203               36.1406          38.2969                   39.2812         59.5312
Length  480, alignment 30/ 0:         120.375               53.2969          69.8438                     47.25         80.5312
Length  480, alignment  0/30:             150               47.0625          69.9844                   87.4219         148.125
Length  480, alignment 30/30:         126.375               37.9219          37.6875                   39.2812         70.8281
Length  496, alignment  0/ 0:         107.016                  37.5          39.0938                   39.5156         67.2656
Length  496, alignment 31/ 0:         119.719               169.078          71.4375                   45.6562         79.4531
Length  496, alignment  0/31:         139.641                 47.25          71.2969                   101.953         155.062
Length  496, alignment 31/31:         123.844               39.8438          40.6406                     45.75         70.5469
Length 4096, alignment  0/ 0:         749.203               245.859          249.609                   253.172         292.078
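The tables are easier to compare as ratios. Because bench-mempcpy reports times, dividing the __mempcpy_sse2 column by the __mempcpy_sse2_unaligned column gives how many times faster the unaligned variant was on a given row. The small C program below does that for three rows copied verbatim from the tables; the struct and function names are only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Three rows copied from the bench-mempcpy tables; only the
   __mempcpy_sse2 and __mempcpy_sse2_unaligned columns are kept.  */
struct bench_row
{
  const char *cpu;
  const char *row;
  double sse2;           /* __mempcpy_sse2 */
  double sse2_unaligned; /* __mempcpy_sse2_unaligned */
};

static const struct bench_row rows[] = {
  { "Ivy Bridge", "Length 4096, alignment  0/ 0", 591.062, 508.344 },
  { "Nehalem", "Length 4096, alignment  0/ 0", 292.078, 253.172 },
  { "Nehalem", "Length  496, alignment  0/31", 155.062, 101.953 },
};

/* Both columns are times, so a ratio above 1 means the unaligned
   variant was faster by that factor.  */
double
speedup (const struct bench_row *r)
{
  return r->sse2 / r->sse2_unaligned;
}

void
print_speedups (void)
{
  for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
    printf ("%-10s %s: %.2fx\n", rows[i].cpu, rows[i].row,
            speedup (&rows[i]));
}
```

print_speedups () reports about 1.16x and 1.15x for the 4096-byte rows on Ivy Bridge and Nehalem, and about 1.52x for the Nehalem 496-byte 0/31 row. The same arithmetic applied to other columns shows that __mempcpy_sse2_unaligned is not the fastest variant on every row; on the Nehalem 0/N rows, for instance, __mempcpy_ssse3_back takes less time.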

H.J. Lu (7):
  Remove dead code from memcpy-sse2-unaligned.S
  Don't use RAX as scratch register
  Remove L(overlapping) from memcpy-sse2-unaligned.S
  Add entry points for __mempcpy_sse2_unaligned and _chk functions
  Enable __mempcpy_sse2_unaligned
  Enable __mempcpy_chk_sse2_unaligned
  Enable __memcpy_chk_sse2_unaligned
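The last three patches let the mempcpy, __mempcpy_chk and __memcpy_chk selectors (which appear to live in mempcpy.S, mempcpy_chk.S and memcpy_chk.S, per the diffstat) return the new sse2_unaligned entry points. glibc selects among variants with IFUNC (GNU indirect functions): a resolver runs when the dynamic linker binds the symbol and returns the address of the implementation to use. The portable C sketch below imitates that with an ordinary function pointer; cpu_caps, prefer_unaligned and both mempcpy_* variants are hypothetical, and the actual CPU-feature condition used by these patches is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef void *(*mempcpy_fn) (void *, const void *, size_t);

/* Placeholder baseline variant: a plain byte loop.  */
static void *
mempcpy_baseline (void *dst, const void *src, size_t len)
{
  char *d = dst;
  const char *s = src;
  while (len--)
    *d++ = *s++;
  return d;
}

/* Placeholder "unaligned" variant; here it simply defers to memcpy.  */
static void *
mempcpy_unaligned (void *dst, const void *src, size_t len)
{
  return (char *) memcpy (dst, src, len) + len;
}

/* Hypothetical CPU description; glibc's resolvers consult its
   internal CPU feature data instead.  */
struct cpu_caps
{
  bool prefer_unaligned;
};

/* Pick one implementation up front, as an IFUNC resolver does;
   callers then go through the returned pointer.  */
mempcpy_fn
select_mempcpy (const struct cpu_caps *caps)
{
  return caps->prefer_unaligned ? mempcpy_unaligned : mempcpy_baseline;
}
```

The design point is that the feature check happens once, not on every call, so each call pays only for an indirect jump.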

 sysdeps/x86_64/multiarch/ifunc-impl-list.c       |   6 ++
 sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S | 125 ++++++++---------------
 sysdeps/x86_64/multiarch/memcpy_chk.S            |  23 +++--
 sysdeps/x86_64/multiarch/mempcpy.S               |  19 ++--
 sysdeps/x86_64/multiarch/mempcpy_chk.S           |  19 ++--
 5 files changed, 85 insertions(+), 107 deletions(-)
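The diffstat touches sysdeps/x86_64/multiarch/ifunc-impl-list.c, which is where glibc lists every implementation of a string function so that its string tests and benchmarks (bench-mempcpy included) can run each variant, not just the one the resolver would pick; the 6 added lines presumably register the new entry points. The C sketch below shows the idea with a hypothetical table and checker; impl_entry, impls and count_failing_impls are illustrative names, not glibc's IFUNC_IMPL macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef void *(*mempcpy_fn) (void *, const void *, size_t);

/* Two placeholder implementations standing in for the real variants.  */
static void *
impl_bytewise (void *dst, const void *src, size_t len)
{
  char *d = dst;
  const char *s = src;
  while (len--)
    *d++ = *s++;
  return d;
}

static void *
impl_via_memcpy (void *dst, const void *src, size_t len)
{
  return (char *) memcpy (dst, src, len) + len;
}

/* Hypothetical list entry: a name, whether the running CPU could use
   the variant, and the function itself.  */
struct impl_entry
{
  const char *name;
  bool usable;
  mempcpy_fn fn;
};

static const struct impl_entry impls[] = {
  { "bytewise", true, impl_bytewise },
  { "via_memcpy", true, impl_via_memcpy },
};

/* Run every usable implementation over lengths 0..sizeof src and
   count those that copy the wrong bytes or return something other
   than DST + LEN.  */
int
count_failing_impls (void)
{
  static const char src[] = "0123456789abcdef";
  int failures = 0;

  for (size_t i = 0; i < sizeof impls / sizeof impls[0]; i++)
    {
      if (!impls[i].usable)
        continue;
      for (size_t len = 0; len <= sizeof src; len++)
        {
          char dst[sizeof src] = { 0 };
          char *ret = impls[i].fn (dst, src, len);
          if (ret != dst + len || memcmp (dst, src, len) != 0)
            {
              failures++;
              break;
            }
        }
    }
  return failures;
}
```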

-- 
2.5.0

