This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
RE: [PING] Re: String Functions for x86-64
- From: "Evandro Menezes" <evandro dot menezes at amd dot com>
- To: "'H. J. Lu'" <hjl at lucon dot org>
- Cc: "Rene Rebe" <rene at exactcode dot de>, libc-alpha at sourceware dot org, "Ulrich Drepper" <drepper at redhat dot com>, "Meissner, Michael" <michael dot meissner at amd dot com>
- Date: Fri, 28 Jul 2006 17:08:09 -0500
- Subject: RE: [PING] Re: String Functions for x86-64
HJ,
> When around 2MB, slow down is more than 100%.
On which system?
I get these results on a 3.2 GHz P4 with DDR2 PC2-3200, first with the
current memcpy:
Bytes/Cycle
  Bytes:  AA      UU      AU      UA
8388608:  0.889   0.888   0.888   0.888
6291456:  0.884   0.88    0.88    0.881
4194304:  0.866   0.862   0.863   0.863
3145728:  0.853   0.849   0.848   0.85
2097152:  0.844   0.841   0.838   0.839
1572864:  0.833   0.828   0.827   0.827
1048576:  0.812   0.807   0.803   0.806
 786432:  0.81    0.803   0.801   0.805
 524288:  1.37    1.3     1.3     1.31
 393216:  1.53    1.43    1.43    1.46
 262144:  2.02    1.83    1.82    1.85
 196608:  2.71    2.2     2.23    2.25
 131072:  2.72    2.2     2.21    2.26
  98304:  2.77    2.3     2.29    2.23
  65536:  2.79    2.33    2.33    2.24
  49152:  2.77    2.35    2.36    2.23
  32768:  2.78    2.33    2.34    2.25
  24576:  2.74    2.33    2.34    2.25
  16384:  2.89    2.23    2.19    2.32
  12288:  2.88    2.15    2.06    2.3
   8192:  2.89    2.1     2.01    2.36
   6144:  2.88    2.1     2.11    2.3
   4096:  2.87    2.02    2       2.27
   3072:  2.81    1.99    1.95    2.41
   2048:  2.89    1.93    1.88    1.17
   1536:  2.95    1.8     1.68    2.27
   1024:  2.64    1.54    1.06    1.11
    768:  2.56    1.4     1.41    1.98
    512:  2.33    1.27    0.805   1.16
    384:  2.34    1.19    1.2     1.39
    256:  1.88    0.877   0.571   0.985
    192:  1.5     0.686   0.696   1.23
    128:  0.604   0.4     0.368   0.762
     96:  0.828   0.444   0.444   1.09
     64:  0.64    0.25    0.239   0.552
     48:  0.273   0.279   0.279   0.273
     32:  0.145   0.143   0.14    0.143
     24:  0.14    0.143   0.143   0.143
     16:  0.0784  0.0851  0.0833  0.0833
     12:  0.0769  0.0769  0.0769  0.0769
      8:  0.0476  0.0455  0.0455  0.0488
      6:  0.0395  0.0395  0.0395  0.0395
      4:  0.0208  0.0238  0.0222  0.0238
      3:  0.0469  0.0469  0.0469  0.05
      2:  0.0227  0.0263  0.0263  0.0333
      1:  0.0156  0.0156  0.0156  0.0156
Now, with the new memcpy:
Bytes/Cycle
  Bytes:  AA      UU      AU      UA
8388608:  1.15    1.19    1.2     1.21
6291456:  1.17    1.19    1.21    1.22
4194304:  1.17    1.21    1.22    1.23
3145728:  1.18    1.22    1.22    1.24
2097152:  1.23    1.23    1.23    1.25
1572864:  1.24    1.23    1.23    1.27
1048576:  1.29    1.27    1.27    1.31
 786432:  1.37    1.36    1.31    1.38
 524288:  1.56    1.5     1.4     1.53
 393216:  1.84    1.7     1.53    1.74
 262144:  2.75    2.09    1.75    2.29
 196608:  2.77    2.12    1.74    2.29
 131072:  2.83    2.07    1.72    2.24
  98304:  2.94    2.11    1.69    2.2
  65536:  3.04    2.04    1.71    2.14
  49152:  3.19    2.01    1.75    2.07
  32768:  3.41    1.93    1.79    1.95
  24576:  3.78    1.81    1.77    1.84
  16384:  5.15    1.77    1.79    1.65
  12288:  5.18    1.81    1.73    1.65
   8192:  4.92    1.85    1.77    1.77
   6144:  4.56    1.83    1.81    1.6
   4096:  4.72    1.8     1.91    1.82
   3072:  4.09    1.77    1.87    1.54
   2048:  3.58    1.77    1.78    0.9
   1536:  3.66    1.65    1.71    1.42
   1024:  2.94    2.17    1.12    1.22
    768:  3.05    2.16    2.16    2.26
    512:  2.91    1.88    1.12    1.2
    384:  3.43    1.81    2.09    2
    256:  1.28    0.914   0.914   1.14
    192:  3.43    2.29    2.18    2.18
    128:  1.03    0.914   0.727   0.941
     96:  2.4     1.33    1.5     1.71
     64:  0.941   0.571   0.667   0.727
     48:  1.33    0.857   0.857   0.706
     32:  0.421   0.32    0.381   0.615
     24:  0.75    0.6     0.6     0.462
     16:  0.222   0.267   0.333   0.308
     12:  0.429   0.333   0.3     0.231
      8:  0.2     0.182   0.182   0.167
      6:  0.214   0.167   0.167   0.15
      4:  0.0667  0.0909  0.0909  0.0833
      3:  0.0938  0.0833  0.0833  0.0682
      2:  0.0333  0.0556  0.05    0.05
      1:  0.0357  0.0278  0.0278  0.0208
Here, A means that the operand in question (source or destination) is
8-byte aligned, and U is the average over the cases where it is misaligned
by 1 to 7 bytes.
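
As a quick check on the 2 MB case, the ratio of new to current throughput can
be computed from the 2097152-byte rows of the two tables above. This is only a
small Python sketch; the names COLUMNS, current_memcpy, new_memcpy and speedup
are made up for illustration and are not part of any benchmark tool:

```python
# Bytes/cycle for the 2097152-byte row, copied from the two tables above.
COLUMNS = ("AA", "UU", "AU", "UA")
current_memcpy = (0.844, 0.841, 0.838, 0.839)
new_memcpy = (1.23, 1.23, 1.23, 1.25)


def speedup(old, new):
    """Return the new/old throughput ratio; above 1 means the new memcpy is faster."""
    return new / old


# One ratio per alignment column.
ratios = {
    col: speedup(old, new)
    for col, old, new in zip(COLUMNS, current_memcpy, new_memcpy)
}

for col, ratio in ratios.items():
    print(f"{col}: {ratio:.2f}x")
```

For these numbers the ratios come out between roughly 1.46 and 1.49, so on
this P4 the new memcpy is faster at 2 MB, not slower.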
HTH
_______________________________________________________
Evandro Menezes GNU Tools Team
512-602-9940 Advanced Micro Devices
evandro.menezes@amd.com Austin, TX