This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH RFC] Improve 64bit memcpy performance for Haswell CPU with AVX instruction


Yes, so I refined the code and sent the latest version according to
your comments.

The new memmove code is shown below and is also attached as a gzipped
archive:

+#ifdef USE_AS_MEMMOVE
+L(gobble_mem_fwd_llc_start):
+#endif
+	mov	%rdx, %rcx
+	rep	movsb
+	ret
+
+	.p2align 4
+L(gobble_big_data_fwd):
+#ifdef USE_AS_MEMMOVE
+	mov	%rsi, %r10
+	sub	%rdi, %r10
+	cmp	%rcx, %r10
+	jb	L(gobble_mem_fwd_llc_start)

Ling: If execution reaches this point, rdx > rcx. If the distance
between rsi and rdi is also smaller than rcx, then dst and src must
overlap. Because that distance fits within the LLC, the lines loaded
for src help dst hit in the LLC. So we jump back instead of using
non-temporal stores.

+#endif
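
To make the check above easier to follow, here is a rough C model of
the three instructions before the jb. It is only a sketch: the names
choose_fwd_path, llc_threshold, PATH_LLC_REP_MOVSB and
PATH_NON_TEMPORAL are hypothetical and do not appear in the patch, and
llc_threshold simply stands for whatever value rcx holds at that point.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the two outcomes of the check. */
enum copy_path {
    PATH_LLC_REP_MOVSB,  /* jump back to L(gobble_mem_fwd_llc_start) */
    PATH_NON_TEMPORAL    /* fall through to the non-temporal store path */
};

/* Models: mov %rsi,%r10 ; sub %rdi,%r10 ; cmp %rcx,%r10 ; jb ...
   llc_threshold is an assumed stand-in for the value in rcx. */
static enum copy_path
choose_fwd_path(uintptr_t dst, uintptr_t src, size_t llc_threshold)
{
    /* Unsigned subtraction: if src is below dst the result wraps to a
       very large value, so the "below" branch is not taken. */
    uintptr_t distance = src - dst;

    /* jb is an unsigned "below" comparison. */
    if (distance < llc_threshold)
        return PATH_LLC_REP_MOVSB;

    return PATH_NON_TEMPORAL;
}
```

For example, with a 4 MiB threshold, a src that sits 256 bytes above
dst takes the rep movsb path, while a src that sits 8 MiB above dst
falls through to the non-temporal path.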


2014-07-10 21:36 GMT+08:00, Ondřej Bílka <neleai@seznam.cz>:
> On Mon, Jul 07, 2014 at 10:04:27AM +0800, Ling Ma wrote:
>> Any comments ?
>>
> did you see my previous mail?
>

Attachment: memcpy-avx-unaligned.patch.tar.gz
Description: GNU Zip compressed data

