This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH RFC] Improve 64bit memcpy performance for Haswell CPU with AVX instruction


On Fri, Jul 11, 2014 at 09:20:58AM +0800, Ling Ma wrote:
> Yes, so I refined the code and sent the latest version according to
> your comments.
> 
> The new memmove code is below (also sent as a gzipped attachment):
> 
> +#ifdef USE_AS_MEMMOVE
> +L(gobble_mem_fwd_llc_start):
> +#endif
> +	mov	%rdx, %rcx
> +	rep	movsb
> +	ret
> +
> +	.p2align 4
> +L(gobble_big_data_fwd):
> +#ifdef USE_AS_MEMMOVE
> +	mov	%rsi, %r10
> +	sub	%rdi, %r10
> +	cmp	%rcx, %r10
> +	jb	L(gobble_mem_fwd_llc_start)
> 
> Ling: if the code gets here, rdx > rcx. But if the distance between rsi
> and rdi is smaller than rcx, dst and src must overlap. Because that
> distance fits within the LLC, reads of src help dst get LLC hits.
> So we jump back instead of using non-temporal stores.
> 
Do you have an application where this actually happens? You lose
performance every time it does not happen, and given how rare large
inputs are, I doubt this will pay for itself.
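
For reference, here is a rough C sketch of the check under discussion, as I read the quoted assembly (mov %rsi,%r10; sub %rdi,%r10; cmp %rcx,%r10; jb). The function name and the `threshold` parameter are hypothetical and not part of the patch; `threshold` stands in for whatever value %rcx holds at that point, which the quoted hunk does not show.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the patch.  Returns 1 when the
   unsigned distance (src - dst) is below the threshold, which is the
   case where the quoted assembly jumps back to the rep movsb path
   instead of continuing on the large-data path.  */
int
takes_rep_movsb_path (uintptr_t dst, uintptr_t src, size_t threshold)
{
  /* Same as mov %rsi,%r10; sub %rdi,%r10.  The subtraction is unsigned,
     so it wraps to a very large value when src is below dst.  */
  uintptr_t distance = src - dst;

  /* Same as cmp %rcx,%r10; jb: an unsigned "below" comparison.  */
  return distance < threshold;
}
```

With a threshold of 1000, a src that is 512 bytes above dst takes the jump, while a src that is 2000 bytes above dst, or anywhere below dst, does not. Every large copy pays for the subtract, compare and branch whether or not the jump is taken, which is the cost I am asking about.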

