This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor

From: "Zhangxuelei (Derek)" <zhangxuelei4 at huawei dot com>
To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>, libc-alpha <libc-alpha at sourceware dot org>, yikunkero <yikunkero at gmail dot com>, jiangyikun <jiangyikun at huawei dot com>
Cc: nd <nd at arm dot com>
Date: Tue, 19 Nov 2019 13:34:18 +0000
Subject: Re: [PATCH 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor

Hi Wilco,

> So is there a way to force write streaming, for example by aligning
> the source rather than the destination or use particular instructions?

The different walk directions of dst result in different alignment offsets, which in turn caused the performance gap between memcpy-walk and memmove-walk.

In the new patch, we align dst rather than src to solve this problem. Both memcpy-walk and memmove-walk now perform as well as before, and the dst_unaligned code has been removed.
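For readers less familiar with the idea, here is a minimal C sketch of a forward copy that aligns the destination rather than the source before the bulk loop. Every store in the loop is then aligned, and only the loads may be unaligned. This is an illustration only, not the Kunpeng assembly from the patch. The function name copy_fwd_dst_aligned and the 16-byte ALIGN value are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration: ALIGN is an assumed block size, not a value
   taken from the Kunpeng patch.  */
#define ALIGN 16

/* Forward copy for non-overlapping buffers that aligns dst (not src).  */
void *copy_fwd_dst_aligned (void *dstin, const void *srcin, size_t n)
{
  unsigned char *dst = dstin;
  const unsigned char *src = srcin;

  /* Number of head bytes needed to bring dst up to an ALIGN boundary
     (0 if dst is already aligned).  */
  size_t head = (ALIGN - ((uintptr_t) dst & (ALIGN - 1))) & (ALIGN - 1);
  if (head > n)
    head = n;
  for (size_t i = 0; i < head; i++)
    *dst++ = *src++;
  n -= head;

  /* Bulk loop: dst is ALIGN-aligned here, src may not be.  */
  while (n >= ALIGN)
    {
      memcpy (dst, src, ALIGN);
      dst += ALIGN;
      src += ALIGN;
      n -= ALIGN;
    }

  /* Copy the remaining tail bytes.  */
  while (n--)
    *dst++ = *src++;
  return dstin;
}
```

Aligning dst means the head length depends only on the destination address. The source misalignment seen by the loop is then fixed by the relative offset between the two buffers.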

> In order to select the right memmove implementation, multiarch/
> memmove.c needs similar changes as multiarch/memcpy.c.
>
> Also since the memmove entry sequence does both check for medium
> and large cases, the full overlap check should be done in both.

As you suggested, the full overlap check is now done in the large cases, and multiarch/memmove.c has also been added to the new patch.
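A common way to express a full overlap check is a single unsigned comparison, shown here as a C sketch. If dst lies inside [src, src + n), then (uintptr_t) dst - (uintptr_t) src is smaller than n. A forward copy would then clobber source bytes that have not been read yet, so the copy must walk backwards. When dst is below src, the subtraction wraps to a large value and the forward path is taken, which is safe. The name move_sketch is hypothetical, and the byte loops stand in for the real medium/large copy paths.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memmove-style sketch showing only the overlap decision.  */
void *move_sketch (void *dstin, const void *srcin, size_t n)
{
  unsigned char *dst = dstin;
  const unsigned char *src = srcin;

  if ((uintptr_t) dst - (uintptr_t) src < n)
    {
      /* dst starts inside the source range: copy backwards so each
         source byte is read before it can be overwritten.  */
      while (n--)
        dst[n] = src[n];
    }
  else
    {
      /* No harmful overlap (dst below src, or disjoint): copy forwards.  */
      for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    }
  return dstin;
}
```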

In addition, are there any reviews of the latest memset_kunpeng patch below?
https://sourceware.org/ml/libc-alpha/2019-11/msg00044.html

Cheers,
Xuelei