This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor
- From: Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>
- To: Xuelei Zhang <zhangxuelei4 at huawei dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Cc: nd <nd at arm dot com>
- Date: Tue, 15 Oct 2019 12:04:43 +0000
- Subject: Re: [PATCH 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor
- References: <20191014034456.11548-1-zhangxuelei4@huawei.com>
On 14/10/2019 04:44, Xuelei Zhang wrote:
> This is an optimized implementation of memcpy and memmove for the
> Huawei Kunpeng processor.
>
> Based on the prefetch mechanism of the Kunpeng architecture, the
> branch handling 96 bytes to 2K in memcpy is written without the prfm
> instruction. As a result, memcpy improves for copies above 128 bytes:
> about 18% for copies above 2K bytes, and about 38% for much larger
> copies, e.g. around 32M bytes.
>
> And for memmove, there are two main changes: i) Q registers are used
> instead of X registers. ii) the dst address is aligned instead of the
> src address, to improve store performance. Hence, the memmove
> implementation also improves above 128 bytes: about 30% for 2K to 8M
> bytes, and about 50% for 32M bytes or more.
i'd like to work on a generic memcpy that's acceptable
instead of minor variations of memcpy per uarch, i'll
have to take a look at why this one is different from
all the others.
it would be nice to see the memcpy-random benchmarks too.
stopping prefetch at 2k is surprising.