This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor

From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
To: "Zhangxuelei (Derek)" <zhangxuelei4 at huawei dot com>, Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>, libc-alpha <libc-alpha at sourceware dot org>, yikunkero <yikunkero at gmail dot com>, jiangyikun <jiangyikun at huawei dot com>
Cc: nd <nd at arm dot com>
Date: Thu, 31 Oct 2019 14:04:08 +0000
Subject: Re: [PATCH 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor

Hi Derek,

>> There shouldn't be any performance difference between the two cases.

> With the help of our professors, the reason for the performance difference in the
> unaligned case has been found. The streaming write feature of the Kunpeng processor
> is not triggered in the unaligned case, which means the data has to be read first and
> then written, which is time-consuming.

So is there a way to force write streaming, for example by aligning the source rather
than the destination, or by using particular instructions?
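
To make the alignment choice concrete, here is a minimal C sketch of a copy
routine that aligns either the destination or the source to 16 bytes before
the bulk loop. The helper names (head_to_align16, copy_aligned) and the
16-byte granule are assumptions for illustration only. This is not the code
from the patch, and it says nothing about what actually triggers write
streaming on Kunpeng.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of leading bytes (0..15) to copy so that
   P becomes 16-byte aligned.  */
static size_t
head_to_align16 (const void *p)
{
  return (size_t) (-(uintptr_t) p & 15);
}

/* Hypothetical sketch: copy N bytes from SRC to DST.  If ALIGN_DST is
   nonzero the bulk loop runs with DST 16-byte aligned, otherwise with
   SRC 16-byte aligned.  The regions must not overlap.  */
static void *
copy_aligned (void *dst, const void *src, size_t n, int align_dst)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  size_t head = head_to_align16 (align_dst ? (const void *) d
                                           : (const void *) s);

  /* Copy the unaligned head, clamped to the total length.  */
  if (head > n)
    head = n;
  memcpy (d, s, head);
  d += head;
  s += head;
  n -= head;

  /* Bulk loop: one 16-byte block per iteration.  Only the pointer
     chosen above is guaranteed to be aligned here.  */
  while (n >= 16)
    {
      memcpy (d, s, 16);
      d += 16;
      s += 16;
      n -= 16;
    }

  /* Copy the remaining tail of fewer than 16 bytes.  */
  memcpy (d, s, n);
  return dst;
}
```

Calling copy_aligned (dst, src, n, 1) gives aligned stores and possibly
unaligned loads in the bulk loop; passing 0 gives the opposite. Timing both
on the same buffers is one way to test which alignment the hardware prefers.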

> The implementation in the v1 patch happens to avoid this problem, so it seems the
> better choice for the Kunpeng processor for now.

I don't believe it always helps - there is still a large factor between good and bad cases,
like these results from the v1 memcpy:

   length=1048578:    305883.00 (  0.00%)	   182002.00 ( 40.00%)	   120063.00 ( 60.00%)	   292063.00 (  4.00%)	   306638.00

Here the Falkor variant is 2.4 times faster...

Wilco

