This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: "Zhangxuelei (Derek)" <zhangxuelei4 at huawei dot com>, Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>, libc-alpha <libc-alpha at sourceware dot org>, yikunkero <yikunkero at gmail dot com>, jiangyikun <jiangyikun at huawei dot com>
- Cc: nd <nd at arm dot com>
- Date: Thu, 31 Oct 2019 14:04:08 +0000
- Subject: Re: [PATCH 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor
- References: <8DC571DDDE171B4094D3D33E9685917BD854D1@DGGEMI529-MBX.china.huawei.com>,<VI1PR0801MB2127BB63FE88F8DA87FD250E83610@VI1PR0801MB2127.eurprd08.prod.outlook.com>,<8DC571DDDE171B4094D3D33E9685917BD9F88D@DGGEMI529-MBX.china.huawei.com>
Hi Derek,
>> There shouldn't be any performance difference between the two cases.
> With the help of our professors, the reason for the performance difference in the
> unaligned case has been found. It is because the streaming write feature of the
> Kunpeng processor is not triggered in the unaligned case, which means the data has
> to be read first and then written, which is time-consuming.
So is there a way to force write streaming, for example by aligning the source rather
than the destination, or by using particular instructions?
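To make the alignment question concrete, here is a minimal C sketch of a copy that
aligns either pointer to 16 bytes before the bulk of the copy. It is only an
illustration, not the patch's code: the helper names head_to_align16 and
copy_align_choice, the 16-byte granule, and the align_src switch are all hypothetical,
and whether either choice triggers write streaming on Kunpeng is exactly what would
need measuring.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of leading bytes to copy so that p
   becomes 16-byte aligned (0 if p is already aligned).  */
static size_t head_to_align16(const void *p)
{
    return (size_t)(-(uintptr_t)p & 15);
}

/* Illustrative only: copy a short head first, so that the bulk copy
   starts with either src (align_src != 0) or dst (align_src == 0)
   16-byte aligned.  A real implementation would use aligned
   loads/stores for the bulk; memcpy stands in for that here.  */
static void *copy_align_choice(void *dst, const void *src, size_t n,
                               int align_src)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t head = head_to_align16(align_src ? (const void *)s
                                            : (const void *)d);

    if (head > n)
        head = n;                          /* short copy: head covers it all */
    memcpy(d, s, head);                    /* unaligned head */
    memcpy(d + head, s + head, n - head);  /* bulk, one side now aligned */
    return dst;
}
```

With align_src set to 0 the stores in the bulk part are aligned and the loads may not
be; with it set to 1 it is the other way round.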
> And the implementation of the v1 patch happens to avoid this problem, which seems a
> better choice for the Kunpeng processor for now.
I don't believe it always helps - there is still a large factor between good and bad cases,
like these results from the v1 memcpy:
length=1048578: 305883.00 ( 0.00%) 182002.00 ( 40.00%) 120063.00 ( 60.00%) 292063.00 ( 4.00%) 306638.00
Here the Falkor variant is 2.4 times faster...
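As a quick check of the arithmetic, the ratios can be recomputed from the timings in
the result line. The column labels are not shown in the line, so the sketch below
assumes (and this is only an assumption) that the third column is the Falkor variant
and the fourth is the v1 memcpy; the names timings and speedup are made up for the
example.

```c
#include <stdio.h>

/* Timings copied from the length=1048578 result line, in the order shown.
   Which memcpy variant each column belongs to is not stated in the line.  */
static const double timings[5] = {
    305883.0, 182002.0, 120063.0, 292063.0, 306638.0
};

/* How many times faster the run taking `fast` is than the run taking `slow`.  */
static double speedup(double slow, double fast)
{
    return slow / fast;
}

/* Print the ratio of every column against the fastest one (column 3).  */
static void print_ratios(void)
{
    for (int i = 0; i < 5; i++)
        printf("column %d vs column 3: %.2fx\n",
               i + 1, speedup(timings[i], timings[2]));
}
```

Under that assumption, speedup(timings[3], timings[2]) is roughly 2.43, which is
consistent with the factor quoted.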
Wilco