This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v2] aarch64: Optimized implementation of strcpy
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: Xuelei Zhang <zhangxuelei4 at huawei dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, "siddhesh at gotplt dot org" <siddhesh at gotplt dot org>, Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>, "jiangyikun at huawei dot com" <jiangyikun at huawei dot com>, "yikunkero at gmail dot com" <yikunkero at gmail dot com>
- Cc: nd <nd at arm dot com>
- Date: Tue, 22 Oct 2019 17:54:08 +0000
- Subject: Re: [PATCH v2] aarch64: Optimized implementation of strcpy
- References: <20191022093930.10588-1-zhangxuelei4@huawei.com>
Hi Xuelei,
> Optimize the strcpy implementation by using vector loads and operations
> in main loop. Compared to aarch64/strcpy.S, it reduces latency of cases
> in bench-strlen by 5%~18% when the length of src is greater than 64
> bytes, with gains throughout the benchmark.
This is OK. I tried it on a few microarchitectures, and on long strings it is
either as fast as the existing strcpy or faster.
Wilco