This is the mail archive of the libc-alpha at sourceware dot org
mailing list for the glibc project.
Re: [PATCH v2 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: "Zhangxuelei (Derek)" <zhangxuelei4 at huawei dot com>, Yikun Jiang <yikunkero at gmail dot com>
- Cc: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>, Siddhesh Poyarekar <siddhesh at gotplt dot org>, jiangyikun <jiangyikun at huawei dot com>, Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>
- Date: Tue, 29 Oct 2019 14:34:05 +0000
- Subject: Re: [PATCH v2 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor
- References: <8DC571DDDE171B4094D3D33E9685917BD87078@DGGEMI529-MBX.china.huawei.com>
>> Note that memchr_strlen significantly outperforms the fastest strlen
>> on sizes larger than 256, so I don't think that using uminv to test
>> for zeroes is the fastest approach.
> Indeed, but memchr_strlen really has poor performance before 256 bytes,
Well, that means memchr can be sped up for small sizes. While it is more
complex than strlen, it shouldn't be significantly slower.
> and if we mix this method into the current version, we may need to keep a length
> count and check in each loop iteration whether it exceeds 256 bytes; is that cheap?
That may be possible, e.g. by unrolling the first 64-128 bytes and using a loop
optimized for throughput for anything larger (on the assumption that if a
string is larger than 128, it is likely much larger).
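
A rough C sketch of that two-phase structure, for illustration only. The names
sketch_strlen, has_zero_byte and SMALL_LIMIT are hypothetical, the 128-byte
cutoff is just one value from the range above, and the plain byte loop merely
stands in for hand-unrolled code. Like typical assembly implementations, the
second phase assumes that reading a whole aligned 8-byte word containing the
terminator is acceptable, which is not strictly portable C:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SMALL_LIMIT 128  /* assumed cutoff between the two phases */

/* Nonzero iff at least one of the 8 bytes in w is zero. */
static int has_zero_byte(uint64_t w)
{
    return ((w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL) != 0;
}

static size_t sketch_strlen(const char *s)
{
    /* Phase 1: short strings, checked without any length bookkeeping
       inside the large-string loop. */
    for (size_t i = 0; i < SMALL_LIMIT; i++)
        if (s[i] == '\0')
            return i;

    /* Phase 2: the string is at least SMALL_LIMIT bytes long, so assume
       it is probably much longer and favour throughput. */
    const char *p = s + SMALL_LIMIT;

    /* Advance to an 8-byte boundary so word loads never cross a page. */
    while (((uintptr_t)p & 7) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Scan one aligned word per iteration until a word contains a NUL. */
    for (;;) {
        uint64_t w;
        memcpy(&w, p, sizeof w);
        if (has_zero_byte(w))
            break;
        p += sizeof w;
    }

    /* Locate the exact terminator inside the final word. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

The point of the split is that the size check happens once, at the transition,
rather than in every iteration of the large-string loop.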
However, my point was that while the uminv sequence is simple and small, it's not
the fastest, so ultimately we need to find an alternative sequence which works
better for all the generic string functions which search for a character (strlen, strnlen,
memchr, memrchr, rawmemchr, strchr, strnchr, strchrnul, strcpy, strncpy).
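
For readers following along, a scalar C model of what the uminv test computes.
This is only a sketch with hypothetical helper names; the real sequence works
on a 16-byte vector register rather than a byte loop:

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of an unsigned minimum across a 16-byte vector:
   returns the smallest of the 16 bytes. */
static uint8_t min_across_16(const uint8_t *chunk)
{
    uint8_t m = chunk[0];
    for (size_t i = 1; i < 16; i++)
        if (chunk[i] < m)
            m = chunk[i];
    return m;
}

/* The chunk contains a NUL byte iff its minimum byte is zero. */
static int chunk_has_nul(const uint8_t *chunk)
{
    return min_across_16(chunk) == 0;
}
```

Note that the result only says whether a NUL is present in the chunk, not
where it is, so locating the byte needs further work once the test succeeds.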
> And we think small size is more important for strlen.
Absolutely, handling small cases quickly is essential for all string functions.