This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH v2] aarch64: Optimized implementation of memcmp
- From: "Zhangxuelei (Derek)" <zhangxuelei4 at huawei dot com>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, "siddhesh at gotplt dot org" <siddhesh at gotplt dot org>, Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>, jiangyikun <jiangyikun at huawei dot com>, "yikunkero at gmail dot com" <yikunkero at gmail dot com>
- Cc: nd <nd at arm dot com>
- Date: Wed, 23 Oct 2019 14:31:07 +0000
- Subject: Re: [PATCH v2] aarch64: Optimized implementation of memcmp
Hi Wilco,
> It seems there are some regressions in the 8-16 byte range,
> presumably due to handling these sizes differently.
Yes. We now check the size against 16 bytes rather than 8 bytes at the beginning of the function, so sizes in the 8-16 byte range go through one extra compare and branch. This costs a little on small sizes but benefits medium and large sizes more.
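To illustrate the ordering of the size checks, here is a minimal C sketch. It is not the patch itself (which is AArch64 assembly); the names memcmp_sketch, load_be64 and cmp_u64 are hypothetical, and the loads are written as portable byte loops rather than real 8-byte loads. It only shows that a size of 16 or more is dispatched after one check, while 8-15 bytes need a second check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read 8 bytes as a big-endian value, so that an
   integer comparison gives the same ordering as a byte-wise compare. */
uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Three-way compare of two words: -1, 0 or 1. */
int cmp_u64(uint64_t a, uint64_t b)
{
    return (a > b) - (a < b);
}

/* Sketch only: the first size check is against 16, not 8. */
int memcmp_sketch(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1, *b = s2;

    if (n >= 16) {
        /* Medium and large sizes: reached after a single size check. */
        for (size_t i = 0; i + 8 <= n; i += 8) {
            int r = cmp_u64(load_be64(a + i), load_be64(b + i));
            if (r)
                return r;
        }
        /* Tail: compare the last 8 bytes, overlapping the previous block. */
        return cmp_u64(load_be64(a + n - 8), load_be64(b + n - 8));
    }

    if (n >= 8) {
        /* 8..15 bytes: this is the extra check and branch. */
        int r = cmp_u64(load_be64(a), load_be64(b));
        if (r)
            return r;
        return cmp_u64(load_be64(a + n - 8), load_be64(b + n - 8));
    }

    /* Fewer than 8 bytes: plain byte loop. */
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}
```

In this arrangement the 8-15 byte case pays for two size tests before any data is compared, which matches the trade-off described above.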
> So why not use 2xCSEL rather than a branch across the moves?
> That's going to be faster since the branch will be hard to predict.
Great! This removes one hard-to-predict branch, and I have changed the code as suggested.
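For readers without the patch at hand, the idea can be sketched in C. This is an assumption about the general shape of the code, not a copy of it: two pairs of already loaded 8-byte words (low and high, assumed to be in big-endian order so that integer comparison matches byte order), where the high pair is used only if the low pair is equal. The function names are hypothetical. A C compiler is free to emit either a branch or CSEL for both versions; in hand-written assembly the choice is explicit.

```c
#include <stdint.h>

/* Branch form: jump across the two moves when the low words differ. */
int pick_branch(uint64_t a_lo, uint64_t b_lo, uint64_t a_hi, uint64_t b_hi)
{
    uint64_t a = a_lo, b = b_lo;
    if (a_lo == b_lo) {
        a = a_hi;   /* move 1 */
        b = b_hi;   /* move 2 */
    }
    return (a > b) - (a < b);
}

/* Select form: two conditional selects on the same condition, which
   corresponds to two CSEL instructions after a single CMP. */
int pick_select(uint64_t a_lo, uint64_t b_lo, uint64_t a_hi, uint64_t b_hi)
{
    int differ = (a_lo != b_lo);
    uint64_t a = differ ? a_lo : a_hi;   /* select 1 */
    uint64_t b = differ ? b_lo : b_hi;   /* select 2 */
    return (a > b) - (a < b);
}
```

Both functions return the same result; the select form simply has no control-flow dependency on which word holds the first difference, which is data dependent and therefore hard to predict.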
Other problems, such as the unused label and the formatting, are also corrected.
The v3 patch is here: https://sourceware.org/ml/libc-alpha/2019-10/msg00684.html
Xuelei