This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH v2] aarch64: Optimized implementation of memcmp

From: "Zhangxuelei (Derek)" <zhangxuelei4 at huawei dot com>
To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, "siddhesh at gotplt dot org" <siddhesh at gotplt dot org>, Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>, jiangyikun <jiangyikun at huawei dot com>, "yikunkero at gmail dot com" <yikunkero at gmail dot com>
Cc: nd <nd at arm dot com>
Date: Wed, 23 Oct 2019 14:31:07 +0000
Subject: Re: [PATCH v2] aarch64: Optimized implementation of memcmp

Hi Wilco,

> It seems there are some regressions in the 8-16 byte range,
> presumably due to handling these sizes differently.

Yes. We now check for 16 bytes rather than 8 bytes at the beginning of the function, so sizes in the 8-16 byte range go through one more compare and branch. This costs little for small sizes, and it benefits medium and large sizes more.

> So why not use 2xCSEL rather than a branch across the moves?
> That's going to be faster since the branch will be hard to predict.

Great! This removes one hard-to-predict branch, and I have changed the code as suggested.
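
To illustrate the idea in C (a rough sketch only, not the code from the patch; `load_be64` and `cmp16_select` are hypothetical names): once a 16-byte chunk is known to be the one to compare, the pair of 8-byte words to return on can be picked with two conditional selects instead of a branch over two moves. A compiler targeting AArch64 will typically, though not necessarily, turn the two ternaries into CSEL instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: load 8 bytes so that the first byte in memory
   becomes the most significant byte of the result. */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical sketch: compare two 16-byte chunks and return a value
   that is negative, zero, or positive, like memcmp. */
static int cmp16_select(const unsigned char *s1, const unsigned char *s2)
{
    uint64_t a_lo = load_be64(s1), a_hi = load_be64(s1 + 8);
    uint64_t b_lo = load_be64(s2), b_hi = load_be64(s2 + 8);

    /* One comparison decides which word pair matters; both selects
       depend on the same condition, so no branch is needed. */
    int lo_differs = (a_lo != b_lo);
    uint64_t a = lo_differs ? a_lo : a_hi;
    uint64_t b = lo_differs ? b_lo : b_hi;

    /* Branch-free three-way result: -1, 0, or 1. */
    return (a > b) - (a < b);
}
```

The point of the select form is that which half holds the first difference depends on the data, so a branch on it is hard to predict, while the selects cost the same either way.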

The other problems, such as the unused label and the formatting, are also corrected.

The v3 patch is here: https://sourceware.org/ml/libc-alpha/2019-10/msg00684.html


Xuelei
