This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v2 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor


Hi Derek,

>> Note that memchr_strlen significantly outperforms the fastest strlen
>> on sizes larger than 256, so I don't think that using uminv to test
>> for zeroes is the fastest approach.
>
> Indeed, but memchr_strlen performs poorly below 256 bytes,

Well, that means memchr can be sped up for small sizes. While it is more
complex than strlen, it shouldn't be significantly slower.

> and if we mix this method into the current version, we may need a length count
> and a check in each loop iteration for whether it exceeds 256 bytes. Is that cheap?

That may be possible, e.g. by unrolling the first 64-128 bytes and using a loop
optimized for throughput for anything larger (on the assumption that if a
string is larger than 128 bytes, it is likely much larger).
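
As a rough illustration of that idea, here is a minimal C sketch (not code from
the patch). The first 64 bytes are tested in an unrolled loop, and anything
longer falls through to memchr, which stands in for a loop tuned for throughput.
The name hybrid_strlen and the 64-byte threshold are assumptions made up for
this example; the real routines are written in AArch64 assembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical threshold: bytes handled by the unrolled small-size path.  */
#define SMALL_LIMIT 64

/* Illustrative only: hybrid_strlen is not a glibc function.  */
static size_t
hybrid_strlen (const char *s)
{
  /* Small-size path: four byte tests per iteration.  The bytes are read
     in order, so nothing past the terminator is touched.  */
  for (size_t i = 0; i < SMALL_LIMIT; i += 4)
    {
      if (s[i] == '\0')
        return i;
      if (s[i + 1] == '\0')
        return i + 1;
      if (s[i + 2] == '\0')
        return i + 2;
      if (s[i + 3] == '\0')
        return i + 3;
    }

  /* Large-size path: assume the string is much longer and hand over to a
     throughput-oriented search.  memchr is a stand-in for that loop; the
     huge bound relies on memchr stopping at the first match.  */
  const char *rest = s + SMALL_LIMIT;
  const char *end = memchr (rest, '\0', PTRDIFF_MAX - SMALL_LIMIT);
  return (size_t) (end - s);
}
```

The size check happens once, when the unrolled loop runs out, so the large-size
loop needs no per-iteration length count.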

However, my point was that while the uminv sequence is simple and small, it's
not the fastest, so ultimately we need to find an alternative sequence which
works better for all the generic string functions which search for a character
(strlen, strnlen, memchr, memrchr, rawmemchr, strchr, strnchr, strchrnul,
strcpy, strncpy).
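
For readers unfamiliar with the uminv test, a scalar C model may help. UMINV
reduces a vector to its smallest unsigned element, so a 16-byte chunk contains
a zero byte exactly when that minimum is zero. The sketch below models this
next to the classic word-at-a-time bit trick, purely as one example of a
different sequence. It is not a sequence proposed in this thread, and the
function names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the uminv test: a 16-byte chunk holds a zero byte
   exactly when its minimum unsigned byte is zero.  */
static bool
chunk_has_zero_min (const unsigned char chunk[16])
{
  unsigned char min = 0xff;
  for (size_t i = 0; i < 16; i++)
    if (chunk[i] < min)
      min = chunk[i];
  return min == 0;
}

/* Word-at-a-time test: the expression is nonzero exactly when at least
   one of the eight bytes of w is zero.  */
static bool
word_has_zero_swar (uint64_t w)
{
  const uint64_t ones = UINT64_C (0x0101010101010101);
  const uint64_t highs = UINT64_C (0x8080808080808080);
  return ((w - ones) & ~w & highs) != 0;
}

/* Apply the word-at-a-time test to both halves of a 16-byte chunk.
   memcpy avoids alignment and aliasing problems when loading.  */
static bool
chunk_has_zero_swar (const unsigned char chunk[16])
{
  uint64_t lo, hi;
  memcpy (&lo, chunk, sizeof lo);
  memcpy (&hi, chunk + 8, sizeof hi);
  return word_has_zero_swar (lo) || word_has_zero_swar (hi);
}
```

Both functions answer the same yes/no question; which instruction sequence is
actually fastest on a given core has to be measured.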

> And we think small size is more important for strlen.

Absolutely, handling small cases quickly is essential for all string functions.

Wilco
