This version uses general-register-based memory instructions to load
data, because vector-register-based loads are slightly slower on eMAG.
Character matching is performed in parallel on one 16-byte memory block
(16 bytes in both size and alignment) per iteration.
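The loop described above can be illustrated with a minimal C sketch of
the same idea: each iteration loads one 16-byte-aligned block into two
64-bit general registers and tests all 16 bytes at once with the
classic "has zero byte" bit trick applied to (word XOR repeated char).
This is not the code in memchr_base.S. The name memchr_swar16 is
hypothetical, the byte-wise alignment prologue is a simplification
(an assembly version would typically align down and mask instead), and
the matching block is rescanned byte by byte rather than locating the
match with bit tricks.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REP8_01 0x0101010101010101ULL
#define REP8_80 0x8080808080808080ULL

/* Nonzero iff at least one byte of x is zero. */
static inline uint64_t has_zero_byte(uint64_t x)
{
    return (x - REP8_01) & ~x & REP8_80;
}

/* Hypothetical illustration of 16-byte, general-register matching. */
void *memchr_swar16(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    unsigned char ch = (unsigned char)c;
    uint64_t pattern = REP8_01 * ch;   /* ch repeated in all 8 bytes */

    /* Simplified prologue: scan byte-wise until p is 16-byte aligned. */
    while (n > 0 && ((uintptr_t)p & 15) != 0) {
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }

    /* Main loop: two 64-bit loads cover one aligned 16-byte block.
       XOR with the pattern turns matching bytes into zero bytes. */
    while (n >= 16) {
        uint64_t w0, w1;
        memcpy(&w0, p, 8);
        memcpy(&w1, p + 8, 8);
        if (has_zero_byte(w0 ^ pattern) | has_zero_byte(w1 ^ pattern))
            break;              /* match is somewhere in this block */
        p += 16;
        n -= 16;
    }

    /* Matching block or short tail: find the exact byte. */
    while (n > 0) {
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }
    return NULL;
}
```

The XOR-then-zero-test step is what lets all 16 bytes be compared
without vector registers; only the final position search is byte-wise
here.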
* sysdeps/aarch64/memchr.S (__memchr): Rename to MEMCHR.
[!MEMCHR](MEMCHR): Set to __memchr.
* sysdeps/aarch64/multiarch/Makefile (sysdep_routines):
Add memchr_generic and memchr_base.
* sysdeps/aarch64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Add memchr ifuncs.
* sysdeps/aarch64/multiarch/memchr.c: New file.
* sysdeps/aarch64/multiarch/memchr_generic.S: Likewise.
* sysdeps/aarch64/multiarch/memchr_base.S: Likewise.
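The multiarch entries above set up run-time selection between
memchr_generic and memchr_base. As a rough sketch of that dispatch
pattern, the example below uses GCC's ifunc attribute directly (glibc's
memchr.c uses its own internal ifunc machinery, which is not shown
here). All names are hypothetical: cpu_is_emag stands in for whatever
CPU check the real selector performs, and it is assumed that the
general-register variant is chosen on eMAG. Both stand-in variants just
call the C library memchr. Requires a toolchain and libc with ifunc
support (e.g. GCC on glibc-based Linux).

```c
#include <stddef.h>
#include <string.h>

typedef void *memchr_fn(const void *, int, size_t);

/* Stand-ins for the two variants; both defer to the C library here. */
static void *memchr_generic_impl(const void *s, int c, size_t n)
{
    return memchr(s, c, n);
}

static void *memchr_base_impl(const void *s, int c, size_t n)
{
    return memchr(s, c, n);
}

/* Hypothetical CPU probe; always reports "not eMAG" in this sketch. */
static int cpu_is_emag(void)
{
    return 0;
}

/* Resolver: runs once at load time and returns the chosen variant. */
static memchr_fn *resolve_my_memchr(void)
{
    return cpu_is_emag() ? memchr_base_impl : memchr_generic_impl;
}

/* Callers use my_memchr; the dynamic loader binds it via the resolver. */
void *my_memchr(const void *s, int c, size_t n)
    __attribute__((ifunc("resolve_my_memchr")));
```

Because the choice is made once by the resolver, calls through
my_memchr carry no per-call CPU check.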
---
ChangeLog | 12 ++
sysdeps/aarch64/memchr.S | 10 +-
sysdeps/aarch64/multiarch/Makefile | 1 +
sysdeps/aarch64/multiarch/ifunc-impl-list.c | 3 +
sysdeps/aarch64/multiarch/memchr.c | 41 +++++
sysdeps/aarch64/multiarch/memchr_base.S | 223 ++++++++++++++++++++++++++++
sysdeps/aarch64/multiarch/memchr_generic.S | 33 ++++
7 files changed, 320 insertions(+), 3 deletions(-)
create mode 100644 sysdeps/aarch64/multiarch/memchr.c
create mode 100644 sysdeps/aarch64/multiarch/memchr_base.S
create mode 100644 sysdeps/aarch64/multiarch/memchr_generic.S
diff --git a/ChangeLog b/ChangeLog
index b4c07e2..6386b1e 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,17 @@
2018-12-17 Feng Xue <fxue@os.amperecomputing.com>
+ * sysdeps/aarch64/memchr.S (__memchr): Rename to MEMCHR.
+ [!MEMCHR](MEMCHR): Set to __memchr.
+ * sysdeps/aarch64/multiarch/Makefile (sysdep_routines):
+ Add memchr_generic and memchr_base.
+ * sysdeps/aarch64/multiarch/ifunc-impl-list.c
+ (__libc_ifunc_impl_list): Add memchr ifuncs.
+ * sysdeps/aarch64/multiarch/memchr.c: New file.
+ * sysdeps/aarch64/multiarch/memchr_generic.S: Likewise.
+ * sysdeps/aarch64/multiarch/memchr_base.S: Likewise.