This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] [aarch64][v2] Add an ASIMD variant of strlen for falkor
- From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
- To: Siddhesh Poyarekar <siddhesh at sourceware dot org>, libc-alpha at sourceware dot org
- Cc: nd at arm dot com, pinskia at gmail dot com
- Date: Wed, 15 Aug 2018 14:49:11 +0100
- Subject: Re: [PATCH] [aarch64][v2] Add an ASIMD variant of strlen for falkor
- References: <20180813163859.29388-1-siddhesh@sourceware.org>
On 13/08/18 17:38, Siddhesh Poyarekar wrote:
This variant of strlen uses vector loads and operations to reduce the
size of the code and also eliminate the non-ascii fallback. This
works very well for falkor because of its two vector units and
efficient vector ops. In the best case it reduces latency of cases in
bench-strlen by 48%, with gains throughout the benchmark.
strlen-walk also sees uniform gains in the 5%-15% range.
Overall the routine appears to work better than the stock one for falkor
regardless of the benchmark, length of string or cache state.
The same cannot be said of a53 and a72 though. a53 performance was
greatly reduced and for a72 it was a bit of a mixed bag, slightly on the
negative side but I reckon it might be fast in some situations.
Changes from v1:
- Renamed *_falkor to *_asimd to make the interface cleaner for other
cores to use.
* sysdeps/aarch64/strlen.S (__strlen): Rename to STRLEN.
[!STRLEN](STRLEN): Set to __strlen.
* sysdeps/aarch64/multiarch/strlen.c: New file.
* sysdeps/aarch64/multiarch/strlen_generic.S: Likewise.
* sysdeps/aarch64/multiarch/strlen_asimd.S: Likewise.
* sysdeps/aarch64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Add strlen.
* sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add
strlen_generic and strlen_asimd.
CC: szabolcs.nagy@arm.com
CC: pinskia@gmail.com
Please fix the memmove comments in strlen_generic.S;
with that fixed, it's OK to commit.
Reviewed-By: Szabolcs Nagy <szabolcs.nagy@arm.com>
+++ b/sysdeps/aarch64/multiarch/strlen_generic.S
@@ -0,0 +1,42 @@
+/* A Generic Optimized strlen implementation for AARCH64.
+ Copyright (C) 2018 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* The actual strlen and memmove code is in ../strlen.S. If we are
+ building libc this file defines __strlen_generic and __memmove_generic.
+ Otherwise the include of ../strlen.S will define the normal __strlen
+ and__memmove entry points. */
+
The memmove comment seems to have been copied from memcpy_generic.S
and does not apply here: this file only deals with strlen.
+#include <sysdep.h>
+
+#if IS_IN (libc)
+
+# define STRLEN __strlen_generic
+
+/* Do not hide the generic versions of strlen and memmove, we use them
+ internally. */
Likewise: this comment should mention only strlen.
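One possible rewording of the two comments flagged above, mentioning only strlen. This is a suggestion based on an assumption about the intended meaning, not necessarily the text that ended up being committed:

```c
/* Suggested wording only (an assumption, not the committed text).  */

/* The actual strlen code is in ../strlen.S.  If we are building libc
   this file defines __strlen_generic.  Otherwise the include of
   ../strlen.S will define the normal __strlen entry point.  */

/* Do not hide the generic version of strlen, we use it
   internally.  */
```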
+# undef libc_hidden_builtin_def
+# define libc_hidden_builtin_def(name)
+
+# ifdef SHARED
+/* It doesn't make sense to send libc-internal strlen calls through a PLT. */
+ .globl __GI_strlen; __GI_strlen = __strlen_generic
+# endif
+
+#endif
+
+#include "../strlen.S"
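For readers unfamiliar with the multiarch layout, here is a minimal C sketch of the idea behind choosing between a generic and an ASIMD strlen at run time. It is not the patch's strlen.c, which is not quoted in this mail: the names strlen_generic_sketch, strlen_asimd_sketch and select_strlen are hypothetical, both variants are plain C stand-ins for the assembly routines, and the is_falkor flag is an assumed stand-in for whatever CPU check the real selector performs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for __strlen_generic: a byte-by-byte scan.  */
static size_t
strlen_generic_sketch (const char *s)
{
  const char *p = s;
  while (*p != '\0')
    p++;
  return (size_t) (p - s);
}

/* Hypothetical stand-in for __strlen_asimd.  The real routine uses
   vector loads; this placeholder just defers to the C library.  */
static size_t
strlen_asimd_sketch (const char *s)
{
  return strlen (s);
}

typedef size_t (*strlen_fn) (const char *);

/* Hypothetical selector.  IS_FALKOR models a CPU identification
   check (an assumption); the choice is made once and the returned
   pointer is then used for every call.  */
static strlen_fn
select_strlen (bool is_falkor)
{
  return is_falkor ? strlen_asimd_sketch : strlen_generic_sketch;
}
```

A caller would resolve the pointer once, for example `strlen_fn f = select_strlen (cpu_is_falkor);`, and then call `f (str)`. In glibc the equivalent choice is made by the dynamic loader through an IFUNC resolver, so callers never see the function pointer.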