This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH][AArch64] Inline mempcpy again


On 07/05/2018 06:33 PM, Adhemerval Zanella wrote:
> If optimizing mempcpy is really required, I think a better option would
> be to provide an optimized version based on the current memcpy/memmove.
> I have created an implementation [1] which provides the expected
> optimized mempcpy at the cost of only an extra 'mov' instruction on both
> memcpy and memmove (so that the same memcpy/memmove code is used).
>
> [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/aarch64-mempcpy

I had proposed the exact same thing for __memcpy_chk [1] on aarch64, which was rejected on the grounds that this should be handled entirely by gcc. If that consensus has changed, then I'd like to propose that patch again as well.

However, I do understand that this is much better off being fixed in gcc, so we should probably first try to understand the limitations of doing it there. Wilco, does anything prevent gcc from doing this optimization for mempcpy or __memcpy_chk?

Siddhesh

[1] http://sourceware-org.1504.n7.nabble.com/PATCH-0-2-Multiarch-hooks-for-memcpy-variants-td463236.html


--
diff --git a/include/string.h b/include/string.h
index 069efd0b87010e5fdb64c87ced7af1dc4f54f232..46b90b8f346149f075fad026e562dfb27b658969 100644
--- a/include/string.h
+++ b/include/string.h
@@ -197,4 +197,23 @@ extern char *__strncat_chk (char *__restrict __dest,
 			    size_t __len, size_t __destlen) __THROW;
 #endif
+#if defined __USE_GNU && defined __OPTIMIZE__ \
+    && defined __extern_always_inline && __GNUC_PREREQ (3,2) \
+    && defined _INLINE_mempcpy
+
+#undef mempcpy
+#undef __mempcpy
+
+#define mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
+#define __mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
+
+__extern_always_inline void *
+__mempcpy_inline (void *__restrict __dest,
+		  const void *__restrict __src, size_t __n)
+{
+  return (char *) memcpy (__dest, __src, __n) + __n;
+}
+
+#endif
+
 #endif
diff --git a/sysdeps/aarch64/string_private.h b/sysdeps/aarch64/string_private.h
index 09dedbf3db40cf06077a44af992b399a6b37b48d..8b8fdddcc17a3f69455e72efe9c3616d2d33abe2 100644
--- a/sysdeps/aarch64/string_private.h
+++ b/sysdeps/aarch64/string_private.h
@@ -18,3 +18,6 @@
 /* AArch64 implementations support efficient unaligned access.  */
 #define _STRING_ARCH_unaligned 1
+
+/* Inline mempcpy since GCC doesn't optimize it (PR70140).  */
+#define _INLINE_mempcpy 1


