This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: RFC: Rewrite x86-64 IFUNC selector in C
- From: Erich Elsen <eriche at google dot com>
- To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- Cc: Siddhesh Poyarekar <siddhesh at gotplt dot org>, "H.J. Lu" <hjl dot tools at gmail dot com>, "Carlos O'Donell" <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Thu, 25 May 2017 14:25:10 -0700
- Subject: Re: RFC: Rewrite x86-64 IFUNC selector in C
- Authentication-results: sourceware.org; auth=none
- References: <CAMe9rOq7fCtNSfhQN=QXGjSRkKNfWwC4c9c_kqb4iFbpmNYBEA@mail.gmail.com> <f07f563b-e74f-e2ec-38f5-5f092f73f490@gotplt.org> <4a16e1e8-9baf-7b75-41b0-e25a127c649a@linaro.org>
Ok, I'll get started then.
Are there any general comments about the attached conversion for
memcpy? Just so I don't repeat the same wrong thing many times.
Thanks,
Erich
On Thu, May 25, 2017 at 5:04 AM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
> On 25/05/2017 02:50, Siddhesh Poyarekar wrote:
>> On Thursday 25 May 2017 03:19 AM, H.J. Lu wrote:
>>> Most of time, C is easier to maintain than assembly code. I think
>>> it is a good idea in general. What do other people think?
>>
>> +1, C > assembly wherever possible. Use assembly only if the compiler
>> is incapable of producing the output you want (and if that actually
>> matters) and then reevaluate every few releases.
>>
>> Siddhesh
>>
>
> +1 as well, now we have the required infrastructure and other ports
> are already using C implementation I see no reason to keep x86 using
> assembly ones.
From a2957f5a0b21f9588e8756228b11b86f886b0f4c Mon Sep 17 00:00:00 2001
From: Erich Elsen <eriche@google.com>
Date: Tue, 23 May 2017 12:29:24 -0700
Subject: [PATCH] add memcpy.c
---
sysdeps/x86_64/multiarch/memcpy.c | 70 +++++++++++++++++++++++++++++++++++++++
1 file changed, 70 insertions(+)
create mode 100644 sysdeps/x86_64/multiarch/memcpy.c
diff --git a/sysdeps/x86_64/multiarch/memcpy.c b/sysdeps/x86_64/multiarch/memcpy.c
new file mode 100644
index 0000000000..b0ff8c71fd
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/memcpy.c
@@ -0,0 +1,70 @@
+#include "cpu-features.h"
+#include "init-arch.h"
+#include "shlib-compat.h"
+#include <stdlib.h>
+
+typedef void * (*memcpy_fn)(void *, const void *, size_t);
+
+extern void * __memcpy_erms(void *dest, const void *src, size_t n);
+extern void * __memcpy_sse2_unaligned(void *dest, const void *src, size_t n);
+extern void * __memcpy_sse2_unaligned_erms(void *dest, const void *src, size_t n);
+extern void * __memcpy_ssse3(void *dest, const void *src, size_t n);
+extern void * __memcpy_ssse3_back(void *dest, const void *src, size_t n);
+extern void * __memcpy_avx_unaligned(void *dest, const void *src, size_t n);
+extern void * __memcpy_avx_unaligned_erms(void *dest, const void *src, size_t n);
+extern void * __memcpy_avx512_unaligned(void *dest, const void *src, size_t n);
+extern void * __memcpy_avx512_unaligned_erms(void *dest, const void *src, size_t n);
+
+/* Defined in cacheinfo.c */
+extern long int __x86_shared_cache_size attribute_hidden;
+extern long int __x86_shared_cache_size_half attribute_hidden;
+extern long int __x86_data_cache_size attribute_hidden;
+extern long int __x86_data_cache_size_half attribute_hidden;
+extern long int __x86_shared_non_temporal_threshold attribute_hidden;
+
+static void * select_memcpy_impl(void) {
+  const struct cpu_features* cpu_features_struct_p = __get_cpu_features ();
+
+  if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, Prefer_ERMS)) {
+    return __memcpy_erms;
+  }
+
+  if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, AVX512F_Usable)) {
+    if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, Prefer_No_VZEROUPPER))
+      return __memcpy_avx512_unaligned_erms;
+    return __memcpy_avx512_unaligned;
+  }
+
+  if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, AVX_Fast_Unaligned_Load)) {
+    if (CPU_FEATURES_CPU_P(cpu_features_struct_p, ERMS)) {
+      return __memcpy_avx_unaligned_erms;
+
+    }
+    return __memcpy_avx_unaligned;
+  }
+  else {
+    if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, Fast_Unaligned_Copy)) {
+      if (CPU_FEATURES_CPU_P(cpu_features_struct_p, ERMS)) {
+        return __memcpy_sse2_unaligned_erms;
+
+      }
+      return __memcpy_sse2_unaligned;
+    }
+    else {
+      if (!CPU_FEATURES_CPU_P(cpu_features_struct_p, SSSE3)) {
+        return __memcpy_sse2_unaligned;
+
+      }
+      if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, Fast_Copy_Backward)) {
+        return __memcpy_ssse3_back;
+
+      }
+      return __memcpy_ssse3;
+    }
+  }
+}
+
+void *__new_memcpy(void *dest, const void *src, size_t n)
+  __attribute__ ((ifunc ("select_memcpy_impl")));
+
+versioned_symbol(libc, __new_memcpy, memcpy, GLIBC_2_14);
--
2.13.0.219.gdb65acc882-goog
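
One way to sanity-check the selection order in the patch above without building glibc is to mirror the same decision tree over a mock feature set. The following is only a sketch under stated assumptions: `struct mock_features`, `select_memcpy_name` and `check_selection_order` are hypothetical names made up for illustration, the booleans stand in for the `CPU_FEATURES_ARCH_P` / `CPU_FEATURES_CPU_P` tests, and the function returns the variant's name as a string instead of a function pointer. It copies the branch order of `select_memcpy_impl` as posted and makes no claim about matching the existing assembly selector.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the bits the selector reads.  This is NOT
   glibc's struct cpu_features; each field mocks one feature test.  */
struct mock_features {
  bool prefer_erms;             /* Prefer_ERMS (arch) */
  bool avx512f_usable;          /* AVX512F_Usable (arch) */
  bool prefer_no_vzeroupper;    /* Prefer_No_VZEROUPPER (arch) */
  bool avx_fast_unaligned_load; /* AVX_Fast_Unaligned_Load (arch) */
  bool fast_unaligned_copy;     /* Fast_Unaligned_Copy (arch) */
  bool fast_copy_backward;      /* Fast_Copy_Backward (arch) */
  bool erms;                    /* ERMS (cpu) */
  bool ssse3;                   /* SSSE3 (cpu) */
};

/* Same branch order as select_memcpy_impl in the patch, flattened into
   early returns.  Returns the name of the variant that would be chosen.  */
static const char *
select_memcpy_name (const struct mock_features *f)
{
  if (f->prefer_erms)
    return "__memcpy_erms";

  if (f->avx512f_usable)
    return f->prefer_no_vzeroupper ? "__memcpy_avx512_unaligned_erms"
                                   : "__memcpy_avx512_unaligned";

  if (f->avx_fast_unaligned_load)
    return f->erms ? "__memcpy_avx_unaligned_erms"
                   : "__memcpy_avx_unaligned";

  if (f->fast_unaligned_copy)
    return f->erms ? "__memcpy_sse2_unaligned_erms"
                   : "__memcpy_sse2_unaligned";

  if (!f->ssse3)
    return "__memcpy_sse2_unaligned";

  return f->fast_copy_backward ? "__memcpy_ssse3_back" : "__memcpy_ssse3";
}

/* A few spot checks of the priority order.  */
static void
check_selection_order (void)
{
  /* Prefer_ERMS wins even when AVX512F is usable.  */
  struct mock_features a = { .prefer_erms = true, .avx512f_usable = true };
  assert (strcmp (select_memcpy_name (&a), "__memcpy_erms") == 0);

  /* AVX with fast unaligned loads plus ERMS picks the _erms variant.  */
  struct mock_features b = { .avx_fast_unaligned_load = true, .erms = true };
  assert (strcmp (select_memcpy_name (&b), "__memcpy_avx_unaligned_erms") == 0);

  /* SSSE3 with fast backward copy picks the _back variant.  */
  struct mock_features c = { .ssse3 = true, .fast_copy_backward = true };
  assert (strcmp (select_memcpy_name (&c), "__memcpy_ssse3_back") == 0);
}
```

Calling `check_selection_order ()` from a small test program exercises the three cases above. Flattening the nested `else` blocks into early returns does not change which variant is chosen, and it may make the priority order easier to compare against the assembly selector line by line.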