This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH][X86_64] Set bit_Fast_Unaligned_Load for Excavator family CPU's
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- Cc: "Pawar, Amit" <Amit dot Pawar at amd dot com>, "Carlos O'Donell" <carlos at redhat dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Date: Thu, 14 Jan 2016 08:18:03 -0800
- Subject: Re: [PATCH][X86_64] Set bit_Fast_Unaligned_Load for Excavator family CPU's
- References: <SN1PR12MB07330EBFA0ED52D659F4F46C97EF0 at SN1PR12MB0733 dot namprd12 dot prod dot outlook dot com> <5695C851 dot 6020700 at redhat dot com> <SN1PR12MB07333D887B4EA80E111107FC97CB0 at SN1PR12MB0733 dot namprd12 dot prod dot outlook dot com> <56967E30 dot 2010203 at redhat dot com> <SN1PR12MB0733C3509730137F269BE52597CC0 at SN1PR12MB0733 dot namprd12 dot prod dot outlook dot com> <5697C7CC dot 1080305 at linaro dot org>
On Thu, Jan 14, 2016 at 8:07 AM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
> OK from my part (you still need x86 maintainers ack to push it upstream).
I cleaned it up. This is what I checked in. Thanks.
> On 14-01-2016 12:52, Pawar, Amit wrote:
>> (a) Ask Adhemerval for an exception to provide this IFUNC tweak for AMD CPUs.
>> Done.
>>
>> (b) Once granted an exception, add your patch to the list of blockers here:
>> https://sourceware.org/glibc/wiki/Release/2.23#Release_blockers.3F
>> Sure.
>>
>> Again, please post your new patch as quickly as possible.
>> I have filed a bug for this. https://sourceware.org/bugzilla/show_bug.cgi?id=19467
>> Please find the patch attached; if it is OK, please commit it on my behalf.
>>
>> Thanks
>> Amit
>>
--
H.J.
From d7890e6947114785755ae5b1cf5310491092ee0b Mon Sep 17 00:00:00 2001
From: Amit Pawar <Amit.Pawar@amd.com>
Date: Thu, 14 Jan 2016 20:06:02 +0530
Subject: [PATCH] Set index_Fast_Unaligned_Load for Excavator family CPUs
GLIBC benchtests show that the SSE2_Unaligned based implementations
perform faster than the SSE2 based implementations for the routines
strcmp, strcat, strncat, stpcpy, stpncpy, strcpy, strncpy and strstr.
Set the index_Fast_Unaligned_Load flag for Excavator (family 15h) CPUs
so that the SSE2_Unaligned based implementations become the default
for these routines.
[BZ #19467]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set
index_Fast_Unaligned_Load flag for Excavator family CPUs.
---
ChangeLog | 6 ++++++
sysdeps/x86/cpu-features.c | 8 ++++++++
2 files changed, 14 insertions(+)
diff --git a/ChangeLog b/ChangeLog
index 424f731..054998f 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2016-01-14 Amit Pawar <amit.pawar@amd.com>
+
+ [BZ #19467]
+ * sysdeps/x86/cpu-features.c (init_cpu_features): Set
+ index_Fast_Unaligned_Load flag for Excavator family CPUs.
+
2016-01-02 Marcin Kościelnicki <koriakin@0x04.net>
* sysdeps/s390/nptl/tls.h (struct tcbhead_t): Add __private_ss field.
diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index e6bd4c9..218ff2b 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -154,6 +154,14 @@ init_cpu_features (struct cpu_features *cpu_features)
                  cpu_features->cpuid[COMMON_CPUID_INDEX_80000001].ebx,
                  cpu_features->cpuid[COMMON_CPUID_INDEX_80000001].ecx,
                  cpu_features->cpuid[COMMON_CPUID_INDEX_80000001].edx);
+
+      if (family == 0x15)
+        {
+          /* "Excavator" */
+          if (model >= 0x60 && model <= 0x7f)
+            cpu_features->feature[index_Fast_Unaligned_Load]
+              |= bit_Fast_Unaligned_Load;
+        }
     }
   else
     kind = arch_kind_other;
--
2.5.0
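
For readers who want to see where the family and model values tested in
the patch come from, here is a minimal sketch of decoding them from the
EAX value returned by CPUID leaf 1. The helper names (display_family,
display_model, is_excavator) are hypothetical and are not part of glibc;
the decoding follows the usual base/extended family and model layout and
is an illustration under that assumption, not the code glibc runs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a glibc function: return the family encoded
   in CPUID leaf 1 EAX.  The extended family (bits 20-27) is added only
   when the base family (bits 8-11) is 0xf.  */
static unsigned int
display_family (uint32_t eax)
{
  unsigned int family = (eax >> 8) & 0x0f;
  if (family == 0x0f)
    family += (eax >> 20) & 0xff;
  return family;
}

/* Hypothetical helper, not a glibc function: return the model encoded
   in CPUID leaf 1 EAX.  The extended model (bits 16-19) supplies the
   high nibble when the base family is 0xf.  */
static unsigned int
display_model (uint32_t eax)
{
  unsigned int model = (eax >> 4) & 0x0f;
  if (((eax >> 8) & 0x0f) == 0x0f)
    model += (eax >> 12) & 0xf0;
  return model;
}

/* Same range check as the patch: family 0x15, models 0x60 to 0x7f.  */
static bool
is_excavator (uint32_t eax)
{
  unsigned int model = display_model (eax);
  return display_family (eax) == 0x15 && model >= 0x60 && model <= 0x7f;
}
```

For example, an EAX value of 0x00660F01 decodes to family 0x15 and model
0x60, which falls inside the range the patch treats as Excavator, while
0x00600F20 decodes to family 0x15, model 0x02 and does not.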