This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[PATCH] x86-64: Compile branred.c with -mprefer-vector-width=128
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: GNU C Library <libc-alpha at sourceware dot org>
- Date: Fri, 7 Jun 2019 10:59:46 -0700
- Subject: [PATCH] x86-64: Compile branred.c with -mprefer-vector-width=128
-O3 with AVX vectorizes some loops in sysdeps/ieee754/dbl-64/branred.c
with 256-bit vector instructions, which leads to store-forwarding stalls:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579
There is no easy fix in the compiler. This patch limits the vector width
to 128 bits for branred.c to work around the issue. It improves the
performance of sin and cos by more than 40% on Skylake when glibc is
compiled with -O3 -march=skylake.
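For readers unfamiliar with the problem, the access pattern involved can be sketched roughly as below. This is a hypothetical illustration, not the code from branred.c: the function name scale_and_round, the buffer size N, and the inputs are all assumptions made for the example. A small stack buffer is filled by one loop and read back immediately by a second loop. If the compiler emits loads in the second loop that are wider than the stores from the first, a load can overlap several stores still sitting in the store buffer, and the CPU may not be able to forward the data from any single one of them.

```c
#include <stddef.h>

/* Hypothetical sketch of a store-then-wide-load pattern.
   Not taken from branred.c; names and sizes are illustrative only.  */
enum { N = 8 };

void
scale_and_round (const double *in, double scale, double *out)
{
  /* 1.5 * 2^52: adding and subtracting this rounds a double of
     modest magnitude to the nearest integer in the default
     rounding mode.  */
  static const double big = 6755399441055744.0;
  double tmp[N];
  double factor = scale;

  /* First loop: the running factor is a loop-carried dependency, so
     the compiler may emit narrow (scalar or 128-bit) stores here.  */
  for (size_t i = 0; i < N; i++)
    {
      tmp[i] = in[i] * factor;
      factor *= 0.5;
    }

  /* Second loop: independent element-wise work that the compiler may
     vectorize with 256-bit loads.  Such a load would read data that
     several narrower stores above have only just written.  */
  for (size_t i = 0; i < N; i++)
    out[i] = (tmp[i] + big) - big;
}
```

Whether a given compiler version actually produces mismatched store and load widths for this sketch is not guaranteed. One way to check is to compare the assembly from gcc -O3 -march=skylake -S with and without -mprefer-vector-width=128.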
OK for master branch?
* sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New. Set
to -mprefer-vector-width=128.
--
H.J.
From 53f43ccf241896d37b759ac416df0ef0ccd2da0e Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Fri, 17 May 2019 14:23:03 -0700
Subject: [PATCH] x86-64: Compile branred.c with -mprefer-vector-width=128
-O3 with AVX vectorizes some loops in sysdeps/ieee754/dbl-64/branred.c
with 256-bit vector instructions, which leads to store-forwarding stalls:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579
There is no easy fix in the compiler. This patch limits the vector width
to 128 bits for branred.c to work around the issue. It improves the
performance of sin and cos by more than 40% on Skylake when glibc is
compiled with -O3 -march=skylake.
* sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New. Set
to -mprefer-vector-width=128.
---
sysdeps/x86_64/fpu/Makefile | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
index 2b7d69bb50..b5f9589021 100644
--- a/sysdeps/x86_64/fpu/Makefile
+++ b/sysdeps/x86_64/fpu/Makefile
@@ -237,3 +237,7 @@ CFLAGS-test-float-libmvec-sincosf-avx512.c = -DREQUIRE_AVX512F
 CFLAGS-test-float-libmvec-sincosf-avx512-main.c = $(libmvec-sincos-cflags) $(float-vlen16-arch-ext-cflags)
 endif
 endif
+
+ifeq ($(subdir),math)
+CFLAGS-branred.c = -mprefer-vector-width=128
+endif
--
2.20.1