sysdeps/x86_64/fpu/e_expf.S has

        /* Here if |x| is Inf */
        lea     L(SP_INF_0)(%rip), %rdx /* depending on sign of x: */
        movss   (%rdx,%rax,4), %xmm0    /* return zero or Inf */
        ret
...
        .section .rodata.cst8,"aM",@progbits,8
...
        .p2align 2
L(SP_INF_0):
        .long   0x7f800000      /* single precision Inf */
        .long   0               /* single precision zero */
        .type L(SP_INF_0), @object
        ASM_SIZE_DIRECTIVE(L(SP_INF_0))

Since L(SP_INF_0) is accessed as an array of 4-byte elements, it
can't be put in

        .section .rodata.cst8,"aM",@progbits,8
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, hjl/expf/master has been created
        at  a13f5e6e34a6160607c8ce9448c618b9ae024364 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a13f5e6e34a6160607c8ce9448c618b9ae024364

commit a13f5e6e34a6160607c8ce9448c618b9ae024364
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Aug 15 08:45:34 2017 -0700

    x86-64: Optimize e_expf with FMA [BZ #21912]

        [BZ #21912]
        * sysdeps/x86_64/fpu/multiarch/e_expf-fma.S: New file.
        * sysdeps/x86_64/fpu/multiarch/e_expf-sse2.S: Likewise.
        * sysdeps/x86_64/fpu/multiarch/e_expf.c: Likewise.
        * sysdeps/x86_64/fpu/multiarch/ifunc-fma.h: Likewise.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5c18dfae535d8dd308a034280176c771b4065664

commit 5c18dfae535d8dd308a034280176c771b4065664
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Aug 15 10:34:22 2017 -0700

    x86-64: Put L(SP_INF_0) in .rodata.cst4 section [BZ #21955]

    sysdeps/x86_64/fpu/e_expf.S has

            /* Here if |x| is Inf */
            lea     L(SP_INF_0)(%rip), %rdx /* depending on sign of x: */
            movss   (%rdx,%rax,4), %xmm0    /* return zero or Inf */
            ret
    ...
            .section .rodata.cst8,"aM",@progbits,8
    ...
            .p2align 2
    L(SP_INF_0):
            .long   0x7f800000      /* single precision Inf */
            .long   0               /* single precision zero */
            .type L(SP_INF_0), @object
            ASM_SIZE_DIRECTIVE(L(SP_INF_0))

    Since L(SP_INF_0) is accessed as an array of 4-byte elements, it
    should be placed in

            .section .rodata.cst4,"aM",@progbits,4

        [BZ #21955]
        * sysdeps/x86_64/fpu/e_expf.S (L(SP_INF_0)): Place it in
        .rodata.cst4 section.

-----------------------------------------------------------------------
L(SP_RANGE) has the same issue.
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, hjl/pr21955/master has been created
        at  25ccb7689da648a69a4da6957b6f62a09bcd5d76 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=25ccb7689da648a69a4da6957b6f62a09bcd5d76

commit 25ccb7689da648a69a4da6957b6f62a09bcd5d76
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Aug 15 10:34:22 2017 -0700

    x86-64: Put L(SP_RANGE)/L(SP_INF_0) in .rodata.cst4 section [BZ #21955]

    sysdeps/x86_64/fpu/e_expf.S has

            lea     L(SP_RANGE)(%rip), %rdx /* load over/underflow bound */
            cmpl    (%rdx,%rax,4), %ecx     /* |x|<under/overflow bound ? */
    ...
            /* Here if |x| is Inf */
            lea     L(SP_INF_0)(%rip), %rdx /* depending on sign of x: */
            movss   (%rdx,%rax,4), %xmm0    /* return zero or Inf */
            ret
    ...
            .section .rodata.cst8,"aM",@progbits,8
    ...
            .p2align 2
    L(SP_RANGE): /* single precision overflow/underflow bounds */
            .long   0x42b17217      /* if x>this bound, then result overflows */
            .long   0x42cff1b4      /* if x<this bound, then result underflows */
            .type L(SP_RANGE), @object
            ASM_SIZE_DIRECTIVE(L(SP_RANGE))

            .p2align 2
    L(SP_INF_0):
            .long   0x7f800000      /* single precision Inf */
            .long   0               /* single precision zero */
            .type L(SP_INF_0), @object
            ASM_SIZE_DIRECTIVE(L(SP_INF_0))

    Since L(SP_RANGE) and L(SP_INF_0) are accessed as arrays of 4-byte
    elements, they should be placed in .rodata.cst4 section.

        [BZ #21955]
        * sysdeps/x86_64/fpu/e_expf.S (L(SP_RANGE)): Place it in
        .rodata.cst4 section.
        (L(SP_INF_0)): Likewise.

-----------------------------------------------------------------------
        .section .rodata.cst8,"aM",@progbits,8
...
        .p2align 2
L(SP_RANGE): /* single precision overflow/underflow bounds */
        .long   0x42b17217      /* if x>this bound, then result overflows */
        .long   0x42cff1b4      /* if x<this bound, then result underflows */
        .type L(SP_RANGE), @object
        ASM_SIZE_DIRECTIVE(L(SP_RANGE))

        .p2align 2
L(SP_INF_0):
        .long   0x7f800000      /* single precision Inf */
        .long   0               /* single precision zero */
        .type L(SP_INF_0), @object
        ASM_SIZE_DIRECTIVE(L(SP_INF_0))

Since L(SP_RANGE) and L(SP_INF_0) are in .rodata.cst8 section, they must
be aligned to 8 bytes.
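To make the alignment argument concrete, here is a small C sketch of the reasoning. It is a simplified model, not how any linker is implemented: it assumes only that a mergeable ("aM") section with entry size 8 is treated as a sequence of 8-byte entries, and the function name fits_one_entry is invented for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model: a mergeable section is a sequence of fixed-size
   entries of ENTSIZE bytes.  Return true if an object of SIZE bytes
   placed at section offset OFFSET lies entirely inside one entry,
   false if it straddles an entry boundary.  */
static bool
fits_one_entry (size_t offset, size_t size, size_t entsize)
{
  size_t first_entry = offset / entsize;
  size_t last_entry = (offset + size - 1) / entsize;
  return first_entry == last_entry;
}
```

With `.p2align 2` an 8-byte table may start at an offset such as 4, where fits_one_entry (4, 8, 8) is false: the table spans two 8-byte entries. At an 8-byte aligned offset such as 8, fits_one_entry (8, 8, 8) is true, so the table occupies exactly one entry.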
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, hjl/pr21955/master has been deleted
       was  25ccb7689da648a69a4da6957b6f62a09bcd5d76

- Log -----------------------------------------------------------------
25ccb7689da648a69a4da6957b6f62a09bcd5d76 x86-64: Put L(SP_RANGE)/L(SP_INF_0) in .rodata.cst4 section [BZ #21955]
-----------------------------------------------------------------------
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, hjl/pr21955/master has been created
        at  39245565fc0523eece29721c4590639ccebb6145 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=39245565fc0523eece29721c4590639ccebb6145

commit 39245565fc0523eece29721c4590639ccebb6145
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Aug 15 10:34:22 2017 -0700

    x86-64: Align L(SP_RANGE)/L(SP_INF_0) to 8 bytes [BZ #21955]

    sysdeps/x86_64/fpu/e_expf.S has

            lea     L(SP_RANGE)(%rip), %rdx /* load over/underflow bound */
            cmpl    (%rdx,%rax,4), %ecx     /* |x|<under/overflow bound ? */
    ...
            /* Here if |x| is Inf */
            lea     L(SP_INF_0)(%rip), %rdx /* depending on sign of x: */
            movss   (%rdx,%rax,4), %xmm0    /* return zero or Inf */
            ret
    ...
            .section .rodata.cst8,"aM",@progbits,8
    ...
            .p2align 2
    L(SP_RANGE): /* single precision overflow/underflow bounds */
            .long   0x42b17217      /* if x>this bound, then result overflows */
            .long   0x42cff1b4      /* if x<this bound, then result underflows */
            .type L(SP_RANGE), @object
            ASM_SIZE_DIRECTIVE(L(SP_RANGE))

            .p2align 2
    L(SP_INF_0):
            .long   0x7f800000      /* single precision Inf */
            .long   0               /* single precision zero */
            .type L(SP_INF_0), @object
            ASM_SIZE_DIRECTIVE(L(SP_INF_0))

    Since L(SP_RANGE) and L(SP_INF_0) are in .rodata.cst8 section, they must
    be aligned to 8 bytes.

        [BZ #21955]
        * sysdeps/x86_64/fpu/e_expf.S (L(SP_RANGE)): Aligned to 8 bytes.
        (L(SP_INF_0)): Likewise.

-----------------------------------------------------------------------
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  f59f7adb4a00b7784cab1becdf257366104587b7 (commit)
      from  6b11a6ad714e7f2bb83556c77d2306e55a94ca54 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f59f7adb4a00b7784cab1becdf257366104587b7

commit f59f7adb4a00b7784cab1becdf257366104587b7
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Aug 15 14:04:59 2017 -0700

    x86-64: Align L(SP_RANGE)/L(SP_INF_0) to 8 bytes [BZ #21955]

    sysdeps/x86_64/fpu/e_expf.S has

            lea     L(SP_RANGE)(%rip), %rdx /* load over/underflow bound */
            cmpl    (%rdx,%rax,4), %ecx     /* |x|<under/overflow bound ? */
    ...
            /* Here if |x| is Inf */
            lea     L(SP_INF_0)(%rip), %rdx /* depending on sign of x: */
            movss   (%rdx,%rax,4), %xmm0    /* return zero or Inf */
            ret
    ...
            .section .rodata.cst8,"aM",@progbits,8
    ...
            .p2align 2
    L(SP_RANGE): /* single precision overflow/underflow bounds */
            .long   0x42b17217      /* if x>this bound, then result overflows */
            .long   0x42cff1b4      /* if x<this bound, then result underflows */
            .type L(SP_RANGE), @object
            ASM_SIZE_DIRECTIVE(L(SP_RANGE))

            .p2align 2
    L(SP_INF_0):
            .long   0x7f800000      /* single precision Inf */
            .long   0               /* single precision zero */
            .type L(SP_INF_0), @object
            ASM_SIZE_DIRECTIVE(L(SP_INF_0))

    Since L(SP_RANGE) and L(SP_INF_0) are in .rodata.cst8 section, they must
    be aligned to 8 bytes.

        [BZ #21955]
        * sysdeps/x86_64/fpu/e_expf.S (L(SP_RANGE)): Aligned to 8 bytes.
        (L(SP_INF_0)): Likewise.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                   |    6 ++++++
 sysdeps/x86_64/fpu/e_expf.S |    4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)
Fixed for 2.27.
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, hjl/pr21955/master has been deleted
       was  39245565fc0523eece29721c4590639ccebb6145

- Log -----------------------------------------------------------------
39245565fc0523eece29721c4590639ccebb6145 x86-64: Align L(SP_RANGE)/L(SP_INF_0) to 8 bytes [BZ #21955]
-----------------------------------------------------------------------
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, hjl/fma/2.26 has been updated
       via  6d5f5b16bc4bd3945e138509d7986a5231ab5ee6 (commit)
       via  ce3e7f4136a9f5943328c74511542834ca05811b (commit)
      from  7e7b5de8ffc9ac8fda45b988cde5650004bdbca7 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=6d5f5b16bc4bd3945e138509d7986a5231ab5ee6

commit 6d5f5b16bc4bd3945e138509d7986a5231ab5ee6
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Wed Aug 16 08:43:35 2017 -0700

    x86-64: Optimize e_expf with FMA [BZ #21912]

    FMA optimized e_expf improves performance by more than 50% on Skylake.

        [BZ #21912]
        * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
        Add e_expf-fma.
        * sysdeps/x86_64/fpu/multiarch/e_expf-fma.S: New file.
        * sysdeps/x86_64/fpu/multiarch/e_expf.c: Likewise.
        * sysdeps/x86_64/fpu/multiarch/ifunc-fma.h: Likewise.

    (cherry picked from commit 24a2e6588d2e0c91b4003878b0625d4a9360e8f3)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ce3e7f4136a9f5943328c74511542834ca05811b

commit ce3e7f4136a9f5943328c74511542834ca05811b
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Aug 15 14:04:59 2017 -0700

    x86-64: Align L(SP_RANGE)/L(SP_INF_0) to 8 bytes [BZ #21955]

    sysdeps/x86_64/fpu/e_expf.S has

            lea     L(SP_RANGE)(%rip), %rdx /* load over/underflow bound */
            cmpl    (%rdx,%rax,4), %ecx     /* |x|<under/overflow bound ? */
    ...
            /* Here if |x| is Inf */
            lea     L(SP_INF_0)(%rip), %rdx /* depending on sign of x: */
            movss   (%rdx,%rax,4), %xmm0    /* return zero or Inf */
            ret
    ...
            .section .rodata.cst8,"aM",@progbits,8
    ...
            .p2align 2
    L(SP_RANGE): /* single precision overflow/underflow bounds */
            .long   0x42b17217      /* if x>this bound, then result overflows */
            .long   0x42cff1b4      /* if x<this bound, then result underflows */
            .type L(SP_RANGE), @object
            ASM_SIZE_DIRECTIVE(L(SP_RANGE))

            .p2align 2
    L(SP_INF_0):
            .long   0x7f800000      /* single precision Inf */
            .long   0               /* single precision zero */
            .type L(SP_INF_0), @object
            ASM_SIZE_DIRECTIVE(L(SP_INF_0))

    Since L(SP_RANGE) and L(SP_INF_0) are in .rodata.cst8 section, they must
    be aligned to 8 bytes.

        [BZ #21955]
        * sysdeps/x86_64/fpu/e_expf.S (L(SP_RANGE)): Aligned to 8 bytes.
        (L(SP_INF_0)): Likewise.

    (cherry picked from commit f59f7adb4a00b7784cab1becdf257366104587b7)

-----------------------------------------------------------------------

Summary of changes:
 sysdeps/x86_64/fpu/e_expf.S               |    4 +-
 sysdeps/x86_64/fpu/multiarch/Makefile     |    3 +
 sysdeps/x86_64/fpu/multiarch/e_expf-fma.S |  182 +++++++++++++++++++++++++++++
 sysdeps/x86_64/fpu/multiarch/e_expf.c     |   26 ++++
 sysdeps/x86_64/fpu/multiarch/ifunc-fma.h  |   34 ++++++
 5 files changed, 247 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/e_expf-fma.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/e_expf.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/ifunc-fma.h