This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



Disable powf/log2?f/exp2?f optimization for single-precision Arm FPU


Hi,

The new optimized powf, logf, log2f, expf and exp2f perform worse on Arm
targets whose FPU has only single-precision instructions, because their
double-precision arithmetic is then implemented via softfloat routines.
This patch uses the old implementation when double-precision
instructions are not available on Arm targets.
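
As a rough illustration of the check the patch relies on: ACLE defines
__ARM_FP as a bitmask of the floating-point formats supported in
hardware, where bit 1 (0x2) is half precision, bit 2 (0x4) is single
precision and bit 3 (0x8) is double precision. The sketch below restates
that test as ordinary C functions so the selection can be tried on any
host. The function names are hypothetical, and the fallback value of 1
when the double bit is clear is an assumption (the patch itself only
stops defining __OBSOLETE_MATH_DEFAULT in that case).

```c
#include <stdbool.h>

/* Bit 3 of the ACLE __ARM_FP mask: double precision supported in hardware. */
#define ARM_FP_DOUBLE_BIT 0x8u

/* Hypothetical helper: true when the given __ARM_FP-style mask
   advertises hardware double precision. */
static bool arm_fp_has_double(unsigned arm_fp_mask)
{
    return (arm_fp_mask & ARM_FP_DOUBLE_BIT) != 0;
}

/* Hypothetical helper mirroring the intent of the patch:
   0 selects the new (double-based) float functions,
   1 selects the old implementation.
   Assumption: a default of 1 applies when the macro is left undefined. */
static int obsolete_math_default(unsigned arm_fp_mask)
{
    return arm_fp_has_double(arm_fp_mask) ? 0 : 1;
}
```

For example, a mask of 0x6 (half and single precision only, as expected
for a single-precision FPU) gives 1, while 0xE (half, single and double)
gives 0.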

Testing: Built newlib with GCC's rmprofile Arm multilibs and compared
the results before and after the patch: only the above functions and the
calls to them change (names change from logf to __ieee754_logf and
similar). Running the changed functions on a panel of values gives the
same results with this patch as before the original optimization
patches. Timing a loop over the same panel of values on an Arm Cortex-M4
confirms that the performance regression is fixed.
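
A minimal sketch of how such a panel check could look, not the harness
actually used for this patch: one helper compares two implementations
bit for bit over a panel of inputs, and another times a loop over the
same panel. All names here are hypothetical, clock() is assumed to be
usable on the target, and twice() and halve() are stand-ins for the real
functions under test (for example logf built before and after the
change).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef float (*unary_f)(float);   /* shape of logf, expf, exp2f, log2f */

/* Bit pattern of a float, so results are compared exactly. */
static uint32_t float_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Number of panel entries on which f and g differ bit for bit.
   Note: NaN results with different payloads count as mismatches. */
static size_t count_mismatches(unary_f f, unary_f g,
                               const float *panel, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (float_bits(f(panel[i])) != float_bits(g(panel[i])))
            bad++;
    return bad;
}

/* Calls f on every panel entry, reps times over, and returns the
   elapsed clock() ticks. The volatile sink keeps the calls from
   being optimized away. */
static clock_t time_panel(unary_f f, const float *panel, size_t n,
                          unsigned reps)
{
    volatile float sink = 0.0f;
    clock_t start = clock();
    for (unsigned r = 0; r < reps; r++)
        for (size_t i = 0; i < n; i++)
            sink = f(panel[i]);
    (void)sink;
    return clock() - start;
}

/* Stand-in functions so the helpers can be exercised without libm. */
static float twice(float x) { return x + x; }
static float halve(float x) { return x * 0.5f; }
```

Passing the old and new build of the same function to count_mismatches()
should give 0 if the results are identical, and comparing time_panel()
for each build gives a coarse view of the relative cost.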

Patch in git format-patch format (does anyone have a better way of saying this?) attached.

Best regards,

Thomas
From 7d8b9d685daf9a45274cfaf819a1dbc8c92a752a Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme <thomas.preudhomme@arm.com>
Date: Thu, 18 Jan 2018 15:26:39 +0000
Subject: [PATCH] Disable powf/log2?f/exp2?f optimization for single-precision
 Arm FPU

New optimized powf, logf, log2f, expf and exp2f yield worse performance
on Arm targets with only single precision instructions because the
double precision arithmetic is then implemented via softfloat routines.
This patch uses the old implementation when double precision
instructions are not available on Arm targets.

Testing: Built newlib with GCC's rmprofile Arm multilibs and compared
before/after -> only the above functions are changed and calls to them
(name change from logf to __ieee754_logf and similar). Testing the
changed function on a panel of values yields the same result before the
original patches to improve them and after this one. Double checking the
performance by looping the same panel of values being tested on Arm
Cortex-M4 does show the performance regression is fixed.
---
 newlib/libc/include/machine/ieeefp.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/newlib/libc/include/machine/ieeefp.h b/newlib/libc/include/machine/ieeefp.h
index b1b4466..9fbef84 100644
--- a/newlib/libc/include/machine/ieeefp.h
+++ b/newlib/libc/include/machine/ieeefp.h
@@ -78,7 +78,9 @@
 # else
 #  define __IEEE_BIG_ENDIAN
 # endif
-# define __OBSOLETE_MATH_DEFAULT 0
+# if __ARM_FP & 0x8
+#  define __OBSOLETE_MATH_DEFAULT 0
+# endif
 #else
 # define __IEEE_BIG_ENDIAN
 # ifdef __ARMEL__
-- 
2.7.4

