This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.

From: Patrick McGehearty <patrick dot mcgehearty at oracle dot com>
To: libc-alpha at sourceware dot org
Date: Fri, 20 Oct 2017 10:04:05 -0500
Subject: Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
Authentication-results: sourceware.org; auth=none
References: <1508172962-97543-1-git-send-email-patrick.mcgehearty@oracle.com> <alpine.DEB.2.20.1710181706300.28577@digraph.polyomino.org.uk> <6d6104ad-b846-68b3-8f87-3216d1e52412@oracle.com> <alpine.DEB.2.20.1710192235050.9397@digraph.polyomino.org.uk>

On 10/19/2017 5:48 PM, Joseph Myers wrote:

On Thu, 19 Oct 2017, Patrick McGehearty wrote:

For the copyright issue, would it be appropriate to move the
previous copyright notice to right before __exp1?
I'm reluctant to move __exp1 as that might also require changes
to Makefiles which are not currently required.

There should be exactly one copyright notice with a given copyright holder
in a given source file, so only one set of dates (2001-2017) given.  The
"IBM Accurate Mathematical Library" comment might be updated to describe
what parts of the file come from where.

I'll see what I can do.

For slowexp, I have some reservations that removing slowexp.o
might cause older object modules to break due to the missing
reference. I know they should not be directly referencing
an internal function, but still...
Anyway, I can't find any other direct usage of __slowexp.
I will remove all references to __slowexp and see
if anything breaks to show I missed a reference.

I find the following files have references, including some more
files to be removed.

This list seems to be missing (at least)
sysdeps/powerpc/power4/fpu/Makefile (CPPFLAGS-slowexp.c setting).

Thanks. I missed that in my grep review.

benchtests/strcol1-inputs/filelist#C and filelist#en_US.UTF-8
have references to slowexp*.c
I'm supposing those should also be removed.

I know some people do edit this benchmark input when removing files, but I
think that's inappropriate; it's just a test input that happens to be
based on a list of files in some version of glibc, there is no need for it
to correspond to current glibc, and in the interests of comparability of
benchmark results it would be best for it never to change.

Ok, I'll leave those alone.

I've been uncomfortable with hex floats approach
as it only works for ieee754 representations
that use base 2. I admit that is most current machines.

This file is in a directory for such a binary format, and it's tuned for
that particular format (regarding polynomial size chosen, etc.).  Any code
written for decimal formats would naturally use decimal constants, but in
code written for binary formats, hex constants are appropriate for such
precomputed constants to make clear exactly what value / representation is
intended.  (And in the 0x1p-54 case, it makes the code a lot more readable
to put that in hex.)

As noted in other email, I'll do the hex conversion.

Underneath the definitions of SET_RESTORE_ROUND, it ultimately relies
on __fegetround() and __fesetround(). The extra cost for
SET_RESTORE_ROUND is that it requires a flag to always be set (mode
changed or did not change). A short time the flag must tested to
determine if the rounding mode needs to be restored. If the compiler
puts that flag on the stack or in memory, the reading of a value that
was just written to cache triggers a "Read after Write" HW hazard,
causing a typical delay of 30-40 cycles. Avoiding the test also
avoids a possible mis-predicted branch.  For M7 and earlier, Sparc
branch prediction is not ideal and mis-prediction is expensive.  The
code I provided avoids the need to set the flag or test it by
duplicating small segments of code for each case.

Well, there are lots of lower-level interfaces such as libc_fesetround
that could be used if appropriate.  Maybe additional such interfaces need
to be added to support such cases of duplicating code.  It would seem
natural to start with the existing interfaces (SET_RESTORE_ROUND), with a
view to a followup possibly then adding further optimizations, just as the
expf changes started by adding something wrapped by the existing wrapper,
then arranged to avoid a wrapper in subsequent patches in the series.

I'll test Wilco's suggestion of using get_rounding_mode and libc_fesetround.

Guaranteeing "inexact" is not part of the goals for most libm functions,
so I expect you can remove that term.

The "inexact" test was required to pass the (make check) math lib tests.

You'll need to explain more.  For functions which are not fully defined by
a binding to IEEE operations, both spurious and missing "inexact" should
be allowed by the testsuite.

It was months ago that I made the inexact modifications. It was part
of dealing with failure to report underflow/overflow/NAN conditions.
I've forgotten the exact details.
I'll remove the inexact code and report what test failures are seen if any.

- patrick

References:
- [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  - From: Patrick McGehearty
- Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  - From: Joseph Myers
- Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  - From: Patrick McGehearty
- Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  - From: Joseph Myers

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]