This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.

From: Patrick McGehearty <patrick dot mcgehearty at oracle dot com>
To: Joseph Myers <joseph at codesourcery dot com>
Cc: libc-alpha at sourceware dot org
Date: Mon, 1 Jan 2018 10:31:50 -0600
Subject: Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
Authentication-results: sourceware.org; auth=none
References: <1514590916-114435-1-git-send-email-patrick.mcgehearty@oracle.com> <alpine.DEB.2.20.1801010131530.28505@digraph.polyomino.org.uk>

On 12/31/2017 7:36 PM, Joseph Myers wrote:

On Fri, 29 Dec 2017, Patrick McGehearty wrote:

Version 9 of proposed patch.

Replaced get_rounding_mode and libc_fesetround() with SET_RESTORE_ROUND
to avoid Intel rounding mode issue which showed as test failures in
tgamma_upward. Adds noticable overhead for platforms that incur
significant cost when rounding mode is already FE_TONEAREST. Added
SET_RESTORE_ROUND to two more cases which resolved 1 ulp rounding
errors for cexp().

Is the "5x" in the subject still correct?  (The subject line of a patch
should be suitable for the summary line of the commit message, so must
always reflect the current patch version accurately.  Likewise, the text
of the patch submission, minus anything about changes relative to a
previous patch version, must be suitable for the longer part of the commit
message.)

I can't be sure the "make bench" test gives representative data sincemany of the test valuesseem at the extreme ranges and may trigger the "extremely precise andslow" paths inthe old code. Those tests still show very large improves (mean: old 5473vs new 418).I'll reconfirm with my own "typical values" tests. I used 5x as a valuethat I consideredvery conservative originally to allow for platform to platformvariations and for differences

with different ideas of "typical values".

Expanded the scaling table from 64 entries to 128 entries, renaming
invln2_64 to invln2_256 as well as ln2_64hi and ln2_64lo to ln2_256hi
and ln2_256lo. That reduces the 1 ulp error rate per 1000 values from
1.6 to 0.6.

I think this expansion - and corresponding increase in cache usage - is a
bad idea.  Table size should be kept down, consistent with suitable
accuracy of e.g. < 1 ulp in this case.


The table size verses error rate tradeoff:
 32 entries (512 bytes) 2.95
 64 entries (1K bytes)  1.62
128 entries (2K bytes)  0.98
256 entries (4K bytes)  0.65
That is the number of 1 ulp diffs from the old version per 1000 test values.

As far as I can determine, the old code had an error rate 0.0 values per1000.

I was going for more accuracy to reduce the chances for
pushback from the field once this change starts getting used
more widely.

With typical L3 caches now measured in Mbytes/thread
and L2 caches at least 64Kbytes/thread if not 256Kbytes/thread
having modestly larger tables is a reasonable tradeoff,
especially since we are trading so much performance improvement
but giving up some accuracy. I retained the 64 and 128 entry
versions, so I can switch out the table size easily.


This patch version also appears to be missing the fixups I made as part of
committing the previous patch.  See
<https://sourceware.org/ml/libc-alpha/2017-12/msg00649.html>.  In addition
to the extra slowexp.c removals I noted there, I also needed to remove
trailing whitespace from a few lines - you should make sure your patch
does not include any additions of lines with trailing spaces, as the
commit hooks will reject any push that adds such lines.

I'll fix the i386, ia64, and m68k issues and look for trailingwhitespace in all the changed files.


- patrick

Follow-Ups:
- Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  - From: Joseph Myers

References:
- Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  - From: Joseph Myers

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]