This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
- From: Patrick McGehearty <patrick dot mcgehearty at oracle dot com>
- To: Joseph Myers <joseph at codesourcery dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Mon, 1 Jan 2018 10:31:50 -0600
- Subject: Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
- Authentication-results: sourceware.org; auth=none
- References: <1514590916-114435-1-git-send-email-patrick.mcgehearty@oracle.com> <alpine.DEB.2.20.1801010131530.28505@digraph.polyomino.org.uk>
On 12/31/2017 7:36 PM, Joseph Myers wrote:
On Fri, 29 Dec 2017, Patrick McGehearty wrote:
Version 9 of proposed patch.
Replaced get_rounding_mode and libc_fesetround() with SET_RESTORE_ROUND
to avoid Intel rounding mode issue which showed as test failures in
tgamma_upward. Adds noticable overhead for platforms that incur
significant cost when rounding mode is already FE_TONEAREST. Added
SET_RESTORE_ROUND to two more cases which resolved 1 ulp rounding
errors for cexp().
Is the "5x" in the subject still correct? (The subject line of a patch
should be suitable for the summary line of the commit message, so must
always reflect the current patch version accurately. Likewise, the text
of the patch submission, minus anything about changes relative to a
previous patch version, must be suitable for the longer part of the commit
message.)
I can't be sure the "make bench" test gives representative data since
many of the test values
seem at the extreme ranges and may trigger the "extremely precise and
slow" paths in
the old code. Those tests still show very large improves (mean: old 5473
vs new 418).
I'll reconfirm with my own "typical values" tests. I used 5x as a value
that I considered
very conservative originally to allow for platform to platform
variations and for differences
with different ideas of "typical values".
Expanded the scaling table from 64 entries to 128 entries, renaming
invln2_64 to invln2_256 as well as ln2_64hi and ln2_64lo to ln2_256hi
and ln2_256lo. That reduces the 1 ulp error rate per 1000 values from
1.6 to 0.6.
I think this expansion - and corresponding increase in cache usage - is a
bad idea. Table size should be kept down, consistent with suitable
accuracy of e.g. < 1 ulp in this case.
The table size verses error rate tradeoff:
32 entries (512 bytes) 2.95
64 entries (1K bytes) 1.62
128 entries (2K bytes) 0.98
256 entries (4K bytes) 0.65
That is the number of 1 ulp diffs from the old version per 1000 test values.
As far as I can determine, the old code had an error rate 0.0 values per
1000.
I was going for more accuracy to reduce the chances for
pushback from the field once this change starts getting used
more widely.
With typical L3 caches now measured in Mbytes/thread
and L2 caches at least 64Kbytes/thread if not 256Kbytes/thread
having modestly larger tables is a reasonable tradeoff,
especially since we are trading so much performance improvement
but giving up some accuracy. I retained the 64 and 128 entry
versions, so I can switch out the table size easily.
This patch version also appears to be missing the fixups I made as part of
committing the previous patch. See
<https://sourceware.org/ml/libc-alpha/2017-12/msg00649.html>. In addition
to the extra slowexp.c removals I noted there, I also needed to remove
trailing whitespace from a few lines - you should make sure your patch
does not include any additions of lines with trailing spaces, as the
commit hooks will reject any push that adds such lines.
I'll fix the i386, ia64, and m68k issues and look for trailing
whitespace in all the changed files.
- patrick