This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.


On 12/31/2017 7:36 PM, Joseph Myers wrote:
On Fri, 29 Dec 2017, Patrick McGehearty wrote:

Version 9 of proposed patch.

Replaced get_rounding_mode and libc_fesetround() with SET_RESTORE_ROUND
to avoid Intel rounding mode issue which showed as test failures in
tgamma_upward. Adds noticable overhead for platforms that incur
significant cost when rounding mode is already FE_TONEAREST. Added
SET_RESTORE_ROUND to two more cases which resolved 1 ulp rounding
errors for cexp().
Is the "5x" in the subject still correct?  (The subject line of a patch
should be suitable for the summary line of the commit message, so must
always reflect the current patch version accurately.  Likewise, the text
of the patch submission, minus anything about changes relative to a
previous patch version, must be suitable for the longer part of the commit
message.)
I can't be sure the "make bench" test gives representative data since many of the test values seem at the extreme ranges and may trigger the "extremely precise and slow" paths in the old code. Those tests still show very large improves (mean: old 5473 vs new 418). I'll reconfirm with my own "typical values" tests.  I used 5x as a value that I considered very conservative originally to allow for platform to platform variations and for differences
with different ideas of "typical values".

Expanded the scaling table from 64 entries to 128 entries, renaming
invln2_64 to invln2_256 as well as ln2_64hi and ln2_64lo to ln2_256hi
and ln2_256lo. That reduces the 1 ulp error rate per 1000 values from
1.6 to 0.6.
I think this expansion - and corresponding increase in cache usage - is a
bad idea.  Table size should be kept down, consistent with suitable
accuracy of e.g. < 1 ulp in this case.

The table size verses error rate tradeoff:
 32 entries (512 bytes) 2.95
 64 entries (1K bytes)  1.62
128 entries (2K bytes)  0.98
256 entries (4K bytes)  0.65
That is the number of 1 ulp diffs from the old version per 1000 test values.
As far as I can determine, the old code had an error rate 0.0 values per 1000.
I was going for more accuracy to reduce the chances for
pushback from the field once this change starts getting used
more widely.

With typical L3 caches now measured in Mbytes/thread
and L2 caches at least 64Kbytes/thread if not 256Kbytes/thread
having modestly larger tables is a reasonable tradeoff,
especially since we are trading so much performance improvement
but giving up some accuracy. I retained the 64 and 128 entry
versions, so I can switch out the table size easily.


This patch version also appears to be missing the fixups I made as part of
committing the previous patch.  See
<https://sourceware.org/ml/libc-alpha/2017-12/msg00649.html>.  In addition
to the extra slowexp.c removals I noted there, I also needed to remove
trailing whitespace from a few lines - you should make sure your patch
does not include any additions of lines with trailing spaces, as the
commit hooks will reject any push that adds such lines.

I'll fix the i386, ia64, and m68k issues and look for trailing whitespace in all the changed files.

- patrick


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]