This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Only set rounding mode in SET_RESTORE_ROUND_53BIT


On Thu, May 30, 2013 at 02:29:17PM +0530, Siddhesh Poyarekar wrote:
> On Wed, May 29, 2013 at 07:33:19PM +0530, Siddhesh Poyarekar wrote:
> > I'm trying to understand the code that sets and restores the floating
> > point state to try and eek out some performance from it.  I haven't
> > been able to figure out the need for SET_RESTORE_ROUND_53BIT vs
> > SET_RESTORE_ROUND_53 for x86_64 - I can see that it's needed for x87:
> > 
> > SET_RESTORE_ROUND (used in exp, log) only sets and restores the
> > rounding mode and does not touch the exception masks.  This was
> > holdexcept_setround, i.e. the same implementation as
> > SET_RESTORE_ROUND_53BIT before, but I assume the clearing of exception
> > mask, etc. was removed because it could result in delay in raising
> > floating point exceptions resulting from some operations within these
> > functions if exceptions are masked.
> > 
> > SET_RESTORE_ROUND_53BIT (used in sin/cos) sets the rounding mode,
> > exception mask and clears all exception flags.  Why can't we simply
> > get away with setting and restoring just the rounding mode like in
> > exp, log, etc?  That opens up a good opportunity for optimization
> > where I could avoid most of the set/restore code in the default case.
> > sin and cos run almost twice as fast in some cases.
> 
> So to put all this in code, the below patch is what I intend to do.
> Instead of modifying exception flags, SET_RESTORE_ROUND_53BIT only
> sets rounding mode and floating point mode (in x87) and restores it on
> exit from the lexical block.  I tested x86_64 as well as i386 with
> --disable-sse to verify that this does not cause any regressions in
> the testsuite.  Would this change be correct?

Sorry, I keep replying to myself, but I forgot to test this for
possible benchmark improvements.  Here are the results for sin, cos
and tan, which was the functions affected by this change:

Before:

cos(): ITERS:3.1054e+08: TOTAL:28904.2Mcy, MAX:3916.53cy, MIN:69.603cy, 10743.8 calls/Mcy
sin(): ITERS:2.86419e+08: TOTAL:28911.1Mcy, MAX:333.326cy, MIN:46.76cy, 9906.88 calls/Mcy
tan(): ITERS:6.7984e+07: TOTAL:28908.9Mcy, MAX:3693.01cy, MIN:420.518cy, 2351.66 calls/Mcy

After:

cos(): ITERS:4.8486e+08: TOTAL:28890.5Mcy, MAX:188.859cy, MIN:57.562cy, 16782.7 calls/Mcy
sin(): ITERS:4.06791e+08: TOTAL:28904.6Mcy, MAX:210.524cy, MIN:47.592cy, 14073.6 calls/Mcy
tan(): ITERS:7.3346e+07: TOTAL:28907.1Mcy, MAX:800.115cy, MIN:388.597cy, 2537.3 calls/Mcy

So that amounts to ~56% improvmeent for cos, 42% for sin and 7.8% for
tan.

So there is a definite performance advantage for just this patch as well.

Siddhesh


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]