This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Only set rounding mode in SET_RESTORE_ROUND_53BIT
- From: Siddhesh Poyarekar <siddhesh at redhat dot com>
- To: libc-alpha at sourceware dot org
- Cc: Richard Henderson <rth at twiddle dot net>, carlos at redhat dot com
- Date: Thu, 30 May 2013 18:19:08 +0530
- Subject: Re: [PATCH] Only set rounding mode in SET_RESTORE_ROUND_53BIT
- References: <20130529140319 dot GL2145 at spoyarek dot pnq dot redhat dot com> <20130530085917 dot GO2145 at spoyarek dot pnq dot redhat dot com>
On Thu, May 30, 2013 at 02:29:17PM +0530, Siddhesh Poyarekar wrote:
> On Wed, May 29, 2013 at 07:33:19PM +0530, Siddhesh Poyarekar wrote:
> > I'm trying to understand the code that sets and restores the floating
> > point state to try and eek out some performance from it. I haven't
> > been able to figure out the need for SET_RESTORE_ROUND_53BIT vs
> > SET_RESTORE_ROUND_53 for x86_64 - I can see that it's needed for x87:
> >
> > SET_RESTORE_ROUND (used in exp, log) only sets and restores the
> > rounding mode and does not touch the exception masks. This was
> > holdexcept_setround, i.e. the same implementation as
> > SET_RESTORE_ROUND_53BIT before, but I assume the clearing of exception
> > mask, etc. was removed because it could result in delay in raising
> > floating point exceptions resulting from some operations within these
> > functions if exceptions are masked.
> >
> > SET_RESTORE_ROUND_53BIT (used in sin/cos) sets the rounding mode,
> > exception mask and clears all exception flags. Why can't we simply
> > get away with setting and restoring just the rounding mode like in
> > exp, log, etc? That opens up a good opportunity for optimization
> > where I could avoid most of the set/restore code in the default case.
> > sin and cos run almost twice as fast in some cases.
>
> So to put all this in code, the below patch is what I intend to do.
> Instead of modifying exception flags, SET_RESTORE_ROUND_53BIT only
> sets rounding mode and floating point mode (in x87) and restores it on
> exit from the lexical block. I tested x86_64 as well as i386 with
> --disable-sse to verify that this does not cause any regressions in
> the testsuite. Would this change be correct?
Sorry, I keep replying to myself, but I forgot to test this for
possible benchmark improvements. Here are the results for sin, cos
and tan, which was the functions affected by this change:
Before:
cos(): ITERS:3.1054e+08: TOTAL:28904.2Mcy, MAX:3916.53cy, MIN:69.603cy, 10743.8 calls/Mcy
sin(): ITERS:2.86419e+08: TOTAL:28911.1Mcy, MAX:333.326cy, MIN:46.76cy, 9906.88 calls/Mcy
tan(): ITERS:6.7984e+07: TOTAL:28908.9Mcy, MAX:3693.01cy, MIN:420.518cy, 2351.66 calls/Mcy
After:
cos(): ITERS:4.8486e+08: TOTAL:28890.5Mcy, MAX:188.859cy, MIN:57.562cy, 16782.7 calls/Mcy
sin(): ITERS:4.06791e+08: TOTAL:28904.6Mcy, MAX:210.524cy, MIN:47.592cy, 14073.6 calls/Mcy
tan(): ITERS:7.3346e+07: TOTAL:28907.1Mcy, MAX:800.115cy, MIN:388.597cy, 2537.3 calls/Mcy
So that amounts to ~56% improvmeent for cos, 42% for sin and 7.8% for
tan.
So there is a definite performance advantage for just this patch as well.
Siddhesh