This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] [powerpc] SET_RESTORE_ROUND improvements


On 8/5/19 11:39 AM, Paul E Murphy wrote:
> On 8/2/19 9:54 PM, Paul A. Clarke wrote:
>> SET_RESTORE_ROUND uses libc_feholdsetround_ppc_ctx and
>> libc_feresetround_ppc_ctx to bracket a block of code where the floating point
>> rounding mode must be set to a certain value.
>>
>> For the *prologue*, libc_feholdsetround_ppc_ctx is used and performs:
>> 1. Read/save FPSCR.
>> 2. Create new value for FPSCR with new rounding mode and enables cleared.
>> 3. If new value is different than current value,
>>     a. If transitioning from a state where some exceptions enabled,
>>        enter "ignore exceptions / non-stop" mode.
>>     b. Write new value to FPSCR.
>>     c. Put a mark on the wall indicating the FPSCR was changed.
>>
>> (1) uses the 'mffs' instruction.  On POWER9, the lighter weight 'mffsl'
>> instruction can be used, but it doesn't return all of the bits in the FPSCR.
>> fegetenv_status uses 'mffsl' on POWER9, 'mffs' otherwise, and can thus be
>> used instead of fegetenv_register.
>> (3b) uses 'mtfsf 0b11111111' to write the entire FPSCR, so it must
>> instead use 'mtfsf 0b00000011' to write just the enables and the mode,
>> because some of the rest of the bits are not valid if 'mffsl' was used.
>> fesetenv_mode uses 'mtfsf 0b00000011' on POWER9, 'mtfsf 0b11111111'
>> otherwise.
>>
>> For the *epilogue*, libc_feresetround_ppc_ctx checks the mark on the wall, then
>> calls libc_feresetround_ppc, which just calls __libc_femergeenv_ppc with
>> parameters such that it performs:
>> 1. Retreive saved value of FPSCR, saved in prologue above.
>> 2. Read FPSCR.
>> 3. Create new value of FPSCR where:
>>     - Summary bits and exception indicators = current OR saved.
>>     - Rounding mode and enables = saved.
>>     - Status bits = current.
>> 4. If transitioning from some exceptions enabled to none,
>>     enter "ignore exceptions / non-stop" mode.
>> 5. If transitioning from no exceptions enabled to some,
>>     enter "catch exceptions" mode.
>> 6. Write new value to FPSCR.
>>
>> The summary bits are hardwired to the exception indicators, so there is no
>> need to restore any saved summary bits.
>> The exception indicator bits, which are sticky and remain set unless
>> explicitly cleared, would only need to be restored if the code block
>> might explicitly clear any of them.  This is certainly not expected.
>>
>> So, the only bits that need to be restored are the enables and the mode.
>> If it is the case that only those bits are to be restored, there is no need to
>> read the FPSCR.  Steps (2) and (3) are unnecessary, and step (6) only needs to
>> write the bits being restored.
>>
>> We know we are transitioning out of "ignore exceptions" mode, so step (4) is
>> unnecessary, and in step (6), we only need to check the state we are
>> entering.
>>
>> 2019-08-02  Paul A. Clarke  <pc@us.ibm.com>
>>
>>     * sysdeps/powerpc/fpu/fenv_private.h (libc_feholdsetround_ppc_ctx):
>>     Utilize lightweight FPSCR read if possible, set fewer FPSCR bits
>>     if possible.
>>     (libc_feresetround_ppc):  Replace call to __libc_femergeenv_ppc
>>     with simpler required steps, set fewer FPSCR bits if possible.
>> ---
>>   sysdeps/powerpc/fpu/fenv_private.h | 18 ++++++++++++++----
>>   1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/sysdeps/powerpc/fpu/fenv_private.h b/sysdeps/powerpc/fpu/fenv_private.h
>> index 8c126f9..397df3f 100644
>> --- a/sysdeps/powerpc/fpu/fenv_private.h
>> +++ b/sysdeps/powerpc/fpu/fenv_private.h
>> @@ -132,7 +132,17 @@ libc_fesetenv_ppc (const fenv_t *envp)
>>   static __always_inline void
>>   libc_feresetround_ppc (fenv_t *envp)
>>   {
>> -  __libc_femergeenv_ppc (envp, _FPU_MASK_TRAPS_RN, _FPU_MASK_FRAC_INEX_RET_CC);
>> +  fenv_union_t new = { .fenv = *envp };
>> +
>> +  /* If the old env has no enabled exceptions and the new env has any enabled
>> +     exceptions, then unmask SIGFPE in the MSR FE0/FE1 bits.  This will put the
>> +     hardware into "precise mode" and may cause the FPU to run slower on some
>> +     hardware.  */
>> +  if ((new.l & _FPU_ALL_TRAPS) != 0)
>> +    (void) __fe_nomask_env_priv ();
>> +
>> +  /* Atomically enable and raise (if appropriate) exceptions set in `new'.  */
>> +  fesetenv_mode (new.fenv);
>>   }
>>     static __always_inline int
>> @@ -176,16 +186,16 @@ libc_feholdsetround_ppc_ctx (struct rm_ctx *ctx, int r)
> 
> As a sanity check, my understanding is that this function is used in two ways:
> 
>   1. Stash rounding/exception bits for the cases noted above
>   2. Stash sticky bits and the above These are the _NOEX variants of the SET_RESTORE macros.
> 
> Both cases use this function to stash some or all of the fenv state. However, the second case quietly discards any exceptions while the block is executed.
> 
> For case 2, would this now clear all sticky bits when restoring the old environment? I'm not sure if I am missing a layer of macro indirection.

No, I think you found a bug in the patch.  I'll work to fix this and send a v2 when ready.

Thanks for the review!
PC


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]