[PATCH] [powerpc] fe{en,dis}ableexcept, fesetmode: optimize FPSCR accesses
Paul Clarke
pc@us.ibm.com
Fri Aug 2 04:01:00 GMT 2019
On 8/1/19 7:04 PM, Paul E Murphy wrote:
> On 8/1/19 2:36 PM, Paul A. Clarke wrote:
>> Since fe{en,dis}ableexcept() and fesetmode() read-modify-write just the
>> "mode" (exception enable and rounding mode) bits of the Floating Point Status
>> Control Register (FPSCR), the lighter weight 'mffsl' instruction can be used
>> to read the FPSCR (enables and rounding mode), and 'mtfsf 0b00000011' can be
>> used to write just those bits back to the FPSCR. The net is better performance.
>>
>> In addition, fe{en,dis}ableexcept() read the FPSCR again after writing it, or
>> they determine that it doesn't need to be written because it is not changing.
>> In either case, the local variable holds the current values of the enable
>> bits in the FPSCR. This local variable can be used instead of again reading
>> the FPSCR.
>>
>> Also, that value of the FPSCR which is read the second time is validated
>> against the requested enables. Since the write can't fail, this validation
>> step is unnecessary, and can be removed.
>
> The write to FPSCR may not fail, but it may not change all the
> requested bits. e.g fedisableexcept(FE_INVALID_SQRT). Should the
> existing behavior be preserved?
That's an interesting question. The current unpatched code will apparently accept all possible exception bits, enable (or disable) the ones it can, and return -1 if the set (or unset) bits don't match the requested bits.
For the INVALID bits, these are transformed into FE_INVALID iff all of the defined INVALID bits are set.
So, the case of "feenableexcept (FE_OVERFLOW|FE_INVALID_SQRT)" would result in FE_OVERFLOW being enabled, yet the function returns failure because FE_INVALID_SQRT is not. Is the side effect of enabling FE_OVERFLOW but returning failure appropriate? Should the enablement be reverted in that case (all-or-nothing enabled) to avoid side-effects on failure?
Are you just asking about the whether the requested bits _can_ be set? That could be determined without reading (or re-reading) the FPSCR. Should that be done up-front, before attempting to set the FPSCR at all, and returning failure if so?
Is there a crisp definition of the expected operation of these functions? The man page is wanting.
>> Finally, convert the local macros in fesetmode.c to more generally useful
>> macros in fenv_libc.h.
>>
>> 2019-08-01 Paul A. Clarke <pc@us.ibm.com>
>>
>> Â Â Â Â * sysdeps/powerpc/fpu/fenv_libc.h (fesetenv_mode): New.
>> Â Â Â Â (FPSCR_FPRF_MASK): New. (FPSCR_STATUS_MASK): New.
>> Â Â Â Â * sysdeps/powerpc/fpu/feenablxcpt.c (feenableexcept): Use lighter-
>> Â Â Â Â weight access to FPSCR; remove unnecessary second FPSCR read and
>> Â Â Â Â validate.
>> Â Â Â Â * sysdeps/powerpc/fpu/fedisblxcpt.c (fedisableexcept): Likewise.
>> Â Â Â Â * sysdeps/powerpc/fpu/fesetmode.c (fesetmode): Use lighter-weight
>> Â Â Â Â access to FPSCR; Use macros in fenv_libc.h in favor of local.
>> ---
>> Â sysdeps/powerpc/fpu/fedisblxcpt.c |Â 8 +++-----
>> Â sysdeps/powerpc/fpu/feenablxcpt.c |Â 9 +++------
>>  sysdeps/powerpc/fpu/fenv_libc.h  | 10 +++++++++-
>>  sysdeps/powerpc/fpu/fesetmode.c  | 15 +++++----------
>> Â 4 files changed, 20 insertions(+), 22 deletions(-)
>>
>> diff --git a/sysdeps/powerpc/fpu/fedisblxcpt.c b/sysdeps/powerpc/fpu/fedisblxcpt.c
>> index 5cc8799..fa666b0 100644
>> --- a/sysdeps/powerpc/fpu/fedisblxcpt.c
>> +++ b/sysdeps/powerpc/fpu/fedisblxcpt.c
>> @@ -26,7 +26,7 @@ fedisableexcept (int excepts)
>> Â Â Â int result, new;
>>
>>    /* Get current exception mask to return. */
>> -Â fe.fenv = curr.fenv = fegetenv_register ();
>> +Â fe.fenv = curr.fenv = fegetenv_status ();
>> Â Â Â result = fenv_reg_to_exceptions (fe.l);
>>
>> Â Â Â if ((excepts & FE_ALL_INVALID) == FE_ALL_INVALID)
>> @@ -36,13 +36,11 @@ fedisableexcept (int excepts)
>> Â Â Â fe.l &= ~ fenv_exceptions_to_reg (excepts);
>>
>> Â Â Â if (fe.l != curr.l)
>> -Â Â Â fesetenv_register (fe.fenv);
>> +Â Â Â fesetenv_mode (fe.fenv);
>>
>> -Â new = __fegetexcept ();
>> +Â new = fenv_reg_to_exceptions (fe.l);
>> Â Â Â if (new == 0 && result != 0)
>> Â Â Â Â Â (void)__fe_mask_env ();
>>
>> -Â if ((new & excepts) != 0)
>> -Â Â Â result = -1;
>> Â Â Â return result;
>> Â }
>> diff --git a/sysdeps/powerpc/fpu/feenablxcpt.c b/sysdeps/powerpc/fpu/feenablxcpt.c
>> index 3b64398..c658290 100644
>> --- a/sysdeps/powerpc/fpu/feenablxcpt.c
>> +++ b/sysdeps/powerpc/fpu/feenablxcpt.c
>> @@ -26,7 +26,7 @@ feenableexcept (int excepts)
>> Â Â Â int result, new;
>>
>>    /* Get current exception mask to return. */
>> -Â fe.fenv = curr.fenv = fegetenv_register ();
>> +Â fe.fenv = curr.fenv = fegetenv_status ();
>> Â Â Â result = fenv_reg_to_exceptions (fe.l);
>>
>> Â Â Â if ((excepts & FE_ALL_INVALID) == FE_ALL_INVALID)
>> @@ -36,14 +36,11 @@ feenableexcept (int excepts)
>> Â Â Â fe.l |= fenv_exceptions_to_reg (excepts);
>>
>> Â Â Â if (fe.l != curr.l)
>> -Â Â Â fesetenv_register (fe.fenv);
>> +Â Â Â fesetenv_mode (fe.fenv);
>>
>> -Â new = __fegetexcept ();
>> +Â new = fenv_reg_to_exceptions (fe.l);
>> Â Â Â if (new != 0 && result == 0)
>> Â Â Â Â Â (void) __fe_nomask_env_priv ();
>>
>> -Â if ((new & excepts) != excepts)
>> -Â Â Â result = -1;
>> -
>> Â Â Â return result;
>> Â }
>> diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h
>> index 853239f..8ba4832 100644
>> --- a/sysdeps/powerpc/fpu/fenv_libc.h
>> +++ b/sysdeps/powerpc/fpu/fenv_libc.h
>> @@ -70,6 +70,11 @@ extern const fenv_t *__fe_mask_env (void) attribute_hidden;
>> Â Â Â Â Â Â Â Â Â __builtin_mtfsf (0xff, d); \
>> Â Â Â Â Â } while(0)
>>
>> +/* Set the last 2 nibbles of the FPSCR, which contain the
>> +Â Â exception enables and the rounding mode.
>> +  'fegetenv_status' retrieves these bits by reading the FPSCR. */
>> +#define fesetenv_mode(env) __builtin_mtfsf (0b00000011, (env));
>> +
>> Â /* This very handy macro:
>> Â Â Â Â - Sets the rounding mode to 'round to nearest';
>> Â Â Â Â - Sets the processor into IEEE mode; and
>> @@ -206,8 +211,11 @@ enum {
>> Â Â Â (FPSCR_VE_MASK|FPSCR_OE_MASK|FPSCR_UE_MASK|FPSCR_ZE_MASK|FPSCR_XE_MASK)
>> Â #define FPSCR_BASIC_EXCEPTIONS_MASK \
>> Â Â Â (FPSCR_VX_MASK|FPSCR_OX_MASK|FPSCR_UX_MASK|FPSCR_ZX_MASK|FPSCR_XX_MASK)
>> -
>> +#define FPSCR_FPRF_MASK \
>> +Â (FPSCR_FPRF_C_MASK|FPSCR_FPRF_FL_MASK|FPSCR_FPRF_FG_MASK| \
>> +Â Â FPSCR_FPRF_FE_MASK|FPSCR_FPRF_FU_MASK)
>> Â #define FPSCR_CONTROL_MASK (FPSCR_ENABLES_MASK|FPSCR_NI_MASK|FPSCR_RN_MASK)
>> +#define FPSCR_STATUS_MASK (FPSCR_FR_MASK|FPSCR_FI_MASK|FPSCR_FPRF_MASK)
>>
>> Â /* The bits in the FENV(1) ABI for exceptions correspond one-to-one with bits
>> Â Â Â Â in the FPSCR, albeit shifted to different but corresponding locations.
>> diff --git a/sysdeps/powerpc/fpu/fesetmode.c b/sysdeps/powerpc/fpu/fesetmode.c
>> index 4f4f71a..e92559b 100644
>> --- a/sysdeps/powerpc/fpu/fesetmode.c
>> +++ b/sysdeps/powerpc/fpu/fesetmode.c
>> @@ -19,11 +19,6 @@
>> Â #include <fenv_libc.h>
>> Â #include <fpu_control.h>
>>
>> -#define _FPU_MASK_ALL (_FPU_MASK_ZM | _FPU_MASK_OM | _FPU_MASK_UMÂ Â Â \
>> -Â Â Â Â Â Â Â Â Â Â Â Â Â Â | _FPU_MASK_XM | _FPU_MASK_IM)
>> -
>> -#define FPU_STATUS 0xbffff700ULL
>> -
>> Â int
>> Â fesetmode (const femode_t *modep)
>> Â {
>> @@ -32,18 +27,18 @@ fesetmode (const femode_t *modep)
>>    /* Logic regarding enabled exceptions as in fesetenv. */
>>
>> Â Â Â new.fenv = *modep;
>> -Â old.fenv = fegetenv_register ();
>> -Â new.l = (new.l & ~FPU_STATUS) | (old.l & FPU_STATUS);
>> +Â old.fenv = fegetenv_status ();
>> +Â new.l = (new.l & ~FPSCR_STATUS_MASK) | (old.l & FPSCR_STATUS_MASK);
>>
>> Â Â Â if (old.l == new.l)
>> Â Â Â Â Â return 0;
>>
>> -Â if ((old.l & _FPU_MASK_ALL) == 0 && (new.l & _FPU_MASK_ALL) != 0)
>> +Â if ((old.l & FPSCR_ENABLES_MASK) == 0 && (new.l & FPSCR_ENABLES_MASK) != 0)
>> Â Â Â Â Â (void) __fe_nomask_env_priv ();
>>
>> -Â if ((old.l & _FPU_MASK_ALL) != 0 && (new.l & _FPU_MASK_ALL) == 0)
>> +Â if ((old.l & FPSCR_ENABLES_MASK) != 0 && (new.l & FPSCR_ENABLES_MASK) == 0)
>> Â Â Â Â Â (void) __fe_mask_env ();
>>
>> -Â fesetenv_register (new.fenv);
>> +Â fesetenv_mode (new.fenv);
>> Â Â Â return 0;
>> Â }
PC
More information about the Libc-alpha
mailing list