Bug 6411

Summary: PowerPC: Extend fpu fenv operations to operate on 64-bit FPSCR
Product: glibc Reporter: Ryan S. Arnold <rsa>
Component: libcAssignee: Ulrich Drepper <drepper.fsp>
Status: RESOLVED FIXED    
Severity: enhancement CC: glibc-bugs
Priority: P1 Flags: fweimer: security-
Version: unspecified   
Target Milestone: 2.9   
Host: powerpc64-linux Target: powerpc64-linux
Build: powerpc64-linux Last reconfirmed: 2008-09-04 15:42:46
Attachments: 64-bit FPSCR enablement patch for Power6.
64-bit FPSCR enablement patch that applies against CVS head 20080804
Patch which uses bits 28-31 of fpu_control_t rather than breaking ABI by expanding fpu_control_t to 64-bits
Implements dynamic check for hwcap decimal feature to preserve the ABI.
Same as previous patch, but with a typo fixed in the ChangeLog.
enable 64-bit FPSCR but don't touch fpu_control.h
Dynamically select correct mtfsf insn based on PPC_FEATURE_HAS_DFP
Respin of the patch with a test case verifying correct behavior.

Description Ryan S. Arnold 2008-04-15 21:13:50 UTC
The included patch provides the following changes to extend the FPSCR
(floating point status and control register) to 64-bits for PowerPC
POWER6 and POWER6x:

o For PowerPC in general it redefines fpu_control_t from an unsigned
long int to an unsigned long long int so that the entire 64-bit FPSCR on
POWER6 can be saved/restored.  On non-POWER6 hardware the high-order
32-bits are simply discarded/ignored for all operations.

  It is necessary to change it for all PowerPC because the loader is
built as the default PowerPC arch (i.e. non-POWER6) and uses the
rtld_global_ro struct which contains an fpu_control_t and therefore must
be able to load a libc built for POWER6 or non-POWER6 (which also uses
the rtld_global_ro struct).

  Any difference in size of fpu_control_t between the loader and libc
will cause libc to access rtld_global_ro struct members at the wrong
offsets from where the structure was populated by the loader.

  The impact is minimal since currently the fpu_control_t is initially
populated from a double anyway and the fpu_control_t type is not used
dynamically in any external function and thus, no symbol versioning is
required.

o Provides a conditional implementation of _FPU_GETCW and _FPU_SETCW
which operate on either the entire 64-bit FPSCR on POWER6[x] or simply
the low-order 32-bits of the fpu_control_t on non-POWER6[x].

o The fegetenv() function will exhibit an undefined high-order word in the
fenv_t.  Likewise we leave the high-order 32-bits 'undefined' in the unsigned
long long int fpu_control_t. 

o Provides granular control over the high order 32 bits of the FPSCR
that aren't reserved on POWER6[x].

o Provides an overridden sysdeps/powerpc/math/test-fpucw that verifies
that the high-order word is saved to the FPSCR and restored using the
_FPU_[GET|SET]CW macros.  It also verifies that there are no deleterious
effects on non-POWER6 systems.

o Provides conditional definitions of the PowerPC fenv helper macros
(fesetenv_register & fegetenv_register) which are used by the fenv
functions to set/get the contents of the FPSCR.

o Provides sysdeps/powerpc/math/test-powerpc-fenv which tests the
high-order word save/restore using the fe[set|get]env_register
functions.

o Replaces the FPSCR restore with a macro which will provide a 64-bit
restore on POWER6 for swapcontext and setcontext.

o Replaces some of the magic number fenv masks in the fenv functions
with a define that better describes what the magic number means.

I successfully executed build and runtime tests (make check) on POWER5
(32-bit FPSCR) as well as POWER6 (64-bit FPSCR).
Comment 1 Ryan S. Arnold 2008-04-15 21:14:26 UTC
Created attachment 2698 [details]
64-bit FPSCR enablement patch for Power6.
Comment 2 Ryan S. Arnold 2008-08-05 15:11:33 UTC
Created attachment 2893 [details]
64-bit FPSCR enablement patch that applies against CVS head 20080804

This patch will apply atop cvs-head.
Comment 3 Ulrich Drepper 2008-08-13 01:33:32 UTC
You're changing an exported type.  That's not possible.  You have to find
another way.  From what I have seen there are only a few bits of the
fpu_control_t used.  There is no reason why this type must exactly correspond to
the values you get from the control register.

If and when more than 32 *useful* bits are used in the word you have to extend
the type and break the ABI.  But this shouldn't be necessary today.

On the formatting: indent all the preprocessor directives correctly (i.e., add
spaces after the initial #).
Comment 4 Ryan S. Arnold 2008-08-13 04:49:55 UTC
(In reply to comment #3)
> You're changing an exported type.  That's not possible.  You have to find
> another way.  From what I have seen there are only a few bits of the
> fpu_control_t used.  There is no reason why this type must exactly correspond to
> the values you get from the control register.

I understand your argument but I can't find any published information which
would indicate that there is anything but a 1:1 correspondence between the
fpu_control_t and the FPSCR's bits.

Is the intention of the fpu_control_t to only allow users to get (and in some
cases set) the values that correspond to the published masks, i.e. just rounding
control masks and some exception status?

If this is the case then I'll just use some bits that correspond to the 32-bit
FPSCR's exception enable bits that are marked 'reserved' in fpu_control.h for
the DRN (Decimal Rounding control bits).

> On the formatting: indent all the preprocessor directives correctly (i.e., add
> spaces after the initial #).

OK.

Thanks for the review.
Comment 5 Ulrich Drepper 2008-08-13 05:33:53 UTC
(In reply to comment #4)
> I understand your argument but I can't find any published information which
> would indicate that there is anything but a 1:1 correspondence between the
> fpu_control_t and the FPSCR's bits.

That's a non-standard header, there is no documentation.  But it's an exported
header.  Nobody is supposed to use the values of this type without the
functions/macros in this header.  So there is no problem with using a different
representation.


> If this is the case then I'll just use some bits that correspond to the 32-bit
> FPSCR's exception enable bits that are marked 'reserved' in fpu_control.h for
> the DRN (Decimal Rounding control bits).

That is the intend of my comment.
Comment 6 Ryan S. Arnold 2008-08-15 03:16:57 UTC
Created attachment 2909 [details]
Patch which uses bits 28-31 of fpu_control_t rather than breaking ABI by expanding fpu_control_t to 64-bits

The attached patch uses bits 28-31 of the fpu_control_t type for the DRN
(really bit-31 is reserved for future DRN usage per ISA 2.05).	The macros
[sg]et the Decimal Rounding direction field to/from the FPSCR from/to the  high
4-bits of the fpu_control_t.  This patch doesn't break the ABI.

The rest of the patch is the same as previous patches though I've now extended
the POWER6 test-fpucw test to verify that the correct bits are showing up in
the 64-bit FPSCR.
Comment 7 Ulrich Drepper 2008-08-17 03:49:17 UTC
(In reply to comment #6)
> Patch which uses bits 28-31 of fpu_control_t rather than breaking ABI by
> expanding fpu_control_t to 64-bits

Why is it necessary to have arch-dependent binaries for this?  It's really a
single instruction.  Why not make this a dynamic decision based on the CPU in use?


I told you before the formatting is off.  Indent the preprocessor directives.

Also, the attribution in the initial comment is

   Contributed by Name <email>.

Furthermore, the ChangeLog always start capitalized and with a full stop.

Finally, change the state from WAITING when you add a reply.
Comment 8 Ryan S. Arnold 2008-08-18 22:04:56 UTC
(In reply to comment #7)
> Why is it necessary to have arch-dependent binaries for this?  It's really a
> single instruction.  Why not make this a dynamic decision based on the CPU in use?

Yes it is a single instruction where the new form takes an additional parameter
to indicate the length of the FPSCR to operate upon.  The content of the FPSCR
has always been loaded/stored from a double, but until ISA 2.05 the actual FPSCR
was only 32-bits.

The _ARCH_PWR6_ define is used by GCC to indicate ISA 2.05 conformance.  This
will evaluate as 'true' on a POWER7 as well, and any future architecture that is
ISA 2.05 conforming.  The 64-bit FPSCR is not an optional part of ISA 2.05.  We
don't envision another change in the FPSCR's size for quite some time since
there are 27 available reserve bits for future ISA additions.

So it seems an unnecessary burden to make a dynamic check at runtime for
something that won't change for a very long time.

> I told you before the formatting is off.  Indent the preprocessor directives.

I had two different formats of indentation going on so I'm hoping that you mean:
#ifdef FOO
# define BAR
# ifdef FAZ
#  #define BAZ
# endif
#endif

> Also, the attribution in the initial comment is
> 
>    Contributed by Name <email>.

Okay.

> Furthermore, the ChangeLog always start capitalized and with a full stop.

Yes.

> Finally, change the state from WAITING when you add a reply.

Okay, I'll document this on the wiki under bugzilla policies.

I'll have a patch ready soon.
Comment 9 Ulrich Drepper 2008-08-19 00:38:57 UTC
(In reply to comment #8)
> So it seems an unnecessary burden to make a dynamic check at runtime for
> something that won't change for a very long time.

You miss the point.  We have to get back to having only one DSO.  We'll have
some technologies to handle widely diverging functions but for cases like this,
which are trivial to resolve, we should switch dynamically.

> #ifdef FOO
> # define BAR
> # ifdef FAZ
> #  #define BAZ
> # endif
> #endif

That's the correct format.
Comment 10 Ryan S. Arnold 2008-08-19 16:15:55 UTC
(In reply to comment #9)
> You miss the point.  We have to get back to having only one DSO.  We'll have
> some technologies to handle widely diverging functions but for cases like this,
> which are trivial to resolve, we should switch dynamically.

Point taken.  Since there is no public interface to the auxv hwcap information
I'll have to take the following steps in order to provide the dynamic feature
check to a macro that is available to userspace applications.

1.) Export a boolean symbol for runtime querying of the FPSCR feature
availability, e.g. something like __ppc_feature_dfp_available.

2.) In __libc_start_main search the auxv hwcap for the specific feature, i.e.
PPC_FEATURE_HAS_DFP, and set the symbol __ppc_feature_dfp_available.

3.) In fpu_control.h __[SG]ET_FPUCW access the boolean symbol for feature
checks.  This way applications which #include fpu_control.h and use said macros
are able gather the feature availability.

The other methods for gathering the hwcap information from userspace
applications are expensive at best (map and query /proc/self/auxv) and
unreliable at worst (read off the end of the ENV variable array or have access
to the applications argument variables).

Implication: Publish feature availability permanently, i.e. ABI addition.

Benefit: The expense of the feature check is mitigated to application startup.

Alternative: Use the above mentioned method but rather than exporting the symbol
allow the userspace application to access an 'attribute_hidden' symbol if that
is possible.

Ulrich what direction do you prefer?  Have I missed another alternative?
Comment 11 Ryan S. Arnold 2008-08-21 17:20:11 UTC
Created attachment 2917 [details]
Implements dynamic check for hwcap decimal feature to preserve the ABI.

The attached patch implements a dynamic check for PPC_FEATURE_HAS_DFP in the
_[GS]ET_FPUCW macros and only if it finds it will it operate on the 64-bit
FPSCR.

This method necessitates that some of the hwcap information be exposed.  When
libc is loaded it will query the hwcap for the dfp feature bit and set an
exported variable.  Users who include fpu_control and use _[GS]ET_FPUCW will
have these macros inlined into their code and will therefore reference the libc
provided symbol.
Comment 12 Ryan S. Arnold 2008-08-21 17:26:48 UTC
Created attachment 2918 [details]
Same as previous patch, but with a typo fixed in the ChangeLog.

(In reply to comment #11)

> The attached patch implements a dynamic check for PPC_FEATURE_HAS_DFP in the
> _[GS]ET_FPUCW macros and only if it finds it will it operate on the 64-bit
> FPSCR.

Whoops, I meant: _FPU_[GS]ETCW

I fixed a typo in the ChangeLog of the patch as well.
Comment 13 Ulrich Drepper 2008-09-01 22:06:25 UTC
(In reply to comment #10)
> 1.) Export a boolean symbol for runtime querying of the FPSCR feature
> availability, e.g. something like __ppc_feature_dfp_available.

We never export variables.  Always export functions.  And is this case it should
be a function to do the whole work and not just to query which instruction to use.


> 2.) In __libc_start_main search the auxv hwcap for the specific feature, i.e.
> PPC_FEATURE_HAS_DFP, and set the symbol __ppc_feature_dfp_available.

This information only has to be passed from ld.so into libm or libc (see above).
 The data can be strored in rtld_globcal_ro.
Comment 14 Ryan S. Arnold 2008-09-04 15:41:50 UTC
Created attachment 2938 [details]
enable 64-bit FPSCR but don't touch fpu_control.h

Hi Ulrich,

After talking with Steve Munroe we're concerned with how previous users may be
using the control word vs. how they should be.	They should only be relying on
bits that have masks provided by fpu_control.h but our feeling is that they're
probably relying on a 1:1 correspondence with the bits in the 32-bit FPSCR.

With this in mind I think we should probably leave the _FPU_[GS]ETCW macros as
they currently are and direct users to get the high 32-bits of the 64-bit FPSCR
in another way.  This way we don't break any existing applications which
stupidly rely on 1:1 mapping of FPSCR to fpu_control_t.

The rest of the patch still stands. I've respun the patch to remove
modification of fpu_control.h which means I can eliminate the new tests as
well.

In my standalone DFP library I'll create a wrapper around fpu_control.h that
does an include_next after the following check:

#ifdef #ifdef __STDC_WANT_DEC_FP__
# warning "Please use the fe_dec_getround() and fe_dec_setround() functions to
query the decimal rounding mode from the FPSCR."
#endif

This would then cover the bases for soft-decimal rounding mode as well.
Comment 15 Ryan S. Arnold 2008-09-12 21:58:23 UTC
Carlos Eduardo Seo is going submitting his VSX ucontext changes soon.  I'm going
to regenerate a patch set predicated on that patch set.  In my new patches I'm
going to have RESTORE_FPSCR dynamically check the hwcap for DFP support and use
the 32-bit or 64-bit FPSCR restore insn dynamically.
Comment 16 Ryan S. Arnold 2008-10-17 23:55:10 UTC
Created attachment 3005 [details]
Dynamically select correct mtfsf insn based on PPC_FEATURE_HAS_DFP

This patch will dynamically select the corerct mtfsf insn based on
PPC_FEATURE_HAS_DFP.
Comment 17 Ryan S. Arnold 2008-11-11 14:51:45 UTC
Ping.

I believe that this patch is ready for review and check-in.
Comment 18 Ryan S. Arnold 2008-11-13 21:06:25 UTC
Created attachment 3061 [details]
Respin of the patch with a test case verifying correct behavior.

This patch has an added test case and a hwcap accessor fix.

The test case verifies the behavior of [set|get|swap]context in the following
scenarios:

power5 -m32 --enable-kernel=2.4.21 [no-altivec, 32-bit FPSCR]
power5 -m64 --enable-kernel=2.4.21 [no-altivec, 32-bit FPSCR]
power5 -m32 --enable-kernel=2.6.16 [no-altivec, 32-bit FPSCR]
power5 -m64 --enable-kernel=2.6.16 [no-altivec, 32-bit FPSCR]

power6 -m32 --enable-kernel=2.4.21 [altivec, 64-bit FPSCR]
power6 -m64 --enable-kernel=2.4.21 [altivec, 64-bit FPSCR]
power6 -m32 --enable-kernel=2.6.16 [altivec, 64-bit FPSCR]
power6 -m64 --enable-kernel=2.6.16 [altivec, 64-bit FPSCR]


power6 -mcpu=power6 -m32 --enable-kernel=2.4.21 [altivec, 64-bit FPSCR]
power6 -mcpu=power6 -m64 --enable-kernel=2.4.21 [altivec, 64-bit FPSCR]
power6 -mcpu=power6 -m32 --enable-kernel=2.6.16 [altivec, 64-bit FPSCR]
power6 -mcpu=power6 -m64 --enable-kernel=2.6.16 [altivec, 64-bit FPSCR]

So I've verified with the test case that the given scenarios pass.

2.4.21 is prior to the swapcontext syscall so it tests the
[get|set|swap]context-common code path.

2.6.16 contains the swapcontext syscall so it test the syscall path for said
function invocations.

The -mcpu=power6 switch indicates that this GLIBC is built for ISA 2.05
compliance at a minimum, whereby a dynamic check for the 64-bit FPSCR doesn't
need to be made since ISA 2.05 and later have a 64-bit FPSCR by default.

This testcase can be added to in the future for testing any special register
preservation across context calls.

Please consider for check-in.  Everything is behaving properly as verified by
the patch and the build passes make check.
Comment 19 Ulrich Drepper 2008-11-17 02:50:25 UTC
I've applied the last patch.