The included patch provides the following changes to extend the FPSCR (floating point status and control register) to 64-bits for PowerPC POWER6 and POWER6x: o For PowerPC in general it redefines fpu_control_t from an unsigned long int to an unsigned long long int so that the entire 64-bit FPSCR on POWER6 can be saved/restored. On non-POWER6 hardware the high-order 32-bits are simply discarded/ignored for all operations. It is necessary to change it for all PowerPC because the loader is built as the default PowerPC arch (i.e. non-POWER6) and uses the rtld_global_ro struct which contains an fpu_control_t and therefore must be able to load a libc built for POWER6 or non-POWER6 (which also uses the rtld_global_ro struct). Any difference in size of fpu_control_t between the loader and libc will cause libc to access rtld_global_ro struct members at the wrong offsets from where the structure was populated by the loader. The impact is minimal since currently the fpu_control_t is initially populated from a double anyway and the fpu_control_t type is not used dynamically in any external function and thus, no symbol versioning is required. o Provides a conditional implementation of _FPU_GETCW and _FPU_SETCW which operate on either the entire 64-bit FPSCR on POWER6[x] or simply the low-order 32-bits of the fpu_control_t on non-POWER6[x]. o The fegetenv() function will exhibit an undefined high-order word in the fenv_t. Likewise we leave the high-order 32-bits 'undefined' in the unsigned long long int fpu_control_t. o Provides granular control over the high order 32 bits of the FPSCR that aren't reserved on POWER6[x]. o Provides an overridden sysdeps/powerpc/math/test-fpucw that verifies that the high-order word is saved to the FPSCR and restored using the _FPU_[GET|SET]CW macros. It also verifies that there are no deleterious effects on non-POWER6 systems. o Provides conditional definitions of the PowerPC fenv helper macros (fesetenv_register & fegetenv_register) which are used by the fenv functions to set/get the contents of the FPSCR. o Provides sysdeps/powerpc/math/test-powerpc-fenv which tests the high-order word save/restore using the fe[set|get]env_register functions. o Replaces the FPSCR restore with a macro which will provide a 64-bit restore on POWER6 for swapcontext and setcontext. o Replaces some of the magic number fenv masks in the fenv functions with a define that better describes what the magic number means. I successfully executed build and runtime tests (make check) on POWER5 (32-bit FPSCR) as well as POWER6 (64-bit FPSCR).
Created attachment 2698 [details] 64-bit FPSCR enablement patch for Power6.
Created attachment 2893 [details] 64-bit FPSCR enablement patch that applies against CVS head 20080804 This patch will apply atop cvs-head.
You're changing an exported type. That's not possible. You have to find another way. From what I have seen there are only a few bits of the fpu_control_t used. There is no reason why this type must exactly correspond to the values you get from the control register. If and when more than 32 *useful* bits are used in the word you have to extend the type and break the ABI. But this shouldn't be necessary today. On the formatting: indent all the preprocessor directives correctly (i.e., add spaces after the initial #).
(In reply to comment #3) > You're changing an exported type. That's not possible. You have to find > another way. From what I have seen there are only a few bits of the > fpu_control_t used. There is no reason why this type must exactly correspond to > the values you get from the control register. I understand your argument but I can't find any published information which would indicate that there is anything but a 1:1 correspondence between the fpu_control_t and the FPSCR's bits. Is the intention of the fpu_control_t to only allow users to get (and in some cases set) the values that correspond to the published masks, i.e. just rounding control masks and some exception status? If this is the case then I'll just use some bits that correspond to the 32-bit FPSCR's exception enable bits that are marked 'reserved' in fpu_control.h for the DRN (Decimal Rounding control bits). > On the formatting: indent all the preprocessor directives correctly (i.e., add > spaces after the initial #). OK. Thanks for the review.
(In reply to comment #4) > I understand your argument but I can't find any published information which > would indicate that there is anything but a 1:1 correspondence between the > fpu_control_t and the FPSCR's bits. That's a non-standard header, there is no documentation. But it's an exported header. Nobody is supposed to use the values of this type without the functions/macros in this header. So there is no problem with using a different representation. > If this is the case then I'll just use some bits that correspond to the 32-bit > FPSCR's exception enable bits that are marked 'reserved' in fpu_control.h for > the DRN (Decimal Rounding control bits). That is the intend of my comment.
Created attachment 2909 [details] Patch which uses bits 28-31 of fpu_control_t rather than breaking ABI by expanding fpu_control_t to 64-bits The attached patch uses bits 28-31 of the fpu_control_t type for the DRN (really bit-31 is reserved for future DRN usage per ISA 2.05). The macros [sg]et the Decimal Rounding direction field to/from the FPSCR from/to the high 4-bits of the fpu_control_t. This patch doesn't break the ABI. The rest of the patch is the same as previous patches though I've now extended the POWER6 test-fpucw test to verify that the correct bits are showing up in the 64-bit FPSCR.
(In reply to comment #6) > Patch which uses bits 28-31 of fpu_control_t rather than breaking ABI by > expanding fpu_control_t to 64-bits Why is it necessary to have arch-dependent binaries for this? It's really a single instruction. Why not make this a dynamic decision based on the CPU in use? I told you before the formatting is off. Indent the preprocessor directives. Also, the attribution in the initial comment is Contributed by Name <email>. Furthermore, the ChangeLog always start capitalized and with a full stop. Finally, change the state from WAITING when you add a reply.
(In reply to comment #7) > Why is it necessary to have arch-dependent binaries for this? It's really a > single instruction. Why not make this a dynamic decision based on the CPU in use? Yes it is a single instruction where the new form takes an additional parameter to indicate the length of the FPSCR to operate upon. The content of the FPSCR has always been loaded/stored from a double, but until ISA 2.05 the actual FPSCR was only 32-bits. The _ARCH_PWR6_ define is used by GCC to indicate ISA 2.05 conformance. This will evaluate as 'true' on a POWER7 as well, and any future architecture that is ISA 2.05 conforming. The 64-bit FPSCR is not an optional part of ISA 2.05. We don't envision another change in the FPSCR's size for quite some time since there are 27 available reserve bits for future ISA additions. So it seems an unnecessary burden to make a dynamic check at runtime for something that won't change for a very long time. > I told you before the formatting is off. Indent the preprocessor directives. I had two different formats of indentation going on so I'm hoping that you mean: #ifdef FOO # define BAR # ifdef FAZ # #define BAZ # endif #endif > Also, the attribution in the initial comment is > > Contributed by Name <email>. Okay. > Furthermore, the ChangeLog always start capitalized and with a full stop. Yes. > Finally, change the state from WAITING when you add a reply. Okay, I'll document this on the wiki under bugzilla policies. I'll have a patch ready soon.
(In reply to comment #8) > So it seems an unnecessary burden to make a dynamic check at runtime for > something that won't change for a very long time. You miss the point. We have to get back to having only one DSO. We'll have some technologies to handle widely diverging functions but for cases like this, which are trivial to resolve, we should switch dynamically. > #ifdef FOO > # define BAR > # ifdef FAZ > # #define BAZ > # endif > #endif That's the correct format.
(In reply to comment #9) > You miss the point. We have to get back to having only one DSO. We'll have > some technologies to handle widely diverging functions but for cases like this, > which are trivial to resolve, we should switch dynamically. Point taken. Since there is no public interface to the auxv hwcap information I'll have to take the following steps in order to provide the dynamic feature check to a macro that is available to userspace applications. 1.) Export a boolean symbol for runtime querying of the FPSCR feature availability, e.g. something like __ppc_feature_dfp_available. 2.) In __libc_start_main search the auxv hwcap for the specific feature, i.e. PPC_FEATURE_HAS_DFP, and set the symbol __ppc_feature_dfp_available. 3.) In fpu_control.h __[SG]ET_FPUCW access the boolean symbol for feature checks. This way applications which #include fpu_control.h and use said macros are able gather the feature availability. The other methods for gathering the hwcap information from userspace applications are expensive at best (map and query /proc/self/auxv) and unreliable at worst (read off the end of the ENV variable array or have access to the applications argument variables). Implication: Publish feature availability permanently, i.e. ABI addition. Benefit: The expense of the feature check is mitigated to application startup. Alternative: Use the above mentioned method but rather than exporting the symbol allow the userspace application to access an 'attribute_hidden' symbol if that is possible. Ulrich what direction do you prefer? Have I missed another alternative?
Created attachment 2917 [details] Implements dynamic check for hwcap decimal feature to preserve the ABI. The attached patch implements a dynamic check for PPC_FEATURE_HAS_DFP in the _[GS]ET_FPUCW macros and only if it finds it will it operate on the 64-bit FPSCR. This method necessitates that some of the hwcap information be exposed. When libc is loaded it will query the hwcap for the dfp feature bit and set an exported variable. Users who include fpu_control and use _[GS]ET_FPUCW will have these macros inlined into their code and will therefore reference the libc provided symbol.
Created attachment 2918 [details] Same as previous patch, but with a typo fixed in the ChangeLog. (In reply to comment #11) > The attached patch implements a dynamic check for PPC_FEATURE_HAS_DFP in the > _[GS]ET_FPUCW macros and only if it finds it will it operate on the 64-bit > FPSCR. Whoops, I meant: _FPU_[GS]ETCW I fixed a typo in the ChangeLog of the patch as well.
(In reply to comment #10) > 1.) Export a boolean symbol for runtime querying of the FPSCR feature > availability, e.g. something like __ppc_feature_dfp_available. We never export variables. Always export functions. And is this case it should be a function to do the whole work and not just to query which instruction to use. > 2.) In __libc_start_main search the auxv hwcap for the specific feature, i.e. > PPC_FEATURE_HAS_DFP, and set the symbol __ppc_feature_dfp_available. This information only has to be passed from ld.so into libm or libc (see above). The data can be strored in rtld_globcal_ro.
Created attachment 2938 [details] enable 64-bit FPSCR but don't touch fpu_control.h Hi Ulrich, After talking with Steve Munroe we're concerned with how previous users may be using the control word vs. how they should be. They should only be relying on bits that have masks provided by fpu_control.h but our feeling is that they're probably relying on a 1:1 correspondence with the bits in the 32-bit FPSCR. With this in mind I think we should probably leave the _FPU_[GS]ETCW macros as they currently are and direct users to get the high 32-bits of the 64-bit FPSCR in another way. This way we don't break any existing applications which stupidly rely on 1:1 mapping of FPSCR to fpu_control_t. The rest of the patch still stands. I've respun the patch to remove modification of fpu_control.h which means I can eliminate the new tests as well. In my standalone DFP library I'll create a wrapper around fpu_control.h that does an include_next after the following check: #ifdef #ifdef __STDC_WANT_DEC_FP__ # warning "Please use the fe_dec_getround() and fe_dec_setround() functions to query the decimal rounding mode from the FPSCR." #endif This would then cover the bases for soft-decimal rounding mode as well.
Carlos Eduardo Seo is going submitting his VSX ucontext changes soon. I'm going to regenerate a patch set predicated on that patch set. In my new patches I'm going to have RESTORE_FPSCR dynamically check the hwcap for DFP support and use the 32-bit or 64-bit FPSCR restore insn dynamically.
Created attachment 3005 [details] Dynamically select correct mtfsf insn based on PPC_FEATURE_HAS_DFP This patch will dynamically select the corerct mtfsf insn based on PPC_FEATURE_HAS_DFP.
Ping. I believe that this patch is ready for review and check-in.
Created attachment 3061 [details] Respin of the patch with a test case verifying correct behavior. This patch has an added test case and a hwcap accessor fix. The test case verifies the behavior of [set|get|swap]context in the following scenarios: power5 -m32 --enable-kernel=2.4.21 [no-altivec, 32-bit FPSCR] power5 -m64 --enable-kernel=2.4.21 [no-altivec, 32-bit FPSCR] power5 -m32 --enable-kernel=2.6.16 [no-altivec, 32-bit FPSCR] power5 -m64 --enable-kernel=2.6.16 [no-altivec, 32-bit FPSCR] power6 -m32 --enable-kernel=2.4.21 [altivec, 64-bit FPSCR] power6 -m64 --enable-kernel=2.4.21 [altivec, 64-bit FPSCR] power6 -m32 --enable-kernel=2.6.16 [altivec, 64-bit FPSCR] power6 -m64 --enable-kernel=2.6.16 [altivec, 64-bit FPSCR] power6 -mcpu=power6 -m32 --enable-kernel=2.4.21 [altivec, 64-bit FPSCR] power6 -mcpu=power6 -m64 --enable-kernel=2.4.21 [altivec, 64-bit FPSCR] power6 -mcpu=power6 -m32 --enable-kernel=2.6.16 [altivec, 64-bit FPSCR] power6 -mcpu=power6 -m64 --enable-kernel=2.6.16 [altivec, 64-bit FPSCR] So I've verified with the test case that the given scenarios pass. 2.4.21 is prior to the swapcontext syscall so it tests the [get|set|swap]context-common code path. 2.6.16 contains the swapcontext syscall so it test the syscall path for said function invocations. The -mcpu=power6 switch indicates that this GLIBC is built for ISA 2.05 compliance at a minimum, whereby a dynamic check for the 64-bit FPSCR doesn't need to be made since ISA 2.05 and later have a 64-bit FPSCR by default. This testcase can be added to in the future for testing any special register preservation across context calls. Please consider for check-in. Everything is behaving properly as verified by the patch and the build passes make check.
I've applied the last patch.