[PATCH v2 03/10] i386: Use generic fmod
Adhemerval Zanella Netto
adhemerval.zanella@linaro.org
Thu Mar 28 18:22:08 GMT 2024
On 28/03/24 13:00, H.J. Lu wrote:
> On Thu, Mar 28, 2024 at 8:57 AM Adhemerval Zanella Netto
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>>
>> On 28/03/24 12:55, H.J. Lu wrote:
>>> On Thu, Mar 28, 2024 at 8:48 AM Adhemerval Zanella Netto
>>> <adhemerval.zanella@linaro.org> wrote:
>>>>
>>>>
>>>>
>>>> On 28/03/24 12:42, H.J. Lu wrote:
>>>>> On Thu, Mar 28, 2024 at 8:14 AM Adhemerval Zanella Netto
>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 28/03/24 11:51, H.J. Lu wrote:
>>>>>>> On Thu, Mar 28, 2024 at 7:11 AM Adhemerval Zanella Netto
>>>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 27/03/24 18:38, H.J. Lu wrote:
>>>>>>>>> On Wed, Mar 27, 2024 at 1:37 PM Adhemerval Zanella Netto
>>>>>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 27/03/24 16:55, H.J. Lu wrote:
>>>>>>>>>>> On Wed, Mar 27, 2024 at 12:40 PM Adhemerval Zanella
>>>>>>>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> The benchtest results show a slight improvement (Ryzen 5900, gcc
>>>>>>>>>>>> 13.2.1):
>>>>>>>>>>>>
>>>>>>>>>>>> * sysdeps/i386/fpu/e_fmod.S:
>>>>>>>>>>>> "fmod": {
>>>>>>>>>>>> "subnormals": {
>>>>>>>>>>>> "duration": 3.68855e+09,
>>>>>>>>>>>> "iterations": 2.12608e+08,
>>>>>>>>>>>> "max": 62.012,
>>>>>>>>>>>> "min": 16.798,
>>>>>>>>>>>> "mean": 17.349
>>>>>>>>>>>> },
>>>>>>>>>>>> "normal": {
>>>>>>>>>>>> "duration": 3.88459e+09,
>>>>>>>>>>>> "iterations": 7.168e+06,
>>>>>>>>>>>> "max": 2879.12,
>>>>>>>>>>>> "min": 16.909,
>>>>>>>>>>>> "mean": 541.934
>>>>>>>>>>>> },
>>>>>>>>>>>> "close-exponents": {
>>>>>>>>>>>> "duration": 3.692e+09,
>>>>>>>>>>>> "iterations": 1.96608e+08,
>>>>>>>>>>>> "max": 66.452,
>>>>>>>>>>>> "min": 16.835,
>>>>>>>>>>>> "mean": 18.7785
>>>>>>>>>>>> }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> * generic
>>>>>>>>>>>> "fmod": {
>>>>>>>>>>>> "subnormals": {
>>>>>>>>>>>> "duration": 3.68645e+09,
>>>>>>>>>>>> "iterations": 2.2848e+08,
>>>>>>>>>>>> "max": 66.896,
>>>>>>>>>>>> "min": 15.91,
>>>>>>>>>>>> "mean": 16.1347
>>>>>>>>>>>> },
>>>>>>>>>>>> "normal": {
>>>>>>>>>>>> "duration": 4.1455e+09,
>>>>>>>>>>>> "iterations": 8.192e+06,
>>>>>>>>>>>> "max": 3376.18,
>>>>>>>>>>>> "min": 15.873,
>>>>>>>>>>>> "mean": 506.043
>>>>>>>>>>>> },
>>>>>>>>>>>> "close-exponents": {
>>>>>>>>>>>> "duration": 3.70197e+09,
>>>>>>>>>>>> "iterations": 2.08896e+08,
>>>>>>>>>>>> "max": 69.597,
>>>>>>>>>>>> "min": 15.947,
>>>>>>>>>>>> "mean": 17.7216
>>>>>>>>>>>> }
>>>>>>>>>>>> }
>>>>>>>>>>>> ---
>>>>>>>>>>>> sysdeps/i386/fpu/Versions | 4 ++++
>>>>>>>>>>>> sysdeps/i386/fpu/e_fmod.S | 18 ------------------
>>>>>>>>>>>> sysdeps/i386/fpu/e_fmod.c | 2 ++
>>>>>>>>>>>> sysdeps/i386/fpu/math_err.c | 1 -
>>>>>>>>>>>> sysdeps/i386/fpu/w_fmod_compat.c | 15 ---------------
>>>>>>>>>>>> sysdeps/ieee754/dbl-64/e_fmod.c | 5 ++++-
>>>>>>>>>>>> sysdeps/mach/hurd/i386/libm.abilist | 1 +
>>>>>>>>>>>> sysdeps/unix/sysv/linux/i386/libm.abilist | 1 +
>>>>>>>>>>>> 8 files changed, 12 insertions(+), 35 deletions(-)
>>>>>>>>>>>> delete mode 100644 sysdeps/i386/fpu/e_fmod.S
>>>>>>>>>>>> create mode 100644 sysdeps/i386/fpu/e_fmod.c
>>>>>>>>>>>> delete mode 100644 sysdeps/i386/fpu/math_err.c
>>>>>>>>>>>> delete mode 100644 sysdeps/i386/fpu/w_fmod_compat.c
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/sysdeps/i386/fpu/Versions b/sysdeps/i386/fpu/Versions
>>>>>>>>>>>> index a2eec371f1..d37bc1eae6 100644
>>>>>>>>>>>> --- a/sysdeps/i386/fpu/Versions
>>>>>>>>>>>> +++ b/sysdeps/i386/fpu/Versions
>>>>>>>>>>>> @@ -3,4 +3,8 @@ libm {
>>>>>>>>>>>> # functions used in inline functions or macros
>>>>>>>>>>>> __expl; __expm1l;
>>>>>>>>>>>> }
>>>>>>>>>>>> + GLIBC_2.40 {
>>>>>>>>>>>> + # No SVID compatible error handling.
>>>>>>>>>>>> + fmod;
>>>>>>>>>>>> + }
>>>>>>>>>>>
>>>>>>>>>>> This changes the ABI. I assume that it fixes a real bug. Is there a bug
>>>>>>>>>>> report open for this?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The new version is the way to provide the symbol without the SVID compat
>>>>>>>>>> support, which we did for all ABIs but i386 in 2.38. For instance:
>>>>>>>>>>
>>>>>>>>>> find . -iname libm.abilist | xargs grep -w fmod
>>>>>>>>>> ./sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist:GLIBC_2.0 fmod F
>>>>>>>>>> ./sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist:GLIBC_2.38 fmod F
>>>>>>>>>> [...]
>>>>>>>>>>
>>>>>>>>>> For i386 specifically, the old SVID symbol will be kept as fmod@GLIBC_2.0.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Does it fix a run-time test which fails without the fix?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Not really, but it is one less assembly implementation in favor of a generic
>>>>>>>> one (which also shows a slight improvement on recent chips) and it syncs i386
>>>>>>>> with generic code (so fewer possible issues, such as the static lib in this
>>>>>>>> patchset).
>>>>>>>
>>>>>>> Why do we need a new symbol?
>>>>>>
>>>>>> Because the new fmod@GLIBC_2.40 for i386 won't have the SVID handling,
>>>>>> similar to what has been done for other architectures with
>>>>>> 16439f419b270184ec501c531bf20d83b6745fb0.
>>>>>
>>>>> Does it change i386 fmod behavior? If yes, we need a testcase to verify it.
>>>>> If not, why is it needed?
>>>>>
>>>>
>>>> It is not strictly required, but it leaves i386 with one less assembly
>>>> optimization that does not follow the rest of the code, and it optimizes it
>>>> slightly. Since we do actually have a check for SVID, the default math tests
>>>> already check the required symbol semantics.
>>>
>>> fmod@GLIBC_2.40 is added because of the SVID handling. But there is no
>>> user visible behavior change. Is this correct?
>>
>> The user-visible change is the missing SVID handling (which I think no one
>> actually uses). That's the main reason we need the compat dance and this extra
>> complexity. Maybe one day we can just drop this for good...
>
> If we want to provide the SVID compatibility, 2 testcases are needed:
>
> 1. A testcase to show that the new implementation is incompatible with SVID.
> 2. A testcase to show that the compat symbol provides the SVID compatibility.

We don't really have SVID compatibility tests for any other
optimization/simplification, and although I am not opposed to adding them, I
also think this makes the change more complicated than it needs to be.

I can drop the i386 changes to use generic implementations if you think it would
simplify this patchset.