[PATCH v2 03/10] i386: Use generic fmod

Adhemerval Zanella Netto adhemerval.zanella@linaro.org
Thu Mar 28 18:22:08 GMT 2024



On 28/03/24 13:00, H.J. Lu wrote:
> On Thu, Mar 28, 2024 at 8:57 AM Adhemerval Zanella Netto
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>>
>> On 28/03/24 12:55, H.J. Lu wrote:
>>> On Thu, Mar 28, 2024 at 8:48 AM Adhemerval Zanella Netto
>>> <adhemerval.zanella@linaro.org> wrote:
>>>>
>>>>
>>>>
>>>> On 28/03/24 12:42, H.J. Lu wrote:
>>>>> On Thu, Mar 28, 2024 at 8:14 AM Adhemerval Zanella Netto
>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 28/03/24 11:51, H.J. Lu wrote:
>>>>>>> On Thu, Mar 28, 2024 at 7:11 AM Adhemerval Zanella Netto
>>>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 27/03/24 18:38, H.J. Lu wrote:
>>>>>>>>> On Wed, Mar 27, 2024 at 1:37 PM Adhemerval Zanella Netto
>>>>>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 27/03/24 16:55, H.J. Lu wrote:
>>>>>>>>>>> On Wed, Mar 27, 2024 at 12:40 PM Adhemerval Zanella
>>>>>>>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> The benchtest results show a slight improvement (Ryzen 5900, gcc
>>>>>>>>>>>> 13.2.1):
>>>>>>>>>>>>
>>>>>>>>>>>> * sysdeps/i386/fpu/e_fmod.S:
>>>>>>>>>>>>   "fmod": {
>>>>>>>>>>>>    "subnormals": {
>>>>>>>>>>>>     "duration": 3.68855e+09,
>>>>>>>>>>>>     "iterations": 2.12608e+08,
>>>>>>>>>>>>     "max": 62.012,
>>>>>>>>>>>>     "min": 16.798,
>>>>>>>>>>>>     "mean": 17.349
>>>>>>>>>>>>    },
>>>>>>>>>>>>    "normal": {
>>>>>>>>>>>>     "duration": 3.88459e+09,
>>>>>>>>>>>>     "iterations": 7.168e+06,
>>>>>>>>>>>>     "max": 2879.12,
>>>>>>>>>>>>     "min": 16.909,
>>>>>>>>>>>>     "mean": 541.934
>>>>>>>>>>>>    },
>>>>>>>>>>>>    "close-exponents": {
>>>>>>>>>>>>     "duration": 3.692e+09,
>>>>>>>>>>>>     "iterations": 1.96608e+08,
>>>>>>>>>>>>     "max": 66.452,
>>>>>>>>>>>>     "min": 16.835,
>>>>>>>>>>>>     "mean": 18.7785
>>>>>>>>>>>>    }
>>>>>>>>>>>>   }
>>>>>>>>>>>>
>>>>>>>>>>>> * generic
>>>>>>>>>>>>   "fmod": {
>>>>>>>>>>>>    "subnormals": {
>>>>>>>>>>>>     "duration": 3.68645e+09,
>>>>>>>>>>>>     "iterations": 2.2848e+08,
>>>>>>>>>>>>     "max": 66.896,
>>>>>>>>>>>>     "min": 15.91,
>>>>>>>>>>>>     "mean": 16.1347
>>>>>>>>>>>>    },
>>>>>>>>>>>>    "normal": {
>>>>>>>>>>>>     "duration": 4.1455e+09,
>>>>>>>>>>>>     "iterations": 8.192e+06,
>>>>>>>>>>>>     "max": 3376.18,
>>>>>>>>>>>>     "min": 15.873,
>>>>>>>>>>>>     "mean": 506.043
>>>>>>>>>>>>    },
>>>>>>>>>>>>    "close-exponents": {
>>>>>>>>>>>>     "duration": 3.70197e+09,
>>>>>>>>>>>>     "iterations": 2.08896e+08,
>>>>>>>>>>>>     "max": 69.597,
>>>>>>>>>>>>     "min": 15.947,
>>>>>>>>>>>>     "mean": 17.7216
>>>>>>>>>>>>    }
>>>>>>>>>>>>   }
>>>>>>>>>>>> ---
>>>>>>>>>>>>  sysdeps/i386/fpu/Versions                 |  4 ++++
>>>>>>>>>>>>  sysdeps/i386/fpu/e_fmod.S                 | 18 ------------------
>>>>>>>>>>>>  sysdeps/i386/fpu/e_fmod.c                 |  2 ++
>>>>>>>>>>>>  sysdeps/i386/fpu/math_err.c               |  1 -
>>>>>>>>>>>>  sysdeps/i386/fpu/w_fmod_compat.c          | 15 ---------------
>>>>>>>>>>>>  sysdeps/ieee754/dbl-64/e_fmod.c           |  5 ++++-
>>>>>>>>>>>>  sysdeps/mach/hurd/i386/libm.abilist       |  1 +
>>>>>>>>>>>>  sysdeps/unix/sysv/linux/i386/libm.abilist |  1 +
>>>>>>>>>>>>  8 files changed, 12 insertions(+), 35 deletions(-)
>>>>>>>>>>>>  delete mode 100644 sysdeps/i386/fpu/e_fmod.S
>>>>>>>>>>>>  create mode 100644 sysdeps/i386/fpu/e_fmod.c
>>>>>>>>>>>>  delete mode 100644 sysdeps/i386/fpu/math_err.c
>>>>>>>>>>>>  delete mode 100644 sysdeps/i386/fpu/w_fmod_compat.c
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/sysdeps/i386/fpu/Versions b/sysdeps/i386/fpu/Versions
>>>>>>>>>>>> index a2eec371f1..d37bc1eae6 100644
>>>>>>>>>>>> --- a/sysdeps/i386/fpu/Versions
>>>>>>>>>>>> +++ b/sysdeps/i386/fpu/Versions
>>>>>>>>>>>> @@ -3,4 +3,8 @@ libm {
>>>>>>>>>>>>      # functions used in inline functions or macros
>>>>>>>>>>>>      __expl; __expm1l;
>>>>>>>>>>>>    }
>>>>>>>>>>>> +  GLIBC_2.40 {
>>>>>>>>>>>> +    # No SVID compatible error handling.
>>>>>>>>>>>> +    fmod;
>>>>>>>>>>>> +  }
>>>>>>>>>>>
>>>>>>>>>>> This changes the ABI.  I assume that it fixes a real bug.   Is there a bug
>>>>>>>>>>> report open for this?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The new version is the way to provide the system without the SVID compat
>>>>>>>>>> support, which we did for all ABIs but i386 in 2.38. For instance:
>>>>>>>>>>
>>>>>>>>>> find . -iname libm.abilist | xargs grep -w fmod
>>>>>>>>>> ./sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist:GLIBC_2.0 fmod F
>>>>>>>>>> ./sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist:GLIBC_2.38 fmod F
>>>>>>>>>> [...]
>>>>>>>>>>
>>>>>>>>>> For i386 specifically, the old SVID symbol will be kept as fmod@GLIBC_2.0.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Does it fix a run-time test which fails without the fix?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Not really, but it is one less assembly implementation in favor of a generic one
>>>>>>>> (which also shows a slight improvement on recent chips) and it syncs i386
>>>>>>>> with the generic code (so fewer possible issues, such as the static lib in this
>>>>>>>> patchset).
>>>>>>>
>>>>>>> Why do we need a new symbol?
>>>>>>
>>>>>> Because the new fmod@GLIBC_2.40 for i386 won't have the SVID handling,
>>>>>> similar to what has been done for other architectures with
>>>>>> 16439f419b270184ec501c531bf20d83b6745fb0;
>>>>>
>>>>> Does it change i386 fmod behavior? If yes, we need a testcase to verify it.
>>>>> If not, why is it needed?
>>>>>
>>>>
>>>> It is not strictly required, but it leaves i386 with one less assembly optimization
>>>> that does not follow the rest of the code, and it is slightly faster. Since
>>>> we do actually have a check for SVID, the default math tests already check the
>>>> required symbol semantics.
>>>
>>> fmod@GLIBC_2.40 is added because of the SVID handling.  But there is no
>>> user visible behavior change.  Is this correct?
>>
>> The user-visible change is the missing SVID handling (which I think no one actually
>> uses).  That's the main reason we need the compat dance and this extra complexity.
>> Maybe one day we can just drop this for good...
> 
> If we want to provide the SVID compatibility, 2 testcases are needed:
> 
> 1.  A testcase to show that the new implementation is incompatible with SVID.
> 2.  A testcase to show that the compat symbol provides the SVID compatibility.

We don't really have SVID compatibility tests for any other optimization/simplification,
and although I don't really oppose adding them, I also think that this makes the
change even more complicated than it needs to be.

I can drop the i386 changes to use generic implementations if you think it would
simplify this patchset.



More information about the Libc-alpha mailing list