This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] S390: Refactor ifunc handling
On 05/12/2018 14:15, Stefan Liebler wrote:
> On 12/03/2018 06:30 PM, Adhemerval Zanella wrote:
>>
>>
>> On 03/12/2018 06:48, Stefan Liebler wrote:
>>> On 11/30/2018 07:31 PM, Adhemerval Zanella wrote:
>>>>
>>>>
>>>> On 30/11/2018 13:57, Stefan Liebler wrote:
>>>>> This patch series is mainly refactoring the s390 specific ifunc handling.
>>>>> The idea is to omit ifunc variants or ifunc at all if the used compile
>>>>> options are already building for newer cpus. The glibc internal calls
>>>>> and ld.so will then use "newer" ifunc variants as before.
>>>>>
>>>>> In case of the memcpy, memset and memcmp functions, the newest ifunc variant
>>>>> is for z196 and there are two further ones for older cpus, but the current usual
>>>>> compile options are e.g. building for zEC12.
>>>>> In case of the string / wcsmbs functions, there are variants for z13 and
>>>>> "before z13". After switching to z13 as default cpu level, there won't
>>>>> be IFUNC symbols in s390 libc.so.
>>>>>
>>>>> Furthermore new z13 specific ifunc variants are introduced for
>>>>> memmove, strstr and memmem.
>>>>>
>>>>> Some functions like the mem* functions are duplicated for 31 and 64 bit.
>>>>> In fact they are nearly the same. Thus those implementations are now unified
>>>>> and adjusted in order to be usable for 31 and 64bit.
>>>>>
>>>>> I've built and tested these patches with different -march levels
>>>>> and with / without multiarch and checked the symbols with readelf - e.g. if
>>>>> IFUNC is used or not and if the __GI_ symbols are targeting the correct
>>>>> ifunc variant.
>>>>>
>>>>> If no one objects, I plan to commit this series in the next one or two weeks.
>>>>>
>>>>
>>>> The only issue I have for this change is it would require another
>>>> ABI variant for testing and validation, which would require more
>>>> coverage from build-many-glibcs.py (similar to armv7 for instance).
>>> The ABI itself is not changed, but you are right: tests could be done for e.g. -march=zEC12 and -march=z13
>>
>> I meant 'variant' in the sense it would require multiple glibc builds to
>> actually validate a change in s390 implementations. If I am reading it
>> correctly, on memcpy, for instance, we might have:
>>
>> 1. HAVE_MEMCPY_IFUNC
>> 1.1. HAVE_S390_MIN_Z196_ZARCH_ASM_SUPPORT
>> 1.2. HAVE_S390_MIN_Z10_ZARCH_ASM_SUPPORT
>> 1.3. Z900
>> 2. !HAVE_MEMCPY_IFUNC
>> 3. Disabled multi-arch
>>
>> So it is basically 4 different glibc builds one will need to actually
>> test and check to fully validate s390.
>>
> It depends on the change. If you change the implementation of the ifunc variants and build with the oldest march level, then due to ifunc-impl-list the tests will run all ifunc variants.
You validate the ifunc implementation itself, but you also need to check
that it builds on all the variants (even when you are sure that the ifunc
variant itself works). That's what I am trying to avoid: having multiple
possible glibc builds depending on how you configure the compiler.
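
To illustrate the concern, here is a minimal sketch of how a compile-time
minimum architecture level changes what ends up in the build. DEMO_MIN_ARCH
and DEMO_HAVE_MEMCPY_IFUNC are hypothetical stand-ins, not the macros from
the patch series; they only mirror the idea that each level yields a
different set of compiled variants, each of which has to build cleanly.

```c
#include <stddef.h>

/* Hypothetical knob standing in for the compiler's -march level:
   0 = z900, 1 = z10, 2 = z196.  Not a real glibc macro.  */
#ifndef DEMO_MIN_ARCH
# define DEMO_MIN_ARCH 0
#endif

/* Hypothetical analogue of HAVE_MEMCPY_IFUNC: once the minimum level
   already is the newest variant, no runtime selection is needed.  */
#if DEMO_MIN_ARCH >= 2
# define DEMO_HAVE_MEMCPY_IFUNC 0
#else
# define DEMO_HAVE_MEMCPY_IFUNC 1
#endif

/* Only variants at or above the minimum level are compiled in.  */
static const char *const demo_built_variants[] = {
#if DEMO_MIN_ARCH <= 0
  "z900",
#endif
#if DEMO_MIN_ARCH <= 1
  "z10",
#endif
  "z196",
};

/* Number of memcpy variants present in this particular build.  */
size_t
demo_variant_count (void)
{
  return sizeof demo_built_variants / sizeof demo_built_variants[0];
}

/* Nonzero if this build would still need an ifunc for memcpy.  */
int
demo_uses_ifunc (void)
{
  return DEMO_HAVE_MEMCPY_IFUNC;
}
```

Compiling this with -DDEMO_MIN_ARCH=0, 1 or 2 gives three different
translation units from the same source, which is the build matrix I would
like to keep small.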
>
> The patchset also does some cleanup and unifies the 31 vs 64bit implementations from the s390-32 and s390-64 folders. This simplifies maintenance.
I do agree with this kind of change.
>>>
>>>>
>>>> The gains I see are a slight reduction in loading time (due to no ifunc)
>>>> and smaller code size. Is that what is driving this change? Is
>>>> it worth the extra testing and validation it might require?
>>>>
>>> The current s390_vx_libc_ifunc macros are not flexible and do not allow future enhancements as they only use the *_c or *_vx variant.
>>
>> Right, this is one issue which is not really tied to setting up different
>> build variants depending on the minimum ISA level.
>>
>>>
>>> As soon as future distros will use z13 as default architecture,
>>> vector instructions can be used by default. But the current __GI_* functions won't use the vector variants. Instead the fallback is always used for internal calls and the vector variants are always used for external calls via IFUNC.
>>
>> Some ABIs do impose restrictions on ifunc with internal hidden symbols
>> (powerpc32 and i686 if I recall correctly), is it the case for s390
>> and/or s390x?
> On s390 the compiler will generate e.g. a PC32DBL relocation instead of a PLT32DBL one and the GOT pointer might not be set up in r12. But the called PLT stub relies on r12 being set up.
>
> And on older binutils (unfortunately I'm not sure if it was version >= or < 2.25) bugs regarding ifunc lead to a segmentation fault while linking libc.so.
Do you consider this a deal-breaking issue? One option is to check the
minimum supported binutils for this change and set it as the minimum
required one; another is to disable internal ifunc for such ABIs (as is
done for powerpc32, while powerpc64 does work).
>>
>> At least for some ABIs, we can still route some internal symbols through
>> ifunc. On some x86 ifunc selector implementation the comments state it
>> might show no performance gain, but I am not sure about the validity of
>> the claims. So if s390 or s390x does not have any impeding reason, a
>> possibility might be to use ifunc internally instead of relying on compiler
>> default or used options to set up the minimum ABI.
>>
> If I start routing those symbols via IFUNC then an additional PLT stub is introduced in libc.so. If we are running on >=z13 then the __XYZ_vx is called via PLT. On machines <z13 __XYZ_c is also called via PLT. Compared to non-ifunc __GI_ symbols we have introduced the extra PLT stub overhead without any advantage on <z13 machines as the __XYZ_c implementation is called.
Do we have real use cases that stress libc symbols doing intra-symbol
calls and where the PLT overhead is dominant?
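
To make the two options concrete, here is a minimal sketch in portable C.
It emulates the ifunc resolver with a plain function pointer rather than a
real STT_GNU_IFUNC symbol, and all demo_* names as well as DEMO_MIN_Z13 are
hypothetical, not taken from glibc. The external entry point pays for an
indirect call (the analogue of the PLT stub), while the internal caller is
bound directly to one variant when the library is built.

```c
#include <stddef.h>

/* Two hypothetical implementations of the same routine.  In glibc these
   would correspond to something like __XYZ_c and __XYZ_vx.  */
static size_t
demo_strlen_c (const char *s)
{
  size_t n = 0;
  while (s[n] != '\0')
    n++;
  return n;
}

static size_t
demo_strlen_vx (const char *s)
{
  const char *p = s;
  while (*p != '\0')
    p++;
  return (size_t) (p - s);
}

/* Stand-in for a hardware capability check (assumption: no vector unit).  */
static int demo_cpu_has_vx = 0;

typedef size_t (*demo_strlen_fn) (const char *);

/* Plays the role of the ifunc resolver: picks a variant at run time.  */
demo_strlen_fn
demo_strlen_resolver (void)
{
  return demo_cpu_has_vx ? demo_strlen_vx : demo_strlen_c;
}

/* External callers: resolved once, then always called indirectly.  */
size_t
demo_strlen_external (const char *s)
{
  static demo_strlen_fn resolved;
  if (resolved == NULL)
    resolved = demo_strlen_resolver ();
  return resolved (s);
}

/* Internal callers: bound at build time, like a __GI_ alias.  */
#ifdef DEMO_MIN_Z13
# define demo_strlen_internal demo_strlen_vx
#else
# define demo_strlen_internal demo_strlen_c
#endif

size_t
demo_internal_caller (const char *s)
{
  return demo_strlen_internal (s);
}
```

Both paths return the same result; the only difference is the extra
indirection on the external one, which is the overhead I am asking about.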
>
>> So the question is whether the possible internal symbol call optimization
>> is worth the added complexity for build and validation.
>>
> E.g. for strstr there is an advantage. On s390 we are jumping from the common-code implementations to the vector implementations. On other archs we would jump from one vector implementation to another vector implementation and you perhaps don't see so much difference.
>
One option may be to trade some code size for a specific strstr
variant for z13, which calls strstr/strnlen/memcpy locally. Especially
for strstr, is the difference really worth adding this build and
validation complexity?
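
A rough sketch of what I mean by calling the helpers locally is below. It
is plain C with a naive search, not the z13 vector code, and for brevity
the local helpers are strlen and memcmp equivalents rather than strnlen
and memcpy. The name demo_strstr_z13 and both helpers are hypothetical.
The point is only that the variant carries its own static copies, so it
never depends on which implementation the internal symbols resolve to.

```c
#include <stddef.h>

/* Local helper: private to this variant, so no PLT or ifunc is involved.  */
static size_t
local_strlen (const char *s)
{
  size_t n = 0;
  while (s[n] != '\0')
    n++;
  return n;
}

/* Local helper: byte-wise comparison, returns 0 when the ranges match.  */
static int
local_memcmp (const char *a, const char *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (a[i] != b[i])
      return (unsigned char) a[i] < (unsigned char) b[i] ? -1 : 1;
  return 0;
}

/* Hypothetical strstr variant that only uses the local helpers above.  */
const char *
demo_strstr_z13 (const char *haystack, const char *needle)
{
  size_t nlen = local_strlen (needle);
  if (nlen == 0)
    return haystack;

  size_t hlen = local_strlen (haystack);
  for (size_t i = 0; i + nlen <= hlen; i++)
    if (local_memcmp (haystack + i, needle, nlen) == 0)
      return haystack + i;

  return NULL;
}
```

The cost is the duplicated helper code inside the variant, which is the
code size trade-off mentioned above.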