This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] S390: Refactor ifunc handling


On 12/05/2018 08:32 PM, Adhemerval Zanella wrote:


On 05/12/2018 14:15, Stefan Liebler wrote:
On 12/03/2018 06:30 PM, Adhemerval Zanella wrote:


On 03/12/2018 06:48, Stefan Liebler wrote:
On 11/30/2018 07:31 PM, Adhemerval Zanella wrote:


On 30/11/2018 13:57, Stefan Liebler wrote:
This patch series is mainly refactoring the s390 specific ifunc handling.
The idea is to omit ifunc variants or ifunc at all if the used compile
options are already building for newer cpus. The glibc internal calls
and ld.so will then use "newer" ifunc variants as before.

In case of the memcpy, memset and memcmp functions, the newest ifunc variant
is for z196 and there are two further ones for older cpus, but the current usual
compile options are e.g. building for zEC12.
In case of the string / wcsmbs functions, there are variants for z13 and
"before z13".  After switching to z13 as default cpu level, there won't
be IFUNC symbols in s390 libc.so.

Furthermore new z13 specific ifunc variants are introduced for
memmove, strstr and memmem.

Some functions like the mem* functions are duplicated twice for 31 and 64 bit.
In fact they are nearly the same. Thus those implementations are now unified
and adjusted in order to be usable for 31 and 64bit.

I've build and tested these patches with different -march levels
and with / without multiarch and checked the symbols with readelf - e.g. if
IFUNC is used or not and if the __GI_ symbols are targeting the correct
ifunc variant.

If no one objects, I plan to commit this series in the next one or two weeks.


The only issue I have for this change is it would require another
ABI variant for testing and validation, which would require more
coverage from build-many-glibcs.py (similar to armv7 for instance).
The ABI itself is not changed, but you are right tests could be done for e.g. -march=zEC12 and -march=z13

I meant 'variant' in the sense it would require multiple glibc build to
actually validate a change in s390 implementations.  If I am reading it
correctly, on memcpy, for instance, we might have:

    1. HAVE_MEMCPY_IFUNC
      1.1. HAVE_S390_MIN_Z196_ZARCH_ASM_SUPPORT
      1.2. HAVE_S390_MIN_Z10_ZARCH_ASM_SUPPORT
      1.3. Z900
    2. !HAVE_MEMCPY_IFUNC
    3. Disabled multi-arch

So is is basically 4 different glibc builds one will need to actually
test and check to fully validate s390.

It depends on the change. If you change the implementation of the ifunc variants and build with the oldest march level, then due to ifunc-impl-list the tests will run all ifunc variants.

You validate the ifunc implementation itself, you need to also check if it
builds on all the variants (even when you are sure that the ifunc variant
itself works). That's what I am trying to avoid, to have multiple possible
glibc build depending of how you configure the compiler.

You always have to build and test glibc with your used compiler-configuration of the target distro. E.g. using z13 as default architecture level set can also introduce other bugs which are not triggered if the test was previously done with zEC12 as als.

A caller of a function is just calling __GI_XYZ. If you change the caller side then you rely on the correct __GI_XYZ symbol. If you change one existing implementation of XYZ, then the __GI_ mechanism is not touched and you have to test the changed implementation. If you add a new implementation for a new machine, then you have to extend and test the mechanism. But then you can also afford to build multiple glibcs. If __GI_XYZ exists twice or does not exist, then you'll get a link error without running the testsuite. If __GI_XYZ points to a false implementation, this is really bad. Then you'll either see it with readelf or it fails as you are running on an too old machine. But a change in the ifunc-resolver can also lead to such an issue.
Are all the power[4 to X], multiarch and co variants tested on every change?


The patchset also do some clean up and unifies 31 vs 64bit implementations from s390-32 and s390-64 folders. This simplifies maintenance.

I do agree with these kind of changes.



The gains I see is a slight reduction is loading time (due no ifunc)
and less code side. Is is what is driving you for this change? Does
it worth the extra testing and validation it might require?

The current s390_vx_libc_ifunc macros are not flexible and do not allow future enhancements as they only use the *_c or *_vx variant.

Right, this is one issue which is not really tied to setting up different
build variants depending of the minimum ISA level.


As soon as future ditros will use z13 as default architecture,
vector instructions can be used by default.  But the current __GI_* functions won't use the vector variants.  Instead the fallback is always used for internal calls and the vector variants are always used for external calls via IFUNC.

Some ABI does impose restriction for ifunc with internal hidden symbols
(powerpc32 and i686 if I recall correctly), is it the case for s390
and/or s390x?
On s390 the compiler will generate e.g. a PC32DBL relocation instead of PLT32DBL one and the GOT pointer might not be setup in r12. But the called PLT stub relies on a setup r12.

And on older binutils (unfortunately I'm not sure if it was version >= or < 2.25) bugs regarding ifunc lead to a segmentation fault while linking libc.so.

Do you consider this a deal-break issue? One option is to check the minimum
supported binutils for this change and set it as the minimum required one,
another is disable internal ifunc for such ABI (as for powerpc32, powerpc64
does work).

The "not setuped r12 register" is no binutils-bug. This is such an restriction with ifunc and internal hidden symbols you've mentioned.
Thus I can't use IFUNC __GI_* symbols.

I disable internal ifunc by setting the __GI_* symbols to the "oldest" implementation. Or is there another mechanism?


At least for some ABI, we can still route some internal symbols through
ifunc. On some x86 ifunc selector implementation the comments state it
might show no performance gain, but I am not sure about the validity of
the claims. So if s390 or s390x does not have any impeding reason, a
possibility might to use ifunc internally instead of relying on compiler
default or used options to setup the minimum ABI.

If I start routing those symbols via IFUNC then an additional PLT stub is introduced in libc.so. If we are running on >=z13 then the __XYZ_vx is called via PLT. On machines <z13 __XYZ_c is also called via PLT. Compared to non-ifunc __GI_ symbols we have introduced the extra PLT stub overhead without any advantage on <z13 machines as the __XYZ_c implementation is called.

Do we have real usercases where they are stressing libc symbols that are
doing intra-symbol calls and which PLT overhead is dominant?

Usually we try to avoid PLT calls. I think this is one idea behind the __GI_* symbols. Shall we remove those symbols and always jump over PLT?


So the question is whether the possible internal symbols call optimization
worth that added complexity for build and validation.

E.g. for strstr there is an advantage. On s390 we are jumping from the common-code implementations to the vector implementations. On other archs we would jump from one vector implementation to another vector implementation and you perhaps don't see so much difference.


One option maybe is maybe to trade some code size for a specific strstr
variant for z13, which calls strstr/strnlen/memcpy locally.  Specially
for strstr, is the difference really worth to add this build and validation
complexity?



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]