This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 4/4] S390: Implement mempcpy with help of memcpy. [BZ #19765]


On Fri, May 13, 2016 at 7:59 AM, Stefan Liebler <stli@linux.vnet.ibm.com> wrote:
>
>
> On 05/13/2016 04:49 PM, H.J. Lu wrote:
>>
>> On Fri, May 13, 2016 at 7:42 AM, Stefan Liebler <stli@linux.vnet.ibm.com>
>> wrote:
>>>
>>>
>>>
>>> On 05/12/2016 04:10 PM, Adhemerval Zanella wrote:
>>>>
>>>>
>>>>
>>>>
>>>> On 09/05/2016 11:15, Stefan Liebler wrote:
>>>>>
>>>>>
>>>>> On 05/05/2016 06:36 PM, H.J. Lu wrote:
>>>>>>
>>>>>>
>>>>>> On Thu, May 5, 2016 at 9:34 AM, Adhemerval Zanella
>>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On May 5, 2016, at 11:45, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Thu, May 5, 2016 at 7:15 AM, Adhemerval Zanella
>>>>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 05/05/2016 10:37, H.J. Lu wrote:
>>>>>>>>>> On Wed, May 4, 2016 at 1:58 PM, Adhemerval Zanella
>>>>>>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On 04/05/2016 17:51, Wilco Dijkstra wrote:
>>>>>>>>>>>> Adhemerval Zanella wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> But my point is all the architectures which provide an
>>>>>>>>>>>>> optimized
>>>>>>>>>>>>> mempcpy is
>>>>>>>>>>>>> though either 1. jump directly to optimized memcpy (s390 case
>>>>>>>>>>>>> for
>>>>>>>>>>>>> this patchset),
>>>>>>>>>>>>> 2. clonning the same memcpy implementation and adjusting the
>>>>>>>>>>>>> pointers (x86_64) or
>>>>>>>>>>>>> 3. using a similar strategy for both implementations (powerpc).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Indeed, which of those are used doesn't matter much.
>>>>>>>>>>>>
>>>>>>>>>>>>> So for this change I am proposing compiler support won't be
>>>>>>>>>>>>> required because both
>>>>>>>>>>>>> memcpy and __mempcpy will be transformed to memcpy + s.  Based
>>>>>>>>>>>>> on
>>>>>>>>>>>>> assumption that
>>>>>>>>>>>>> memcpy is fast as mempcpy implementation I think there is no
>>>>>>>>>>>>> need
>>>>>>>>>>>>> to just add
>>>>>>>>>>>>> this micro-optimization to only s390, but rather make is
>>>>>>>>>>>>> general.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> GLIBC already has this optimization in the generic string
>>>>>>>>>>>> header,
>>>>>>>>>>>> it's just that s390 wants
>>>>>>>>>>>> to do something different again. As long as GCC isn't fixed this
>>>>>>>>>>>> isn't possible to support
>>>>>>>>>>>> s390 without this header workaround. And we need GCC to improve
>>>>>>>>>>>> so
>>>>>>>>>>>> things work
>>>>>>>>>>>> better for all the other C libraries...
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> But the current one at string/string.h is only enabled with
>>>>>>>>>>> !defined _HAVE_STRING_ARCH_mempcpy,
>>>>>>>>>>> so if a port actually adds a mempcpy one it won't be enabled.
>>>>>>>>>>> What
>>>>>>>>>>> I am trying to argue it
>>>>>>>>>>> to just remove the !defined _HAVE_STRING_ARCH_mempcpy and enable
>>>>>>>>>>> it
>>>>>>>>>>> as default for all
>>>>>>>>>>> ports.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please don't enable it for x86.  Calling memcpy means we have to
>>>>>>>>>> save and restore 2 registers for no good reasons.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, direct call will require save and restore the size for further
>>>>>>>>> add
>>>>>>>>> and this is true for most architectures.  My question is if does
>>>>>>>>> this
>>>>>>>>> really matter in currently GLIBC internal usage and on programs
>>>>>>>>> that
>>>>>>>>> might use it compared against the burden of keeping the various
>>>>>>>>> string*.h header in check for multiple architectures or adding this
>>>>>>>>> logic (mempcpy transformation to memcpy) on compiler.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> What burden? There is nothing to do in glibc for x86.  GCC can
>>>>>>>> inline mempcpy for x86.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> In fact I am objecting all the bits GLIBC added on string*.h that
>>>>>>> only
>>>>>>> adds complexity for some micro-optimizations. For x86 I do agree that
>>>>>>> transforming mempcpy to memcpy is no the best strategy.
>>>>>>>
>>>>>>> My rationale is to avoid add even more arch-specific bits in
>>>>>>> installed
>>>>>>> headers to add such optimizations.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I believe most of those micro-optimizations belong to GCC, not glibc.
>>>>>> Of course, we should keep the existing ones for older GCCs.  We
>>>>>> should avoid adding new ones.
>>>>>>
>>>>>
>>>>> Does this mean, the proposed way is to not add a macro for mempcpy in
>>>>> sysdeps/s390/bits/string.h or e.g. for another architecture?
>>>>>
>>>>> If _HAVE_STRING_ARCH_mempcpy is not defined for s390, then it will
>>>>> always
>>>>> use the macro defined in string/string.h, which inlines memcpy + n and
>>>>> the
>>>>> mempcpy-function is not called. The memcpy is transformed to mvc, mvhi
>>>>> for
>>>>> e.g. constant lengths. Otherwise, the memcpy-function is called and the
>>>>> length will be added after the call.
>>>>>
>>>>> According to PR70140
>>>>> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70140), the macro in
>>>>> string/string.h will be removed after this bug is fixed in GCC?
>>>>> Then I think the decision, if mempcpy or memcpy is called or an inline
>>>>> version is emitted, will be done per architecture backend?
>>>>>
>>>>> If this is all true, then the mempcpy-function will be called in future
>>>>> if it makes sense!?
>>>>>
>>>>>
>>>>> If _HAVE_STRING_ARCH_mempcpy will be defined for s390, then
>>>>> mempcpy-function is always called - even for cases with constant length
>>>>> where mvc or mvhi is emitted.
>>>>> This is definitely not the intention of this patch.
>>>>> After fixing PR70140, GCC will correctly handle the constant length
>>>>> cases!?
>>>>
>>>>
>>>>
>>>> What I am proposing is to avoid add more arch-specific optimization on
>>>> installed
>>>> string*.h headers and instead work on adding them on compiler side.  My
>>>> view is
>>>> we should cleanup as much as possible the string headers and only add
>>>> optimization
>>>> that are more architecture neutral.
>>>>
>>>> Related to patch, my understanding is s390x does not really provide an
>>>> optimized
>>>> mempcpy (it uses the default mempcpy.c) and I think a better approach
>>>> would be
>>>> to add an optimized mempcpy like x88_64
>>>> (c365e615f7429aee302f8af7bf07ae262278febb).
>>>> The mempcpy won't be optimized directly to mvc instruction, but at least
>>>> it will
>>>> call the optimized memcpy.  The full optimization will be done by
>>>> correctly
>>>> handling the transformation on compiler size (the PR#70140 as you
>>>> referred).
>>>>
>>>>
>>>
>>> Okay. Here is the updated patch. It implements mempcpy with memcpy as
>>> before, but does not change the s390-specific string.h and
>>> _HAVE_STRING_ARCH_mempcpy is not defined. Thus at the moment mempcpy()
>>> won't
>>> be called, but instead memcpy + n is inlined by the mempcpy macro in
>>> string/string.h. After fxing PR70140, either the mempcpy macro in
>>> string/string.h has to be removed or _HAVE_STRING_ARCH_mempcpy has to be
>>> defined for S390.
>>>
>>> Okay to commit?
>>
>>
>> +__asm__ (".weak mempcpy\n\t"
>> + ".set mempcpy,__mempcpy\n\t");
>>
>> Don't we have a macro for this?
>>
> Yes that is true, but I can't use it in mempcpy.c in this case,
> because the macro s390_libc_ifunc (__mempcpy) uses an inline assembly
> to implement the __mempcpy ifunc symbol.
> If I use the weak_alias macro here in the c-file, gcc fails because it has
> no knowledge of symbol __mempcpy.
>

Can you make s390_libc_ifunc similar to libc_ifunc to expose
FUNC to gcc?

-- 
H.J.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]