This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 4/4] S390: Implement mempcpy with help of memcpy. [BZ #19765]


On Fri, May 13, 2016 at 7:42 AM, Stefan Liebler <stli@linux.vnet.ibm.com> wrote:
>
>
> On 05/12/2016 04:10 PM, Adhemerval Zanella wrote:
>>
>>
>>
>> On 09/05/2016 11:15, Stefan Liebler wrote:
>>>
>>> On 05/05/2016 06:36 PM, H.J. Lu wrote:
>>>>
>>>> On Thu, May 5, 2016 at 9:34 AM, Adhemerval Zanella
>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> On May 5, 2016, at 11:45, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>
>>>>>> On Thu, May 5, 2016 at 7:15 AM, Adhemerval Zanella
>>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On 05/05/2016 10:37, H.J. Lu wrote:
>>>>>>>> On Wed, May 4, 2016 at 1:58 PM, Adhemerval Zanella
>>>>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 04/05/2016 17:51, Wilco Dijkstra wrote:
>>>>>>>>>> Adhemerval Zanella wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> But my point is all the architectures which provide an optimized
>>>>>>>>>>> mempcpy is
>>>>>>>>>>> though either 1. jump directly to optimized memcpy (s390 case for
>>>>>>>>>>> this patchset),
>>>>>>>>>>> 2. clonning the same memcpy implementation and adjusting the
>>>>>>>>>>> pointers (x86_64) or
>>>>>>>>>>> 3. using a similar strategy for both implementations (powerpc).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Indeed, which of those are used doesn't matter much.
>>>>>>>>>>
>>>>>>>>>>> So for this change I am proposing compiler support won't be
>>>>>>>>>>> required because both
>>>>>>>>>>> memcpy and __mempcpy will be transformed to memcpy + s.  Based on
>>>>>>>>>>> assumption that
>>>>>>>>>>> memcpy is fast as mempcpy implementation I think there is no need
>>>>>>>>>>> to just add
>>>>>>>>>>> this micro-optimization to only s390, but rather make is general.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> GLIBC already has this optimization in the generic string header,
>>>>>>>>>> it's just that s390 wants
>>>>>>>>>> to do something different again. As long as GCC isn't fixed this
>>>>>>>>>> isn't possible to support
>>>>>>>>>> s390 without this header workaround. And we need GCC to improve so
>>>>>>>>>> things work
>>>>>>>>>> better for all the other C libraries...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> But the current one at string/string.h is only enabled with
>>>>>>>>> !defined _HAVE_STRING_ARCH_mempcpy,
>>>>>>>>> so if a port actually adds a mempcpy one it won't be enabled.  What
>>>>>>>>> I am trying to argue it
>>>>>>>>> to just remove the !defined _HAVE_STRING_ARCH_mempcpy and enable it
>>>>>>>>> as default for all
>>>>>>>>> ports.
>>>>>>>>
>>>>>>>>
>>>>>>>> Please don't enable it for x86.  Calling memcpy means we have to
>>>>>>>> save and restore 2 registers for no good reasons.
>>>>>>>
>>>>>>>
>>>>>>> Yes, direct call will require save and restore the size for further
>>>>>>> add
>>>>>>> and this is true for most architectures.  My question is if does this
>>>>>>> really matter in currently GLIBC internal usage and on programs that
>>>>>>> might use it compared against the burden of keeping the various
>>>>>>> string*.h header in check for multiple architectures or adding this
>>>>>>> logic (mempcpy transformation to memcpy) on compiler.
>>>>>>
>>>>>>
>>>>>> What burden? There is nothing to do in glibc for x86.  GCC can
>>>>>> inline mempcpy for x86.
>>>>>
>>>>>
>>>>> In fact I am objecting all the bits GLIBC added on string*.h that only
>>>>> adds complexity for some micro-optimizations. For x86 I do agree that
>>>>> transforming mempcpy to memcpy is no the best strategy.
>>>>>
>>>>> My rationale is to avoid add even more arch-specific bits in installed
>>>>> headers to add such optimizations.
>>>>
>>>>
>>>> I believe most of those micro-optimizations belong to GCC, not glibc.
>>>> Of course, we should keep the existing ones for older GCCs.  We
>>>> should avoid adding new ones.
>>>>
>>>
>>> Does this mean, the proposed way is to not add a macro for mempcpy in
>>> sysdeps/s390/bits/string.h or e.g. for another architecture?
>>>
>>> If _HAVE_STRING_ARCH_mempcpy is not defined for s390, then it will always
>>> use the macro defined in string/string.h, which inlines memcpy + n and the
>>> mempcpy-function is not called. The memcpy is transformed to mvc, mvhi for
>>> e.g. constant lengths. Otherwise, the memcpy-function is called and the
>>> length will be added after the call.
>>>
>>> According to PR70140
>>> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70140), the macro in
>>> string/string.h will be removed after this bug is fixed in GCC?
>>> Then I think the decision, if mempcpy or memcpy is called or an inline
>>> version is emitted, will be done per architecture backend?
>>>
>>> If this is all true, then the mempcpy-function will be called in future
>>> if it makes sense!?
>>>
>>>
>>> If _HAVE_STRING_ARCH_mempcpy will be defined for s390, then
>>> mempcpy-function is always called - even for cases with constant length
>>> where mvc or mvhi is emitted.
>>> This is definitely not the intention of this patch.
>>> After fixing PR70140, GCC will correctly handle the constant length
>>> cases!?
>>
>>
>> What I am proposing is to avoid add more arch-specific optimization on
>> installed
>> string*.h headers and instead work on adding them on compiler side.  My
>> view is
>> we should cleanup as much as possible the string headers and only add
>> optimization
>> that are more architecture neutral.
>>
>> Related to patch, my understanding is s390x does not really provide an
>> optimized
>> mempcpy (it uses the default mempcpy.c) and I think a better approach
>> would be
>> to add an optimized mempcpy like x88_64
>> (c365e615f7429aee302f8af7bf07ae262278febb).
>> The mempcpy won't be optimized directly to mvc instruction, but at least
>> it will
>> call the optimized memcpy.  The full optimization will be done by
>> correctly
>> handling the transformation on compiler size (the PR#70140 as you
>> referred).
>>
>>
>
> Okay. Here is the updated patch. It implements mempcpy with memcpy as
> before, but does not change the s390-specific string.h and
> _HAVE_STRING_ARCH_mempcpy is not defined. Thus at the moment mempcpy() won't
> be called, but instead memcpy + n is inlined by the mempcpy macro in
> string/string.h. After fxing PR70140, either the mempcpy macro in
> string/string.h has to be removed or _HAVE_STRING_ARCH_mempcpy has to be
> defined for S390.
>
> Okay to commit?

+__asm__ (".weak mempcpy\n\t"
+ ".set mempcpy,__mempcpy\n\t");

Don't we have a macro for this?

-- 
H.J.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]