This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 4/4] S390: Implement mempcpy with help of memcpy. [BZ #19765]



On 04/05/2016 15:20, H.J. Lu wrote:
> On Wed, May 4, 2016 at 11:12 AM, Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>> On 04/05/2016 14:17, Wilco Dijkstra wrote:
>>> Adhemerval Zanella wrote:
>>>> Right, but I *think* the compiler would be smart enough to avoid the extra spilling.
>>>> Take this example for instance [1]: using GCC 5.3 for s390x, I see no difference in the
>>>> generated assembly when I compare the strategy I proposed (-DMEMPCPY_TO_MEMCPY) with
>>>> the s390-specific one you are suggesting.  In the end, I am proposing that
>>>> architecture-specific micro-optimizations should be avoided in favor of a more generic
>>>> one, especially those that tend to avoid one or two extra spills based on quite complex
>>>> macro expansion.  [1] http://pastie.org/10824072
>>>
>>> You need to use something like this to show the difference:
>>>
>>> return __mempcpy (__mempcpy (__mempcpy (p1, s, len), p2, 1), p3, 16);
>>>
>>> GCC doesn't even optimize mempcpy of constant size (PR70140), so if you do have
>>> an optimized mempcpy like s390 here, you *still* need to use memcpy for small immediate
>>> sizes (so they get inlined), and only use mempcpy for unknown or very large sizes.
>>>
>>> We end up having to do these header tricks because GCC doesn't implement mempcpy
>>> as a first-class builtin or allow targets to defer to memcpy.
>>>
>>> There are similar issues with strchr (s, 0) being used instead of the faster strlen (s) + s.
>>>
>>> Wilco
>>
>> But my point is that all the architectures which provide an optimized mempcpy do so by
>> either 1. jumping directly to the optimized memcpy (the s390 case for this patchset),
>> 2. cloning the same memcpy implementation and adjusting the pointers (x86_64), or
> 
> X86-64 doesn't do that after
> 
> commit c365e615f7429aee302f8af7bf07ae262278febb
> Author: H.J. Lu <hjl.tools@gmail.com>
> Date:   Mon Mar 28 13:13:36 2016 -0700
> 
>     Implement x86-64 multiarch mempcpy in memcpy
> 
>     Implement x86-64 multiarch mempcpy in memcpy to share most of code.  It
>     reduces code size of libc.so.
> 
>       [BZ #18858]

Right, so it follows the s390 strategy as well.

> 
>> 3. using a similar strategy for both implementations (powerpc).
>>
>> So for the change I am proposing, compiler support won't be required because both
>> memcpy and __mempcpy will be transformed to memcpy + s.  Based on the assumption that
>> memcpy is as fast as the mempcpy implementation, I think there is no need to add
>> this micro-optimization to s390 only, but rather to make it general.
> 
> 
> 
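
For reference, here is a minimal sketch of the generic transformation discussed
above: mempcpy expressed as memcpy plus the copied length.  The names
sketch_mempcpy and sketch_concat3 are hypothetical and used only for
illustration; this is not the actual glibc header macro, just the idea behind
it.  The second function mirrors the chained call from Wilco's example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the glibc macro): mempcpy written as
   memcpy followed by a pointer adjustment.  memcpy returns DEST,
   so adding N gives the byte just past the copied region.  */
static inline void *
sketch_mempcpy (void *dest, const void *src, size_t n)
{
  return (char *) memcpy (dest, src, n) + n;
}

/* Chained use in the style of
   __mempcpy (__mempcpy (__mempcpy (p1, s, len), p2, 1), p3, 16).
   Each call continues writing where the previous one stopped.
   Assumes OUT has room for LEN + 1 + 16 bytes and that P3 points
   to at least 16 readable bytes.  */
static char *
sketch_concat3 (char *out, const char *s, size_t len,
                const char *p2, const char *p3)
{
  char *p = sketch_mempcpy (out, s, len);
  p = sketch_mempcpy (p, p2, 1);
  return sketch_mempcpy (p, p3, 16);
}
```

With this form the small constant-size copies (1 and 16 bytes) are plain
memcpy calls that the compiler is free to inline, and only the pointer
addition remains.  Whether that matches a hand-optimized mempcpy on a given
target would still need to be checked in the generated assembly.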

