This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 4/4] S390: Implement mempcpy with help of memcpy. [BZ #19765]



On 04/05/2016 14:17, Wilco Dijkstra wrote:
> Adhemerval Zanella wrote:
>> Right, but I *think* the compiler would be smart enough to avoid the extra spilling.
>> Take this example [1]: using GCC 5.3 for s390x, I see no difference in the
>> generated assembly between the strategy I proposed (-DMEMPCPY_TO_MEMCPY) and
>> the s390-specific one you are suggesting.  In the end, I am proposing that
>> architecture-specific micro-optimizations be avoided in favor of a more generic one,
>> especially ones that only avoid one or two extra spills through quite complex
>> macro expansion.  [1] http://pastie.org/10824072
> 
> You need to use something like this to show the difference:
> 
> return __mempcpy (__mempcpy (__mempcpy (p1, s, len), p2, 1), p3, 16);
> 
> GCC doesn't even optimize mempcpy of constant size (PR70140), so if you do have
> an optimized mempcpy like s390 here, you *still* need to use memcpy for small immediate
> sizes (so they get inlined), and only use mempcpy for unknown or very large sizes.
> 
> We end up having to do these header tricks because GCC doesn't implement mempcpy
> as a first-class builtin or allow targets to defer to memcpy.
> 
> There are similar issues with strchr (s, 0) being used instead of the faster strlen (s) + s.
> 
> Wilco

But my point is that all the architectures that provide an optimized mempcpy do so
by either 1. jumping directly to the optimized memcpy (the s390 case in this patchset),
2. cloning the memcpy implementation and adjusting the pointers (x86_64), or
3. using a similar strategy for both implementations (powerpc).

So with the change I am proposing, no compiler support is required, because both
mempcpy and __mempcpy calls are transformed into memcpy plus the copy length.  Assuming
memcpy is as fast as the mempcpy implementation, I think there is no need to add
this micro-optimization only for s390; rather, it should be made generic.

