This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH 4/4] S390: Implement mempcpy with help of memcpy. [BZ #19765]
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- Cc: nd <nd at arm dot com>, 'GNU C Library' <libc-alpha at sourceware dot org>
- Date: Wed, 4 May 2016 17:17:33 +0000
Adhemerval Zanella wrote:
> Right, but I *think* the compiler would be smart enough to just avoid the extra spilling.
> Take this example, for instance: using GCC 5.3 for s390x, I see no difference in the
> generated assembly if I compare the strategy I proposed (-DMEMPCPY_TO_MEMCPY) to
> the s390-specific one you are suggesting. In the end, I am proposing that
> architecture-specific micro-optimizations should be avoided in favor of a more generic
> approach, especially ones that tend to avoid one or two extra spills based on quite
> complex macro expansion.  http://pastie.org/10824072
You need to use something like this to show the difference:
  return __mempcpy (__mempcpy (__mempcpy (p1, s, len), p2, 1), p3, 16);
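
As a rough sketch, that one-liner could be wrapped into a compilable test case like the one below. The function names chain_mempcpy and chain_memcpy are made up for illustration, and the second function assumes that a -DMEMPCPY_TO_MEMCPY style strategy boils down to memcpy (d, s, n) + n; it is not taken from the patch. Comparing the generated assembly of the two functions is one way to look for the extra live value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical test case: three chained copies using mempcpy directly.
   __builtin_mempcpy is the GCC/Clang builtin; it normally ends up as a
   call to the library mempcpy (a GNU extension).  */
char *
chain_mempcpy (char *p1, const char *s, size_t len,
               const char *p2, const char *p3)
{
  return __builtin_mempcpy (__builtin_mempcpy (__builtin_mempcpy (p1, s, len),
                                               p2, 1),
                            p3, 16);
}

/* The same chain written as memcpy (d, s, n) + n, which is the assumed
   shape of a generic mempcpy-to-memcpy mapping.  */
char *
chain_memcpy (char *p1, const char *s, size_t len,
              const char *p2, const char *p3)
{
  /* len is needed again after the call, so it must survive the call.  */
  char *d = (char *) memcpy (p1, s, len) + len;
  d = (char *) memcpy (d, p2, 1) + 1;
  return (char *) memcpy (d, p3, 16) + 16;
}
```

Both functions copy the same bytes and return the same end pointer; only the code generated around the calls is expected to differ.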
GCC doesn't even optimize mempcpy of constant size (PR70140), so if you do have
an optimized mempcpy like s390 here, you *still* need to use memcpy for small immediate
sizes (so they get inlined), and only use mempcpy for unknown or very large sizes.
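
A minimal sketch of that kind of dispatch follows. The names my_mempcpy, SMALL_COPY_MAX and join_path are hypothetical, and the cutoff of 64 is an arbitrary assumption rather than a value from glibc or the s390 patch. The macro relies on the GCC/Clang builtins __builtin_constant_p and __builtin_mempcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff for "small"; chosen only for illustration.  */
#define SMALL_COPY_MAX 64

/* Hypothetical wrapper: small compile-time-constant sizes go through
   memcpy so the compiler can expand them inline; everything else calls
   mempcpy.  Note that n is evaluated more than once.  */
#define my_mempcpy(dest, src, n)                                  \
  (__builtin_constant_p (n) && (n) <= SMALL_COPY_MAX              \
   ? (void *) ((char *) memcpy ((dest), (src), (n)) + (n))        \
   : __builtin_mempcpy ((dest), (src), (n)))

/* Example use: build "dir/name" in buf and return a pointer to the
   terminating NUL.  */
char *
join_path (char *buf, size_t cap, const char *dir, size_t dirlen,
           const char *name, size_t namelen)
{
  assert (dirlen + 1 + namelen < cap);   /* room for the text plus NUL */
  char *p = my_mempcpy (buf, dir, dirlen);   /* size not constant here */
  p = my_mempcpy (p, "/", 1);                /* small constant: memcpy path */
  p = my_mempcpy (p, name, namelen);
  *p = '\0';
  return p;
}
```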
We end up having to do these header tricks because GCC doesn't implement mempcpy
as a first-class builtin or allow targets to defer to memcpy.
There are similar issues with strchr (s, 0) being used instead of the faster strlen (s) + s.
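
For illustration, the two forms side by side (the helper names end_strchr and end_strlen are made up for this sketch). Both return a pointer to the terminating NUL; the second one lets the library's strlen do the scan.

```c
#include <string.h>

/* Hypothetical helper: find the end of the string by searching for the
   terminating NUL with strchr.  */
const char *
end_strchr (const char *s)
{
  return strchr (s, '\0');
}

/* Hypothetical helper: same result computed as strlen (s) + s.  */
const char *
end_strlen (const char *s)
{
  return s + strlen (s);
}
```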