This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 4/4] S390: Implement mempcpy with help of memcpy. [BZ #19765]
- From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- To: libc-alpha at sourceware dot org
- Date: Wed, 4 May 2016 12:41:50 -0300
- Subject: Re: [PATCH 4/4] S390: Implement mempcpy with help of memcpy. [BZ #19765]
- Authentication-results: sourceware.org; auth=none
- References: <1461672469-2107-1-git-send-email-stli at linux dot vnet dot ibm dot com> <1461672469-2107-4-git-send-email-stli at linux dot vnet dot ibm dot com> <571F6E07 dot 6010403 at linaro dot org> <nfpse0$aac$2 at ger dot gmane dot org>
On 27/04/2016 05:14, Stefan Liebler wrote:
> On 04/26/2016 03:32 PM, Adhemerval Zanella wrote:
>>
>>
>> On 26/04/2016 09:07, Stefan Liebler wrote:
>>> There exist optimized memcpy functions on s390, but no optimized mempcpy.
>>> This patch adds mempcpy entry points in memcpy.S files, which
>>> use the memcpy implementation. Now mempcpy itself is also an IFUNC function
>>> as memcpy is and the variants are listed in ifunc-impl-list.c.
>>>
>>> The s390 string.h defines _HAVE_STRING_ARCH_mempcpy and __mempcpy/mempcpy
>>> definitions. If n is constant and small enough, __mempcpy/mempcpy is an inlined
>>> function, which uses memcpy and returns dest + n. In this constant case,
>>> GCC emits instructions like mvi or mvc and avoids the function call to memcpy.
>>> If n is not constant, then __mempcpy is called.
>>>
>>
>> AFAIk GLIBC does not have any ports that implements mempcpy and not
>> memcpy and IMHO the inline version to convert mempcpy to memcpy is
>> always a gain.
>>
>> So I would suggest to just avoid adding arch-specific bits and move the
>> mempcpy inline optimization at string/string.h to string/string2.h as
>>
>> --
>>
>> #define mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
>> #define __mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
>>
>> __extern_always_inline void *
>> __mempcpy_inline (void *__restrict __dest,
>> const void *__restrict __src, size_t __n)
>> {
>> return (char *) memcpy (__dest, __src, __n) + __n;
>> }
>>
>> --
>>
>> This is convert any mempcpy to mempcy plus length and provide
>> the symbol only for older binaries.
>>
>
> According to the current behaviour on s390, a call to memcpy with constant byte-length below 65536 is done by emitting mvc, mvi or
> a suitable instruction sequence.
> If gcc don't know about a constant byte-length, then memcpy function
> in glibc is called.
>
> For mempcpy cascades like in iconv/gconv_conf.c:
> 466: __mempcpy (__mempcpy (__mempcpy (gconv_path, __gconv_path_envvar,
> user_len),
> ":", 1),
> default_gconv_path, sizeof (default_gconv_path));
>
> The current behaviour is to perform a call to memcpy(), add user_len to
> the return value of the call and use mvi, mvc for the two calls with constant byte-lengths.
>
> If only _HAVE_STRING_ARCH_mempcpy is defined, there would be three calls
> to mempcpy. Here you are right, this is not what you want.
>
> The behaviour with this patch is to call mempcpy() once and use mvi, mvc
> directly without the need to add user_len after the mempcpy call.
> This way, there is no need to store user_len in an extra register or on stack.
>
> I don't know the conditions on other archs, whether memcpy is inlined or called, thus I added the s390-specific condition to s390-string.h.
>
Right, but I *think* compiler would be smart enough to just avoid the
extra spilling. Take this example for instance [1], using GCC 5.3
for s390x I see no difference in generated assembly if I the strategy
I proposed (-DMEMPCPY_TO_MEMCPY) to the s390 specific you are suggesting.
In the end, I am proposing that architecture specific micro-optimization
should be avoid in favor of a more specific one. Specially the one that
tend to avoid one or two extra spilling based on quite complex macro
expansion.
[1] http://pastie.org/10824072