This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 4/4] S390: Implement mempcpy with help of memcpy. [BZ #19765]



On 27/04/2016 05:14, Stefan Liebler wrote:
> On 04/26/2016 03:32 PM, Adhemerval Zanella wrote:
>>
>>
>> On 26/04/2016 09:07, Stefan Liebler wrote:
>>> There exist optimized memcpy functions on s390, but no optimized mempcpy.
>>> This patch adds mempcpy entry points in memcpy.S files, which
>>> use the memcpy implementation. Now mempcpy itself is also an IFUNC function
>>> as memcpy is and the variants are listed in ifunc-impl-list.c.
>>>
>>> The s390 string.h defines _HAVE_STRING_ARCH_mempcpy and __mempcpy/mempcpy
>>> definitions. If n is constant and small enough, __mempcpy/mempcpy is an inlined
>>> function, which uses memcpy and returns dest + n. In this constant case,
>>> GCC emits instructions like mvi or mvc and avoids the function call to memcpy.
>>> If n is not constant, then __mempcpy is called.
>>>
>>
>> AFAIk GLIBC does not have any ports that implements mempcpy and not
>> memcpy and IMHO the inline version to convert mempcpy to memcpy is
>> always a gain.
>>
>> So I would suggest to just avoid adding arch-specific bits and move the
>> mempcpy inline optimization at string/string.h to string/string2.h as
>>
>> -- 
>>
>> #define mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
>> #define __mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
>>
>> __extern_always_inline void *
>> __mempcpy_inline (void *__restrict __dest,
>>                    const void *__restrict __src, size_t __n)
>> {
>>    return (char *) memcpy (__dest, __src, __n) + __n;
>> }
>>
>> -- 
>>
>> This is convert any mempcpy to mempcy plus length and provide
>> the symbol only for older binaries.
>>
> 
> According to the current behaviour on s390, a call to memcpy with constant byte-length below 65536 is done by emitting mvc, mvi or
> a suitable instruction sequence.
> If gcc don't know about a constant byte-length, then memcpy function
> in glibc is called.
> 
> For mempcpy cascades like in iconv/gconv_conf.c:
> 466:  __mempcpy (__mempcpy (__mempcpy (gconv_path, __gconv_path_envvar,
>                        user_len),
>                 ":", 1),
>              default_gconv_path, sizeof (default_gconv_path));
> 
> The current behaviour is to perform a call to memcpy(), add user_len to
> the return value of the call and use mvi, mvc for the two calls with constant byte-lengths.
> 
> If only _HAVE_STRING_ARCH_mempcpy is defined, there would be three calls
> to mempcpy. Here you are right, this is not what you want.
> 
> The behaviour with this patch is to call mempcpy() once and use mvi, mvc
> directly without the need to add user_len after the mempcpy call.
> This way, there is no need to store user_len in an extra register or on stack.
> 
> I don't know the conditions on other archs, whether memcpy is inlined or called, thus I added the s390-specific condition to s390-string.h.
> 

Right, but I *think* compiler would be smart enough to just avoid the
extra spilling. Take this example for instance [1], using GCC 5.3
for s390x I see no difference in generated assembly if I the strategy
I proposed (-DMEMPCPY_TO_MEMCPY) to the s390 specific you are suggesting.

In the end, I am proposing that architecture specific micro-optimization
should be avoid in favor of a more specific one.  Specially the one that
tend to avoid one or two extra spilling based on quite complex macro
expansion.

[1] http://pastie.org/10824072


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]