This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCHv2] powerpc: P9 vector load instruction change in memcpy and memmove



On 19/10/2017 16:48, Tulio Magno Quites Machado Filho wrote:
> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> 
>> On 19/10/2017 16:20, Tulio Magno Quites Machado Filho wrote:
>>> From: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
>>>
>>> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
>>>
>>>> According to "POWER8 Processor User’s Manual for the Single-Chip Module"
>>>> (it is buried behind a sign-in wall at [1]), both lxvd2x/lvx and stxvd2x/stvx
>>>> use the same pipeline and have the same latency and throughput.  The
>>>> only difference is that lxvd2x/stxvd2x have microcode handling for the
>>>> unaligned case and for 4k crossing or 32-byte cross L1 miss (which should
>>>> not occur with aligned addresses).
>>>>
>>>> Why not change the POWER7 implementation instead of adding another one
>>>> which is exactly the same for POWER9?
>>>
>>> We're trying to limit the impact of this requirement on other processors so
>>> that newer P7 or P8 optimizations can still benefit from lxvd2x and stxvd2x.
>>>
>>> However, we could avoid source code duplication with the macros LVX and STVX
>>> I propose here in version 2.
>>> That way, we will postpone the copy to when/if a P7 optimization is
>>> contributed.
>>
>> And what exactly will the benefit be? For this specific case the current code
>> already does only aligned accesses, so it does not really matter whether
>> you use VSX or VMX instructions. If I recall correctly, both lxvd2x/lvx
>> and stxvd2x/stvx show the same latency and throughput on POWER7 as well.
>>
>> I see no gain in adding this POWER9-specific variant when you could adjust
>> the POWER7 one.
> 
> There are no gains now.  The problem arises when contributing a new
> optimization, e.g. a memcpy optimization for POWER8 using lxvd2x or stxvd2x.
> 
> If POWER9 doesn't have its own implementation, this problem will appear again.
> 

I think that if a POWER8 optimization eventually could not be used as is for
POWER9, then a new ifunc variant would make sense.  But I still think that, for
the current variant, a much simpler solution (in terms of code and
maintainability) would be to just adapt the POWER7 variant to use VMX
instructions.
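
To illustrate the aligned-access point, here is a minimal sketch in portable C.
It is not glibc code, and the names model_lvx, model_lxvd2x and loads_agree are
made up for illustration.  It models only how each load forms its address: lvx
ignores the low 4 bits of the effective address, while lxvd2x uses the address
unchanged.  Element ordering, latency and microcode handling are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy memory; "effective addresses" are plain indices into it.  */
static uint8_t mem[64];

/* Hypothetical model of lvx: the low 4 bits of the address are ignored.  */
static void
model_lvx (uint8_t out[16], size_t ea)
{
  memcpy (out, &mem[ea & ~(size_t) 15], 16);
}

/* Hypothetical model of lxvd2x: the address is used as given.  */
static void
model_lxvd2x (uint8_t out[16], size_t ea)
{
  memcpy (out, &mem[ea], 16);
}

/* Return 1 if both models load the same 16 bytes from EA (EA <= 48).  */
static int
loads_agree (size_t ea)
{
  uint8_t a[16], b[16];

  assert (ea <= sizeof mem - 16);
  for (size_t i = 0; i < sizeof mem; i++)
    mem[i] = (uint8_t) i;

  model_lvx (a, ea);
  model_lxvd2x (b, ea);
  return memcmp (a, b, 16) == 0;
}
```

Under these assumptions loads_agree returns 1 for any 16-byte aligned address
and 0 for an address such as 5, which is why the choice of instruction only
matters for code paths that issue unaligned accesses.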

