This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCHv2] powerpc: P9 vector load instruction change in memcpy and memmove
- From: "Tulio Magno Quites Machado Filho" <tuliom at linux dot vnet dot ibm dot com>
- To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, libc-alpha at sourceware dot org
- Cc: raji at linux dot vnet dot ibm dot com
- Date: Thu, 19 Oct 2017 16:48:33 -0200
- Subject: Re: [PATCHv2] powerpc: P9 vector load instruction change in memcpy and memmove
- Authentication-results: sourceware.org; auth=none
- References: <20171019182056.11179-1-tuliom@linux.vnet.ibm.com> <bd6351dd-b6d2-e672-ea07-8659ec6a28a9@linaro.org>
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> On 19/10/2017 16:20, Tulio Magno Quites Machado Filho wrote:
>> From: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
>>
>> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
>>
>>> According to "POWER8 Processor User’s Manual for the Single-Chip Module"
>>> (it is buried behind a sign-in wall at [1]), both lxvd2x/lvx and stxvd2x/stvx
>>> use the same pipeline and have the same latency and throughput. The
>>> only difference is that lxvd2x/stxvd2x have microcode handling for the
>>> unaligned case and for a 4k crossing or a 32-byte crossing with an L1 miss
>>> (which should not occur with aligned addresses).
>>>
>>> Why not change POWER7 implementation instead of dropping another one
>>> which is exactly the same for POWER9?
>>
>> We're trying to limit the impact of this requirement on other processors so
>> that newer P7 or P8 optimizations can still benefit from lxvd2x and stxvd2x.
>>
>> However, we could avoid source code duplication with the macros LVX and STVX
>> I propose here in version 2.
>> That way, we will postpone the copy to when/if a P7 optimization is
>> contributed.
>
> And what exactly will the benefit be? In this specific case the current code
> already does only aligned accesses, so it does not really matter whether
> you use VSX or VMX instructions. If I recall correctly, both lxvd2x/lvx
> and stxvd2x/stvx show the same latency and throughput on POWER7 as well.
>
> I see no gain in adding this POWER9-specific case when you could adjust
> the POWER7 one.
There are no gains now. The problem arises when someone contributes a new
optimization, e.g. a memcpy optimization for POWER8 using lxvd2x or stxvd2x.
If POWER9 doesn't have its own implementation, this problem will appear again.
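
To make the LVX/STVX macro idea from version 2 concrete, here is a rough C
model of it. This is only a sketch and not the patch itself, which is
assembly. The macro names LVX and STVX come from the proposal above;
everything else (vec16, the model_* helpers, copy_aligned) is hypothetical
and exists only for illustration. It models the one architectural difference
that matters here: lvx/stvx ignore the low 4 bits of the address, while
lxvd2x/stxvd2x use the address as given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 16-byte vector register.  Element order
   inside the register (which differs between the VMX and VSX forms on
   little-endian) is not modelled; only which bytes are accessed.  */
typedef struct { unsigned char b[16]; } vec16;

/* Model of lvx: the low 4 bits of the address are ignored.  */
static vec16 model_lvx(const unsigned char *p)
{
  vec16 v;
  memcpy(v.b, (const void *) ((uintptr_t) p & ~(uintptr_t) 15), 16);
  return v;
}

/* Model of stvx: same address truncation on the store side.  */
static void model_stvx(vec16 v, unsigned char *p)
{
  memcpy((void *) ((uintptr_t) p & ~(uintptr_t) 15), v.b, 16);
}

/* Model of lxvd2x: loads from the exact address, aligned or not.  */
static vec16 model_lxvd2x(const unsigned char *p)
{
  vec16 v;
  memcpy(v.b, p, 16);
  return v;
}

/* Model of stxvd2x: stores to the exact address, aligned or not.  */
static void model_stxvd2x(vec16 v, unsigned char *p)
{
  memcpy(p, v.b, 16);
}

/* Shared code picks the VSX forms unless a wrapper (e.g. a hypothetical
   POWER9 file) defines LVX/STVX as the VMX forms before including it.  */
#ifndef LVX
# define LVX  model_lxvd2x
# define STVX model_stxvd2x
#endif

/* Copy loop that only ever issues 16-byte aligned accesses, so either
   choice of LVX/STVX touches exactly the same bytes.  */
static void copy_aligned(unsigned char *dst, const unsigned char *src,
                         size_t n)
{
  assert(((uintptr_t) dst & 15) == 0 && ((uintptr_t) src & 15) == 0);
  assert(n % 16 == 0);
  for (size_t i = 0; i < n; i += 16)
    STVX(LVX(src + i), dst + i);
}
```

With aligned buffers both pairs behave identically in this model, which is
the "no gains now" case. The macro indirection only pays off if the shared
code later grows an unaligned lxvd2x/stxvd2x path that POWER9 must not use.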
--
Tulio Magno