This is the mail archive of the mailing list for the glibc project.


Re: [PATCH RFC] Improve 64bit memset for Corei7 with avx2 instruction

On Tue, Jul 30, 2013 at 5:38 AM, Ling Ma <> wrote:
> 2013/7/30, Ondřej Bílka <>:
>> On Tue, Jul 30, 2013 at 05:26:09PM +0800, Ling Ma wrote:
>>> We never found prefetcht1 to be a good instruction for prefetching data on
>>> Core 2, Nehalem, Sandy Bridge, or Haswell. Our experiments show
>>> prefetchw is best in your cases.
>> But your code was the following:
> Ling: Yes, I said that in your case prefetchw is the best,
> and we also said we will do further tests to verify whether prefetchw is
> better in the gcc.403 cases too; if prefetchw is better in gcc.403, we
> will replace prefetcht0 with prefetchw.
>> +L(gobble_128_loop):
>> +       prefetcht0      0x1c0(%rdi)
>> +       vmovaps %ymm0, (%rdi)
>> +       prefetcht0      0x280(%rdi)
>> +       vmovaps %ymm0, 0x20(%rdi)
>> +       vmovaps %ymm0, 0x40(%rdi)
>> +       vmovaps %ymm0, 0x60(%rdi)
>> +       lea     0x80(%rdi), %rdi
>> +       sub     $0x80, %rdx
>> +       jae     L(gobble_128_loop)
>> which uses prefetcht0. (The prefetcht1 in my benchmark was a typo.)
>> I updated the benchmark (attached) with your code, with and without prefetching.
>> 1)
>> Ljuba, could you test it on Haswell?
> Ling: Ljuba, please also add a prefetchw variant, thanks.
>>> In your code, memset only handles 256 bytes; in this case we don't need
>>> to use prefetch because the hardware prefetcher is enough for small
>>> sizes, but it can tell us whether prefetch will hurt performance, so we
>> Has Haswell improved the hardware prefetcher to fetch from the next page? I
>> changed the layout of the benchmark so that the data ends at a page boundary.
> Ling: We use software prefetch because it has a longer stride than the
> hardware prefetcher, so it is good for bigger sizes.
>>> run it; the result below indicates that prefetchw on Haswell is
>>> harmless, even though it is redundant code in memset on Haswell.
>> Your test was invalid, as you compared apples with oranges
>> (prefetcht0 vs. prefetchw). To see how your code fares, you should replace
>> it with your implementation, with and without prefetch.
>> It needs to be exactly what you submitted, and if that means
>> prefetchw, then post a new version.
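
For readers following the prefetch discussion, here is a rough, portable C sketch of the structure of the quoted L(gobble_128_loop), assuming GCC or Clang (for `__builtin_prefetch`). The function name `memset_128_prefetch` is hypothetical, and the 32-byte `memset` calls only stand in for the aligned `vmovaps` AVX stores of the actual patch. Passing `rw = 1` asks for a prefetch-for-write, which the compiler may lower to `prefetchw` when the target enables it and to an ordinary read prefetch otherwise. This is an illustration under those assumptions, not the submitted code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the quoted 128-byte loop: two software
   prefetches ahead of the stores, then four 32-byte stores per
   iteration. The memset(…, 32) calls stand in for vmovaps. */
void memset_128_prefetch(void *dst, int c, size_t n)
{
    unsigned char *p = dst;

    while (n >= 128) {
        /* Prefetch 0x1c0 and 0x280 bytes ahead, as in the quoted loop.
           rw = 1 requests a write prefetch; locality = 3 keeps the line
           in all cache levels. The address is built via uintptr_t so no
           pointer arithmetic goes past the buffer; GCC documents that
           prefetches do not fault on invalid addresses. */
        __builtin_prefetch((const void *)((uintptr_t)p + 0x1c0), 1, 3);
        memset(p, c, 32);
        __builtin_prefetch((const void *)((uintptr_t)p + 0x280), 1, 3);
        memset(p + 0x20, c, 32);
        memset(p + 0x40, c, 32);
        memset(p + 0x60, c, 32);
        p += 128;
        n -= 128;
    }
    memset(p, c, n); /* remaining tail of fewer than 128 bytes */
}
```

Deleting the two `__builtin_prefetch` lines gives the no-prefetch variant needed for the with/without comparison requested above.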

There is no need to test prefetchw on Haswell since it isn't
supported there. I think this is a rare case where prefetcht0 helps.
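
Whether a CPU advertises PREFETCHW can be checked at run time. The sketch below assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid`; it tests CPUID leaf 0x80000001, ECX bit 8 (PRFCHW). The helper name `cpu_has_prefetchw` and the local `PRFCHW_BIT` constant are hypothetical, and the code makes no claim about any particular microarchitecture.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID.80000001H:ECX bit 8 (PRFCHW) advertises PREFETCHW. */
#define PRFCHW_BIT (1u << 8)

/* Hypothetical helper: true if the CPU reports PREFETCHW support. */
bool cpu_has_prefetchw(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if the extended leaf is not available. */
    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & PRFCHW_BIT) != 0;
#else
    return false; /* not x86: no PREFETCHW instruction */
#endif
}
```

A memset variant could use a check like this to select between a prefetchw loop and a prefetcht0 loop instead of hard-coding one.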

