This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH RFC V2] Improve 64bit memcpy/memove for Corei7 with unaligned avx instruction
- From: Ling Ma <ling dot ma dot program at gmail dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: libc-alpha at sourceware dot org, liubov dot dmitrieva at gmail dot com, Ma Ling <ling dot ml at alibaba-inc dot com>
- Date: Fri, 12 Jul 2013 22:44:19 +0800
- Subject: Re: [PATCH RFC V2] Improve 64bit memcpy/memove for Corei7 with unaligned avx instruction
- References: <1373547096-8095-1-git-send-email-ling dot ma dot program at gmail dot com> <20130712043608 dot GA8886 at domone dot PAOCY>
> Wait, you prefetch memory on reads and also use non-temporal stores?
> These aims are contradictory. If you want the best memcpy performance,
> do not use non-temporal stores; when we do not want to trash the cache,
> we do not prefetch, and we use non-temporal loads for the reads.
>
> Also, the following code does not use AVX. Is that intentional, or could
> AVX improve performance?
Ling: We use non-temporal stores to avoid read-for-ownership (RFO).
With write-back memory, a store to a cache line that is not already in
L1 forces the CPU to read that line from memory first, merge in the new
data, and then keep the modified line in L1 until it is written back.
Non-temporal stores instead combine the data for a whole cache line and
write it directly to memory, which avoids that redundant read. We still
have to read the source data, so we added a prefetch instruction. Our
results show that prefetchnta works better for us than prefetcht0, so we
will replace prefetcht0 with prefetchnta in the next version.
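The prefetchnta plus non-temporal-store approach described above can be sketched in C with SSE2 intrinsics. This is a minimal illustration under stated assumptions, not the code from the patch: it assumes x86-64, non-overlapping buffers, and 16-byte streaming stores (which require a 16-byte-aligned destination), and the function name `copy_nt`, the prefetch distance `PREFETCH_DIST` and the 256-byte small-copy threshold are hypothetical values chosen only for illustration. Roughly speaking, copying N bytes with ordinary stores into uncached lines moves about 3N bytes over the memory bus (source read, RFO read of the destination, later write-back), whereas streaming stores bring that down to about 2N.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_stream_si128; includes xmmintrin.h */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical prefetch distance in bytes; real implementations tune this. */
#define PREFETCH_DIST 512

/* Hypothetical helper: copy n bytes from src to dst (buffers must not
   overlap), writing the bulk of dst with non-temporal stores so that the
   destination lines are not read for ownership first. */
static void *copy_nt(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Small copies: the streaming path is not worth it (threshold assumed). */
    if (n < 256) {
        memcpy(d, s, n);
        return dst;
    }

    /* Head: copy up to 15 bytes so d becomes 16-byte aligned,
       as _mm_stream_si128 requires. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Bulk: one 64-byte cache line per iteration. */
    while (n >= 64) {
        /* Prefetch source data ahead with the NTA hint to limit cache
           pollution; only when the target address is inside src. */
        if (n > PREFETCH_DIST)
            _mm_prefetch((const char *)(s + PREFETCH_DIST), _MM_HINT_NTA);

        __m128i a = _mm_loadu_si128((const __m128i *)(s + 0));
        __m128i b = _mm_loadu_si128((const __m128i *)(s + 16));
        __m128i c = _mm_loadu_si128((const __m128i *)(s + 32));
        __m128i e = _mm_loadu_si128((const __m128i *)(s + 48));

        /* Non-temporal stores: data is combined per line and sent to
           memory without an RFO read of the destination line. */
        _mm_stream_si128((__m128i *)(d + 0), a);
        _mm_stream_si128((__m128i *)(d + 16), b);
        _mm_stream_si128((__m128i *)(d + 32), c);
        _mm_stream_si128((__m128i *)(d + 48), e);

        d += 64;
        s += 64;
        n -= 64;
    }

    /* Streaming stores are weakly ordered; fence before ordinary stores. */
    _mm_sfence();

    /* Tail: remaining 0..63 bytes with ordinary stores. */
    memcpy(d, s, n);
    return dst;
}
```

Whether prefetchnta or prefetcht0 wins, and which distance and threshold to use, depends on the microarchitecture; the thread reports prefetchnta measuring better for this patch.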
Thanks Ondra!