This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
On Mon, Jul 29, 2013 at 05:42:02AM -0400, ling.ma.program@gmail.com wrote:
> From: Ma Ling <ling.ml@alibaba-inc.com>
>
> In this patch we use the similar approach with memcpy to
> avoid branch instructions and force destination to be aligned
> with avx instruction. By gcc.403 benchmark we find memset
> spend more time than memcpy by 5~20 times.
>

Another issue is whether the big loop is really needed. I tested a variant
with the big loop disabled on Ivy Bridge. For sizes up to 262144 the
performance is about the same, but beyond that rep stosb becomes 20% faster.
Ljuba, could you also test this case?

size: 262144
0.44 0.45 0.44
0.46 0.44 0.43
0.44 0.44 0.44
0.45 0.45 0.45
0.46 0.44 0.45
0.44 0.44 0.46
0.45 0.44 0.46
0.44 0.44 0.44
0.44 0.45 0.45
0.48 0.44 0.44

size: 524288
0.54 0.47 0.45
0.55 0.45 0.45
0.55 0.44 0.46
0.53 0.45 0.46
0.52 0.45 0.44
0.54 0.45 0.44
0.54 0.44 0.45
0.55 0.44 0.45
0.52 0.44 0.46
0.54 0.45 0.45

> +	ALIGN(4)
> +L(gobble_data):
> +#ifdef SHARED_CACHE_SIZE_HALF
> +	mov	$SHARED_CACHE_SIZE_HALF, %r9
> +#else
> +	mov	__x86_shared_cache_size_half(%rip), %r9
> +#endif
> +	shl	$4, %r9

Getting half of the cache size and then multiplying it by 16?

> +	cmp	%r9, %rdx
> +	ja	L(gobble_big_data)
> +	mov	%rax, %r9
> +	mov	%esi, %eax
> +	mov	%rdx, %rcx
> +	rep	stosb
> +	mov	%r9, %rax
> +	vzeroupper
> +	ret
> +
> +	ALIGN(4)
> +L(gobble_big_data):
> +	sub	$0x80, %rdx
> +L(gobble_big_data_loop):
> +	vmovntdq	%ymm0, (%rdi)
> +	vmovntdq	%ymm0, 0x20(%rdi)
> +	vmovntdq	%ymm0, 0x40(%rdi)
> +	vmovntdq	%ymm0, 0x60(%rdi)
> +	lea	0x80(%rdi), %rdi
> +	sub	$0x80, %rdx
> +	jae	L(gobble_big_data_loop)
> +	vmovups	%ymm0, -0x80(%r8)
> +	vmovups	%ymm0, -0x60(%r8)
> +	vmovups	%ymm0, -0x40(%r8)
> +	vmovups	%ymm0, -0x20(%r8)
> +	vzeroupper
> +	sfence
> +	ret
> +
> +END (MEMSET)
> +#endif
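
To make the threshold question concrete, the quoted hunk can be modelled in
portable C. This is only a rough sketch, not the patch itself: the names
big_threshold, choose_path and model_big_memset are hypothetical, plain
memset() stands in for the vmovntdq/vmovups stores, and it assumes that
__x86_shared_cache_size_half really holds half the shared cache size, that
%r8 points one byte past the end of the buffer, and that the alignment
prologue (which is outside the quoted hunk) has already run.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; none of these are glibc identifiers. */
enum memset_path { PATH_REP_STOSB, PATH_NON_TEMPORAL };

/* Models: mov <half>, %r9 ; shl $4, %r9
   half << 4 == 16 * half == 8 * (full shared cache size).  */
static size_t big_threshold(size_t shared_cache_size_half)
{
    return shared_cache_size_half << 4;
}

/* Models: cmp %r9, %rdx ; ja L(gobble_big_data)
   "ja" is an unsigned strict greater-than test.  */
static enum memset_path choose_path(size_t n, size_t shared_cache_size_half)
{
    return n > big_threshold(shared_cache_size_half)
               ? PATH_NON_TEMPORAL
               : PATH_REP_STOSB;
}

/* Portable model of L(gobble_big_data); requires n >= 0x80.
   Each memset(p, c, 0x80) stands in for four 32-byte vmovntdq stores.  */
static void *model_big_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    unsigned char *end = p + n;      /* assumed role of %r8 */
    size_t left = n - 0x80;          /* sub $0x80, %rdx */

    for (;;) {
        memset(p, c, 0x80);          /* 4 x vmovntdq %ymm0, ...(%rdi) */
        p += 0x80;                   /* lea 0x80(%rdi), %rdi */
        if (left < 0x80)             /* sub $0x80, %rdx would borrow, */
            break;                   /* so jae is not taken */
        left -= 0x80;
    }

    /* Tail: four unaligned stores covering the last 0x80 bytes; they may
       overlap bytes the loop has already written.  */
    memset(end - 0x80, c, 0x80);     /* vmovups %ymm0, -0x80(%r8) ... */
    return dst;
}
```

Under those assumptions, with a 1 MiB value in __x86_shared_cache_size_half
the non-temporal path is only taken for sizes above 16 MiB, which is eight
times the full shared cache. The loop writes whole 0x80-byte blocks without
running past the end, and the overlapping tail stores cover the remainder.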
Attachment:
memset_big.tar.bz2
Description: Binary data