This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug libc/16830] memset performance regression


https://sourceware.org/bugzilla/show_bug.cgi?id=16830

--- Comment #3 from Michael Matz <matz at suse dot de> ---
Created attachment 7611
  --> https://sourceware.org/bugzilla/attachment.cgi?id=7611&action=edit
Patch adding non-temporal stores

This patch adds non-temporal stores for large block sizes.  On my machine
(an Opteron), the libmicro benchmark "memset -s 10m" gives:

glibc 2.17:
             prc thr   usecs/call      samples   errors cnt/samp     size
memset         1   1   2424.64635           97        0       20 10485760

glibc 2.19:
             prc thr   usecs/call      samples   errors cnt/samp     size
memset         1   1   3539.25120           97        0       20 10485760

glibc 2.19 with patch:
             prc thr   usecs/call      samples   errors cnt/samp     size
memset         1   1   2524.34610          102        0       20 10485760
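The timings above can be compared as simple ratios against the 2.17 baseline. This is only a worked check of the reported usecs/call values; the helper names slowdown_pct and recovered_fraction are hypothetical.

```c
/* libmicro "memset -s 10m" results quoted above, in usecs/call. */
static const double t_217   = 2424.64635; /* glibc 2.17 */
static const double t_219   = 3539.25120; /* glibc 2.19 */
static const double t_patch = 2524.34610; /* glibc 2.19 with patch */

/* Percent slowdown of time t relative to the 2.17 baseline. */
static double slowdown_pct(double t)
{
    return (t / t_217 - 1.0) * 100.0;
}

/* Fraction of the 2.17 -> 2.19 regression that the patch wins back. */
static double recovered_fraction(void)
{
    return (t_219 - t_patch) / (t_219 - t_217);
}
```

For these numbers, 2.19 is roughly 46% slower than 2.17, the patched build is about 4% slower, and the patch recovers about 91% of the regression.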

So it is indeed the non-temporal stores that restore the performance.  The
patched version is still a bit slower than 2.17, but much closer.  The old
code filled 128 bytes per loop iteration and the new code only 64, which
might explain the remaining difference, so the real patch should also use a
128-byte loop.
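A minimal sketch of the idea in C, assuming x86-64 with SSE2: above a size threshold, the buffer is filled with _mm_stream_si128, which writes around the cache, using 128 bytes per loop iteration as suggested above. This is not the code in the attached patch; the function name memset_nt_sketch and the 1 MiB cutoff NT_THRESHOLD are hypothetical.

```c
#include <emmintrin.h> /* SSE2: _mm_set1_epi8, _mm_stream_si128, _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff: only blocks at least this large bypass the cache. */
#define NT_THRESHOLD ((size_t)1 << 20)

void *memset_nt_sketch(void *dst, int c, size_t n)
{
    unsigned char *p = dst;

    /* Small blocks: ordinary (cached) stores are faster. */
    if (n < NT_THRESHOLD)
        return memset(dst, c, n);

    /* Head: fill up to 15 bytes so p becomes 16-byte aligned,
       which _mm_stream_si128 requires. */
    size_t head = (16 - ((uintptr_t)p & 15)) & 15;
    memset(p, c, head);
    p += head;
    n -= head;

    __m128i v = _mm_set1_epi8((char)c);

    /* Main loop: 128 bytes per iteration as eight 16-byte streaming stores. */
    while (n >= 128) {
        for (size_t k = 0; k < 128; k += 16)
            _mm_stream_si128((__m128i *)(p + k), v);
        p += 128;
        n -= 128;
    }

    /* Streaming stores are weakly ordered; fence before later stores. */
    _mm_sfence();

    /* Tail: fewer than 128 bytes remain, use ordinary stores. */
    memset(p, c, n);
    return dst;
}
```

The threshold keeps small fills in the cache, where they are likely to be read again soon; only large fills, like the 10 MiB case benchmarked here, skip it.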


