This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug libc/16830] memset performance regression
- From: "matz at suse dot de" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Wed, 21 May 2014 14:21:26 +0000
- Subject: [Bug libc/16830] memset performance regression
- Auto-submitted: auto-generated
- References: <bug-16830-131 at http dot sourceware dot org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=16830
--- Comment #3 from Michael Matz <matz at suse dot de> ---
Created attachment 7611
--> https://sourceware.org/bugzilla/attachment.cgi?id=7611&action=edit
Patch adding non-temporal stores
This patch adds non-temporal stores for large block sizes. On my machine
(an Opteron), the libmicro benchmark "memset -s 10m" gives:
glibc 2.17:
             prc thr   usecs/call      samples   errors cnt/samp     size
memset         1   1   2424.64635           97        0       20 10485760

glibc 2.19:
             prc thr   usecs/call      samples   errors cnt/samp     size
memset         1   1   3539.25120           97        0       20 10485760

glibc 2.19 with patch:
             prc thr   usecs/call      samples   errors cnt/samp     size
memset         1   1   2524.34610          102        0       20 10485760
So it's indeed the non-temporal stores that improve performance. With the
patch, 2.19 is still a bit slower than 2.17 (about 2524 vs. 2425 usecs/call,
roughly 4%), but that is much more reasonable than the roughly 46% slowdown
of unpatched 2.19. The old code filled 128 bytes per loop iteration, the new
code only 64, which might explain the last little difference. So the real
patch should also use a 128 byte loop.
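
For illustration only, here is a minimal C sketch of the idea, not the
attached patch (glibc's memset is hand-written assembly). It assumes an x86
target with SSE2; the name memset_nt_sketch and the NT_THRESHOLD cutoff are
hypothetical and were not taken from the patch or from glibc. It shows a
128-byte-per-iteration loop of non-temporal stores for large sizes, with the
ordinary memset used for small sizes and for the unaligned head and tail.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_set1_epi8, _mm_stream_si128, _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff: below this size, use the regular (cached) memset.
   A real implementation would tune this against the cache sizes. */
#define NT_THRESHOLD ((size_t)1 << 20)

void *memset_nt_sketch(void *dst, int c, size_t n)
{
    unsigned char *p = dst;

    if (n < NT_THRESHOLD)
        return memset(dst, c, n);

    /* Fill the unaligned head so that p becomes 16-byte aligned,
       which _mm_stream_si128 requires. */
    size_t head = (size_t)(-(uintptr_t)p & 15);
    memset(p, c, head);
    p += head;
    n -= head;
    assert(((uintptr_t)p & 15) == 0);

    __m128i v = _mm_set1_epi8((char)c);

    /* Main loop: 8 x 16 = 128 bytes per iteration, written with
       non-temporal stores that bypass the cache. */
    while (n >= 128) {
        __m128i *q = (__m128i *)p;
        _mm_stream_si128(q + 0, v);
        _mm_stream_si128(q + 1, v);
        _mm_stream_si128(q + 2, v);
        _mm_stream_si128(q + 3, v);
        _mm_stream_si128(q + 4, v);
        _mm_stream_si128(q + 5, v);
        _mm_stream_si128(q + 6, v);
        _mm_stream_si128(q + 7, v);
        p += 128;
        n -= 128;
    }

    /* Non-temporal stores are weakly ordered; fence before returning. */
    _mm_sfence();

    /* Fill the remaining tail (fewer than 128 bytes). */
    memset(p, c, n);
    return dst;
}
```

Whether 128 bytes per iteration actually beats 64 would have to be measured
on the target machine; the sketch only shows the shape of such a loop.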