This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v1.2] Improve unaligned memcpy and memmove.


On Fri, Oct 04, 2013 at 05:46:51PM +0400, Liubov Dmitrieva wrote:
> I am surprised that rep is faster on Atom because Atom is known for slow reps...
> We should recheck it.
>
This does not surprise me much. The alternatives have complex control
flow, which works well on out-of-order machines. Rep is the exception, as
its control flow is very simple. A second factor is that rep is much more
icache friendly than the other implementations, which Atom's small caches
make more noticeable.

A third factor could be the data cache: when reading a lot of aligned data
from main memory, rep is the fastest alternative on most processors; see
the block mode of
http://kam.mff.cuni.cz/~ondra/benchmark_string/core2/memcpy_profile_loop/results_rand_aligned_nocache/result.html

For rechecking I wrote an independent tool. It LD_PRELOADs a given
implementation and measures the total time spent. It calculates relative
performance with a 95% confidence interval.
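
A minimal sketch of how a relative-performance figure with a 95%
confidence interval could be computed from per-run totals. This is not
the code of the tool described above; the function name, the sample
timings, and the normal approximation (z = 1.96) are assumptions made
for illustration only.

```python
import math
import statistics

def relative_performance(base_times, cand_times, z=1.96):
    """Return (ratio, low, high) for mean(cand) / mean(base).

    Uses a first-order (delta method) normal approximation and assumes
    the two sets of runs are independent. z=1.96 gives roughly 95%
    coverage.
    """
    mb = statistics.mean(base_times)
    mc = statistics.mean(cand_times)
    # Squared standard errors of the two means.
    vb = statistics.variance(base_times) / len(base_times)
    vc = statistics.variance(cand_times) / len(cand_times)
    ratio = mc / mb
    # Relative variances add for a ratio of independent quantities.
    se = ratio * math.sqrt(vb / mb**2 + vc / mc**2)
    return ratio, ratio - z * se, ratio + z * se

# Hypothetical total run times in seconds (made-up numbers).
base = [10.02, 9.98, 10.05, 9.97, 10.01, 10.00]
cand = [9.95, 9.93, 9.99, 9.90, 9.96, 9.94]
ratio, low, high = relative_performance(base, cand)
print(f"candidate/base = {ratio:.4f} (95% CI {low:.4f} .. {high:.4f})")
```

If the interval lies entirely below 1.0, the candidate is faster than
the baseline at roughly the 95% level under these assumptions.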

This should account for all factors, but it has the disadvantage that it
is slow. The difference between the best and worst memcpy implementations
is less than 1%, so you need to run it for a day until the variance
becomes small enough.
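
A rough sketch of why so many runs are needed. It estimates how many
independent runs per implementation are required before the 95%
interval on the ratio is narrow enough. The noise level and target
width are hypothetical, and the formula assumes both implementations
have the same run-to-run variation.

```python
import math

def runs_needed(cv, half_width, z=1.96):
    """Runs per implementation so that the ~95% CI on the ratio of two
    means has the given relative half-width.

    cv is the run-to-run coefficient of variation (stddev / mean),
    assumed equal for both implementations; runs are independent.
    """
    # Relative SE of the ratio is cv * sqrt(2 / n);
    # solve z * SE <= half_width for n.
    return math.ceil(2 * (z * cv / half_width) ** 2)

# Hypothetical numbers: 2% noise per run, interval of +/-0.25%.
print(runs_needed(cv=0.02, half_width=0.0025))
```

Halving the target width roughly quadruples the number of runs, which
is why resolving sub-1% differences takes so long.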

I ran this on various processors; the results, with the checker, are here.

http://kam.mff.cuni.cz/~ondra/benchmark_string/memcpy_consistency.tar.bz2

The results would need more time; the biggest problem with them is
frequency switching. When the variance and mean suddenly jump by a large
amount, it was probably caused by being rescheduled to an idle core.
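
One possible way to spot such jumps in a series of timings, as a
sketch only. The benchmark itself may handle this differently or not
at all; the function name, window size, threshold, and timings below
are all assumptions.

```python
import statistics

def flag_jumps(samples, window=5, threshold=0.10):
    """Return start indices of windows whose mean deviates from the
    median window mean by more than `threshold` (relative)."""
    starts = range(0, len(samples) - window + 1, window)
    means = [statistics.mean(samples[i:i + window]) for i in starts]
    if not means:
        return []  # fewer samples than one full window
    typical = statistics.median(means)
    return [i for i, m in zip(starts, means)
            if abs(m - typical) > threshold * typical]

# Hypothetical timings: the third window runs ~30% slower, as might
# happen after migrating to a core still at a low clock frequency.
times = [1.00, 1.01, 0.99, 1.00, 1.02,
         1.01, 1.00, 0.98, 1.00, 1.01,
         1.30, 1.31, 1.29, 1.32, 1.30,
         1.00, 0.99, 1.01, 1.00, 1.00]
print(flag_jumps(times))
```

Flagged windows could then be inspected or excluded before computing
the confidence interval.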

Could you try running the consistency benchmark in screen? It is done by

./benchmark | tee result/atom

You can see the accumulated results by running the following script.

./rep


> You probably should join that memcpy patches into one to simplify
> review and to make clear what version for which processor will be
> finally used.
> 
I will post that when I have time.

