This is the mail archive of the mailing list for the glibc project.


[RFC] Improving memcpy and memset.


Another area of optimization that I want to return to is memset and
memcpy. Results are here

Ljuba, could you also test them on Haswell? It would be useful to know.

I did several experiments.

The first uses stosq/movsq as generated with
gcc -mstringop-strategy=rep_8byte memset_rep8.c -S -o memset_rep8.s
The results here are chaotic.
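For reference, a minimal C model of what the rep_8byte strategy boils down to: fill eight bytes at a time with rep stosq, then finish the tail byte by byte. This is my sketch, x86-64 only; the function name just echoes the memset_rep8.c file above and is not the real routine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the rep_8byte idea: broadcast the fill byte into a qword,
   store it n/8 times with rep stosq, then write the remaining n%8
   bytes one by one.  x86-64 inline asm; illustrative, not the patch. */
static void memset_rep8(void *dst, int c, size_t n)
{
    uint64_t v = (uint8_t)c * 0x0101010101010101ULL; /* broadcast byte */
    void *d = dst;
    size_t q = n / 8;
    __asm__ volatile ("rep stosq"
                      : "+D" (d), "+c" (q)   /* rdi = dest, rcx = count */
                      : "a" (v)              /* rax = fill pattern */
                      : "memory");
    unsigned char *p = d;                    /* rdi now points at tail */
    for (size_t i = 0; i < n % 8; i++)
        p[i] = (unsigned char)c;
}
```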

When the data is already in L1 cache, our loops are more effective.
When I increased cache pressure so that the data sits in L2 cache, our
loops are still better.
But when the data is in L3 cache or in main memory, the rep
implementation is significantly faster on Nehalem, Core 2, and Ivy Bridge.

On the other hand, on Bulldozer the rep implementation is always slower.

For the rest of the architectures the results are chaotic.

Whether to switch to rep foosq depends on memory behavior, and I do not
have a simple answer other than deciding by profile-based optimization.
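Short of full profile-based optimization, one crude heuristic suggested by the numbers above is a size threshold: large blocks probably miss cache, which is where rep won on several chips. A sketch of that dispatch; the bound and both helper names are my assumptions, and the helpers here are stand-ins for the real loop and rep routines.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical size-threshold dispatch: blocks past a cache-ish bound
   take the rep stosq path (which won for L3/main memory on Nehalem,
   Core 2, Ivy Bridge); smaller fills keep the vector loop.  The bound
   needs per-machine tuning; both stubs stand in for the real code. */
#define REP_BOUND (2u << 20)  /* assumed ~last-level-cache boundary */

static void *memset_loop_stub(void *d, int c, size_t n) { return memset(d, c, n); }
static void *memset_rep_stub(void *d, int c, size_t n)  { return memset(d, c, n); }

static void *memset_auto(void *d, int c, size_t n)
{
    if (n >= REP_BOUND)
        return memset_rep_stub(d, c, n);  /* likely cache-missing: rep */
    return memset_loop_stub(d, c, n);     /* cache-resident: loop */
}
```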

The second experiment was to see how effective a computed jump could be.
I eliminated several overheads, such as improving the jump table's cache
usage. My table goes up to size 1024, but if I cut it down to the same
size as the headers of the other implementations, the table would be
more space-efficient.
Performance is still inferior, though. See the files memset/cpy_tbl.s.
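To make the computed-jump idea concrete, here is a toy C version: index a table of per-size handlers instead of running a chain of size comparisons. The real code is assembly with a table up to 1024 entries; four entries and the names below are mine, purely for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Computed-jump sketch for small memset: one indirect call through a
   size-indexed table replaces a compare-and-branch header. */
typedef void (*set_fn)(unsigned char *d, unsigned char c);

static void set_0(unsigned char *d, unsigned char c) { (void)d; (void)c; }
static void set_1(unsigned char *d, unsigned char c) { d[0] = c; }
static void set_2(unsigned char *d, unsigned char c) { d[0] = d[1] = c; }
static void set_3(unsigned char *d, unsigned char c) { d[0] = d[1] = d[2] = c; }

static const set_fn set_table[4] = { set_0, set_1, set_2, set_3 };

static void memset_small(void *dst, int c, size_t n)
{
    if (n < 4)
        set_table[n](dst, (unsigned char)c);  /* the computed jump */
    else
        memset(dst, c, n);                    /* past the table */
}
```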

The third one is Haswell-specific: I am considering increasing the
memset/memcpy header to handle up to 512 bytes. For memcpy I need to
store additional data in ymm registers. A memset could be done without
that, but I am still interested in whether this extension
(memset/cpy_512.s) is successful.
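The control flow such a header would use can be modeled portably: move 32-byte (ymm-sized) chunks from the front, then one overlapping chunk at the tail, so no byte-granular loop is needed for any size up to 512. A sketch under those assumptions; in the real thing the chunk moves would be ymm loads/stores, and the name is mine.

```c
#include <stddef.h>
#include <string.h>

/* Portable model of a 512-byte memcpy header: 32-byte chunks from the
   front plus one overlapping tail chunk.  Overlapping stores are fine
   because memcpy's src and dst may not overlap each other. */
static void copy_upto_512(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    if (n >= 32) {
        size_t i = 0;
        while (i + 32 < n) {                 /* whole 32-byte chunks */
            memcpy(d + i, s + i, 32);
            i += 32;
        }
        memcpy(d + n - 32, s + n - 32, 32);  /* overlapping tail chunk */
    } else {
        for (size_t i = 0; i < n; i++)       /* sub-32-byte fallback */
            d[i] = s[i];
    }
}
```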

Then I discovered that my memcpy implementation could be improved in
several cases, so I wrote memcpy_new_tuned.s, which tries to use more
effective control flow.

Comments, new ideas?
