[PATCH] malloc: Optimize small memory zeroing for calloc

H.J. Lu hjl.tools@gmail.com
Wed Nov 27 07:45:54 GMT 2024


For memory sizes up to 9 * sizeof (INTERNAL_SIZE_T) bytes, calloc has
special code to clear the memory.  Add calloc-clear-memory.h to allow
architecture-specific optimization.  On x86-64, it uses up to 1 branch,
instead of 3, and up to 5 stores, instead of 9, by using overlapping
vector stores:

Test Platform: Xeon-8380
Bench Function: calloc
Ratio: New / Original time_per_iteration (Lower is Better)

Threads#   | Ratio
-----------|------
1 thread   | 0.953
4 threads  | 0.952
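
For readers unfamiliar with the overlapping-store technique, here is a
minimal, portable sketch of the idea.  It is not the code in the attached
patch.  It assumes 16-byte stores and sizes between 24 and 72 bytes, and
the names store16_zero, clear_small and clear_small_self_check are
hypothetical.  A 16-byte memcpy from a zeroed block stands in for a vector
store; compilers typically lower it to a single unaligned store.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: one 16-byte store of zeros.  The fixed-size memcpy
   stands in for an unaligned vector store.  */
static inline void
store16_zero (unsigned char *p)
{
  static const unsigned char zero[16] = { 0 };
  memcpy (p, zero, sizeof zero);
}

/* Hypothetical sketch: zero N bytes at MEM, assuming 24 <= N <= 72.
   Stores may overlap each other but never touch bytes outside [0, N).
   Apart from the precondition assert, there is a single branch.  */
static void
clear_small (void *mem, size_t n)
{
  unsigned char *p = mem;
  assert (n >= 24 && n <= 72);

  store16_zero (p);              /* Covers [0, 16).  */
  store16_zero (p + n - 16);     /* Covers [n - 16, n).  */
  if (n <= 32)
    return;                      /* Two stores suffice up to 32 bytes.  */

  store16_zero (p + 16);         /* Extends the head to [0, 32).  */
  store16_zero (p + n - 32);     /* Extends the tail to [n - 32, n).  */
  store16_zero (p + n / 2 - 8);  /* Fills the middle gap when n > 64.  */
}

/* Hypothetical self-check: clear every size in the assumed range inside a
   guarded buffer and return the number of sizes that were cleared
   incorrectly or that wrote outside the requested range.  */
static int
clear_small_self_check (void)
{
  int failures = 0;
  for (size_t n = 24; n <= 72; n++)
    {
      unsigned char buf[16 + 72 + 16];
      memset (buf, 0xff, sizeof buf);
      clear_small (buf + 16, n);
      for (size_t i = 0; i < sizeof buf; i++)
        {
          unsigned char want = (i >= 16 && i < 16 + n) ? 0 : 0xff;
          if (buf[i] != want)
            {
              failures++;
              break;
            }
        }
    }
  return failures;
}
```

In this sketch the first two stores handle every size up to 32 bytes, and
the remaining three handle sizes up to 72 bytes, so the worst case is five
stores behind one size comparison.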

OK for master?

Thanks.

-- 
H.J.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-malloc-Optimize-small-memory-zeroing-for-calloc.patch
Type: text/x-patch
Size: 5780 bytes
Desc: not available
URL: <https://sourceware.org/pipermail/libc-alpha/attachments/20241127/629e5b41/attachment.bin>

