[PATCH v4] malloc: Optimize small memory zeroing for calloc
Guo, Wangyang
wangyang.guo@intel.com
Fri Nov 29 07:49:27 GMT 2024
On 11/29/2024 8:39 AM, H.J. Lu wrote:
> Add calloc-clear-memory.h to clear memory for calloc. Use overlapping
> stores with 1 branch, instead of up to 3. Unroll 17 times on x86-64 to
> support 32-byte vector stores as well as x32 when compiling glibc with
> -march=x86-64-v3.
>
> Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
> ---
> malloc/malloc-internal.h | 1 +
> malloc/malloc.c | 35 +--------------
> sysdeps/generic/calloc-clear-memory.h | 48 +++++++++++++++++++++
> sysdeps/x86_64/calloc-clear-memory.h | 61 +++++++++++++++++++++++++++
> 4 files changed, 111 insertions(+), 34 deletions(-)
> create mode 100644 sysdeps/generic/calloc-clear-memory.h
> create mode 100644 sysdeps/x86_64/calloc-clear-memory.h
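
For readers who have not seen the technique, the overlapping-store idea in the quoted description can be sketched roughly as below. This is an illustration only, not the code from the patch: the names clear_words_sketch and check_clear, the 3-to-9-word range, and the exact branch structure are all assumptions made for the example. The point is that unconditional head and tail stores, which are allowed to overlap, cover several sizes without a separate branch per size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the patch's code): zero NWORDS words at D.
   Sizes outside the assumed 3..9 word range go through memset.  */
static void
clear_words_sketch (size_t *d, size_t nwords)
{
  if (nwords < 3 || nwords > 9)
    {
      memset (d, 0, nwords * sizeof (size_t));
      return;
    }

  /* Head and tail stores are valid for any nwords >= 3.  For 3 or 4
     words they overlap, which is harmless because every store writes
     the same value.  Together they cover up to 5 words.  */
  d[0] = 0;
  d[1] = 0;
  d[2] = 0;
  d[nwords - 2] = 0;
  d[nwords - 1] = 0;

  /* One more pair of head and tail stores covers 6..9 words.  */
  if (nwords > 5)
    {
      d[3] = 0;
      d[4] = 0;
      d[nwords - 4] = 0;
      d[nwords - 3] = 0;
    }
}

/* Hypothetical checker: returns 1 if exactly NWORDS words were zeroed
   and every word outside that range still holds the fill pattern.  */
static int
check_clear (size_t nwords)
{
  size_t buf[16];
  assert (nwords <= 14);

  for (size_t i = 0; i < 16; i++)
    buf[i] = (size_t) -1;

  clear_words_sketch (buf + 1, nwords);

  if (buf[0] != (size_t) -1)
    return 0;
  for (size_t i = 1; i <= nwords; i++)
    if (buf[i] != 0)
      return 0;
  for (size_t i = nwords + 1; i < 16; i++)
    if (buf[i] != (size_t) -1)
      return 0;
  return 1;
}
```

Calling check_clear for each size from 0 to 14 exercises both the overlapping path and the memset fallback. With vector stores the compiler may merge adjacent word stores, which is presumably why a wider unroll pays off at -march=x86-64-v3, but that part is not modelled here.
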
Here is the performance data:

Test Platform: Xeon-8380
Ratio: New / Original time_per_iteration (Lower is Better)
Original Commit: "Add tcache path for calloc"

Without ISA level 3, CFLAGS="-march=x86-64-v2 -O3"

Threads#   | Ratio
-----------|------
1 thread   | 0.986
4 threads  | 0.962

With ISA level 3, CFLAGS="-march=x86-64-v3 -O3"

Threads#   | Ratio
-----------|------
1 thread   | 1.006 (within run-to-run variance)
4 threads  | 0.969