[PATCH v4] malloc: Optimize small memory zeroing for calloc

Guo, Wangyang wangyang.guo@intel.com
Fri Nov 29 07:49:27 GMT 2024


On 11/29/2024 8:39 AM, H.J. Lu wrote:

> Add calloc-clear-memory.h to clear memory for calloc.  Use overlapping
> stores with 1 branch, instead of up to 3.  Unroll 17 times on x86-64 to
> support 32-byte vector stores as well as x32 when compiling glibc with
> -march=x86-64-v3.
>
> Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
> ---
>   malloc/malloc-internal.h              |  1 +
>   malloc/malloc.c                       | 35 +--------------
>   sysdeps/generic/calloc-clear-memory.h | 48 +++++++++++++++++++++
>   sysdeps/x86_64/calloc-clear-memory.h  | 61 +++++++++++++++++++++++++++
>   4 files changed, 111 insertions(+), 34 deletions(-)
>   create mode 100644 sysdeps/generic/calloc-clear-memory.h
>   create mode 100644 sysdeps/x86_64/calloc-clear-memory.h

Here is the performance data:

Test Platform: Xeon-8380
Ratio: New / Original time_per_iteration (Lower is Better)
Original Commit: "Add tcache path for calloc"

Without ISA level 3, CFLAGS="-march=x86-64-v2 -O3"
Threads#  | Ratio
----------|------
1 thread  | 0.986
4 threads | 0.962

With ISA level 3, CFLAGS="-march=x86-64-v3 -O3"
Threads#  | Ratio
----------|------
1 thread  | 1.006 (within run-to-run variation)
4 threads | 0.969
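
For readers unfamiliar with the "overlapping stores" idea in the patch
description, here is a minimal sketch. It is not the code from
calloc-clear-memory.h; the helper name clear_8_to_16 and the 8..16 byte
range are assumptions chosen for illustration only. Any size in that range
is cleared with two fixed-width 8-byte stores, one anchored at the start
and one at the end, so no branch on the exact size is needed. The two
stores simply overlap when the size is below 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only: zero N bytes at P,
   where 8 <= N <= 16, using two possibly overlapping 8-byte stores.  */
static void
clear_8_to_16 (void *p, size_t n)
{
  const uint64_t zero = 0;

  assert (n >= 8 && n <= 16);

  /* Clears bytes [0, 8).  memcpy with a constant size is normally
     compiled to a single unaligned store.  */
  memcpy (p, &zero, sizeof zero);

  /* Clears bytes [n - 8, n).  For n < 16 this overlaps the first
     store, which is harmless because both write zeros.  */
  memcpy ((char *) p + n - sizeof zero, &zero, sizeof zero);
}
```

The same pattern extends to wider stores (for example 16-byte or 32-byte
vectors), which is presumably where the ISA level of the build starts to
matter.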


