[PATCH v4 3/3] malloc: Add tcache path for calloc

H.J. Lu hjl.tools@gmail.com
Tue Nov 26 21:58:13 GMT 2024


On Tue, Nov 26, 2024 at 5:39 PM Guo, Wangyang <wangyang.guo@intel.com> wrote:
>
> On 11/26/2024 5:08 PM, H.J. Lu wrote:
>
> > On Tue, Nov 26, 2024 at 3:37 PM Wangyang Guo <wangyang.guo@intel.com> wrote:
> >> This commit adds tcache support to calloc(), which can greatly improve
> >> the performance of small allocations, especially in multi-threaded
> >> scenarios. clear_mem() and tcache_available() are split out as helper
> >> functions for better code reuse.
> >>
> >> Also fix the tst-safe-linking failure after enabling tcache. Previously,
> >> calloc() was used as a way to bypass tcache in memory allocation and
> >> trigger the safe-linking check in the fastbins path. With tcache enabled,
> >> the test needs extra workarounds to bypass tcache.
> >>
> >> Result of bench-malloc-thread benchmark
> >>
> >> Test Platform: Xeon-8380
> >> Bench Function: calloc
> >> Ratio: New / Original time_per_iteration (Lower is Better)
> >>
> >> Threads#   | Ratio
> >> -----------|------
> >> 1 thread   | 0.724
> >> 4 threads  | 0.534
> >>
> > Since you are working on calloc, please try this patch to see if
> > it improves performance on x86-64.
> >
> > Thanks.
>
> Looks like the change is within run-to-run variation. For the
> bench-malloc-thread benchmark, this area is not very hot.
>
> Test Platform: Xeon-8380
> Bench Function: calloc
> Ratio: New / Original time_per_iteration (Lower is Better)
>
> Threads#   | Ratio
> -----------|------
> 1 thread   | 0.993
> 4 threads  | 0.996
>

This patch reduces the number of branches from 3 to 1.  How
does it perform?
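
For readers without the attachment: the patch is not reproduced in this
message, so the following is only a minimal sketch of what going from three
branches to one can look like when zeroing a small block. It is not the
attached patch, and both function names are hypothetical. The first function
is modeled loosely on the unrolled nested-branch style of small clear and
assumes an odd word count between 3 and 9. The second uses overlapping
stores so that a single size test remains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only, not glibc code.
   Nested form: three conditional branches on the word count.
   Assumes n is odd and 3 <= n <= 9, as the thresholds imply.  */
static void
clear_words_nested (size_t *d, size_t n)
{
  assert (n >= 3 && n <= 9 && (n & 1) == 1);
  d[0] = 0; d[1] = 0; d[2] = 0;
  if (n > 4)
    {
      d[3] = 0; d[4] = 0;
      if (n > 6)
        {
          d[5] = 0; d[6] = 0;
          if (n > 8)
            {
              d[7] = 0; d[8] = 0;
            }
        }
    }
}

/* Hypothetical illustration only, not glibc code.
   Overlapping-store form: one size test (besides the assert).
   A fixed run of stores from the front and the same-length run
   ending at the last word together cover every word; words in
   the middle may be written twice.  Valid for 3 <= n <= 9.  */
static void
clear_words_one_branch (size_t *d, size_t n)
{
  assert (n >= 3 && n <= 9);
  if (n < 5)
    {
      size_t *t = d + n - 3;    /* last three words */
      d[0] = 0; d[1] = 0; d[2] = 0;
      t[0] = 0; t[1] = 0; t[2] = 0;
    }
  else
    {
      size_t *t = d + n - 5;    /* last five words */
      d[0] = 0; d[1] = 0; d[2] = 0; d[3] = 0; d[4] = 0;
      t[0] = 0; t[1] = 0; t[2] = 0; t[3] = 0; t[4] = 0;
    }
}
```

The second form trades a few redundant stores for fewer size tests. Whether
that is a win depends on the workload and the CPU, which is why it needs
measuring.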

-- 
H.J.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-malloc-Optimize-small-memory-zeroing-for-calloc.patch
Type: text/x-patch
Size: 5559 bytes
Desc: not available
URL: <https://sourceware.org/pipermail/libc-alpha/attachments/20241127/cc8cbcd5/attachment-0001.bin>

