[PATCH v3 2/5] malloc: Avoid func call for tcache quick path in free()
H.J. Lu
hjl.tools@gmail.com
Mon Nov 25 06:38:58 GMT 2024
On Mon, Nov 25, 2024 at 2:35 PM Guo, Wangyang <wangyang.guo@intel.com> wrote:
>
> On 11/25/2024 2:16 PM, H.J. Lu wrote:
>
> On Mon, Nov 25, 2024 at 1:37 PM Guo, Wangyang <wangyang.guo@intel.com> wrote:
>
> On 11/25/2024 12:25 PM, H.J. Lu wrote:
>
> On Thu, Aug 29, 2024 at 2:31 PM Wangyang Guo <wangyang.guo@intel.com> wrote:
>
> Tcache is an important optimization that accelerates free(), so the code
> on this path should be kept as simple as possible. This commit removes
> the function call made when free() takes the tcache code path.
>
> Result of bench-malloc-thread benchmark
>
> Test Platform: Xeon-8380
> Ratio: New / Original time_per_iteration (Lower is Better)
>
> Threads# | Ratio
> -----------|------
> 1 thread | 0.904
> 4 threads | 0.919
>
> The performance data shows it can improve bench-malloc-thread benchmark
> by ~10% in single thread and ~8% in multi-thread scenario.
>
> ---
> Changes in v2:
> - _int_free_check() should be put outside of USE_TCACHE.
> - Link to v1: https://sourceware.org/pipermail/libc-alpha/2024-August/159359.html
> ---
> malloc/malloc.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index ef49a13ea7..264f35e1a3 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -3448,7 +3448,17 @@ __libc_free (void *mem)
>        (void)tag_region (chunk2mem (p), memsize (p));
>
>        ar_ptr = arena_for_chunk (p);
> -      _int_free (ar_ptr, p, 0);
> +      INTERNAL_SIZE_T size = chunksize (p);
> +      _int_free_check (ar_ptr, p, size);
> +
> +#if USE_TCACHE
> +      if (tcache_free (p, size))
> +        {
> +          __set_errno (err);
> +          return;
> +        }
> +#endif
> +      _int_free_chunk (ar_ptr, p, size, 0);
>      }
>
> Does this patch inline _int_free by hand? If yes, should
> _int_free be inlined instead?
>
> _int_free is a complex function; inlining the whole function directly will cause additional register spills and hurt performance.
>
> But does your patch inline it by hand?
>
> Yes.
>
> I think I see what you mean. Based on patch 1 (which splits _int_free into 3 functions), manual inlining is not needed here; just make _int_free 'inline' instead.
>
Yes, can you try adding inline to _int_free and compare results?
--
H.J.
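
The two options discussed above can be sketched in isolation. This is only
an illustration, not glibc code: every name below (cache_put, release_slow,
release_manual, release_wrapper, release_via_inline) is hypothetical, the
fixed-size array merely stands in for the tcache, and
__attribute__ ((noinline)) assumes GCC or Clang. Variant A spells out the
quick-path check in the caller by hand, as the quoted hunk does; variant B
keeps the same logic in a small wrapper marked inline and leaves the
placement to the compiler.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are glibc's real definitions.  */
#define CACHE_SLOTS 4

static void *cache[CACHE_SLOTS];
static size_t cache_count;
static size_t slow_path_calls;

/* Quick path: small enough to inline cheaply.  Returns true if the
   pointer was stored in the cache.  */
static inline bool
cache_put (void *p)
{
  if (cache_count < CACHE_SLOTS)
    {
      cache[cache_count++] = p;
      return true;
    }
  return false;
}

/* Slow path: kept out of line so that its register pressure does not
   leak into callers.  Here it only counts how often it is reached.  */
static __attribute__ ((noinline)) void
release_slow (void *p)
{
  (void) p;
  slow_path_calls++;
}

/* Variant A: the caller performs the quick-path check by hand.  */
void
release_manual (void *p)
{
  if (cache_put (p))
    return;
  release_slow (p);
}

/* Variant B: the same logic lives in a thin wrapper marked inline.  */
static inline void
release_wrapper (void *p)
{
  if (cache_put (p))
    return;
  release_slow (p);
}

void
release_via_inline (void *p)
{
  release_wrapper (p);
}
```

Both variants behave identically at the source level; whether they also
generate the same code and perform the same is what the requested
comparison would have to show.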