[PATCH v3 2/5] malloc: Avoid func call for tcache quick path in free()

H.J. Lu hjl.tools@gmail.com
Mon Nov 25 06:38:58 GMT 2024


On Mon, Nov 25, 2024 at 2:35 PM Guo, Wangyang <wangyang.guo@intel.com> wrote:
>
> On 11/25/2024 2:16 PM, H.J. Lu wrote:
>
> On Mon, Nov 25, 2024 at 1:37 PM Guo, Wangyang <wangyang.guo@intel.com> wrote:
>
> On 11/25/2024 12:25 PM, H.J. Lu wrote:
>
> On Thu, Aug 29, 2024 at 2:31 PM Wangyang Guo <wangyang.guo@intel.com> wrote:
>
> Tcache is an important optimization to accelerate memory free(), so
> things within this code path should be kept as simple as possible. This
> commit tries to remove the function call when free() invokes the tcache
> code path.
>
> Result of bench-malloc-thread benchmark
>
> Test Platform: Xeon-8380
> Ratio: New / Original time_per_iteration (Lower is Better)
>
> Threads#   | Ratio
> -----------|------
> 1 thread   | 0.904
> 4 threads  | 0.919
>
> The performance data show that the patch improves the bench-malloc-thread
> benchmark by ~10% in the single-thread and ~8% in the multi-thread scenario.
>
> ---
> Changes in v2:
> - _int_free_check() should be put outside of USE_TCACHE.
> - Link to v1: https://sourceware.org/pipermail/libc-alpha/2024-August/159359.html
> ---
>  malloc/malloc.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index ef49a13ea7..264f35e1a3 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -3448,7 +3448,17 @@ __libc_free (void *mem)
>        (void)tag_region (chunk2mem (p), memsize (p));
>
>        ar_ptr = arena_for_chunk (p);
> -      _int_free (ar_ptr, p, 0);
> +      INTERNAL_SIZE_T size = chunksize (p);
> +      _int_free_check (ar_ptr, p, size);
> +
> +#if USE_TCACHE
> +      if (tcache_free (p, size))
> +       {
> +         __set_errno (err);
> +         return;
> +       }
> +#endif
> +      _int_free_chunk (ar_ptr, p, size, 0);
>      }
>
> Does this patch inline _int_free by hand?  If yes, should
> _int_free be inlined instead?
>
> _int_free is a complex function. Inlining the whole function directly would cause additional register spills and hurt performance.
>
> But does your patch inline it by hand?
>
> Yes.
>
> I think I see what you mean. Based on patch 1 (which splits _int_free into 3 functions), manual inlining is not needed here; _int_free can just be marked 'inline' instead.
>

Yes, can you try adding inline to _int_free and compare results?
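
For illustration only, the shape being discussed (a small inlinable wrapper whose cache fast path returns early, with the heavy work left in a separate out-of-line function) might look like the toy sketch below. This is not glibc code: every toy_* name, the 4-slot cache, and the slow-path counter are hypothetical stand-ins, and __attribute__((noinline)) assumes GCC or Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 4 /* hypothetical cache capacity */

/* Hypothetical stand-in for a malloc chunk; not the glibc layout. */
struct toy_chunk
{
  size_t size;
  struct toy_chunk *next;
};

static _Thread_local struct toy_chunk *toy_cache_head;
static _Thread_local unsigned int toy_cache_count;
static unsigned int toy_slow_calls; /* counts trips through the slow path */

/* Fast path: a few loads and stores, cheap to inline into callers. */
static inline bool
toy_cache_put (struct toy_chunk *c)
{
  if (toy_cache_count >= TOY_CACHE_SLOTS)
    return false;
  c->next = toy_cache_head;
  toy_cache_head = c;
  toy_cache_count++;
  return true;
}

/* Slow path: deliberately kept out of line so that its register
   pressure does not leak into the caller's fast path.  */
static __attribute__ ((noinline)) void
toy_free_slow (struct toy_chunk *c)
{
  c->next = NULL;
  toy_slow_calls++;
}

/* Wrapper marked inline: sanity check, then fast path, then fall back.
   Only the check and the cache push are candidates for inlining.  */
static inline void
toy_int_free (struct toy_chunk *c)
{
  assert (c != NULL && c->size != 0);
  if (toy_cache_put (c))
    return;
  toy_free_slow (c);
}

/* Public entry point: no hand-copied body, just a call to the wrapper. */
void
toy_free (struct toy_chunk *c)
{
  if (c == NULL)
    return;
  toy_int_free (c);
}
```

Freeing six chunks through toy_free fills the four cache slots and sends the remaining two through toy_free_slow. Whether the compiler actually inlines the wrapper, and whether that matches the hand-inlined version in performance, still has to be measured.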


-- 
H.J.

