[PATCH v3 2/5] malloc: Avoid func call for tcache quick path in free()

H.J. Lu hjl.tools@gmail.com
Mon Nov 25 07:34:57 GMT 2024


On Mon, Nov 25, 2024 at 2:38 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Nov 25, 2024 at 2:35 PM Guo, Wangyang <wangyang.guo@intel.com> wrote:
> >
> > On 11/25/2024 2:16 PM, H.J. Lu wrote:
> >
> > On Mon, Nov 25, 2024 at 1:37 PM Guo, Wangyang <wangyang.guo@intel.com> wrote:
> >
> > On 11/25/2024 12:25 PM, H.J. Lu wrote:
> >
> > On Thu, Aug 29, 2024 at 2:31 PM Wangyang Guo <wangyang.guo@intel.com> wrote:
> >
> > Tcache is an important optimization to accelerate memory free(), so
> > this code path should be kept as simple as possible. This commit
> > tries to remove the function call when free() takes the tcache code path.
> >
> > Result of bench-malloc-thread benchmark
> >
> > Test Platform: Xeon-8380
> > Ratio: New / Original time_per_iteration (Lower is Better)
> >
> > Threads#   | Ratio
> > -----------|------
> > 1 thread   | 0.904
> > 4 threads  | 0.919
> >
> > The performance data shows that the patch improves the bench-malloc-thread
> > benchmark by ~10% with a single thread and ~8% with 4 threads.
> >
> > ---
> > Changes in v2:
> > - _int_free_check() should be put outside of USE_TCACHE.
> > - Link to v1: https://sourceware.org/pipermail/libc-alpha/2024-August/159359.html
> > ---
> >  malloc/malloc.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/malloc/malloc.c b/malloc/malloc.c
> > index ef49a13ea7..264f35e1a3 100644
> > --- a/malloc/malloc.c
> > +++ b/malloc/malloc.c
> > @@ -3448,7 +3448,17 @@ __libc_free (void *mem)
> >        (void)tag_region (chunk2mem (p), memsize (p));
> >
> >        ar_ptr = arena_for_chunk (p);
> > -      _int_free (ar_ptr, p, 0);
> > +      INTERNAL_SIZE_T size = chunksize (p);
> > +      _int_free_check (ar_ptr, p, size);
> > +
> > +#if USE_TCACHE
> > +      if (tcache_free (p, size))
> > +       {
> > +         __set_errno (err);
> > +         return;
> > +       }
> > +#endif
> > +      _int_free_chunk (ar_ptr, p, size, 0);
> >      }
> >
> > Does this patch inline _int_free by hand?  If yes, should
> > _int_free be inlined instead?
> >
> > _int_free is a complex function; inlining the whole function directly would cause additional register spills and hurt performance.
> >
> > But does your patch inline it by hand?
> >
> > Yes.
> >
> > I think I get what you mean. Based on patch 1 (split _int_free into 3 functions), this doesn't need manual inlining; just make _int_free 'inline' instead.
> >
>
> Yes, can you try adding inline to _int_free and compare results?
>
>

Since this patch inlines _int_free by hand, if _int_free is changed,
one has to re-inline it by hand.  This may lead to subtle bugs.
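
A minimal sketch of the alternative, using a toy model rather than the real
malloc code: the free logic is defined once as an inline wrapper, the small
fast path is inline, and the slow path stays out of line.  All names here
(toy_chunk, toy_cache_put, toy_free_slow, toy_free) are hypothetical and do
not exist in glibc, and the noinline attribute is a GCC/Clang extension that
I am assuming for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not glibc's.  */
#define TOY_CACHE_MAX 4

struct toy_chunk
{
  size_t size;
  struct toy_chunk *next;
};

static struct toy_chunk *toy_cache;  /* Would be per-thread in a real tcache.  */
static size_t toy_cache_count;
static size_t toy_slow_calls;        /* Counts entries into the slow path.  */

/* Fast path: small enough to be inlined into every caller.  */
static inline bool
toy_cache_put (struct toy_chunk *c)
{
  if (toy_cache_count >= TOY_CACHE_MAX)
    return false;
  c->next = toy_cache;
  toy_cache = c;
  ++toy_cache_count;
  return true;
}

/* Slow path: kept out of line so it does not add register pressure
   to callers (assumes the GCC/Clang noinline attribute).  */
static __attribute__ ((noinline)) void
toy_free_slow (struct toy_chunk *c)
{
  assert (c != NULL);
  c->next = NULL;
  ++toy_slow_calls;
}

/* Single definition of the free logic.  Callers get the fast path
   inlined by the compiler, so nothing is duplicated by hand.  */
static inline void
toy_free (struct toy_chunk *c)
{
  if (toy_cache_put (c))
    return;
  toy_free_slow (c);
}
```

With this shape, a later change to toy_free is picked up by every caller
automatically; there is no second, hand-copied sequence to keep in sync.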

-- 
H.J.

