[PATCH v3 3/5] malloc: Arena is not needed for tcache path in free()
H.J. Lu
hjl.tools@gmail.com
Tue Nov 26 06:20:25 GMT 2024
On Tue, Nov 26, 2024, 11:41 AM Guo, Wangyang <wangyang.guo@intel.com> wrote:
>
> On 11/26/2024 11:23 AM, H.J. Lu wrote:
>
>
>
> On Tue, Nov 26, 2024, 10:55 AM Guo, Wangyang <wangyang.guo@intel.com> wrote:
>>
>> On 11/25/2024 7:58 PM, H.J. Lu wrote:
>>
>> On Mon, Nov 25, 2024 at 6:34 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Thu, Aug 29, 2024 at 2:31 PM Wangyang Guo <wangyang.guo@intel.com> wrote:
>>
>> Arena is not needed for _int_free_check() in non-DEBUG mode. This commit
>> defers arena deference to _int_free_chunk() thus accelerate tcache path.
>> When DEBUG enabled, arena can be obtained from p in do_check_inuse_chunk().
>>
>> Result of bench-malloc-thread benchmark
>>
>> Test Platform: Xeon-8380
>> Ratio: New / Original time_per_iteration (Lower is Better)
>>
>> Threads# | Ratio
>> -----------|------
>> 1 thread | 0.994
>> 4 threads | 0.968
>>
>> The data shows it can brings 3% performance gain in multi-thread scenario.
>>
>> ---
>> Changes in v2:
>> - _int_free_check() should be put outside of USE_TCACHE.
>> - Link to v1: https://sourceware.org/pipermail/libc-alpha/2024-August/159360.html
>> ---
>> malloc/malloc.c | 10 ++++++++--
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/malloc/malloc.c b/malloc/malloc.c
>> index 264f35e1a3..efb5292e9f 100644
>> --- a/malloc/malloc.c
>> +++ b/malloc/malloc.c
>> @@ -2143,6 +2143,9 @@ do_check_inuse_chunk (mstate av, mchunkptr p)
>> {
>> mchunkptr next;
>>
>> + if (av == NULL)
>> + av = arena_for_chunk (p);
>> +
>> do_check_chunk (av, p);
>>
>> if (chunk_is_mmapped (p))
>> @@ -3447,9 +3450,10 @@ __libc_free (void *mem)
>> /* Mark the chunk as belonging to the library again. */
>> (void)tag_region (chunk2mem (p), memsize (p));
>>
>> - ar_ptr = arena_for_chunk (p);
>> INTERNAL_SIZE_T size = chunksize (p);
>> - _int_free_check (ar_ptr, p, size);
>> + /* av is not needed for _int_free_check in non-DEBUG mode,
>> + in DEBUG mode, av will fetch from p in do_check_inuse_chunk. */
>> + _int_free_check (NULL, p, size);
>>
>> If _int_free_check doesn't need av, can you push it further down?
>>
>> Something like this?
>>
>> The problem is _init_free is also used in other places. We can not make sure `av` always got from `arena_for_chunk (p)`.
>
>
> Have you looked at my change?
>
> Your patch try to remove `av` parameters in _int_free_check:
>
> @@ -4684,7 +4693,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
>
> size = chunksize (p);
>
> - _int_free_check (av, p, size);
> + _int_free_check (p, size);
>
> And fetch `av` by `arena_for_chunk (p)`.
>
> +static inline void +do_check_inuse_chunk_fast (mstate av, mchunkptr p) +{ + mstate av = arena_for_chunk (p); + do_check_inuse_chunk (av, p); +}
>
> My point is for _int_free(av, p, have_lock), if `av` is not equals to `arena_for_chunk (p)`, then we can not simply omit it from `_int_free_check`.
Is inlining _int_free good enough for performance improvement?
H.J.
More information about the Libc-alpha
mailing list