[PATCH v2 0/4] malloc: Improve Huge Page support

Adhemerval Zanella adhemerval.zanella@linaro.org
Thu Aug 19 12:04:43 GMT 2021



On 19/08/2021 08:48, Siddhesh Poyarekar wrote:
> On 8/19/21 4:56 PM, Adhemerval Zanella wrote:
>> I thought about it, and decided to use two tunables because, although
>> for mmap() system allocation both tunables are mutually exclusive
>> (since it does not make sense to madvise() a mmap(MAP_HUGETLB)
>> mapping), we still use sbrk() on the main arena.  The way I did it for
>> sbrk() is to align to the THP page size advertised by the kernel, so
>> using the tunable does change the behavior slightly (it is not as
>> 'transparent' as the madvise() call).
>>
>> So using only one tunable would require either dropping the sbrk()
>> madvise() when MAP_HUGETLB is used, moving it to another tunable (say
>> '3: HugeTLB enabled with default hugepage size and madvise() on sbrk()'),
>> or assuming it whenever huge pages should be used.
>>
>> (And how do we handle sbrk() with an explicit size?)
>>
>> If one tunable is preferable I think it would be something like:
>>
>> 0: Disabled (default)
>> 1: Transparent, where we emulate the "always" behaviour of THP;
>>     sbrk() is also aligned to the huge page size and madvise() is issued
>> 2: HugeTLB enabled with the default hugepage size; sbrk() is
>>     handled as in 1
>> <size>: HugeTLB enabled with the specified page size; sbrk() is
>>     handled as in 1
>>
>> Forcing the sbrk() alignment and madvise() for all tunable values sets
>> the expectation that huge pages are used on all possible occasions.
> 
> What do you think about using mmap instead of sbrk for (2) and <size> if hugetlb is requested?  It kinda emulates what libhugetlbfs does and makes the behaviour more consistent with what is advertised by the tunables.

I think this would be an additional tunable; we still need to handle
the case where mmap() fails, either on the default path (due to the
kernel limit on the number of mappings per process) or when the pool
is exhausted for MAP_HUGETLB.
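
Something like the sketch below is what I have in mind for the fallback
(just an illustration, not the actual sysmalloc code; hp_pagesize and
hp_flags are placeholder names for the huge page size and the
MAP_HUGETLB flags selected by the tunable):

  #include <stddef.h>
  #include <sys/mman.h>

  /* Try an explicit huge page mapping first and fall back to a regular
     mmap() if the kernel rejects it (pool exhausted, mapping limit
     reached, etc.).  */
  static void *
  mmap_maybe_hugetlb (size_t size, size_t hp_pagesize, int hp_flags)
  {
    if (hp_flags != 0)
      {
        /* Round the request up to a multiple of the huge page size.  */
        size_t hp_size = (size + hp_pagesize - 1) & ~(hp_pagesize - 1);
        void *p = mmap (NULL, hp_size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | hp_flags, -1, 0);
        if (p != MAP_FAILED)
          return p;
      }
    /* Default path: plain anonymous mapping.  */
    void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
  }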

So for the sbrk() call, should we align the increment to the huge page
size and issue the madvise() if the tunable is set to use huge pages?
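
For the sbrk() path the sketch would be roughly as below (again just an
illustration; thp_pagesize stands for the value the kernel advertises in
/sys/kernel/mm/transparent_hugepage/hpage_pmd_size, and I assume the old
break is already page aligned):

  #include <stdint.h>
  #include <unistd.h>
  #include <sys/mman.h>

  /* Align the brk increment to the THP size advertised by the kernel
     and advise the new region so khugepaged may back it with huge
     pages.  */
  static void *
  sbrk_thp (intptr_t increment, size_t thp_pagesize)
  {
    /* Round the increment up to a multiple of the THP size.  */
    increment = (intptr_t) (((size_t) increment + thp_pagesize - 1)
                            & ~(thp_pagesize - 1));
    void *brk = sbrk (increment);
    if (brk != (void *) -1)
      /* Hint that the newly added heap area may use huge pages.  */
      madvise (brk, increment, MADV_HUGEPAGE);
    return brk;
  }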

> 
>>> A simple test like below in benchtests would be very useful to at least get an initial understanding of the behaviour differences with different tunable values.  Later those who care can add more relevant workloads.
>>
>> Yeah, I am open to suggestions on how to properly test it.  The issue
>> is that we need a specific system configuration, either with proper
>> kernel support (THP) or with reserved huge pages, to actually test
>> it.
>>
>> For THP the issue is that it is really 'transparent' to the user,
>> which means we will need to poke at specific Linux sysfs information
>> to check whether huge pages are being used.  And we might not get the
>> expected answer depending on the system load and memory utilization
>> (the advised pages might not be promoted to huge pages if there is
>> not enough memory).
> 
> For benchmarking we can make a minimal assumption that the user will set the system up to appropriately isolate the benchmarks.  As for the sysfs setup, we can always test and bail if unsupported.
> 
>>> You could add tests similar to mcheck and malloc-check, i.e. add $(tests-hugepages) to run all malloc tests again with the various tunable values.  See tests-mcheck for example.
>>
>> Ok, I can work with this.  This might not add much if the system is
>> not configured with either THP or a huge page pool, but at least it
>> adds some coverage.
> 
> Yeah the main intent is to simply ensure that there are no differences in behaviour with hugepages.

Alright, I will add some tunable usage then.
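
For the tests I plan to check upfront whether THP is available and bail
out otherwise, something along these lines (a sketch only; the actual
test would then inspect AnonHugePages in /proc/self/smaps after the
allocations):

  #include <stdbool.h>
  #include <stdio.h>
  #include <string.h>

  /* Return true if THP is enabled on this system.  The sysfs file lists
     all modes with the active one in brackets, e.g.
     "always [madvise] never".  */
  static bool
  thp_supported (void)
  {
    FILE *f = fopen ("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (f == NULL)
      return false;
    char mode[64] = "";
    bool ok = fgets (mode, sizeof mode, f) != NULL
              && strstr (mode, "[never]") == NULL;
    fclose (f);
    return ok;
  }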

