Bug 25817 - Tune the number of malloc arenas based on the CPU affinity mask
Summary: Tune the number of malloc arenas based on the CPU affinity mask
Status: RESOLVED WONTFIX
Alias: None
Product: glibc
Classification: Unclassified
Component: malloc
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2020-04-13 22:02 UTC by Ian Lance Taylor
Modified: 2024-01-11 09:41 UTC

Flags: fweimer: security-


Description Ian Lance Taylor 2020-04-13 22:02:25 UTC
The function __get_nprocs, which is called by arena_get2 in malloc/arena.c, reads the file /sys/devices/system/cpu/online.  As far as I can tell, essentially the same information can be obtained more efficiently and more accurately by calling sched_getaffinity and counting the number of CPUs returned.

I would encourage glibc to change the code in malloc/arena.c to call sched_getaffinity if possible.
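
For illustration, here is a minimal sketch of the counting approach described above. The helper name count_affinity_cpus is hypothetical and is not a glibc function; it assumes the fixed-size cpu_set_t is wide enough for the machine, and it reports failure otherwise.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper, not part of glibc: count the CPUs in the calling
   thread's affinity mask.  Returns -1 if sched_getaffinity fails, for
   example with EINVAL when the kernel's mask is wider than cpu_set_t.  */
static int
count_affinity_cpus (void)
{
  cpu_set_t set;

  CPU_ZERO (&set);
  /* A pid of 0 means the calling thread.  */
  if (sched_getaffinity (0, sizeof (set), &set) != 0)
    return -1;
  return CPU_COUNT (&set);
}
```

Unlike reading /sys/devices/system/cpu/online, this makes a single system call and opens no file descriptor. The result reflects the CPUs the thread is allowed to run on, which can be fewer than the CPUs that are online.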

(This came up in a lengthy investigation of https://golang.org/issue/25628, in which a Go program that tried to test that there were no unexpected open file descriptors would fail because it would occasionally encounter the file descriptor opened by __get_nprocs.)
Comment 1 Florian Weimer 2020-04-14 09:13:00 UTC
Given that more and more kernel information is available only via /sys and /proc, the test expectations are not really valid. We have to open file descriptors behind the scenes occasionally, and it is impossible to predict what the future will bring in this regard.

For malloc tuning, it may indeed be more reasonable to look at the current affinity mask. But we cannot do so by changing __get_nprocs: some software relies on sysconf (_SC_NPROCESSORS_ONLN) returning 1 on a true uniprocessor system and uses that result to elide barriers. See bug 21542.

For the malloc tuning, this does not matter because the locks are there no matter whether the system is SMP or not. (Internally, we have a different way to elide atomics on some architectures, by parsing uname -v output. I'm sure you didn't want to know that.)
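
The thread does not show glibc's internal check, so the following is only a guess at what a uname -v based test could look like. Both function names are hypothetical, and the assumption that an SMP kernel puts the token "SMP" in its version string is mine, not something stated above.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: true if a kernel version string (the text that
   `uname -v` prints) contains the token "SMP".  */
static bool
version_string_says_smp (const char *version)
{
  return strstr (version, "SMP") != NULL;
}

/* Hypothetical helper: apply the check to the running kernel.  On
   failure, assume SMP, because keeping atomics is the safe choice.  */
static bool
kernel_looks_smp (void)
{
  struct utsname u;

  if (uname (&u) != 0)
    return true;
  return version_string_says_smp (u.version);
}
```

A check like this answers "is the kernel built for SMP", which is a different question from "how many CPUs may this thread use". That difference is why the affinity mask is only suggested for arena tuning and not for barrier elision.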
Comment 2 Ian Lance Taylor 2020-04-17 00:50:45 UTC
Yeah, I didn't mean to suggest that __get_nprocs should be changed.  It seems to me that the malloc tuning should be changed.

Thanks for the info on other files being opened.  Go programs call very few C library functions.  Still, it may well be that we can't run this test when linking against glibc.
Comment 3 Florian Weimer 2024-01-11 09:41:22 UTC
We tried to implement this, but it didn't work and had to be reverted:

commit 472894d2cfee5751b44c0aaa71ed87df81c8e62e
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Wed Oct 11 13:43:56 2023 -0300

    malloc: Use __get_nprocs on arena_get2 (BZ 30945)
    
    This restore the 2.33 semantic for arena_get2.  It was changed by
    11a02b035b46 to avoid arena_get2 call malloc (back when __get_nproc
    was refactored to use an scratch_buffer - 903bc7dcc2acafc).  The
    __get_nproc was refactored over then and now it also avoid to call
    malloc.
    
    The 11a02b035b46 did not take in consideration any performance
    implication, which should have been discussed properly.  The
    __get_nprocs_sched is still used as a fallback mechanism if procfs
    and sysfs is not acessible.
    
    Checked on x86_64-linux-gnu.
    Reviewed-by: DJ Delorie <dj@redhat.com>
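
As a rough sketch of the ordering the commit message describes (sysfs first, the scheduler affinity mask only as a fallback), something like the following could be written in user code. The function names are hypothetical, the procfs step is omitted, and stdio is used for brevity. glibc's own code could not use fopen here, since the point of the earlier refactoring was to keep arena_get2 from calling malloc.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: count the CPUs in a kernel CPU-list string such
   as "0-3,5,8-9".  Returns -1 on malformed or empty input.  */
static int
count_cpu_list (const char *s)
{
  int total = 0;

  while (*s != '\0' && *s != '\n')
    {
      char *end;
      long lo = strtol (s, &end, 10);
      if (end == s || lo < 0)
        return -1;
      long hi = lo;
      if (*end == '-')
        {
          /* A range "lo-hi"; parse the upper bound.  */
          s = end + 1;
          hi = strtol (s, &end, 10);
          if (end == s || hi < lo)
            return -1;
        }
      total += (int) (hi - lo + 1);
      s = end;
      if (*s == ',')
        s++;
      else if (*s != '\0' && *s != '\n')
        return -1;
    }
  return total > 0 ? total : -1;
}

/* Hypothetical helper: prefer the sysfs online list and fall back to
   the affinity mask if the file is missing or unparsable.  */
static int
count_cpus_sysfs_then_affinity (void)
{
  char buf[4096];
  FILE *f = fopen ("/sys/devices/system/cpu/online", "r");

  if (f != NULL)
    {
      char *line = fgets (buf, sizeof (buf), f);
      fclose (f);
      if (line != NULL)
        {
          int n = count_cpu_list (line);
          if (n > 0)
            return n;
        }
    }

  cpu_set_t set;
  CPU_ZERO (&set);
  if (sched_getaffinity (0, sizeof (set), &set) == 0)
    return CPU_COUNT (&set);
  return 1;  /* Last resort: assume a single CPU.  */
}
```

With this ordering, a process pinned to one CPU on a large machine still sees the full online count whenever sysfs is readable. That matches the pre-change arena sizing the commit restores.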