posix_memalign performance regression in 2.38?

Tue Aug 8 08:08:32 GMT 2023

On Mon, 2023-08-07 at 23:38 -0400, DJ Delorie wrote:
> 
> Reproduced.
> 
> In the case where I reproduced it, the most common problematic case was
> an allocation of 64-byte aligned chunks of 472 bytes, where 30 smallbin
> chunks were tested without finding a match.
> 
> The most common non-problematic case was a 64-byte-aligned request for
> 24 bytes.
> 
> There were a LOT of other size requests.  The smallest I saw was TWO
> bytes.  WHY?  I'm tempted to not fix this, to teach developers to not
> use posix_memalign() unless they REALLY need it ;-)

Have you tested this?

$ cat t.c
#include <stdlib.h>
int main(void)
{
	void *buf;
	/* 65536 cache-line-sized, cache-line-aligned blocks; never freed */
	for (int i = 0; i < (1 << 16); i++)
		posix_memalign(&buf, 64, 64);
	return 0;
}

To me this is a quite reasonable use (we just want many blocks, each of
which fits into a cache line), but it takes 17.7 seconds on my system.
Do you think people should just avoid this?  If so, we at least need to
document the issue in the manual.
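
For anyone who wants to time other alignment/size pairs (for example the
64-byte-aligned 472-byte case mentioned above), a minimal sketch of a
timing helper follows.  The name time_memalign_loop and its parameters
are hypothetical and not part of t.c; it assumes a POSIX system that
provides clock_gettime with CLOCK_MONOTONIC.  Unlike t.c it keeps the
pointers so the blocks can be freed after the clock is stopped, and that
extra bookkeeping may change the heap layout slightly.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not part of t.c): time `count` calls of
   posix_memalign(&p, alignment, size).  Returns the elapsed seconds,
   or -1.0 on failure.  Blocks stay live during the loop, as in t.c,
   and are freed only after the clock is stopped.  */
double
time_memalign_loop(size_t count, size_t alignment, size_t size)
{
	struct timespec start, end;
	void **blocks;
	size_t n = 0;
	double elapsed = -1.0;
	int failed = 0;

	if (count == 0)
		return 0.0;
	blocks = malloc(count * sizeof *blocks);
	if (blocks == NULL)
		return -1.0;

	if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		failed = 1;
	while (!failed && n < count) {
		/* posix_memalign returns an error number, not -1 */
		if (posix_memalign(&blocks[n], alignment, size) != 0)
			failed = 1;
		else
			n++;
	}
	if (!failed && clock_gettime(CLOCK_MONOTONIC, &end) == 0)
		elapsed = (double)(end.tv_sec - start.tv_sec)
			+ (double)(end.tv_nsec - start.tv_nsec) / 1e9;

	/* Cleanup happens outside the timed region. */
	while (n > 0)
		free(blocks[--n]);
	free(blocks);
	return elapsed;
}
```

Calling time_memalign_loop(1 << 16, 64, 64) from main and printing the
result with printf("%f\n", ...) would roughly correspond to the loop in
t.c.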

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University