posix_memalign performance regression in 2.38?

Tue Aug 8 15:08:19 GMT 2023

Xi Ruoyao <xry111@xry111.site> writes:
> Have you tested this?
>
> $ cat t.c
> #include <stdlib.h>
> int main()
> {
> 	void *buf;
> 	for (int i = 0; i < (1 << 16); i++)
> 		posix_memalign(&buf, 64, 64);
> }
>
> To me this is quite reasonable (if we just want many blocks each can fit
> into a cache line), but this costs 17.7 seconds on my system.  Do you
> think people just should avoid this?  If so we at least need to document
> the issue in the manual.

This is the worst possible way (at least, with glibc's malloc) to
allocate the blocks you want.  You should call mmap() once and break up
the memory it returns.  Note: this has *always* been the worst possible
way; the new code just makes "worst" worse.
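
As a rough illustration of the "mmap() once and break it up" approach, here
is a minimal sketch. The helper names (pool_create, pool_block,
pool_destroy) and the two constants are made up for this example and are not
part of glibc. It assumes Linux/glibc with MAP_ANONYMOUS available, and that
the page size is a multiple of 64, so every 64-byte slice of the
page-aligned mapping is itself 64-byte aligned.

```c
#define _DEFAULT_SOURCE		/* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define BLOCK_SIZE  64u		/* one cache line, as in the quoted test */
#define BLOCK_COUNT (1u << 16)	/* same count as the quoted loop */

/* Hypothetical helper: obtain all blocks with a single mmap() call.
   Returns NULL on failure. */
static void *pool_create(void)
{
	void *base = mmap(NULL, (size_t)BLOCK_SIZE * BLOCK_COUNT,
			  PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return base == MAP_FAILED ? NULL : base;
}

/* Hypothetical helper: address of block i.  The mapping starts on a page
   boundary, so each BLOCK_SIZE slice is BLOCK_SIZE-aligned (assuming the
   page size is a multiple of BLOCK_SIZE). */
static void *pool_block(void *base, size_t i)
{
	assert(i < BLOCK_COUNT);
	return (char *)base + i * BLOCK_SIZE;
}

/* Hypothetical helper: release the whole pool at once.
   Returns 0 on success, -1 on failure (as munmap() does). */
static int pool_destroy(void *base)
{
	return munmap(base, (size_t)BLOCK_SIZE * BLOCK_COUNT);
}
```

The trade-off in this sketch is that blocks cannot be freed individually;
the whole region goes away with one munmap() call.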

We will, of course, consider the worst-case scenario and try to optimize
or limit it.  You still should not write code like that.