Re: improving malloc

>> So, I don't disagree kernel can implement per-process unmapped memory
>> cache. however I don't see any advantage because 1) it also need take
>> mmap_sem and then it may be slower than madvise(DONTNEED) and
>> 2) As you know, using M_TRIM_THRESHOLD=-1 can avoid zeroing memory
>> completely. It is most efficient rather than any kernel mechanism.
> 1) I wanted something like madvise(DONTNEED) where memory becomes
> undefined instead zero. On access some page from cache is supplied.
> That cache can be maintained with atomic operations without lock.

First, the kernel can't expose an uninitialized page because it may contain
security-sensitive data that another process freed. The only memory we have
a chance to recycle is memory that the allocating process itself freed. So a
kernel implementation doesn't increase the cache hit ratio.
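
To make the zeroing point concrete, here is a minimal sketch (Linux only,
private anonymous mapping assumed) that dirties one page, drops it with
madvise(MADV_DONTNEED), and checks what a later read returns. The helper
name dontneed_page_reads_zero() is made up for this mail.

```c
#define _DEFAULT_SOURCE   /* MAP_ANONYMOUS and madvise() on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper, not an existing API.
 * Returns 1 if a dirtied private anonymous page reads back as all zero
 * after madvise(MADV_DONTNEED), 0 if stale data is visible, -1 on error. */
int dontneed_page_reads_zero(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, len);                 /* dirty the page */

    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    /* The next access faults in a fresh page; scan it for leftover bytes. */
    int all_zero = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            all_zero = 0;
            break;
        }
    }

    munmap(p, len);
    return all_zero;
}
```

On Linux I would expect this to return 1, which is the zero-fill cost we are
discussing. Handing back the old contents instead would only be safe for
pages this same process freed.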

Second, atomic ops don't help in this situation because 1) atomic ops are
still too expensive for an allocation cache and 2) page reclaim needs to
invalidate such a cache and can't avoid taking a full lock.

> 2) Not with big allocations. In following /proc/PID/maps are same at
> a) and c).
> #include <malloc.h>
> #include <stdlib.h>
> #include <stdio.h>
> int main(){int i,j;
>     mallopt( M_TRIM_THRESHOLD,-1);
>     char y[100];
>     scanf("%s",y); // a)
>     char *x= malloc(100000000);
>     for(i=0;i<100000000;i+=4096) x[i]=1;
>     scanf("%s",y); // b)
>     free(x);
>     scanf("%s",y); // c)
> }

Yes, that is why HPC folks use M_MMAP_THRESHOLD too. That said, the current
default parameters are not extremely fast. So I agree we have a chance to
improve them.
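
For reference, a minimal sketch of the tuning I mean, assuming glibc on a
64-bit system. The helper names and the sizes in the usage note are only
illustrative. Note that glibc caps M_MMAP_THRESHOLD (the mallopt(3) man page
gives 4*1024*1024*sizeof(long) on 64-bit), so a 100000000 byte request like
the one in your test is still served by mmap() and unmapped on free().

```c
#include <malloc.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of glibc.
 * Asks glibc malloc to never trim the heap top and to serve requests
 * below mmap_threshold bytes from the heap instead of mmap().
 * Returns 1 if both mallopt() calls succeeded, 0 otherwise. */
int keep_large_blocks_on_heap(int mmap_threshold)
{
    /* -1 disables trimming, same as in the quoted test program. */
    if (mallopt(M_TRIM_THRESHOLD, -1) != 1)
        return 0;
    /* Fails if the value exceeds glibc's upper limit for this parameter. */
    if (mallopt(M_MMAP_THRESHOLD, mmap_threshold) != 1)
        return 0;
    return 1;
}

/* Allocate a block, touch every byte, and free it again.
 * Returns 1 on success, 0 if malloc() failed. */
int alloc_touch_free(size_t size)
{
    char *x = malloc(size);
    if (x == NULL)
        return 0;
    memset(x, 1, size);
    free(x);
    return 1;
}
```

Calling keep_large_blocks_on_heap(16 * 1024 * 1024) and then
alloc_touch_free(8 * 1024 * 1024) should leave the freed 8 MiB on the heap
for the next malloc(), so there is no munmap() and no zeroing on reuse.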

> Also in userspace finding what pages are used is expensive so we end
> with bigger rss.

Can you please elaborate a bit more on why finding the used pages is
expensive in userspace? I have not seen any difference between userspace and
the kernel here.

