This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: improving malloc
On Sun, Jan 06, 2013 at 05:13:06PM -0500, KOSAKI Motohiro wrote:
> >> I'm one of the kernel memory folks, and I'd like to explain what mvolatile() does.
> >> It gives the kernel a hint that the given ranges are discardable. Thus, under
> >> memory pressure, the kernel just drops such memory instead of swapping it out.
> >> This helps minimize the cost of wrongly returning memory.
> >>
> >> See https://lwn.net/Articles/531305/
> >> http://lwn.net/Articles/518130/
> >
> > When these pages are the result of fragmentation caused by a large malloc,
> > it is wasteful to swap them out.
>
> The current approach (using madvise(MADV_DONTNEED)) can also avoid swapping out.
> So mvolatile doesn't make any difference from the swapping point of view.
>
>
> > A possible alternative would be to implement most of this in userspace,
> > with a callback that tells which pages can be zeroed.
>
> No comment. As you know, a userspace implementation generally introduces
> a lot of races. I'm not sure it can give a practical performance improvement;
> however, I cannot comment on it until I see actual code.
>
>
> >> BTW, I don't understand which mechanism Ondrej's "linked list" refers
> >> to. Can anyone clarify?
> >
> > One can serve requests larger than 10 pages with nearly zero fragmentation
> > (on 64-bit systems, where address-space exhaustion is not a problem), albeit
> > quite slowly, by calling mmap/munmap instead of malloc/free.
> >
> > The zeroing done by that mmap (with some new flag) could be avoided
> > by the kernel tracking and reusing unmapped memory.
>
> So, I don't dispute that the kernel could implement a per-process cache of
> unmapped memory. However, I don't see any advantage, because 1) it would also
> need to take mmap_sem, so it may be slower than madvise(DONTNEED), and
> 2) as you know, setting M_TRIM_THRESHOLD=-1 avoids zeroing memory
> completely, which is more efficient than any kernel mechanism.
>
1) I wanted something like madvise(DONTNEED) where the memory contents become
undefined instead of zero. On access, some page from the cache is supplied.
That cache can be maintained with atomic operations, without a lock.
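2) Not with big allocations. In the following program, /proc/PID/maps is the
same at a) and c): the 100 MB request is above the mmap threshold, so malloc()
serves it with mmap() and free() returns it with munmap(), regardless of
M_TRIM_THRESHOLD.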
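#include <malloc.h>
#include <stdlib.h>
#include <stdio.h>
int main(void){
  int i;
  char y[100];
  mallopt(M_TRIM_THRESHOLD, -1);    // disable trimming of the heap top
  scanf("%99s", y); // a) each scanf pauses so /proc/PID/maps can be read
  char *x = malloc(100000000);
  if (x == NULL) return 1;
  for (i = 0; i < 100000000; i += 4096) x[i] = 1; // touch every page
  scanf("%99s", y); // b)
  free(x);
  scanf("%99s", y); // c)
  return 0;
}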
Also, in userspace, finding out which pages are in use is expensive, so we
end up with a bigger RSS.
> Maybe I'm overlooking something. When you post actual code, I can probably
> make a more productive comment.