POSIX describes the POSIX_MADV_DONTNEED parameter to posix_madvise as follows:

    POSIX_MADV_DONTNEED
        Specifies that the application expects that it will not access the specified range in the near future.

Linux describes and implements the MADV_DONTNEED parameter to madvise as follows:

    MADV_DONTNEED
        Do not expect access in the near future. (For the time being, the application is finished with the given range, so the kernel can free resources associated with it.) Subsequent accesses of pages in this range will succeed, but will result either in re-loading of the memory contents from the underlying mapped file (see mmap()) or zero-fill-on-demand pages for mappings without an underlying file.

glibc transparently forwards calls to posix_madvise to madvise. POSIX says posix_madvise shall have no effect on the semantics of access to memory, but Linux MADV_DONTNEED throws away modifications to private and anonymous mappings. This means that POSIX-conformant applications which use posix_madvise(addr, len, POSIX_MADV_DONTNEED) will corrupt data.

Suggested fix: implement posix_madvise as a small wrapper around madvise which silently discards all calls using POSIX_MADV_DONTNEED, fails for values other than POSIX_MADV_*, and forwards the remainder.
Alternate suggestion: rename POSIX_MADV_DONTNEED to POSIX_MADV_DISCARD_NP (keeping the same value) and add a new POSIX_MADV_DONTNEED which is silently ignored.
MADV_DONTNEED is nowadays described like this:

    * MADV_DONTNEED - the application is finished with the given range,
    *  so the kernel can free resources associated with it.

Where does the second part of your description come from? There has been discussion about this on lkml, but I didn't follow it. What was the outcome? Is what you say indeed true, and will it remain true?
The second part of the description comes from the madvise(2) manual page in man-pages. The kernel comment is as follows (from linux/mm/madvise.c):

    /*
     * Application no longer needs these pages. If the pages are dirty,
     * it's OK to just throw them away. The app will be more careful about
     * data it wants to keep. Be sure to free swap resources too. The
     * zap_page_range call sets things up for refill_inactive to actually free
     * these pages later if no one else has touched them in the meantime,
     * although we could add these pages to a global reuse list for
     * refill_inactive to pick up before reclaiming other pages.
     *
     * NB: This interface discards data rather than pushes it out to swap,
     * as some implementations do. This has performance implications for
     * applications like large transactional databases which want to discard
     * pages in anonymous maps after committing to backing store the data
     * that was kept in them. There is no reason to write this data out to
     * the swap area if the application is discarding it.
     *
     * An interface that causes the system to free clean pages and flush
     * dirty pages is already available as msync(MS_INVALIDATE).
     */
    static long madvise_dontneed(struct vm_area_struct * vma,
                                 struct vm_area_struct ** prev,
                                 unsigned long start, unsigned long end)

My reading of the implementation of madvise_dontneed() is that the comment is accurate. I have no knowledge of any lkml discussions, but a quick search turned up http://lkml.org/lkml/2006/1/16/105, which doesn't seem to have gone anywhere.
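To see the discard behavior the kernel comment describes, here is a small Linux-only sketch (read_back_after_dontneed is a hypothetical helper, and it assumes a private anonymous mapping): it dirties a page, applies MADV_DONTNEED, and reads the page back. With no file behind the mapping, the access succeeds but faults in a zero-filled page, so the written byte is lost.

```c
#define _DEFAULT_SOURCE 1 /* expose madvise(), MADV_DONTNEED and MAP_ANONYMOUS in glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: writes `value` into a fresh private anonymous page,
 * applies Linux MADV_DONTNEED, and returns the byte read back afterwards.
 * Returns -1 if mmap() or madvise() fails. */
int read_back_after_dontneed(unsigned char value)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    page[0] = value; /* dirty the page; there is no file to reload it from */

    if (madvise(page, len, MADV_DONTNEED) == -1) {
        munmap(page, len);
        return -1;
    }

    /* The next access succeeds, but sees a zero-fill-on-demand page. */
    int result = page[0];
    munmap(page, len);
    return result;
}
```

On Linux, read_back_after_dontneed(42) returns 0 rather than 42, which is exactly the data loss a POSIX application would not expect from advice.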
I've added code to ignore POSIX_MADV_DONTNEED for now. I'm not going to add a new POSIX_MADV_ value. It would be non-standard anyway, so people can use madvise directly.
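For comparison, a sketch of the same experiment through posix_madvise, assuming a C library that ignores POSIX_MADV_DONTNEED as described in this change (read_back_after_posix_dontneed is again a hypothetical helper). The call returns 0 and the page keeps its contents; an application that really wants the discard semantics has to call madvise(MADV_DONTNEED) itself.

```c
#define _DEFAULT_SOURCE 1 /* expose MAP_ANONYMOUS and POSIX_MADV_* in glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: writes `value` into a fresh private anonymous page,
 * passes POSIX_MADV_DONTNEED to posix_madvise(), and returns the byte read
 * back afterwards. Returns -1 if mmap() or posix_madvise() fails. */
int read_back_after_posix_dontneed(unsigned char value)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    page[0] = value;

    /* Assumption: the C library accepts this advice and ignores it instead
     * of forwarding it to the kernel's MADV_DONTNEED. */
    if (posix_madvise(page, len, POSIX_MADV_DONTNEED) != 0) {
        munmap(page, len);
        return -1;
    }

    int result = page[0]; /* contents are expected to be unchanged */
    munmap(page, len);
    return result;
}
```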
The linux-kernel thread was http://marc.theaimsgroup.com/?l=linux-kernel&m=111996850004771&w=2 in June 2005. There is also a bug entry at http://bugzilla.kernel.org/show_bug.cgi?id=6282 and another thread at http://marc.theaimsgroup.com/?l=linux-kernel&m=113745993804157&w=2
Subject: Bug 3458

CVSROOT:        /cvs/glibc
Module name:    libc
Branch:         glibc-2_5-branch
Changes by:     jakub@sourceware.org    2007-07-12 14:56:42

Modified files:
        .                       : ChangeLog
        sysdeps/unix/sysv/linux : syscalls.list
Added files:
        sysdeps/unix/sysv/linux : posix_madvise.c

Log message:
        2007-02-21  Ulrich Drepper  <drepper@redhat.com>

        [BZ #3458]
        * sysdeps/unix/sysv/linux/posix_madvise.c: New file.
        * sysdeps/unix/sysv/linux/syscalls.list: Remove posix_madvise entry.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/ChangeLog.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.10362.2.47&r2=1.10362.2.48
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/sysdeps/unix/sysv/linux/posix_madvise.c.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=NONE&r2=1.2.6.1
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/sysdeps/unix/sysv/linux/syscalls.list.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.127&r2=1.127.2.1