In multi-threaded programs, we are seeing a lot of free() aborts with MALLOC_CHECK_ turned on (our default settings) with glibc-2.10 on openSUSE:Factory. A simple testcase is not easy to make, but I suppose brute-forcing parallel free()s agressively enough would make it show up. I think this locking change is the cause. In realloc_check(), the mutex is explicitly taken when calling mem2chunk_check(), and mem2chunk_check appears to be accessing other parts of the arena which I guess is unsafe without the mutex. Shouldn't the mutex be held during mem2chunk_check()?
Created attachment 3996 [details] proposed patch
It turns out that this introduces on the other hand a deadlock if MALLOC_CHECK_=3, since malloc_printerr() tries to re-acquire the lock; the same deadlock exists in top_check() currently, BTW. I will attach a new patch as soon as I test it.
Created attachment 4001 [details] deadlock-free proposed patch Revised patch; unfortunately, the ATOMIC_FASTBINS stuff makes the code fairly ugly now... getting rid of the #if 0 bit might help a little. Without this patch, this crashes in few tens of seconds on my four-core when run with MALLOC_CHECK_=3: /* compile with -fopenmp */ #include <stdlib.h> #include <unistd.h> int main(void) { #pragma omp parallel num_threads(256) while (1) { void *ptr = malloc(rand() % 65536); usleep((rand() % 100) * 100); free(ptr); usleep((rand() % 100) * 100); } return 0; }
I just wanted to point out that the bug is still present in glibc 2.11. The second proposed patch works for me in both the testcase and (so far) in my KDE workspace with MALLOC_CHECK_ enabled. This bug is a concern for KDE developers because development versions of KDE automatically set MALLOC_CHECK_ for glibc systems to attempt maximize early error detection. It's hard when merely enabling mcheck causes crashes of its own though. Something in the combination of Qt4+glib and a couple of other KDE programs (like Okular, KTorrent, and KNotify) trips across this race quite frequently. Since there appears to be a fix I'll go ahead and inform the KDE development community so we can push for the fix to be implemented in distribution packages while it's debated for glibc.
That is quite strange, this appeared to me to have been fixed right before 2.11 release. And I cannot reproduce this bug anymore with 2.11 final. Are you sure you are seeing the bug with that glibc version? Is that vanilla or in some distribution? Does my testcase still trigger the bug for you?
(In reply to comment #5) > That is quite strange, this appeared to me to have been fixed right before 2.11 > release. And I cannot reproduce this bug anymore with 2.11 final. Are you sure > you are seeing the bug with that glibc version? Is that vanilla or in some > distribution? Does my testcase still trigger the bug for you? This is in glibc 2.11 as distributed by Gentoo that I see it. The vanilla USE flag is disabled so they apply whatever Gentoo magic it is that makes things happen. However the mcheck fix patch applied cleanly and I can't believe Gentoo would create a patch to revert that fix. According to gitweb the affected file (malloc/hooks.c) was last updated 2009-04-17 in the glibc 2.11 tag (http://sourceware.org/git/?p=glibc.git;a=history;f=malloc/hooks.c;h=622a815f32 Your testcase still triggered the bug (and quite expeditiously too).
Aha, you are right, I'm sorry; a fix was committed right after 2.11 was tagged, and in SUSE I took a later commit for our 2.11 build. Anyway, I have already cherry-picked the fix for the 2.11 stable branch and this will be included in 2.11.1, to be released before the end of November.
*** Bug 770 has been marked as a duplicate of this bug. ***
*** Bug 260998 has been marked as a duplicate of this bug. *** Seen from the domain http://volichat.com Page where seen: http://volichat.com/adult-chat-rooms Marked for reference. Resolved as fixed @bugzilla.