Bug 10282

Summary: free() race in mcheck hooks
Product: glibc Reporter: Petr Baudis <pasky>
Component: libcAssignee: Ulrich Drepper <drepper.fsp>
Status: RESOLVED FIXED    
Severity: normal CC: esigra, glibc-bugs, herrold, qboosh, rdieter
Priority: P2 Flags: fweimer: security-
Version: 2.11   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Attachments: proposed patch
deadlock-free proposed patch

Description Petr Baudis 2009-06-14 23:04:15 UTC
In multi-threaded programs, we are seeing a lot of free() aborts with
MALLOC_CHECK_ turned on (our default settings) with glibc-2.10 on
openSUSE:Factory. A simple testcase is not easy to make, but I suppose
brute-forcing parallel free()s agressively enough would make it show up.

I think this locking change is the cause. In realloc_check(), the mutex is
explicitly taken when calling mem2chunk_check(), and mem2chunk_check appears to
be accessing other parts of the arena which I guess is unsafe without the mutex.

Shouldn't the mutex be held during mem2chunk_check()?
Comment 1 Petr Baudis 2009-06-14 23:04:50 UTC
Created attachment 3996 [details]
proposed patch
Comment 2 Petr Baudis 2009-06-15 15:42:58 UTC
It turns out that this introduces on the other hand a deadlock if
MALLOC_CHECK_=3, since malloc_printerr() tries to re-acquire the lock; the same
deadlock exists in top_check() currently, BTW.

I will attach a new patch as soon as I test it.
Comment 3 Petr Baudis 2009-06-15 22:37:44 UTC
Created attachment 4001 [details]
deadlock-free proposed patch

Revised patch; unfortunately, the ATOMIC_FASTBINS stuff makes the code fairly
ugly now... getting rid of the #if 0 bit might help a little.

Without this patch, this crashes in few tens of seconds on my four-core when
run with MALLOC_CHECK_=3:

/* compile with -fopenmp */
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
#pragma omp parallel num_threads(256)
  while (1) {
    void *ptr = malloc(rand() % 65536);
    usleep((rand() % 100) * 100);
    free(ptr);
    usleep((rand() % 100) * 100);
  }
  return 0;
}
Comment 4 Michael Pyne 2009-11-16 23:15:02 UTC
I just wanted to point out that the bug is still present in glibc 2.11. The
second proposed patch works for me in both the testcase and (so far) in my KDE
workspace with MALLOC_CHECK_ enabled.

This bug is a concern for KDE developers because development versions of KDE
automatically set MALLOC_CHECK_ for glibc systems to attempt maximize early
error detection.  It's hard when merely enabling mcheck causes crashes of its
own though. Something in the combination of Qt4+glib and a couple of other KDE
programs (like Okular, KTorrent, and KNotify) trips across this race quite
frequently.

Since there appears to be a fix I'll go ahead and inform the KDE development
community so we can push for the fix to be implemented in distribution packages
while it's debated for glibc.
Comment 5 Petr Baudis 2009-11-16 23:25:43 UTC
That is quite strange, this appeared to me to have been fixed right before 2.11
release. And I cannot reproduce this bug anymore with 2.11 final. Are you sure
you are seeing the bug with that glibc version? Is that vanilla or in some
distribution? Does my testcase still trigger the bug for you?
Comment 6 Michael Pyne 2009-11-17 00:25:36 UTC
(In reply to comment #5)
> That is quite strange, this appeared to me to have been fixed right before
2.11
> release. And I cannot reproduce this bug anymore with 2.11 final. Are you
sure
> you are seeing the bug with that glibc version? Is that vanilla or in some
> distribution? Does my testcase still trigger the bug for you?

This is in glibc 2.11 as distributed by Gentoo that I see it. The vanilla USE
flag is disabled so they apply whatever Gentoo magic it is that makes things
happen. However the mcheck fix patch applied cleanly and I can't believe Gentoo
would create a patch to revert that fix.

According to gitweb the affected file (malloc/hooks.c) was last updated
2009-04-17 in the glibc 2.11 tag
(http://sourceware.org/git/?p=glibc.git;a=history;f=malloc/hooks.c;h=622a815f32

Your testcase still triggered the bug (and quite expeditiously too).
Comment 7 Petr Baudis 2009-11-17 00:49:39 UTC
Aha, you are right, I'm sorry; a fix was committed right after 2.11 was tagged,
and in SUSE I took a later commit for our 2.11 build.

Anyway, I have already cherry-picked the fix for the 2.11 stable branch and this
will be included in 2.11.1, to be released before the end of November.
Comment 8 Ondrej Bilka 2013-10-13 07:36:32 UTC
*** Bug 770 has been marked as a duplicate of this bug. ***
Comment 9 Jackie Rosen 2014-02-16 19:41:03 UTC Comment hidden (spam)