Bug 10282 - free() race in mcheck hooks
Summary: free() race in mcheck hooks
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: libc (show other bugs)
Version: 2.11
: P2 normal
Target Milestone: ---
Assignee: Ulrich Drepper
URL:
Keywords:
: 770 (view as bug list)
Depends on:
Blocks:
 
Reported: 2009-06-14 23:04 UTC by Petr Baudis
Modified: 2014-07-01 16:37 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
proposed patch (483 bytes, patch)
2009-06-14 23:04 UTC, Petr Baudis
Details | Diff
deadlock-free proposed patch (543 bytes, patch)
2009-06-15 22:37 UTC, Petr Baudis
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Petr Baudis 2009-06-14 23:04:15 UTC
In multi-threaded programs, we are seeing a lot of free() aborts with
MALLOC_CHECK_ turned on (our default settings) with glibc-2.10 on
openSUSE:Factory. A simple testcase is not easy to make, but I suppose
brute-forcing parallel free()s agressively enough would make it show up.

I think this locking change is the cause. In realloc_check(), the mutex is
explicitly taken when calling mem2chunk_check(), and mem2chunk_check appears to
be accessing other parts of the arena which I guess is unsafe without the mutex.

Shouldn't the mutex be held during mem2chunk_check()?
Comment 1 Petr Baudis 2009-06-14 23:04:50 UTC
Created attachment 3996 [details]
proposed patch
Comment 2 Petr Baudis 2009-06-15 15:42:58 UTC
It turns out that this introduces on the other hand a deadlock if
MALLOC_CHECK_=3, since malloc_printerr() tries to re-acquire the lock; the same
deadlock exists in top_check() currently, BTW.

I will attach a new patch as soon as I test it.
Comment 3 Petr Baudis 2009-06-15 22:37:44 UTC
Created attachment 4001 [details]
deadlock-free proposed patch

Revised patch; unfortunately, the ATOMIC_FASTBINS stuff makes the code fairly
ugly now... getting rid of the #if 0 bit might help a little.

Without this patch, this crashes in few tens of seconds on my four-core when
run with MALLOC_CHECK_=3:

/* compile with -fopenmp */
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
#pragma omp parallel num_threads(256)
  while (1) {
    void *ptr = malloc(rand() % 65536);
    usleep((rand() % 100) * 100);
    free(ptr);
    usleep((rand() % 100) * 100);
  }
  return 0;
}
Comment 4 Michael Pyne 2009-11-16 23:15:02 UTC
I just wanted to point out that the bug is still present in glibc 2.11. The
second proposed patch works for me in both the testcase and (so far) in my KDE
workspace with MALLOC_CHECK_ enabled.

This bug is a concern for KDE developers because development versions of KDE
automatically set MALLOC_CHECK_ for glibc systems to attempt maximize early
error detection.  It's hard when merely enabling mcheck causes crashes of its
own though. Something in the combination of Qt4+glib and a couple of other KDE
programs (like Okular, KTorrent, and KNotify) trips across this race quite
frequently.

Since there appears to be a fix I'll go ahead and inform the KDE development
community so we can push for the fix to be implemented in distribution packages
while it's debated for glibc.
Comment 5 Petr Baudis 2009-11-16 23:25:43 UTC
That is quite strange, this appeared to me to have been fixed right before 2.11
release. And I cannot reproduce this bug anymore with 2.11 final. Are you sure
you are seeing the bug with that glibc version? Is that vanilla or in some
distribution? Does my testcase still trigger the bug for you?
Comment 6 Michael Pyne 2009-11-17 00:25:36 UTC
(In reply to comment #5)
> That is quite strange, this appeared to me to have been fixed right before
2.11
> release. And I cannot reproduce this bug anymore with 2.11 final. Are you
sure
> you are seeing the bug with that glibc version? Is that vanilla or in some
> distribution? Does my testcase still trigger the bug for you?

This is in glibc 2.11 as distributed by Gentoo that I see it. The vanilla USE
flag is disabled so they apply whatever Gentoo magic it is that makes things
happen. However the mcheck fix patch applied cleanly and I can't believe Gentoo
would create a patch to revert that fix.

According to gitweb the affected file (malloc/hooks.c) was last updated
2009-04-17 in the glibc 2.11 tag
(http://sourceware.org/git/?p=glibc.git;a=history;f=malloc/hooks.c;h=622a815f32

Your testcase still triggered the bug (and quite expeditiously too).
Comment 7 Petr Baudis 2009-11-17 00:49:39 UTC
Aha, you are right, I'm sorry; a fix was committed right after 2.11 was tagged,
and in SUSE I took a later commit for our 2.11 build.

Anyway, I have already cherry-picked the fix for the 2.11 stable branch and this
will be included in 2.11.1, to be released before the end of November.
Comment 8 Ondrej Bilka 2013-10-13 07:36:32 UTC
*** Bug 770 has been marked as a duplicate of this bug. ***
Comment 9 Jackie Rosen 2014-02-16 19:41:03 UTC Comment hidden (spam)