Bugzilla Bug 10282: free() race in mcheck hooks
Last modified: 2009-12-30 19:51
Bug#: 10282   Reporter: Petr Baudis <pasky@suse.cz>
Status: RESOLVED
Resolution: FIXED
Assigned To: Ulrich Drepper <drepper@redhat.com>

Attachment                          Description                    Type    Created
glibc-2.10-mcheck-free-race.diff    proposed patch                 patch   2009-06-14 23:04
glibc-2.10-mcheck-free-race2.diff   deadlock-free proposed patch   patch   2009-06-15 22:37

Description:   Opened: 2009-06-14 23:04
In multi-threaded programs, we are seeing a lot of free() aborts with
MALLOC_CHECK_ turned on (our default setting) with glibc-2.10 on
openSUSE:Factory. A simple testcase is not easy to make, but I suppose
brute-forcing parallel free()s aggressively enough would make it show up.

I think this locking change is the cause. In realloc_check(), the mutex is
explicitly taken when calling mem2chunk_check(), and mem2chunk_check() appears
to be accessing other parts of the arena, which I guess is unsafe without the
mutex.

Shouldn't the mutex be held during mem2chunk_check()?
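
To illustrate the question, here is a minimal sketch; it is not the glibc
code. It assumes a hypothetical `toy_arena` with one mutex, a consistency
check `toy_check()` that reads several arena fields (standing in for
mem2chunk_check()), and `toy_free_checked()`, which keeps the mutex held
across both the check and the update. All `toy_*` names are invented for
illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an arena: shared state guarded by one mutex. */
struct toy_arena {
  pthread_mutex_t mutex;
  size_t live_chunks;   /* number of chunks currently allocated */
  size_t live_bytes;    /* total size of those chunks */
};

/* Stand-in for mem2chunk_check(): it reads several arena fields, so the
   caller must hold a->mutex for the answer to be consistent. */
static int toy_check(const struct toy_arena *a, size_t size)
{
  return a->live_chunks > 0 && a->live_bytes >= size;
}

/* Check and release inside a single critical section.
   Returns 0 on success and -1 if the check fails. */
static int toy_free_checked(struct toy_arena *a, size_t size)
{
  int ok;

  assert(a != NULL);
  pthread_mutex_lock(&a->mutex);
  ok = toy_check(a, size);
  if (ok) {
    a->live_chunks -= 1;
    a->live_bytes -= size;
  }
  pthread_mutex_unlock(&a->mutex);
  return ok ? 0 : -1;
}
```

If the check ran before the lock was taken, another thread could change
`live_chunks` or `live_bytes` between the check and the update, which is the
kind of window the report is worried about.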

------- Additional Comment #1 From Petr Baudis 2009-06-14 23:04 -------
Created an attachment (id=3996)
proposed patch

------- Additional Comment #2 From Petr Baudis 2009-06-15 15:42 -------
It turns out that this patch in turn introduces a deadlock when
MALLOC_CHECK_=3, since malloc_printerr() tries to re-acquire the lock; the same
deadlock currently exists in top_check(), BTW.

I will attach a new patch as soon as I test it.
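
The deadlock pattern described above can be shown in miniature. This is a
sketch under stated assumptions, not the glibc code: `toy_report()` is a
hypothetical error reporter that takes the lock itself, and the mutex is
created as PTHREAD_MUTEX_ERRORCHECK so that relocking returns EDEADLK
instead of hanging the way a default mutex would.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_ERRORCHECK in strict modes */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical error reporter that acquires the lock on its own.
   Returns the error code from pthread_mutex_lock(). */
static int toy_report(pthread_mutex_t *m)
{
  int rc;

  assert(m != NULL);
  rc = pthread_mutex_lock(m);
  if (rc == 0)
    pthread_mutex_unlock(m);
  return rc;
}

/* Problem pattern: report the error while still holding the lock. */
static int toy_check_buggy(pthread_mutex_t *m)
{
  int rc;

  pthread_mutex_lock(m);
  rc = toy_report(m);       /* relock by the same thread: EDEADLK */
  pthread_mutex_unlock(m);
  return rc;
}

/* One way out: drop the lock first, then report. */
static int toy_check_fixed(pthread_mutex_t *m)
{
  pthread_mutex_lock(m);
  /* ... detect the error while the lock is held ... */
  pthread_mutex_unlock(m);
  return toy_report(m);
}

/* Create an error-checking mutex so that relocking is reported. */
static int toy_mutex_init(pthread_mutex_t *m)
{
  pthread_mutexattr_t attr;
  int rc = pthread_mutexattr_init(&attr);

  if (rc != 0)
    return rc;
  rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
  if (rc == 0)
    rc = pthread_mutex_init(m, &attr);
  pthread_mutexattr_destroy(&attr);
  return rc;
}
```

With a mutex from `toy_mutex_init()`, `toy_check_buggy()` returns EDEADLK and
`toy_check_fixed()` returns 0. Releasing the lock before reporting is only one
possible approach; the attached patch may resolve it differently.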

------- Additional Comment #3 From Petr Baudis 2009-06-15 22:37 -------
Created an attachment (id=4001)
deadlock-free proposed patch

Revised patch; unfortunately, the ATOMIC_FASTBINS stuff makes the code fairly
ugly now... getting rid of the #if 0 bit might help a little.

Without this patch, this crashes within a few tens of seconds on my four-core
machine when run with MALLOC_CHECK_=3:

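/* compile with -fopenmp; run with MALLOC_CHECK_=3 */
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
  /* 256 threads, each allocating and freeing in an endless loop */
#pragma omp parallel num_threads(256)
  while (1) {
    void *ptr = malloc(rand() % 65536);   /* random size below 64 KiB */
    usleep((rand() % 100) * 100);         /* random pause of up to 9.9 ms */
    free(ptr);                            /* the free() that hits the race */
    usleep((rand() % 100) * 100);
  }
  return 0;   /* not reached */
}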
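
For setups without OpenMP, the same idea can be sketched with plain pthreads.
This is a bounded variant written for illustration, not the testcase that was
actually run: the `toy_*` names, the per-thread rand_r() seeds, the
nanosleep() pauses, and the fixed iteration count are all assumptions, and it
is not guaranteed to hit the race as quickly as the endless loop above.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

#define TOY_MAX_THREADS 256

struct toy_job {
  unsigned int seed;   /* per-thread seed for rand_r() */
  int iters;           /* number of malloc/free rounds to run */
};

/* Sleep for a random time of up to 9.9 ms, like usleep() in the testcase. */
static void toy_pause(unsigned int *seed)
{
  struct timespec ts = { 0, (long)(rand_r(seed) % 100) * 100000L };
  nanosleep(&ts, NULL);
}

static void *toy_worker(void *arg)
{
  struct toy_job *job = arg;

  for (int i = 0; i < job->iters; i++) {
    void *ptr = malloc((size_t)(rand_r(&job->seed) % 65536));
    toy_pause(&job->seed);
    free(ptr);
    toy_pause(&job->seed);
  }
  return NULL;
}

/* Run nthreads workers for iters rounds each.
   Returns the number of threads that were started and joined. */
static int toy_stress(int nthreads, int iters)
{
  pthread_t tids[TOY_MAX_THREADS];
  struct toy_job jobs[TOY_MAX_THREADS];
  int started = 0;

  assert(nthreads >= 0 && nthreads <= TOY_MAX_THREADS);
  for (int i = 0; i < nthreads; i++) {
    jobs[i].seed = (unsigned int)i + 1u;
    jobs[i].iters = iters;
    if (pthread_create(&tids[started], NULL, toy_worker, &jobs[i]) != 0)
      break;             /* stop at the first thread that cannot be created */
    started++;
  }
  for (int i = 0; i < started; i++)
    pthread_join(tids[i], NULL);
  return started;
}
```

Calling `toy_stress(256, 100000)` from a small main() and running the binary
with MALLOC_CHECK_=3 set in the environment would approximate the original
run; link with -pthread where the toolchain requires it.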

------- Additional Comment #4 From Michael Pyne 2009-11-16 23:15 -------
I just wanted to point out that the bug is still present in glibc 2.11. The
second proposed patch works for me in both the testcase and (so far) in my KDE
workspace with MALLOC_CHECK_ enabled.

This bug is a concern for KDE developers because development versions of KDE
automatically set MALLOC_CHECK_ on glibc systems to attempt to maximize early
error detection. It's hard when merely enabling mcheck causes crashes of its
own, though. Something in the combination of Qt4+glib and a couple of other KDE
programs (like Okular, KTorrent, and KNotify) trips across this race quite
frequently.

Since there appears to be a fix I'll go ahead and inform the KDE development
community so we can push for the fix to be implemented in distribution packages
while it's debated for glibc.

------- Additional Comment #5 From Petr Baudis 2009-11-16 23:25 -------
That is quite strange, this appeared to me to have been fixed right before 2.11
release. And I cannot reproduce this bug anymore with 2.11 final. Are you sure
you are seeing the bug with that glibc version? Is that vanilla or in some
distribution? Does my testcase still trigger the bug for you?

------- Additional Comment #6 From Michael Pyne 2009-11-17 00:25 -------
(In reply to comment #5)
> That is quite strange, this appeared to me to have been fixed right before
> 2.11 release. And I cannot reproduce this bug anymore with 2.11 final. Are you
> sure you are seeing the bug with that glibc version? Is that vanilla or in some
> distribution? Does my testcase still trigger the bug for you?

I see it in glibc 2.11 as distributed by Gentoo. The vanilla USE flag is
disabled, so they apply whatever Gentoo magic it is that makes things happen.
However, the mcheck fix patch applied cleanly, and I can't believe Gentoo
would create a patch to revert that fix.

According to gitweb the affected file (malloc/hooks.c) was last updated
2009-04-17 in the glibc 2.11 tag
(http://sourceware.org/git/?p=glibc.git;a=history;f=malloc/hooks.c;h=622a815f32

Your testcase still triggered the bug (and quite expeditiously too).

------- Additional Comment #7 From Petr Baudis 2009-11-17 00:49 -------
Aha, you are right, I'm sorry; a fix was committed right after 2.11 was tagged,
and in SUSE I took a later commit for our 2.11 build.

Anyway, I have already cherry-picked the fix for the 2.11 stable branch and this
will be included in 2.11.1, to be released before the end of November.
