Bug 16573 - mtrace hangs when MALLOC_TRACE is defined
Summary: mtrace hangs when MALLOC_TRACE is defined
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: malloc (show other bugs)
Version: 2.19
: P2 minor
Target Milestone: 2.30
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-13 11:39 UTC by Kwok Yeung
Modified: 2019-04-15 18:51 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Mtrace test program (118 bytes, text/plain)
2014-02-13 11:39 UTC, Kwok Yeung
Details
Patch to temporarily undo all mtrace hooks (641 bytes, patch)
2014-02-13 11:44 UTC, Kwok Yeung
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kwok Yeung 2014-02-13 11:39:57 UTC
Created attachment 7409 [details]
Mtrace test program

On any current Linux distribution with glibc, if mtrace_test.c is compiled and executed as follows, it displays:

$ MALLOC_TRACE="mtrace.log" ./mtrace_test
*** Error in `./mtrace_test': double free or corruption (fasttop): 0x0000000000efc460 ***

before hanging indefinitely. Checking the backtrace with GDB:

#0  __lll_lock_wait_private ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1  0x00007ffff7a99915 in _L_lock_45 () at mtrace.c:379
#2  0x00007ffff7a99088 in lock_and_info (
    caller=caller@entry=0x7ffff7de2c37 <_dl_map_object+1159>,
    mem=mem@entry=0x7fffffffd770) at mtrace.c:128
#3  0x00007ffff7a9931e in tr_mallochook (size=36,
    caller=0x7ffff7de2c37 <_dl_map_object+1159>) at mtrace.c:172
#4  0x00007ffff7de2c37 in local_strdup (
    s=0x7ffff7fe6883 "/lib/x86_64-linux-gnu/libgcc_s.so.1") at dl-load.c:162
#5  _dl_map_object (loader=loader@entry=0x7ffff7ff94c0,
    name=name@entry=0x7ffff7b94b26 "libgcc_s.so.1", type=type@entry=2,
    trace_mode=trace_mode@entry=0, mode=mode@entry=-1879048191,
    nsid=<optimised out>) at dl-load.c:2510
#6  0x00007ffff7dedd54 in dl_open_worker (a=a@entry=0x7fffffffddb8)
    at dl-open.c:239
#7  0x00007ffff7de96e6 in _dl_catch_error (
    objname=objname@entry=0x7fffffffdda8,
    errstring=errstring@entry=0x7fffffffddb0,
    mallocedp=mallocedp@entry=0x7fffffffdda0,
    operate=operate@entry=0x7ffff7dedc00 <dl_open_worker>,
    args=args@entry=0x7fffffffddb8) at dl-error.c:177
#8  0x00007ffff7ded809 in _dl_open (file=0x7ffff7b94b26 "libgcc_s.so.1",
    mode=-2147483647, caller_dlopen=<optimised out>, nsid=-2, argc=1,
    argv=0x7fffffffeb38, env=0x7fffffffeb48) at dl-open.c:667
#9  0x00007ffff7b49da2 in do_dlopen (ptr=ptr@entry=0x7fffffffdfc0)
    at dl-libc.c:87
#10 0x00007ffff7de96e6 in _dl_catch_error (objname=0x7fffffffdfa0,
    errstring=0x7fffffffdfb0, mallocedp=0x7fffffffdf90,
    operate=0x7ffff7b49d60 <do_dlopen>, args=0x7fffffffdfc0) at dl-error.c:177
#11 0x00007ffff7b49e62 in dlerror_run (args=0x7fffffffdfc0,
    operate=0x7ffff7b49d60 <do_dlopen>) at dl-libc.c:46
#12 __GI___libc_dlopen_mode (name=name@entry=0x7ffff7b94b26 "libgcc_s.so.1",
    mode=mode@entry=-2147483647) at dl-libc.c:163
#13 0x00007ffff7b242d9 in init () at ../sysdeps/x86_64/backtrace.c:52
#14 __GI___backtrace (array=array@entry=0x7fffffffe250, size=size@entry=64)
    at ../sysdeps/x86_64/backtrace.c:103
#15 0x00007ffff7a86515 in __libc_message (do_abort=do_abort@entry=2,
    fmt=fmt@entry=0x7ffff7b9a240 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:178
#16 0x00007ffff7a92996 in malloc_printerr (ptr=0x602460,
    str=0x7ffff7b9a408 "double free or corruption (fasttop)", action=3)
    at malloc.c:4923
#17 _int_free (av=<optimised out>, p=0x602450, have_lock=0) at malloc.c:3779
#18 0x00007ffff7a996b8 in tr_freehook (caller=0x400640 <main+51>, ptr=0x602460)
    at mtrace.c:158
#19 tr_freehook (ptr=0x602460, caller=0x400640 <main+51>) at mtrace.c:136
#20 0x0000000000400640 in main ()

The problem is that although tr_freehook replaces __free_hook with the old hook before calling free or the old hook, the malloc hook is still pointing to tr_mallochook. Further down the call chain, a malloc is performed, resulting in tr_mallochook being called. It tries to acquire the lock, which is still held by tr_freehook, resulting in a deadlock.
Comment 1 Kwok Yeung 2014-02-13 11:44:49 UTC
Created attachment 7410 [details]
Patch to temporarily undo all mtrace hooks

I think the easiest approach to fixing this is simply to undo all the hooks whenever an mtrace hook is called before proceeding, and redo them afterwards - I have attached a patch against the git master to do this. I suppose this might cause some memory errors to be missed in a multi-threaded program, but that is already somewhat the case already.
Comment 2 Ondrej Bilka 2014-02-13 12:24:28 UTC
On Thu, Feb 13, 2014 at 11:44:49AM +0000, kcy at codesourcery dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=16573
> 
> --- Comment #1 from Kwok Yeung <kcy at codesourcery dot com> ---
> Created attachment 7410 [details]
>   --> https://sourceware.org/bugzilla/attachment.cgi?id=7410&action=edit
> Patch to temporarily undo all mtrace hooks
> 
> I think the easiest approach to fixing this is simply to undo all the hooks
> whenever an mtrace hook is called before proceeding, and redo them afterwards -
> I have attached a patch against the git master to do this. I suppose this might
> cause some memory errors to be missed in a multi-threaded program, but that is
> already somewhat the case already.
> 
A proper solution would be replace a mtrace with better tool, as hook
approach is not thread safe and implementation is quite slow.
Comment 3 Sourceware Commits 2019-04-09 14:57:54 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  e621246ec6393ea08ae50310f9d5e72500f8c9bc (commit)
      from  648279f4af423c4783ec1dfa63cb7b46a7640217 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e621246ec6393ea08ae50310f9d5e72500f8c9bc

commit e621246ec6393ea08ae50310f9d5e72500f8c9bc
Author: Carlos O'Donell <carlos@redhat.com>
Date:   Mon Apr 8 17:35:05 2019 -0400

    malloc: Set and reset all hooks for tracing (Bug 16573)
    
    If an error occurs during the tracing operation, particularly during a
    call to lock_and_info() which calls _dl_addr, we may end up calling back
    into the malloc-subsystem and relock the loader lock and deadlock. For
    all intents and purposes the call to _dl_addr can call any of the malloc
    family API functions and so we should disable all tracing before calling
    such loader functions.  This is similar to the strategy that the new
    malloc tracer takes when calling the real malloc, namely that all
    tracing ceases at the boundary to the real function and any faults at
    that point are the purvue of the library (though the new tracer does
    this on a per-thread basis in an MT-safe fashion). Since the new tracer
    and the hook deprecation are not yet complete we must fix these issues
    where we can.
    
    Tested on x86_64 with no regressions.
    
    Co-authored-by: Kwok Cheung Yeung <kcy@codesourcery.com>
    Reviewed-by: DJ Delorie <dj@redhat.com>

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog       |   15 +++++++++++
 malloc/mtrace.c |   72 +++++++++++++++++++++++++++++++++++--------------------
 2 files changed, 61 insertions(+), 26 deletions(-)
Comment 4 Carlos O'Donell 2019-04-09 14:59:28 UTC
Fixed in 2.30.
Comment 5 Sourceware Commits 2019-04-15 18:51:24 UTC
The release/2.29/master branch has been updated by Florian Weimer <fw@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=dcd2b97dd1d695445d45beb4daa815cfe06691dd

commit dcd2b97dd1d695445d45beb4daa815cfe06691dd
Author: Carlos O'Donell <carlos@redhat.com>
Date:   Mon Apr 15 20:49:32 2019 +0200

    malloc: Set and reset all hooks for tracing (Bug 16573)
    
    If an error occurs during the tracing operation, particularly during a
    call to lock_and_info() which calls _dl_addr, we may end up calling back
    into the malloc-subsystem and relock the loader lock and deadlock. For
    all intents and purposes the call to _dl_addr can call any of the malloc
    family API functions and so we should disable all tracing before calling
    such loader functions.  This is similar to the strategy that the new
    malloc tracer takes when calling the real malloc, namely that all
    tracing ceases at the boundary to the real function and any faults at
    that point are the purvue of the library (though the new tracer does
    this on a per-thread basis in an MT-safe fashion). Since the new tracer
    and the hook deprecation are not yet complete we must fix these issues
    where we can.
    
    Tested on x86_64 with no regressions.
    
    Co-authored-by: Kwok Cheung Yeung <kcy@codesourcery.com>
    Reviewed-by: DJ Delorie <dj@redhat.com>
    (cherry picked from commit e621246ec6393ea08ae50310f9d5e72500f8c9bc)