Bug 31460 - heap tracing causes infinite recursion on calloc with multi-threaded applications
Summary: heap tracing causes infinite recursion on calloc with multi-threaded applicat...
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: gprofng (show other bugs)
Version: 2.43
: P2 normal
Target Milestone: ---
Assignee: Vladimir Mezentsev
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-03-07 16:38 UTC by James Carlson
Modified: 2024-03-26 00:11 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed: 2024-03-19 00:00:00


Attachments
Demonstrates infinite recursion with "-H on" (144 bytes, text/x-csrc)
2024-03-07 16:38 UTC, James Carlson
Details
proposed patch (2.48 KB, patch)
2024-03-24 02:09 UTC, Vladimir Mezentsev
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description James Carlson 2024-03-07 16:38:52 UTC
Created attachment 15391 [details]
Demonstrates infinite recursion with "-H on"

When setting "-H on" with "gprofng collect app", starting a new thread triggers infinite recursion. The crash looks like this in gdb:

#26129 0x00007f81ed538de5 in calloc (size=32, esize=16) at heaptrace.c:447
#26130 0x00007f81e7c9bbaf in ___pthread_setspecific (key=key@entry=38, value=value@entry=0x7f81e19fcac0) at ./nptl/pthread_setspecific.c:69
#26131 0x00007f81ed40bff7 in __collector_tsd_get_by_key (key_index=<optimized out>) at tsd.c:138
#26132 0x00007f81ed538de5 in calloc (size=32, esize=16) at heaptrace.c:447
#26133 0x00007f81e7c9bbaf in ___pthread_setspecific (key=key@entry=38, value=value@entry=0x7f81e19fcad0) at ./nptl/pthread_setspecific.c:69
#26134 0x00007f81ed40bff7 in __collector_tsd_get_by_key (key_index=<optimized out>) at tsd.c:138
#26135 0x00007f81ed538de5 in calloc (size=32, esize=16) at heaptrace.c:447
#26136 0x00007f81e7c9bbaf in ___pthread_setspecific (key=key@entry=40, value=value@entry=0x7f81e19fcae0) at ./nptl/pthread_setspecific.c:69
#26137 0x00007f81ed40bff7 in __collector_tsd_get_by_key (key_index=<optimized out>) at tsd.c:138
#26138 0x00007f81ed428bfc in __collector_ext_unwind_key_init (isPthread=isPthread@entry=1, stack=stack@entry=0x0) at unwind.c:342
#26139 0x00007f81ed40552a in collector_root (cargs=<optimized out>) at dispatcher.c:1103
#26140 0x00007f81e7c94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#26141 0x00007f81e7d26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Attached is a simple program that demonstrates the problem.
Comment 1 Vladimir Mezentsev 2024-03-21 00:26:06 UTC
 I cannot reproduce the problem on OL8 with libpthread-2.28.
It looks like calloc() is called in ./nptl/pthread_setspecific.c:69
But this calloc() must be from libc, not from the user application.


Can you run this test in your environment:

% cat t.c
#include <pthread.h>

long long calloc = 0;

void *func(void *arg)
{
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t thr;
	void *val;

	pthread_create(&thr, NULL, func, NULL);
	pthread_join(thr, &val);
	return 0;
}

% gcc -pthread t.c
% ./a.out
Comment 2 James Carlson 2024-03-21 13:32:29 UTC
First of all, your test case:

% gcc -pthread t.c
t.c:3:11: warning: built-in function 'calloc' declared as non-function [-Wbuiltin-declaration-mismatch]
    3 | long long calloc = 0;
      |           ^~~~~~
% ./a.out 
% echo $?
0

That works fine. The machine itself is running Ubuntu 22.04.4 LTS (Jammy Jellyfish).

I made a clean sandbox and found no trouble at all with "-H on", so it's clearly related to something else in my environment. I'll spend some time searching. In the meantime, this can probably be closed out as "cannot reproduce" until I get to the bottom of whatever's going on.

It's definitely reproducible and happens all the time in the giant proprietary C++ application I'm working on and in these tiny test programs when compiled inside that same environment, but I don't know yet how or why. Something causes collector's calloc hook to end up invoking itself.
Comment 3 James Carlson 2024-03-21 13:51:57 UTC
I found a clue: the application uses libtcmalloc. It seems like this recursion problem happens when that library is either ldopen'd into the address space or LD_PRELOADed in.
Comment 4 Vladimir Mezentsev 2024-03-21 17:06:31 UTC
 It looks like the bug is in gprofng/libcollector/heaptrace.c.
I see that init_heap_intf() is not thread safe.
Comment 5 Vladimir Mezentsev 2024-03-22 06:32:05 UTC
The problem is:
We use pthread_getspecific() and pthread_setspecific() to access thread local memory.
We use this memory to check that our interposed functions (like malloc, calloc or free) don't have recursion.
For example, the first time we call calloc(), we call pthread_setspecific() to create a thread-specific value.
On your machine, pthread_setspecific() calls calloc(), and we cannot intercept such recursion.

gcc supports thread-local storage. For example,
  static __thread int reentrance = 0;
I rewrote code using this instead of pthread_getspecific() and pthread_setspecific().
It works on OL8, but may not work on Ubuntu.
I will try to find Ubuntu machine.
Comment 6 James Carlson 2024-03-22 12:49:22 UTC
Neat. I'd be more than happy to try out a patch.
Comment 7 Vladimir Mezentsev 2024-03-24 02:09:10 UTC
Created attachment 15433 [details]
proposed patch
Comment 8 James Carlson 2024-03-24 02:42:29 UTC
(In reply to Vladimir Mezentsev from comment #7)
> Created attachment 15433 [details]
> proposed patch

That patch works perfectly. Thank you!
Comment 9 Sourceware Commits 2024-03-25 23:42:36 UTC
The master branch has been updated by Vladimir Mezentsev <vmezents@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=99c3fe52d237eae546d7de484d0cfbd615ac192c

commit 99c3fe52d237eae546d7de484d0cfbd615ac192c
Author: Vladimir Mezentsev <vladimir.mezentsev@oracle.com>
Date:   Sat Mar 23 18:31:03 2024 -0700

    gprofng: fix infinite recursion on calloc with multi-threaded applications
    
    libcollector uses pthread_getspecific() and pthread_setspecific() to access
    thread local memory. libcollector uses this memory to check that
    interposed functions (like malloc, calloc or free) don't have recursion.
    The first time we call calloc(), we call pthread_setspecific() to create
    a thread-specific value.
    On Ubuntu machine, pthread_setspecific() calls calloc(), and we cannot intercept
    such recursion.
    gcc supports thread-local storage. For example,
      static __thread int reentrance = 0;
    I rewrote code using this instead of pthread_setspecific().
    
    gprofng/ChangeLog
    2024-03-23  Vladimir Mezentsev  <vladimir.mezentsev@oracle.com>
    
            PR gprofng/31460
            * libcollector/heaptrace.c: Use the __thread variable to check for
            * reentry. Clean up code.
Comment 10 Vladimir Mezentsev 2024-03-26 00:11:34 UTC
Update status as resolved/fixed.