Created attachment 14427 [details] glibc patch to add delay in do_lookup_x I have observed a crash in a Python program that uses threads and fork. Below, I attach the following: * a patch for the latest glibc master (commit b4174c28d21e1672ef3cc15a058558e97b8471c6) that introduces a delay to force the race condition. * a minimal C reproduction. My understanding of the problem is that the following sequence of events takes place: parent: main-thread: starts thread-1 parent: thread-1: exits parent: thread-1: tries to call _Unwind_Find_FDE parent: thread-1: ld.so loads libgcc_s.so in memory parent: main-thread: fork() parent: thread-1: resolves _Unwind_Find_FDE@got.plt child: main-thread: starts thread-1 child: thread-1: exits child: thread-1: calls _Unwind_Find_FDE using the unresolved _Unwind_Find_FDE@got.plt The child process crashes with a SIGSEGV with the following backtrace: #0 0x0000000000003230 in ?? () #1 0x00007ffff75c2868 in uw_frame_state_for (context=0x7ffff7dcdc70, fs=0x7ffff7dcdab0) at ../../../src/libgcc/unwind-dw2.c:1263 #2 0x00007ffff75c3a20 in uw_init_context_1 (context=0x7ffff7dcdc70, outer_cfa=0x7ffff7dcdea0, outer_ra=0x7ffff7e64e36 <__GI___pthread_unwind+70>) at ../../../src/libgcc/unwind-dw2.c:1592 #3 0x00007ffff75c431a in _Unwind_ForcedUnwind (exc=0x7ffff7dced30, stop=stop@entry=0x7ffff7e64ca0 <unwind_stop>, stop_argument=0x7ffff7dcdef0) at ../../../src/libgcc/unwind.inc:211 #4 0x00007ffff7e64e36 in __GI___pthread_unwind (buf=<optimized out>) at unwind.c:130 #5 0x00007ffff7e5ddba in __do_cancel () at ../sysdeps/nptl/pthreadP.h:276 #6 __GI___pthread_exit (value=0x0) at pthread_exit.c:36 #7 0x00007ffff7fbf283 in ?? () #8 0x0000000000000000 in ?? () 0x3230 is the value of _Unwind_Find_FDE@got.plt in the binary: $ gdb -q /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 (gdb) p '_Unwind_Find_FDE@got.plt' $1 = (<text from jump slot in .got.plt, no debug info>) 0x3230 I have also a Python reproduction that fails without the forced delay and by just enabling LD_DEBUG=all (which seems to add the needed delays). Let me know if it's valuable to attach in this bug.
Created attachment 14428 [details] Reproduction
The child may only call async-signal-safe functions, which pthread_create isn't.
What's your glibc version? I don't expect unwind-link to have this problem. See __libc_unwind_link_after_fork. The unwind-link facility was added in glibc 2.34.
This was tested and reproduces on the latest master (commit b4174c28d21e1672ef3cc15a058558e97b8471c6). After reading Andreas' response, I found this snippet in fork's man page which renders the reproduction code invalid: * After a fork() in a multithreaded program, the child can safely call only async-signal-safe functions (see signal-safety(7)) until such time as it calls execve(2).
I see. So it's probably the half-completed dlopen that confuses the subprocess. We do this after fork: /* Reset the lock the dynamic loader uses to protect its data. */ __rtld_lock_initialize (GL(dl_load_lock)); /* Reset the lock protecting dynamic TLS related data. */ __rtld_lock_initialize (GL(dl_load_tls_lock)); That's unfortunately not particularly safe.