malloc crash
Ken Brown
kbrown@cornell.edu
Sun Oct 24 21:46:40 GMT 2021
I'm trying to debug the fifo problem reported here:
https://cygwin.com/pipermail/cygwin/2021-October/249635.html
To keep my email self-contained, here are the reproduction instructions. Run
the attached script with argument 1000. The output is supposed to look like this:
$ ./fifo_test.sh 1000
Creating 1000 fifo readers...
Created PID=6503 reading from /tmp/catfifo_0
FIFO 0
Created PID=6506 reading from /tmp/catfifo_1
FIFO 1
[...]
Created PID=9506 reading from /tmp/catfifo_998
FIFO 998
Created PID=9509 reading from /tmp/catfifo_999
FIFO 999
But invariably one of the exec'd cat processes will appear to hang. (Actually
it goes into an infinite loop.) If you attach gdb to that process and catch it
at the right time, you see something like this:
[...]
Reading symbols from /usr/bin/cat.exe...
Reading symbols from /usr/lib/debug//usr/bin/cat.exe.dbg...
(gdb) thr 1
[Switching to thread 1 (Thread 9692.0x8658)]
#0 0x00007ffe950ed674 in ntdll!ZwCreateEvent ()
from /c/WINDOWS/SYSTEM32/ntdll.dll
(gdb) bt
#0 0x00007ffe950ed674 in ntdll!ZwCreateEvent ()
from /c/WINDOWS/SYSTEM32/ntdll.dll
#1 0x00000001800e56c8 in CreateEventW (
lpEventAttributes=0x18030ac90 <sec_none_nih>, bManualReset=0,
bInitialState=0, lpName=0x0)
at ../../../../temp/winsup/cygwin/kernel32.cc:46
#2 0x00000001800e57c1 in CreateEventA (
lpEventAttributes=0x18030ac90 <sec_none_nih>, bManualReset=0,
bInitialState=0, lpName=0x0)
at ../../../../temp/winsup/cygwin/kernel32.cc:71
#3 0x00000001801493f1 in sig_send (p=0x180010000, si=..., tls=0xffffce00)
at ../../../../temp/winsup/cygwin/sigproc.cc:698
#4 0x00000001800676c9 in exception::handle (e=0xffffc5b0, frame=0xffffcd80,
in=0xffffc0c0, dispatch=0xffffbf40)
at ../../../../temp/winsup/cygwin/exceptions.cc:834
#5 0x00007ffe950f20cf in ntdll!.chkstk () from /c/WINDOWS/SYSTEM32/ntdll.dll
#6 0x00007ffe950a1454 in ntdll!RtlRaiseException ()
from /c/WINDOWS/SYSTEM32/ntdll.dll
#7 0x00007ffe950f0bfe in ntdll!KiUserExceptionDispatcher ()
from /c/WINDOWS/SYSTEM32/ntdll.dll
#8 0x0000000180191a5c in init_top (m=0x18036f860 <_gm_>, p=0x800010000,
psize=65456) at ../../../../temp/winsup/cygwin/malloc.cc:3903
#9 0x0000000180193249 in sys_alloc (m=0x18036f860 <_gm_>, nb=256)
at ../../../../temp/winsup/cygwin/malloc.cc:4186
#10 0x0000000180196b96 in dlmalloc (bytes=248)
at ../../../../temp/winsup/cygwin/malloc.cc:4669
#11 0x0000000180197f5d in dlcalloc (n_elements=1, elem_size=248)
at ../../../../temp/winsup/cygwin/malloc.cc:4799
#12 0x00000001800e9030 in calloc (nmemb=1, size=248)
at ../../../../temp/winsup/cygwin/malloc_wrapper.cc:101
#13 0x0000000180044a2a in operator new (s=248)
at ../../../../temp/winsup/cygwin/cxx.cc:21
#14 0x000000018016a75d in pthread::init_mainthread ()
at ../../../../temp/winsup/cygwin/thread.cc:371
#15 0x000000018004a310 in dll_crt0_1 ()
at ../../../../temp/winsup/cygwin/dcrt0.cc:887
#16 0x000000018004771c in _cygtls::call2 (this=0xffffce00,
func=0x18004a218 <dll_crt0_1(void*)>, arg=0x0, buf=0xffffcdb0)
at ../../../../temp/winsup/cygwin/cygtls.cc:40
#17 0x00000001800476c1 in _cygtls::call (func=0x18004a218 <dll_crt0_1(void*)>,
arg=0x0) at ../../../../temp/winsup/cygwin/cygtls.cc:27
#18 0x000000018004aac9 in _dll_crt0 ()
at ../../../../temp/winsup/cygwin/dcrt0.cc:1099
#19 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Typing 'finish' repeatedly until it no longer returns shows that there is
an infinite loop starting with an access violation here:
(gdb) f 8
#8 0x0000000180191a5c in init_top (m=0x18036f860 <_gm_>, p=0x800010000,
psize=65456) at ../../../../temp/winsup/cygwin/malloc.cc:3903
3903 p->head = psize | PINUSE_BIT;
I guess there's an infinite loop rather than a crash because the exec'd cat
process isn't fully initialized yet, and the exception handler just keeps
continuing execution at the site of the access violation.
If I'm reading the backtrace correctly, the access violation occurs while Cygwin
is trying to allocate storage for the main thread object of the exec'd process.
I'm not familiar enough with the relevant Cygwin internals to take the analysis
any further, but my guess is that the problem is somehow triggered by the
creation of a new thread at the end of fhandler_fifo::fixup_after_exec:
new cygthread (fifo_reader_thread, this, "fifo_reader", thr_sync_evt);
Is this a bug in the fifo code? Is there some reason I shouldn't be creating a
new thread in fixup_after_exec? If so, I'm not sure what to do. The fifo
reader code depends crucially on having that thread running.
By the way, every once in a while the hang seems to occur in the forked bash
process, before it execs cat. This could also be due to the creation of a new
thread, this time in fixup_after_fork.
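Since the hang shows up in a different process on each run, a bounded wait in place of the script's `wait $pid` might make it easier to find the stuck PID before attaching gdb. This is only a minimal sketch: the helper name `wait_with_timeout` and the 10-second limit are assumptions for illustration and are not part of the attached script.

```shell
# Hypothetical helper (not in the attached script): poll until PID exits,
# giving up after TIMEOUT seconds.
# Returns 0 if the process exited in time, 1 if it is still running.
wait_with_timeout() {
    local pid="$1" timeout="$2" waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Possible use inside the loop, in place of `wait $pid`
# (the 10-second limit is an arbitrary choice):
#   if ! wait_with_timeout "$pid" 10; then
#       printf 'PID=%s still running after 10s\n' "$pid"
#       exit 1
#   fi
```

Polling with `kill -0` only checks whether the process still exists, so it cannot tell a hung reader from a slow one; the limit has to be generous enough for a healthy cat to finish.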
Ken
P.S. The gdb session was based on a build from current git HEAD, but the problem
also occurs in Cygwin 3.2.0. So I don't think it's related to the new pipe code.
-------------- next part --------------
#!/bin/bash
# take arg as number of iterations (default=100)
STEPS="${1-100}"
FIFO_PFX="/tmp/catfifo_"
FIFO_WAIT=0
STEP_WAIT=0
# sleep only if a nonzero delay was given
function mysleep() { if [ -n "$1" ] && [ "$1" != "0" ]; then sleep "$1"; fi; }
# remove any fifos left behind
function cleanup() {
    rm -f "$FIFO_PFX"*
}
trap cleanup EXIT
printf "Creating %s fifo readers...\n" "$STEPS"
for ((i=0; i<STEPS; i++)); do
    fifo="$FIFO_PFX$i"
    # create fifo
    mkfifo "$fifo"
    mysleep "$FIFO_WAIT"
    # fork a process reading from fifo and writing it to stdout
    cat < "$fifo" &
    pid=$!
    printf "Created PID=%s reading from %s\n" "$pid" "$fifo"
    # redirect FD3 to the fifo and print a message to it
    exec 3>"$fifo"
    printf "FIFO %d\n" "$i" >&3
    # close the file descriptor, wait for process to exit and clean up
    exec 3>&-
    wait "$pid"
    rm -f "$fifo"
    mysleep "$STEP_WAIT"
done