cygwin hang problem

Joe Buehler
Fri Jul 19 15:37:00 GMT 2002

I am seeing both hangs and core dumps, and the behavior is very
inconsistent. I have a dual-processor NT machine that has been
running continuous builds for about 3 days without stopping.
I have a single-processor XP machine that ran for less than a day
before it hung and left a core dump.

I spent the last few hours examining that core dump
(generated using dumper.exe) and it appears that the
cygwin dll jumped to never-never land after calling CopySid() in
cygsid::assign() in security.h.  Here's the trace from
gdb for the thread that caused the dump (I modified handle_exceptions
to wait for dumper to do the dump):

#0  0x77e72e9f in _libkernel32_a_iname ()
#1  0x61013ef5 in try_to_debug (waitloop=true)
     at /usr/local/cygwin-src/src/winsup/cygwin/
#2  0x6101453e in handle_exceptions (e=0x72f6f8, in=0x72f714)
     at /usr/local/cygwin-src/src/winsup/cygwin/
#3  0x77f833a0 in _libkernel32_a_iname ()
#4  0x77f83372 in _libkernel32_a_iname ()
#5  0x77f510a6 in _libkernel32_a_iname ()
#6  0x610d8b8c in cygsid::operator= (this=0x72fa5c, nsid=0x61610294)
     at /usr/local/cygwin-src/src/winsup/cygwin/security.h:47
#7  0x61070c37 in __sec_user (sa_buf=0x72fae8, sid2=0x0, inherit=0)
     at /usr/local/cygwin-src/src/winsup/cygwin/
#8  0x610db98c in sec_user_nih (sa_buf=0x72fae8 "", sid=0x0)
     at /usr/local/cygwin-src/src/winsup/cygwin/security.h:214
#9  0x61082da5 in getsem (p=0x0,
     str=0x610eb248 "cygwin1S3-2002-07-11 10:28.sigcatch.23002002-07-11 10:28", init=0,
     max=2147483647) at /usr/local/cygwin-src/src/winsup/cygwin/
#10 0x6108352b in wait_sig () at /usr/local/cygwin-src/src/winsup/cygwin/
#11 0x61007961 in thread_stub (arg=0x610e23a0)
     at /usr/local/cygwin-src/src/winsup/cygwin/
#12 0x77e802ed in _libkernel32_a_iname ()

Note that the "str" argument in frame 9 is not correct...

Here is a trace for the main thread:

#0  0x7ffe0304 in ?? ()
#1  0x77e79d6a in _libkernel32_a_iname ()
#2  0x610a000f in wait4 (intpid=-1, status=0x22cdbc, options=2, r=0x0)
     at /usr/local/cygwin-src/src/winsup/cygwin/
#3  0x6109fd3d in waitpid (intpid=-1, status=0x22cdbc, options=2)
     at /usr/local/cygwin-src/src/winsup/cygwin/
#4  0x00419148 in job_waitsafe (sig=0) at /usr/local/ast-src/src/cmd/ksh93/sh/jobs.c:201
#5  0x0041abcb in job_wait (pid=3568) at /usr/local/ast-src/src/cmd/ksh93/sh/jobs.c:1215
#6  0x00413c7d in sh_exec (t=0xa05e4d8, flags=0) at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:709
#7  0x004144f7 in sh_exec (t=0xa05e528, flags=0) at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:952
#8  0x004144db in sh_exec (t=0xa05e590, flags=0) at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:951
#9  0x00414f8e in sh_exec (t=0xa05e3e8, flags=4)
     at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:1193
#10 0x0041447f in sh_exec (t=0xa05e910, flags=4) at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:940
#11 0x00414faa in sh_exec (t=0xa05e298, flags=4)
     at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:1194
#12 0x0041447f in sh_exec (t=0xa05e998, flags=5) at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:940
#13 0x004140cc in sh_exec (t=0xa05ee70, flags=5) at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:833
#14 0x00413ef0 in sh_exec (t=0xa05ee80, flags=4) at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:779
#15 0x00401f7f in exfile (iop=0xa053950, fno=6) at /usr/local/ast-src/src/cmd/ksh93/sh/main.c:520
#16 0x00401810 in sh_main (ac=2, av=0x616107ac, userinit=0)
     at /usr/local/ast-src/src/cmd/ksh93/sh/main.c:318
#17 0x0040106a in main (argc=2, argv=0x616107ac)
     at /usr/local/ast-src/src/cmd/ksh93/sh/pmain.c:33
#18 0x610065b0 in dll_crt0_1 () at /usr/local/cygwin-src/src/winsup/cygwin/
#19 0x61006a59 in _dll_crt0 () at /usr/local/cygwin-src/src/winsup/cygwin/
#20 0x61006ab1 in dll_crt0 (uptr=0x0) at /usr/local/cygwin-src/src/winsup/cygwin/
#21 0x0045c44e in cygwin_crt0 ()
#22 0x0040103c in mainCRTStartup ()
#23 0x77e7eb69 in _libkernel32_a_iname ()

It is difficult to tell exactly what happened.  It looks like
the CopySid call did not return.  Based on the stack, something
may have gone wrong in the DLL linkage code that loads advapi32
and calls the real CopySid: it never got to the point where it
overwrites the original mov, call instruction sequence in the
DLL linkage code.
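
For readers unfamiliar with this kind of lazy linkage, here is a minimal sketch of the general idea. It is a simplified model only, not the actual Cygwin code: it patches a function pointer rather than rewriting instructions, and the names copy_bytes, copy_bytes_stub, real_copy_bytes and resolve_count are made up for illustration. The first call goes through a stub that resolves the real function and patches the call path; later calls go straight to the real function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the real function that would live in a
   lazily loaded DLL. */
static int real_copy_bytes(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return 1;
}

/* Counts how many times the slow (resolving) path ran. */
static int resolve_count = 0;

static int copy_bytes_stub(void *dst, const void *src, size_t len);

/* All callers go through this pointer; it starts out aimed at the stub. */
static int (*copy_bytes)(void *, const void *, size_t) = copy_bytes_stub;

/* Slow path: resolve the target, patch the pointer, forward the call.
   A real loader would call LoadLibrary/GetProcAddress at this point. */
static int copy_bytes_stub(void *dst, const void *src, size_t len)
{
    resolve_count++;
    copy_bytes = real_copy_bytes;   /* the "overwrite" step */
    return copy_bytes(dst, src, len);
}
```

In this model, a crash before the "overwrite" step would leave copy_bytes still pointing at the stub, which is the analogue of the unpatched mov, call sequence described above.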

One interesting point I haven't figured out yet is that
the exception address passed to Cygwin's exception handler
is almost exactly 2x (as in left shift 1) the address of
the _win32_CopySid@12 code.  I checked the IA32 exception handler
stack format and it looks like the exception that NT got was
due to a jump to the weeds -- the EIP pushed on the stack is the
same as the exception address passed to the Cygwin
exception handler.
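
The "almost exactly 2x" comparison can be checked mechanically. The sketch below is only an illustration: near_double is a hypothetical helper, and the addresses in the usage note are invented, not the values from this dump.

```c
#include <stdint.h>

/* Returns 1 if `fault` lies within `slack` bytes of (target << 1),
   0 otherwise.  The arithmetic is done in 64 bits so that the shift
   cannot wrap around for high 32-bit addresses. */
static int near_double(uint32_t fault, uint32_t target, uint32_t slack)
{
    uint64_t doubled = (uint64_t)target << 1;
    uint64_t f = fault;
    uint64_t diff = f > doubled ? f - doubled : doubled - f;
    return diff <= slack;
}
```

For example, with an invented target of 0x61000000, near_double(0xC2000010, 0x61000000, 0x20) is 1, while near_double(0x61000000, 0x61000000, 0x20) is 0.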

I notice a comment in the source about replacing CopySid with
memcpy.  Does anyone remember why this was done?  Is there something
flaky about CopySid?
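
For context on why memcpy is even a candidate: a SID is a flat, variable-length structure (revision, sub-authority count, a 6-byte identifier authority, then the 32-bit sub-authorities), so its length is 8 + 4 * count bytes. The sketch below models that layout portably. The type sid_model and the helpers sid_length and sid_copy are hypothetical names for illustration; this is not the code in security.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SUBAUTH 15   /* same limit as SID_MAX_SUB_AUTHORITIES */

/* Portable model of the Windows SID layout (no padding: 1 + 1 + 6 = 8
   bytes of header, then 32-bit sub-authorities). */
typedef struct {
    uint8_t  revision;
    uint8_t  sub_authority_count;
    uint8_t  identifier_authority[6];
    uint32_t sub_authority[MAX_SUBAUTH];
} sid_model;

/* Number of bytes actually occupied by the SID. */
static size_t sid_length(const sid_model *s)
{
    return 8 + 4 * (size_t)s->sub_authority_count;
}

/* Copies src into dst with memcpy; returns 1 on success, 0 if the
   input is missing or its count field is out of range. */
static int sid_copy(sid_model *dst, const sid_model *src)
{
    if (dst == NULL || src == NULL ||
        src->sub_authority_count > MAX_SUBAUTH)
        return 0;
    memcpy(dst, src, sid_length(src));
    return 1;
}
```

A copy like this never leaves the calling module, so it does not depend on any lazily loaded DLL entry point.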

Something else I wonder about -- wait_sig() is still setting up,
and the main thread is in waitpid() -- perhaps a signal came
in while the signal handler was still setting up?  I haven't looked
at that code and don't know how it works.

Sorry if I am missing anything obvious -- I am learning Cygwin
internals as I go, and this is a very knotty problem.

Enough for now, time to go home...

Joe Buehler
