Intermittent failures retrieving process exit codes - snapshot test requested

Tom Honermann thonermann@coverity.com
Wed Jan 2 19:15:00 GMT 2013


On 01/01/2013 12:36 AM, Christopher Faylor wrote:
> On Mon, Dec 31, 2012 at 08:44:56PM -0500, Tom Honermann wrote:
>> I'm still seeing hangs in the latest code from CVS.  The stack traces
>> below are from WinDbg.
>
> I'm not asking you to build this yourself.  I have no way to know how
> you are building this.  Please just use the snapshots at
>
> http://cygwin.com/snapshots/

I was building it myself so that I could debug it without having to 
specify debug source paths and such.  I believe my builds are not 
unconventional.  I used options that disabled frame pointer omission so 
that the resulting binaries could be debugged with non-gcc debuggers.

$ mkdir build
$ cd build
$ ../src/configure \
     CFLAGS="-g" \
     CXXFLAGS="-g" \
     CFLAGS_FOR_TARGET="-g" \
     CXXFLAGS_FOR_TARGET="-g" \
     --enable-debugging \
     --prefix=$HOME/src/cygwin-latest/install -v
$ make
$ make install

>> I manually resolved the symbol references within
>> the cygwin1 module using the linker generated .map file.  Since the .map
>> file does not include static functions, some of these may be incorrect -
>> I didn't try and verify or correct for this.
>
> Thanks for trying, but the output below is garbled and not really
> useful.  If you are not going to dive in and attempt to fix code
> yourself then all we normally need is a simple test case.  WinDbg
> is not really appropriate for debugging Cygwin applications.

The output below is not garbled, but I didn't explain it clearly enough.
Lines with frame numbers come directly from WinDbg.  Since WinDbg is
unable to use gcc-generated debug info, the symbol references it prints
within the cygwin1 module are incorrect.  In those cases, I manually
resolved the return address (the RetAddr value from the prior frame) by
searching the linker-generated cygwin1.map file.  I then pasted the
mangled name on a line following the WinDbg line (with the incorrect
symbol name) and, if the symbol is a C++ one, the unmangled name on an
additional line.

For the stack fragment below, address 610f1553 == strtosigno+0x357 ==
__ZN4muto7acquireEm == muto::acquire(unsigned long).  I did not
translate offsets for the functions as I resolved them, nor did I try
to verify that they are correct (i.e., that the return address is not for
a static function that is not represented in the .map file).

>>   # ChildEBP RetAddr
>> 00 00288bd0 758d0a91 ntdll!ZwWaitForSingleObject+0x15
>> 01 00288c3c 76c11194 KERNELBASE!WaitForSingleObjectEx+0x98
>> 02 00288c54 76c11148 kernel32!WaitForSingleObjectExImplementation+0x75
>> 03 00288c68 610f1553 kernel32!WaitForSingleObject+0x12
>> 04 00288cb8 6118e54d cygwin1!strtosigno+0x357
>>                               __ZN4muto7acquireEm
>>                               muto::acquire(unsigned long)
>> [snip]
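
The manual lookup described above amounts to finding the symbol with the
greatest start address that is not above the return address.  A minimal
sketch in C follows, assuming the .map entries have already been parsed
into a table sorted by address.  The names map_symbol, nearest_symbol and
example_table are made up for illustration.  The strtosigno address is
derived from 610f1553 - 0x357; the start address used for
__ZN4muto7acquireEm is hypothetical, since I did not record it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry taken from a linker map: start address and symbol name. */
struct map_symbol {
    uint32_t address;
    const char *name;
};

/*
 * Return the entry with the greatest start address <= addr, or NULL if
 * addr lies below every entry.  The table must be sorted by ascending
 * address.  Static functions missing from the table can still make the
 * answer wrong, exactly as with the manual lookup.
 */
const struct map_symbol *
nearest_symbol(const struct map_symbol *table, size_t count, uint32_t addr)
{
    const struct map_symbol *best = NULL;
    size_t lo = 0, hi = count;

    assert(table != NULL || count == 0);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].address <= addr) {
            best = &table[mid];   /* candidate; look for a closer one */
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return best;
}

/*
 * Illustrative table.  0x610f11fc follows from strtosigno+0x357 ==
 * 0x610f1553.  0x610f1500 is a HYPOTHETICAL start address for
 * muto::acquire, chosen only so the example resolves as in the trace.
 */
const struct map_symbol example_table[] = {
    { 0x610f11fcu, "strtosigno" },
    { 0x610f1500u, "__ZN4muto7acquireEm" },
};
```

Calling nearest_symbol(example_table, 2, 0x610f1553) selects the
__ZN4muto7acquireEm entry, whereas a debugger that only knows the exported
strtosigno symbol reports strtosigno+0x357 for the same address.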

The reason for using WinDbg is that, from what I understand, gdb is 
unable to produce accurate stack traces when the call stack includes 
frames for functions that omit the frame pointer and do not have debug 
info that gdb can process.  I believe many Microsoft-provided functions
in ntdll, kernel32, kernelbase, etc. do omit the frame pointer and
only provide debug info in the PDB format, which gdb is unable to use.
Compiling Cygwin without frame pointer omission and using WinDbg
therefore provides the most accurate stack trace.  If I am incorrect
about any of this, I would very much appreciate a correction and/or 
explanation.

I downloaded the latest snapshot (2012-12-31 18:44:57 UTC) and was able 
to reproduce several issues which are described below.

All of these issues occur when using ctrl-c to interrupt the infinite 
loop in the test case(s) I've been using to debug inconsistent exit 
codes.  When ctrl-c is pressed, I've observed one of the following outcomes:

1) Programs are (generally) terminated as expected.  cmd.exe prompts to 
"Terminate batch job" as expected.

2) An access violation occurs and a processor context is dumped to the 
console.  I do not yet have stack traces for these cases.

3) One of the processes hangs.

Access violations occur in ~20% of test runs.  Hangs occur in ~5% of
test runs.

I did not provide a test case previously because I don't have an 
automated reproducer at present.  All sources needed to reproduce the 
issues are below.  The test case uses a .bat file to avoid dependencies
on bash, which keeps the reproduction as isolated as possible.

To reproduce the issues, copy test.bat, false-cygwin32.exe, and 
expect-false-execve-cygwin32.exe to a Cygwin bin directory and run 
test.bat from a cmd.exe console.  Press ctrl-c to interrupt the test. 
Repeat until problems are observed.  I have not been able to reproduce 
these symptoms when running the test via a MinTTY console.

I have been unable to get useful stack traces from hung processes using 
gdb.  gdb reports that the debug information in cygwin1-20130102.dbg.bz2 
does not match (CRC mismatch) the cygwin1.dll module in 
cygwin-inst-20130102.tar.bz2.


$ cat expect-false-execve.c
#include <errno.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
     pid_t child_pid, wait_pid;
     int result, child_status;

     if (argc != 2) {
         fprintf(stderr, "expect-false: Missing or too many arguments\n");
         return 127;
     }

     child_pid = fork();
     if (child_pid == -1) {
         fprintf(stderr, "expect-false: fork failed.  errno=%d\n", errno);
         return 127;
     } else if (child_pid == 0) {
         result = execlp(argv[1], argv[1], NULL);
         if (result == -1) {
             fprintf(stderr, "expect-false: execlp failed.  errno=%d\n",
                     errno);
         }
         _exit(127);
     }

     do {
         wait_pid = waitpid(child_pid, &child_status, 0);
     } while(
         (wait_pid == -1 && errno == EINTR) ||
         (wait_pid == child_pid &&
          !(WIFEXITED(child_status) || WIFSIGNALED(child_status)))
     );
     if (wait_pid == -1) {
         fprintf(stderr, "expect-false: waitpid failed.  errno=%d\n",
                 errno);
         return 127;
     }
     if (!WIFEXITED(child_status)) {
         fprintf(stderr,
                 "expect-false: child process did not exit normally\n");
         return 127;
     }
     if (WEXITSTATUS(child_status) != 1) {
         fprintf(stderr, "expect-false: unexpected exit code: %d\n",
                 child_status);
     }

     return WEXITSTATUS(child_status);
}
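
The status handling in the listing above can be isolated into a small
helper for experimenting outside the full test.  The following is only a
sketch: the names decoded_exit_code and run_child_and_decode are
hypothetical, the child calls _exit() directly instead of exec'ing
another program, and mapping a terminating signal to 128 + signal number
is a shell-style convention assumed here, not something the test case
above does.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Decode a raw status from waitpid().  Returns the exit code for a
 * normal exit, 128 + signal number for death by signal (assumed
 * convention), or -1 for anything else.
 */
int decoded_exit_code(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}

/*
 * Fork a child that immediately calls _exit(code), wait for it while
 * retrying on EINTR, and return the decoded exit code (-1 on failure).
 */
int run_child_and_decode(int code)
{
    int status = 0;
    pid_t pid, waited;

    assert(code >= 0 && code <= 255);

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);

    do {
        waited = waitpid(pid, &status, 0);
    } while (waited == -1 && errno == EINTR);

    if (waited != pid)
        return -1;
    return decoded_exit_code(status);
}
```

With correct exit code propagation, run_child_and_decode(1) returns 1,
which is the value expect-false-execve expects from false-cygwin32.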


$ cat false.c
#include <stdio.h>

int main() {
     printf("myfalse\n");
     return 1;
}


$ cat test.bat
@echo off
setlocal

set PATH=%CD%;%PATH%

:loop
echo test...
expect-false-execve-cygwin32.exe false-cygwin32
if not errorlevel 1 (
     echo exiting...
     exit /B 1
)
goto loop


$ gcc -o expect-false-execve-cygwin32.exe expect-false-execve.c
$ gcc -o false-cygwin32.exe false.c

From a cmd.exe console (press ctrl-c once the test is running):
C:\...\cygwin\bin>test
test...
myfalse
test...
myfalse
...


Tom.




