python fails asyncio tests (py 3.7 & 3.8)

Mark Geisert mark@maxrnd.com
Wed Dec 2 09:01:32 GMT 2020


Hi folks,
I'm following up on the OP's investigation supplied in
https://cygwin.com/pipermail/cygwin/2020-November/246832.html .
The situation is a socket select thread stuck in a wait-for-event loop that 
doesn't realize select() is trying to cleanup that thread before returning a 
result to the app.  Here is the relevant part of an strace log:

>   114 8495682 [main] python3.8 1987 start_thread_socket: stuff_start 0xFFFF8C38
>    68 8495750 [main] python3.8 1987 cygthread::create: name socksel, id 0x737C, this 0x180234778
>    76 8495826 [main] python3.8 1987 cygthread::create: activated name 'socksel', thread_sync 0x3A8 for id 0x737C
>   122 8495948 [socksel] python3.8 1987 thread_socket: stuff_start 0xFFFF8C38, timeout 4294967295
>    78 8496026 [main] python3.8 1987 select_stuff::wait: m 4, us 10000, wmfo_timeout -1
>    77 8496103 [socksel] python3.8 1987 fhandler_socket_local::af_local_connect: af_local_connect called, no_getpeereid=0
>   115 8496218 [socksel] python3.8 1987 fhandler_socket_local::af_local_send_secret: Sending af_local secret succeeded
>    95 8496313 [socksel] python3.8 1987 fhandler_socket_local::af_local_recv_secret: entered
> 11450 8507763 [main] python3.8 1987 select_stuff::wait: wait_ret 3, m = 4.  verifying
>   135 8507898 [main] python3.8 1987 select_stuff::wait: timed out
>    98 8507996 [main] python3.8 1987 select_stuff::wait: returning 1
>    84 8508080 [main] python3.8 1987 select: sel.wait returns 1
>    73 8508153 [main] python3.8 1987 select_stuff::cleanup: calling cleanup routines
>    78 8508231 [main] python3.8 1987 socket_cleanup: si 0x800324910 si->thread 0x180234778
[end of strace.. nothing further happens]

The 'socksel' thread is shown entering af_local_recv_secret(), so this is all part 
of local socket connection startup, when a secret is sent and received, then 
credentials are sent and received.  The socksel thread is looping on a 
WSAWaitForMultipleEvents() call.  The OP suggested using select_info.stop_thread 
to indicate that wait loops should exit.  That would work further up the stack, 
but at this level the code doesn't (currently) see the appropriate select_info.

I believe this is where I need to add such code, unless I'm missing something.
Here is the stack while the socksel thread is looping:

> (gdb) bt
> #0  0x00007ffdff2ec8f4 in ntdll!ZwWaitForMultipleObjects ()
>    from /c/WINDOWS/SYSTEM32/ntdll.dll
> #1  0x00007ffdfce58910 in WaitForMultipleObjectsEx ()
>    from /c/WINDOWS/System32/KERNELBASE.dll
> #2  0x00000001800a76fa in fhandler_socket_wsock::wait_for_events (
>     this=0x180356450, event_mask=33, flags=<optimized out>)
>     at /oss/src/winsup/cygwin/fhandler_socket_inet.cc:426
> #3  0x00000001800a81b5 in fhandler_socket_wsock::wait_for_events (
>     this=this@entry=0x180356450, event_mask=event_mask@entry=33,
>     flags=flags@entry=0) at /oss/src/winsup/cygwin/fhandler.h:656
> #4  0x00000001800ab820 in fhandler_socket_local::recv_internal (
>     this=0x180356450, wsamsg=0x5fc930, use_recvmsg=false)
>     at /oss/src/winsup/cygwin/fhandler_socket_local.cc:1167
> #5  0x00000001800a515b in fhandler_socket_wsock::recvfrom (
>     this=<optimized out>, in_ptr=<optimized out>, len=<optimized out>,
>     flags=<optimized out>, from=0x0, fromlen=0x0)
>     at /oss/src/winsup/cygwin/fhandler_socket_inet.cc:1271
> #6  0x00000001800ad38f in fhandler_socket_local::af_local_recv_secret (
>     this=this@entry=0x180356450)
>     at /oss/src/winsup/cygwin/fhandler_socket_local.cc:437
> #7  0x00000001800ad929 in fhandler_socket_local::af_local_connect (
>     this=0x180356450) at /oss/src/winsup/cygwin/fhandler_socket_local.cc:539
> #8  fhandler_socket_local::af_local_connect (this=0x180356450)
>     at /oss/src/winsup/cygwin/fhandler_socket_local.cc:527
> #9  0x00000001800a75b6 in fhandler_socket_wsock::evaluate_events (
>     this=0x180356450, event_mask=63, events=@0x5fcb68: 0,
>     erase=erase@entry=false)
>     at /oss/src/winsup/cygwin/fhandler_socket_inet.cc:313
> #10 0x0000000180129052 in peek_socket (me=me@entry=0x8002f4590)
>     at /oss/src/winsup/cygwin/select.cc:1634
> #11 0x0000000180129221 in thread_socket (arg=0x80028ce10)
>     at /oss/src/winsup/cygwin/select.cc:1664
> #12 0x0000000180046686 in cygthread::callfunc (
>     this=this@entry=0x180234778 <threads+88>,
>     issimplestub=issimplestub@entry=false)
>     at /oss/src/winsup/cygwin/cygthread.cc:48
> #13 0x0000000180046c46 in cygthread::stub (
>     arg=arg@entry=0x180234778 <threads+88>)
>     at /oss/src/winsup/cygwin/cygthread.cc:91
> #14 0x0000000180047706 in _cygtls::call2 (this=0x5fce00,
>     func=0x180046bc0 <cygthread::stub(void*)>, arg=0x180234778 <threads+88>,
>     buf=buf@entry=0x5fcd20) at /oss/src/winsup/cygwin/cygtls.cc:40
> #15 0x00000001800477b4 in _cygtls::call (func=<optimized out>,
>     arg=<optimized out>) at /usr/include/w32api/psdk_inc/intrin-impl.h:790
> #16 0x00007ffdfda47034 in KERNEL32!BaseThreadInitThunk ()
>    from /c/WINDOWS/System32/KERNEL32.DLL
> #17 0x00007ffdff29cec1 in ntdll!RtlUserThreadStart ()
>    from /c/WINDOWS/SYSTEM32/ntdll.dll
> #18 0x0000000000000000 in ?? ()
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> (gdb)

Just to rule out one slight possibility, I built a different Cygwin DLL from 
topic/af_unix but the resulting strace output is the same.  So this doesn't seem 
to be related to Ken's work.

..mark


More information about the Cygwin-developers mailing list