python fails asyncio tests (py 3.7 & 3.8)
Mark Geisert
mark@maxrnd.com
Wed Dec 2 09:01:32 GMT 2020
Hi folks,
I'm following up on the OP's investigation supplied in
https://cygwin.com/pipermail/cygwin/2020-November/246832.html .
The situation is a socket select thread stuck in a wait-for-event loop that
doesn't realize select() is trying to cleanup that thread before returning a
result to the app. Here is the relevant part of an strace log:
> 114 8495682 [main] python3.8 1987 start_thread_socket: stuff_start 0xFFFF8C38
> 68 8495750 [main] python3.8 1987 cygthread::create: name socksel, id 0x737C, this 0x180234778
> 76 8495826 [main] python3.8 1987 cygthread::create: activated name 'socksel', thread_sync 0x3A8 for id 0x737C
> 122 8495948 [socksel] python3.8 1987 thread_socket: stuff_start 0xFFFF8C38, timeout 4294967295
> 78 8496026 [main] python3.8 1987 select_stuff::wait: m 4, us 10000, wmfo_timeout -1
> 77 8496103 [socksel] python3.8 1987 fhandler_socket_local::af_local_connect: af_local_connect called, no_getpeereid=0
> 115 8496218 [socksel] python3.8 1987 fhandler_socket_local::af_local_send_secret: Sending af_local secret succeeded
> 95 8496313 [socksel] python3.8 1987 fhandler_socket_local::af_local_recv_secret: entered
> 11450 8507763 [main] python3.8 1987 select_stuff::wait: wait_ret 3, m = 4. verifying
> 135 8507898 [main] python3.8 1987 select_stuff::wait: timed out
> 98 8507996 [main] python3.8 1987 select_stuff::wait: returning 1
> 84 8508080 [main] python3.8 1987 select: sel.wait returns 1
> 73 8508153 [main] python3.8 1987 select_stuff::cleanup: calling cleanup routines
> 78 8508231 [main] python3.8 1987 socket_cleanup: si 0x800324910 si->thread 0x180234778
[end of strace.. nothing further happens]
The 'socksel' thread is shown entering af_local_recv_secret(), so this is all part
of local socket connection startup, when a secret is sent and received, then
credentials are sent and received. The socksel thread is looping on a
WSAWaitForMultipleEvents() call. The OP suggested using select_info.stop_thread
to indicate that wait loops should exit. That would work further up the stack,
but at this level the code doesn't (currently) see the appropriate select_info.
I believe this is where I need to add such code, unless I'm missing something.
Here is the stack while the socksel thread is looping:
> (gdb) bt
> #0 0x00007ffdff2ec8f4 in ntdll!ZwWaitForMultipleObjects ()
> from /c/WINDOWS/SYSTEM32/ntdll.dll
> #1 0x00007ffdfce58910 in WaitForMultipleObjectsEx ()
> from /c/WINDOWS/System32/KERNELBASE.dll
> #2 0x00000001800a76fa in fhandler_socket_wsock::wait_for_events (
> this=0x180356450, event_mask=33, flags=<optimized out>)
> at /oss/src/winsup/cygwin/fhandler_socket_inet.cc:426
> #3 0x00000001800a81b5 in fhandler_socket_wsock::wait_for_events (
> this=this@entry=0x180356450, event_mask=event_mask@entry=33,
> flags=flags@entry=0) at /oss/src/winsup/cygwin/fhandler.h:656
> #4 0x00000001800ab820 in fhandler_socket_local::recv_internal (
> this=0x180356450, wsamsg=0x5fc930, use_recvmsg=false)
> at /oss/src/winsup/cygwin/fhandler_socket_local.cc:1167
> #5 0x00000001800a515b in fhandler_socket_wsock::recvfrom (
> this=<optimized out>, in_ptr=<optimized out>, len=<optimized out>,
> flags=<optimized out>, from=0x0, fromlen=0x0)
> at /oss/src/winsup/cygwin/fhandler_socket_inet.cc:1271
> #6 0x00000001800ad38f in fhandler_socket_local::af_local_recv_secret (
> this=this@entry=0x180356450)
> at /oss/src/winsup/cygwin/fhandler_socket_local.cc:437
> #7 0x00000001800ad929 in fhandler_socket_local::af_local_connect (
> this=0x180356450) at /oss/src/winsup/cygwin/fhandler_socket_local.cc:539
> #8 fhandler_socket_local::af_local_connect (this=0x180356450)
> at /oss/src/winsup/cygwin/fhandler_socket_local.cc:527
> #9 0x00000001800a75b6 in fhandler_socket_wsock::evaluate_events (
> this=0x180356450, event_mask=63, events=@0x5fcb68: 0,
> erase=erase@entry=false)
> at /oss/src/winsup/cygwin/fhandler_socket_inet.cc:313
> #10 0x0000000180129052 in peek_socket (me=me@entry=0x8002f4590)
> at /oss/src/winsup/cygwin/select.cc:1634
> #11 0x0000000180129221 in thread_socket (arg=0x80028ce10)
> at /oss/src/winsup/cygwin/select.cc:1664
> #12 0x0000000180046686 in cygthread::callfunc (
> this=this@entry=0x180234778 <threads+88>,
> issimplestub=issimplestub@entry=false)
> at /oss/src/winsup/cygwin/cygthread.cc:48
> #13 0x0000000180046c46 in cygthread::stub (
> arg=arg@entry=0x180234778 <threads+88>)
> at /oss/src/winsup/cygwin/cygthread.cc:91
> #14 0x0000000180047706 in _cygtls::call2 (this=0x5fce00,
> func=0x180046bc0 <cygthread::stub(void*)>, arg=0x180234778 <threads+88>,
> buf=buf@entry=0x5fcd20) at /oss/src/winsup/cygwin/cygtls.cc:40
> #15 0x00000001800477b4 in _cygtls::call (func=<optimized out>,
> arg=<optimized out>) at /usr/include/w32api/psdk_inc/intrin-impl.h:790
> #16 0x00007ffdfda47034 in KERNEL32!BaseThreadInitThunk ()
> from /c/WINDOWS/System32/KERNEL32.DLL
> #17 0x00007ffdff29cec1 in ntdll!RtlUserThreadStart ()
> from /c/WINDOWS/SYSTEM32/ntdll.dll
> #18 0x0000000000000000 in ?? ()
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> (gdb)
Just to rule out one slight possibility, I built a different Cygwin DLL from
topic/af_unix but the resulting strace output is the same. So this doesn't seem
to be related to Ken's work.
..mark
More information about the Cygwin-developers
mailing list