Deadlock of the process tree when running make

Alexey Izbyshev izbyshev@ispras.ru
Sat Apr 9 19:35:03 GMT 2022


On 2022-04-09 20:54, Takashi Yano wrote:
> Thanks for checking. This seems to be normal. Then, I cannot
> understand why the ClosePseudoConsole() call is blocked...
> 
> The document by Microsoft mentions the blocking conditions of
> ClosePseudoConsole():
> https://docs.microsoft.com/en-us/windows/console/closepseudoconsole
> however, the thread above is draining the channel.

I've decided to check what object ClosePseudoConsole() waits for. The
wait happens inside the unexported
KERNELBASE!_ClosePseudoConsoleMembers function. Here is the relevant
part:

76589fb5 8b4e08          mov     ecx,dword ptr [esi+8]
76589fb8 e8c2fdffff      call    KERNELBASE!_HandleIsValid (76589d7f)
76589fbd 84c0            test    al,al
76589fbf 7456            je      KERNELBASE!_ClosePseudoConsoleMembers+0x89 (7658a017)
76589fc1 8d45fc          lea     eax,[ebp-4]
76589fc4 895dfc          mov     dword ptr [ebp-4],ebx
76589fc7 50              push    eax
76589fc8 51              push    ecx
76589fc9 e8c23ef5ff      call    KERNELBASE!GetExitCodeProcess (764dde90)
76589fce 85c0            test    eax,eax
76589fd0 7414            je      KERNELBASE!_ClosePseudoConsoleMembers+0x58 (76589fe6)
76589fd2 817dfc03010000  cmp     dword ptr [ebp-4],103h
76589fd9 750b            jne     KERNELBASE!_ClosePseudoConsoleMembers+0x58 (76589fe6)
76589fdb 53              push    ebx
76589fdc 6aff            push    0FFFFFFFFh
76589fde ff7608          push    dword ptr [esi+8]
76589fe1 e8ba74f6ff      call    KERNELBASE!WaitForSingleObjectEx (764f14a0)

"esi" is the argument of ClosePseudoConsole(), so the first mov
dereferences it with an offset and loads a process handle. Then, if this
handle is valid, it calls GetExitCodeProcess(), and if that call
succeeds and reports an exit code of STILL_ACTIVE (0x103), it waits for
that process with an infinite timeout (0FFFFFFFFh).
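
The branch structure of the listing above can be modeled roughly as
follows. This is only a sketch in Python, not the real KERNELBASE code:
the function name and its three parameters are hypothetical stand-ins
for the results of _HandleIsValid(), GetExitCodeProcess() and the exit
code it stores, and nothing here touches a real handle.

```python
# Value the disassembly compares the exit code against (cmp ..., 103h).
STILL_ACTIVE = 0x103


def close_would_wait(handle_is_valid: bool,
                     get_exit_code_ok: bool,
                     exit_code: int) -> bool:
    """Return True if the traced path reaches WaitForSingleObjectEx().

    All three parameters are hypothetical inputs modeling what the
    real calls returned; this only mirrors the conditional jumps.
    """
    if not handle_is_valid:        # je after _HandleIsValid
        return False
    if not get_exit_code_ok:       # je after GetExitCodeProcess returned 0
        return False
    if exit_code != STILL_ACTIVE:  # jne after the cmp with 103h
        return False
    # push 0FFFFFFFFh is the timeout argument (INFINITE), so the real
    # code blocks here until the process handle is signaled.
    return True
```

With a valid handle to a process that is still running, all three checks
pass and the model returns True, which matches the observed hang.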

I've checked that the hanging bash process has only 3 process handles:
for itself, for the dead javac, and for conhost.exe. So obviously it
waits for the latter to terminate. (After I did all this, I realized
there was a much easier way to get this result via the "Analyze wait
chain" feature of Task Manager.)

Unfortunately, I don't know anything about Windows consoles, but just in
case I also checked what the 5 threads of conhost.exe are waiting for:

1. Tries to enter a critical section (Task Manager claims it waits for 
thread 4, so probably the latter owns it).
2. Waits on a handle for "pty1-from-master-nat" named pipe.
3. Waits for an anonymous event.
4. Waits on a handle for "\Device\ConDrv" (in DeviceIoControl()).
5. Blocked in GetMessageW().
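
The observations above can be written down as a small "waits on" map
and followed link by link, which is roughly the idea behind a wait chain.
The sketch below is plain Python over hand-entered data, not a call into
any Windows API; the helper name and the node labels are made up for
illustration, and the thread 1 -> thread 4 edge is only Task Manager's
claim, as noted in the list.

```python
def follow_wait_chain(waits_on, start):
    """Follow 'X waits on Y' edges starting at `start`.

    Stops at a node with no known outgoing edge, or as soon as a node
    repeats (a cycle, which would indicate a true deadlock).
    """
    chain = [start]
    seen = {start}
    while chain[-1] in waits_on:
        nxt = waits_on[chain[-1]]
        chain.append(nxt)
        if nxt in seen:
            break
        seen.add(nxt)
    return chain


# Edges taken from the observations in this message (labels are ad hoc).
WAITS_ON = {
    "bash": "conhost.exe",
    "conhost.exe thread 1": "conhost.exe thread 4",  # per Task Manager
    "conhost.exe thread 4": r"\Device\ConDrv",
}

for node in follow_wait_chain(WAITS_ON, "conhost.exe thread 1"):
    print(node)
```

The chain starting at thread 1 ends at the "\Device\ConDrv" handle
rather than looping back, so within this data there is no cycle; what
thread 4's DeviceIoControl() call is itself waiting for is not known.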

It's also worth noting that this conhost.exe seems to be the only one
related to the Cygwin process tree (as well as the only related
non-Cygwin process). All other conhost.exe processes were created before
I started my stress test.

My guess is that this conhost.exe was created for a native app started 
from a Cygwin process. Could it be some race condition/bug that 
prevented conhost.exe from terminating once the native process (probably 
javac?) died?

Alexey

