Deadlock of the process tree when running make
Alexey Izbyshev
izbyshev@ispras.ru
Sat Apr 9 20:26:51 GMT 2022
On 2022-04-09 22:35, Alexey Izbyshev wrote:
> On 2022-04-09 20:54, Takashi Yano wrote:
> >> Thanks for checking. This seems to be normal. In that case, I cannot
> >> understand why the ClosePseudoConsole() call blocks...
>>
> >> Microsoft's documentation describes the conditions under which
> >> ClosePseudoConsole() blocks:
> >> https://docs.microsoft.com/en-us/windows/console/closepseudoconsole
> >> However, the thread above is draining the channel.
>
> I've decided to check what object ClosePseudoConsole() waits for. The
> wait happens inside the unexported KERNELBASE!_ClosePseudoConsoleMembers
> function. Here is the relevant part:
>
> 76589fb5 8b4e08          mov     ecx,dword ptr [esi+8]
> 76589fb8 e8c2fdffff      call    KERNELBASE!_HandleIsValid (76589d7f)
> 76589fbd 84c0            test    al,al
> 76589fbf 7456            je      KERNELBASE!_ClosePseudoConsoleMembers+0x89 (7658a017)
> 76589fc1 8d45fc          lea     eax,[ebp-4]
> 76589fc4 895dfc          mov     dword ptr [ebp-4],ebx
> 76589fc7 50              push    eax
> 76589fc8 51              push    ecx
> 76589fc9 e8c23ef5ff      call    KERNELBASE!GetExitCodeProcess (764dde90)
> 76589fce 85c0            test    eax,eax
> 76589fd0 7414            je      KERNELBASE!_ClosePseudoConsoleMembers+0x58 (76589fe6)
> 76589fd2 817dfc03010000  cmp     dword ptr [ebp-4],103h
> 76589fd9 750b            jne     KERNELBASE!_ClosePseudoConsoleMembers+0x58 (76589fe6)
> 76589fdb 53              push    ebx
> 76589fdc 6aff            push    0FFFFFFFFh
> 76589fde ff7608          push    dword ptr [esi+8]
> 76589fe1 e8ba74f6ff      call    KERNELBASE!WaitForSingleObjectEx (764f14a0)
>
> "esi" holds the argument of ClosePseudoConsole(), so the first mov
> loads a process handle from offset 8 of the structure it points to.
> If this handle is valid, the code calls GetExitCodeProcess(), and if
> that succeeds and returns STILL_ACTIVE (103h), it waits for that
> process with an INFINITE (0FFFFFFFFh) timeout.
>
> I've checked that the hanging bash process has only 3 process handles:
> one for itself, one for the dead javac, and one for conhost.exe. So it
> must be waiting for the latter to terminate. (After I did all this, I
> realized there was a much easier way to get this result: the "Analyze
> wait chain" feature of Task Manager.)
>
> Unfortunately, I don't know anything about Windows consoles, but just
> in case I also checked what each of the 5 threads of conhost.exe is
> waiting for:
>
> 1. Tries to enter a critical section (Task Manager claims it waits for
> thread 4, so probably the latter owns it).
> 2. Waits on a handle for "pty1-from-master-nat" named pipe.
> 3. Waits for an anonymous event.
> 4. Waits on a handle for "\Device\ConDrv" (in DeviceIoControl()).
> 5. Blocked in GetMessageW().
>
> It's also worth noting that this conhost.exe seems to be the only one
> related to the Cygwin process tree (and the only related non-Cygwin
> process at all). All other conhost.exe processes were created before I
> started my stress test.
>
> My guess is that this conhost.exe was created for a native app started
> from a Cygwin process. Could a race condition or some other bug have
> prevented conhost.exe from terminating once the native process
> (probably javac) died?
>
A few more things that might be important:
* Clarification: thread 2 of conhost.exe waits in KernelBase!ReadFile().
* In the assembly part I omitted, before waiting on the conhost process,
  _ClosePseudoConsoleMembers() closes the handle obtained from "dword ptr
  [esi]", i.e. the "hWritePipe" member of the HPCON_INTERNAL struct.
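For readers less at home with x86 listings, the control flow quoted above, together with the close of the [esi] handle from the second point, can be restated as a rough C sketch. It is reconstructed only from the offsets and calls in the listing, not from any Windows source: the struct pseudo_console_sketch, its field names, handle_is_valid_sketch() (assumed to reject NULL and INVALID_HANDLE_VALUE), close_members_sketch() and waits_for_conhost() are all hypothetical, and the meaning of the value at [esi+4] is unknown. The branch logic is factored into a portable function so it can be checked outside Windows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* STILL_ACTIVE is 259 (0x103); the listing compares the exit code with 103h. */
#define STILL_ACTIVE_SKETCH 0x103u

/*
 * Hypothetical restatement of the listing's branches with the Win32 calls
 * factored out. Returns true when the path ends in
 * WaitForSingleObjectEx(conhost, INFINITE, ...), i.e. the caller blocks
 * until conhost.exe exits.
 */
static bool waits_for_conhost(bool handle_valid, bool got_exit_code,
                              uint32_t exit_code)
{
    if (!handle_valid)   /* _HandleIsValid() returned false: skip the wait */
        return false;
    if (!got_exit_code)  /* GetExitCodeProcess() failed: skip the wait */
        return false;
    return exit_code == STILL_ACTIVE_SKETCH; /* still running: no timeout */
}

#ifdef _WIN32
/* Hypothetical 32-bit layout inferred from the offsets in the listing. */
typedef struct {
    HANDLE write_pipe; /* [esi]:   "hWritePipe", closed before the wait */
    HANDLE unknown;    /* [esi+4]: not used in the quoted code (type assumed) */
    HANDLE conhost;    /* [esi+8]: conhost.exe process handle */
} pseudo_console_sketch;

/* Assumption: the body of the real _HandleIsValid() is not shown. */
static bool handle_is_valid_sketch(HANDLE h)
{
    return h != NULL && h != INVALID_HANDLE_VALUE;
}

static void close_members_sketch(pseudo_console_sketch *pc)
{
    DWORD code = 0; /* the listing pre-stores ebx here, presumably 0 */
    bool valid;
    bool got = false;

    assert(pc != NULL);
    /* Close the [esi] handle first (validity check here is an assumption). */
    if (handle_is_valid_sketch(pc->write_pipe))
        CloseHandle(pc->write_pipe);

    valid = handle_is_valid_sketch(pc->conhost);
    if (valid)
        got = GetExitCodeProcess(pc->conhost, &code) != 0;
    if (waits_for_conhost(valid, got, code))
        /* bAlertable comes from ebx in the listing; assumed FALSE here. */
        WaitForSingleObjectEx(pc->conhost, INFINITE, FALSE);
}
#endif
```

What the sketch makes explicit is that nothing bounds the wait: as long as conhost.exe reports STILL_ACTIVE, the caller of ClosePseudoConsole() stays blocked, which matches the hanging bash process waiting on conhost.exe.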
Alexey