[PATCH v2] Cygwin: signal: Do not handle signal when __SIGFLUSHFAST is sent

Takashi Yano takashi.yano@nifty.ne.jp
Sat Jan 18 11:41:37 GMT 2025


On Fri, 17 Jan 2025 18:52:41 +0900
Takashi Yano wrote:
> On Wed, 8 Jan 2025 18:05:53 -0800 (PST)
> Jeremy Drake wrote:
> > On Thu, 9 Jan 2025, Takashi Yano wrote:
> > 
> > > On Wed, 8 Jan 2025 16:48:41 +0100
> > > Corinna Vinschen wrote:
> > > > Does this patch fix Bruno's bash issue as well?
> > >
> > > I'm not sure because it is not reproducible as he said.
> > > I also could not reproduce that.
> > >
> > > However, at least this fixes the issue that Jeremy encountered:
> > > https://cygwin.com/pipermail/cygwin/2024-December/256977.html
> > >
> > > But, even with this patch, Jeremy reported another hang issue
> > > that also is not reproducible:
> > > https://cygwin.com/pipermail/cygwin/2024-December/256987.html
> > 
> > Yes, this patch helped the hangs I was seeing on Windows on ARM64.
> > However, there is still some hang issue in 3.5.5 (which occurs on
> > native x86_64) that is not there in 3.5.4.  Git for Windows' test suite
> > seems to be somewhat reliable at triggering this, but it's hardly a
> > minimal test case ;).
> > 
> > Because of this issue, MSYS2 has been keeping 3.5.5 in its 'staging' state
> > (rather than deploying it to normal users), and Git for Windows rolled
> > back to 3.5.4 before the release of the latest Git RC.
> 
> I might have successfully reproduced this issue. I tried building
> cygwin1.dll repeatedly for some of my machines, and one of them
> hung in fhandler_pipe::raw_read() as lazka's case:
> https://github.com/msys2/msys2-runtime/pull/251#issuecomment-2571338429
> 
> The call:
> L358:         waitret = cygwait (select_sem, select_sem_timeout);
> never returned even with select_sem_timeout == 1 until a signal
> (such as SIGTERM, SIGKILL) arrives. In this situation, attaching
> gdb to the process hanging and issuing 'si' command do not return.
> Something (stack?) seems to be completely broken.
> 
> I'll try to bisect which commit causes this issue. Please wait
> a while.

Done.

This issue also seems to be related to the commit:

commit d243e51ef1d30312ba1e21b4d25a1ca9a8dc1f63
Author: Takashi Yano <takashi.yano@nifty.ne.jp>
Date:   Mon Nov 25 19:51:53 2024 +0900

    Cygwin: signal: Fix deadlock between main thread and sig thread

    Previously, a deadlock happened if many SIGSTOP/SIGCONT signals were
    received rapidly. If the main thread sends __SIGFLUSH at the timing
    when SIGSTOP is handled by the sig thread, but not is handled by the
    main thread yet (sig_handle_tty_stop() not called yet), and if SIGCONT
    is received, the sig thread waits for cygtls::current_sig (is SIGSTOP
    now) cleared. However, the main thread waits for the pack.wakeup using
    WaitForSingleObject(), so the main thread cannot handle SIGSTOP. This
    is the mechanism of the deadlock. This patch uses cygwait() instead of
    WaitForSingleObject() to be able to handle the pending SIGSTOP.

    Addresses: https://cygwin.com/pipermail/cygwin/2024-November/256744.html
    Fixes: 7759daa979c4 ("(sig_send): Fill out sigpacket structure to send to signal thread rather than racily sending separate packets.")
    Reported-by: Christian Franke <Christian.Franke@t-online.de>
    Reviewed-by: Corinna Vinschen <corinna@vinschen.de>
    Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>

Even though the reason why this issue happens is not clear at all,
I perhaps found the solution for that.

Applying the attached patch:
0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch
instead of previous v2 __SIGFLUSHFAST patch solves the both issues.

However, strangely enough, the similar patch:
ng-0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch
which uses cygwait() instead of WF[SM]O does not solve the issue
Jeremy reported.

The reason is also unclear. What is the difference between cygwait()
and WF[SM]O? I expected both patches work almost the same. The v2
__SIGFLUSHFAST patch also uses cygwait(), so the reason might be
the same (the reason why we should use WF[SM]O rather than cygwait()).

Corinna, any idea? I need some clue.


While debugging this problem, I encountered another hang issue,
which is fixed by:
0001-Cygwin-signal-Avoid-frequent-tls-lock-unlock-for-SIG.patch

If we are confident in the patch 0003, I think we should apply
0001-Cygwin-signal-Avoid-frequent-tls-lock-unlock-for-SIG.patch
0002-Revert-Cygwin-signal-Do-not-handle-signal-when-__SIG.patch
0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch
0004-Revert-Cygwin-signal-Fix-high-load-when-retrying-to-.patch
for main branch and
0001-Cygwin-signal-Avoid-frequent-tls-lock-unlock-for-SIG.patch
0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch
for cygwin-3_5-branch.

Corinna, what do you think?

Jeremy,
could you please apply the attached patches:
0001-Cygwin-signal-Avoid-frequent-tls-lock-unlock-for-SIG.patch
0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch
against cygwin-3_5-branch and test if these fix the issue?

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0001-Cygwin-signal-Avoid-frequent-tls-lock-unlock-for-SIG.patch
URL: <https://cygwin.com/pipermail/cygwin-patches/attachments/20250118/2adb5d74/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0002-Revert-Cygwin-signal-Do-not-handle-signal-when-__SIG.patch
URL: <https://cygwin.com/pipermail/cygwin-patches/attachments/20250118/2adb5d74/attachment-0001.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch
URL: <https://cygwin.com/pipermail/cygwin-patches/attachments/20250118/2adb5d74/attachment-0002.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0004-Revert-Cygwin-signal-Fix-high-load-when-retrying-to-.patch
URL: <https://cygwin.com/pipermail/cygwin-patches/attachments/20250118/2adb5d74/attachment-0003.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ng-0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch
URL: <https://cygwin.com/pipermail/cygwin-patches/attachments/20250118/2adb5d74/attachment-0004.ksh>


More information about the Cygwin-patches mailing list