[PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5

Jon Turney jon.turney@dronecode.org.uk
Tue Jul 18 11:20:18 GMT 2023


On 17/07/2023 16:41, Corinna Vinschen wrote:
> On Jul 17 16:21, Corinna Vinschen wrote:
>> On Jul 17 12:51, Jon Turney wrote:
>>> On 17/07/2023 12:05, Corinna Vinschen wrote:
>>>> diff --git a/winsup/cygwin/thread.cc b/winsup/cygwin/thread.cc
>>>> index f614e01c42f6..fceb9bda1806 100644
>>>> --- a/winsup/cygwin/thread.cc
>>>> +++ b/winsup/cygwin/thread.cc
>>>> @@ -546,6 +546,13 @@ pthread::exit (void *value_ptr)
>>>>      class pthread *thread = this;
>>>>      _cygtls *tls = cygtls;	/* Save cygtls before deleting this. */
>>>> +  /* Deferred cancellation still pending? */
>>>> +  if (canceled)
>>>> +    {
>>>> +      WaitForSingleObject (cancel_event, INFINITE);
>>>> +      value_ptr = PTHREAD_CANCELED;
>>>> +    }
>>>> +
>>>>      // run cleanup handlers
>>>>      pop_all_cleanup_handlers ();
>>>> What do you think?
>>>
>>> I mean, by your own interpretation of the standard, this isn't required,
>>> because we're allowed to take arbitrarily long to deliver the async
>>> cancellation, and in this case, we took so long that the thread exited
>>> before it happened, too bad...
>>
>> True enough!
>>
>>> It doesn't seem a bad addition,
>>
> Actually, it seems we *have* to do this.  I just searched
> for more info on that problem and, to my surprise, I found this in the
> most obvious piece of documentation:
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_exit.html
> 
> Quote:
> 
>    As the meaning of the status is determined by the application (except
>    when the thread has been canceled, in which case it is
>    PTHREAD_CANCELED), [...]
> 
>> On second thought...
>>
>> One thing bugging me is this:
> 
> This is still a bit fuzzy, though.  I'd appreciate any input.
> 
>> Looking into pthread::cancel we have this order of things:
>>
>>      // cancel deferred
>>      mutex.unlock ();
>>      canceled = true;
>>      SetEvent (cancel_event);
>>      return 0;
>>
>> The canceled var is set before the SetEvent call.
>> What if the thread is terminated after canceled is set to true but
>> before SetEvent is called?
>>
>> pthread::testcancel claims:
>>
>>    We check for the canceled flag first. [...]
>>    Only if the thread is marked as canceled, we wait for cancel_event
>>    being really set, on the off-chance that pthread_cancel gets
>>    interrupted before calling SetEvent.
>>
>> Neat idea to speed up the code, but doesn't that mean we have a
>> potential deadlock, especially given that pthread::testcancel calls WFSO
>> with an INFINITE timeout?

I'm not sure I follow: another thread sets canceled = true, just before 
we hit pthread::testcancel(), so we go into the WFSO, but then the other 
thread continues, signals cancel_event and everything's fine.
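
To make that interleaving concrete, here is a minimal sketch using standard
C++ primitives as stand-ins for the Win32 event. The names cancel_state,
set_event, wait_event and run_interleaving are hypothetical, and this is not
the Cygwin code, only a model of the ordering described above:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-ins; not the Cygwin pthread class.
struct cancel_state
{
  std::atomic<bool> canceled{false};
  std::mutex m;
  std::condition_variable cv;
  bool event_set = false;          // plays the role of cancel_event

  void set_event ()                // plays the role of SetEvent
  {
    {
      std::lock_guard<std::mutex> g (m);
      event_set = true;
    }
    cv.notify_all ();
  }

  void wait_event ()               // plays the role of WFSO (..., INFINITE)
  {
    std::unique_lock<std::mutex> l (m);
    cv.wait (l, [this] { return event_set; });
  }
};

// Returns true if the target saw the flag, waited, and was woken.
bool run_interleaving ()
{
  cancel_state s;
  bool woke = false;

  std::thread target ([&] {
    while (!s.canceled.load ())    // testcancel: check the flag first
      std::this_thread::yield ();
    s.wait_event ();               // then wait for the event to be set
    woke = true;
  });

  s.canceled.store (true);         // canceller sets the flag ...
  std::this_thread::sleep_for (std::chrono::milliseconds (10)); // ... stalls ...
  s.set_event ();                  // ... and only then signals the event
  target.join ();
  return woke;
}
```

As long as the cancelling thread eventually reaches the signalling step, the
wait in the target ends. The wait only becomes a problem if the canceller
never gets that far.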

What meaning are you assigning to "interrupted" here?

Are we worried about the thread calling pthread_cancel being cancelled 
itself?

>> And if so, how do we fix this?  Theoretically, the most simple
>> solution might be to call SetEvent before setting the canceled
>> variable, but in fact we would have to make setting canceled
>> and cancel_event an atomic operation.

Well, yeah, that is required for them to be coherent. But we have a 
mutex on the thread object for that purpose, and I don't quite see why 
it's released so early here.
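
A rough sketch of what I mean by holding the lock across both updates, again
in plain standard C++ rather than the real code. thread_obj, event_signaled
and coherent are made-up names for illustration, and the bool merely stands in
for cancel_event:

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the thread object; not the Cygwin pthread class.
struct thread_obj
{
  std::mutex mutex;
  bool canceled = false;
  bool event_signaled = false;     // stands in for cancel_event

  // Deferred cancel: keep the lock until both updates are done.
  int cancel ()
  {
    std::lock_guard<std::mutex> guard (mutex);
    canceled = true;
    event_signaled = true;         // where SetEvent (cancel_event) would go
    return 0;
  }

  // Any observer that takes the lock sees the two values agree.
  bool coherent ()
  {
    std::lock_guard<std::mutex> guard (mutex);
    return canceled == event_signaled;
  }
};
```

Note this only gives coherence to readers that also take the mutex. A fast
path that reads canceled without the lock could still see the flag before the
event is signalled.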


