This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Fix race in tst-mqueue5
- From: "Paul E. Murphy" <murphyp at linux dot vnet dot ibm dot com>
- To: "Carlos O'Donell" <carlos at redhat dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Cc: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, Tulio Magno Quites Machado Filho <tuliom at linux dot vnet dot ibm dot com>
- Date: Wed, 13 Jan 2016 11:57:11 -0600
- Subject: Re: Fix race in tst-mqueue5
- References: <569552C6 dot 8050200 at linux dot vnet dot ibm dot com> <5695AF3D dot 3020303 at redhat dot com>
On 01/12/2016 07:58 PM, Carlos O'Donell wrote:
> On 01/12/2016 02:23 PM, Paul E. Murphy wrote:
>> This seems to fix the test for ppc, and probably others too.
>
> Had you considered the 'sleep (1);' on line 394 as another source of problems?
That is yet another problem, though it seems less likely to trigger. I ran the test
in a loop overnight on a relatively busy machine and didn't catch any failures. I'm
not sure whether, or how, it can be fixed.
> Must it be the case that by the time '(void) pthread_barrier_wait (b3);' returns,
> the SIGRTMIN has been delivered and handled? Why isn't 'sigwait' required to make
> this test operate without a race between checking 'rtmin_cnt != 2' and the signal
> arriving and being handled?
I think there is an assumption that calling mqrecv() in do_test() will generate a
signal in the child process when the child is in one of the following states:
1. thr() blocked on barrier b3, do_child() is still approaching barrier b3.
2. do_child() blocked on barrier b3, thr() is still approaching barrier b3.
3. Both are waiting in barrier b3.
Masking the signal in do_child() guarantees that thr() will handle the signal.
This further assumes that delivery of the signal to thr() cannot be delayed or
blocked. I don't know whether that is a safe assumption, though the test makes it
elsewhere as well.
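For comparison, here is a minimal sketch of the sigwait approach raised in the
quoted question. It is not the code in tst-mqueue5.c: the names waiter and
demo_sigwait are hypothetical, and kill() stands in for the mq notification.
Blocking SIGRTMIN in every thread and consuming it with sigwait means the result
does not depend on when a handler happens to run.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical worker: synchronously consumes one SIGRTMIN via sigwait.  */
static void *
waiter (void *arg)
{
  int *received = arg;
  sigset_t set;
  int sig = 0;

  sigemptyset (&set);
  sigaddset (&set, SIGRTMIN);
  /* Returns once SIGRTMIN is pending, even if it was sent before this call.  */
  if (sigwait (&set, &sig) == 0 && sig == SIGRTMIN)
    *received = 1;
  return NULL;
}

/* Hypothetical driver: returns 1 if the worker observed the signal.  */
int
demo_sigwait (void)
{
  sigset_t set;
  pthread_t th;
  int received = 0;
  int rc;

  sigemptyset (&set);
  sigaddset (&set, SIGRTMIN);
  /* Block the signal before creating the thread; the new thread inherits
     the mask, so no thread can take the signal asynchronously.  */
  rc = pthread_sigmask (SIG_BLOCK, &set, NULL);
  assert (rc == 0);

  rc = pthread_create (&th, NULL, waiter, &received);
  assert (rc == 0);

  /* Process-directed signal; it stays pending until sigwait accepts it.  */
  rc = kill (getpid (), SIGRTMIN);
  assert (rc == 0);

  rc = pthread_join (th, NULL);
  assert (rc == 0);
  return received;
}
```

Because the signal is blocked everywhere, it does not matter whether it is sent
before or after the worker reaches sigwait; it simply remains pending.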
> While I agree that any fix that makes tst-mqueue5 fail less spuriously is a good
> thing, I'm curious about your review of the test as a whole (now that I've looked
> at it again).
Each call to pthread_barrier_wait should carry a comment that identifies its
counterpart calls in the other one or two functions :).
It does validate the documented behavior of mq, so it's a good test. If the test
is still troublesome, maybe the signals can be replaced with another means of
validating the asynchronous API.
Thanks,
Paul