This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Unwarranted assumption in tst-waitid, or a kernel bug?
On 09/21, Paul Pluzhnikov wrote:
>
> 334 expecting_sigchld = 1;
> 335 if (kill (pid, SIGSTOP) != 0)
> 336 {
> 337 printf ("kill (%d, SIGSTOP): %m\n", pid);
> 338 RETURN (EXIT_FAILURE);
> 339 }
> 340 pid_t wpid = waitpid (pid, &fail, WUNTRACED);
> 341 if (wpid < 0)
> 342 {
> 343 printf ("waitpid WUNTRACED on stopped: %m\n");
> 344 RETURN (EXIT_FAILURE);
> 345 }
> 346 else if (wpid != pid)
> 347 {
> 348 printf ("waitpid WUNTRACED on stopped returned %d != %d (status %x)\n",
> 349 wpid, pid, fail);
> 350 RETURN (EXIT_FAILURE);
> 351 }
> 352 else if (!WIFSTOPPED (fail) || WIFSIGNALED (fail) || WIFEXITED (fail)
> 353 || WIFCONTINUED (fail) || WSTOPSIG (fail) != SIGSTOP)
> 354 {
> 355 printf ("waitpid WUNTRACED on stopped: status %x\n", fail);
> 356 RETURN (EXIT_FAILURE);
> 357 }
> 358 CHECK_SIGCHLD ("stopped", CLD_STOPPED, SIGSTOP);
>
> Anyway, assuming we all agree the assumption is unwarranted, what is
> the correct way to fix tst-waitid.c ?
Perhaps you can change check_sigchld(). It does
if (expecting_sigchld) {
BUG! it should already have been received.
}
We all seem to agree that this expectation is wrong. Perhaps, you can
change this code to do something like
	sigset_t chld, empty;

	sigemptyset(&empty);
	sigemptyset(&chld);
	sigaddset(&chld, SIGCHLD);

	sigprocmask(SIG_BLOCK, &chld, NULL);
	if (expecting_sigchld) {
		alarm(SOME_TIME);
		sigsuspend(&empty);
		if (expecting_sigchld) {
			*ok = EXIT_FAILURE;
			...
		}
	}
	expecting_sigchld = 0;
	sigprocmask(SIG_UNBLOCK, &chld, NULL);
Or you can kill expecting_sigchld and use something simple like pipe() to
wait for SIGCHLD.
> And while I have your attention, is it possible for the same problem
> to manifest itself in rt/tst-mqueue5.c ?
>
> Here the failure is "missing SIGRTMIN" at line 120:
>
> 114 /* Parent calls mqsend (q), which should trigger notification. */
> 115
> 116 (void) pthread_barrier_wait (b3);
> 117
> 118 if (rtmin_cnt != 2)
> 119 {
> 120 puts ("SIGRTMIN signal in child did not arrive");
> 121 result = 1;
> 122 }
Well. Until today I knew absolutely nothing about ipc/mqueue.c, so you
shouldn't trust me (and in fact I spent some time trying to find the
implementation of mq_send/etc in glibc's sources ;)
> (I have not yet tried to produce a small test case for this, but the
> fact that signal delivery also appears to be delayed here makes me
> think that it might be the same issue.)
Yes, it would be nice to have a small test-case (even in pseudo-code).
Of course I don't really understand tst-mqueue5.c, it is complex. So
I assume that the failing part can be described as:
- The test-case does mq_notify(q, SIGEV_SIGNAL/SIGRTMIN).
This means that mq_send() should send SIGRTMIN.
- It expects that SIGRTMIN should already have been received
after mqrecv() succeeds.
At first glance, sys_mq_timedsend() and sys_mq_timedreceive() use
the same info->lock and should be serialized; the signal is sent from
under this lock too. IOW, it is not possible that sys_mq_timedreceive()
sees mq_curmsgs != 0 but doesn't see the result of kill_pid_info().
However. I can easily misinterpret this code, but it seems that
sys_mq_timedsend() doesn't necessarily send a signal if the caller
of sys_mq_timedreceive() is already sleeping?
sys_mq_timedsend:
	receiver = wq_get_first_waiter(info, RECV);
	if (receiver) {
		/* if we find a sleeper, we do not send the notification,
		   we just wake_up this sleeper */
		pipelined_send(info, msg_ptr, receiver);
	} else {
		/* adds message to the queue */
		msg_insert(msg_ptr, info);
		/* this sends SIGRTMIN */
		__do_notify(info);
	}
man mq_timedsend says nothing about this. Could you please check
that the test-case is correct or explain what I have missed?
Also. This test is multithreaded, but __do_notify() sends sigev_signo
to the thread group. This means that another thread can be chosen as
a target for this signal, and in this case the thread which checks
rtmin_cnt != 2 can obviously race with the receiver.
Oleg.