This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Unwarranted assumption in tst-waitid, or a kernel bug?


On 09/21, Paul Pluzhnikov wrote:
>
>    334    expecting_sigchld = 1;
>    335    if (kill (pid, SIGSTOP) != 0)
>    336      {
>    337        printf ("kill (%d, SIGSTOP): %m\n", pid);
>    338        RETURN (EXIT_FAILURE);
>    339      }
>    340    pid_t wpid = waitpid (pid, &fail, WUNTRACED);
>    341    if (wpid < 0)
>    342      {
>    343        printf ("waitpid WUNTRACED on stopped: %m\n");
>    344        RETURN (EXIT_FAILURE);
>    345      }
>    346    else if (wpid != pid)
>    347      {
>    348        printf ("waitpid WUNTRACED on stopped returned %d != %d (status %x)\n",
>    349                wpid, pid, fail);
>    350        RETURN (EXIT_FAILURE);
>    351      }
>    352    else if (!WIFSTOPPED (fail) || WIFSIGNALED (fail) || WIFEXITED (fail)
>    353             || WIFCONTINUED (fail) || WSTOPSIG (fail) != SIGSTOP)
>    354      {
>    355        printf ("waitpid WUNTRACED on stopped: status %x\n", fail);
>    356        RETURN (EXIT_FAILURE);
>    357      }
>    358    CHECK_SIGCHLD ("stopped", CLD_STOPPED, SIGSTOP);
>
> Anyway, assuming we all agree the assumption is unwarranted, what is
> the correct way to fix tst-waitid.c ?

Perhaps you can change check_sigchld(). It does

	if (expecting_sigchld) {
		BUG! It should already have been received.
	}

We all seem to agree that this expectation is wrong. Perhaps you can
change this code to do something like

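	sigset_t chld_mask, old_mask, empty_mask;

	sigemptyset(&chld_mask);
	sigaddset(&chld_mask, SIGCHLD);
	sigemptyset(&empty_mask);

	sigprocmask(SIG_BLOCK, &chld_mask, &old_mask);

	if (expecting_sigchld) {
		alarm(SOME_TIME);
		sigsuspend(&empty_mask);

		if (expecting_sigchld) {
			*ok = EXIT_FAILURE;
			...
		}
	}

	expecting_sigchld = 0;

	sigprocmask(SIG_SETMASK, &old_mask, NULL);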
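For experimenting outside the test harness, the block-then-sigsuspend() idea can be turned into a self-contained sketch. The function names and the 5-second timeout are assumptions, not anything in tst-waitid.c. Unlike the fragment above, it installs its own SIGALRM handler and cancels the alarm afterwards, so it would clobber a harness that relies on alarm() for its global timeout.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set to 1 before the action that should raise SIGCHLD; the handler clears it.  */
static volatile sig_atomic_t expecting_sigchld;
static volatile sig_atomic_t timed_out;

static void on_sigchld (int sig) { (void) sig; expecting_sigchld = 0; }
static void on_sigalrm (int sig) { (void) sig; timed_out = 1; }

/* Wait at most SECONDS for the expected SIGCHLD.
   Returns 0 if it arrived (possibly earlier), -1 on timeout.  */
static int
wait_for_sigchld (unsigned int seconds)
{
  sigset_t block, old, waitmask;

  assert (seconds > 0);   /* alarm (0) would mean "no timeout" */
  sigemptyset (&block);
  sigaddset (&block, SIGCHLD);
  sigaddset (&block, SIGALRM);
  /* Block first so the signal cannot arrive between the flag test
     and sigsuspend.  */
  sigprocmask (SIG_BLOCK, &block, &old);

  waitmask = old;
  sigdelset (&waitmask, SIGCHLD);
  sigdelset (&waitmask, SIGALRM);

  timed_out = 0;
  alarm (seconds);
  while (expecting_sigchld && !timed_out)
    sigsuspend (&waitmask);   /* atomically unblock and sleep */
  alarm (0);

  int rc = expecting_sigchld ? -1 : 0;
  expecting_sigchld = 0;
  sigprocmask (SIG_SETMASK, &old, NULL);
  return rc;
}

/* Hypothetical driver: stop a child, collect the stop with waitpid,
   then wait for the SIGCHLD.  */
static int
check_stop_sigchld (void)
{
  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = SA_RESTART;   /* keep waitpid from failing with EINTR */
  sa.sa_handler = on_sigchld;
  sigaction (SIGCHLD, &sa, NULL);
  sa.sa_flags = 0;
  sa.sa_handler = on_sigalrm;
  sigaction (SIGALRM, &sa, NULL);

  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      pause ();   /* child: wait to be stopped and killed */
      _exit (0);
    }

  int status, rc = -1;
  expecting_sigchld = 1;
  kill (pid, SIGSTOP);
  /* waitpid may report the stop before the SIGCHLD handler has run.  */
  if (waitpid (pid, &status, WUNTRACED) == pid && WIFSTOPPED (status))
    rc = wait_for_sigchld (5);

  kill (pid, SIGKILL);
  waitpid (pid, &status, 0);
  return rc;
}
```

check_stop_sigchld() should return 0 either way: if the handler already ran, the loop is skipped; if it has not, sigsuspend() sleeps until it does, and because SIGCHLD is blocked while the flag is tested the wakeup cannot be lost.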

Alternatively, you can drop expecting_sigchld altogether and use something
simple like pipe() to wait for SIGCHLD.
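The pipe() variant is the usual self-pipe trick: the SIGCHLD handler writes one byte to a non-blocking pipe, and the checker poll()s the read end with a timeout, so there is no flag to race on. A minimal sketch, assuming hypothetical function names and a 5-second timeout:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int sigchld_pipe[2] = { -1, -1 };

/* Runs in signal context: only async-signal-safe calls (write).  */
static void
on_sigchld_pipe (int sig)
{
  int saved_errno = errno;
  char c = (char) sig;
  ssize_t n = write (sigchld_pipe[1], &c, 1);   /* a full pipe just drops it */
  (void) n;
  errno = saved_errno;
}

static int
sigchld_pipe_setup (void)
{
  if (pipe (sigchld_pipe) != 0)
    return -1;
  for (int i = 0; i < 2; i++)
    fcntl (sigchld_pipe[i], F_SETFL,
           fcntl (sigchld_pipe[i], F_GETFL) | O_NONBLOCK);

  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  sa.sa_handler = on_sigchld_pipe;
  return sigaction (SIGCHLD, &sa, NULL);
}

/* Wait up to TIMEOUT_MS for one SIGCHLD.  Returns 0 if one arrived.  */
static int
wait_for_sigchld_pipe (int timeout_ms)
{
  assert (timeout_ms >= 0);   /* poll treats a negative timeout as infinite */
  struct pollfd pfd = { .fd = sigchld_pipe[0], .events = POLLIN };
  int n;
  do
    n = poll (&pfd, 1, timeout_ms);
  while (n < 0 && errno == EINTR);
  char c;
  return n > 0 && read (sigchld_pipe[0], &c, 1) == 1 ? 0 : -1;
}

/* Hypothetical driver: stop a child and wait for the resulting SIGCHLD.  */
static int
check_stop_sigchld_pipe (void)
{
  char c;
  if (sigchld_pipe[0] < 0 && sigchld_pipe_setup () != 0)
    return -1;
  while (read (sigchld_pipe[0], &c, 1) == 1)
    continue;   /* drop bytes left over from earlier children */

  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      pause ();
      _exit (0);
    }

  int status, rc = -1;
  kill (pid, SIGSTOP);
  if (waitpid (pid, &status, WUNTRACED) == pid && WIFSTOPPED (status))
    rc = wait_for_sigchld_pipe (5000);
  kill (pid, SIGKILL);
  waitpid (pid, &status, 0);
  return rc;
}
```

Draining the pipe before SIGSTOP matters if the check runs more than once: the SIGCHLD from the previous child's SIGKILL would otherwise satisfy the next wait immediately.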

> And while I have your attention, is it possible for the same problem
> to manifest itself in rt/tst-mqueue5.c ?
>
> Here the failure is "missing SIGRTMIN" at line 120:
>
>    114    /* Parent calls mqsend (q), which should trigger notification.  */
>    115
>    116    (void) pthread_barrier_wait (b3);
>    117
>    118    if (rtmin_cnt != 2)
>    119      {
>    120        puts ("SIGRTMIN signal in child did not arrive");
>    121        result = 1;
>    122      }

Well. Until today I knew absolutely nothing about ipc/mqueue.c, so you
shouldn't trust me (and in fact I spent some time trying to find the
implementation of mq_send/etc in glibc's sources ;)

> (I have not yet tried to produce a small test case for this, but the
> fact that signal delivery also appears to be delayed here makes me
> think that it might be the same issue.)

Yes, it would be nice to have a small test case (even in pseudo-code).
Of course, I don't really understand tst-mqueue5.c; it is complex. So
I assume that the failing part can be described as:

	- The test case does mq_notify(q, SIGEV_SIGNAL/SIGRTMIN).
	  This means that mq_send() should send SIGRTMIN.

	- It expects that SIGRTMIN has already been received
	  once mqrecv() succeeds.
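The two steps above could be reduced to a single-threaded sketch like the one below. It is not tst-mqueue5.c: the queue name and function name are hypothetical, SIGRTMIN is kept blocked so its arrival can be polled with sigtimedwait() instead of counted by a handler, and plain mq_receive() stands in for the test's mqrecv(). Nobody is waiting when mq_send() runs, so the analysis below predicts a result of 1. On glibc older than 2.34 it needs -lrt.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <mqueue.h>
#include <signal.h>
#include <time.h>

#define QNAME "/tst-mq-notify-sketch"   /* hypothetical queue name */

/* Returns 1 if SIGRTMIN is pending once mq_receive has succeeded,
   0 if it is not, -1 if the queue could not be set up.  */
static int
mq_notify_then_receive (void)
{
  struct mq_attr attr = { .mq_maxmsg = 1, .mq_msgsize = 8 };
  char buf[8];
  assert (attr.mq_msgsize <= (long) sizeof buf);   /* mq_receive needs this */

  mq_unlink (QNAME);
  mqd_t q = mq_open (QNAME, O_RDWR | O_CREAT | O_EXCL, 0600, &attr);
  if (q == (mqd_t) -1)
    return -1;
  mq_unlink (QNAME);   /* the descriptor stays usable */

  /* Keep SIGRTMIN blocked so it stays pending and can be polled.  */
  sigset_t set, old;
  sigemptyset (&set);
  sigaddset (&set, SIGRTMIN);
  sigprocmask (SIG_BLOCK, &set, &old);

  struct sigevent ev = { .sigev_notify = SIGEV_SIGNAL,
                         .sigev_signo = SIGRTMIN };
  int rc = -1;
  if (mq_notify (q, &ev) == 0
      && mq_send (q, "x", 1, 0) == 0   /* empty queue, nobody waiting */
      && mq_receive (q, buf, sizeof buf, NULL) == 1)
    {
      struct timespec zero = { 0, 0 };   /* poll, do not sleep */
      rc = sigtimedwait (&set, NULL, &zero) == SIGRTMIN;
    }

  sigprocmask (SIG_SETMASK, &old, NULL);
  mq_close (q);
  return rc;
}
```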

At first glance, sys_mq_timedsend() and sys_mq_timedreceive() take
the same info->lock, so they should be serialized, and the signal is sent
while this lock is held too. IOW, sys_mq_timedreceive() cannot see
mq_curmsgs != 0 without also seeing the result of kill_pid_info().

However, I may well be misreading this code, but it seems that
sys_mq_timedsend() doesn't necessarily send a signal if a caller
of sys_mq_timedreceive() is already sleeping?

	sys_mq_timedsend:

		receiver = wq_get_first_waiter(info, RECV);
		if (receiver) {
			pipelined_send(info, msg_ptr, receiver);

(if we find a sleeper, we do not send the notification; we just
 wake up this sleeper)

		} else {
			/* adds message to the queue */
			msg_insert(msg_ptr, info);
			__do_notify(info);
		}

(this sends SIGRTMIN)

man mq_timedsend says nothing about this. Could you please check
that the test case is correct, or explain what I have missed?
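To probe the pipelined_send() path from user space, a sketch along these lines could park a thread in mq_timedreceive() on an empty queue that has a SIGEV_SIGNAL registration, send one message, and then poll for SIGRTMIN. For what it's worth, the Linux mq_notify(3) man page does describe this case: when another thread is waiting in mq_receive() on an empty queue, the message goes to that thread and the registration stays in effect without a signal. The 200 ms sleep is an assumption about scheduling, not a guarantee that the receiver is already asleep, so a 1 here does not refute the theory; a 0 is consistent with it. Names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <mqueue.h>
#include <pthread.h>
#include <signal.h>
#include <time.h>

#define QNAME_BLOCKED "/tst-mq-blocked-sketch"   /* hypothetical name */

static mqd_t blocked_q;

/* Receiver thread: sleeps in mq_timedreceive on the empty queue (5 s cap).  */
static void *
blocked_receiver (void *arg)
{
  int *got = arg;
  assert (got != NULL);
  char buf[8];   /* matches mq_msgsize below */
  struct timespec deadline;
  clock_gettime (CLOCK_REALTIME, &deadline);
  deadline.tv_sec += 5;
  *got = mq_timedreceive (blocked_q, buf, sizeof buf, NULL, &deadline) == 1;
  return NULL;
}

/* Returns 1 if SIGRTMIN was sent for the message, 0 if it was not,
   -1 if setup failed or the receiver never got the message.  */
static int
notify_with_blocked_receiver (void)
{
  struct mq_attr attr = { .mq_maxmsg = 1, .mq_msgsize = 8 };
  mq_unlink (QNAME_BLOCKED);
  blocked_q = mq_open (QNAME_BLOCKED, O_RDWR | O_CREAT | O_EXCL, 0600, &attr);
  if (blocked_q == (mqd_t) -1)
    return -1;
  mq_unlink (QNAME_BLOCKED);

  /* Blocked here and, by inheritance, in the receiver thread.  */
  sigset_t set, old;
  sigemptyset (&set);
  sigaddset (&set, SIGRTMIN);
  pthread_sigmask (SIG_BLOCK, &set, &old);

  struct sigevent ev = { .sigev_notify = SIGEV_SIGNAL,
                         .sigev_signo = SIGRTMIN };
  pthread_t t;
  int got = 0, rc = -1;
  if (mq_notify (blocked_q, &ev) == 0
      && pthread_create (&t, NULL, blocked_receiver, &got) == 0)
    {
      /* Assumption: 200 ms is enough for the receiver to go to sleep.  */
      struct timespec delay = { 0, 200L * 1000 * 1000 };
      nanosleep (&delay, NULL);
      mq_send (blocked_q, "x", 1, 0);
      pthread_join (t, NULL);
      struct timespec zero = { 0, 0 };
      if (got)
        rc = sigtimedwait (&set, NULL, &zero) == SIGRTMIN;
    }

  pthread_sigmask (SIG_SETMASK, &old, NULL);
  mq_close (blocked_q);
  return rc;
}
```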


Also, this test is multithreaded, but __do_notify() sends sigev_signo
to the whole thread group. This means that another thread can be chosen
as the target for this signal, and in that case the thread which checks
rtmin_cnt != 2 can obviously race with the receiver.
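One way to take the thread-selection race out of such a check, offered as an assumption rather than what tst-mqueue5.c does: block SIGRTMIN before any other threads exist, so every thread inherits the mask, and have the checking thread consume the signal with sigtimedwait() instead of reading a counter that a handler in some other thread updates. In this sketch kill(getpid(), SIGRTMIN) from a helper thread stands in for the process-directed signal that __do_notify() sends; the function names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Stand-in for __do_notify(): a signal aimed at the whole process.  */
static void *
notifier (void *arg)
{
  (void) arg;
  kill (getpid (), SIGRTMIN);
  return NULL;
}

/* Returns 1 if the checking thread itself collected SIGRTMIN within 5 s.  */
static int
collect_rtmin_in_checker (void)
{
  sigset_t set, old;
  sigemptyset (&set);
  sigaddset (&set, SIGRTMIN);
  /* Block before creating threads: they inherit the mask, so no other
     thread can take delivery of the signal.  */
  int err = pthread_sigmask (SIG_BLOCK, &set, &old);
  assert (err == 0);

  pthread_t t;
  int sig = -1;
  if (pthread_create (&t, NULL, notifier, NULL) == 0)
    {
      struct timespec timeout = { 5, 0 };
      sig = sigtimedwait (&set, NULL, &timeout);   /* wait, don't read a counter */
      pthread_join (t, NULL);
    }

  pthread_sigmask (SIG_SETMASK, &old, NULL);
  return sig == SIGRTMIN;
}
```

Because the signal stays pending for the process until some thread accepts it, it no longer matters which thread the kernel would otherwise have picked.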

Oleg.

