This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Unwarranted assumption in tst-waitid, or a kernel bug?


On Tue, Sep 21, 2010 at 8:43 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 09/21, Roland McGrath wrote:
>>
>> As far as I can tell, Linux has never had a guarantee like this. From a
>> cursory look at the code in a few versions, I think the differences
>> you've seen between kernel versions are due to scheduling changes, not
>> that the actual local constraints in the exit/SIGCHLD/wait code paths
>> have changed at all.
>
> Agreed.
>
>
> Paul, I guess that this test-case "fails" after kill(pid, SIGSTOP),
> right?

Yes, the failure is:

  missing SIGCHLD on stopped

It comes from line 358 in posix/tst-waitid.c:

   334    expecting_sigchld = 1;
   335    if (kill (pid, SIGSTOP) != 0)
   336      {
   337        printf ("kill (%d, SIGSTOP): %m\n", pid);
   338        RETURN (EXIT_FAILURE);
   339      }
   340    pid_t wpid = waitpid (pid, &fail, WUNTRACED);
   341    if (wpid < 0)
   342      {
   343        printf ("waitpid WUNTRACED on stopped: %m\n");
   344        RETURN (EXIT_FAILURE);
   345      }
   346    else if (wpid != pid)
   347      {
   348        printf ("waitpid WUNTRACED on stopped returned %d != %d (status %x)\n",
   349                wpid, pid, fail);
   350        RETURN (EXIT_FAILURE);
   351      }
   352    else if (!WIFSTOPPED (fail) || WIFSIGNALED (fail) || WIFEXITED (fail)
   353             || WIFCONTINUED (fail) || WSTOPSIG (fail) != SIGSTOP)
   354      {
   355        printf ("waitpid WUNTRACED on stopped: status %x\n", fail);
   356        RETURN (EXIT_FAILURE);
   357      }
   358    CHECK_SIGCHLD ("stopped", CLD_STOPPED, SIGSTOP);


> I am a bit surprised it never fails on 2.6.18. I think you can add
> a small delay into finish_stop() (before it takes tasklist_lock),
> then I believe it should fail the same way.

You are probably in a better position to confirm this -- I don't usually
build kernels :-)


Anyway, assuming we all agree the assumption is unwarranted, what is
the correct way to fix tst-waitid.c ?

And while I have your attention, is it possible for the same problem
to manifest itself in rt/tst-mqueue5.c ?

Here the failure is "SIGRTMIN signal in child did not arrive" at line 120:

   114    /* Parent calls mqsend (q), which should trigger notification.  */
   115
   116    (void) pthread_barrier_wait (b3);
   117
   118    if (rtmin_cnt != 2)
   119      {
   120        puts ("SIGRTMIN signal in child did not arrive");
   121        result = 1;
   122      }

(I have not yet tried to produce a small test case for this, but the
fact that signal delivery also appears to be delayed here makes me
think that it might be the same issue.)

Thanks!
-- 
Paul Pluzhnikov

