Bug 14076 - PTHREAD_MUTEX_LOCK() in multiple threads RETURNING EOWNERDEAD
Summary: PTHREAD_MUTEX_LOCK() in multiple threads RETURNING EOWNERDEAD
Status: RESOLVED INVALID
Alias: None
Product: glibc
Classification: Unclassified
Component: nptl (show other bugs)
Version: 2.14
: P2 critical
Target Milestone: ---
Assignee: Siddhesh Poyarekar
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-05-08 03:40 UTC by zhenzhong.duan
Modified: 2014-06-13 14:06 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
code to test concurrency of thread calling pthread_mutex_lock (1.98 KB, text/x-c++src)
2012-05-08 03:40 UTC, zhenzhong.duan
Details
rewrite futexCase1_r1.cpp to futexCase1_r1.c (2.00 KB, application/octet-stream)
2012-06-08 06:02 UTC, zhenzhong.duan
Details

Note You need to log in before you can comment on or make changes to this bug.
Description zhenzhong.duan 2012-05-08 03:40:37 UTC
Created attachment 6400 [details]
code to test concurrency of thread calling pthread_mutex_lock

When a thread that hold mutex lock dead, multiple other threads that call PTHREAD_MUTEX_LOCK() return EOWNERDEAD

From the POSIX docs
 (http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_03)

The implementation shall behave as if at all times there is at most one owner
of any mutex.

A thread that becomes the owner of a mutex is said to have "acquired" the
mutex and the mutex is said to have become "locked''; when a thread gives up
ownership of a mutex it is said to have "released" the mutex and the mutex is
said to have become "unlocked".

Robust mutexes provide a means to enable the implementation to notify other
threads in the event of a process terminating while one of its threads holds
a mutex lock. The next thread that acquires the mutex is notified about the
termination by the return value [EOWNERDEAD] from the locking function. The
notified thread can then attempt to recover the state protected by the mutex,
and if successful mark the state protected by the mutex as consistent by a
call to pthread_mutex_consistent(). If the notified thread is unable to
recover the state, it can declare the state as not recoverable by a call to
pthread_mutex_unlock() without a prior call to pthread_mutex_consistent().

To me, this implies that the thread that receives the EOWNERDEAD status has
also “acquired” the mutex (i.e. has locked the mutex). Given this, only one
thread should be able to receive the EOWNERDEAD notification (otherwise,
multiple threads have “acquired” the mutex – which contradicts the POSIX
descriptions above).

Attached test code futexCase1_r1.cpp,
$ g++ -Wall -O3 -m32 -march=i686 futexCase1-r1.cpp -o futexCase1_r1 –lpthread
$ futexCase1_r1
27658: created mutex: 0xf7fdc000
......
27944: pthread_mutex_consistent_np failed: 0xf7fdc000 22 Invalid argument
28032: pthread_mutex_consistent_np failed: 0xf7fdc000 22 Invalid argument
28072: pthread_mutex_consistent_np failed: 0xf7fdc000 22 Invalid argument
27658: Done! lock concurrency: 0, max: 5 

Based on the man-pages, pthread_mutex_consistent_np() should only fail if the
mutex supplied is invalid (not initialized, etc.), or if the mutex is NOT in
an inconsistent state.  Given this, my speculation for the failure is that
multiple pthread_mutex_lock() calls are being allowed to simultaneously
return (incorrectly) with the EOWNERDEAD status, causing some of the
subsequent pthread_mutex_consistent_np() calls to fail because the mutex
state has already been made consistent.

Lastly, from looking at the resulting max-concurrency (5 in this case) we see
that the code protected by the mutex is NOT being single threaded by the
mutex as expected.

We originally reproduce this bug in 2.5-81.el5_8.2. I also tried with fedora16 newest, still reproduce.
Comment 1 Rich Felker 2012-05-08 13:49:58 UTC
Could you simplify the test case a bit and narrow down the cause of the bug? I notice that you're using priority inheritance mutexes (which you didn't mention in the report) which have some different code paths, so it would be helpful to know if that's related to the problem.

Also, I'm not one of the glibc developers so you can take this or leave this, but it would be nice if test cases could be C rather than C++, especially when the C++ is gratuitous (in this case, just one use of a struct name as a type, and a few references instead of pointers). Part of my reason for asking this is that I like to test glibc testcases against my implementation of the C library and on some of my test systems I don't have a C++ development environment.
Comment 2 zhenzhong.duan 2012-06-08 06:02:35 UTC
Created attachment 6442 [details]
rewrite futexCase1_r1.cpp to futexCase1_r1.c
Comment 3 zhenzhong.duan 2012-06-08 06:04:59 UTC
Rewrite a c version. Per out customer's test, still see concurrency.

@ $ gcc -Wall -O3 -m32 -march=i686 futexCase1_r1.c -o futexCase1_r1 -lpthread
@ -D_GNU_SOURCE
@ $ ./futexCase1_r1
@ 8279: created mutex: 0xf7f1a000
@ 8419: pthread_mutex_consistent_np failed: 0xf7f1a000 22 Invalid argument
@ 8438: pthread_mutex_consistent_np failed: 0xf7f1a000 22 Invalid argument
@ 8439: pthread_mutex_consistent_np failed: 0xf7f1a000 22 Invalid argument
@ …
@ 8279: Done! lock concurrency: 0, max: 7
@ $
Comment 4 Siddhesh Poyarekar 2012-10-04 14:45:08 UTC
The description is misleading.  Your program does not use threads; it uses different processes using fork with mutexes in shared memory with pshared set.  I have modified your program to the following and it shows that the problem is seen only with forked processes and not with threads:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <pthread.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>
#include <wait.h>
#include <sys/mman.h>
#include <assert.h>
#include <pthread.h>

#define zero_or_die(ret, str) \
  ({ \
     if (ret) \
       { \
         printf ("%s: %s\n", str, strerror (ret)); \
	 abort (); \
       } \
  })

void
initLock (pthread_mutex_t * mutex)
{
  int ok = 0;
  pthread_mutexattr_t ma;

  ok = pthread_mutexattr_init (&ma);
  zero_or_die (ok, "pthread_mutexattr_init");

  ok = pthread_mutexattr_settype (&ma, PTHREAD_MUTEX_ERRORCHECK);
  zero_or_die (ok, "pthread_mutexattr_settype");

#ifdef FORK
  ok = pthread_mutexattr_setpshared (&ma, PTHREAD_PROCESS_SHARED);
  zero_or_die (ok, "pthread_mutexattr_setpshared");
#endif

  ok = pthread_mutexattr_setprotocol (&ma, PTHREAD_PRIO_INHERIT);
  zero_or_die (ok, "pthread_mutexattr_setprotocol");

  ok = pthread_mutexattr_setrobust (&ma, PTHREAD_MUTEX_ROBUST_NP);
  zero_or_die (ok, "pthread_mutexattr_setrobust");

  ok = pthread_mutex_init (mutex, &ma);
  zero_or_die (ok, "pthread_mutex_init");

  pthread_mutexattr_destroy (&ma);
  printf ("%u: created mutex: %p\n", getpid (), mutex);
}

int
lock (pthread_mutex_t * mutex)
{
  int rc = pthread_mutex_lock (mutex);
  if (rc == EOWNERDEAD)
    {
      int crc = pthread_mutex_consistent (mutex);
      if (crc != 0)
	fprintf (stderr,
		 "%u: pthread_mutex_consistent failed: %p %d %s\n",
		 getpid (), mutex, crc, strerror (crc));
    }

  else if (rc != 0)
    fprintf (stderr, "%u: pthread_mutex_lock failed: %p %d %s\n",
	     getpid (), mutex, rc, strerror (rc));
  return rc;
}

void
unlock (pthread_mutex_t * mutex)
{
  int rc = pthread_mutex_unlock (mutex);
  if (rc != 0)
    fprintf (stderr, "%u: pthread_mutex_unlock failed: %p %d %s\n",
	     getpid (), mutex, rc, strerror (rc));
}

void
child (pthread_mutex_t * mutex)
{
  lock (mutex);
  sched_yield ();

#ifdef VERBOSE
  fprintf (stderr, "%d process suicide - abandoning lock %p\n", getpid (),
	   mutex);

#endif /*  */

  // abandon the lock (i.e. dont unlock it)
  // unlock(mutex);
}

int
main (int argc, char **argv)
{

#define iterations 500
#define parProcs 10
  int i;
#ifdef FORK
  void *shared = mmap (NULL, sizeof (pthread_mutex_t), PROT_WRITE | PROT_READ,
		       MAP_SHARED | MAP_ANON, -1, 0);
  assert (shared != NULL);
  memset (shared, 0, sizeof (pthread_mutex_t));
  pthread_mutex_t *mutex = (pthread_mutex_t *) shared;
#else
  pthread_mutex_t *mutex = malloc (sizeof (pthread_mutex_t));
  pthread_t t[iterations];
#endif

  initLock (mutex);
  lock (mutex);
  unlock (mutex);

  for (i = 0; i < (iterations + parProcs); i++)
    {
      if ((i < iterations))
	{
#ifdef FORK
	  if (fork () == 0)
	    {
	      child (mutex);
	      return -1;
	    }
#else
	  pthread_create (&t[i], NULL, child, mutex);
#endif
	}
    }
  return 0;
}

To build with fork (a variant of your original reproducer):

gcc -DFORK foo.c -pthread

To build it to use pthreads and non-shared memory:

gcc foo.c -pthread

Another thing to note is that the problem is reproducible only when PI is set.  This is important because in the PI case, it is the kernel that synchronizes the threads (or in this case, processes) with the FUTEX_LOCK_PI operation.  It may well end up being a kernel futex bug and not a glibc bug.
Comment 5 Siddhesh Poyarekar 2012-10-05 06:21:03 UTC
I misunderstood the problem first.  The race is present in threads also, albeit much less evident somehow.
Comment 6 zhenzhong.duan 2012-10-09 06:13:52 UTC
thanks for your reply.
If you need other info, just let me know.
Comment 7 Siddhesh Poyarekar 2012-10-11 15:05:19 UTC
As I suspected, this is a bug in the kernel.  I have posted a patch that works for me:

http://lkml.indiana.edu/hypermail/linux/kernel/1210.1/02508.html

So this is not a bug in glibc.
Comment 8 Jackie Rosen 2014-02-16 18:23:47 UTC Comment hidden (spam)