This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: BZ 20822 :powerpc: race condition in __lll_unlock_elision


On Thu, 2016-11-17 at 15:46 +0100, Torvald Riegel wrote:
> On Tue, 2016-11-15 at 21:42 +0530, Rajalakshmi Srinivasaraghavan wrote:
> > The initial problem reported was memory corruption in MongoDB only on 
> > Ubuntu 16.04 ppc64le(not on Ubuntu 15).
> > The problem was not easily recreatable and debugging with many tools and 
> > creative ideas, the problem is narrowed down to lock elision in glibc.
> > Ubuntu 16.04 has glibc 2.23 with lock elision enabled by default which 
> > made the testcase fail only on 16.04.
> > 
> > As stated in BZ 20822, short description is
> > 
> > "The update of *adapt_count after the release of the lock causes a race 
> > condition. Thread A unlocks, thread B continues and returns, destroying 
> > mutex on stack,
> >   then gets into another function, thread A writes to *adapt_count and 
> > corrupts stack.The window is very narrow and requires that the
> > machine be in smt mode, so likely the two threads have to be on the same 
> > core in order to hit this"
> > 
> > Looking back the file changes in 
> > sysdeps/unix/sysv/linux/powerpc/elision-unlock.c, suspecting the commit
> >   https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=fadd2ad9cc36115440d50b0eae9299e65988917d which is introduced to improve
> > the performance.
> > 
> > Thinking of the following fixes.
> > 
> > 1) Update of adapt_count before the unlock. But I have still not 
> > identified if its going to affect the performance.
> 
> Because of the requirements on mutex destruction (and same for rwlock
> etc.), a thread must not access any data associated with the mutex after
> having unlocked it (this includes reads because it's correct to also
> unmap the memory holding the mutex after destroying the mutex). 
> 
> > 2) In elision-lock.c/elision-trylock.c update adapt_count right after 
> > the lock acquire at the end of the function.
> > In either case we have to be careful not to introduce another race 
> > condition.
> 
> Every access to mutex data outside of transactions must use atomic
> operations if there would be data races otherwise (according to the C11
> memory model).  I know that this isn't followed in the current mutex
> code, but it really should be, and your change qualifies as new code so
> please take care of this too.
> 
> I also think that if we can have concurrent accesses to a particular
> variable both inside and outside of transactions, we should use atomic
> accesses inside the transactions too, just for consistency.

To illustrate, please have a look at this patch:
https://sourceware.org/ml/libc-alpha/2016-12/msg00041.html

It changes x86_64's lock elision implementation to use atomic accesses
for adapt_count.  (It adds shorter-sized atomic_load_* and atomic_store*
functions, but you don't even need that on powerpc right now I think
because powerpc sets USE_ATOMIC_COMPILER_BUILTINS to 0 and thus does not
check sizeof on atomic loads and stores.)



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]