This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
On 06/29/2018 08:54 AM, Stefan Liebler wrote:
> On 01/26/2017 05:22 PM, Torvald Riegel wrote:
>> On Thu, 2017-01-26 at 11:12 -0500, Carlos O'Donell wrote:
>>> On 01/26/2017 10:29 AM, Stefan Liebler wrote:
>>>> It seems that a race between the futex and exit syscalls causes an ESRCH result from the futex syscall.
>>>> I'll have a closer look at this.
>>>> I see those fails with Linux 4.8 / 4.9 running in a z/VM guest as well as with 4.6 on an LPAR (but less often).
>>>
>>> I've seen tst-robustpi7 and tst-robustpi8 failures on all hardware across a wide number of kernels, but never tst-robustpi4.
>>> https://sourceware.org/bugzilla/show_bug.cgi?id=19004
>>>
>>> The robustpi support is certainly not very robust as Torvald's recent fixes show, and there still remains at least one design flaw that can't be fixed. e.g.
>>> https://sourceware.org/bugzilla/show_bug.cgi?id=14485
>>
>> The underlying problem for that bug does not affect PI+robust, just robust, I think. Unless I forgot about something, PI+robust should always use the kernel to unlock.
>
> In the meantime, Florian Weimer could also reproduce this issue and opened the bugzilla Bug 23183 - tst-robustpi4 test failure (https://sourceware.org/bugzilla/show_bug.cgi?id=23183).
> I've also dug a bit deeper (see details in bugzilla) and was also able to reproduce it on Intel.
>
> If the thread with the locked mutex is executing the exit syscall while the main thread is executing the futex syscall, then it could lead to this ESRCH return value of the futex syscall, which triggers the assertion.
> In this situation, the futex syscall has already added the FUTEX_WAITERS bit to the lock value and is then calling attach_to_pi_owner().
> The exit syscall is now setting the lock value to FUTEX_WAITERS | FUTEX_OWNER_DIED and is proceeding.
> attach_to_pi_owner() is now e.g. trying to get the owner task and/or is testing if the owner is currently exiting. In those cases, ESRCH is returned!

Does the kernel look at the TID and determine that it no longer exists, or does it use the FUTEX_OWNER_DIED bit to detect this situation?
I'm worried that using the TID introduces a TID race here.

Thanks,
Florian