When building glibc on x86_64 Linux with --enable-kernel set to any version from 2.6.22 to 2.6.28 inclusive, the following tests fail:

make[2]: *** [/build/glibc-build/nptl/tst-rwlock6.out] Error 1
make[2]: *** [/build/glibc-build/nptl/tst-rwlock7.out] Error 1
make[2]: *** [/build/glibc-build/nptl/tst-rwlock9.out] Error 1
make[2]: *** [/build/glibc-build/nptl/tst-rwlock11.out] Error 1
make[2]: *** [/build/glibc-build/nptl/tst-rwlock12.out] Error 11
make[2]: *** [/build/glibc-build/nptl/tst-rwlock14.out] Error 1
make[2]: *** [/build/glibc-build/nptl/tst-abstime.out] Error 1

This issue also causes crashes in various real-world applications.

The kernel versions at the failure boundaries point to a futex issue:

Support for private futexes was added in 2.6.22.
Support for the FUTEX_CLOCK_REALTIME flag was added in 2.6.29.

To confirm that this is a futex-support issue, glibc was built with one of the bad values for --enable-kernel (2.6.27) and the following defines were adjusted manually:

1 - default for 2.6.22 - 2.6.28:
# define __ASSUME_PRIVATE_FUTEX 1
# undef __ASSUME_FUTEX_CLOCK_REALTIME
Glibc tests fail.

2 - default before 2.6.22:
# undef __ASSUME_PRIVATE_FUTEX
# undef __ASSUME_FUTEX_CLOCK_REALTIME
Glibc tests pass.

3 - default for 2.6.29 and later:
# define __ASSUME_PRIVATE_FUTEX 1
# define __ASSUME_FUTEX_CLOCK_REALTIME 1
Glibc tests pass.

This issue does not occur on i686-pc-linux-gnu.
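The boundary reasoning above can be written down as a small predicate. This is a minimal C sketch, not glibc code: the KVER encoding and the helper names are hypothetical, and only the two thresholds (2.6.22 for private futexes, 2.6.29 for FUTEX_CLOCK_REALTIME) come from the report. The failing range is exactly the set of versions where the first feature is assumed and the second is not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding for this sketch: (major << 16) | (minor << 8) | patch. */
#define KVER(maj, min, pat) (((maj) << 16) | ((min) << 8) | (pat))

/* Futex features glibc may assume for a given --enable-kernel value,
   using the two thresholds quoted in the report. */
struct futex_assumptions {
    bool private_futex;        /* __ASSUME_PRIVATE_FUTEX: kernel >= 2.6.22 */
    bool futex_clock_realtime; /* __ASSUME_FUTEX_CLOCK_REALTIME: kernel >= 2.6.29 */
};

static struct futex_assumptions assumptions_for(int kver)
{
    struct futex_assumptions a;
    a.private_futex = kver >= KVER(2, 6, 22);
    a.futex_clock_realtime = kver >= KVER(2, 6, 29);
    /* 2.6.29 > 2.6.22, so FUTEX_CLOCK_REALTIME always implies private futexes. */
    assert(!a.futex_clock_realtime || a.private_futex);
    return a;
}

/* True for the configuration that fails on x86_64: private futexes
   assumed, FUTEX_CLOCK_REALTIME not assumed. */
static bool in_failing_window(int kver)
{
    struct futex_assumptions a = assumptions_for(kver);
    return a.private_futex && !a.futex_clock_realtime;
}
```

For example, in_failing_window(KVER(2, 6, 27)) is true, while 2.6.21 and 2.6.29 both fall outside the window, matching configurations 2 and 3 above.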
Naively trying to locate the source of this bug: generating a list of files that use #ifdef/#ifndef on the __ASSUME_PRIVATE_FUTEX and __ASSUME_FUTEX_CLOCK_REALTIME defines (assuming this is not some more complex interaction) and are x86_64-specific (as this does not occur on i686 builds) gives:

nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
nptl/sysdeps/unix/sysv/linux/x86_64/lowlevelrobustlock.S
nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S
nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_timedrdlock.S
nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_timedwrlock.S

If we further assume that the bug requires nested #ifdefs on these two macros, the issue is restricted to small parts of the last three files, and given that only rwlock and abstime tests fail, we can exclude the first of those (pthread_cond_timedwait.S).
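The nested-#ifdef heuristic above can be sketched as a rough line scanner. This is a hypothetical helper, not part of glibc or of the reporter's workflow: it only tracks #if/#ifdef/#ifndef and #endif, ignores #else/#elif, and treats any substring match of a macro name as a mention, so it is a coarse filter rather than a real preprocessor.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Maximum #if nesting depth tracked by this sketch. */
#define MAX_DEPTH 64

/* If `line` is a preprocessor directive, return a pointer to the directive
   name, skipping blanks on both sides of '#' (nested directives are often
   written "# define", as in the report); otherwise return NULL. */
static const char *directive_name(const char *line)
{
    while (isspace((unsigned char)*line))
        line++;
    if (*line != '#')
        return NULL;
    line++;
    while (isspace((unsigned char)*line))
        line++;
    return line;
}

/* Hypothetical helper: count #if/#ifdef/#ifndef lines that mention macro `a`
   while an enclosing conditional mentions macro `b`, or vice versa.
   Returns SIZE_MAX if nesting exceeds MAX_DEPTH. */
size_t count_nested(const char *const *lines, size_t n,
                    const char *a, const char *b)
{
    int mask_at[MAX_DEPTH]; /* per open level: bit 0 = mentions a, bit 1 = mentions b */
    size_t depth = 0, hits = 0;

    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        const char *d = directive_name(lines[i]);
        if (d == NULL)
            continue;

        if (strncmp(d, "if", 2) == 0) {
            int mask = (strstr(d, a) ? 1 : 0) | (strstr(d, b) ? 2 : 0);
            int outer = 0;
            for (size_t k = 0; k < depth; k++)
                outer |= mask_at[k];
            /* A conditional on one macro opened inside a conditional on the other. */
            if (((mask & 1) && (outer & 2)) || ((mask & 2) && (outer & 1)))
                hits++;
            if (depth == MAX_DEPTH)
                return SIZE_MAX;
            mask_at[depth++] = mask;
        } else if (strncmp(d, "endif", 5) == 0 && depth > 0) {
            depth--;
        }
    }
    return hits;
}
```

Fed the lines of each candidate .S file with a = "__ASSUME_PRIVATE_FUTEX" and b = "__ASSUME_FUTEX_CLOCK_REALTIME", a non-zero result marks a file worth reading by hand.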
This is no place to report such problems. *** This bug has been marked as a duplicate of bug 333 ***
I have shown that the issue occurs on a specific platform with specific values for --enable-kernel, and that a specific combination of defines triggers it. I would have thought that was specific enough to replicate the issue and to show it is not a duplicate of #333. What further information is needed to show this is a genuine glibc issue?
Created attachment 5208: Fix stack imbalance under --assume-kernel=2.6.{22..29} in rwlock code
I'm seeing this as well. I've tracked it down to a bug in the cleanup code in pthread_rwlock_timedwrlock.S that causes a stack imbalance just before "retq": it tests __ASSUME_PRIVATE_FUTEX when deciding whether to clean up the local variables (and saved register) created for __ASSUME_FUTEX_CLOCK_REALTIME. When these two macros are set differently, "retq" jumps off into never-never-land.

There's a related bug in pthread_rwlock_timedrdlock.S, which emits the wrong CFI directives, but I don't think this affects runtime behavior. (Could be wrong though; I don't know a lot about CFI.) The attached patch fixes both issues; with it, all crashes in the testsuite are gone.
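To make the failure mode concrete, here is a minimal C model of the mismatch described above; it is not the actual assembly. It replaces the preprocessor guards with runtime flags so that all three configurations from the original report can be checked in one build. The frame size is an arbitrary unit, the function and parameter names are hypothetical, and it assumes (the thread does not say so explicitly) that the extra frame is set up only when FUTEX_CLOCK_REALTIME cannot be assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary frame size for the model; the real instruction sizes are not modelled. */
#define FRAME 1

/* Net stack adjustment left over at the return instruction: 0 means balanced,
   anything else means "retq" pops something other than the return address.
   `fixed` selects the corrected cleanup guard. */
int net_stack_adjust(bool assume_private_futex,
                     bool assume_clock_realtime,
                     bool fixed)
{
    /* FUTEX_CLOCK_REALTIME (2.6.29) implies private futexes (2.6.22). */
    assert(!assume_clock_realtime || assume_private_futex);

    int depth = 0;

    /* Prologue (assumed direction): locals plus a saved register are set up
       only on the path without FUTEX_CLOCK_REALTIME. */
    if (!assume_clock_realtime)
        depth += FRAME;

    /* Epilogue: the buggy code keys the cleanup on the private-futex macro;
       the fix keys it on the same condition as the allocation. */
    bool cleanup = fixed ? !assume_clock_realtime : !assume_private_futex;
    if (cleanup)
        depth -= FRAME;

    return depth;
}
```

With the buggy guard, only the 2.6.22 - 2.6.28 combination (private futexes assumed, FUTEX_CLOCK_REALTIME not) leaves a non-zero adjustment, which matches the pass/fail matrix in the original report; keying the cleanup on the allocation's own condition balances every configuration.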
Thanks. I can confirm that patch fixes the issues I was observing.
I've just spent some time debugging a crash in asterisk, which I tracked down to the issue described here; my patch looks identical. Is there any reason why this is not yet in master (and in 2.13 and 2.14 at least)?
*** Bug 13106 has been marked as a duplicate of this bug. ***
I checked in a patch.
guess only one site needed updating: http://sourceware.org/git/?p=glibc.git;a=commitdiff;h=1e4bd093e664f2889c48e63714583ef06b90d5b9
> guess only one site needed updating:

Sort of. Only one site affects the generated machine code (and that site was fixed in the git change you linked to), but git head still has broken CFI data at the other site. That may or may not be an actual problem, depending on what happens: if the kernel tries to trace back into userspace from one of the other syscalls, the unwind info might still be totally broken. But at least the code works now, which is a step up from before. I'm not very hopeful about the CFI data *ever* getting fixed, unfortunately. :-/
That sounds like a different (if semi-related) bug. Could you file a new one for us to track it?
*** Bug 260998 has been marked as a duplicate of this bug. ***