Summary: O_ATOMICLOOKUP vs O_CLOEXEC problems with RHEL4 and RHEL5 kernels
Product: glibc
Reporter: John Salmon <john>
Component: libc
Assignee: Ulrich Drepper <drepper.fsp>
Attachments: a new test for opendir
Description John Salmon 2007-10-28 01:21:38 UTC
Comment 1 Ulrich Drepper 2007-10-28 04:48:23 UTC
There will be no work-around to kernel bugs.
Comment 2 John Salmon 2007-10-29 03:26:24 UTC
Created attachment 2064: a new test for opendir

Fails when run with the RHEL kernel. Passes when run with a 2.6.18 kernel.
Comment 3 John Salmon 2007-10-29 03:28:06 UTC
Fair enough - glibc doesn't work around kernel bugs. How about widely deployed kernel "enhancements"?

It turns out that the problem is that the machine on which opendir fails has a RedHat EL kernel with TUX enhancements, and that kernel (and presumably thousands like it) was compiled with the following in fcntl.h:

    #define O_NOATIME       01000000
    #define O_ATOMICLOOKUP  02000000 /* TUX */

So my kernel thinks that the 02000000 bit of OFLAGS is a request for an ATOMICLOOKUP, but glibc (and the Linux mainline kernels since 2.6.something) thinks that it's a request to set the close-on-exec bit. Wonderful.

I can understand if the glibc maintenance team simply refuses to deal with non-standard kernels. You have to draw the line somewhere. But at least let's have a test so people who run 'make check' won't think they've got a working library when they don't.

I've attached 'opendir-tst2.c', which fails on my RHEL system but works fine on my 2.6.18 system. Note that the first opendir succeeds: we just created tmpXXXX, so it's very likely in the dentry cache, and hence O_ATOMICLOOKUP has no problem. Trying to opendir("tmpXXXX/doesnotexist"), on the other hand, is a miss in the dentry cache and fails the ATOMICLOOKUP test, leading to a non-standard errno which we can test for.

Lucky for us, the TUX patches set errno to the crazy value of 530 when the dentry cache lookup fails. If not for that, I wouldn't know how to reliably reproduce the problem.
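For readers without access to the attachment, the check described above can be sketched roughly as follows. This is not the attached opendir-tst2.c; it is a minimal illustration under stated assumptions. The function name missing_child_errno and the choice to return the raw errno are hypothetical, and the caller supplies the mkdtemp template.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper, not taken from the attachment.
 *
 * Creates a fresh directory from `tmpl` (a writable mkdtemp template
 * ending in "XXXXXX"), opens it once, then tries to open a child that
 * does not exist.
 *
 * Returns the errno left by the second opendir, or -1 if the setup
 * failed or the missing child unexpectedly opened. */
int missing_child_errno(char *tmpl)
{
    char child[4096];
    DIR *d;
    int err;

    if (mkdtemp(tmpl) == NULL)
        return -1;

    /* First opendir: the directory was just created, so per the report
     * it is expected to succeed even on the affected kernels. */
    d = opendir(tmpl);
    if (d == NULL) {
        rmdir(tmpl);
        return -1;
    }
    closedir(d);

    /* Second opendir: the child does not exist, so the call must fail. */
    snprintf(child, sizeof child, "%s/doesnotexist", tmpl);
    errno = 0;
    d = opendir(child);
    err = errno;            /* save before cleanup can overwrite it */
    if (d != NULL) {
        closedir(d);
        err = -1;
    }

    rmdir(tmpl);
    return err;
}
```

A caller would compare the result against ENOENT. According to the report, the affected RHEL kernels yield 530 here instead.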
Comment 4 Ulrich Drepper 2007-10-29 04:25:58 UTC
Stop reopening. This is no bug. There will be no support for nonstandard kernels.
Comment 5 John Salmon 2007-10-29 04:41:05 UTC
"glibc does not support nonstandard kernels". Is that a reason to ignore a straightforward test, modeled after the other tests in dirent/, that passes when glibc is working correctly and that fails on some systems which happen to be unsupported?
Comment 6 Axel Thimm 2007-12-28 01:53:51 UTC
Note that this is still the case with RHEL5 as well. I agree that vendors undefining or redefining constants is a very bad thing, but at this point this kind of setup is really widely deployed. Given that Ulrich is even working for that vendor, could there be some solution/workaround for RHEL4/RHEL5 users? Maybe something like the *ASSUME_KERNEL environment setting?

A consequence of this bug is that all build systems based on RHEL5 carrying Fedora chroots, or for that matter any glibc 2.7 system, are randomly breaking with "unknown error 530" (actually I don't understand why this is random and not always, but that's probably another story). Other systems seem to suffer in a similar way, and browsing through Google's hits for "unknown error 530" one sees that no user is even close to suspecting a kernel/glibc ABI incompatibility.

Is there any way to keep those Fedora 8/9 chroots running on a RHEL5 kernel? I'm reopening not to pin this as a glibc bug, but as a request for a workaround or advice on what action to take. People desperately googling for "unknown error 530" (like I did) will eventually find this report and would like to see what they can do to fix it. Help! :)

I also filed this bug against the vendor, as he should also take action to fix these problems with the next kernel release (which according to the vendor's schedule would happen at the earliest three months after a submission): https://bugzilla.redhat.com/show_bug.cgi?id=426890

Thanks in advance for any advice!
Comment 7 Ulrich Drepper 2007-12-28 03:45:07 UTC
Stop reopening the bug. This is entirely a kernel problem. Just use correct kernels.
Comment 8 Axel Thimm 2007-12-28 09:18:37 UTC
(In reply to comment #7)
> Stop reopening the bug. This is entirely a kernel problem. Just use correct
> kernels.

So the recommendation is to not use RHEL???
Comment 9 Jakub Jelinek 2007-12-28 09:35:26 UTC
Using correct kernels doesn't imply that. You can get fixed kernels for RHEL5, e.g. from http://people.redhat.com/dzickus/el5/62.el5/, and the fix will surely make it into official updates eventually.
Comment 10 Axel Thimm 2007-12-28 19:53:28 UTC
Thanks Jakub! I updated the bugzilla.redhat.com entry with that information and will knock myself out with the 62 kernels :)