There is a strange error in the 20090124 snapshot:

% uname -a
Linux loki 2.6.29-rc2 #1 SMP PREEMPT Sun Jan 18 18:40:46 CET 2009 x86_64 GNU/Linux
% id
uid=1000(eugen) gid=1000(eugen) groups=0(root),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),116(stapdev),1000(eugen)
% ls -l /usr/bin/staprun
-rwsr-xr-x 1 root root 31752 січ 25 03:10 /usr/bin/staprun
% ./helloworld.stp -v
Pass 1: parsed user script and 47 library script(s) in 280usr/0sys/304real ms.
Pass 2: analyzed script: 1 probe(s), 1 function(s), 0 embed(s), 0 global(s) in 10usr/0sys/4real ms.
Pass 3: using cached /home/eugen/.systemtap/cache/11/stap_11c0f8dddd8437f12d5b2ecdd542a4fd_325.c
Pass 4: using cached /home/eugen/.systemtap/cache/11/stap_11c0f8dddd8437f12d5b2ecdd542a4fd_325.ko
Pass 5: starting run.
Error inserting module '/tmp/stapJa73Uz/stap_11c0f8dddd8437f12d5b2ecdd542a4fd_325.ko': File exists
Retrying, after attempted removal of module stap_11c0f8dddd8437f12d5b2ecdd542a4fd_325 (rc 0)
hello world
ERROR: The effective user ID of staprun must be set to the root user.
Check permissions on staprun and ensure it is a setuid root program.
Pass 5: run completed in 0usr/10sys/122real ms.
Pass 5: run failed. Try again with another '--vp 00001' option.

Here staprun is setuid root, and stap is able to run it: staprun removes the old module and loads the new one (the script prints "hello world"). After that, however, staprun complains that it is not setuid root and cannot remove the module. Everything works fine when run as root. I do not remember anything similar with the 20090117 snapshot and the 2.6.28 kernel.
After adding some debug prints to staprun, I get:

uid = 1000, euid = 0, pid = 20241
hello world
uid = 1000, euid = 1000, pid = 20241
ERROR: The effective user ID of staprun must be set to the root user.
Check permissions on staprun and ensure it is a setuid root program.
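For anyone who wants to reproduce the check, here is a minimal sketch of the kind of debug print that produces lines in the format above. The actual patch to staprun is not shown in this report, so the helper name format_ids is hypothetical and only the output format is taken from the comment.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not the actual staprun debug patch): formats the
 * real UID, effective UID and PID of the calling process into buf, in the
 * same "uid = ..., euid = ..., pid = ..." layout as the output above.
 * Returns what snprintf returns: the number of characters that would have
 * been written, excluding the terminating NUL. */
static int format_ids(char *buf, size_t len)
{
    return snprintf(buf, len, "uid = %d, euid = %d, pid = %d",
                    (int)getuid(), (int)geteuid(), (int)getpid());
}
```

Calling this once at startup and once just before the module removal, and printing buf to stderr each time, would show whether the effective UID is still 0 on the second entry into the setuid binary.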
We believe this is a recent regression in the kernel, possibly related to the user-credential patches to task_struct.
According to a git bisect of the kernel that I just finished, the regression is caused by the following kernel change:

commit d84f4f992cbd76e8f39c488cf0c5d123843923b1
Author: David Howells <dhowells@redhat.com>
Date: Fri Nov 14 10:39:23 2008 +1100

    CRED: Inaugurate COW credentials
I've added a workaround for this bug in commit 69aa1bd. Originally staprun exec's stapio, which exec's staprun when it is time to remove the module. Now staprun exec's stapio, which forks when it is time to remove the module. The new child exec's staprun. The parent (stapio) waits for the child to finish, then exits.
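The fork/exec/wait pattern described above can be sketched roughly as follows. This is not the code from commit 69aa1bd; the function name run_in_child and its error handling are assumptions made for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch of the workaround's control flow: instead of
 * exec'ing the helper directly, fork first, exec the helper in the child,
 * and have the parent wait for the child to finish.
 * Returns the child's exit status, or -1 on failure or abnormal exit. */
static int run_in_child(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this process image with the helper program. */
        execv(path, argv);
        perror("execv");   /* reached only if execv failed */
        _exit(127);
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            perror("waitpid");
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the stapio case the parent would call something like run_in_child("/usr/bin/staprun", args) when it is time to remove the module and then exit with the returned status; the exact arguments passed to staprun are not given in this report.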
Created attachment 3697: test program 1
Created attachment 3698: test program 2 source
Created attachment 3699: test programs makefile
I've attached two small test programs and a Makefile that demonstrate this problem. While developing these test programs, I discovered that the setuid bit fails to take effect only when the second program (stapio for systemtap, test2.c for the small test programs) creates a second thread.
It's time the problem was brought to lkml's notice rather than worked around in SystemTap. This clearly looks like a regression, unless SystemTap was depending on the feature's buggy behaviour earlier.
(In reply to comment #9)
> Its time the problem is brought to lkml notice rather than working around it in
> SystemTap -- this clearly looks like a regression, unless SystemTap was
> depending on the feature's buggy behaviour earlier.

I agree. I've filed Red Hat bugzilla #481783 against this and sent a message to lkml (<http://lkml.indiana.edu/hypermail/linux/kernel/0901.3/02268.html>) with the test programs included here.
This is also being tracked as Linux kernel bug 12602: <http://bugzilla.kernel.org/show_bug.cgi?id=12602>
David Howells has posted a patch upstream that fixes this problem. I've verified that this works correctly on 2.6.29-0.99.rc4.git1.fc11.x86_64. I'll wait a week or so and remove the workaround.
I suggest waiting till the next major release of systemtap. The workaround present is costing us nothing.
The kernel bug has been fixed, and our backup workaround is in place.
From comment #13:
> The workaround present is costing us nothing.

... it turns out the workaround permits a race condition between stapio exiting (thus releasing the .cmd fd) and "staprun -d" starting (and trying to open the .cmd fd).