When trying to run the "make installcheck" tests on an updated x86_64 rawhide/fc6 machine, many of the tests fail, even simple ones like src/testsuite/systemtap.base/add.stp.

    ./stap -k -vv /home/wcohen/stap_testing_200609201420/src/testsuite/systemtap.base/add.stp

Towards the end it gives the following error message:

    Running sudo /home/wcohen/stap_testing_200609201420/install/libexec/systemtap/stpd -u wcohen -d 25459 /tmp/stap4llS8n/stap_25459.ko
    insmod: error inserting '/tmp/stap4llS8n/stap_25459.ko': -1 Cannot allocate memory

Testing the dated snapshots narrows the problem down to something that changed between September 18 and September 19:

    20060914 Worked
    20060918 Worked /tmp/stapCto3iJ/stap_26001.c
    20060919 Failed /tmp/stapLIgdOB/stap_26093.c
    20060920 Failed

There do not seem to be obvious differences between the .c files used to generate the modules that would cause a problem. However, the problem seems to follow the compiled module (.ko):

    $ sudo /home/wcohen/stap_testing_200609180830/install/libexec/systemtap/stpd -u wcohen -d 26455 /tmp/stapCto3iJ/stap_26001.ko
    systemtap starting probe
    systemtap ending probe
    systemtap test success

    $ sudo /home/wcohen/stap_testing_200609180830/install/libexec/systemtap/stpd -u wcohen -d 26455 /tmp/stapLIgdOB/stap_26093.ko
    insmod: error inserting '/tmp/stapLIgdOB/stap_26093.ko': -1 Cannot allocate memory
    ERROR, couldn't insmod probe module /tmp/stapLIgdOB/stap_26093.ko
There were no translator changes over the last few days, but several tapset & runtime changes were checked in.
After I filed the bz, I realized that the translator was an unlikely cause of the problem. The generated C code was very similar between the working and non-working versions. With all the tapsets being translated into C by the translator, it seems like the runtime would be the likely cause. The following change set looks like a possible candidate:

    PatchSet 1154
    Date: 2006/09/18 12:44:19
    Author: hunt
    Branch: HEAD
    Tag: (none)
    Log:
    2006-09-18  Martin Hunt  <hunt@redhat.com>

        * print.c (_stp_print_flush): Rewrite so one version works
        for relayfs or procfs. Use proper per-cpu functions.
        (_stp_reserve_bytes): New function. Reserve bytes in the
        output buffer.
        (_stp_print_binary): New function. Write a variable number
        of 64-bit values directly into the output buffer.

        * string.c (_stp_sprintf): Rewrite using new per-cpu buffers.
        (_stp_vsprintf): Ditto.
        (_stp_string_cat_cstr): Ditto.
        (_stp_string_cat_char): Ditto.

        * runtime.h: Set defaults for MAXTRYLOCK and TRYLOCKDELAY
        to make runtime tests in bench2 happy.

Doing a binary search to find out which checkin is causing the problem.
The RHEL4 x86_64 tests ran fine on 2.6.9-42.0.2.ELsmp, so the particular kernel plays some part in the problem.
Subject: Re: unable to allocate memory for context on rawhide x86_64

The big change in memory allocation I made was using DEFINE_PER_CPU. I'm allocating a bit over 8K per cpu. Previously I was just declaring a static array indexed by cpu number.
OK, this is my bug. Need to find a percpu allocation that works for modules on all kernels.
Analysis: DEFINE_PER_CPU puts variables in the kernel's .data.percpu section. This section is fixed in size at PERCPU_ENOUGH_ROOM, currently defaulting to 32K. On i686 2.6.17-1.2187_FC5smp, the kernel .data.percpu section is 0x4ce4 in size. On x86_64, it is 0x6708. This does not leave room on x86_64 for the slightly more than 8K that systemtap modules want. Even worse, multiple systemtap modules certainly won't work on either kernel. So using DEFINE_PER_CPU is a very bad idea as it is currently implemented.
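For reference, the arithmetic behind the analysis can be checked with a few lines of Python. This is only a rough sketch: it treats "a bit over 8K" as exactly 8K (so it is slightly optimistic), assumes nothing else has claimed per-cpu space, and the names room_left and modules_that_fit are made up for illustration.

```python
# PERCPU_ENOUGH_ROOM default quoted in the analysis (32K).
PERCPU_ENOUGH_ROOM = 32 * 1024

# Size of the kernel's own .data.percpu section, as reported in the analysis.
KERNEL_PERCPU_USED = {"i686": 0x4CE4, "x86_64": 0x6708}

# Assumption: the real per-module need is "a bit over 8K"; 8K is a lower bound.
MODULE_PERCPU_NEED = 8 * 1024


def room_left(arch):
    """Bytes of per-cpu space left for modules after the kernel's own use."""
    return PERCPU_ENOUGH_ROOM - KERNEL_PERCPU_USED[arch]


def modules_that_fit(arch, need=MODULE_PERCPU_NEED):
    """Upper bound on how many systemtap modules could load at once."""
    return room_left(arch) // need


if __name__ == "__main__":
    for arch in KERNEL_PERCPU_USED:
        print(arch, room_left(arch), modules_that_fit(arch))
```

With these figures x86_64 has 6392 bytes left, which is less than 8K, so not even one module loads. i686 has 13084 bytes left, which is room for one module but not two.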
Checked in a fix to this last week.