Bug 3232 - unable to allocate memory for context on rawhide x86_64
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: translator
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Martin Hunt
 
Reported: 2006-09-20 15:44 UTC by William Cohen
Modified: 2006-09-26 17:04 UTC

Description William Cohen 2006-09-20 15:44:17 UTC
When trying to run the "make installcheck" tests on an updated x86_64 rawhide/fc6
machine, many of the tests fail, even simple ones like
src/testsuite/systemtap.base/add.stp.

 ./stap -k -vv /home/wcohen/stap_testing_200609201420/src/testsuite/systemtap.base/add.stp

Towards the end, it gives the following error message:


Running sudo /home/wcohen/stap_testing_200609201420/install/libexec/systemtap/stpd -u wcohen -d 25459 /tmp/stap4llS8n/stap_25459.ko
insmod: error inserting '/tmp/stap4llS8n/stap_25459.ko': -1 Cannot allocate memory

Testing dated builds narrows the problem down to something that changed between
September 18 and September 19:

20060914 Worked
20060918 Worked /tmp/stapCto3iJ/stap_26001.c
20060919 Failed /tmp/stapLIgdOB/stap_26093.c
20060920 Failed

There do not seem to be obvious differences between the .c files used to
generate the modules that would cause a problem. However, the problem seems to
follow the compiled module (.ko):


$ sudo /home/wcohen/stap_testing_200609180830/install/libexec/systemtap/stpd -u wcohen -d 26455 /tmp/stapCto3iJ/stap_26001.ko
systemtap starting probe
systemtap ending probe
systemtap test success
$ sudo /home/wcohen/stap_testing_200609180830/install/libexec/systemtap/stpd -u wcohen -d 26455 /tmp/stapLIgdOB/stap_26093.ko
insmod: error inserting '/tmp/stapLIgdOB/stap_26093.ko': -1 Cannot allocate memory
ERROR, couldn't insmod probe module /tmp/stapLIgdOB/stap_26093.ko
Comment 1 Frank Ch. Eigler 2006-09-20 18:01:24 UTC
There were no translator changes over the last few days,
but several tapset & runtime changes were checked in.
Comment 2 William Cohen 2006-09-20 18:26:05 UTC
After I filed the bz, I realized that the translator was an unlikely cause of the
problem. The generated C code was very similar between the working and non-working
versions. With all the tapsets being translated into C by the translator, it
seems like the runtime would be the likely cause. The following change set looks
like a possible candidate:

PatchSet 1154
Date: 2006/09/18 12:44:19
Author: hunt
Branch: HEAD
Tag: (none)
Log:
2006-09-18  Martin Hunt  <hunt@redhat.com>

        * print.c (_stp_print_flush): Rewrite so one version works for
        relayfs or procfs. Use proper per-cpu functions.
        (_stp_reserve_bytes): New function. Reserve bytes in the output buffer.
        (_stp_print_binary): New function. Write a variable number of
        64-bit values directly into the output buffer.

        * string.c (_stp_sprintf): Rewrite using new per-cpu buffers.
        (_stp_vsprintf): Ditto.
        (_stp_string_cat_cstr): Ditto.
        (_stp_string_cat_char): Ditto.

        * runtime.h: Set defaults for MAXTRYLOCK and TRYLOCKDELAY to make
        runtime tests in bench2 happy.

I am doing a binary search to find out which checkin is causing the problem.


Comment 3 William Cohen 2006-09-20 18:34:39 UTC
The RHEL4 x86_64 tests ran fine on 2.6.9-42.0.2.ELsmp, so the particular kernel
plays some part in the problem.
Comment 4 Martin Hunt 2006-09-20 18:43:53 UTC
The big change in memory allocation I made was using DEFINE_PER_CPU. I'm
allocating a bit over 8K per cpu. Previously I was just declaring a
static array indexed by cpu number.
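
For readers unfamiliar with the older layout mentioned here, the following is a minimal userspace sketch of "a static array indexed by cpu number". It is not the systemtap runtime code: every name (SKETCH_NR_CPUS, SKETCH_BUF_BYTES, sketch_cpu_buf, sketch_buf_for_cpu, sketch_append) is hypothetical, and the 8192-byte size is only a stand-in for "a bit over 8K". In kernel code the array would be sized by NR_CPUS and the index would come from something like smp_processor_id(); both are replaced here so that the sketch compiles outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes; the thread only says "a bit over 8K per cpu". */
#define SKETCH_NR_CPUS   4
#define SKETCH_BUF_BYTES 8192

struct sketch_cpu_buf {
    size_t len;                   /* bytes currently used */
    char data[SKETCH_BUF_BYTES];  /* output buffer for this CPU */
};

/* One slot per CPU, stored in the module's own data rather than
 * in the kernel's .data.percpu section. */
static struct sketch_cpu_buf sketch_bufs[SKETCH_NR_CPUS];

/* Return the buffer for a CPU, or NULL if the index is out of range. */
struct sketch_cpu_buf *sketch_buf_for_cpu(int cpu)
{
    if (cpu < 0 || cpu >= SKETCH_NR_CPUS)
        return NULL;
    return &sketch_bufs[cpu];
}

/* Append n bytes to one CPU's buffer.
 * Returns 0 on success, -1 if the CPU is invalid or the buffer is full. */
int sketch_append(int cpu, const char *src, size_t n)
{
    struct sketch_cpu_buf *b;

    assert(src != NULL);
    b = sketch_buf_for_cpu(cpu);
    if (b == NULL || n > SKETCH_BUF_BYTES - b->len)
        return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}
```

The point of the sketch is only where the memory lives: the array is part of the module's own static data, so its size scales with the number of CPU slots but does not draw on any shared per-CPU area.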


Comment 5 Martin Hunt 2006-09-20 18:51:56 UTC
OK, this is my bug.  Need to find a percpu allocation that works for modules on
all kernels.
Comment 6 Martin Hunt 2006-09-21 17:59:48 UTC
Analysis: DEFINE_PER_CPU puts variables in the kernel's .data.percpu section.
This section is fixed in size at PERCPU_ENOUGH_ROOM, currently defaulting to
32K. On i686 2.6.17-1.2187_FC5smp, the kernel .data.percpu section is 0x4ce4 in
size. On x86_64, it is 0x6708. This does not leave room on x86_64 for the
slightly more than 8K that systemtap modules want. Even worse, multiple systemtap
modules certainly won't work on either kernel. So using DEFINE_PER_CPU is a very
bad idea as it is currently implemented.
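
The arithmetic behind this analysis can be checked with a small C sketch. The room and section sizes are the figures quoted above. The helper names (percpu_bytes_free, modules_that_fit) are hypothetical, and the module's per-CPU need is assumed to be exactly 8K, which is a lower bound on the "slightly more than 8K" actually wanted.

```c
#include <assert.h>

/* Figures quoted in the analysis above. */
#define PERCPU_ROOM_BYTES     (32 * 1024)  /* PERCPU_ENOUGH_ROOM default */
#define KERNEL_PERCPU_I686    0x4ce4       /* kernel .data.percpu on i686 */
#define KERNEL_PERCPU_X86_64  0x6708       /* kernel .data.percpu on x86_64 */

/* Assumption: "slightly more than 8K" rounded down to exactly 8K. */
#define STAP_MODULE_BYTES     (8 * 1024)

/* Bytes left in the per-CPU area after the kernel's own variables. */
long percpu_bytes_free(long room, long kernel_used)
{
    assert(kernel_used <= room);
    return room - kernel_used;
}

/* How many modules of the given per-CPU size fit in the free space. */
long modules_that_fit(long bytes_free, long per_module)
{
    assert(per_module > 0);
    return bytes_free / per_module;
}
```

With these figures, percpu_bytes_free(PERCPU_ROOM_BYTES, KERNEL_PERCPU_I686) is 13084 bytes, enough for one module but not two, while percpu_bytes_free(PERCPU_ROOM_BYTES, KERNEL_PERCPU_X86_64) is 6392 bytes, less than a single module needs. That matches the insmod failure on x86_64 and the warning that multiple modules cannot load on either kernel.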

Comment 7 Martin Hunt 2006-09-26 17:04:39 UTC
Checked in a fix to this last week.