Bug 3857

Summary: Flight Recorder on Memory
Product: systemtap Reporter: Masami Hiramatsu <masami.hiramatsu.pt>
Component: runtimeAssignee: Martin Hunt <hunt>
Status: RESOLVED FIXED    
Severity: enhancement CC: mhiramat, soshima
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Attachments: flight-recorder in kernel memory
support overwrite mode and dump buffer option
add "-O" (overwrite mode) option to stap command
Automatically using overwrite mode while the script runs in flight-recorder mode.

Description Masami Hiramatsu 2007-01-11 11:30:25 UTC
Currently, the SystemTap�fs module is unloaded when the user
daemon (staprun) exits. I think this behavior is not good for the
flight recorder because the staprun will suddenly be terminated
by being involved in the user space crash.
So I suggest introducing the on-memory flight recorder mode to
the SystemTap. Under this mode, staprun just launches (load and 
run) the kernel module. After that it�fs detached from the 
module soon. If user wants to get the output of this module,
attach the staprun to the module again.

I'll attach a series of the prototype patches which add some options
(see below) to the staprun and the runtime for supporting this feature.
-L: Launch the pre-compiled module and detach form it.
-A: Attach to the detached module. Use without -L.
-D: Detach from the module immediately. Use with -A.

Usage Sample:
1. compile a script.
 $ stap -p4 -k -m flightrec flightrec.stp (you can add '-b' if want)
2. launch the script.
 $ staprun -L flightrec.ko
3. attach, read, and detach
 $ staprun -AD flightrec
4. attach and exit
 $ staprun -A flightrec
 ^C

I believe this feature is very useful for system administrators
if they use it with the kernel dump mechanism.

thanks,
Comment 1 Masami Hiramatsu 2007-01-11 11:32:48 UTC
Created attachment 1486 [details]
flight-recorder in kernel memory
Comment 2 Frank Ch. Eigler 2007-01-22 20:12:41 UTC
Yes, we've been looking for an approach to this problem for some time.

(In reply to comment #1)
> Created an attachment (id=1486)
> flight-recorder in kernel memory

I'm sorry, the attachment appears to be unreadable.
Could you reattach it in plain ASCII, or just post it
to the mailing list?
Comment 3 Masami Hiramatsu 2007-01-23 06:00:31 UTC
Subject: Re:  Flight Recorder on Memory

Hi,

fche at redhat dot com wrote:
> ------- Additional Comments From fche at redhat dot com  2007-01-22 20:12 -------
> Yes, we've been looking for an approach to this problem for some time.
> 
> (In reply to comment #1)
>> Created an attachment (id=1486)
>  --> (http://sourceware.org/bugzilla/attachment.cgi?id=1486&action=view)
>> flight-recorder in kernel memory
> 
> I'm sorry, the attachment appears to be unreadable.
> Could you reattach it in plain ASCII, or just post it
> to the mailing list?

OK, I attach it to this mail.

Thanks,

transport-prevent_double_init_relay.patch 
transport-support_flightrecorder.patch
staprun-add_launch_only.patch 
staprun-add_attach_module.patch
staprun-add_detach_module.patch
---
 runtime/stpd/librelay.c |   58 +++++++++++++++++++++++++-----------------------
 runtime/stpd/stpd.c     |   14 +++++++++--
 2 files changed, 43 insertions(+), 29 deletions(-)

 This patch adds -A option to the staprun for supporting "attach
 to the module" feature. If you run the staprun with -A option,
 it opens the control interface of the specified module, and sends
 STP_TRANSPORT_INFO and STP_START for getting transport information
 and starting data transporting.

Index: systemtap/runtime/stpd/librelay.c
===================================================================
--- systemtap.orig/runtime/stpd/librelay.c
+++ systemtap/runtime/stpd/librelay.c
@@ -50,6 +50,7 @@ static struct params
 	unsigned subbuf_size;
 	unsigned n_subbufs;
 	int merge;
+	int pid;
 	char relay_filebase[256];
 } params;
 
@@ -81,7 +82,7 @@ static pthread_t reader[NR_CPUS];
 int control_channel;
 
 /* flags */
-extern int print_only, quiet, verbose, launch_only;
+extern int print_only, quiet, verbose, launch_only, attach_mod;
 extern unsigned int buffer_size;
 extern char *modname;
 extern char *modpath;
@@ -394,11 +395,11 @@ int init_relayfs(void)
 	}
 
 	if (statfs("/mnt/relay", &st) == 0 && (int) st.f_type == (int) RELAYFS_MAGIC)
- 		sprintf(params.relay_filebase, "/mnt/relay/systemtap/%d/cpu", getpid());
+ 		sprintf(params.relay_filebase, "/mnt/relay/systemtap/%d/cpu", params.pid);
  	else if (statfs("/sys/kernel/debug", &st) == 0 && (int) st.f_type == (int) DEBUGFS_MAGIC)
- 		sprintf(params.relay_filebase, "/sys/kernel/debug/systemtap/%d/cpu", getpid());
+ 		sprintf(params.relay_filebase, "/sys/kernel/debug/systemtap/%d/cpu", params.pid);
  	else
- 		sprintf(params.relay_filebase, "/debug/systemtap/%d/cpu", getpid());
+ 		sprintf(params.relay_filebase, "/debug/systemtap/%d/cpu", params.pid);
 	
 	for (i = 0; i < ncpus; i++) {
 		if (open_relayfs_files(i, params.relay_filebase) < 0) {
@@ -494,29 +495,31 @@ int init_stp(int print_summary)
 	ncpus = sysconf(_SC_NPROCESSORS_ONLN);
 	print_totals = print_summary;
 
-	/* insert module */
-	sprintf(buf, "_stp_pid=%d", (int)getpid());
-        modoptions[0] = "insmod";
-        modoptions[1] = modpath;
-        modoptions[2] = buf;
-        /* modoptions[3...N] set by command line parser. */
-
-	if ((pid = fork()) < 0) {
-		perror ("fork");
-		exit(-1);
-	} else if (pid == 0) {
-		if (execvp("/sbin/insmod",  modoptions) < 0)
+	if (!attach_mod) {
+		/* insert module */
+		sprintf(buf, "_stp_pid=%d", (int)getpid());
+		modoptions[0] = "insmod";
+		modoptions[1] = modpath;
+		modoptions[2] = buf;
+		/* modoptions[3...N] set by command line parser. */
+
+		if ((pid = fork()) < 0) {
+			perror ("fork");
+			exit(-1);
+		} else if (pid == 0) {
+			if (execvp("/sbin/insmod",  modoptions) < 0)
 			_exit(-1);
+		}
+		if (waitpid(pid, &rstatus, 0) < 0) {
+			perror("waitpid");
+			exit(-1);
+		}
+		if (WIFEXITED(rstatus) && WEXITSTATUS(rstatus)) {
+			fprintf(stderr, "ERROR, couldn't insmod probe module %s\n", modpath);
+			return -1;
+		}
 	}
-	if (waitpid(pid, &rstatus, 0) < 0) {
-		perror("waitpid");
-		exit(-1);
-	}
-	if (WIFEXITED(rstatus) && WEXITSTATUS(rstatus)) {
-		fprintf(stderr, "ERROR, couldn't insmod probe module %s\n", modpath);
-		return -1;
-	}
-	
+
 	sprintf (proc_filebase, "/proc/systemtap/%s", modname);
 	char *ptr = index(proc_filebase,'.');
 	if (ptr)
@@ -550,7 +553,7 @@ int init_stp(int print_summary)
 
  do_rmmod:
 	snprintf(buf, sizeof(buf), "/sbin/rmmod -w %s", modname);
-	if (system(buf))
+	if (!attach_mod && system(buf))
 		fprintf(stderr, "ERROR: couldn't rmmod probe module %s.\n", modname);
 	return -1;
 }
@@ -749,7 +752,7 @@ int stp_main_loop(void)
 		type = *(int *)recvbuf;
 		data = (void *)(recvbuf + sizeof(int));
 
-		if (!transport_mode && type != STP_TRANSPORT_INFO && type != STP_EXIT) {
+		if (!transport_mode && type != STP_TRANSPORT_INFO && type != STP_EXIT && !attach_mod) {
 			fprintf(stderr, "WARNING: invalid stp command: no transport\n");
 			continue;
 		}
@@ -796,6 +799,7 @@ int stp_main_loop(void)
 			params.subbuf_size = info->subbuf_size;
 			params.n_subbufs = info->n_subbufs;
 			params.merge = info->merge;
+			params.pid = info->pid;
 			if (launch_only) goto out;
 #ifdef DEBUG
 			if (transport_mode == STP_TRANSPORT_RELAYFS) {
Index: systemtap/runtime/stpd/stpd.c
===================================================================
--- systemtap.orig/runtime/stpd/stpd.c
+++ systemtap/runtime/stpd/stpd.c
@@ -33,6 +33,7 @@ int verbose = 0;
 int target_pid = 0;
 int driver_pid = 0;
 int launch_only = 0;
+int attach_mod = 0;
 unsigned int buffer_size = 0;
 char *modname = NULL;
 char *modpath = NULL;
@@ -46,11 +47,12 @@ gid_t cmd_gid;
 
 static void usage(char *prog)
 {
-	fprintf(stderr, "\n%s [-m] [-p] [-q] [-r] [-L] [-c cmd ] [-t pid]\n"
+	fprintf(stderr, "\n%s [-m] [-p] [-q] [-r] [-L] [-A] [-c cmd ] [-t pid]\n"
                 "\t[-b bufsize] [-o FILE] kmod-name [kmod-options]\n", prog);
 	fprintf(stderr, "-p  Print only.  Don't log to files.\n");
 	fprintf(stderr, "-q  Quiet. Don't display trace to stdout.\n");
 	fprintf(stderr, "-L  Launch only. Just after launching the module, detach from it.\n");
+	fprintf(stderr, "-A  Attach to the launched module. \n");
 	fprintf(stderr, "-c cmd.  Command \'cmd\' will be run and staprun will exit when it does.\n");
 	fprintf(stderr, "   _stp_target will contain the pid for the command.\n");
 	fprintf(stderr, "-t pid.  Sets _stp_target to pid.\n");
@@ -68,7 +70,7 @@ int main(int argc, char **argv)
 {
 	int c;
 	
-	while ((c = getopt(argc, argv, "mpqrLb:n:t:d:c:vo:u:")) != EOF)
+	while ((c = getopt(argc, argv, "mpqrLAb:n:t:d:c:vo:u:")) != EOF)
 	{
 		switch (c) {
 		case 'm':
@@ -116,6 +118,9 @@ int main(int argc, char **argv)
 		case 'L':
 			launch_only = 1;
 			break;
+		case 'A':
+			attach_mod = 1;
+			break;
 		default:
 			usage(argv[0]);
 		}
@@ -152,6 +157,11 @@ int main(int argc, char **argv)
 		usage(argv[0]);
 	}
 
+	if (attach_mod && launch_only) {
+		fprintf (stderr, "Cannot do \"-A\" and \"-L\" both.\n");
+		usage(argv[0]);
+	}
+
 	if (print_only && quiet) {
 		fprintf (stderr, "Cannot do \"-p\" and \"-q\" both.\n");
 		usage(argv[0]);
---
 runtime/stpd/librelay.c       |   20 +++++++++++++++-----
 runtime/stpd/stpd.c           |   14 ++++++++++++--
 runtime/transport/transport.c |    2 ++
 3 files changed, 29 insertions(+), 7 deletions(-)

 This patch adds -D option to the staprun for supporting "Detach 
 from module" feature. This option can be used with -A (attach)
 option. If you run the staprun with both -D and -A options, it
 exits after reading all data in the data message queue.

Index: systemtap/runtime/stpd/librelay.c
===================================================================
--- systemtap.orig/runtime/stpd/librelay.c
+++ systemtap/runtime/stpd/librelay.c
@@ -32,7 +32,7 @@
 #include <linux/limits.h>
 #include <sys/wait.h>
 #include <sys/statfs.h>
-
+#include <limits.h>
 
 /* stp_check script */
 #ifdef PKGLIBDIR
@@ -71,6 +71,7 @@ static int transport_mode;
 static int ncpus;
 static int print_totals;
 static int exiting;
+static int nr_buffers = INT_MAX;
 
 /* per-cpu data */
 static int relay_file[NR_CPUS];
@@ -82,7 +83,7 @@ static pthread_t reader[NR_CPUS];
 int control_channel;
 
 /* flags */
-extern int print_only, quiet, verbose, launch_only, attach_mod;
+extern int print_only, quiet, verbose, launch_only, attach_mod, detach_soon;
 extern unsigned int buffer_size;
 extern char *modname;
 extern char *modpath;
@@ -332,7 +333,7 @@ static void *reader_thread(void *data)
 	pollfd.events = POLLIN;
 
 	do {
-		rc = poll(&pollfd, 1, -1);
+		rc = poll(&pollfd, 1, detach_soon ? 0 : -1);
 		if (rc < 0) {
 			if (errno != EINTR) {
 				fprintf(stderr, "ERROR: poll error: %s\n",
@@ -356,7 +357,7 @@ static void *reader_thread(void *data)
 			if (write (proc_file[cpu], &consumed_info, sizeof(struct _stp_consumed_info)) < 0)
 				fprintf(stderr,"WARNING: writing consumed info failed.\n");
 		}
-		if (status[cpu].info.flushing)
+		if (status[cpu].info.flushing || detach_soon)
 			pthread_exit(NULL);
 	} while (1);
 }
@@ -543,7 +544,7 @@ int init_stp(int print_summary)
 	ti.n_subbufs = 0;
 	ti.target = target_pid;
 	if (send_request(STP_TRANSPORT_INFO, &ti, sizeof(ti)) < 0) {
-		fprintf(stderr, "staprun failed because TRANSPORT_INFO returned an error.\n");
+		fprintf(stderr, "staprun failed because TRANSPORT_INFO returned an error. (%s)\n", strerror(errno));
 		if (target_cmd)
 			kill (target_pid, SIGKILL);
 		close(control_channel);
@@ -728,6 +729,7 @@ int stp_main_loop(void)
 	void *data;
 	int type;
 	FILE *ofp = stdout;
+	int count = 0;
 
 	setvbuf(ofp, (char *)NULL, _IOLBF, 0);
 
@@ -739,15 +741,22 @@ int stp_main_loop(void)
         if (driver_pid)
           driver_poll(0); // And by the way, I'm also the signal handler.
 
+	if (detach_soon)
+		fcntl(control_channel, F_SETFL, O_NONBLOCK|O_RDWR);
+
 	dbug("in main loop\n");
 
 	while (1) { /* handle messages from control channel */
 		nb = read(control_channel, recvbuf, sizeof(recvbuf));
 		if (nb <= 0) {
+			if (errno == EAGAIN && detach_soon)
+				cleanup_and_exit(1);
 			perror("recv");
 			fprintf(stderr, "WARNING: unexpected EOF. nb=%d\n", nb);
 			continue;
 		}
+		if (++count >= nr_buffers && detach_soon)
+			cleanup_and_exit(1);
 
 		type = *(int *)recvbuf;
 		data = (void *)(recvbuf + sizeof(int));
@@ -795,6 +804,7 @@ int stp_main_loop(void)
 		{
 			struct _stp_transport_info *info = (struct _stp_transport_info *)data;
 			struct _stp_transport_start ts;
+			nr_buffers = info->buf_size;
 			transport_mode = info->transport_mode;
 			params.subbuf_size = info->subbuf_size;
 			params.n_subbufs = info->n_subbufs;
Index: systemtap/runtime/stpd/stpd.c
===================================================================
--- systemtap.orig/runtime/stpd/stpd.c
+++ systemtap/runtime/stpd/stpd.c
@@ -34,6 +34,7 @@ int target_pid = 0;
 int driver_pid = 0;
 int launch_only = 0;
 int attach_mod = 0;
+int detach_soon = 0;
 unsigned int buffer_size = 0;
 char *modname = NULL;
 char *modpath = NULL;
@@ -47,12 +48,13 @@ gid_t cmd_gid;
 
 static void usage(char *prog)
 {
-	fprintf(stderr, "\n%s [-m] [-p] [-q] [-r] [-L] [-A] [-c cmd ] [-t pid]\n"
+	fprintf(stderr, "\n%s [-m] [-p] [-q] [-r] [-L] [-A] [-D] [-c cmd ] [-t pid]\n"
                 "\t[-b bufsize] [-o FILE] kmod-name [kmod-options]\n", prog);
 	fprintf(stderr, "-p  Print only.  Don't log to files.\n");
 	fprintf(stderr, "-q  Quiet. Don't display trace to stdout.\n");
 	fprintf(stderr, "-L  Launch only. Just after launching the module, detach from it.\n");
 	fprintf(stderr, "-A  Attach to the launched module. \n");
+	fprintf(stderr, "-D  Detach from the module immediately. Use with -A.\n");
 	fprintf(stderr, "-c cmd.  Command \'cmd\' will be run and staprun will exit when it does.\n");
 	fprintf(stderr, "   _stp_target will contain the pid for the command.\n");
 	fprintf(stderr, "-t pid.  Sets _stp_target to pid.\n");
@@ -70,7 +72,7 @@ int main(int argc, char **argv)
 {
 	int c;
 	
-	while ((c = getopt(argc, argv, "mpqrLAb:n:t:d:c:vo:u:")) != EOF)
+	while ((c = getopt(argc, argv, "mpqrLADb:n:t:d:c:vo:u:")) != EOF)
 	{
 		switch (c) {
 		case 'm':
@@ -121,6 +123,9 @@ int main(int argc, char **argv)
 		case 'A':
 			attach_mod = 1;
 			break;
+		case 'D':
+			detach_soon = 1;
+			break;
 		default:
 			usage(argv[0]);
 		}
@@ -167,6 +172,11 @@ int main(int argc, char **argv)
 		usage(argv[0]);
 	}
 
+	if (detach_soon && !attach_mod) {
+		fprintf (stderr, "Cannot do \"-D\" without \"-A\".\n");
+		usage(argv[0]);
+	}
+
 	if (username) {
 		struct passwd *pw = getpwnam(username);
 		if (!pw) {
Index: systemtap/runtime/transport/transport.c
===================================================================
--- systemtap.orig/runtime/transport/transport.c
+++ systemtap/runtime/transport/transport.c
@@ -314,6 +314,8 @@ int _stp_transport_open(struct _stp_tran
 			_stp_set_buffers(info->buf_size * 1024 * 1024 / STP_BUFFER_SIZE);
 	}
 
+	info->buf_size = _stp_current_buffers;
+
 	/* send reply */
 	return _stp_transport_send (STP_TRANSPORT_INFO, info, sizeof(*info));
 }
---
 runtime/stpd/librelay.c |    6 +++++-
 runtime/stpd/stpd.c     |    9 +++++++--
 2 files changed, 12 insertions(+), 3 deletions(-)

 This patch adds -L option to the staprun for supporting "launch
 module" feature. If you run the staprun with -L option, it loads
 a compiled module, initializes the transport, sends symbols and
 module information, and exit. 

Index: systemtap/runtime/stpd/librelay.c
===================================================================
--- systemtap.orig/runtime/stpd/librelay.c
+++ systemtap/runtime/stpd/librelay.c
@@ -81,7 +81,7 @@ static pthread_t reader[NR_CPUS];
 int control_channel;
 
 /* flags */
-extern int print_only, quiet, verbose;
+extern int print_only, quiet, verbose, launch_only;
 extern unsigned int buffer_size;
 extern char *modname;
 extern char *modpath;
@@ -778,6 +778,8 @@ int stp_main_loop(void)
 				cleanup_and_exit(0);
 			} else if (target_cmd)
 				kill (target_pid, SIGUSR1);
+			if (launch_only)
+				cleanup_and_exit(1);
 			break;
 		}
 		case STP_SYSTEM:
@@ -794,6 +796,7 @@ int stp_main_loop(void)
 			params.subbuf_size = info->subbuf_size;
 			params.n_subbufs = info->n_subbufs;
 			params.merge = info->merge;
+			if (launch_only) goto out;
 #ifdef DEBUG
 			if (transport_mode == STP_TRANSPORT_RELAYFS) {
 				fprintf(stderr,"TRANSPORT_INFO recvd: RELAYFS %d bufs of %d bytes.\n", 
@@ -819,6 +822,7 @@ int stp_main_loop(void)
 					cleanup_and_exit(0);
 				}
 			}
+ out:
 			ts.pid = getpid();
 			send_request(STP_START, &ts, sizeof(ts));
 			break;
Index: systemtap/runtime/stpd/stpd.c
===================================================================
--- systemtap.orig/runtime/stpd/stpd.c
+++ systemtap/runtime/stpd/stpd.c
@@ -32,6 +32,7 @@ int quiet = 0;
 int verbose = 0;
 int target_pid = 0;
 int driver_pid = 0;
+int launch_only = 0;
 unsigned int buffer_size = 0;
 char *modname = NULL;
 char *modpath = NULL;
@@ -45,10 +46,11 @@ gid_t cmd_gid;
 
 static void usage(char *prog)
 {
-	fprintf(stderr, "\n%s [-m] [-p] [-q] [-r] [-c cmd ] [-t pid]\n"
+	fprintf(stderr, "\n%s [-m] [-p] [-q] [-r] [-L] [-c cmd ] [-t pid]\n"
                 "\t[-b bufsize] [-o FILE] kmod-name [kmod-options]\n", prog);
 	fprintf(stderr, "-p  Print only.  Don't log to files.\n");
 	fprintf(stderr, "-q  Quiet. Don't display trace to stdout.\n");
+	fprintf(stderr, "-L  Launch only. Just after launching the module, detach from it.\n");
 	fprintf(stderr, "-c cmd.  Command \'cmd\' will be run and staprun will exit when it does.\n");
 	fprintf(stderr, "   _stp_target will contain the pid for the command.\n");
 	fprintf(stderr, "-t pid.  Sets _stp_target to pid.\n");
@@ -66,7 +68,7 @@ int main(int argc, char **argv)
 {
 	int c;
 	
-	while ((c = getopt(argc, argv, "mpqrb:n:t:d:c:vo:u:")) != EOF)
+	while ((c = getopt(argc, argv, "mpqrLb:n:t:d:c:vo:u:")) != EOF)
 	{
 		switch (c) {
 		case 'm':
@@ -111,6 +113,9 @@ int main(int argc, char **argv)
 		case 'u':
 			username = optarg;
 			break;
+		case 'L':
+			launch_only = 1;
+			break;
 		default:
 			usage(argv[0]);
 		}
---
 runtime/transport/transport.c      |   37 ++++++++++++++++++++++---------------
 runtime/transport/transport_msgs.h |    1 +
 2 files changed, 23 insertions(+), 15 deletions(-)

 When the staprun attaches to launched module, it will issue 
 STP_TRANSPORT_INFO and STP_START for getting transport information
 and starting data transporting.
 This patch avoids multiple initialization of the kernel module if
 the staprun sends STP_TRANSPORT_INFO and STP_START again.
 Also, this patch returns pid to tell the interface name of relayfs
 provided by this module.

Index: systemtap/runtime/transport/transport.c
===================================================================
--- systemtap.orig/runtime/transport/transport.c
+++ systemtap/runtime/transport/transport.c
@@ -127,18 +127,23 @@ void _stp_handle_start (struct _stp_tran
 		_stp_transport_send(STP_MODULE, &mod, sizeof(struct _stp_module));
 		return;
 	}
+#endif
 
-	if (register_module_notifier(&_stp_module_load_nb))
-		printk("Systemtap error: failed to load module notifier\n");
+	if (!atomic_read(&_stp_start_finished)) {
+#ifdef CONFIG_MODULES
+		if (register_module_notifier(&_stp_module_load_nb))
+			printk("Systemtap error: failed to load module notifier\n");
 #endif
 
-	/* note: st->pid is actually the return code for the reply packet */
-	st->pid = probe_start();
-	atomic_set(&_stp_start_finished,1);
+		/* note: st->pid is actually the return code for the reply packet */
+		st->pid = probe_start();
+		atomic_set(&_stp_start_finished,1);
 
-	/* if probe_start() failed, suppress calling probe_exit() */
-	if (st->pid < 0)
-		_stp_exit_called = 1;
+		/* if probe_start() failed, suppress calling probe_exit() */
+		if (st->pid < 0)
+			_stp_exit_called = 1;
+	} else
+		st->pid = 0;
 
 	_stp_transport_send(STP_START, st, sizeof(*st));
 }
@@ -279,7 +284,7 @@ int _stp_transport_open(struct _stp_tran
 
 #ifdef STP_RELAYFS
 	if (_stp_transport_mode == STP_TRANSPORT_RELAYFS) {
-		if (info->buf_size) {
+		if (info->buf_size && !_stp_chan) {
 			unsigned size = info->buf_size * 1024 * 1024;
 			subbuf_size = ((size >> 2) + 1) * 65536;
 			n_subbufs = size / subbuf_size;
@@ -290,13 +295,15 @@ int _stp_transport_open(struct _stp_tran
 #ifdef STP_RELAYFS_MERGE
 		info->merge = 1;
 #endif
+		if (!_stp_chan) {
+			_stp_chan = _stp_relayfs_open(n_subbufs, subbuf_size, _stp_pid, &_stp_dir);
 
-		_stp_chan = _stp_relayfs_open(n_subbufs, subbuf_size, _stp_pid, &_stp_dir);
-
-		if (!_stp_chan)
-			return -ENOMEM;
-		kbug ("stp_transport_open: %u Mb buffers, subbuf_size=%u, n_subbufs=%u\n",
-		      info->buf_size, subbuf_size, n_subbufs);
+			if (!_stp_chan)
+				return -ENOMEM;
+			kbug ("stp_transport_open: %u Mb buffers, subbuf_size=%u, n_subbufs=%u\n",
+			      info->buf_size, subbuf_size, n_subbufs);
+		}
+		info->pid = _stp_pid;
 	} else 
 #endif
 	{
Index: systemtap/runtime/transport/transport_msgs.h
===================================================================
--- systemtap.orig/runtime/transport/transport_msgs.h
+++ systemtap/runtime/transport/transport_msgs.h
@@ -43,6 +43,7 @@ struct _stp_transport_info
 	int32_t transport_mode;
 	int32_t merge;		// merge relayfs output?
 	int32_t target;		// target pid
+	int32_t pid;		// launcher pid
 #if 0
 	char cmd[256];		// cmd to process data
 #endif
---
 runtime/transport/procfs.c    |  101 +++++++++++++++++++++++++++++++++++-------
 runtime/transport/relayfs.c   |    2 
 runtime/transport/transport.c |    5 +-
 3 files changed, 90 insertions(+), 18 deletions(-)

 This patch introduces the following features.
 - use subbufs of relayfs cyclically.
 - record the owner process who attaches to this module.
   (If a process has already attached, other processes can't attach.)
 - separate the session message queue from the data message queue.
   (only STP_REALTIME_DATA and STP_SYSTEM are sent to the data message 
   queue)
 - the session queue is cleared when the owner process disconnects.

Index: systemtap/runtime/transport/relayfs.c
===================================================================
--- systemtap.orig/runtime/transport/relayfs.c
+++ systemtap/runtime/transport/relayfs.c
@@ -31,7 +31,7 @@ static int _stp_subbuf_start(struct rcha
 			     size_t prev_padding)
 {
 	if (relay_buf_full(buf))
-		return 0;
+		relay_subbufs_consumed(buf->chan, buf->cpu, 1);
 
 	if (prev_subbuf)
 		*((unsigned *)prev_subbuf) = prev_padding;
Index: systemtap/runtime/transport/procfs.c
===================================================================
--- systemtap.orig/runtime/transport/procfs.c
+++ systemtap/runtime/transport/procfs.c
@@ -13,10 +13,13 @@
 static int _stp_current_buffers = STP_DEFAULT_BUFFERS;
 
 static struct list_head _stp_ready_q;
+static struct list_head _stp_session_q;
 static struct list_head _stp_pool_q;
 spinlock_t _stp_pool_lock = SPIN_LOCK_UNLOCKED;
 spinlock_t _stp_ready_lock = SPIN_LOCK_UNLOCKED;
 
+static pid_t _stp_control_owner = 0;
+
 #ifdef STP_RELAYFS
 extern int _stp_relay_flushing;
 /* handle the per-cpu subbuf info read for relayfs */
@@ -139,7 +142,7 @@ struct _stp_buffer {
 
 static DECLARE_WAIT_QUEUE_HEAD(_stp_proc_wq);
 
-static int _stp_write (int type, void *data, int len)
+static int _stp_write_queue (int type, void *data, int len, struct list_head *queue)
 {
 	struct _stp_buffer *bptr;
 	unsigned long flags;
@@ -154,8 +157,8 @@ static int _stp_write (int type, void *d
 	if (unlikely (numtrylock >= MAXTRYLOCK))
 		return 0;
 
-	if (!list_empty(&_stp_ready_q)) {
-		bptr = (struct _stp_buffer *)_stp_ready_q.prev;
+	if (!list_empty(queue)) {
+		bptr = (struct _stp_buffer *)queue->prev;
 		if (bptr->len + len <= STP_BUFFER_SIZE 
 		    && type == STP_REALTIME_DATA 
 		    && bptr->type == STP_REALTIME_DATA) {
@@ -174,32 +177,55 @@ static int _stp_write (int type, void *d
 	if (unlikely (numtrylock >= MAXTRYLOCK))
 		return 0;
 
-	if (list_empty(&_stp_pool_q)) {
+	if (!list_empty(&_stp_pool_q)) {
+		/* get the next buffer from the pool */
+		bptr = (struct _stp_buffer *)_stp_pool_q.next;
+		list_del_init(&bptr->list);
 		spin_unlock_irqrestore(&_stp_pool_lock, flags);
-		return -1;
+	} else {
+		spin_unlock_irqrestore(&_stp_pool_lock, flags);
+		/* reclaim a message from the ready queue */
+		numtrylock = 0;
+		while (!spin_trylock_irqsave (&_stp_ready_lock, flags) && (++numtrylock < MAXTRYLOCK))
+			ndelay (TRYLOCKDELAY);
+		if (unlikely (numtrylock >= MAXTRYLOCK))
+			return 0;
+		if (!list_empty(&_stp_ready_q)) {
+			bptr = (struct _stp_buffer *)_stp_ready_q.next;
+			list_del_init(&bptr->list);
+			spin_unlock_irqrestore(&_stp_ready_lock, flags);
+		} else {
+			spin_unlock_irqrestore(&_stp_ready_lock, flags);
+			return -1;
+		}
 	}
 
-	/* get the next buffer from the pool */
-	bptr = (struct _stp_buffer *)_stp_pool_q.next;
-	list_del_init(&bptr->list);
-	spin_unlock_irqrestore(&_stp_pool_lock, flags);
-
 	bptr->type = type;
 	memcpy (bptr->buf, data, len);
 	bptr->len = len;
 	
-	/* put it on the pool of ready buffers */
+	/* put it on the specified pool of ready buffers */
 	numtrylock = 0;
 	while (!spin_trylock_irqsave (&_stp_ready_lock, flags) && (++numtrylock < MAXTRYLOCK)) 
 		ndelay (TRYLOCKDELAY);
 	if (unlikely (numtrylock >= MAXTRYLOCK))
 		return 0;
-	list_add_tail(&bptr->list, &_stp_ready_q);
+	list_add_tail(&bptr->list, queue);
 	spin_unlock_irqrestore(&_stp_ready_lock, flags);
 
 	return len;
 }
 
+static int _stp_write (int type, void *data, int len)
+{
+	return _stp_write_queue(type, data, len, &_stp_ready_q);
+}
+
+static int _stp_write_session (int type, void *data, int len)
+{
+	return _stp_write_queue(type, data, len, &_stp_session_q);
+}
+
 static ssize_t
 _stp_proc_read_cmd (struct file *file, char __user *buf, size_t count, loff_t *ppos)
 {
@@ -211,17 +237,21 @@ _stp_proc_read_cmd (struct file *file, c
 
 	/* wait for nonempty ready queue */
 	spin_lock_irqsave(&_stp_ready_lock, flags);
-	while (list_empty(&_stp_ready_q)) {
+	while (list_empty(&_stp_ready_q) && list_empty(&_stp_session_q)) {
 		spin_unlock_irqrestore(&_stp_ready_lock, flags);
 		if (file->f_flags & O_NONBLOCK)
 			return -EAGAIN;
-		if (wait_event_interruptible(_stp_proc_wq, !list_empty(&_stp_ready_q)))
+		if (wait_event_interruptible(_stp_proc_wq, (!list_empty(&_stp_ready_q) || !list_empty(&_stp_session_q))))
 			return -ERESTARTSYS;
 		spin_lock_irqsave(&_stp_ready_lock, flags);
 	}
   
-	/* get the next buffer off the ready list */
-	bptr = (struct _stp_buffer *)_stp_ready_q.next;
+	/* get the next buffer off the ready or the session list */
+	if (!list_empty(&_stp_session_q)) {
+		bptr = (struct _stp_buffer *)_stp_session_q.next;
+	} else {
+		bptr = (struct _stp_buffer *)_stp_ready_q.next;
+	}
 	list_del_init(&bptr->list);
 	spin_unlock_irqrestore(&_stp_ready_lock, flags);
 
@@ -243,11 +273,49 @@ _stp_proc_read_cmd (struct file *file, c
 	return len;
 }
 
+static int _stp_proc_open_cmd (struct inode * inode, struct file * file)
+{
+	if (_stp_control_owner != 0)
+		return -EBUSY;
+	_stp_control_owner = current->pid;
+	return 0;
+}
+
+int _stp_proc_close_cmd (struct inode * inode, struct file * file)
+{
+	unsigned long flags;
+	struct _stp_buffer *bptr;
+
+	/* clear messages in the session queue */
+	do {
+		spin_lock_irqsave(&_stp_ready_lock, flags);
+		if (list_empty(&_stp_session_q)) {
+			spin_unlock_irqrestore(&_stp_ready_lock, flags);
+			break;
+		}
+
+		/* get the next buffer off the ready list */
+		bptr = (struct _stp_buffer *)_stp_session_q.next;
+		list_del_init(&bptr->list);
+		spin_unlock_irqrestore(&_stp_ready_lock, flags);
+
+		/* put it on the pool of free buffers */
+		spin_lock_irqsave(&_stp_pool_lock, flags);
+		list_add_tail(&bptr->list, &_stp_pool_q);
+		spin_unlock_irqrestore(&_stp_pool_lock, flags);
+	} while (1);
+
+	_stp_control_owner = 0;
+
+	return 0;
+}
 
 static struct file_operations _stp_proc_fops_cmd = {
 	.owner = THIS_MODULE,
 	.read = _stp_proc_read_cmd,
 	.write = _stp_proc_write_cmd,
+	.open = _stp_proc_open_cmd,
+	.release = _stp_proc_close_cmd,
 //	.poll = _stp_proc_poll_cmd
 };
 
@@ -310,6 +378,7 @@ static int _stp_register_procfs (void)
 	struct list_head *p, *tmp;
 
 	INIT_LIST_HEAD(&_stp_ready_q);
+	INIT_LIST_HEAD(&_stp_session_q);
 	INIT_LIST_HEAD(&_stp_pool_q);
 
 	/* allocate buffers */
Index: systemtap/runtime/transport/transport.c
===================================================================
--- systemtap.orig/runtime/transport/transport.c
+++ systemtap/runtime/transport/transport.c
@@ -61,7 +61,10 @@ int _stp_transport_open(struct _stp_tran
 static int _stp_transport_send (int type, void *data, int len)
 {
 	int err, trylimit = 50;
-	while ((err = _stp_write(type, data, len)) < 0 && trylimit--)
+	struct list_head *q = &_stp_session_q;
+	if (type == STP_REALTIME_DATA || type == STP_SYSTEM)
+		q = &_stp_ready_q;
+	while ((err = _stp_write_queue(type, data, len, q)) < 0 && trylimit--)
 		msleep (5);
 	return err;
 }
Comment 4 Frank Ch. Eigler 2007-01-23 14:57:36 UTC
On a first glance, this patch looks good enough to start from.

It needs some automated test cases to stress it (with multiple
modules; probes that abort suddenly; probes that produce "too
much" I/O if possible; other scenarios that you may imagine).

I'm not sure, but it may be useful to add something like -A/-L/-D
to stap also, for passing through to staprun.

Martin, please assess and if suitable, oversee its inclusion.
Comment 5 Martin Hunt 2007-03-15 15:47:11 UTC
What is the purpose of _stp_write_session() and the session queue?
Comment 6 Martin Hunt 2007-03-15 20:27:51 UTC
Why would you want to detach immediately?  What do you think about having
staprun catch SIGQUIT (Ctrl-4 or Ctrl-\) and detach?

That would leave "-L" launch and exit, and "-A" attach.

Recent changes have communications channels using the pid of staprun. When
attaching, you would have to use the original pid.

> stap -L foo.ko
Loaded module foo.ko and detached.
Use "staprun -A 5213" to attach.

If you really prefer module names, I can make some changes so that works too.



Comment 7 Masami Hiramatsu 2007-03-19 10:45:51 UTC
(In reply to comment #5)
> What is the purpose of _stp_write_session() and the session queue?

This writes messages in the _stp_session_q queue which will be read before
normal message queue(_stp_ready_q) and be cleared when the owner process closes
the message channel (means "detach"). 

When staprun attaches to the module, there are many messages in the message
queue already. So, staprun will get an unexpected response when it send
STP_TRANSPORT_INFO to start the session. Thus, I introduced this new queue to
attach next process safely.
Comment 8 Masami Hiramatsu 2007-03-19 10:49:20 UTC
(In reply to comment #6)
> Why would you want to detach immediately?  What do you think about having
> staprun catch SIGQUIT (Ctrl-4 or Ctrl-\) and detach?

Would you mean use SIGQUIT instead of -D option?
If so, it is very useful for me. Actually, I just didn't know how to issue the
SIGQUIT signal.

> That would leave "-L" launch and exit, and "-A" attach.

Thanks.

> Recent changes have communications channels using the pid of staprun. When
> attaching, you would have to use the original pid.
> 
> > stap -L foo.ko
> Loaded module foo.ko and detached.
> Use "staprun -A 5213" to attach.
> 
> If you really prefer module names, I can make some changes so that works too.

I prefer to use module names, because we cannot know the relationship between
modules and pids after the modules were loaded.
Comment 9 Martin Hunt 2007-03-26 17:42:53 UTC
I've checked in some changes inspired by your patch.  Please try the latest CVS
code out and let me know if it does what you need and if you have further
suggestions.

Usage Sample:
1. compile a script.
 $ stap -p4 -k -m flightrec flightrec.stp (you can add '-b' if want)

2. launch the script.
 $ sudo staprun -L flightrec.ko
Disconnecting from systemtap module.
To reconnect, type "staprun -A flightrec"

3. attach
 $ sudo staprun -A flightrec

(HIT Ctl-C to exit (which removes module), or Ctl-\ (or Ctl-4) to detach again.
You can also send SIGQUIT to the process to detach)

You can always use SIGQUIT (Ctl-\ or Ctl-4).
$ stap foo.stp
...
(Ctl-\)
Disconnecting from systemtap module.
To reconnect, type "staprun -A stap_a14f7e0a5f2fc7144b6cc6d852d9f08e_6469"

ONE IMPORTANT NOTE!

When a new staprun attaches, it has no way of knowing how much the previous
staprun read from the current relayfs subbuffer.  So it will read from the
beginning.  So if you attach, detach, and attach again quickly enough you will
see the same data repeated again.


Comment 10 Masami Hiramatsu 2007-04-04 11:37:46 UTC
(In reply to comment #9)
> I've checked in some changes inspired by your patch.  Please try the latest CVS
> code out and let me know if it does what you need and if you have further
> suggestions.

Thank you! It works fine and is very good to me.
Comment 11 Masami Hiramatsu 2007-05-30 11:26:57 UTC
Hi Martin,

I found that your flight recorder mode seems to prohibit 
overwriting sub-buffers. In that case, we can't catch latest 
events in the kernel while we detach it from staprun.

I'd like to use this feature for custom kernel tracing script which
records events on memory until the kernel crashes. After that, I'll
dump the kernel image by the kdump, and extract events logs from
that image.

For this purpose, I need to record the latest events. Previous events 
are not so important. Thus, I need to overwrite the oldest sub-buffer
in the relay buffers.
Comment 12 Masami Hiramatsu 2007-05-30 14:46:39 UTC
(In reply to comment #11)
> I found that your flight recorder mode seems to prohibit 
> overwriting sub-buffers. In that case, we can't catch latest 
> events in the kernel while we detach it from staprun.

I think the global relay buffer hard to support this "overwrite" mode.
Thus, I suggest that only "bulk" mode modules overwrite sub-buffers and
staprun uses mmap() if the attached module runs in the "bulk" mode.

Or, if someone want to use read() in the "bulk" mode, I think we can
introduce new "bulk-overwritten" mode instead of changing "bulk" mode.

What would you think about these ideas?
Comment 13 Masami Hiramatsu 2007-06-05 15:40:10 UTC
Created attachment 1879 [details]
support overwrite mode and dump buffer option

Hi,

Here is a patch for support the overwrite mode and dumping buffers.

Since read() systemcall always starts reading from the fixed position,
we can not access whole contents of all sub-buffers.
To solve this issue, I introduced the dumping buffers option.
If you run staprun with "-D" option instead of "-A" option, it will 
attach to a module, dump all traced-data from relay-buffers, and 
detach from it. For this purpose, I used mmap() systemcall instead of
read().
Comment 14 Martin Hunt 2007-06-07 19:50:00 UTC
Thank you for the patch.  I have reviewed it and in general I like it, but I
have some questions.

What do you think about having a flag for stap and staprun that enables
overwrite mode instead of requiring the user to set the module param? Maybe this
would be a flag for "flight recorder mode" that might do more that just set
overwrite on.

I don't see the need for any of the changes to relay.c.  _stp_main_loop() is
already calling cleanup_and_exit() and that forces an immediate read of all 
data in the buffer(s) and exit. I tried it and there is a small bug with a hang
at cleanup, but it is easy to fix. Why do you need a special read with mmap()?

Stream (non-bulk) mode seems to work just fine with this too.

I am planning to make a final change to the transport so there is no difference
in the kernel between bulk and stream mode. staprun will take a  bulk-mode flag.
WIth bulkmode on, it will work as it does now.  With bulk mode off (stream mode)
it wil poll for data from relayfs percpu channels, put it in order, and display
it (or send to a single file) in realtime. 

Will this help you flight recorder work or make it more difficult? It will only
matter if you used stream mode for anything and depended on it having a single
global buffer.





Comment 15 Masami Hiramatsu 2007-06-08 13:29:09 UTC
(In reply to comment #14)
> Thank you for the patch.  I have reviewed it and in general I like it, but I
> have some questions.

Thank you.

> What do you think about having a flag for stap and staprun that enables
> overwrite mode instead of requiring the user to set the module param? Maybe this
> would be a flag for "flight recorder mode" that might do more that just set
> overwrite on.

Sure, that's a good idea.

> I don't see the need for any of the changes to relay.c.  _stp_main_loop() is
> already calling cleanup_and_exit() and that forces an immediate read of all 
> data in the buffer(s) and exit. I tried it and there is a small bug with a hang
> at cleanup, but it is easy to fix. Why do you need a special read with mmap()?

I think read() syscall against relay-channels has a problem when the staprun
attaches to a module again.
As far as I know, because the read() checks the writing position and we can not
read beyond that position, sometimes it drops almost all data in the sub-buffers
(but the data are still there).

You can see it if you directly read from debugfs entry and count number of lines
periodically.
# while true; do wc -l /sys/kernel/debug/systemtap/stap_*/trace0; sleep 2; done
At the beginning, the number of lines will be increasing. But after a long
moment, it will rewind to zero, and will be increasing again.

I just checked it on RHEL5, so I'll check it on the latest kernel again.

> Stream (non-bulk) mode seems to work just fine with this too.
> 
> I am planning to make a final change to the transport so there is no difference
> in the kernel between bulk and stream mode. staprun will take a  bulk-mode flag.
> WIth bulkmode on, it will work as it does now.  With bulk mode off (stream mode)
> it wil poll for data from relayfs percpu channels, put it in order, and display
> it (or send to a single file) in realtime. 
> 
> Will this help you flight recorder work or make it more difficult? It will only
> matter if you used stream mode for anything and depended on it having a single
> global buffer.

Sorry, I'm not sure. It seems a good idea. However, I'm concerned about read()
syscall, not the bulk/stream mode.

Thanks,
Comment 16 Martin Hunt 2007-06-08 15:27:41 UTC
Subject: Re:  Flight Recorder on Memory

On Fri, 2007-06-08 at 13:29 +0000, masami dot hiramatsu dot pt at
hitachi dot com wrote:
> 
> > I don't see the need for any of the changes to relay.c.  _stp_main_loop() is
> > already calling cleanup_and_exit() and that forces an immediate read of all 
> > data in the buffer(s) and exit. I tried it and there is a small bug with a hang
> > at cleanup, but it is easy to fix. Why do you need a special read with mmap()?
> 
> I think read() syscall against relay-channels has a problem when the staprun
> attaches to a module again.
> As far as I know, because the read() checks the writing position and we can not
> read beyond that position, sometimes it drops almost all data in the sub-buffers
> (but the data are still there).
> 
> You can see it if you directly read from debugfs entry and count number of lines
> periodically.
> # while true; do wc -l /sys/kernel/debug/systemtap/stap_*/trace0; sleep 2; done
> At the beginning, the number of lines will be increasing. But after a long
> moment, it will rewind to zero, and will be increasing again.

You are seeing correct behavior for read() with relayfs. When you read
from a relayfs channel, you get all the data in the current subbuffer.
If you read past the end of that subbuffer, it is marked as consumed,
and data is read from the next subbuf. When you close and open a channel
again, you start from the beginning of the current subbuf, but any
subbufs that have been consumed will not be seen.  They are waiting to
be filled by incoming data.

If you use mmap, you get to see all the buffer space, but must figure
out somehow what order the data is and where the padding is, etc.

I think the read() interface is simpler and would only be a problem if
You need "staprun -D" to read data from subbufs that have already been
read.



Comment 17 Masami Hiramatsu 2007-06-19 02:58:48 UTC
Created attachment 1896 [details]
add "-O" (overwrite mode) option to stap command

Add "-O"(overwrite mode) flag to stap instead of using a module param.
Comment 18 Masami Hiramatsu 2007-08-10 16:56:58 UTC
Created attachment 1958 [details]
Automatically using overwrite mode while the script runs in flight-recorder mode.

This patch fixes systemtap to overwrite its buffers on memory while it runs in
the flight-recorder mode.
With this patch, systemtap checks whether it connected to staprun and
automatically changes the relayfs mode.
Comment 19 Martin Hunt 2007-08-17 20:05:07 UTC
(In reply to comment #18)
> Created an attachment (id=1958)
> Automatically using overwrite mode while the script runs in flight-recorder
> mode.

I've checked this patch in.