tracking memory map changes


One of the requirements of doing user-space probing is being able to
follow user-space program memory map changes.  I've been poking around
and I thought I'd share what I found and the possibilities of where to
go from here.

I did all of the following on f8, looking at the 2.6.24.4 kernel source.
I took a peek at the source for 2.6.26, but didn't see any drastic
differences in this area.

Let me start by describing what a memory map is composed of.  A thread's
memory map is described in the kernel by a struct mm_struct.  Each
individual piece is described by a struct vm_area_struct.  Both
structures' definitions can be found in include/linux/mm_types.h.

The easiest way to look at a memory map is by looking at /proc/PID/maps.
On an x86 f8 system, I ran /bin/cat, and here's what you would see in
/proc/PID/maps for /bin/cat (note that I added the column headers):

vm_start-vm_end  flags vm_pgoff MJ:MN inode      path
----------------- ---- -------- ----- -------    -----------------
00110000-00111000 r-xp 00110000 00:00 0          [vdso]
00655000-00670000 r-xp 00000000 fd:00 3080583    /lib/ld-2.7.so
00670000-00671000 r-xp 0001a000 fd:00 3080583    /lib/ld-2.7.so
00671000-00672000 rwxp 0001b000 fd:00 3080583    /lib/ld-2.7.so
00674000-007c7000 r-xp 00000000 fd:00 3083040    /lib/libc-2.7.so
007c7000-007c9000 r-xp 00153000 fd:00 3083040    /lib/libc-2.7.so
007c9000-007ca000 rwxp 00155000 fd:00 3083040    /lib/libc-2.7.so
007ca000-007cd000 rwxp 007ca000 00:00 0
08048000-0804d000 r-xp 00000000 fd:00 2621473    /bin/cat
0804d000-0804e000 rw-p 00004000 fd:00 2621473    /bin/cat
08e7e000-08e9f000 rw-p 08e7e000 00:00 0
b7dbe000-b7fbe000 r--p 00000000 fd:00 2526405    /usr/lib/locale/locale-archive
b7fbe000-b7fc0000 rw-p b7fbe000 00:00 0
bfa14000-bfa29000 rw-p bffea000 00:00 0          [stack]

The vm_start, vm_end, flags, and vm_pgoff columns are fields straight
out of vm_area_struct.  The 'MJ:MN' header denotes the MAJOR and MINOR
numbers of the inode's device.  Both the inode and device come from
looking at vma->vm_file.  vm_pgoff is the offset within the associated
vm_file where this vm_area_struct starts (the structure stores it in
pages; /proc/PID/maps displays it as a byte offset).
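
To make the correspondence concrete, here's a minimal sketch of walking
a task's memory map and printing the same fields.  This assumes a
2.6.24-era kernel; dump_memory_map() is a hypothetical helper of mine,
not something in the tree:

#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/fs.h>
#include <linux/kdev_t.h>

static void dump_memory_map(struct task_struct *task)
{
        struct mm_struct *mm = get_task_mm(task);
        struct vm_area_struct *vma;

        if (!mm)
                return;         /* kernel thread - no user memory map */

        down_read(&mm->mmap_sem);
        for (vma = mm->mmap; vma; vma = vma->vm_next) {
                unsigned long ino = 0;
                dev_t dev = 0;

                if (vma->vm_file) {
                        struct inode *inode
                                = vma->vm_file->f_path.dentry->d_inode;
                        ino = inode->i_ino;
                        dev = inode->i_sb->s_dev;
                }
                /* Same layout as /proc/PID/maps; note vm_pgoff gets
                 * shifted from pages into bytes. */
                printk(KERN_INFO "%08lx-%08lx %c%c%c%c %08llx %02u:%02u %lu\n",
                       vma->vm_start, vma->vm_end,
                       (vma->vm_flags & VM_READ)  ? 'r' : '-',
                       (vma->vm_flags & VM_WRITE) ? 'w' : '-',
                       (vma->vm_flags & VM_EXEC)  ? 'x' : '-',
                       (vma->vm_flags & VM_MAYSHARE) ? 's' : 'p',
                       (unsigned long long)vma->vm_pgoff << PAGE_SHIFT,
                       MAJOR(dev), MINOR(dev), ino);
        }
        up_read(&mm->mmap_sem);
        mmput(mm);
}

get_task_mm()/mmput() keep the mm from disappearing underneath us, and
mmap_sem is held across the walk since the list can change at any time.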

At first I was confused by multiple vm_area_structs for /lib/ld-2.7.so,
/lib/libc-2.7.so and /bin/cat, until I realized they were for the .text,
.data, and .bss sections of those files.  According to the 'size'
command, /bin/cat has no .bss section, so it only has 2 vm_area_structs.

Note that there are no explicit flags set on a vm_area_struct for the
differences between sections - in other words, there is nothing that
definitively says that this particular vm_area_struct maps a .text
section vs. a .data section vs. a .bss section.  A guess could be made
that the first (vm_pgoff == 0), read-only, executable vm_area_struct
associated with a particular file is probably the .text section.  The
writable mappings would be .data and .bss; since .bss is
zero-initialized, it tends to show up as the anonymous mapping that
follows the file-backed ones.
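
That heuristic is easy to write down; a hypothetical helper (mine, not
the kernel's) along those lines:

#include <linux/mm.h>

/* Encode the guess above: a file-backed, read-only, executable
 * mapping starting at file offset 0 is probably the .text section. */
static int looks_like_text(struct vm_area_struct *vma)
{
        return vma->vm_file != NULL
                && vma->vm_pgoff == 0
                && (vma->vm_flags & VM_EXEC)
                && !(vma->vm_flags & VM_WRITE);
}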

Frank, here are some initial questions.

Q1: What information will the runtime need from each vm_area_struct?
I'd guess the path, vm_start, and vm_end at a minimum.

Q2: Will the runtime want to know only about new text sections being
added or all sections?

Q3: Will the runtime want to know about any of the vm_area_structs not
associated with a file?

When /bin/cat gets exec'ed, the /lib/ld-2.7.so and /bin/cat files are
already mapped in.  As /bin/cat runs, it loads in /lib/libc-2.7.so.
This means that we've got 2 related problems: enumerating the sections
when first attaching to a thread (either at exec time or by attaching
to an existing thread), then tracking memory map changes as they occur
(as in loading /lib/libc-2.7.so or a thread calling dlopen()).

Enumerating the existing vm_area_structs seems easy enough.  Tracking
new vm_area_structs as they get added is harder.  Finding the right
point and the right method is the problem.

The sys_mmap2() system call is a wrapper around
mm/mmap.c:do_mmap_pgoff().  do_mmap_pgoff() does lots of error checking,
then calls mm/mmap.c:mmap_region() to actually add a new vm_area_struct.
Toward the end of mmap_region(), vm_stat_account() is called (if
CONFIG_PROC_FS is on).
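
As an aside, hooking that spot with an ordinary kprobe is simple
enough.  Here's a minimal sketch (module and handler names are mine),
assuming vm_stat_account() isn't inlined; decoding its arguments from
pt_regs is arch-specific, so this just notes the hit:

#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/sched.h>

/* Fires whenever the kernel accounts vm stats for a mapping
 * (mmap_region() is one of the callers). */
static int vsa_pre(struct kprobe *p, struct pt_regs *regs)
{
        printk(KERN_INFO "vm_stat_account() hit in pid %d\n",
               current->pid);
        return 0;
}

static struct kprobe vsa_kp = {
        .symbol_name = "vm_stat_account",
        .pre_handler = vsa_pre,
};

static int __init vsa_init(void)
{
        return register_kprobe(&vsa_kp);
}

static void __exit vsa_exit(void)
{
        unregister_kprobe(&vsa_kp);
}

module_init(vsa_init);
module_exit(vsa_exit);
MODULE_LICENSE("GPL");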

So, where/how to track memory map changes?  Here are a few ideas:

1) Set a kretprobe on sys_mmap2()/do_mmap_pgoff()/mmap_region() (see
the sketch after this list).  One problem here is that kretprobes are
limited in quantity.

2) Set a kprobe on vm_stat_account(), as sketched above.  This would
require that the kernel be configured with CONFIG_PROC_FS and that
vm_stat_account() be called in the correct place in all the kernels
we're interested in.

3) Turn on utrace syscall return tracing for that thread and wait for
mmap calls to return.  This is probably the easiest route, but it forces
every syscall for that thread to go through the slow path.  A big
advantage here is that an all-utrace solution wouldn't require any
kernel debugging info to be present on the system.
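
To flesh out idea 1, here's a minimal kretprobe sketch on
do_mmap_pgoff() (assuming it isn't inlined; the names are mine).
regs_return_value() is the arch helper for reading the return register;
on kernels that lack it, you'd read the eax/ax member of pt_regs
directly:

#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/ptrace.h>
#include <linux/sched.h>
#include <linux/err.h>

/* do_mmap_pgoff() returns the start address of the new mapping,
 * or a negative errno encoded in the unsigned value. */
static int mmap_ret_handler(struct kretprobe_instance *ri,
                            struct pt_regs *regs)
{
        unsigned long addr = regs_return_value(regs);

        if (!IS_ERR_VALUE(addr))
                printk(KERN_INFO "new mapping at %08lx in pid %d\n",
                       addr, current->pid);
        return 0;
}

static struct kretprobe mmap_rp = {
        .handler        = mmap_ret_handler,
        .kp.symbol_name = "do_mmap_pgoff",
        /* This is the "limited in quantity" problem from idea 1:
         * only maxactive returns can be tracked at once. */
        .maxactive      = 20,
};

static int __init mmap_watch_init(void)
{
        return register_kretprobe(&mmap_rp);
}

static void __exit mmap_watch_exit(void)
{
        unregister_kretprobe(&mmap_rp);
}

module_init(mmap_watch_init);
module_exit(mmap_watch_exit);
MODULE_LICENSE("GPL");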

Does anyone have any better ideas or preferences here?

In all of the above methods the code won't know what was added, just
that a new vm_area_struct might exist, so I'll have to figure out a way
to track changes.
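
One crude way to track that (purely a sketch; a real version would need
per-task state and handling of unmaps, and since mmap_sem is a sleeping
lock it couldn't run from inside a kprobe handler itself - it would
have to be deferred, e.g. to a workqueue) is to remember the start
addresses already reported and re-walk the list after each probe hit:

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/sched.h>

#define MAX_TRACKED 1024

static unsigned long seen[MAX_TRACKED];
static int nseen;

static int already_seen(unsigned long start)
{
        int i;

        for (i = 0; i < nseen; i++)
                if (seen[i] == start)
                        return 1;
        return 0;
}

/* Call from process context after a probe hit; reports any
 * vm_area_struct whose start address we haven't seen before. */
static void report_new_vmas(struct mm_struct *mm)
{
        struct vm_area_struct *vma;

        down_read(&mm->mmap_sem);
        for (vma = mm->mmap; vma; vma = vma->vm_next) {
                if (!already_seen(vma->vm_start)
                    && nseen < MAX_TRACKED) {
                        seen[nseen++] = vma->vm_start;
                        printk(KERN_INFO "new vma: %08lx-%08lx\n",
                               vma->vm_start, vma->vm_end);
                }
        }
        up_read(&mm->mmap_sem);
}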

Finally, the shortest path to something somewhat useful would be to
first work on providing notification of existing vm_area_structs.  This
might help move user-space tracing along while I work on the harder
problem of tracking memory map changes.  Providing notification of
existing vm_area_structs might allow attaching to an existing thread
(one that already has all its shared libraries loaded and doesn't call
dlopen()) and figuring out the right address to probe.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

