]>
Commit | Line | Data |
---|---|---|
cc55c579 DD |
1 | |
2 | SSP - The Single Step Profiler | |
3 | ||
4 | Original Author: DJ Delorie <dj@redhat.com> | |
5 | ||
6 | The SSP is a program that uses the Win32 debug API to run a program | |
7 | one ASM instruction at a time. It records the location of each | |
8 | instruction used, how many times that instruction is used, and all | |
9 | function calls. The results are saved in a format that is usable by | |
10 | the profiling program "gprof", although gprof will claim the values | |
11 | are seconds, they really are instruction counts. More on that later. | |
12 | ||
13 | Because the SSP was originally designed to profile the cygwin DLL, it | |
14 | does not automatically select a block of code to report statistics on. | |
15 | You must specify the range of memory addresses to keep track of | |
16 | manually, but it's not hard to figure out what to specify. Use the | |
17 | "objdump" program to determine the bounds of the target's ".text" | |
18 | section. Let's say we're profiling cygwin1.dll. Make sure you've | |
19 | built it with debug symbols (else gprof won't run) and run objdump | |
20 | like this: | |
21 | ||
eedc36cb | 22 | objdump -h cygwin1.dll |
cc55c579 DD |
23 | |
24 | It will print a report like this: | |
25 | ||
26 | cygwin1.dll: file format pei-i386 | |
27 | ||
28 | Sections: | |
29 | Idx Name Size VMA LMA File off Algn | |
30 | 0 .text 0007ea00 61001000 61001000 00000400 2**2 | |
31 | CONTENTS, ALLOC, LOAD, READONLY, CODE, DATA | |
32 | 1 .data 00008000 61080000 61080000 0007ee00 2**2 | |
33 | CONTENTS, ALLOC, LOAD, DATA | |
34 | . . . | |
35 | ||
36 | The only information we're concerned with are the VMA of the .text | |
37 | section and the VMA of the section after it (sections are usually | |
38 | contiguous; you can also add the Size to the VMA to get the end | |
39 | address). In this case, the VMA is 0x61001000 and the ending address | |
40 | is either 0x61080000 (start of .data method) or 0x0x6107fa00 (VMA+Size | |
41 | method). | |
42 | ||
43 | There are two basic ways to use SSP - either profiling a whole | |
44 | program, or selectively profiling parts of the program. | |
45 | ||
46 | To profile a whole program, just run ssp without options. By default, | |
47 | it will step the whole program. Here's a simple example, using the | |
48 | numbers above: | |
49 | ||
50 | ssp 0x61001000 0x61080000 hello.exe | |
51 | ||
52 | This will step the whole program. It will take at least 8 minutes on | |
53 | a PII/300 (yes, really). When it's done, it will create a file called | |
54 | "gmon.out". You can turn this data file into a readable report with | |
55 | gprof: | |
56 | ||
57 | gprof -b cygwin1.dll | |
58 | ||
59 | The "-b" means "skip the help pages". You can omit this until you're | |
60 | familiar with the report layout. The gprof documentation explains | |
61 | a lot about this report, but ssp changes a few things. For example, | |
62 | the first part of the report reports the amount of time spent in each | |
63 | function, like this: | |
64 | ||
65 | Each sample counts as 0.01 seconds. | |
eedc36cb CF |
66 | % cumulative self self total |
67 | time seconds seconds calls ms/call ms/call name | |
cc55c579 DD |
68 | 10.02 231.22 72.43 46 1574.57 1574.57 strcspn |
69 | 7.95 288.70 57.48 130 442.15 442.15 strncasematch | |
70 | ||
71 | The "seconds" columns are really CPU opcodes, 1/100 second per opcode. | |
72 | So, "231.22" above means 23,122 opcodes. The ms/call values are 10x | |
73 | too big; 1574.57 means 157.457 opcodes per call. Similar adjustments | |
74 | need to be made for the "self" and "children" columns in the second | |
75 | part of the report. | |
76 | ||
77 | OK, so now we've got a huge report that took a long time to generate, | |
78 | and we've identified a spot we want to work on optimizing. Let's say | |
79 | it's the time() function. We can use SSP to selectively profile this | |
80 | function by using OutputDebugString() to control SSP from within the | |
81 | program. Here's a sample program: | |
82 | ||
83 | #include <windows.h> | |
84 | main() | |
85 | { | |
86 | time_t t; | |
87 | OutputDebugString("ssp on"); | |
88 | time(&t); | |
89 | OutputDebugString("ssp off"); | |
90 | } | |
91 | ||
92 | Then, add the "-d" option to ssp to default to *disabling* profiling. | |
93 | The program will run at full speed until the first OutputDebugString, | |
94 | then step until the second. | |
95 | ||
96 | ssp -d 0x61001000 0x61080000 hello.exe | |
97 | ||
98 | You can then use gprof (as usual) to see the performance profile for | |
99 | just that portion of the program's execution. | |
100 | ||
101 | OK, now for the other ssp options, and when to use them: | |
102 | ||
103 | "-v" - verbose. This prints messages about threads starting and | |
104 | stopping, OutputDebugString calls, DLLs loading, etc. | |
105 | ||
106 | "-t" and "-tc" - tracing. With -t, *every* step's address is written | |
107 | to the file "trace.ssp". This can be used to help debug functions, | |
108 | since it can trace multiple threads. Clever use of scripts can match | |
109 | addresses with disassembled opcodes if needed. Warning: creates | |
110 | *huge* files, very quickly. "-tc" prints each address to the console, | |
111 | useful for debugging key chunks of assembler. | |
112 | ||
113 | "-s" - subthreads. Usually, you only need to trace the main thread, | |
114 | but sometimes you need to trace all threads, so this enables that. | |
115 | It's also needed when you want to profile a function that only a | |
116 | subthread calls. However, using OutputDebugString automatically | |
117 | enables profiling on the thread that called it, not the main thread. | |
118 | ||
119 | "-dll" - dll profiling. Generates a pretty table of how much time was | |
120 | spent in each dll the program used. No sense optimizing a function in | |
121 | your program if most of the time is spent in the DLL. | |
122 | ||
123 | I usually use the -v, -s, and -dll options: | |
124 | ||
125 | ssp -v -s -dll -d 0x61001000 0x61080000 hello.exe |