diff options
author | Scott Rifenbark <scott.m.rifenbark@intel.com> | 2013-01-16 12:49:45 -0800 |
---|---|---|
committer | Richard Purdie <richard.purdie@linuxfoundation.org> | 2013-01-27 13:56:02 +0000 |
commit | 86478ff9486516ed428e470afbe76f3cd857888c (patch) | |
tree | b883e2138166f3a9daf986931779a8f4058169bf | |
parent | 487a76f7c7bfdb47ddec76659794834e84ed04eb (diff) | |
download | poky-86478ff9486516ed428e470afbe76f3cd857888c.tar.gz |
profile-manual: Updated indentation and sectioning
Had the sectioning off and it was preventing the manual from
making. I also added more indentation as needed throughout
most of the manual up to the error.
(From yocto-docs rev: 5de3544593a63e96b349babc177970d8605e0098)
Signed-off-by: Scott Rifenbark <scott.m.rifenbark@intel.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
-rw-r--r-- | documentation/profile-manual/profile-manual-usage.xml | 1257 |
1 files changed, 629 insertions, 628 deletions
diff --git a/documentation/profile-manual/profile-manual-usage.xml b/documentation/profile-manual/profile-manual-usage.xml index cfe916a795..51c483b700 100644 --- a/documentation/profile-manual/profile-manual-usage.xml +++ b/documentation/profile-manual/profile-manual-usage.xml | |||
@@ -108,28 +108,28 @@ | |||
108 | top System profiling tool. | 108 | top System profiling tool. |
109 | 109 | ||
110 | See 'perf help COMMAND' for more information on a specific command. | 110 | See 'perf help COMMAND' for more information on a specific command. |
111 | </literallayout> | 111 | </literallayout> |
112 | </para> | 112 | </para> |
113 | 113 | ||
114 | <section id='using-perf-to-do-basic-profiling'> | 114 | <section id='using-perf-to-do-basic-profiling'> |
115 | <title>Using perf to do Basic Profiling</title> | 115 | <title>Using perf to do Basic Profiling</title> |
116 | 116 | ||
117 | <para> | 117 | <para> |
118 | As a simple test case, we'll profile the 'wget' of a fairly large | 118 | As a simple test case, we'll profile the 'wget' of a fairly large |
119 | file, which is a minimally interesting case because it has both | 119 | file, which is a minimally interesting case because it has both |
120 | file and network I/O aspects, and at least in the case of standard | 120 | file and network I/O aspects, and at least in the case of standard |
121 | Yocto images, it's implemented as part of busybox, so the methods | 121 | Yocto images, it's implemented as part of busybox, so the methods |
122 | we use to analyze it can be used in a very similar way to the whole | 122 | we use to analyze it can be used in a very similar way to the whole |
123 | host of supported busybox applets in Yocto. | 123 | host of supported busybox applets in Yocto. |
124 | <literallayout class='monospaced'> | 124 | <literallayout class='monospaced'> |
125 | root@crownbay:~# rm linux-2.6.19.2.tar.bz2; \ | 125 | root@crownbay:~# rm linux-2.6.19.2.tar.bz2; \ |
126 | wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 126 | wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
127 | </literallayout> | 127 | </literallayout> |
128 | The quickest and easiest way to get some basic overall data about | 128 | The quickest and easiest way to get some basic overall data about |
129 | what's going on for a particular workload it to profile it using | 129 | what's going on for a particular workload it to profile it using |
130 | 'perf stat'. 'perf stat' basically profiles using a few default | 130 | 'perf stat'. 'perf stat' basically profiles using a few default |
131 | counters and displays the summed counts at the end of the run: | 131 | counters and displays the summed counts at the end of the run: |
132 | <literallayout class='monospaced'> | 132 | <literallayout class='monospaced'> |
133 | root@crownbay:~# perf stat wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 133 | root@crownbay:~# perf stat wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
134 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) | 134 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) |
135 | linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA | 135 | linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA |
@@ -148,19 +148,19 @@ | |||
148 | 19550329 branch-misses # 11.82% of all branches | 148 | 19550329 branch-misses # 11.82% of all branches |
149 | 149 | ||
150 | 59.836627620 seconds time elapsed | 150 | 59.836627620 seconds time elapsed |
151 | </literallayout> | 151 | </literallayout> |
152 | Many times such a simple-minded test doesn't yield much of | 152 | Many times such a simple-minded test doesn't yield much of |
153 | interest, but sometimes it does (see Real-world Yocto bug | 153 | interest, but sometimes it does (see Real-world Yocto bug |
154 | (slow loop-mounted write speed)). | 154 | (slow loop-mounted write speed)). |
155 | </para> | 155 | </para> |
156 | 156 | ||
157 | <para> | 157 | <para> |
158 | Also, note that 'perf stat' isn't restricted to a fixed set of | 158 | Also, note that 'perf stat' isn't restricted to a fixed set of |
159 | counters - basically any event listed in the output of 'perf list' | 159 | counters - basically any event listed in the output of 'perf list' |
160 | can be tallied by 'perf stat'. For example, suppose we wanted to | 160 | can be tallied by 'perf stat'. For example, suppose we wanted to |
161 | see a summary of all the events related to kernel memory | 161 | see a summary of all the events related to kernel memory |
162 | allocation/freeing along with cache hits and misses: | 162 | allocation/freeing along with cache hits and misses: |
163 | <literallayout class='monospaced'> | 163 | <literallayout class='monospaced'> |
164 | root@crownbay:~# perf stat -e kmem:* -e cache-references -e cache-misses wget http:// downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 164 | root@crownbay:~# perf stat -e kmem:* -e cache-references -e cache-misses wget http:// downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
165 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) | 165 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) |
166 | linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA | 166 | linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA |
@@ -183,28 +183,28 @@ | |||
183 | 2917740 cache-misses # 4.365 % of all cache refs | 183 | 2917740 cache-misses # 4.365 % of all cache refs |
184 | 184 | ||
185 | 44.831023415 seconds time elapsed | 185 | 44.831023415 seconds time elapsed |
186 | </literallayout> | 186 | </literallayout> |
187 | So 'perf stat' gives us a nice easy way to get a quick overview of | 187 | So 'perf stat' gives us a nice easy way to get a quick overview of |
188 | what might be happening for a set of events, but normally we'd | 188 | what might be happening for a set of events, but normally we'd |
189 | need a little more detail in order to understand what's going on | 189 | need a little more detail in order to understand what's going on |
190 | in a way that we can act on in a useful way. | 190 | in a way that we can act on in a useful way. |
191 | </para> | 191 | </para> |
192 | 192 | ||
193 | <para> | 193 | <para> |
194 | To dive down into a next level of detail, we can use 'perf | 194 | To dive down into a next level of detail, we can use 'perf |
195 | record'/'perf report' which will collect profiling data and | 195 | record'/'perf report' which will collect profiling data and |
196 | present it to use using an interactive text-based UI (or | 196 | present it to use using an interactive text-based UI (or |
197 | simply as text if we specify --stdio to 'perf report'). | 197 | simply as text if we specify --stdio to 'perf report'). |
198 | </para> | 198 | </para> |
199 | 199 | ||
200 | <para> | 200 | <para> |
201 | As our first attempt at profiling this workload, we'll simply | 201 | As our first attempt at profiling this workload, we'll simply |
202 | run 'perf record', handing it the workload we want to profile | 202 | run 'perf record', handing it the workload we want to profile |
203 | (everything after 'perf record' and any perf options we hand | 203 | (everything after 'perf record' and any perf options we hand |
204 | it - here none - will be executedin a new shell). perf collects | 204 | it - here none - will be executedin a new shell). perf collects |
205 | samples until the process exits and records them in a file named | 205 | samples until the process exits and records them in a file named |
206 | 'perf.data' in the current working directory. | 206 | 'perf.data' in the current working directory. |
207 | <literallayout class='monospaced'> | 207 | <literallayout class='monospaced'> |
208 | root@crownbay:~# perf record wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 208 | root@crownbay:~# perf record wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
209 | 209 | ||
210 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) | 210 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) |
@@ -215,44 +215,44 @@ | |||
215 | To see the results in a 'text-based UI' (tui), simply run | 215 | To see the results in a 'text-based UI' (tui), simply run |
216 | 'perf report', which will read the perf.data file in the current | 216 | 'perf report', which will read the perf.data file in the current |
217 | working directory and display the results in an interactive UI: | 217 | working directory and display the results in an interactive UI: |
218 | <literallayout class='monospaced'> | 218 | <literallayout class='monospaced'> |
219 | root@crownbay:~# perf report | 219 | root@crownbay:~# perf report |
220 | </literallayout> | 220 | </literallayout> |
221 | </para> | 221 | </para> |
222 | 222 | ||
223 | <para> | 223 | <para> |
224 | <imagedata fileref="figures/perf-wget-flat-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> | 224 | <imagedata fileref="figures/perf-wget-flat-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> |
225 | </para> | 225 | </para> |
226 | 226 | ||
227 | <para> | 227 | <para> |
228 | The above screenshot displays a 'flat' profile, one entry for | 228 | The above screenshot displays a 'flat' profile, one entry for |
229 | each 'bucket' corresponding to the functions that were profiled | 229 | each 'bucket' corresponding to the functions that were profiled |
230 | during the profiling run, ordered from the most popular to the | 230 | during the profiling run, ordered from the most popular to the |
231 | least (perf has options to sort in various orders and keys as | 231 | least (perf has options to sort in various orders and keys as |
232 | well as display entries only above a certain threshold and so | 232 | well as display entries only above a certain threshold and so |
233 | on - see the perf documentation for details). Note that this | 233 | on - see the perf documentation for details). Note that this |
234 | includes both userspace functions (entries containing a [.]) and | 234 | includes both userspace functions (entries containing a [.]) and |
235 | kernel functions accounted to the process (entries containing | 235 | kernel functions accounted to the process (entries containing |
236 | a [k]). (perf has command-line modifiers that can be used to | 236 | a [k]). (perf has command-line modifiers that can be used to |
237 | restrict the profiling to kernel or userspace, among others). | 237 | restrict the profiling to kernel or userspace, among others). |
238 | </para> | 238 | </para> |
239 | 239 | ||
240 | <para> | 240 | <para> |
241 | Notice also that the above report shows an entry for 'busybox', | 241 | Notice also that the above report shows an entry for 'busybox', |
242 | which is the executable that implements 'wget' in Yocto, but that | 242 | which is the executable that implements 'wget' in Yocto, but that |
243 | instead of a useful function name in that entry, it displays | 243 | instead of a useful function name in that entry, it displays |
244 | an not-so-friendly hex value instead. The steps below will show | 244 | an not-so-friendly hex value instead. The steps below will show |
245 | how to fix that problem. | 245 | how to fix that problem. |
246 | </para> | 246 | </para> |
247 | 247 | ||
248 | <para> | 248 | <para> |
249 | Before we do that, however, let's try running a different profile, | 249 | Before we do that, however, let's try running a different profile, |
250 | one which shows something a little more interesting. The only | 250 | one which shows something a little more interesting. The only |
251 | difference between the new profile and the previous one is that | 251 | difference between the new profile and the previous one is that |
252 | we'll add the -g option, which will record not just the address | 252 | we'll add the -g option, which will record not just the address |
253 | of a sampled function, but the entire callchain to the sampled | 253 | of a sampled function, but the entire callchain to the sampled |
254 | function as well: | 254 | function as well: |
255 | <literallayout class='monospaced'> | 255 | <literallayout class='monospaced'> |
256 | root@crownbay:~# perf record -g wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 256 | root@crownbay:~# perf record -g wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
257 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) | 257 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) |
258 | linux-2.6.19.2.tar.b 100% |************************************************| 41727k 0:00:00 ETA | 258 | linux-2.6.19.2.tar.b 100% |************************************************| 41727k 0:00:00 ETA |
@@ -261,217 +261,217 @@ | |||
261 | 261 | ||
262 | 262 | ||
263 | root@crownbay:~# perf report | 263 | root@crownbay:~# perf report |
264 | </literallayout> | 264 | </literallayout> |
265 | </para> | 265 | </para> |
266 | 266 | ||
267 | <para> | 267 | <para> |
268 | <imagedata fileref="figures/perf-wget-g-copy-to-user-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> | 268 | <imagedata fileref="figures/perf-wget-g-copy-to-user-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> |
269 | </para> | 269 | </para> |
270 | 270 | ||
271 | <para> | 271 | <para> |
272 | Using the callgraph view, we can actually see not only which | 272 | Using the callgraph view, we can actually see not only which |
273 | functions took the most time, but we can also see a summary of | 273 | functions took the most time, but we can also see a summary of |
274 | how those functions were called and learn something about how the | 274 | how those functions were called and learn something about how the |
275 | program interacts with the kernel in the process. | 275 | program interacts with the kernel in the process. |
276 | </para> | 276 | </para> |
277 | 277 | ||
278 | <para> | 278 | <para> |
279 | Notice that each entry in the above screenshot now contains a '+' | 279 | Notice that each entry in the above screenshot now contains a '+' |
280 | on the left-hand side. This means that we can expand the entry and | 280 | on the left-hand side. This means that we can expand the entry and |
281 | drill down into the callchains that feed into that entry. | 281 | drill down into the callchains that feed into that entry. |
282 | Pressing 'enter' on any one of them will expand the callchain | 282 | Pressing 'enter' on any one of them will expand the callchain |
283 | (you can also press 'E' to expand them all at the same time or 'C' | 283 | (you can also press 'E' to expand them all at the same time or 'C' |
284 | to collapse them all). | 284 | to collapse them all). |
285 | </para> | 285 | </para> |
286 | 286 | ||
287 | <para> | 287 | <para> |
288 | In the screenshot above, we've toggled the __copy_to_user_ll() | 288 | In the screenshot above, we've toggled the __copy_to_user_ll() |
289 | entry and several subnodes all the way down. This lets us see | 289 | entry and several subnodes all the way down. This lets us see |
290 | which callchains contributed to the profiled __copy_to_user_ll() | 290 | which callchains contributed to the profiled __copy_to_user_ll() |
291 | function which contributed 1.77% to the total profile. | 291 | function which contributed 1.77% to the total profile. |
292 | </para> | 292 | </para> |
293 | 293 | ||
294 | <para> | 294 | <para> |
295 | As a bit of background explanation for these callchains, think | 295 | As a bit of background explanation for these callchains, think |
296 | about what happens at a high level when you run wget to get a file | 296 | about what happens at a high level when you run wget to get a file |
297 | out on the network. Basically what happens is that the data comes | 297 | out on the network. Basically what happens is that the data comes |
298 | into the kernel via the network connection (socket) and is passed | 298 | into the kernel via the network connection (socket) and is passed |
299 | to the userspace program 'wget' (which is actually a part of | 299 | to the userspace program 'wget' (which is actually a part of |
300 | busybox, but that's not important for now), which takes the buffers | 300 | busybox, but that's not important for now), which takes the buffers |
301 | the kernel passes to it and writes it to a disk file to save it. | 301 | the kernel passes to it and writes it to a disk file to save it. |
302 | </para> | 302 | </para> |
303 | 303 | ||
304 | <para> | 304 | <para> |
305 | The part of this process that we're looking at in the above call | 305 | The part of this process that we're looking at in the above call |
306 | stacks is the part where the kernel passes the data it's read from | 306 | stacks is the part where the kernel passes the data it's read from |
307 | the socket down to wget i.e. a copy-to-user. | 307 | the socket down to wget i.e. a copy-to-user. |
308 | </para> | 308 | </para> |
309 | 309 | ||
310 | <para> | 310 | <para> |
311 | Notice also that here there's also a case where the a hex value | 311 | Notice also that here there's also a case where the a hex value |
312 | is displayed in the callstack, here in the expanded | 312 | is displayed in the callstack, here in the expanded |
313 | sys_clock_gettime() function. Later we'll see it resolve to a | 313 | sys_clock_gettime() function. Later we'll see it resolve to a |
314 | userspace function call in busybox. | 314 | userspace function call in busybox. |
315 | </para> | 315 | </para> |
316 | 316 | ||
317 | <para> | 317 | <para> |
318 | <imagedata fileref="figures/perf-wget-g-copy-from-user-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> | 318 | <imagedata fileref="figures/perf-wget-g-copy-from-user-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> |
319 | </para> | 319 | </para> |
320 | 320 | ||
321 | <para> | 321 | <para> |
322 | The above screenshot shows the other half of the journey for the | 322 | The above screenshot shows the other half of the journey for the |
323 | data - from the wget program's userspace buffers to disk. To get | 323 | data - from the wget program's userspace buffers to disk. To get |
324 | the buffers to disk, the wget program issues a write(2), which | 324 | the buffers to disk, the wget program issues a write(2), which |
325 | does a copy-from-user to the kernel, which then takes care via | 325 | does a copy-from-user to the kernel, which then takes care via |
326 | some circuitous path (probably also present somewhere in the | 326 | some circuitous path (probably also present somewhere in the |
327 | profile data), to get it safely to disk. | 327 | profile data), to get it safely to disk. |
328 | </para> | 328 | </para> |
329 | 329 | ||
330 | <para> | 330 | <para> |
331 | Now that we've seen the basic layout of the profile data and the | 331 | Now that we've seen the basic layout of the profile data and the |
332 | basics of how to extract useful information out of it, let's get | 332 | basics of how to extract useful information out of it, let's get |
333 | back to the task at hand and see if we can get some basic idea | 333 | back to the task at hand and see if we can get some basic idea |
334 | about where the time is spent in the program we're profiling, | 334 | about where the time is spent in the program we're profiling, |
335 | wget. Remember that wget is actually implemented as an applet | 335 | wget. Remember that wget is actually implemented as an applet |
336 | in busybox, so while the process name is 'wget', the executable | 336 | in busybox, so while the process name is 'wget', the executable |
337 | we're actually interested in is busybox. So let's expand the | 337 | we're actually interested in is busybox. So let's expand the |
338 | first entry containing busybox: | 338 | first entry containing busybox: |
339 | </para> | 339 | </para> |
340 | 340 | ||
341 | <para> | 341 | <para> |
342 | <imagedata fileref="figures/perf-wget-busybox-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> | 342 | <imagedata fileref="figures/perf-wget-busybox-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> |
343 | </para> | 343 | </para> |
344 | 344 | ||
345 | <para> | 345 | <para> |
346 | Again, before we expanded we saw that the function was labeled | 346 | Again, before we expanded we saw that the function was labeled |
347 | with a hex value instead of a symbol as with most of the kernel | 347 | with a hex value instead of a symbol as with most of the kernel |
348 | entries. Expanding the busybox entry doesn't make it any better. | 348 | entries. Expanding the busybox entry doesn't make it any better. |
349 | </para> | 349 | </para> |
350 | 350 | ||
351 | <para> | 351 | <para> |
352 | The problem is that perf can't find the symbol information for the | 352 | The problem is that perf can't find the symbol information for the |
353 | busybox binary, which is actually stripped out by the Yocto build | 353 | busybox binary, which is actually stripped out by the Yocto build |
354 | system. | 354 | system. |
355 | </para> | 355 | </para> |
356 | 356 | ||
357 | <para> | 357 | <para> |
358 | One way around that is to put the following in your local.conf | 358 | One way around that is to put the following in your local.conf |
359 | when you build the image: | 359 | when you build the image: |
360 | <literallayout class='monospaced'> | 360 | <literallayout class='monospaced'> |
361 | INHIBIT_PACKAGE_STRIP = "1" | 361 | INHIBIT_PACKAGE_STRIP = "1" |
362 | </literallayout> | 362 | </literallayout> |
363 | However, we already have an image with the binaries stripped, | 363 | However, we already have an image with the binaries stripped, |
364 | so what can we do to get perf to resolve the symbols? Basically | 364 | so what can we do to get perf to resolve the symbols? Basically |
365 | we need to install the debuginfo for the busybox package. | 365 | we need to install the debuginfo for the busybox package. |
366 | </para> | 366 | </para> |
367 | 367 | ||
368 | <para> | 368 | <para> |
369 | To generate the debug info for the packages in the image, we can | 369 | To generate the debug info for the packages in the image, we can |
370 | to add dbg-pkgs to EXTRA_IMAGE_FEATURES in local.conf. For example: | 370 | to add dbg-pkgs to EXTRA_IMAGE_FEATURES in local.conf. For example: |
371 | <literallayout class='monospaced'> | 371 | <literallayout class='monospaced'> |
372 | EXTRA_IMAGE_FEATURES = "debug-tweaks tools-profile dbg-pkgs" | 372 | EXTRA_IMAGE_FEATURES = "debug-tweaks tools-profile dbg-pkgs" |
373 | </literallayout> | 373 | </literallayout> |
374 | Additionally, in order to generate the type of debuginfo that | 374 | Additionally, in order to generate the type of debuginfo that |
375 | perf understands, we also need to add the following to local.conf: | 375 | perf understands, we also need to add the following to local.conf: |
376 | <literallayout class='monospaced'> | 376 | <literallayout class='monospaced'> |
377 | PACKAGE_DEBUG_SPLIT_STYLE = 'debug-file-directory' | 377 | PACKAGE_DEBUG_SPLIT_STYLE = 'debug-file-directory' |
378 | </literallayout> | 378 | </literallayout> |
379 | Once we've done that, we can install the debuginfo for busybox. | 379 | Once we've done that, we can install the debuginfo for busybox. |
380 | The debug packages once built can be found in | 380 | The debug packages once built can be found in |
381 | build/tmp/deploy/rpm/* on the host system. Find the | 381 | build/tmp/deploy/rpm/* on the host system. Find the |
382 | busybox-dbg-...rpm file and copy it to the target. For example: | 382 | busybox-dbg-...rpm file and copy it to the target. For example: |
383 | <literallayout class='monospaced'> | 383 | <literallayout class='monospaced'> |
384 | [trz@empanada core2]$ scp /home/trz/yocto/crownbay-tracing-dbg/build/tmp/deploy/rpm/core2/busybox-dbg-1.20.2-r2.core2.rpm root@192.168.1.31: | 384 | [trz@empanada core2]$ scp /home/trz/yocto/crownbay-tracing-dbg/build/tmp/deploy/rpm/core2/busybox-dbg-1.20.2-r2.core2.rpm root@192.168.1.31: |
385 | root@192.168.1.31's password: | 385 | root@192.168.1.31's password: |
386 | busybox-dbg-1.20.2-r2.core2.rpm 100% 1826KB 1.8MB/s 00:01 | 386 | busybox-dbg-1.20.2-r2.core2.rpm 100% 1826KB 1.8MB/s 00:01 |
387 | </literallayout> | 387 | </literallayout> |
388 | Now install the debug rpm on the target: | 388 | Now install the debug rpm on the target: |
389 | <literallayout class='monospaced'> | 389 | <literallayout class='monospaced'> |
390 | root@crownbay:~# rpm -i busybox-dbg-1.20.2-r2.core2.rpm | 390 | root@crownbay:~# rpm -i busybox-dbg-1.20.2-r2.core2.rpm |
391 | </literallayout> | 391 | </literallayout> |
392 | Now that the debuginfo is installed, we see that the busybox | 392 | Now that the debuginfo is installed, we see that the busybox |
393 | entries now display their functions symbolically: | 393 | entries now display their functions symbolically: |
394 | </para> | 394 | </para> |
395 | 395 | ||
396 | <para> | 396 | <para> |
397 | <imagedata fileref="figures/perf-wget-busybox-debuginfo.png" width="6in" depth="7in" align="center" scalefit="1" /> | 397 | <imagedata fileref="figures/perf-wget-busybox-debuginfo.png" width="6in" depth="7in" align="center" scalefit="1" /> |
398 | </para> | 398 | </para> |
399 | 399 | ||
400 | <para> | 400 | <para> |
401 | If we expand one of the entries and press 'enter' on a leaf node, | 401 | If we expand one of the entries and press 'enter' on a leaf node, |
402 | we're presented with a menu of actions we can take to get more | 402 | we're presented with a menu of actions we can take to get more |
403 | information related to that entry: | 403 | information related to that entry: |
404 | </para> | 404 | </para> |
405 | 405 | ||
406 | <para> | 406 | <para> |
407 | <imagedata fileref="figures/perf-wget-busybox-dso-zoom-menu.png" width="6in" depth="7in" align="center" scalefit="1" /> | 407 | <imagedata fileref="figures/perf-wget-busybox-dso-zoom-menu.png" width="6in" depth="7in" align="center" scalefit="1" /> |
408 | </para> | 408 | </para> |
409 | 409 | ||
410 | <para> | 410 | <para> |
411 | One of these actions allows us to show a view that displays a | 411 | One of these actions allows us to show a view that displays a |
412 | busybox-centric view of the profiled functions (in this case we've | 412 | busybox-centric view of the profiled functions (in this case we've |
413 | also expanded all the nodes using the 'E' key): | 413 | also expanded all the nodes using the 'E' key): |
414 | </para> | 414 | </para> |
415 | 415 | ||
416 | <para> | 416 | <para> |
417 | <imagedata fileref="figures/perf-wget-busybox-dso-zoom.png" width="6in" depth="7in" align="center" scalefit="1" /> | 417 | <imagedata fileref="figures/perf-wget-busybox-dso-zoom.png" width="6in" depth="7in" align="center" scalefit="1" /> |
418 | </para> | 418 | </para> |
419 | 419 | ||
420 | <para> | 420 | <para> |
421 | Finally, we can see that now that the busybox debuginfo is | 421 | Finally, we can see that now that the busybox debuginfo is |
422 | installed, the previously unresolved symbol in the | 422 | installed, the previously unresolved symbol in the |
423 | sys_clock_gettime() entry mentioned previously is now resolved, | 423 | sys_clock_gettime() entry mentioned previously is now resolved, |
424 | and shows that the sys_clock_gettime system call that was the | 424 | and shows that the sys_clock_gettime system call that was the |
425 | source of 6.75% of the copy-to-user overhead was initiated by | 425 | source of 6.75% of the copy-to-user overhead was initiated by |
426 | the handle_input() busybox function: | 426 | the handle_input() busybox function: |
427 | </para> | 427 | </para> |
428 | 428 | ||
429 | <para> | 429 | <para> |
430 | <imagedata fileref="figures/perf-wget-g-copy-to-user-expanded-debuginfo.png" width="6in" depth="7in" align="center" scalefit="1" /> | 430 | <imagedata fileref="figures/perf-wget-g-copy-to-user-expanded-debuginfo.png" width="6in" depth="7in" align="center" scalefit="1" /> |
431 | </para> | 431 | </para> |
432 | 432 | ||
433 | <para> | 433 | <para> |
434 | At the lowest level of detail, we can dive down to the assembly | 434 | At the lowest level of detail, we can dive down to the assembly |
435 | level and see which instructions caused the most overhead in a | 435 | level and see which instructions caused the most overhead in a |
436 | function. Pressing 'enter' on the 'udhcpc_main' function, we're | 436 | function. Pressing 'enter' on the 'udhcpc_main' function, we're |
437 | again presented with a menu: | 437 | again presented with a menu: |
438 | </para> | 438 | </para> |
439 | 439 | ||
440 | <para> | 440 | <para> |
441 | <imagedata fileref="figures/perf-wget-busybox-annotate-menu.png" width="6in" depth="7in" align="center" scalefit="1" /> | 441 | <imagedata fileref="figures/perf-wget-busybox-annotate-menu.png" width="6in" depth="7in" align="center" scalefit="1" /> |
442 | </para> | 442 | </para> |
443 | 443 | ||
444 | <para> | 444 | <para> |
445 | Selecting 'Annotate udhcpc_main', we get a detailed listing of | 445 | Selecting 'Annotate udhcpc_main', we get a detailed listing of |
446 | percentages by instruction for the udhcpc_main function. From the | 446 | percentages by instruction for the udhcpc_main function. From the |
447 | display, we can see that over 50% of the time spent in this | 447 | display, we can see that over 50% of the time spent in this |
448 | function is taken up by a couple tests and the move of a | 448 | function is taken up by a couple tests and the move of a |
449 | constant (1) to a register: | 449 | constant (1) to a register: |
450 | </para> | 450 | </para> |
451 | 451 | ||
452 | <para> | 452 | <para> |
453 | <imagedata fileref="figures/perf-wget-busybox-annotate-udhcpc.png" width="6in" depth="7in" align="center" scalefit="1" /> | 453 | <imagedata fileref="figures/perf-wget-busybox-annotate-udhcpc.png" width="6in" depth="7in" align="center" scalefit="1" /> |
454 | </para> | 454 | </para> |
455 | 455 | ||
456 | <para> | 456 | <para> |
457 | As a segue into tracing, let's try another profile using a | 457 | As a segue into tracing, let's try another profile using a |
458 | different counter, something other than the default 'cycles'. | 458 | different counter, something other than the default 'cycles'. |
459 | </para> | 459 | </para> |
460 | 460 | ||
461 | <para> | 461 | <para> |
462 | The tracing and profiling infrastructure in Linux has become | 462 | The tracing and profiling infrastructure in Linux has become |
463 | unified in a way that allows us to use the same tool with a | 463 | unified in a way that allows us to use the same tool with a |
464 | completely different set of counters, not just the standard | 464 | completely different set of counters, not just the standard |
465 | hardware counters that traditionally tools have had to restrict | 465 | hardware counters that traditionally tools have had to restrict |
466 | themselves to (of course the traditional tools can also make use | 466 | themselves to (of course the traditional tools can also make use |
467 | of the expanded possibilities now available to them, and in some | 467 | of the expanded possibilities now available to them, and in some |
468 | cases have, as mentioned previously). | 468 | cases have, as mentioned previously). |
469 | </para> | 469 | </para> |
470 | 470 | ||
471 | <para> | 471 | <para> |
472 | We can get a list of the available events that can be used to | 472 | We can get a list of the available events that can be used to |
473 | profile a workload via 'perf list': | 473 | profile a workload via 'perf list': |
474 | <literallayout class='monospaced'> | 474 | <literallayout class='monospaced'> |
475 | root@crownbay:~# perf list | 475 | root@crownbay:~# perf list |
476 | 476 | ||
477 | List of pre-defined events (to be used in -e): | 477 | List of pre-defined events (to be used in -e): |
@@ -602,23 +602,23 @@ | |||
602 | syscalls:sys_exit_unshare [Tracepoint event] | 602 | syscalls:sys_exit_unshare [Tracepoint event] |
603 | raw_syscalls:sys_enter [Tracepoint event] | 603 | raw_syscalls:sys_enter [Tracepoint event] |
604 | raw_syscalls:sys_exit [Tracepoint event] | 604 | raw_syscalls:sys_exit [Tracepoint event] |
605 | </literallayout> | 605 | </literallayout> |
606 | </para> | 606 | </para> |
607 | 607 | ||
608 | <note> | 608 | <note> |
609 | Tying It Together: These are exactly the same set of events defined | 609 | Tying It Together: These are exactly the same set of events defined |
610 | by the trace event subsystem and exposed by | 610 | by the trace event subsystem and exposed by |
611 | ftrace/tracecmd/kernelshark as files in | 611 | ftrace/tracecmd/kernelshark as files in |
612 | /sys/kernel/debug/tracing/events, by SystemTap as | 612 | /sys/kernel/debug/tracing/events, by SystemTap as |
613 | kernel.trace("tracepoint_name") and (partially) accessed by LTTng. | 613 | kernel.trace("tracepoint_name") and (partially) accessed by LTTng. |
614 | </note> | 614 | </note> |
615 | 615 | ||
616 | <para> | 616 | <para> |
617 | Only a subset of these would be of interest to us when looking at | 617 | Only a subset of these would be of interest to us when looking at |
618 | this workload, so let's choose the most likely subsystems | 618 | this workload, so let's choose the most likely subsystems |
619 | (identified by the string before the colon in the Tracepoint events) | 619 | (identified by the string before the colon in the Tracepoint events) |
620 | and do a 'perf stat' run using only those wildcarded subsystems: | 620 | and do a 'perf stat' run using only those wildcarded subsystems: |
621 | <literallayout class='monospaced'> | 621 | <literallayout class='monospaced'> |
622 | root@crownbay:~# perf stat -e skb:* -e net:* -e napi:* -e sched:* -e workqueue:* -e irq:* -e syscalls:* wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 622 | root@crownbay:~# perf stat -e skb:* -e net:* -e napi:* -e sched:* -e workqueue:* -e irq:* -e syscalls:* wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
623 | Performance counter stats for 'wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2': | 623 | Performance counter stats for 'wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2': |
624 | 624 | ||
@@ -675,26 +675,26 @@ | |||
675 | . | 675 | . |
676 | 676 | ||
677 | 58.029710972 seconds time elapsed | 677 | 58.029710972 seconds time elapsed |
678 | </literallayout> | 678 | </literallayout> |
679 | Let's pick one of these tracepoints and tell perf to do a profile | 679 | Let's pick one of these tracepoints and tell perf to do a profile |
680 | using it as the sampling event: | 680 | using it as the sampling event: |
681 | <literallayout class='monospaced'> | 681 | <literallayout class='monospaced'> |
682 | root@crownbay:~# perf record -g -e sched:sched_wakeup wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 682 | root@crownbay:~# perf record -g -e sched:sched_wakeup wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
683 | </literallayout> | 683 | </literallayout> |
684 | </para> | 684 | </para> |
685 | 685 | ||
686 | <para> | 686 | <para> |
687 | <imagedata fileref="figures/sched-wakeup-profile.png" width="6in" depth="7in" align="center" scalefit="1" /> | 687 | <imagedata fileref="figures/sched-wakeup-profile.png" width="6in" depth="7in" align="center" scalefit="1" /> |
688 | </para> | 688 | </para> |
689 | 689 | ||
690 | <para> | 690 | <para> |
691 | The screenshot above shows the results of running a profile using | 691 | The screenshot above shows the results of running a profile using |
692 | sched:sched_switch tracepoint, which shows the relative costs of | 692 | sched:sched_switch tracepoint, which shows the relative costs of |
693 | various paths to sched_wakeup (note that sched_wakeup is the | 693 | various paths to sched_wakeup (note that sched_wakeup is the |
694 | name of the tracepoint - it's actually defined just inside | 694 | name of the tracepoint - it's actually defined just inside |
695 | ttwu_do_wakeup(), which accounts for the function name actually | 695 | ttwu_do_wakeup(), which accounts for the function name actually |
696 | displayed in the profile: | 696 | displayed in the profile: |
697 | <literallayout class='monospaced'> | 697 | <literallayout class='monospaced'> |
698 | /* | 698 | /* |
699 | * Mark the task runnable and perform wakeup-preemption. | 699 | * Mark the task runnable and perform wakeup-preemption. |
700 | */ | 700 | */ |
@@ -706,55 +706,55 @@ | |||
706 | . | 706 | . |
707 | . | 707 | . |
708 | } | 708 | } |
709 | </literallayout> | 709 | </literallayout> |
710 | A couple of the more interesting callchains are expanded and | 710 | A couple of the more interesting callchains are expanded and |
711 | displayed above, basically some network receive paths that | 711 | displayed above, basically some network receive paths that |
712 | presumably end up waking up wget (busybox) when network data is | 712 | presumably end up waking up wget (busybox) when network data is |
713 | ready. | 713 | ready. |
714 | </para> | 714 | </para> |
715 | 715 | ||
716 | <para> | 716 | <para> |
717 | Note that because tracepoints are normally used for tracing, | 717 | Note that because tracepoints are normally used for tracing, |
718 | the default sampling period for tracepoints is 1 i.e. for | 718 | the default sampling period for tracepoints is 1 i.e. for |
719 | tracepoints perf will sample on every event occurrence (this | 719 | tracepoints perf will sample on every event occurrence (this |
720 | can be changed using the -c option). This is in contrast to | 720 | can be changed using the -c option). This is in contrast to |
721 | hardware counters such as for example the default 'cycles' | 721 | hardware counters such as for example the default 'cycles' |
722 | hardware counter used for normal profiling, where sampling | 722 | hardware counter used for normal profiling, where sampling |
723 | periods are much higher (in the thousands) because profiling should | 723 | periods are much higher (in the thousands) because profiling should |
724 | have as low an overhead as possible and sampling on every cycle w | 724 | have as low an overhead as possible and sampling on every cycle |
725 | ould be prohibitively expensive. | 725 | would be prohibitively expensive. |
726 | </para> | 726 | </para> |
727 | </section> | 727 | </section> |
728 | 728 | ||
729 | <section id='using-perf-to-do-basic-tracing'> | 729 | <section id='using-perf-to-do-basic-tracing'> |
730 | <title>Using perf to do Basic Tracing</title> | 730 | <title>Using perf to do Basic Tracing</title> |
731 | 731 | ||
732 | <para> | 732 | <para> |
733 | Profiling is a great tool for solving many problems or for | 733 | Profiling is a great tool for solving many problems or for |
734 | getting a high-level view of what's going on with a workload or | 734 | getting a high-level view of what's going on with a workload or |
735 | across the system. It is however by definition an approximation, | 735 | across the system. It is however by definition an approximation, |
736 | as suggested by the most prominent word associated with it, | 736 | as suggested by the most prominent word associated with it, |
737 | 'sampling'. On the one hand, it allows a representative picture of | 737 | 'sampling'. On the one hand, it allows a representative picture of |
738 | what's going on in the system to be cheaply taken, but on the other | 738 | what's going on in the system to be cheaply taken, but on the other |
739 | hand, that cheapness limits its utility when that data suggests a | 739 | hand, that cheapness limits its utility when that data suggests a |
740 | need to 'dive down' more deeply to discover what's really going | 740 | need to 'dive down' more deeply to discover what's really going |
741 | on. In such cases, the only way to see what's really going on is | 741 | on. In such cases, the only way to see what's really going on is |
742 | to be able to look at (or summarize more intelligently) the | 742 | to be able to look at (or summarize more intelligently) the |
743 | individual steps that go into the higher-level behavior exposed | 743 | individual steps that go into the higher-level behavior exposed |
744 | by the coarse-grained profiling data. | 744 | by the coarse-grained profiling data. |
745 | </para> | 745 | </para> |
746 | 746 | ||
747 | <para> | 747 | <para> |
748 | As a concrete example, we can trace all the events we think might | 748 | As a concrete example, we can trace all the events we think might |
749 | be applicable to our workload: | 749 | be applicable to our workload: |
750 | <literallayout class='monospaced'> | 750 | <literallayout class='monospaced'> |
751 | root@crownbay:~# perf record -g -e skb:* -e net:* -e napi:* -e sched:sched_switch -e sched:sched_wakeup -e irq:* | 751 | root@crownbay:~# perf record -g -e skb:* -e net:* -e napi:* -e sched:sched_switch -e sched:sched_wakeup -e irq:* |
752 | -e syscalls:sys_enter_read -e syscalls:sys_exit_read -e syscalls:sys_enter_write -e syscalls:sys_exit_write | 752 | -e syscalls:sys_enter_read -e syscalls:sys_exit_read -e syscalls:sys_enter_write -e syscalls:sys_exit_write |
753 | wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 753 | wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
754 | </literallayout> | 754 | </literallayout> |
755 | We can look at the raw trace output using 'perf script' with no | 755 | We can look at the raw trace output using 'perf script' with no |
756 | arguments: | 756 | arguments: |
757 | <literallayout class='monospaced'> | 757 | <literallayout class='monospaced'> |
758 | root@crownbay:~# perf script | 758 | root@crownbay:~# perf script |
759 | 759 | ||
760 | perf 1262 [000] 11624.857082: sys_exit_read: 0x0 | 760 | perf 1262 [000] 11624.857082: sys_exit_read: 0x0 |
@@ -779,90 +779,90 @@ | |||
779 | wget 1262 [001] 11624.859888: sys_exit_read: 0x400 | 779 | wget 1262 [001] 11624.859888: sys_exit_read: 0x400 |
780 | wget 1262 [001] 11624.859935: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400 | 780 | wget 1262 [001] 11624.859935: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400 |
781 | wget 1262 [001] 11624.859944: sys_exit_read: 0x400 | 781 | wget 1262 [001] 11624.859944: sys_exit_read: 0x400 |
782 | </literallayout> | 782 | </literallayout> |
783 | This gives us a detailed timestamped sequence of events that | 783 | This gives us a detailed timestamped sequence of events that |
784 | occurred within the workload with respect to those events. | 784 | occurred within the workload with respect to those events. |
785 | </para> | 785 | </para> |
786 | 786 | ||
787 | <para> | 787 | <para> |
788 | In many ways, profiling can be viewed as a subset of tracing - | 788 | In many ways, profiling can be viewed as a subset of tracing - |
789 | theoretically, if you have a set of trace events that's sufficient | 789 | theoretically, if you have a set of trace events that's sufficient |
790 | to capture all the important aspects of a workload, you can derive | 790 | to capture all the important aspects of a workload, you can derive |
791 | any of the results or views that a profiling run can. | 791 | any of the results or views that a profiling run can. |
792 | </para> | 792 | </para> |
793 | 793 | ||
794 | <para> | 794 | <para> |
795 | Another aspect of traditional profiling is that while powerful in | 795 | Another aspect of traditional profiling is that while powerful in |
796 | many ways, it's limited by the granularity of the underlying data. | 796 | many ways, it's limited by the granularity of the underlying data. |
797 | Profiling tools offer various ways of sorting and presenting the | 797 | Profiling tools offer various ways of sorting and presenting the |
798 | sample data, which make it much more useful and amenable to user | 798 | sample data, which make it much more useful and amenable to user |
799 | experimentation, but in the end it can't be used in an open-ended | 799 | experimentation, but in the end it can't be used in an open-ended |
800 | way to extract data that just isn't present as a consequence of | 800 | way to extract data that just isn't present as a consequence of |
801 | the fact that conceptually, most of it has been thrown away. | 801 | the fact that conceptually, most of it has been thrown away. |
802 | </para> | 802 | </para> |
803 | 803 | ||
804 | <para> | 804 | <para> |
805 | Full-blown detailed tracing data does however offer the opportunity | 805 | Full-blown detailed tracing data does however offer the opportunity |
806 | to manipulate and present the information collected during a | 806 | to manipulate and present the information collected during a |
807 | tracing run in an infinite variety of ways. | 807 | tracing run in an infinite variety of ways. |
808 | </para> | 808 | </para> |
809 | 809 | ||
810 | <para> | 810 | <para> |
811 | Another way to look at it is that there are only so many ways that | 811 | Another way to look at it is that there are only so many ways that |
812 | the 'primitive' counters can be used on their own to generate | 812 | the 'primitive' counters can be used on their own to generate |
813 | interesting output; to get anything more complicated than simple | 813 | interesting output; to get anything more complicated than simple |
814 | counts requires some amount of additional logic, which is typically | 814 | counts requires some amount of additional logic, which is typically |
815 | very specific to the problem at hand. For example, if we wanted to | 815 | very specific to the problem at hand. For example, if we wanted to |
816 | make use of a 'counter' that maps to the value of the time | 816 | make use of a 'counter' that maps to the value of the time |
817 | difference between when a process was scheduled to run on a | 817 | difference between when a process was scheduled to run on a |
818 | processor and the time it actually ran, we wouldn't expect such | 818 | processor and the time it actually ran, we wouldn't expect such |
819 | a counter to exist on its own, but we could derive one called say | 819 | a counter to exist on its own, but we could derive one called say |
820 | 'wakeup_latency' and use it to extract a useful view of that metric | 820 | 'wakeup_latency' and use it to extract a useful view of that metric |
821 | from trace data. Likewise, we really can't figure out from standard | 821 | from trace data. Likewise, we really can't figure out from standard |
822 | profiling tools how much data every process on the system reads and | 822 | profiling tools how much data every process on the system reads and |
823 | writes, along with how many of those reads and writes fail | 823 | writes, along with how many of those reads and writes fail |
824 | completely. If we have sufficient trace data, however, we could | 824 | completely. If we have sufficient trace data, however, we could |
825 | with the right tools easily extract and present that information, | 825 | with the right tools easily extract and present that information, |
826 | but we'd need something other than pre-canned profiling tools to | 826 | but we'd need something other than pre-canned profiling tools to |
827 | do that. | 827 | do that. |
828 | </para> | 828 | </para> |
829 | 829 | ||
830 | <para> | 830 | <para> |
831 | Luckily, there is general-purpose way to handle such needs, | 831 | Luckily, there is general-purpose way to handle such needs, |
832 | called 'programming languages'. Making programming languages | 832 | called 'programming languages'. Making programming languages |
833 | easily available to apply to such problems given the specific | 833 | easily available to apply to such problems given the specific |
834 | format of data is called a 'programming language binding' for | 834 | format of data is called a 'programming language binding' for |
835 | that data and language. Perf supports two programming language | 835 | that data and language. Perf supports two programming language |
836 | bindings, one for Python and one for Perl. | 836 | bindings, one for Python and one for Perl. |
837 | </para> | 837 | </para> |
838 | 838 | ||
839 | <note> | 839 | <note> |
840 | Tying It Together: Language bindings for manipulating and | 840 | Tying It Together: Language bindings for manipulating and |
841 | aggregating trace data are of course not a new | 841 | aggregating trace data are of course not a new |
842 | idea. One of the first projects to do this was IBM's DProbes | 842 | idea. One of the first projects to do this was IBM's DProbes |
843 | dpcc compiler, an ANSI C compiler which targeted a low-level | 843 | dpcc compiler, an ANSI C compiler which targeted a low-level |
844 | assembly language running on an in-kernel interpreter on the | 844 | assembly language running on an in-kernel interpreter on the |
845 | target system. This is exactly analagous to what Sun's DTrace | 845 | target system. This is exactly analagous to what Sun's DTrace |
846 | did, except that DTrace invented its own language for the purpose. | 846 | did, except that DTrace invented its own language for the purpose. |
847 | Systemtap, heavily inspired by DTrace, also created its own | 847 | Systemtap, heavily inspired by DTrace, also created its own |
848 | one-off language, but rather than running the product on an | 848 | one-off language, but rather than running the product on an |
849 | in-kernel interpreter, created an elaborate compiler-based | 849 | in-kernel interpreter, created an elaborate compiler-based |
850 | machinery to translate its language into kernel modules written | 850 | machinery to translate its language into kernel modules written |
851 | in C. | 851 | in C. |
852 | </note> | 852 | </note> |
853 | 853 | ||
854 | <para> | 854 | <para> |
855 | Now that we have the trace data in perf.data, we can use | 855 | Now that we have the trace data in perf.data, we can use |
856 | 'perf script -g' to generate a skeleton script with handlers | 856 | 'perf script -g' to generate a skeleton script with handlers |
857 | for the read/write entry/exit events we recorded: | 857 | for the read/write entry/exit events we recorded: |
858 | <literallayout class='monospaced'> | 858 | <literallayout class='monospaced'> |
859 | root@crownbay:~# perf script -g python | 859 | root@crownbay:~# perf script -g python |
860 | generated Python script: perf-script.py | 860 | generated Python script: perf-script.py |
861 | </literallayout> | 861 | </literallayout> |
862 | The skeleton script simply creates a python function for each | 862 | The skeleton script simply creates a python function for each |
863 | event type in the perf.data file. The body of each function simply | 863 | event type in the perf.data file. The body of each function simply |
864 | prints the event name along with its parameters. For example: | 864 | prints the event name along with its parameters. For example: |
865 | <literallayout class='monospaced'> | 865 | <literallayout class='monospaced'> |
866 | def net__netif_rx(event_name, context, common_cpu, | 866 | def net__netif_rx(event_name, context, common_cpu, |
867 | common_secs, common_nsecs, common_pid, common_comm, | 867 | common_secs, common_nsecs, common_pid, common_comm, |
868 | skbaddr, len, name): | 868 | skbaddr, len, name): |
@@ -870,10 +870,10 @@ | |||
870 | common_pid, common_comm) | 870 | common_pid, common_comm) |
871 | 871 | ||
872 | print "skbaddr=%u, len=%u, name=%s\n" % (skbaddr, len, name), | 872 | print "skbaddr=%u, len=%u, name=%s\n" % (skbaddr, len, name), |
873 | </literallayout> | 873 | </literallayout> |
874 | We can run that script directly to print all of the events | 874 | We can run that script directly to print all of the events |
875 | contained in the perf.data file: | 875 | contained in the perf.data file: |
876 | <literallayout class='monospaced'> | 876 | <literallayout class='monospaced'> |
877 | root@crownbay:~# perf script -s perf-script.py | 877 | root@crownbay:~# perf script -s perf-script.py |
878 | 878 | ||
879 | in trace_begin | 879 | in trace_begin |
@@ -899,39 +899,39 @@ | |||
899 | syscalls__sys_exit_read 1 11624.859888770 1262 wget nr=3, ret=1024 | 899 | syscalls__sys_exit_read 1 11624.859888770 1262 wget nr=3, ret=1024 |
900 | syscalls__sys_enter_read 1 11624.859935140 1262 wget nr=3, fd=3, buf=3077701632, count=1024 | 900 | syscalls__sys_enter_read 1 11624.859935140 1262 wget nr=3, fd=3, buf=3077701632, count=1024 |
901 | syscalls__sys_exit_read 1 11624.859944032 1262 wget nr=3, ret=1024 | 901 | syscalls__sys_exit_read 1 11624.859944032 1262 wget nr=3, ret=1024 |
902 | </literallayout> | 902 | </literallayout> |
903 | That in itself isn't very useful; after all, we can accomplish | 903 | That in itself isn't very useful; after all, we can accomplish |
904 | pretty much the same thing by simply running 'perf script' | 904 | pretty much the same thing by simply running 'perf script' |
905 | without arguments in the same directory as the perf.data file. | 905 | without arguments in the same directory as the perf.data file. |
906 | </para> | 906 | </para> |
907 | 907 | ||
908 | <para> | 908 | <para> |
909 | We can however replace the print statements in the generated | 909 | We can however replace the print statements in the generated |
910 | function bodies with whatever we want, and thereby make it | 910 | function bodies with whatever we want, and thereby make it |
911 | infinitely more useful. | 911 | infinitely more useful. |
912 | </para> | 912 | </para> |
913 | 913 | ||
914 | <para> | 914 | <para> |
915 | As a simple example, let's just replace the print statements in | 915 | As a simple example, let's just replace the print statements in |
916 | the function bodies with a simple function that does nothing but | 916 | the function bodies with a simple function that does nothing but |
917 | increment a per-event count. When the program is run against a | 917 | increment a per-event count. When the program is run against a |
918 | perf.data file, each time a particular event is encountered, | 918 | perf.data file, each time a particular event is encountered, |
919 | a tally is incremented for that event. For example: | 919 | a tally is incremented for that event. For example: |
920 | <literallayout class='monospaced'> | 920 | <literallayout class='monospaced'> |
921 | def net__netif_rx(event_name, context, common_cpu, | 921 | def net__netif_rx(event_name, context, common_cpu, |
922 | common_secs, common_nsecs, common_pid, common_comm, | 922 | common_secs, common_nsecs, common_pid, common_comm, |
923 | skbaddr, len, name): | 923 | skbaddr, len, name): |
924 | inc_counts(event_name) | 924 | inc_counts(event_name) |
925 | </literallayout> | 925 | </literallayout> |
926 | Each event handler function in the generated code is modified | 926 | Each event handler function in the generated code is modified |
927 | to do this. For convenience, we define a common function called | 927 | to do this. For convenience, we define a common function called |
928 | inc_counts() that each handler calls; inc_counts simply tallies | 928 | inc_counts() that each handler calls; inc_counts simply tallies |
929 | a count for each event using the 'counts' hash, which is a | 929 | a count for each event using the 'counts' hash, which is a |
930 | specialized has function that does Perl-like autovivification, a | 930 | specialized has function that does Perl-like autovivification, a |
931 | capability that's extremely useful for kinds of multi-level | 931 | capability that's extremely useful for kinds of multi-level |
932 | aggregation commonly used in processing traces (see perf's | 932 | aggregation commonly used in processing traces (see perf's |
933 | documentation on the Python language binding for details): | 933 | documentation on the Python language binding for details): |
934 | <literallayout class='monospaced'> | 934 | <literallayout class='monospaced'> |
935 | counts = autodict() | 935 | counts = autodict() |
936 | 936 | ||
937 | def inc_counts(event_name): | 937 | def inc_counts(event_name): |
@@ -939,18 +939,18 @@ | |||
939 | counts[event_name] += 1 | 939 | counts[event_name] += 1 |
940 | except TypeError: | 940 | except TypeError: |
941 | counts[event_name] = 1 | 941 | counts[event_name] = 1 |
942 | </literallayout> | 942 | </literallayout> |
943 | Finally, at the end of the trace processing run, we want to | 943 | Finally, at the end of the trace processing run, we want to |
944 | print the result of all the per-event tallies. For that, we | 944 | print the result of all the per-event tallies. For that, we |
945 | use the special 'trace_end()' function: | 945 | use the special 'trace_end()' function: |
946 | <literallayout class='monospaced'> | 946 | <literallayout class='monospaced'> |
947 | def trace_end(): | 947 | def trace_end(): |
948 | for event_name, count in counts.iteritems(): | 948 | for event_name, count in counts.iteritems(): |
949 | print "%-40s %10s\n" % (event_name, count) | 949 | print "%-40s %10s\n" % (event_name, count) |
950 | </literallayout> | 950 | </literallayout> |
951 | The end result is a summary of all the events recorded in the | 951 | The end result is a summary of all the events recorded in the |
952 | trace: | 952 | trace: |
953 | <literallayout> | 953 | <literallayout> |
954 | skb__skb_copy_datagram_iovec 13148 | 954 | skb__skb_copy_datagram_iovec 13148 |
955 | irq__softirq_entry 4796 | 955 | irq__softirq_entry 4796 |
956 | irq__irq_handler_exit 3805 | 956 | irq__irq_handler_exit 3805 |
@@ -968,112 +968,112 @@ | |||
968 | net__netif_rx 2 | 968 | net__netif_rx 2 |
969 | napi__napi_poll 1877 | 969 | napi__napi_poll 1877 |
970 | syscalls__sys_exit_write 8990 | 970 | syscalls__sys_exit_write 8990 |
971 | </literallayout> | 971 | </literallayout> |
972 | Note that this is pretty much exactly the same information we get | 972 | Note that this is pretty much exactly the same information we get |
973 | from 'perf stat', which goes a little way to support the idea | 973 | from 'perf stat', which goes a little way to support the idea |
974 | mentioned previously that given the right kind of trace data, | 974 | mentioned previously that given the right kind of trace data, |
975 | higher-level profiling-type summaries can be derived from it. | 975 | higher-level profiling-type summaries can be derived from it. |
976 | </para> | 976 | </para> |
977 | 977 | ||
978 | <para> | 978 | <para> |
979 | Documentation on using the | 979 | Documentation on using the |
980 | <ulink url='http://linux.die.net/man/1/perf-script-python'>'perf script' python binding</ulink>. | 980 | <ulink url='http://linux.die.net/man/1/perf-script-python'>'perf script' python binding</ulink>. |
981 | </para> | 981 | </para> |
982 | </section> | 982 | </section> |
983 | 983 | ||
984 | <section id='system-wide-tracing-and-profiling'> | 984 | <section id='system-wide-tracing-and-profiling'> |
985 | <title>System-Wide Tracing and Profiling</title> | 985 | <title>System-Wide Tracing and Profiling</title> |
986 | 986 | ||
987 | <para> | 987 | <para> |
988 | The examples so far have focused on tracing a particular program or | 988 | The examples so far have focused on tracing a particular program or |
989 | workload - in other words, every profiling run has specified the | 989 | workload - in other words, every profiling run has specified the |
990 | program to profile in the command-line e.g. 'perf record wget ...'. | 990 | program to profile in the command-line e.g. 'perf record wget ...'. |
991 | </para> | 991 | </para> |
992 | 992 | ||
993 | <para> | 993 | <para> |
994 | It's also possible, and more interesting in many cases, to run a | 994 | It's also possible, and more interesting in many cases, to run a |
995 | system-wide profile or trace while running the workload in a | 995 | system-wide profile or trace while running the workload in a |
996 | separate shell. | 996 | separate shell. |
997 | </para> | 997 | </para> |
998 | 998 | ||
999 | <para> | 999 | <para> |
1000 | To do system-wide profiling or tracing, you typically use | 1000 | To do system-wide profiling or tracing, you typically use |
1001 | the -a flag to 'perf record'. | 1001 | the -a flag to 'perf record'. |
1002 | </para> | 1002 | </para> |
1003 | 1003 | ||
1004 | <para> | 1004 | <para> |
1005 | To demonstrate this, open up one window and start the profile | 1005 | To demonstrate this, open up one window and start the profile |
1006 | using the -a flag (press Ctrl-C to stop tracing): | 1006 | using the -a flag (press Ctrl-C to stop tracing): |
1007 | <literallayout class='monospaced'> | 1007 | <literallayout class='monospaced'> |
1008 | root@crownbay:~# perf record -g -a | 1008 | root@crownbay:~# perf record -g -a |
1009 | ^C[ perf record: Woken up 6 times to write data ] | 1009 | ^C[ perf record: Woken up 6 times to write data ] |
1010 | [ perf record: Captured and wrote 1.400 MB perf.data (~61172 samples) ] | 1010 | [ perf record: Captured and wrote 1.400 MB perf.data (~61172 samples) ] |
1011 | </literallayout> | 1011 | </literallayout> |
1012 | In another window, run the wget test: | 1012 | In another window, run the wget test: |
1013 | <literallayout class='monospaced'> | 1013 | <literallayout class='monospaced'> |
1014 | root@crownbay:~# wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink> | 1014 | root@crownbay:~# wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink> |
1015 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) | 1015 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) |
1016 | linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA | 1016 | linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA |
1017 | </literallayout> | 1017 | </literallayout> |
1018 | Here we see entries not only for our wget load, but for other | 1018 | Here we see entries not only for our wget load, but for other |
1019 | processes running on the system as well: | 1019 | processes running on the system as well: |
1020 | </para> | 1020 | </para> |
1021 | 1021 | ||
1022 | <para> | 1022 | <para> |
1023 | <imagedata fileref="figures/perf-systemwide.png" width="6in" depth="7in" align="center" scalefit="1" /> | 1023 | <imagedata fileref="figures/perf-systemwide.png" width="6in" depth="7in" align="center" scalefit="1" /> |
1024 | </para> | 1024 | </para> |
1025 | 1025 | ||
1026 | <para> | 1026 | <para> |
1027 | In the snapshot above, we can see callchains that originate in | 1027 | In the snapshot above, we can see callchains that originate in |
1028 | libc, and a callchain from Xorg that demonstrates that we're | 1028 | libc, and a callchain from Xorg that demonstrates that we're |
1029 | using a proprietary X driver in userspace (notice the presence | 1029 | using a proprietary X driver in userspace (notice the presence |
1030 | of 'PVR' and some other unresolvable symbols in the expanded | 1030 | of 'PVR' and some other unresolvable symbols in the expanded |
1031 | Xorg callchain). | 1031 | Xorg callchain). |
1032 | </para> | 1032 | </para> |
1033 | 1033 | ||
1034 | <para> | 1034 | <para> |
1035 | Note also that we have both kernel and userspace entries in the | 1035 | Note also that we have both kernel and userspace entries in the |
1036 | above snapshot. We can also tell perf to focus on userspace but | 1036 | above snapshot. We can also tell perf to focus on userspace but |
1037 | providing a modifier, in this case 'u', to the 'cycles' hardware | 1037 | providing a modifier, in this case 'u', to the 'cycles' hardware |
1038 | counter when we record a profile: | 1038 | counter when we record a profile: |
1039 | <literallayout class='monospaced'> | 1039 | <literallayout class='monospaced'> |
1040 | root@crownbay:~# perf record -g -a -e cycles:u | 1040 | root@crownbay:~# perf record -g -a -e cycles:u |
1041 | ^C[ perf record: Woken up 2 times to write data ] | 1041 | ^C[ perf record: Woken up 2 times to write data ] |
1042 | [ perf record: Captured and wrote 0.376 MB perf.data (~16443 samples) ] | 1042 | [ perf record: Captured and wrote 0.376 MB perf.data (~16443 samples) ] |
1043 | </literallayout> | 1043 | </literallayout> |
1044 | </para> | 1044 | </para> |
1045 | 1045 | ||
1046 | <para> | 1046 | <para> |
1047 | <imagedata fileref="figures/perf-report-cycles-u.png" width="6in" depth="7in" align="center" scalefit="1" /> | 1047 | <imagedata fileref="figures/perf-report-cycles-u.png" width="6in" depth="7in" align="center" scalefit="1" /> |
1048 | </para> | 1048 | </para> |
1049 | 1049 | ||
1050 | <para> | 1050 | <para> |
1051 | Notice in the screenshot above, we see only userspace entries ([.]) | 1051 | Notice in the screenshot above, we see only userspace entries ([.]) |
1052 | </para> | 1052 | </para> |
1053 | 1053 | ||
1054 | <para> | 1054 | <para> |
1055 | Finally, we can press 'enter' on a leaf node and select the 'Zoom | 1055 | Finally, we can press 'enter' on a leaf node and select the 'Zoom |
1056 | into DSO' menu item to show only entries associated with a | 1056 | into DSO' menu item to show only entries associated with a |
1057 | specific DSO. In the screenshot below, we've zoomed into the | 1057 | specific DSO. In the screenshot below, we've zoomed into the |
1058 | 'libc' DSO which shows all the entries associated with the | 1058 | 'libc' DSO which shows all the entries associated with the |
1059 | libc-xxx.so DSO. | 1059 | libc-xxx.so DSO. |
1060 | </para> | 1060 | </para> |
1061 | 1061 | ||
1062 | <para> | 1062 | <para> |
1063 | <imagedata fileref="figures/perf-systemwide-libc.png" width="6in" depth="7in" align="center" scalefit="1" /> | 1063 | <imagedata fileref="figures/perf-systemwide-libc.png" width="6in" depth="7in" align="center" scalefit="1" /> |
1064 | </para> | 1064 | </para> |
1065 | 1065 | ||
1066 | <para> | 1066 | <para> |
1067 | We can also use the system-wide -a switch to do system-wide | 1067 | We can also use the system-wide -a switch to do system-wide |
1068 | tracing. Here we'll trace a couple of scheduler events: | 1068 | tracing. Here we'll trace a couple of scheduler events: |
1069 | <literallayout class='monospaced'> | 1069 | <literallayout class='monospaced'> |
1070 | root@crownbay:~# perf record -a -e sched:sched_switch -e sched:sched_wakeup | 1070 | root@crownbay:~# perf record -a -e sched:sched_switch -e sched:sched_wakeup |
1071 | ^C[ perf record: Woken up 38 times to write data ] | 1071 | ^C[ perf record: Woken up 38 times to write data ] |
1072 | [ perf record: Captured and wrote 9.780 MB perf.data (~427299 samples) ] | 1072 | [ perf record: Captured and wrote 9.780 MB perf.data (~427299 samples) ] |
1073 | </literallayout> | 1073 | </literallayout> |
1074 | We can look at the raw output using 'perf script' with no | 1074 | We can look at the raw output using 'perf script' with no |
1075 | arguments: | 1075 | arguments: |
1076 | <literallayout class='monospaced'> | 1076 | <literallayout class='monospaced'> |
1077 | root@crownbay:~# perf script | 1077 | root@crownbay:~# perf script |
1078 | 1078 | ||
1079 | perf 1383 [001] 6171.460045: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 | 1079 | perf 1383 [001] 6171.460045: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 |
@@ -1086,21 +1086,21 @@ | |||
1086 | perf 1383 [001] 6171.470058: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120 | 1086 | perf 1383 [001] 6171.470058: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120 |
1087 | kworker/1:1 21 [001] 6171.470082: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120 | 1087 | kworker/1:1 21 [001] 6171.470082: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120 |
1088 | perf 1383 [001] 6171.480035: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 | 1088 | perf 1383 [001] 6171.480035: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 |
1089 | </literallayout> | 1089 | </literallayout> |
1090 | </para> | 1090 | </para> |
1091 | |||
1092 | <section id='perf-filtering'> | ||
1093 | <title>Filtering</title> | ||
1094 | 1091 | ||
1095 | <para> | 1092 | <section id='perf-filtering'> |
1096 | Notice that there are a lot of events that don't really have | 1093 | <title>Filtering</title> |
1097 | anything to do with what we're interested in, namely events | 1094 | |
1098 | that schedule 'perf' itself in and out or that wake perf up. | 1095 | <para> |
1099 | We can get rid of those by using the '--filter' option - | 1096 | Notice that there are a lot of events that don't really have |
1100 | for each event we specify using -e, we can add a --filter | 1097 | anything to do with what we're interested in, namely events |
1101 | after that to filter out trace events that contain fields | 1098 | that schedule 'perf' itself in and out or that wake perf up. |
1102 | with specific values: | 1099 | We can get rid of those by using the '--filter' option - |
1103 | <literallayout class='monospaced'> | 1100 | for each event we specify using -e, we can add a --filter |
1101 | after that to filter out trace events that contain fields | ||
1102 | with specific values: | ||
1103 | <literallayout class='monospaced'> | ||
1104 | root@crownbay:~# perf record -a -e sched:sched_switch --filter 'next_comm != perf && prev_comm != perf' -e sched:sched_wakeup --filter 'comm != perf' | 1104 | root@crownbay:~# perf record -a -e sched:sched_switch --filter 'next_comm != perf && prev_comm != perf' -e sched:sched_wakeup --filter 'comm != perf' |
1105 | ^C[ perf record: Woken up 38 times to write data ] | 1105 | ^C[ perf record: Woken up 38 times to write data ] |
1106 | [ perf record: Captured and wrote 9.688 MB perf.data (~423279 samples) ] | 1106 | [ perf record: Captured and wrote 9.688 MB perf.data (~423279 samples) ] |
@@ -1122,59 +1122,59 @@ | |||
1122 | swapper 0 [000] 7932.326109: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000 | 1122 | swapper 0 [000] 7932.326109: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000 |
1123 | swapper 0 [000] 7932.326171: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 | 1123 | swapper 0 [000] 7932.326171: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 |
1124 | kworker/0:3 1209 [000] 7932.326214: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 | 1124 | kworker/0:3 1209 [000] 7932.326214: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 |
1125 | </literallayout> | 1125 | </literallayout> |
1126 | In this case, we've filtered out all events that have 'perf' | 1126 | In this case, we've filtered out all events that have 'perf' |
1127 | in their 'comm' or 'comm_prev' or 'comm_next' fields. Notice | 1127 | in their 'comm' or 'comm_prev' or 'comm_next' fields. Notice |
1128 | that there are still events recorded for perf, but notice | 1128 | that there are still events recorded for perf, but notice |
1129 | that those events don't have values of 'perf' for the filtered | 1129 | that those events don't have values of 'perf' for the filtered |
1130 | fields. To completely filter out anything from perf will | 1130 | fields. To completely filter out anything from perf will |
1131 | require a bit more work, but for the purpose of demonstrating | 1131 | require a bit more work, but for the purpose of demonstrating |
1132 | how to use filters, it's close enough. | 1132 | how to use filters, it's close enough. |
1133 | </para> | 1133 | </para> |
1134 | 1134 | ||
1135 | <note> | 1135 | <note> |
1136 | Tying It Together: These are exactly the same set of event | 1136 | Tying It Together: These are exactly the same set of event |
1137 | filters defined by the trace event subsystem. See the | 1137 | filters defined by the trace event subsystem. See the |
1138 | ftrace/tracecmd/kernelshark section for more discussion about | 1138 | ftrace/tracecmd/kernelshark section for more discussion about |
1139 | these event filters. | 1139 | these event filters. |
1140 | </note> | 1140 | </note> |
1141 | 1141 | ||
1142 | <note> | 1142 | <note> |
1143 | Tying It Together: These event filters are implemented by a | 1143 | Tying It Together: These event filters are implemented by a |
1144 | special-purpose pseudo-interpreter in the kernel and are an | 1144 | special-purpose pseudo-interpreter in the kernel and are an |
1145 | integral and indispensable part of the perf design as it | 1145 | integral and indispensable part of the perf design as it |
1146 | relates to tracing. kernel-based event filters provide a | 1146 | relates to tracing. kernel-based event filters provide a |
1147 | mechanism to precisely throttle the event stream that appears | 1147 | mechanism to precisely throttle the event stream that appears |
1148 | in user space, where it makes sense to provide bindings to real | 1148 | in user space, where it makes sense to provide bindings to real |
1149 | programming languages for postprocessing the event stream. | 1149 | programming languages for postprocessing the event stream. |
1150 | This architecture allows for the intelligent and flexible | 1150 | This architecture allows for the intelligent and flexible |
1151 | partitioning of processing between the kernel and user space. | 1151 | partitioning of processing between the kernel and user space. |
1152 | Contrast this with other tools such as SystemTap, which does | 1152 | Contrast this with other tools such as SystemTap, which does |
1153 | all of its processing in the kernel and as such requires a | 1153 | all of its processing in the kernel and as such requires a |
1154 | special project-defined language in order to accommodate that | 1154 | special project-defined language in order to accommodate that |
1155 | design, or LTTng, where everything is sent to userspace and | 1155 | design, or LTTng, where everything is sent to userspace and |
1156 | as such requires a super-efficient kernel-to-userspace | 1156 | as such requires a super-efficient kernel-to-userspace |
1157 | transport mechanism in order to function properly. While | 1157 | transport mechanism in order to function properly. While |
1158 | perf certainly can benefit from for instance advances in | 1158 | perf certainly can benefit from for instance advances in |
1159 | the design of the transport, it doesn't fundamentally depend | 1159 | the design of the transport, it doesn't fundamentally depend |
1160 | on them. Basically, if you find that your perf tracing | 1160 | on them. Basically, if you find that your perf tracing |
1161 | application is causing buffer I/O overruns, it probably | 1161 | application is causing buffer I/O overruns, it probably |
1162 | means that you aren't taking enough advantage of the | 1162 | means that you aren't taking enough advantage of the |
1163 | kernel filtering engine. | 1163 | kernel filtering engine. |
1164 | </note> | 1164 | </note> |
1165 | </section> | ||
1165 | </section> | 1166 | </section> |
1166 | </section> | ||
1167 | 1167 | ||
1168 | <section id='using-dynamic-tracepoints'> | 1168 | <section id='using-dynamic-tracepoints'> |
1169 | <title>Using Dynamic Tracepoints</title> | 1169 | <title>Using Dynamic Tracepoints</title> |
1170 | 1170 | ||
1171 | <para> | 1171 | <para> |
1172 | perf isn't restricted to the fixed set of static tracepoints | 1172 | perf isn't restricted to the fixed set of static tracepoints |
1173 | listed by 'perf list'. Users can also add their own 'dynamic' | 1173 | listed by 'perf list'. Users can also add their own 'dynamic' |
1174 | tracepoints anywhere in the kernel. For instance, suppose we | 1174 | tracepoints anywhere in the kernel. For instance, suppose we |
1175 | want to define our own tracepoint on do_fork(). We can do that | 1175 | want to define our own tracepoint on do_fork(). We can do that |
1176 | using the 'perf probe' perf subcommand: | 1176 | using the 'perf probe' perf subcommand: |
1177 | <literallayout class='monospaced'> | 1177 | <literallayout class='monospaced'> |
1178 | root@crownbay:~# perf probe do_fork | 1178 | root@crownbay:~# perf probe do_fork |
1179 | Added new event: | 1179 | Added new event: |
1180 | probe:do_fork (on do_fork) | 1180 | probe:do_fork (on do_fork) |
@@ -1182,13 +1182,13 @@ | |||
1182 | You can now use it in all perf tools, such as: | 1182 | You can now use it in all perf tools, such as: |
1183 | 1183 | ||
1184 | perf record -e probe:do_fork -aR sleep 1 | 1184 | perf record -e probe:do_fork -aR sleep 1 |
1185 | </literallayout> | 1185 | </literallayout> |
1186 | Adding a new tracepoint via 'perf probe' results in an event | 1186 | Adding a new tracepoint via 'perf probe' results in an event |
1187 | with all the expected files and format in | 1187 | with all the expected files and format in |
1188 | /sys/kernel/debug/tracing/events, just the same as for static | 1188 | /sys/kernel/debug/tracing/events, just the same as for static |
1189 | tracepoints (as discussed in more detail in the trace events | 1189 | tracepoints (as discussed in more detail in the trace events |
1190 | subsystem section: | 1190 | subsystem section: |
1191 | <literallayout class='monospaced'> | 1191 | <literallayout class='monospaced'> |
1192 | root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# ls -al | 1192 | root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# ls -al |
1193 | drwxr-xr-x 2 root root 0 Oct 28 11:42 . | 1193 | drwxr-xr-x 2 root root 0 Oct 28 11:42 . |
1194 | drwxr-xr-x 3 root root 0 Oct 28 11:42 .. | 1194 | drwxr-xr-x 3 root root 0 Oct 28 11:42 .. |
@@ -1210,23 +1210,23 @@ | |||
1210 | field:unsigned long __probe_ip; offset:12; size:4; signed:0; | 1210 | field:unsigned long __probe_ip; offset:12; size:4; signed:0; |
1211 | 1211 | ||
1212 | print fmt: "(%lx)", REC->__probe_ip | 1212 | print fmt: "(%lx)", REC->__probe_ip |
1213 | </literallayout> | 1213 | </literallayout> |
1214 | We can list all dynamic tracepoints currently in existence: | 1214 | We can list all dynamic tracepoints currently in existence: |
1215 | <literallayout class='monospaced'> | 1215 | <literallayout class='monospaced'> |
1216 | root@crownbay:~# perf probe -l | 1216 | root@crownbay:~# perf probe -l |
1217 | probe:do_fork (on do_fork) | 1217 | probe:do_fork (on do_fork) |
1218 | probe:schedule (on schedule) | 1218 | probe:schedule (on schedule) |
1219 | </literallayout> | 1219 | </literallayout> |
1220 | Let's record system-wide ('sleep 30' is a trick for recording | 1220 | Let's record system-wide ('sleep 30' is a trick for recording |
1221 | system-wide but basically do nothing and then wake up after | 1221 | system-wide but basically do nothing and then wake up after |
1222 | 30 seconds): | 1222 | 30 seconds): |
1223 | <literallayout class='monospaced'> | 1223 | <literallayout class='monospaced'> |
1224 | root@crownbay:~# perf record -g -a -e probe:do_fork sleep 30 | 1224 | root@crownbay:~# perf record -g -a -e probe:do_fork sleep 30 |
1225 | [ perf record: Woken up 1 times to write data ] | 1225 | [ perf record: Woken up 1 times to write data ] |
1226 | [ perf record: Captured and wrote 0.087 MB perf.data (~3812 samples) ] | 1226 | [ perf record: Captured and wrote 0.087 MB perf.data (~3812 samples) ] |
1227 | </literallayout> | 1227 | </literallayout> |
1228 | Using 'perf script' we can see each do_fork event that fired: | 1228 | Using 'perf script' we can see each do_fork event that fired: |
1229 | <literallayout class='monospaced'> | 1229 | <literallayout class='monospaced'> |
1230 | root@crownbay:~# perf script | 1230 | root@crownbay:~# perf script |
1231 | 1231 | ||
1232 | # ======== | 1232 | # ======== |
@@ -1265,27 +1265,28 @@ | |||
1265 | matchbox-deskto 1197 [001] 34237.112138: do_fork: (c1028460) | 1265 | matchbox-deskto 1197 [001] 34237.112138: do_fork: (c1028460) |
1266 | matchbox-deskto 1311 [001] 34237.114106: do_fork: (c1028460) | 1266 | matchbox-deskto 1311 [001] 34237.114106: do_fork: (c1028460) |
1267 | gaku 1312 [000] 34237.202388: do_fork: (c1028460) | 1267 | gaku 1312 [000] 34237.202388: do_fork: (c1028460) |
1268 | </literallayout> | 1268 | </literallayout> |
1269 | And using 'perf report' on the same file, we can see the | 1269 | And using 'perf report' on the same file, we can see the |
1270 | callgraphs from starting a few programs during those 30 seconds: | 1270 | callgraphs from starting a few programs during those 30 seconds: |
1271 | </para> | 1271 | </para> |
1272 | 1272 | ||
1273 | <para> | 1273 | <para> |
1274 | <imagedata fileref="figures/perf-probe-do_fork-profile.png" width="6in" depth="7in" align="center" scalefit="1" /> | 1274 | <imagedata fileref="figures/perf-probe-do_fork-profile.png" width="6in" depth="7in" align="center" scalefit="1" /> |
1275 | </para> | 1275 | </para> |
1276 | 1276 | ||
1277 | <note> | 1277 | <note> |
1278 | Tying It Together: The trace events subsystem accomodate static | 1278 | Tying It Together: The trace events subsystem accomodate static |
1279 | and dynamic tracepoints in exactly the same way - there's no | 1279 | and dynamic tracepoints in exactly the same way - there's no |
1280 | difference as far as the infrastructure is concerned. See the | 1280 | difference as far as the infrastructure is concerned. See the |
1281 | ftrace section for more details on the trace event subsystem. | 1281 | ftrace section for more details on the trace event subsystem. |
1282 | </note> | 1282 | </note> |
1283 | 1283 | ||
1284 | <note> | 1284 | <note> |
1285 | Tying It Together: Dynamic tracepoints are implemented under the | 1285 | Tying It Together: Dynamic tracepoints are implemented under the |
1286 | covers by kprobes and uprobes. kprobes and uprobes are also used | 1286 | covers by kprobes and uprobes. kprobes and uprobes are also used |
1287 | by and in fact are the main focus of SystemTap. | 1287 | by and in fact are the main focus of SystemTap. |
1288 | </note> | 1288 | </note> |
1289 | </section> | ||
1289 | </section> | 1290 | </section> |
1290 | 1291 | ||
1291 | <section id='perf-documentation'> | 1292 | <section id='perf-documentation'> |