diff options
| author | Scott Rifenbark <scott.m.rifenbark@intel.com> | 2013-01-16 12:49:45 -0800 |
|---|---|---|
| committer | Richard Purdie <richard.purdie@linuxfoundation.org> | 2013-01-27 13:56:02 +0000 |
| commit | 86478ff9486516ed428e470afbe76f3cd857888c (patch) | |
| tree | b883e2138166f3a9daf986931779a8f4058169bf /documentation/profile-manual | |
| parent | 487a76f7c7bfdb47ddec76659794834e84ed04eb (diff) | |
| download | poky-86478ff9486516ed428e470afbe76f3cd857888c.tar.gz | |
profile-manual: Updated indentation and sectioning
Had the sectioning off and it was preventing the manual from
making. I also added more indentation as needed throughout
most of the manual up to the error.
(From yocto-docs rev: 5de3544593a63e96b349babc177970d8605e0098)
Signed-off-by: Scott Rifenbark <scott.m.rifenbark@intel.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Diffstat (limited to 'documentation/profile-manual')
| -rw-r--r-- | documentation/profile-manual/profile-manual-usage.xml | 1257 |
1 files changed, 629 insertions, 628 deletions
diff --git a/documentation/profile-manual/profile-manual-usage.xml b/documentation/profile-manual/profile-manual-usage.xml index cfe916a795..51c483b700 100644 --- a/documentation/profile-manual/profile-manual-usage.xml +++ b/documentation/profile-manual/profile-manual-usage.xml | |||
| @@ -108,28 +108,28 @@ | |||
| 108 | top System profiling tool. | 108 | top System profiling tool. |
| 109 | 109 | ||
| 110 | See 'perf help COMMAND' for more information on a specific command. | 110 | See 'perf help COMMAND' for more information on a specific command. |
| 111 | </literallayout> | 111 | </literallayout> |
| 112 | </para> | 112 | </para> |
| 113 | 113 | ||
| 114 | <section id='using-perf-to-do-basic-profiling'> | 114 | <section id='using-perf-to-do-basic-profiling'> |
| 115 | <title>Using perf to do Basic Profiling</title> | 115 | <title>Using perf to do Basic Profiling</title> |
| 116 | 116 | ||
| 117 | <para> | 117 | <para> |
| 118 | As a simple test case, we'll profile the 'wget' of a fairly large | 118 | As a simple test case, we'll profile the 'wget' of a fairly large |
| 119 | file, which is a minimally interesting case because it has both | 119 | file, which is a minimally interesting case because it has both |
| 120 | file and network I/O aspects, and at least in the case of standard | 120 | file and network I/O aspects, and at least in the case of standard |
| 121 | Yocto images, it's implemented as part of busybox, so the methods | 121 | Yocto images, it's implemented as part of busybox, so the methods |
| 122 | we use to analyze it can be used in a very similar way to the whole | 122 | we use to analyze it can be used in a very similar way to the whole |
| 123 | host of supported busybox applets in Yocto. | 123 | host of supported busybox applets in Yocto. |
| 124 | <literallayout class='monospaced'> | 124 | <literallayout class='monospaced'> |
| 125 | root@crownbay:~# rm linux-2.6.19.2.tar.bz2; \ | 125 | root@crownbay:~# rm linux-2.6.19.2.tar.bz2; \ |
| 126 | wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 126 | wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
| 127 | </literallayout> | 127 | </literallayout> |
| 128 | The quickest and easiest way to get some basic overall data about | 128 | The quickest and easiest way to get some basic overall data about |
| 129 | what's going on for a particular workload it to profile it using | 129 | what's going on for a particular workload it to profile it using |
| 130 | 'perf stat'. 'perf stat' basically profiles using a few default | 130 | 'perf stat'. 'perf stat' basically profiles using a few default |
| 131 | counters and displays the summed counts at the end of the run: | 131 | counters and displays the summed counts at the end of the run: |
| 132 | <literallayout class='monospaced'> | 132 | <literallayout class='monospaced'> |
| 133 | root@crownbay:~# perf stat wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 133 | root@crownbay:~# perf stat wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
| 134 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) | 134 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) |
| 135 | linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA | 135 | linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA |
| @@ -148,19 +148,19 @@ | |||
| 148 | 19550329 branch-misses # 11.82% of all branches | 148 | 19550329 branch-misses # 11.82% of all branches |
| 149 | 149 | ||
| 150 | 59.836627620 seconds time elapsed | 150 | 59.836627620 seconds time elapsed |
| 151 | </literallayout> | 151 | </literallayout> |
| 152 | Many times such a simple-minded test doesn't yield much of | 152 | Many times such a simple-minded test doesn't yield much of |
| 153 | interest, but sometimes it does (see Real-world Yocto bug | 153 | interest, but sometimes it does (see Real-world Yocto bug |
| 154 | (slow loop-mounted write speed)). | 154 | (slow loop-mounted write speed)). |
| 155 | </para> | 155 | </para> |
| 156 | 156 | ||
| 157 | <para> | 157 | <para> |
| 158 | Also, note that 'perf stat' isn't restricted to a fixed set of | 158 | Also, note that 'perf stat' isn't restricted to a fixed set of |
| 159 | counters - basically any event listed in the output of 'perf list' | 159 | counters - basically any event listed in the output of 'perf list' |
| 160 | can be tallied by 'perf stat'. For example, suppose we wanted to | 160 | can be tallied by 'perf stat'. For example, suppose we wanted to |
| 161 | see a summary of all the events related to kernel memory | 161 | see a summary of all the events related to kernel memory |
| 162 | allocation/freeing along with cache hits and misses: | 162 | allocation/freeing along with cache hits and misses: |
| 163 | <literallayout class='monospaced'> | 163 | <literallayout class='monospaced'> |
| 164 | root@crownbay:~# perf stat -e kmem:* -e cache-references -e cache-misses wget http:// downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 164 | root@crownbay:~# perf stat -e kmem:* -e cache-references -e cache-misses wget http:// downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
| 165 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) | 165 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) |
| 166 | linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA | 166 | linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA |
| @@ -183,28 +183,28 @@ | |||
| 183 | 2917740 cache-misses # 4.365 % of all cache refs | 183 | 2917740 cache-misses # 4.365 % of all cache refs |
| 184 | 184 | ||
| 185 | 44.831023415 seconds time elapsed | 185 | 44.831023415 seconds time elapsed |
| 186 | </literallayout> | 186 | </literallayout> |
| 187 | So 'perf stat' gives us a nice easy way to get a quick overview of | 187 | So 'perf stat' gives us a nice easy way to get a quick overview of |
| 188 | what might be happening for a set of events, but normally we'd | 188 | what might be happening for a set of events, but normally we'd |
| 189 | need a little more detail in order to understand what's going on | 189 | need a little more detail in order to understand what's going on |
| 190 | in a way that we can act on in a useful way. | 190 | in a way that we can act on in a useful way. |
| 191 | </para> | 191 | </para> |
| 192 | 192 | ||
| 193 | <para> | 193 | <para> |
| 194 | To dive down into a next level of detail, we can use 'perf | 194 | To dive down into a next level of detail, we can use 'perf |
| 195 | record'/'perf report' which will collect profiling data and | 195 | record'/'perf report' which will collect profiling data and |
| 196 | present it to use using an interactive text-based UI (or | 196 | present it to use using an interactive text-based UI (or |
| 197 | simply as text if we specify --stdio to 'perf report'). | 197 | simply as text if we specify --stdio to 'perf report'). |
| 198 | </para> | 198 | </para> |
| 199 | 199 | ||
| 200 | <para> | 200 | <para> |
| 201 | As our first attempt at profiling this workload, we'll simply | 201 | As our first attempt at profiling this workload, we'll simply |
| 202 | run 'perf record', handing it the workload we want to profile | 202 | run 'perf record', handing it the workload we want to profile |
| 203 | (everything after 'perf record' and any perf options we hand | 203 | (everything after 'perf record' and any perf options we hand |
| 204 | it - here none - will be executedin a new shell). perf collects | 204 | it - here none - will be executedin a new shell). perf collects |
| 205 | samples until the process exits and records them in a file named | 205 | samples until the process exits and records them in a file named |
| 206 | 'perf.data' in the current working directory. | 206 | 'perf.data' in the current working directory. |
| 207 | <literallayout class='monospaced'> | 207 | <literallayout class='monospaced'> |
| 208 | root@crownbay:~# perf record wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 208 | root@crownbay:~# perf record wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
| 209 | 209 | ||
| 210 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) | 210 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) |
| @@ -215,44 +215,44 @@ | |||
| 215 | To see the results in a 'text-based UI' (tui), simply run | 215 | To see the results in a 'text-based UI' (tui), simply run |
| 216 | 'perf report', which will read the perf.data file in the current | 216 | 'perf report', which will read the perf.data file in the current |
| 217 | working directory and display the results in an interactive UI: | 217 | working directory and display the results in an interactive UI: |
| 218 | <literallayout class='monospaced'> | 218 | <literallayout class='monospaced'> |
| 219 | root@crownbay:~# perf report | 219 | root@crownbay:~# perf report |
| 220 | </literallayout> | 220 | </literallayout> |
| 221 | </para> | 221 | </para> |
| 222 | 222 | ||
| 223 | <para> | 223 | <para> |
| 224 | <imagedata fileref="figures/perf-wget-flat-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> | 224 | <imagedata fileref="figures/perf-wget-flat-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> |
| 225 | </para> | 225 | </para> |
| 226 | 226 | ||
| 227 | <para> | 227 | <para> |
| 228 | The above screenshot displays a 'flat' profile, one entry for | 228 | The above screenshot displays a 'flat' profile, one entry for |
| 229 | each 'bucket' corresponding to the functions that were profiled | 229 | each 'bucket' corresponding to the functions that were profiled |
| 230 | during the profiling run, ordered from the most popular to the | 230 | during the profiling run, ordered from the most popular to the |
| 231 | least (perf has options to sort in various orders and keys as | 231 | least (perf has options to sort in various orders and keys as |
| 232 | well as display entries only above a certain threshold and so | 232 | well as display entries only above a certain threshold and so |
| 233 | on - see the perf documentation for details). Note that this | 233 | on - see the perf documentation for details). Note that this |
| 234 | includes both userspace functions (entries containing a [.]) and | 234 | includes both userspace functions (entries containing a [.]) and |
| 235 | kernel functions accounted to the process (entries containing | 235 | kernel functions accounted to the process (entries containing |
| 236 | a [k]). (perf has command-line modifiers that can be used to | 236 | a [k]). (perf has command-line modifiers that can be used to |
| 237 | restrict the profiling to kernel or userspace, among others). | 237 | restrict the profiling to kernel or userspace, among others). |
| 238 | </para> | 238 | </para> |
| 239 | 239 | ||
| 240 | <para> | 240 | <para> |
| 241 | Notice also that the above report shows an entry for 'busybox', | 241 | Notice also that the above report shows an entry for 'busybox', |
| 242 | which is the executable that implements 'wget' in Yocto, but that | 242 | which is the executable that implements 'wget' in Yocto, but that |
| 243 | instead of a useful function name in that entry, it displays | 243 | instead of a useful function name in that entry, it displays |
| 244 | an not-so-friendly hex value instead. The steps below will show | 244 | an not-so-friendly hex value instead. The steps below will show |
| 245 | how to fix that problem. | 245 | how to fix that problem. |
| 246 | </para> | 246 | </para> |
| 247 | 247 | ||
| 248 | <para> | 248 | <para> |
| 249 | Before we do that, however, let's try running a different profile, | 249 | Before we do that, however, let's try running a different profile, |
| 250 | one which shows something a little more interesting. The only | 250 | one which shows something a little more interesting. The only |
| 251 | difference between the new profile and the previous one is that | 251 | difference between the new profile and the previous one is that |
| 252 | we'll add the -g option, which will record not just the address | 252 | we'll add the -g option, which will record not just the address |
| 253 | of a sampled function, but the entire callchain to the sampled | 253 | of a sampled function, but the entire callchain to the sampled |
| 254 | function as well: | 254 | function as well: |
| 255 | <literallayout class='monospaced'> | 255 | <literallayout class='monospaced'> |
| 256 | root@crownbay:~# perf record -g wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 256 | root@crownbay:~# perf record -g wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
| 257 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) | 257 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) |
| 258 | linux-2.6.19.2.tar.b 100% |************************************************| 41727k 0:00:00 ETA | 258 | linux-2.6.19.2.tar.b 100% |************************************************| 41727k 0:00:00 ETA |
| @@ -261,217 +261,217 @@ | |||
| 261 | 261 | ||
| 262 | 262 | ||
| 263 | root@crownbay:~# perf report | 263 | root@crownbay:~# perf report |
| 264 | </literallayout> | 264 | </literallayout> |
| 265 | </para> | 265 | </para> |
| 266 | 266 | ||
| 267 | <para> | 267 | <para> |
| 268 | <imagedata fileref="figures/perf-wget-g-copy-to-user-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> | 268 | <imagedata fileref="figures/perf-wget-g-copy-to-user-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> |
| 269 | </para> | 269 | </para> |
| 270 | 270 | ||
| 271 | <para> | 271 | <para> |
| 272 | Using the callgraph view, we can actually see not only which | 272 | Using the callgraph view, we can actually see not only which |
| 273 | functions took the most time, but we can also see a summary of | 273 | functions took the most time, but we can also see a summary of |
| 274 | how those functions were called and learn something about how the | 274 | how those functions were called and learn something about how the |
| 275 | program interacts with the kernel in the process. | 275 | program interacts with the kernel in the process. |
| 276 | </para> | 276 | </para> |
| 277 | 277 | ||
| 278 | <para> | 278 | <para> |
| 279 | Notice that each entry in the above screenshot now contains a '+' | 279 | Notice that each entry in the above screenshot now contains a '+' |
| 280 | on the left-hand side. This means that we can expand the entry and | 280 | on the left-hand side. This means that we can expand the entry and |
| 281 | drill down into the callchains that feed into that entry. | 281 | drill down into the callchains that feed into that entry. |
| 282 | Pressing 'enter' on any one of them will expand the callchain | 282 | Pressing 'enter' on any one of them will expand the callchain |
| 283 | (you can also press 'E' to expand them all at the same time or 'C' | 283 | (you can also press 'E' to expand them all at the same time or 'C' |
| 284 | to collapse them all). | 284 | to collapse them all). |
| 285 | </para> | 285 | </para> |
| 286 | 286 | ||
| 287 | <para> | 287 | <para> |
| 288 | In the screenshot above, we've toggled the __copy_to_user_ll() | 288 | In the screenshot above, we've toggled the __copy_to_user_ll() |
| 289 | entry and several subnodes all the way down. This lets us see | 289 | entry and several subnodes all the way down. This lets us see |
| 290 | which callchains contributed to the profiled __copy_to_user_ll() | 290 | which callchains contributed to the profiled __copy_to_user_ll() |
| 291 | function which contributed 1.77% to the total profile. | 291 | function which contributed 1.77% to the total profile. |
| 292 | </para> | 292 | </para> |
| 293 | 293 | ||
| 294 | <para> | 294 | <para> |
| 295 | As a bit of background explanation for these callchains, think | 295 | As a bit of background explanation for these callchains, think |
| 296 | about what happens at a high level when you run wget to get a file | 296 | about what happens at a high level when you run wget to get a file |
| 297 | out on the network. Basically what happens is that the data comes | 297 | out on the network. Basically what happens is that the data comes |
| 298 | into the kernel via the network connection (socket) and is passed | 298 | into the kernel via the network connection (socket) and is passed |
| 299 | to the userspace program 'wget' (which is actually a part of | 299 | to the userspace program 'wget' (which is actually a part of |
| 300 | busybox, but that's not important for now), which takes the buffers | 300 | busybox, but that's not important for now), which takes the buffers |
| 301 | the kernel passes to it and writes it to a disk file to save it. | 301 | the kernel passes to it and writes it to a disk file to save it. |
| 302 | </para> | 302 | </para> |
| 303 | 303 | ||
| 304 | <para> | 304 | <para> |
| 305 | The part of this process that we're looking at in the above call | 305 | The part of this process that we're looking at in the above call |
| 306 | stacks is the part where the kernel passes the data it's read from | 306 | stacks is the part where the kernel passes the data it's read from |
| 307 | the socket down to wget i.e. a copy-to-user. | 307 | the socket down to wget i.e. a copy-to-user. |
| 308 | </para> | 308 | </para> |
| 309 | 309 | ||
| 310 | <para> | 310 | <para> |
| 311 | Notice also that here there's also a case where the a hex value | 311 | Notice also that here there's also a case where the a hex value |
| 312 | is displayed in the callstack, here in the expanded | 312 | is displayed in the callstack, here in the expanded |
| 313 | sys_clock_gettime() function. Later we'll see it resolve to a | 313 | sys_clock_gettime() function. Later we'll see it resolve to a |
| 314 | userspace function call in busybox. | 314 | userspace function call in busybox. |
| 315 | </para> | 315 | </para> |
| 316 | 316 | ||
| 317 | <para> | 317 | <para> |
| 318 | <imagedata fileref="figures/perf-wget-g-copy-from-user-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> | 318 | <imagedata fileref="figures/perf-wget-g-copy-from-user-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> |
| 319 | </para> | 319 | </para> |
| 320 | 320 | ||
| 321 | <para> | 321 | <para> |
| 322 | The above screenshot shows the other half of the journey for the | 322 | The above screenshot shows the other half of the journey for the |
| 323 | data - from the wget program's userspace buffers to disk. To get | 323 | data - from the wget program's userspace buffers to disk. To get |
| 324 | the buffers to disk, the wget program issues a write(2), which | 324 | the buffers to disk, the wget program issues a write(2), which |
| 325 | does a copy-from-user to the kernel, which then takes care via | 325 | does a copy-from-user to the kernel, which then takes care via |
| 326 | some circuitous path (probably also present somewhere in the | 326 | some circuitous path (probably also present somewhere in the |
| 327 | profile data), to get it safely to disk. | 327 | profile data), to get it safely to disk. |
| 328 | </para> | 328 | </para> |
| 329 | 329 | ||
| 330 | <para> | 330 | <para> |
| 331 | Now that we've seen the basic layout of the profile data and the | 331 | Now that we've seen the basic layout of the profile data and the |
| 332 | basics of how to extract useful information out of it, let's get | 332 | basics of how to extract useful information out of it, let's get |
| 333 | back to the task at hand and see if we can get some basic idea | 333 | back to the task at hand and see if we can get some basic idea |
| 334 | about where the time is spent in the program we're profiling, | 334 | about where the time is spent in the program we're profiling, |
| 335 | wget. Remember that wget is actually implemented as an applet | 335 | wget. Remember that wget is actually implemented as an applet |
| 336 | in busybox, so while the process name is 'wget', the executable | 336 | in busybox, so while the process name is 'wget', the executable |
| 337 | we're actually interested in is busybox. So let's expand the | 337 | we're actually interested in is busybox. So let's expand the |
| 338 | first entry containing busybox: | 338 | first entry containing busybox: |
| 339 | </para> | 339 | </para> |
| 340 | 340 | ||
| 341 | <para> | 341 | <para> |
| 342 | <imagedata fileref="figures/perf-wget-busybox-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> | 342 | <imagedata fileref="figures/perf-wget-busybox-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> |
| 343 | </para> | 343 | </para> |
| 344 | 344 | ||
| 345 | <para> | 345 | <para> |
| 346 | Again, before we expanded we saw that the function was labeled | 346 | Again, before we expanded we saw that the function was labeled |
| 347 | with a hex value instead of a symbol as with most of the kernel | 347 | with a hex value instead of a symbol as with most of the kernel |
| 348 | entries. Expanding the busybox entry doesn't make it any better. | 348 | entries. Expanding the busybox entry doesn't make it any better. |
| 349 | </para> | 349 | </para> |
| 350 | 350 | ||
| 351 | <para> | 351 | <para> |
| 352 | The problem is that perf can't find the symbol information for the | 352 | The problem is that perf can't find the symbol information for the |
| 353 | busybox binary, which is actually stripped out by the Yocto build | 353 | busybox binary, which is actually stripped out by the Yocto build |
| 354 | system. | 354 | system. |
| 355 | </para> | 355 | </para> |
| 356 | 356 | ||
| 357 | <para> | 357 | <para> |
| 358 | One way around that is to put the following in your local.conf | 358 | One way around that is to put the following in your local.conf |
| 359 | when you build the image: | 359 | when you build the image: |
| 360 | <literallayout class='monospaced'> | 360 | <literallayout class='monospaced'> |
| 361 | INHIBIT_PACKAGE_STRIP = "1" | 361 | INHIBIT_PACKAGE_STRIP = "1" |
| 362 | </literallayout> | 362 | </literallayout> |
| 363 | However, we already have an image with the binaries stripped, | 363 | However, we already have an image with the binaries stripped, |
| 364 | so what can we do to get perf to resolve the symbols? Basically | 364 | so what can we do to get perf to resolve the symbols? Basically |
| 365 | we need to install the debuginfo for the busybox package. | 365 | we need to install the debuginfo for the busybox package. |
| 366 | </para> | 366 | </para> |
| 367 | 367 | ||
| 368 | <para> | 368 | <para> |
| 369 | To generate the debug info for the packages in the image, we can | 369 | To generate the debug info for the packages in the image, we can |
| 370 | to add dbg-pkgs to EXTRA_IMAGE_FEATURES in local.conf. For example: | 370 | to add dbg-pkgs to EXTRA_IMAGE_FEATURES in local.conf. For example: |
| 371 | <literallayout class='monospaced'> | 371 | <literallayout class='monospaced'> |
| 372 | EXTRA_IMAGE_FEATURES = "debug-tweaks tools-profile dbg-pkgs" | 372 | EXTRA_IMAGE_FEATURES = "debug-tweaks tools-profile dbg-pkgs" |
| 373 | </literallayout> | 373 | </literallayout> |
| 374 | Additionally, in order to generate the type of debuginfo that | 374 | Additionally, in order to generate the type of debuginfo that |
| 375 | perf understands, we also need to add the following to local.conf: | 375 | perf understands, we also need to add the following to local.conf: |
| 376 | <literallayout class='monospaced'> | 376 | <literallayout class='monospaced'> |
| 377 | PACKAGE_DEBUG_SPLIT_STYLE = 'debug-file-directory' | 377 | PACKAGE_DEBUG_SPLIT_STYLE = 'debug-file-directory' |
| 378 | </literallayout> | 378 | </literallayout> |
| 379 | Once we've done that, we can install the debuginfo for busybox. | 379 | Once we've done that, we can install the debuginfo for busybox. |
| 380 | The debug packages once built can be found in | 380 | The debug packages once built can be found in |
| 381 | build/tmp/deploy/rpm/* on the host system. Find the | 381 | build/tmp/deploy/rpm/* on the host system. Find the |
| 382 | busybox-dbg-...rpm file and copy it to the target. For example: | 382 | busybox-dbg-...rpm file and copy it to the target. For example: |
| 383 | <literallayout class='monospaced'> | 383 | <literallayout class='monospaced'> |
| 384 | [trz@empanada core2]$ scp /home/trz/yocto/crownbay-tracing-dbg/build/tmp/deploy/rpm/core2/busybox-dbg-1.20.2-r2.core2.rpm root@192.168.1.31: | 384 | [trz@empanada core2]$ scp /home/trz/yocto/crownbay-tracing-dbg/build/tmp/deploy/rpm/core2/busybox-dbg-1.20.2-r2.core2.rpm root@192.168.1.31: |
| 385 | root@192.168.1.31's password: | 385 | root@192.168.1.31's password: |
| 386 | busybox-dbg-1.20.2-r2.core2.rpm 100% 1826KB 1.8MB/s 00:01 | 386 | busybox-dbg-1.20.2-r2.core2.rpm 100% 1826KB 1.8MB/s 00:01 |
| 387 | </literallayout> | 387 | </literallayout> |
| 388 | Now install the debug rpm on the target: | 388 | Now install the debug rpm on the target: |
| 389 | <literallayout class='monospaced'> | 389 | <literallayout class='monospaced'> |
| 390 | root@crownbay:~# rpm -i busybox-dbg-1.20.2-r2.core2.rpm | 390 | root@crownbay:~# rpm -i busybox-dbg-1.20.2-r2.core2.rpm |
| 391 | </literallayout> | 391 | </literallayout> |
| 392 | Now that the debuginfo is installed, we see that the busybox | 392 | Now that the debuginfo is installed, we see that the busybox |
| 393 | entries now display their functions symbolically: | 393 | entries now display their functions symbolically: |
| 394 | </para> | 394 | </para> |
| 395 | 395 | ||
| 396 | <para> | 396 | <para> |
| 397 | <imagedata fileref="figures/perf-wget-busybox-debuginfo.png" width="6in" depth="7in" align="center" scalefit="1" /> | 397 | <imagedata fileref="figures/perf-wget-busybox-debuginfo.png" width="6in" depth="7in" align="center" scalefit="1" /> |
| 398 | </para> | 398 | </para> |
| 399 | 399 | ||
| 400 | <para> | 400 | <para> |
| 401 | If we expand one of the entries and press 'enter' on a leaf node, | 401 | If we expand one of the entries and press 'enter' on a leaf node, |
| 402 | we're presented with a menu of actions we can take to get more | 402 | we're presented with a menu of actions we can take to get more |
| 403 | information related to that entry: | 403 | information related to that entry: |
| 404 | </para> | 404 | </para> |
| 405 | 405 | ||
| 406 | <para> | 406 | <para> |
| 407 | <imagedata fileref="figures/perf-wget-busybox-dso-zoom-menu.png" width="6in" depth="7in" align="center" scalefit="1" /> | 407 | <imagedata fileref="figures/perf-wget-busybox-dso-zoom-menu.png" width="6in" depth="7in" align="center" scalefit="1" /> |
| 408 | </para> | 408 | </para> |
| 409 | 409 | ||
| 410 | <para> | 410 | <para> |
| 411 | One of these actions allows us to show a view that displays a | 411 | One of these actions allows us to show a view that displays a |
| 412 | busybox-centric view of the profiled functions (in this case we've | 412 | busybox-centric view of the profiled functions (in this case we've |
| 413 | also expanded all the nodes using the 'E' key): | 413 | also expanded all the nodes using the 'E' key): |
| 414 | </para> | 414 | </para> |
| 415 | 415 | ||
| 416 | <para> | 416 | <para> |
| 417 | <imagedata fileref="figures/perf-wget-busybox-dso-zoom.png" width="6in" depth="7in" align="center" scalefit="1" /> | 417 | <imagedata fileref="figures/perf-wget-busybox-dso-zoom.png" width="6in" depth="7in" align="center" scalefit="1" /> |
| 418 | </para> | 418 | </para> |
| 419 | 419 | ||
| 420 | <para> | 420 | <para> |
| 421 | Finally, we can see that now that the busybox debuginfo is | 421 | Finally, we can see that now that the busybox debuginfo is |
| 422 | installed, the previously unresolved symbol in the | 422 | installed, the previously unresolved symbol in the |
| 423 | sys_clock_gettime() entry mentioned previously is now resolved, | 423 | sys_clock_gettime() entry mentioned previously is now resolved, |
| 424 | and shows that the sys_clock_gettime system call that was the | 424 | and shows that the sys_clock_gettime system call that was the |
| 425 | source of 6.75% of the copy-to-user overhead was initiated by | 425 | source of 6.75% of the copy-to-user overhead was initiated by |
| 426 | the handle_input() busybox function: | 426 | the handle_input() busybox function: |
| 427 | </para> | 427 | </para> |
| 428 | 428 | ||
| 429 | <para> | 429 | <para> |
| 430 | <imagedata fileref="figures/perf-wget-g-copy-to-user-expanded-debuginfo.png" width="6in" depth="7in" align="center" scalefit="1" /> | 430 | <imagedata fileref="figures/perf-wget-g-copy-to-user-expanded-debuginfo.png" width="6in" depth="7in" align="center" scalefit="1" /> |
| 431 | </para> | 431 | </para> |
| 432 | 432 | ||
| 433 | <para> | 433 | <para> |
| 434 | At the lowest level of detail, we can dive down to the assembly | 434 | At the lowest level of detail, we can dive down to the assembly |
| 435 | level and see which instructions caused the most overhead in a | 435 | level and see which instructions caused the most overhead in a |
| 436 | function. Pressing 'enter' on the 'udhcpc_main' function, we're | 436 | function. Pressing 'enter' on the 'udhcpc_main' function, we're |
| 437 | again presented with a menu: | 437 | again presented with a menu: |
| 438 | </para> | 438 | </para> |
| 439 | 439 | ||
| 440 | <para> | 440 | <para> |
| 441 | <imagedata fileref="figures/perf-wget-busybox-annotate-menu.png" width="6in" depth="7in" align="center" scalefit="1" /> | 441 | <imagedata fileref="figures/perf-wget-busybox-annotate-menu.png" width="6in" depth="7in" align="center" scalefit="1" /> |
| 442 | </para> | 442 | </para> |
| 443 | 443 | ||
| 444 | <para> | 444 | <para> |
| 445 | Selecting 'Annotate udhcpc_main', we get a detailed listing of | 445 | Selecting 'Annotate udhcpc_main', we get a detailed listing of |
| 446 | percentages by instruction for the udhcpc_main function. From the | 446 | percentages by instruction for the udhcpc_main function. From the |
| 447 | display, we can see that over 50% of the time spent in this | 447 | display, we can see that over 50% of the time spent in this |
| 448 | function is taken up by a couple tests and the move of a | 448 | function is taken up by a couple tests and the move of a |
| 449 | constant (1) to a register: | 449 | constant (1) to a register: |
| 450 | </para> | 450 | </para> |
| 451 | 451 | ||
| 452 | <para> | 452 | <para> |
| 453 | <imagedata fileref="figures/perf-wget-busybox-annotate-udhcpc.png" width="6in" depth="7in" align="center" scalefit="1" /> | 453 | <imagedata fileref="figures/perf-wget-busybox-annotate-udhcpc.png" width="6in" depth="7in" align="center" scalefit="1" /> |
| 454 | </para> | 454 | </para> |
| 455 | 455 | ||
| 456 | <para> | 456 | <para> |
| 457 | As a segue into tracing, let's try another profile using a | 457 | As a segue into tracing, let's try another profile using a |
| 458 | different counter, something other than the default 'cycles'. | 458 | different counter, something other than the default 'cycles'. |
| 459 | </para> | 459 | </para> |
| 460 | 460 | ||
| 461 | <para> | 461 | <para> |
| 462 | The tracing and profiling infrastructure in Linux has become | 462 | The tracing and profiling infrastructure in Linux has become |
| 463 | unified in a way that allows us to use the same tool with a | 463 | unified in a way that allows us to use the same tool with a |
| 464 | completely different set of counters, not just the standard | 464 | completely different set of counters, not just the standard |
| 465 | hardware counters that traditionally tools have had to restrict | 465 | hardware counters that traditionally tools have had to restrict |
| 466 | themselves to (of course the traditional tools can also make use | 466 | themselves to (of course the traditional tools can also make use |
| 467 | of the expanded possibilities now available to them, and in some | 467 | of the expanded possibilities now available to them, and in some |
| 468 | cases have, as mentioned previously). | 468 | cases have, as mentioned previously). |
| 469 | </para> | 469 | </para> |
| 470 | 470 | ||
| 471 | <para> | 471 | <para> |
| 472 | We can get a list of the available events that can be used to | 472 | We can get a list of the available events that can be used to |
| 473 | profile a workload via 'perf list': | 473 | profile a workload via 'perf list': |
| 474 | <literallayout class='monospaced'> | 474 | <literallayout class='monospaced'> |
| 475 | root@crownbay:~# perf list | 475 | root@crownbay:~# perf list |
| 476 | 476 | ||
| 477 | List of pre-defined events (to be used in -e): | 477 | List of pre-defined events (to be used in -e): |
| @@ -602,23 +602,23 @@ | |||
| 602 | syscalls:sys_exit_unshare [Tracepoint event] | 602 | syscalls:sys_exit_unshare [Tracepoint event] |
| 603 | raw_syscalls:sys_enter [Tracepoint event] | 603 | raw_syscalls:sys_enter [Tracepoint event] |
| 604 | raw_syscalls:sys_exit [Tracepoint event] | 604 | raw_syscalls:sys_exit [Tracepoint event] |
| 605 | </literallayout> | 605 | </literallayout> |
| 606 | </para> | 606 | </para> |
| 607 | 607 | ||
| 608 | <note> | 608 | <note> |
| 609 | Tying It Together: These are exactly the same set of events defined | 609 | Tying It Together: These are exactly the same set of events defined |
| 610 | by the trace event subsystem and exposed by | 610 | by the trace event subsystem and exposed by |
| 611 | ftrace/tracecmd/kernelshark as files in | 611 | ftrace/tracecmd/kernelshark as files in |
| 612 | /sys/kernel/debug/tracing/events, by SystemTap as | 612 | /sys/kernel/debug/tracing/events, by SystemTap as |
| 613 | kernel.trace("tracepoint_name") and (partially) accessed by LTTng. | 613 | kernel.trace("tracepoint_name") and (partially) accessed by LTTng. |
| 614 | </note> | 614 | </note> |
| 615 | 615 | ||
| 616 | <para> | 616 | <para> |
| 617 | Only a subset of these would be of interest to us when looking at | 617 | Only a subset of these would be of interest to us when looking at |
| 618 | this workload, so let's choose the most likely subsystems | 618 | this workload, so let's choose the most likely subsystems |
| 619 | (identified by the string before the colon in the Tracepoint events) | 619 | (identified by the string before the colon in the Tracepoint events) |
| 620 | and do a 'perf stat' run using only those wildcarded subsystems: | 620 | and do a 'perf stat' run using only those wildcarded subsystems: |
| 621 | <literallayout class='monospaced'> | 621 | <literallayout class='monospaced'> |
| 622 | root@crownbay:~# perf stat -e skb:* -e net:* -e napi:* -e sched:* -e workqueue:* -e irq:* -e syscalls:* wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 622 | root@crownbay:~# perf stat -e skb:* -e net:* -e napi:* -e sched:* -e workqueue:* -e irq:* -e syscalls:* wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
| 623 | Performance counter stats for 'wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2': | 623 | Performance counter stats for 'wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2': |
| 624 | 624 | ||
| @@ -675,26 +675,26 @@ | |||
| 675 | . | 675 | . |
| 676 | 676 | ||
| 677 | 58.029710972 seconds time elapsed | 677 | 58.029710972 seconds time elapsed |
| 678 | </literallayout> | 678 | </literallayout> |
| 679 | Let's pick one of these tracepoints and tell perf to do a profile | 679 | Let's pick one of these tracepoints and tell perf to do a profile |
| 680 | using it as the sampling event: | 680 | using it as the sampling event: |
| 681 | <literallayout class='monospaced'> | 681 | <literallayout class='monospaced'> |
| 682 | root@crownbay:~# perf record -g -e sched:sched_wakeup wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 682 | root@crownbay:~# perf record -g -e sched:sched_wakeup wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
| 683 | </literallayout> | 683 | </literallayout> |
| 684 | </para> | 684 | </para> |
| 685 | 685 | ||
| 686 | <para> | 686 | <para> |
| 687 | <imagedata fileref="figures/sched-wakeup-profile.png" width="6in" depth="7in" align="center" scalefit="1" /> | 687 | <imagedata fileref="figures/sched-wakeup-profile.png" width="6in" depth="7in" align="center" scalefit="1" /> |
| 688 | </para> | 688 | </para> |
| 689 | 689 | ||
| 690 | <para> | 690 | <para> |
| 691 | The screenshot above shows the results of running a profile using | 691 | The screenshot above shows the results of running a profile using |
| 692 | sched:sched_switch tracepoint, which shows the relative costs of | 692 | sched:sched_switch tracepoint, which shows the relative costs of |
| 693 | various paths to sched_wakeup (note that sched_wakeup is the | 693 | various paths to sched_wakeup (note that sched_wakeup is the |
| 694 | name of the tracepoint - it's actually defined just inside | 694 | name of the tracepoint - it's actually defined just inside |
| 695 | ttwu_do_wakeup(), which accounts for the function name actually | 695 | ttwu_do_wakeup(), which accounts for the function name actually |
| 696 | displayed in the profile: | 696 | displayed in the profile: |
| 697 | <literallayout class='monospaced'> | 697 | <literallayout class='monospaced'> |
| 698 | /* | 698 | /* |
| 699 | * Mark the task runnable and perform wakeup-preemption. | 699 | * Mark the task runnable and perform wakeup-preemption. |
| 700 | */ | 700 | */ |
| @@ -706,55 +706,55 @@ | |||
| 706 | . | 706 | . |
| 707 | . | 707 | . |
| 708 | } | 708 | } |
| 709 | </literallayout> | 709 | </literallayout> |
| 710 | A couple of the more interesting callchains are expanded and | 710 | A couple of the more interesting callchains are expanded and |
| 711 | displayed above, basically some network receive paths that | 711 | displayed above, basically some network receive paths that |
| 712 | presumably end up waking up wget (busybox) when network data is | 712 | presumably end up waking up wget (busybox) when network data is |
| 713 | ready. | 713 | ready. |
| 714 | </para> | 714 | </para> |
| 715 | 715 | ||
| 716 | <para> | 716 | <para> |
| 717 | Note that because tracepoints are normally used for tracing, | 717 | Note that because tracepoints are normally used for tracing, |
| 718 | the default sampling period for tracepoints is 1 i.e. for | 718 | the default sampling period for tracepoints is 1 i.e. for |
| 719 | tracepoints perf will sample on every event occurrence (this | 719 | tracepoints perf will sample on every event occurrence (this |
| 720 | can be changed using the -c option). This is in contrast to | 720 | can be changed using the -c option). This is in contrast to |
| 721 | hardware counters such as for example the default 'cycles' | 721 | hardware counters such as for example the default 'cycles' |
| 722 | hardware counter used for normal profiling, where sampling | 722 | hardware counter used for normal profiling, where sampling |
| 723 | periods are much higher (in the thousands) because profiling should | 723 | periods are much higher (in the thousands) because profiling should |
| 724 | have as low an overhead as possible and sampling on every cycle w | 724 | have as low an overhead as possible and sampling on every cycle |
| 725 | ould be prohibitively expensive. | 725 | would be prohibitively expensive. |
| 726 | </para> | 726 | </para> |
| 727 | </section> | 727 | </section> |
| 728 | 728 | ||
| 729 | <section id='using-perf-to-do-basic-tracing'> | 729 | <section id='using-perf-to-do-basic-tracing'> |
| 730 | <title>Using perf to do Basic Tracing</title> | 730 | <title>Using perf to do Basic Tracing</title> |
| 731 | 731 | ||
| 732 | <para> | 732 | <para> |
| 733 | Profiling is a great tool for solving many problems or for | 733 | Profiling is a great tool for solving many problems or for |
| 734 | getting a high-level view of what's going on with a workload or | 734 | getting a high-level view of what's going on with a workload or |
| 735 | across the system. It is however by definition an approximation, | 735 | across the system. It is however by definition an approximation, |
| 736 | as suggested by the most prominent word associated with it, | 736 | as suggested by the most prominent word associated with it, |
| 737 | 'sampling'. On the one hand, it allows a representative picture of | 737 | 'sampling'. On the one hand, it allows a representative picture of |
| 738 | what's going on in the system to be cheaply taken, but on the other | 738 | what's going on in the system to be cheaply taken, but on the other |
| 739 | hand, that cheapness limits its utility when that data suggests a | 739 | hand, that cheapness limits its utility when that data suggests a |
| 740 | need to 'dive down' more deeply to discover what's really going | 740 | need to 'dive down' more deeply to discover what's really going |
| 741 | on. In such cases, the only way to see what's really going on is | 741 | on. In such cases, the only way to see what's really going on is |
| 742 | to be able to look at (or summarize more intelligently) the | 742 | to be able to look at (or summarize more intelligently) the |
| 743 | individual steps that go into the higher-level behavior exposed | 743 | individual steps that go into the higher-level behavior exposed |
| 744 | by the coarse-grained profiling data. | 744 | by the coarse-grained profiling data. |
| 745 | </para> | 745 | </para> |
| 746 | 746 | ||
| 747 | <para> | 747 | <para> |
| 748 | As a concrete example, we can trace all the events we think might | 748 | As a concrete example, we can trace all the events we think might |
| 749 | be applicable to our workload: | 749 | be applicable to our workload: |
| 750 | <literallayout class='monospaced'> | 750 | <literallayout class='monospaced'> |
| 751 | root@crownbay:~# perf record -g -e skb:* -e net:* -e napi:* -e sched:sched_switch -e sched:sched_wakeup -e irq:* | 751 | root@crownbay:~# perf record -g -e skb:* -e net:* -e napi:* -e sched:sched_switch -e sched:sched_wakeup -e irq:* |
| 752 | -e syscalls:sys_enter_read -e syscalls:sys_exit_read -e syscalls:sys_enter_write -e syscalls:sys_exit_write | 752 | -e syscalls:sys_enter_read -e syscalls:sys_exit_read -e syscalls:sys_enter_write -e syscalls:sys_exit_write |
| 753 | wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 753 | wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
| 754 | </literallayout> | 754 | </literallayout> |
| 755 | We can look at the raw trace output using 'perf script' with no | 755 | We can look at the raw trace output using 'perf script' with no |
| 756 | arguments: | 756 | arguments: |
| 757 | <literallayout class='monospaced'> | 757 | <literallayout class='monospaced'> |
| 758 | root@crownbay:~# perf script | 758 | root@crownbay:~# perf script |
| 759 | 759 | ||
| 760 | perf 1262 [000] 11624.857082: sys_exit_read: 0x0 | 760 | perf 1262 [000] 11624.857082: sys_exit_read: 0x0 |
| @@ -779,90 +779,90 @@ | |||
| 779 | wget 1262 [001] 11624.859888: sys_exit_read: 0x400 | 779 | wget 1262 [001] 11624.859888: sys_exit_read: 0x400 |
| 780 | wget 1262 [001] 11624.859935: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400 | 780 | wget 1262 [001] 11624.859935: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400 |
| 781 | wget 1262 [001] 11624.859944: sys_exit_read: 0x400 | 781 | wget 1262 [001] 11624.859944: sys_exit_read: 0x400 |
| 782 | </literallayout> | 782 | </literallayout> |
| 783 | This gives us a detailed timestamped sequence of events that | 783 | This gives us a detailed timestamped sequence of events that |
| 784 | occurred within the workload with respect to those events. | 784 | occurred within the workload with respect to those events. |
| 785 | </para> | 785 | </para> |
| 786 | 786 | ||
| 787 | <para> | 787 | <para> |
| 788 | In many ways, profiling can be viewed as a subset of tracing - | 788 | In many ways, profiling can be viewed as a subset of tracing - |
| 789 | theoretically, if you have a set of trace events that's sufficient | 789 | theoretically, if you have a set of trace events that's sufficient |
| 790 | to capture all the important aspects of a workload, you can derive | 790 | to capture all the important aspects of a workload, you can derive |
| 791 | any of the results or views that a profiling run can. | 791 | any of the results or views that a profiling run can. |
| 792 | </para> | 792 | </para> |
| 793 | 793 | ||
| 794 | <para> | 794 | <para> |
| 795 | Another aspect of traditional profiling is that while powerful in | 795 | Another aspect of traditional profiling is that while powerful in |
| 796 | many ways, it's limited by the granularity of the underlying data. | 796 | many ways, it's limited by the granularity of the underlying data. |
| 797 | Profiling tools offer various ways of sorting and presenting the | 797 | Profiling tools offer various ways of sorting and presenting the |
| 798 | sample data, which make it much more useful and amenable to user | 798 | sample data, which make it much more useful and amenable to user |
| 799 | experimentation, but in the end it can't be used in an open-ended | 799 | experimentation, but in the end it can't be used in an open-ended |
| 800 | way to extract data that just isn't present as a consequence of | 800 | way to extract data that just isn't present as a consequence of |
| 801 | the fact that conceptually, most of it has been thrown away. | 801 | the fact that conceptually, most of it has been thrown away. |
| 802 | </para> | 802 | </para> |
| 803 | 803 | ||
| 804 | <para> | 804 | <para> |
| 805 | Full-blown detailed tracing data does however offer the opportunity | 805 | Full-blown detailed tracing data does however offer the opportunity |
| 806 | to manipulate and present the information collected during a | 806 | to manipulate and present the information collected during a |
| 807 | tracing run in an infinite variety of ways. | 807 | tracing run in an infinite variety of ways. |
| 808 | </para> | 808 | </para> |
| 809 | 809 | ||
| 810 | <para> | 810 | <para> |
| 811 | Another way to look at it is that there are only so many ways that | 811 | Another way to look at it is that there are only so many ways that |
| 812 | the 'primitive' counters can be used on their own to generate | 812 | the 'primitive' counters can be used on their own to generate |
| 813 | interesting output; to get anything more complicated than simple | 813 | interesting output; to get anything more complicated than simple |
| 814 | counts requires some amount of additional logic, which is typically | 814 | counts requires some amount of additional logic, which is typically |
| 815 | very specific to the problem at hand. For example, if we wanted to | 815 | very specific to the problem at hand. For example, if we wanted to |
| 816 | make use of a 'counter' that maps to the value of the time | 816 | make use of a 'counter' that maps to the value of the time |
| 817 | difference between when a process was scheduled to run on a | 817 | difference between when a process was scheduled to run on a |
| 818 | processor and the time it actually ran, we wouldn't expect such | 818 | processor and the time it actually ran, we wouldn't expect such |
| 819 | a counter to exist on its own, but we could derive one called say | 819 | a counter to exist on its own, but we could derive one called say |
| 820 | 'wakeup_latency' and use it to extract a useful view of that metric | 820 | 'wakeup_latency' and use it to extract a useful view of that metric |
| 821 | from trace data. Likewise, we really can't figure out from standard | 821 | from trace data. Likewise, we really can't figure out from standard |
| 822 | profiling tools how much data every process on the system reads and | 822 | profiling tools how much data every process on the system reads and |
| 823 | writes, along with how many of those reads and writes fail | 823 | writes, along with how many of those reads and writes fail |
| 824 | completely. If we have sufficient trace data, however, we could | 824 | completely. If we have sufficient trace data, however, we could |
| 825 | with the right tools easily extract and present that information, | 825 | with the right tools easily extract and present that information, |
| 826 | but we'd need something other than pre-canned profiling tools to | 826 | but we'd need something other than pre-canned profiling tools to |
| 827 | do that. | 827 | do that. |
| 828 | </para> | 828 | </para> |
| 829 | 829 | ||
| 830 | <para> | 830 | <para> |
| 831 | Luckily, there is general-purpose way to handle such needs, | 831 | Luckily, there is general-purpose way to handle such needs, |
| 832 | called 'programming languages'. Making programming languages | 832 | called 'programming languages'. Making programming languages |
| 833 | easily available to apply to such problems given the specific | 833 | easily available to apply to such problems given the specific |
| 834 | format of data is called a 'programming language binding' for | 834 | format of data is called a 'programming language binding' for |
| 835 | that data and language. Perf supports two programming language | 835 | that data and language. Perf supports two programming language |
| 836 | bindings, one for Python and one for Perl. | 836 | bindings, one for Python and one for Perl. |
| 837 | </para> | 837 | </para> |
| 838 | 838 | ||
| 839 | <note> | 839 | <note> |
| 840 | Tying It Together: Language bindings for manipulating and | 840 | Tying It Together: Language bindings for manipulating and |
| 841 | aggregating trace data are of course not a new | 841 | aggregating trace data are of course not a new |
| 842 | idea. One of the first projects to do this was IBM's DProbes | 842 | idea. One of the first projects to do this was IBM's DProbes |
| 843 | dpcc compiler, an ANSI C compiler which targeted a low-level | 843 | dpcc compiler, an ANSI C compiler which targeted a low-level |
| 844 | assembly language running on an in-kernel interpreter on the | 844 | assembly language running on an in-kernel interpreter on the |
| 845 | target system. This is exactly analagous to what Sun's DTrace | 845 | target system. This is exactly analagous to what Sun's DTrace |
| 846 | did, except that DTrace invented its own language for the purpose. | 846 | did, except that DTrace invented its own language for the purpose. |
| 847 | Systemtap, heavily inspired by DTrace, also created its own | 847 | Systemtap, heavily inspired by DTrace, also created its own |
| 848 | one-off language, but rather than running the product on an | 848 | one-off language, but rather than running the product on an |
| 849 | in-kernel interpreter, created an elaborate compiler-based | 849 | in-kernel interpreter, created an elaborate compiler-based |
| 850 | machinery to translate its language into kernel modules written | 850 | machinery to translate its language into kernel modules written |
| 851 | in C. | 851 | in C. |
| 852 | </note> | 852 | </note> |
| 853 | 853 | ||
| 854 | <para> | 854 | <para> |
| 855 | Now that we have the trace data in perf.data, we can use | 855 | Now that we have the trace data in perf.data, we can use |
| 856 | 'perf script -g' to generate a skeleton script with handlers | 856 | 'perf script -g' to generate a skeleton script with handlers |
| 857 | for the read/write entry/exit events we recorded: | 857 | for the read/write entry/exit events we recorded: |
| 858 | <literallayout class='monospaced'> | 858 | <literallayout class='monospaced'> |
| 859 | root@crownbay:~# perf script -g python | 859 | root@crownbay:~# perf script -g python |
| 860 | generated Python script: perf-script.py | 860 | generated Python script: perf-script.py |
| 861 | </literallayout> | 861 | </literallayout> |
| 862 | The skeleton script simply creates a python function for each | 862 | The skeleton script simply creates a python function for each |
| 863 | event type in the perf.data file. The body of each function simply | 863 | event type in the perf.data file. The body of each function simply |
| 864 | prints the event name along with its parameters. For example: | 864 | prints the event name along with its parameters. For example: |
| 865 | <literallayout class='monospaced'> | 865 | <literallayout class='monospaced'> |
| 866 | def net__netif_rx(event_name, context, common_cpu, | 866 | def net__netif_rx(event_name, context, common_cpu, |
| 867 | common_secs, common_nsecs, common_pid, common_comm, | 867 | common_secs, common_nsecs, common_pid, common_comm, |
| 868 | skbaddr, len, name): | 868 | skbaddr, len, name): |
| @@ -870,10 +870,10 @@ | |||
| 870 | common_pid, common_comm) | 870 | common_pid, common_comm) |
| 871 | 871 | ||
| 872 | print "skbaddr=%u, len=%u, name=%s\n" % (skbaddr, len, name), | 872 | print "skbaddr=%u, len=%u, name=%s\n" % (skbaddr, len, name), |
| 873 | </literallayout> | 873 | </literallayout> |
| 874 | We can run that script directly to print all of the events | 874 | We can run that script directly to print all of the events |
| 875 | contained in the perf.data file: | 875 | contained in the perf.data file: |
| 876 | <literallayout class='monospaced'> | 876 | <literallayout class='monospaced'> |
| 877 | root@crownbay:~# perf script -s perf-script.py | 877 | root@crownbay:~# perf script -s perf-script.py |
| 878 | 878 | ||
| 879 | in trace_begin | 879 | in trace_begin |
| @@ -899,39 +899,39 @@ | |||
| 899 | syscalls__sys_exit_read 1 11624.859888770 1262 wget nr=3, ret=1024 | 899 | syscalls__sys_exit_read 1 11624.859888770 1262 wget nr=3, ret=1024 |
| 900 | syscalls__sys_enter_read 1 11624.859935140 1262 wget nr=3, fd=3, buf=3077701632, count=1024 | 900 | syscalls__sys_enter_read 1 11624.859935140 1262 wget nr=3, fd=3, buf=3077701632, count=1024 |
| 901 | syscalls__sys_exit_read 1 11624.859944032 1262 wget nr=3, ret=1024 | 901 | syscalls__sys_exit_read 1 11624.859944032 1262 wget nr=3, ret=1024 |
| 902 | </literallayout> | 902 | </literallayout> |
| 903 | That in itself isn't very useful; after all, we can accomplish | 903 | That in itself isn't very useful; after all, we can accomplish |
| 904 | pretty much the same thing by simply running 'perf script' | 904 | pretty much the same thing by simply running 'perf script' |
| 905 | without arguments in the same directory as the perf.data file. | 905 | without arguments in the same directory as the perf.data file. |
| 906 | </para> | 906 | </para> |
| 907 | 907 | ||
| 908 | <para> | 908 | <para> |
| 909 | We can however replace the print statements in the generated | 909 | We can however replace the print statements in the generated |
| 910 | function bodies with whatever we want, and thereby make it | 910 | function bodies with whatever we want, and thereby make it |
| 911 | infinitely more useful. | 911 | infinitely more useful. |
| 912 | </para> | 912 | </para> |
| 913 | 913 | ||
| 914 | <para> | 914 | <para> |
| 915 | As a simple example, let's just replace the print statements in | 915 | As a simple example, let's just replace the print statements in |
| 916 | the function bodies with a simple function that does nothing but | 916 | the function bodies with a simple function that does nothing but |
| 917 | increment a per-event count. When the program is run against a | 917 | increment a per-event count. When the program is run against a |
| 918 | perf.data file, each time a particular event is encountered, | 918 | perf.data file, each time a particular event is encountered, |
| 919 | a tally is incremented for that event. For example: | 919 | a tally is incremented for that event. For example: |
| 920 | <literallayout class='monospaced'> | 920 | <literallayout class='monospaced'> |
| 921 | def net__netif_rx(event_name, context, common_cpu, | 921 | def net__netif_rx(event_name, context, common_cpu, |
| 922 | common_secs, common_nsecs, common_pid, common_comm, | 922 | common_secs, common_nsecs, common_pid, common_comm, |
| 923 | skbaddr, len, name): | 923 | skbaddr, len, name): |
| 924 | inc_counts(event_name) | 924 | inc_counts(event_name) |
| 925 | </literallayout> | 925 | </literallayout> |
| 926 | Each event handler function in the generated code is modified | 926 | Each event handler function in the generated code is modified |
| 927 | to do this. For convenience, we define a common function called | 927 | to do this. For convenience, we define a common function called |
| 928 | inc_counts() that each handler calls; inc_counts simply tallies | 928 | inc_counts() that each handler calls; inc_counts simply tallies |
| 929 | a count for each event using the 'counts' hash, which is a | 929 | a count for each event using the 'counts' hash, which is a |
| 930 | specialized has function that does Perl-like autovivification, a | 930 | specialized has function that does Perl-like autovivification, a |
| 931 | capability that's extremely useful for kinds of multi-level | 931 | capability that's extremely useful for kinds of multi-level |
| 932 | aggregation commonly used in processing traces (see perf's | 932 | aggregation commonly used in processing traces (see perf's |
| 933 | documentation on the Python language binding for details): | 933 | documentation on the Python language binding for details): |
| 934 | <literallayout class='monospaced'> | 934 | <literallayout class='monospaced'> |
| 935 | counts = autodict() | 935 | counts = autodict() |
| 936 | 936 | ||
| 937 | def inc_counts(event_name): | 937 | def inc_counts(event_name): |
| @@ -939,18 +939,18 @@ | |||
| 939 | counts[event_name] += 1 | 939 | counts[event_name] += 1 |
| 940 | except TypeError: | 940 | except TypeError: |
| 941 | counts[event_name] = 1 | 941 | counts[event_name] = 1 |
| 942 | </literallayout> | 942 | </literallayout> |
| 943 | Finally, at the end of the trace processing run, we want to | 943 | Finally, at the end of the trace processing run, we want to |
| 944 | print the result of all the per-event tallies. For that, we | 944 | print the result of all the per-event tallies. For that, we |
| 945 | use the special 'trace_end()' function: | 945 | use the special 'trace_end()' function: |
| 946 | <literallayout class='monospaced'> | 946 | <literallayout class='monospaced'> |
| 947 | def trace_end(): | 947 | def trace_end(): |
| 948 | for event_name, count in counts.iteritems(): | 948 | for event_name, count in counts.iteritems(): |
| 949 | print "%-40s %10s\n" % (event_name, count) | 949 | print "%-40s %10s\n" % (event_name, count) |
| 950 | </literallayout> | 950 | </literallayout> |
| 951 | The end result is a summary of all the events recorded in the | 951 | The end result is a summary of all the events recorded in the |
| 952 | trace: | 952 | trace: |
| 953 | <literallayout> | 953 | <literallayout> |
| 954 | skb__skb_copy_datagram_iovec 13148 | 954 | skb__skb_copy_datagram_iovec 13148 |
| 955 | irq__softirq_entry 4796 | 955 | irq__softirq_entry 4796 |
| 956 | irq__irq_handler_exit 3805 | 956 | irq__irq_handler_exit 3805 |
| @@ -968,112 +968,112 @@ | |||
| 968 | net__netif_rx 2 | 968 | net__netif_rx 2 |
| 969 | napi__napi_poll 1877 | 969 | napi__napi_poll 1877 |
| 970 | syscalls__sys_exit_write 8990 | 970 | syscalls__sys_exit_write 8990 |
| 971 | </literallayout> | 971 | </literallayout> |
| 972 | Note that this is pretty much exactly the same information we get | 972 | Note that this is pretty much exactly the same information we get |
| 973 | from 'perf stat', which goes a little way to support the idea | 973 | from 'perf stat', which goes a little way to support the idea |
| 974 | mentioned previously that given the right kind of trace data, | 974 | mentioned previously that given the right kind of trace data, |
| 975 | higher-level profiling-type summaries can be derived from it. | 975 | higher-level profiling-type summaries can be derived from it. |
| 976 | </para> | 976 | </para> |
| 977 | 977 | ||
| 978 | <para> | 978 | <para> |
| 979 | Documentation on using the | 979 | Documentation on using the |
| 980 | <ulink url='http://linux.die.net/man/1/perf-script-python'>'perf script' python binding</ulink>. | 980 | <ulink url='http://linux.die.net/man/1/perf-script-python'>'perf script' python binding</ulink>. |
| 981 | </para> | 981 | </para> |
| 982 | </section> | 982 | </section> |
| 983 | 983 | ||
| 984 | <section id='system-wide-tracing-and-profiling'> | 984 | <section id='system-wide-tracing-and-profiling'> |
| 985 | <title>System-Wide Tracing and Profiling</title> | 985 | <title>System-Wide Tracing and Profiling</title> |
| 986 | 986 | ||
| 987 | <para> | 987 | <para> |
| 988 | The examples so far have focused on tracing a particular program or | 988 | The examples so far have focused on tracing a particular program or |
| 989 | workload - in other words, every profiling run has specified the | 989 | workload - in other words, every profiling run has specified the |
| 990 | program to profile in the command-line e.g. 'perf record wget ...'. | 990 | program to profile in the command-line e.g. 'perf record wget ...'. |
| 991 | </para> | 991 | </para> |
| 992 | 992 | ||
| 993 | <para> | 993 | <para> |
| 994 | It's also possible, and more interesting in many cases, to run a | 994 | It's also possible, and more interesting in many cases, to run a |
| 995 | system-wide profile or trace while running the workload in a | 995 | system-wide profile or trace while running the workload in a |
| 996 | separate shell. | 996 | separate shell. |
| 997 | </para> | 997 | </para> |
| 998 | 998 | ||
| 999 | <para> | 999 | <para> |
| 1000 | To do system-wide profiling or tracing, you typically use | 1000 | To do system-wide profiling or tracing, you typically use |
| 1001 | the -a flag to 'perf record'. | 1001 | the -a flag to 'perf record'. |
| 1002 | </para> | 1002 | </para> |
| 1003 | 1003 | ||
| 1004 | <para> | 1004 | <para> |
| 1005 | To demonstrate this, open up one window and start the profile | 1005 | To demonstrate this, open up one window and start the profile |
| 1006 | using the -a flag (press Ctrl-C to stop tracing): | 1006 | using the -a flag (press Ctrl-C to stop tracing): |
| 1007 | <literallayout class='monospaced'> | 1007 | <literallayout class='monospaced'> |
| 1008 | root@crownbay:~# perf record -g -a | 1008 | root@crownbay:~# perf record -g -a |
| 1009 | ^C[ perf record: Woken up 6 times to write data ] | 1009 | ^C[ perf record: Woken up 6 times to write data ] |
| 1010 | [ perf record: Captured and wrote 1.400 MB perf.data (~61172 samples) ] | 1010 | [ perf record: Captured and wrote 1.400 MB perf.data (~61172 samples) ] |
| 1011 | </literallayout> | 1011 | </literallayout> |
| 1012 | In another window, run the wget test: | 1012 | In another window, run the wget test: |
| 1013 | <literallayout class='monospaced'> | 1013 | <literallayout class='monospaced'> |
| 1014 | root@crownbay:~# wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink> | 1014 | root@crownbay:~# wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink> |
| 1015 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) | 1015 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) |
| 1016 | linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA | 1016 | linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA |
| 1017 | </literallayout> | 1017 | </literallayout> |
| 1018 | Here we see entries not only for our wget load, but for other | 1018 | Here we see entries not only for our wget load, but for other |
| 1019 | processes running on the system as well: | 1019 | processes running on the system as well: |
| 1020 | </para> | 1020 | </para> |
| 1021 | 1021 | ||
| 1022 | <para> | 1022 | <para> |
| 1023 | <imagedata fileref="figures/perf-systemwide.png" width="6in" depth="7in" align="center" scalefit="1" /> | 1023 | <imagedata fileref="figures/perf-systemwide.png" width="6in" depth="7in" align="center" scalefit="1" /> |
| 1024 | </para> | 1024 | </para> |
| 1025 | 1025 | ||
| 1026 | <para> | 1026 | <para> |
| 1027 | In the snapshot above, we can see callchains that originate in | 1027 | In the snapshot above, we can see callchains that originate in |
| 1028 | libc, and a callchain from Xorg that demonstrates that we're | 1028 | libc, and a callchain from Xorg that demonstrates that we're |
| 1029 | using a proprietary X driver in userspace (notice the presence | 1029 | using a proprietary X driver in userspace (notice the presence |
| 1030 | of 'PVR' and some other unresolvable symbols in the expanded | 1030 | of 'PVR' and some other unresolvable symbols in the expanded |
| 1031 | Xorg callchain). | 1031 | Xorg callchain). |
| 1032 | </para> | 1032 | </para> |
| 1033 | 1033 | ||
| 1034 | <para> | 1034 | <para> |
| 1035 | Note also that we have both kernel and userspace entries in the | 1035 | Note also that we have both kernel and userspace entries in the |
| 1036 | above snapshot. We can also tell perf to focus on userspace but | 1036 | above snapshot. We can also tell perf to focus on userspace but |
| 1037 | providing a modifier, in this case 'u', to the 'cycles' hardware | 1037 | providing a modifier, in this case 'u', to the 'cycles' hardware |
| 1038 | counter when we record a profile: | 1038 | counter when we record a profile: |
| 1039 | <literallayout class='monospaced'> | 1039 | <literallayout class='monospaced'> |
| 1040 | root@crownbay:~# perf record -g -a -e cycles:u | 1040 | root@crownbay:~# perf record -g -a -e cycles:u |
| 1041 | ^C[ perf record: Woken up 2 times to write data ] | 1041 | ^C[ perf record: Woken up 2 times to write data ] |
| 1042 | [ perf record: Captured and wrote 0.376 MB perf.data (~16443 samples) ] | 1042 | [ perf record: Captured and wrote 0.376 MB perf.data (~16443 samples) ] |
| 1043 | </literallayout> | 1043 | </literallayout> |
| 1044 | </para> | 1044 | </para> |
| 1045 | 1045 | ||
| 1046 | <para> | 1046 | <para> |
| 1047 | <imagedata fileref="figures/perf-report-cycles-u.png" width="6in" depth="7in" align="center" scalefit="1" /> | 1047 | <imagedata fileref="figures/perf-report-cycles-u.png" width="6in" depth="7in" align="center" scalefit="1" /> |
| 1048 | </para> | 1048 | </para> |
| 1049 | 1049 | ||
| 1050 | <para> | 1050 | <para> |
| 1051 | Notice in the screenshot above, we see only userspace entries ([.]) | 1051 | Notice in the screenshot above, we see only userspace entries ([.]) |
| 1052 | </para> | 1052 | </para> |
| 1053 | 1053 | ||
| 1054 | <para> | 1054 | <para> |
| 1055 | Finally, we can press 'enter' on a leaf node and select the 'Zoom | 1055 | Finally, we can press 'enter' on a leaf node and select the 'Zoom |
| 1056 | into DSO' menu item to show only entries associated with a | 1056 | into DSO' menu item to show only entries associated with a |
| 1057 | specific DSO. In the screenshot below, we've zoomed into the | 1057 | specific DSO. In the screenshot below, we've zoomed into the |
| 1058 | 'libc' DSO which shows all the entries associated with the | 1058 | 'libc' DSO which shows all the entries associated with the |
| 1059 | libc-xxx.so DSO. | 1059 | libc-xxx.so DSO. |
| 1060 | </para> | 1060 | </para> |
| 1061 | 1061 | ||
| 1062 | <para> | 1062 | <para> |
| 1063 | <imagedata fileref="figures/perf-systemwide-libc.png" width="6in" depth="7in" align="center" scalefit="1" /> | 1063 | <imagedata fileref="figures/perf-systemwide-libc.png" width="6in" depth="7in" align="center" scalefit="1" /> |
| 1064 | </para> | 1064 | </para> |
| 1065 | 1065 | ||
| 1066 | <para> | 1066 | <para> |
| 1067 | We can also use the system-wide -a switch to do system-wide | 1067 | We can also use the system-wide -a switch to do system-wide |
| 1068 | tracing. Here we'll trace a couple of scheduler events: | 1068 | tracing. Here we'll trace a couple of scheduler events: |
| 1069 | <literallayout class='monospaced'> | 1069 | <literallayout class='monospaced'> |
| 1070 | root@crownbay:~# perf record -a -e sched:sched_switch -e sched:sched_wakeup | 1070 | root@crownbay:~# perf record -a -e sched:sched_switch -e sched:sched_wakeup |
| 1071 | ^C[ perf record: Woken up 38 times to write data ] | 1071 | ^C[ perf record: Woken up 38 times to write data ] |
| 1072 | [ perf record: Captured and wrote 9.780 MB perf.data (~427299 samples) ] | 1072 | [ perf record: Captured and wrote 9.780 MB perf.data (~427299 samples) ] |
| 1073 | </literallayout> | 1073 | </literallayout> |
| 1074 | We can look at the raw output using 'perf script' with no | 1074 | We can look at the raw output using 'perf script' with no |
| 1075 | arguments: | 1075 | arguments: |
| 1076 | <literallayout class='monospaced'> | 1076 | <literallayout class='monospaced'> |
| 1077 | root@crownbay:~# perf script | 1077 | root@crownbay:~# perf script |
| 1078 | 1078 | ||
| 1079 | perf 1383 [001] 6171.460045: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 | 1079 | perf 1383 [001] 6171.460045: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 |
| @@ -1086,21 +1086,21 @@ | |||
| 1086 | perf 1383 [001] 6171.470058: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120 | 1086 | perf 1383 [001] 6171.470058: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120 |
| 1087 | kworker/1:1 21 [001] 6171.470082: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120 | 1087 | kworker/1:1 21 [001] 6171.470082: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120 |
| 1088 | perf 1383 [001] 6171.480035: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 | 1088 | perf 1383 [001] 6171.480035: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 |
| 1089 | </literallayout> | 1089 | </literallayout> |
| 1090 | </para> | 1090 | </para> |
| 1091 | |||
| 1092 | <section id='perf-filtering'> | ||
| 1093 | <title>Filtering</title> | ||
| 1094 | 1091 | ||
| 1095 | <para> | 1092 | <section id='perf-filtering'> |
| 1096 | Notice that there are a lot of events that don't really have | 1093 | <title>Filtering</title> |
| 1097 | anything to do with what we're interested in, namely events | 1094 | |
| 1098 | that schedule 'perf' itself in and out or that wake perf up. | 1095 | <para> |
| 1099 | We can get rid of those by using the '--filter' option - | 1096 | Notice that there are a lot of events that don't really have |
| 1100 | for each event we specify using -e, we can add a --filter | 1097 | anything to do with what we're interested in, namely events |
| 1101 | after that to filter out trace events that contain fields | 1098 | that schedule 'perf' itself in and out or that wake perf up. |
| 1102 | with specific values: | 1099 | We can get rid of those by using the '--filter' option - |
| 1103 | <literallayout class='monospaced'> | 1100 | for each event we specify using -e, we can add a --filter |
| 1101 | after that to filter out trace events that contain fields | ||
| 1102 | with specific values: | ||
| 1103 | <literallayout class='monospaced'> | ||
| 1104 | root@crownbay:~# perf record -a -e sched:sched_switch --filter 'next_comm != perf && prev_comm != perf' -e sched:sched_wakeup --filter 'comm != perf' | 1104 | root@crownbay:~# perf record -a -e sched:sched_switch --filter 'next_comm != perf && prev_comm != perf' -e sched:sched_wakeup --filter 'comm != perf' |
| 1105 | ^C[ perf record: Woken up 38 times to write data ] | 1105 | ^C[ perf record: Woken up 38 times to write data ] |
| 1106 | [ perf record: Captured and wrote 9.688 MB perf.data (~423279 samples) ] | 1106 | [ perf record: Captured and wrote 9.688 MB perf.data (~423279 samples) ] |
| @@ -1122,59 +1122,59 @@ | |||
| 1122 | swapper 0 [000] 7932.326109: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000 | 1122 | swapper 0 [000] 7932.326109: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000 |
| 1123 | swapper 0 [000] 7932.326171: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 | 1123 | swapper 0 [000] 7932.326171: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 |
| 1124 | kworker/0:3 1209 [000] 7932.326214: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 | 1124 | kworker/0:3 1209 [000] 7932.326214: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 |
| 1125 | </literallayout> | 1125 | </literallayout> |
| 1126 | In this case, we've filtered out all events that have 'perf' | 1126 | In this case, we've filtered out all events that have 'perf' |
| 1127 | in their 'comm' or 'comm_prev' or 'comm_next' fields. Notice | 1127 | in their 'comm' or 'comm_prev' or 'comm_next' fields. Notice |
| 1128 | that there are still events recorded for perf, but notice | 1128 | that there are still events recorded for perf, but notice |
| 1129 | that those events don't have values of 'perf' for the filtered | 1129 | that those events don't have values of 'perf' for the filtered |
| 1130 | fields. To completely filter out anything from perf will | 1130 | fields. To completely filter out anything from perf will |
| 1131 | require a bit more work, but for the purpose of demonstrating | 1131 | require a bit more work, but for the purpose of demonstrating |
| 1132 | how to use filters, it's close enough. | 1132 | how to use filters, it's close enough. |
| 1133 | </para> | 1133 | </para> |
| 1134 | 1134 | ||
| 1135 | <note> | 1135 | <note> |
| 1136 | Tying It Together: These are exactly the same set of event | 1136 | Tying It Together: These are exactly the same set of event |
| 1137 | filters defined by the trace event subsystem. See the | 1137 | filters defined by the trace event subsystem. See the |
| 1138 | ftrace/tracecmd/kernelshark section for more discussion about | 1138 | ftrace/tracecmd/kernelshark section for more discussion about |
| 1139 | these event filters. | 1139 | these event filters. |
| 1140 | </note> | 1140 | </note> |
| 1141 | 1141 | ||
| 1142 | <note> | 1142 | <note> |
| 1143 | Tying It Together: These event filters are implemented by a | 1143 | Tying It Together: These event filters are implemented by a |
| 1144 | special-purpose pseudo-interpreter in the kernel and are an | 1144 | special-purpose pseudo-interpreter in the kernel and are an |
| 1145 | integral and indispensable part of the perf design as it | 1145 | integral and indispensable part of the perf design as it |
| 1146 | relates to tracing. kernel-based event filters provide a | 1146 | relates to tracing. kernel-based event filters provide a |
| 1147 | mechanism to precisely throttle the event stream that appears | 1147 | mechanism to precisely throttle the event stream that appears |
| 1148 | in user space, where it makes sense to provide bindings to real | 1148 | in user space, where it makes sense to provide bindings to real |
| 1149 | programming languages for postprocessing the event stream. | 1149 | programming languages for postprocessing the event stream. |
| 1150 | This architecture allows for the intelligent and flexible | 1150 | This architecture allows for the intelligent and flexible |
| 1151 | partitioning of processing between the kernel and user space. | 1151 | partitioning of processing between the kernel and user space. |
| 1152 | Contrast this with other tools such as SystemTap, which does | 1152 | Contrast this with other tools such as SystemTap, which does |
| 1153 | all of its processing in the kernel and as such requires a | 1153 | all of its processing in the kernel and as such requires a |
| 1154 | special project-defined language in order to accommodate that | 1154 | special project-defined language in order to accommodate that |
| 1155 | design, or LTTng, where everything is sent to userspace and | 1155 | design, or LTTng, where everything is sent to userspace and |
| 1156 | as such requires a super-efficient kernel-to-userspace | 1156 | as such requires a super-efficient kernel-to-userspace |
| 1157 | transport mechanism in order to function properly. While | 1157 | transport mechanism in order to function properly. While |
| 1158 | perf certainly can benefit from for instance advances in | 1158 | perf certainly can benefit from for instance advances in |
| 1159 | the design of the transport, it doesn't fundamentally depend | 1159 | the design of the transport, it doesn't fundamentally depend |
| 1160 | on them. Basically, if you find that your perf tracing | 1160 | on them. Basically, if you find that your perf tracing |
| 1161 | application is causing buffer I/O overruns, it probably | 1161 | application is causing buffer I/O overruns, it probably |
| 1162 | means that you aren't taking enough advantage of the | 1162 | means that you aren't taking enough advantage of the |
| 1163 | kernel filtering engine. | 1163 | kernel filtering engine. |
| 1164 | </note> | 1164 | </note> |
| 1165 | </section> | ||
| 1165 | </section> | 1166 | </section> |
| 1166 | </section> | ||
| 1167 | 1167 | ||
| 1168 | <section id='using-dynamic-tracepoints'> | 1168 | <section id='using-dynamic-tracepoints'> |
| 1169 | <title>Using Dynamic Tracepoints</title> | 1169 | <title>Using Dynamic Tracepoints</title> |
| 1170 | 1170 | ||
| 1171 | <para> | 1171 | <para> |
| 1172 | perf isn't restricted to the fixed set of static tracepoints | 1172 | perf isn't restricted to the fixed set of static tracepoints |
| 1173 | listed by 'perf list'. Users can also add their own 'dynamic' | 1173 | listed by 'perf list'. Users can also add their own 'dynamic' |
| 1174 | tracepoints anywhere in the kernel. For instance, suppose we | 1174 | tracepoints anywhere in the kernel. For instance, suppose we |
| 1175 | want to define our own tracepoint on do_fork(). We can do that | 1175 | want to define our own tracepoint on do_fork(). We can do that |
| 1176 | using the 'perf probe' perf subcommand: | 1176 | using the 'perf probe' perf subcommand: |
| 1177 | <literallayout class='monospaced'> | 1177 | <literallayout class='monospaced'> |
| 1178 | root@crownbay:~# perf probe do_fork | 1178 | root@crownbay:~# perf probe do_fork |
| 1179 | Added new event: | 1179 | Added new event: |
| 1180 | probe:do_fork (on do_fork) | 1180 | probe:do_fork (on do_fork) |
| @@ -1182,13 +1182,13 @@ | |||
| 1182 | You can now use it in all perf tools, such as: | 1182 | You can now use it in all perf tools, such as: |
| 1183 | 1183 | ||
| 1184 | perf record -e probe:do_fork -aR sleep 1 | 1184 | perf record -e probe:do_fork -aR sleep 1 |
| 1185 | </literallayout> | 1185 | </literallayout> |
| 1186 | Adding a new tracepoint via 'perf probe' results in an event | 1186 | Adding a new tracepoint via 'perf probe' results in an event |
| 1187 | with all the expected files and format in | 1187 | with all the expected files and format in |
| 1188 | /sys/kernel/debug/tracing/events, just the same as for static | 1188 | /sys/kernel/debug/tracing/events, just the same as for static |
| 1189 | tracepoints (as discussed in more detail in the trace events | 1189 | tracepoints (as discussed in more detail in the trace events |
| 1190 | subsystem section: | 1190 | subsystem section: |
| 1191 | <literallayout class='monospaced'> | 1191 | <literallayout class='monospaced'> |
| 1192 | root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# ls -al | 1192 | root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# ls -al |
| 1193 | drwxr-xr-x 2 root root 0 Oct 28 11:42 . | 1193 | drwxr-xr-x 2 root root 0 Oct 28 11:42 . |
| 1194 | drwxr-xr-x 3 root root 0 Oct 28 11:42 .. | 1194 | drwxr-xr-x 3 root root 0 Oct 28 11:42 .. |
| @@ -1210,23 +1210,23 @@ | |||
| 1210 | field:unsigned long __probe_ip; offset:12; size:4; signed:0; | 1210 | field:unsigned long __probe_ip; offset:12; size:4; signed:0; |
| 1211 | 1211 | ||
| 1212 | print fmt: "(%lx)", REC->__probe_ip | 1212 | print fmt: "(%lx)", REC->__probe_ip |
| 1213 | </literallayout> | 1213 | </literallayout> |
| 1214 | We can list all dynamic tracepoints currently in existence: | 1214 | We can list all dynamic tracepoints currently in existence: |
| 1215 | <literallayout class='monospaced'> | 1215 | <literallayout class='monospaced'> |
| 1216 | root@crownbay:~# perf probe -l | 1216 | root@crownbay:~# perf probe -l |
| 1217 | probe:do_fork (on do_fork) | 1217 | probe:do_fork (on do_fork) |
| 1218 | probe:schedule (on schedule) | 1218 | probe:schedule (on schedule) |
| 1219 | </literallayout> | 1219 | </literallayout> |
| 1220 | Let's record system-wide ('sleep 30' is a trick for recording | 1220 | Let's record system-wide ('sleep 30' is a trick for recording |
| 1221 | system-wide but basically do nothing and then wake up after | 1221 | system-wide but basically do nothing and then wake up after |
| 1222 | 30 seconds): | 1222 | 30 seconds): |
| 1223 | <literallayout class='monospaced'> | 1223 | <literallayout class='monospaced'> |
| 1224 | root@crownbay:~# perf record -g -a -e probe:do_fork sleep 30 | 1224 | root@crownbay:~# perf record -g -a -e probe:do_fork sleep 30 |
| 1225 | [ perf record: Woken up 1 times to write data ] | 1225 | [ perf record: Woken up 1 times to write data ] |
| 1226 | [ perf record: Captured and wrote 0.087 MB perf.data (~3812 samples) ] | 1226 | [ perf record: Captured and wrote 0.087 MB perf.data (~3812 samples) ] |
| 1227 | </literallayout> | 1227 | </literallayout> |
| 1228 | Using 'perf script' we can see each do_fork event that fired: | 1228 | Using 'perf script' we can see each do_fork event that fired: |
| 1229 | <literallayout class='monospaced'> | 1229 | <literallayout class='monospaced'> |
| 1230 | root@crownbay:~# perf script | 1230 | root@crownbay:~# perf script |
| 1231 | 1231 | ||
| 1232 | # ======== | 1232 | # ======== |
| @@ -1265,27 +1265,28 @@ | |||
| 1265 | matchbox-deskto 1197 [001] 34237.112138: do_fork: (c1028460) | 1265 | matchbox-deskto 1197 [001] 34237.112138: do_fork: (c1028460) |
| 1266 | matchbox-deskto 1311 [001] 34237.114106: do_fork: (c1028460) | 1266 | matchbox-deskto 1311 [001] 34237.114106: do_fork: (c1028460) |
| 1267 | gaku 1312 [000] 34237.202388: do_fork: (c1028460) | 1267 | gaku 1312 [000] 34237.202388: do_fork: (c1028460) |
| 1268 | </literallayout> | 1268 | </literallayout> |
| 1269 | And using 'perf report' on the same file, we can see the | 1269 | And using 'perf report' on the same file, we can see the |
| 1270 | callgraphs from starting a few programs during those 30 seconds: | 1270 | callgraphs from starting a few programs during those 30 seconds: |
| 1271 | </para> | 1271 | </para> |
| 1272 | 1272 | ||
| 1273 | <para> | 1273 | <para> |
| 1274 | <imagedata fileref="figures/perf-probe-do_fork-profile.png" width="6in" depth="7in" align="center" scalefit="1" /> | 1274 | <imagedata fileref="figures/perf-probe-do_fork-profile.png" width="6in" depth="7in" align="center" scalefit="1" /> |
| 1275 | </para> | 1275 | </para> |
| 1276 | 1276 | ||
| 1277 | <note> | 1277 | <note> |
| 1278 | Tying It Together: The trace events subsystem accomodate static | 1278 | Tying It Together: The trace events subsystem accomodate static |
| 1279 | and dynamic tracepoints in exactly the same way - there's no | 1279 | and dynamic tracepoints in exactly the same way - there's no |
| 1280 | difference as far as the infrastructure is concerned. See the | 1280 | difference as far as the infrastructure is concerned. See the |
| 1281 | ftrace section for more details on the trace event subsystem. | 1281 | ftrace section for more details on the trace event subsystem. |
| 1282 | </note> | 1282 | </note> |
| 1283 | 1283 | ||
| 1284 | <note> | 1284 | <note> |
| 1285 | Tying It Together: Dynamic tracepoints are implemented under the | 1285 | Tying It Together: Dynamic tracepoints are implemented under the |
| 1286 | covers by kprobes and uprobes. kprobes and uprobes are also used | 1286 | covers by kprobes and uprobes. kprobes and uprobes are also used |
| 1287 | by and in fact are the main focus of SystemTap. | 1287 | by and in fact are the main focus of SystemTap. |
| 1288 | </note> | 1288 | </note> |
| 1289 | </section> | ||
| 1289 | </section> | 1290 | </section> |
| 1290 | 1291 | ||
| 1291 | <section id='perf-documentation'> | 1292 | <section id='perf-documentation'> |
