diff options
| -rw-r--r-- | documentation/profile-manual/profile-manual-usage.rst | 66 |
1 files changed, 66 insertions, 0 deletions
diff --git a/documentation/profile-manual/profile-manual-usage.rst b/documentation/profile-manual/profile-manual-usage.rst index b97a1c6e96..85b78d72c5 100644 --- a/documentation/profile-manual/profile-manual-usage.rst +++ b/documentation/profile-manual/profile-manual-usage.rst | |||
| @@ -160,6 +160,9 @@ and wrote 0.176 MB perf.data (~7700 samples) ] To see the results in a | |||
| 160 | perf.data file in the current working directory and display the results | 160 | perf.data file in the current working directory and display the results |
| 161 | in an interactive UI: root@crownbay:~# perf report | 161 | in an interactive UI: root@crownbay:~# perf report |
| 162 | 162 | ||
| 163 | .. image:: figures/perf-wget-flat-stripped.png | ||
| 164 | :align: center | ||
| 165 | |||
| 163 | The above screenshot displays a 'flat' profile, one entry for each | 166 | The above screenshot displays a 'flat' profile, one entry for each |
| 164 | 'bucket' corresponding to the functions that were profiled during the | 167 | 'bucket' corresponding to the functions that were profiled during the |
| 165 | profiling run, ordered from the most popular to the least (perf has | 168 | profiling run, ordered from the most popular to the least (perf has |
| @@ -189,6 +192,9 @@ linux-2.6.19.2.tar.b 100% | |||
| 189 | and wrote 0.652 MB perf.data (~28476 samples) ] root@crownbay:~# perf | 192 | and wrote 0.652 MB perf.data (~28476 samples) ] root@crownbay:~# perf |
| 190 | report | 193 | report |
| 191 | 194 | ||
| 195 | .. image:: figures/perf-wget-g-copy-to-user-expanded-stripped.png | ||
| 196 | :align: center | ||
| 197 | |||
| 192 | Using the callgraph view, we can actually see not only which functions | 198 | Using the callgraph view, we can actually see not only which functions |
| 193 | took the most time, but we can also see a summary of how those functions | 199 | took the most time, but we can also see a summary of how those functions |
| 194 | were called and learn something about how the program interacts with the | 200 | were called and learn something about how the program interacts with the |
| @@ -222,6 +228,9 @@ displayed in the callstack, here in the expanded sys_clock_gettime() | |||
| 222 | function. Later we'll see it resolve to a userspace function call in | 228 | function. Later we'll see it resolve to a userspace function call in |
| 223 | busybox. | 229 | busybox. |
| 224 | 230 | ||
| 231 | .. image:: figures/perf-wget-g-copy-from-user-expanded-stripped.png | ||
| 232 | :align: center | ||
| 233 | |||
| 225 | The above screenshot shows the other half of the journey for the data - | 234 | The above screenshot shows the other half of the journey for the data - |
| 226 | from the wget program's userspace buffers to disk. To get the buffers to | 235 | from the wget program's userspace buffers to disk. To get the buffers to |
| 227 | disk, the wget program issues a write(2), which does a copy-from-user to | 236 | disk, the wget program issues a write(2), which does a copy-from-user to |
| @@ -236,6 +245,9 @@ actually implemented as an applet in busybox, so while the process name | |||
| 236 | is 'wget', the executable we're actually interested in is busybox. So | 245 | is 'wget', the executable we're actually interested in is busybox. So |
| 237 | let's expand the first entry containing busybox: | 246 | let's expand the first entry containing busybox: |
| 238 | 247 | ||
| 248 | .. image:: figures/perf-wget-busybox-expanded-stripped.png | ||
| 249 | :align: center | ||
| 250 | |||
| 239 | Again, before we expanded we saw that the function was labeled with a | 251 | Again, before we expanded we saw that the function was labeled with a |
| 240 | hex value instead of a symbol as with most of the kernel entries. | 252 | hex value instead of a symbol as with most of the kernel entries. |
| 241 | Expanding the busybox entry doesn't make it any better. | 253 | Expanding the busybox entry doesn't make it any better. |
| @@ -270,14 +282,23 @@ busybox-dbg-1.20.2-r2.core2_32.rpm Now that the debuginfo is installed, | |||
| 270 | we see that the busybox entries now display their functions | 282 | we see that the busybox entries now display their functions |
| 271 | symbolically: | 283 | symbolically: |
| 272 | 284 | ||
| 285 | .. image:: figures/perf-wget-busybox-debuginfo.png | ||
| 286 | :align: center | ||
| 287 | |||
| 273 | If we expand one of the entries and press 'enter' on a leaf node, we're | 288 | If we expand one of the entries and press 'enter' on a leaf node, we're |
| 274 | presented with a menu of actions we can take to get more information | 289 | presented with a menu of actions we can take to get more information |
| 275 | related to that entry: | 290 | related to that entry: |
| 276 | 291 | ||
| 292 | .. image:: figures/perf-wget-busybox-dso-zoom-menu.png | ||
| 293 | :align: center | ||
| 294 | |||
| 277 | One of these actions allows us to show a view that displays a | 295 | One of these actions allows us to show a view that displays a |
| 278 | busybox-centric view of the profiled functions (in this case we've also | 296 | busybox-centric view of the profiled functions (in this case we've also |
| 279 | expanded all the nodes using the 'E' key): | 297 | expanded all the nodes using the 'E' key): |
| 280 | 298 | ||
| 299 | .. image:: figures/perf-wget-busybox-dso-zoom.png | ||
| 300 | :align: center | ||
| 301 | |||
| 281 | Finally, we can see that now that the busybox debuginfo is installed, | 302 | Finally, we can see that now that the busybox debuginfo is installed, |
| 282 | the previously unresolved symbol in the sys_clock_gettime() entry | 303 | the previously unresolved symbol in the sys_clock_gettime() entry |
| 283 | mentioned previously is now resolved, and shows that the | 304 | mentioned previously is now resolved, and shows that the |
| @@ -285,16 +306,25 @@ sys_clock_gettime system call that was the source of 6.75% of the | |||
| 285 | copy-to-user overhead was initiated by the handle_input() busybox | 306 | copy-to-user overhead was initiated by the handle_input() busybox |
| 286 | function: | 307 | function: |
| 287 | 308 | ||
| 309 | .. image:: figures/perf-wget-g-copy-to-user-expanded-debuginfo.png | ||
| 310 | :align: center | ||
| 311 | |||
| 288 | At the lowest level of detail, we can dive down to the assembly level | 312 | At the lowest level of detail, we can dive down to the assembly level |
| 289 | and see which instructions caused the most overhead in a function. | 313 | and see which instructions caused the most overhead in a function. |
| 290 | Pressing 'enter' on the 'udhcpc_main' function, we're again presented | 314 | Pressing 'enter' on the 'udhcpc_main' function, we're again presented |
| 291 | with a menu: | 315 | with a menu: |
| 292 | 316 | ||
| 317 | .. image:: figures/perf-wget-busybox-annotate-menu.png | ||
| 318 | :align: center | ||
| 319 | |||
| 293 | Selecting 'Annotate udhcpc_main', we get a detailed listing of | 320 | Selecting 'Annotate udhcpc_main', we get a detailed listing of |
| 294 | percentages by instruction for the udhcpc_main function. From the | 321 | percentages by instruction for the udhcpc_main function. From the |
| 295 | display, we can see that over 50% of the time spent in this function is | 322 | display, we can see that over 50% of the time spent in this function is |
| 296 | taken up by a couple tests and the move of a constant (1) to a register: | 323 | taken up by a couple tests and the move of a constant (1) to a register: |
| 297 | 324 | ||
| 325 | .. image:: figures/perf-wget-busybox-annotate-udhcpc.png | ||
| 326 | :align: center | ||
| 327 | |||
| 298 | As a segue into tracing, let's try another profile using a different | 328 | As a segue into tracing, let's try another profile using a different |
| 299 | counter, something other than the default 'cycles'. | 329 | counter, something other than the default 'cycles'. |
| 300 | 330 | ||
| @@ -424,6 +454,9 @@ and tell perf to do a profile using it as the sampling event: | |||
| 424 | root@crownbay:~# perf record -g -e sched:sched_wakeup wget | 454 | root@crownbay:~# perf record -g -e sched:sched_wakeup wget |
| 425 | http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 455 | http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
| 426 | 456 | ||
| 457 | .. image:: figures/sched-wakeup-profile.png | ||
| 458 | :align: center | ||
| 459 | |||
| 427 | The screenshot above shows the results of running a profile using | 460 | The screenshot above shows the results of running a profile using |
| 428 | sched:sched_switch tracepoint, which shows the relative costs of various | 461 | sched:sched_switch tracepoint, which shows the relative costs of various |
| 429 | paths to sched_wakeup (note that sched_wakeup is the name of the | 462 | paths to sched_wakeup (note that sched_wakeup is the name of the |
| @@ -653,6 +686,9 @@ linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k | |||
| 653 | 0:00:00 ETA Here we see entries not only for our wget load, but for | 686 | 0:00:00 ETA Here we see entries not only for our wget load, but for |
| 654 | other processes running on the system as well: | 687 | other processes running on the system as well: |
| 655 | 688 | ||
| 689 | .. image:: figures/perf-systemwide.png | ||
| 690 | :align: center | ||
| 691 | |||
| 656 | In the snapshot above, we can see callchains that originate in libc, and | 692 | In the snapshot above, we can see callchains that originate in libc, and |
| 657 | a callchain from Xorg that demonstrates that we're using a proprietary X | 693 | a callchain from Xorg that demonstrates that we're using a proprietary X |
| 658 | driver in userspace (notice the presence of 'PVR' and some other | 694 | driver in userspace (notice the presence of 'PVR' and some other |
| @@ -665,6 +701,9 @@ record a profile: root@crownbay:~# perf record -g -a -e cycles:u ^C[ | |||
| 665 | perf record: Woken up 2 times to write data ] [ perf record: Captured | 701 | perf record: Woken up 2 times to write data ] [ perf record: Captured |
| 666 | and wrote 0.376 MB perf.data (~16443 samples) ] | 702 | and wrote 0.376 MB perf.data (~16443 samples) ] |
| 667 | 703 | ||
| 704 | .. image:: figures/perf-report-cycles-u.png | ||
| 705 | :align: center | ||
| 706 | |||
| 668 | Notice in the screenshot above, we see only userspace entries ([.]) | 707 | Notice in the screenshot above, we see only userspace entries ([.]) |
| 669 | 708 | ||
| 670 | Finally, we can press 'enter' on a leaf node and select the 'Zoom into | 709 | Finally, we can press 'enter' on a leaf node and select the 'Zoom into |
| @@ -672,6 +711,9 @@ DSO' menu item to show only entries associated with a specific DSO. In | |||
| 672 | the screenshot below, we've zoomed into the 'libc' DSO which shows all | 711 | the screenshot below, we've zoomed into the 'libc' DSO which shows all |
| 673 | the entries associated with the libc-xxx.so DSO. | 712 | the entries associated with the libc-xxx.so DSO. |
| 674 | 713 | ||
| 714 | .. image:: figures/perf-systemwide-libc.png | ||
| 715 | :align: center | ||
| 716 | |||
| 675 | We can also use the system-wide -a switch to do system-wide tracing. | 717 | We can also use the system-wide -a switch to do system-wide tracing. |
| 676 | Here we'll trace a couple of scheduler events: root@crownbay:~# perf | 718 | Here we'll trace a couple of scheduler events: root@crownbay:~# perf |
| 677 | record -a -e sched:sched_switch -e sched:sched_wakeup ^C[ perf record: | 719 | record -a -e sched:sched_switch -e sched:sched_wakeup ^C[ perf record: |
| @@ -838,6 +880,9 @@ gthumb 1300 [001] 34219.264551: do_fork: (c1028460) pcmanfm 1296 [000] | |||
| 838 | (c1028460) And using 'perf report' on the same file, we can see the | 880 | (c1028460) And using 'perf report' on the same file, we can see the |
| 839 | callgraphs from starting a few programs during those 30 seconds: | 881 | callgraphs from starting a few programs during those 30 seconds: |
| 840 | 882 | ||
| 883 | .. image:: figures/perf-probe-do_fork-profile.png | ||
| 884 | :align: center | ||
| 885 | |||
| 841 | .. container:: informalexample | 886 | .. container:: informalexample |
| 842 | 887 | ||
| 843 | Tying it Together: | 888 | Tying it Together: |
| @@ -1312,6 +1357,9 @@ choosing from the kernelshark menu: Capture \| Record That will display | |||
| 1312 | the following dialog, which allows you to choose one or more events (or | 1357 | the following dialog, which allows you to choose one or more events (or |
| 1313 | even one or more complete subsystems) to trace: | 1358 | even one or more complete subsystems) to trace: |
| 1314 | 1359 | ||
| 1360 | .. image:: figures/kernelshark-choose-events.png | ||
| 1361 | :align: center | ||
| 1362 | |||
| 1315 | Note that these are exactly the same sets of events described in the | 1363 | Note that these are exactly the same sets of events described in the |
| 1316 | previous trace events subsystem section, and in fact is where trace-cmd | 1364 | previous trace events subsystem section, and in fact is where trace-cmd |
| 1317 | gets them for kernelshark. | 1365 | gets them for kernelshark. |
| @@ -1324,6 +1372,9 @@ After doing that, we can start and stop the trace using the 'Run' and | |||
| 1324 | 'Stop' button on the lower right corner of the dialog (the same button | 1372 | 'Stop' button on the lower right corner of the dialog (the same button |
| 1325 | will turn into the 'Stop' button after the trace has started): | 1373 | will turn into the 'Stop' button after the trace has started): |
| 1326 | 1374 | ||
| 1375 | .. image:: figures/kernelshark-output-display.png | ||
| 1376 | :align: center | ||
| 1377 | |||
| 1327 | Notice that the right-hand pane shows the exact trace-cmd command-line | 1378 | Notice that the right-hand pane shows the exact trace-cmd command-line |
| 1328 | that's used to run the trace, along with the results of the trace-cmd | 1379 | that's used to run the trace, along with the results of the trace-cmd |
| 1329 | run. | 1380 | run. |
| @@ -1332,9 +1383,15 @@ Once the 'Stop' button is pressed, the graphical view magically fills up | |||
| 1332 | with a colorful per-cpu display of the trace data, along with the | 1383 | with a colorful per-cpu display of the trace data, along with the |
| 1333 | detailed event listing below that: | 1384 | detailed event listing below that: |
| 1334 | 1385 | ||
| 1386 | .. image:: figures/kernelshark-i915-display.png | ||
| 1387 | :align: center | ||
| 1388 | |||
| 1335 | Here's another example, this time a display resulting from tracing 'all | 1389 | Here's another example, this time a display resulting from tracing 'all |
| 1336 | events': | 1390 | events': |
| 1337 | 1391 | ||
| 1392 | .. image:: figures/kernelshark-all.png | ||
| 1393 | :align: center | ||
| 1394 | |||
| 1338 | The tool is pretty self-explanatory, but for more detailed information | 1395 | The tool is pretty self-explanatory, but for more detailed information |
| 1339 | on navigating through the data, see the `kernelshark | 1396 | on navigating through the data, see the `kernelshark |
| 1340 | website <http://rostedt.homelinux.com/kernelshark/>`__. | 1397 | website <http://rostedt.homelinux.com/kernelshark/>`__. |
| @@ -1543,6 +1600,9 @@ press the 'Profile' button. | |||
| 1543 | Once you've pressed the profile button, the three panes will fill up | 1600 | Once you've pressed the profile button, the three panes will fill up |
| 1544 | with profiling data: | 1601 | with profiling data: |
| 1545 | 1602 | ||
| 1603 | .. image:: figures/sysprof-copy-to-user.png | ||
| 1604 | :align: center | ||
| 1605 | |||
| 1546 | The left pane shows a list of functions and processes. Selecting one of | 1606 | The left pane shows a list of functions and processes. Selecting one of |
| 1547 | those expands that function in the right pane, showing all its callees. | 1607 | those expands that function in the right pane, showing all its callees. |
| 1548 | Note that this caller-oriented display is essentially the inverse of | 1608 | Note that this caller-oriented display is essentially the inverse of |
| @@ -1554,6 +1614,9 @@ looking up the callchain we can see that one of the callers of | |||
| 1554 | Notice that this is essentially a portion of the same information we saw | 1614 | Notice that this is essentially a portion of the same information we saw |
| 1555 | in the perf display shown in the perf section of this page. | 1615 | in the perf display shown in the perf section of this page. |
| 1556 | 1616 | ||
| 1617 | .. image:: figures/sysprof-copy-from-user.png | ||
| 1618 | :align: center | ||
| 1619 | |||
| 1557 | Similarly, the above is a snapshot of the Sysprof display of a | 1620 | Similarly, the above is a snapshot of the Sysprof display of a |
| 1558 | copy-from-user callchain. | 1621 | copy-from-user callchain. |
| 1559 | 1622 | ||
| @@ -1562,6 +1625,9 @@ a list of all the callers of a particular function selected in the top | |||
| 1562 | left pane. In this case, the lower pane is showing all the callers of | 1625 | left pane. In this case, the lower pane is showing all the callers of |
| 1563 | \__mark_inode_dirty: | 1626 | \__mark_inode_dirty: |
| 1564 | 1627 | ||
| 1628 | .. image:: figures/sysprof-callers.png | ||
| 1629 | :align: center | ||
| 1630 | |||
| 1565 | Double-clicking on one of those functions will in turn change the focus | 1631 | Double-clicking on one of those functions will in turn change the focus |
| 1566 | to the selected function, and so on. | 1632 | to the selected function, and so on. |
| 1567 | 1633 | ||
