diff options
Diffstat (limited to 'documentation/profile-manual/profile-manual-usage.rst')
-rw-r--r-- | documentation/profile-manual/profile-manual-usage.rst | 66 |
1 files changed, 66 insertions, 0 deletions
diff --git a/documentation/profile-manual/profile-manual-usage.rst b/documentation/profile-manual/profile-manual-usage.rst index b97a1c6e96..85b78d72c5 100644 --- a/documentation/profile-manual/profile-manual-usage.rst +++ b/documentation/profile-manual/profile-manual-usage.rst | |||
@@ -160,6 +160,9 @@ and wrote 0.176 MB perf.data (~7700 samples) ] To see the results in a | |||
160 | perf.data file in the current working directory and display the results | 160 | perf.data file in the current working directory and display the results |
161 | in an interactive UI: root@crownbay:~# perf report | 161 | in an interactive UI: root@crownbay:~# perf report |
162 | 162 | ||
163 | .. image:: figures/perf-wget-flat-stripped.png | ||
164 | :align: center | ||
165 | |||
163 | The above screenshot displays a 'flat' profile, one entry for each | 166 | The above screenshot displays a 'flat' profile, one entry for each |
164 | 'bucket' corresponding to the functions that were profiled during the | 167 | 'bucket' corresponding to the functions that were profiled during the |
165 | profiling run, ordered from the most popular to the least (perf has | 168 | profiling run, ordered from the most popular to the least (perf has |
@@ -189,6 +192,9 @@ linux-2.6.19.2.tar.b 100% | |||
189 | and wrote 0.652 MB perf.data (~28476 samples) ] root@crownbay:~# perf | 192 | and wrote 0.652 MB perf.data (~28476 samples) ] root@crownbay:~# perf |
190 | report | 193 | report |
191 | 194 | ||
195 | .. image:: figures/perf-wget-g-copy-to-user-expanded-stripped.png | ||
196 | :align: center | ||
197 | |||
192 | Using the callgraph view, we can actually see not only which functions | 198 | Using the callgraph view, we can actually see not only which functions |
193 | took the most time, but we can also see a summary of how those functions | 199 | took the most time, but we can also see a summary of how those functions |
194 | were called and learn something about how the program interacts with the | 200 | were called and learn something about how the program interacts with the |
@@ -222,6 +228,9 @@ displayed in the callstack, here in the expanded sys_clock_gettime() | |||
222 | function. Later we'll see it resolve to a userspace function call in | 228 | function. Later we'll see it resolve to a userspace function call in |
223 | busybox. | 229 | busybox. |
224 | 230 | ||
231 | .. image:: figures/perf-wget-g-copy-from-user-expanded-stripped.png | ||
232 | :align: center | ||
233 | |||
225 | The above screenshot shows the other half of the journey for the data - | 234 | The above screenshot shows the other half of the journey for the data - |
226 | from the wget program's userspace buffers to disk. To get the buffers to | 235 | from the wget program's userspace buffers to disk. To get the buffers to |
227 | disk, the wget program issues a write(2), which does a copy-from-user to | 236 | disk, the wget program issues a write(2), which does a copy-from-user to |
@@ -236,6 +245,9 @@ actually implemented as an applet in busybox, so while the process name | |||
236 | is 'wget', the executable we're actually interested in is busybox. So | 245 | is 'wget', the executable we're actually interested in is busybox. So |
237 | let's expand the first entry containing busybox: | 246 | let's expand the first entry containing busybox: |
238 | 247 | ||
248 | .. image:: figures/perf-wget-busybox-expanded-stripped.png | ||
249 | :align: center | ||
250 | |||
239 | Again, before we expanded we saw that the function was labeled with a | 251 | Again, before we expanded we saw that the function was labeled with a |
240 | hex value instead of a symbol as with most of the kernel entries. | 252 | hex value instead of a symbol as with most of the kernel entries. |
241 | Expanding the busybox entry doesn't make it any better. | 253 | Expanding the busybox entry doesn't make it any better. |
@@ -270,14 +282,23 @@ busybox-dbg-1.20.2-r2.core2_32.rpm Now that the debuginfo is installed, | |||
270 | we see that the busybox entries now display their functions | 282 | we see that the busybox entries now display their functions |
271 | symbolically: | 283 | symbolically: |
272 | 284 | ||
285 | .. image:: figures/perf-wget-busybox-debuginfo.png | ||
286 | :align: center | ||
287 | |||
273 | If we expand one of the entries and press 'enter' on a leaf node, we're | 288 | If we expand one of the entries and press 'enter' on a leaf node, we're |
274 | presented with a menu of actions we can take to get more information | 289 | presented with a menu of actions we can take to get more information |
275 | related to that entry: | 290 | related to that entry: |
276 | 291 | ||
292 | .. image:: figures/perf-wget-busybox-dso-zoom-menu.png | ||
293 | :align: center | ||
294 | |||
277 | One of these actions allows us to show a view that displays a | 295 | One of these actions allows us to show a view that displays a |
278 | busybox-centric view of the profiled functions (in this case we've also | 296 | busybox-centric view of the profiled functions (in this case we've also |
279 | expanded all the nodes using the 'E' key): | 297 | expanded all the nodes using the 'E' key): |
280 | 298 | ||
299 | .. image:: figures/perf-wget-busybox-dso-zoom.png | ||
300 | :align: center | ||
301 | |||
281 | Finally, we can see that now that the busybox debuginfo is installed, | 302 | Finally, we can see that now that the busybox debuginfo is installed, |
282 | the previously unresolved symbol in the sys_clock_gettime() entry | 303 | the previously unresolved symbol in the sys_clock_gettime() entry |
283 | mentioned previously is now resolved, and shows that the | 304 | mentioned previously is now resolved, and shows that the |
@@ -285,16 +306,25 @@ sys_clock_gettime system call that was the source of 6.75% of the | |||
285 | copy-to-user overhead was initiated by the handle_input() busybox | 306 | copy-to-user overhead was initiated by the handle_input() busybox |
286 | function: | 307 | function: |
287 | 308 | ||
309 | .. image:: figures/perf-wget-g-copy-to-user-expanded-debuginfo.png | ||
310 | :align: center | ||
311 | |||
288 | At the lowest level of detail, we can dive down to the assembly level | 312 | At the lowest level of detail, we can dive down to the assembly level |
289 | and see which instructions caused the most overhead in a function. | 313 | and see which instructions caused the most overhead in a function. |
290 | Pressing 'enter' on the 'udhcpc_main' function, we're again presented | 314 | Pressing 'enter' on the 'udhcpc_main' function, we're again presented |
291 | with a menu: | 315 | with a menu: |
292 | 316 | ||
317 | .. image:: figures/perf-wget-busybox-annotate-menu.png | ||
318 | :align: center | ||
319 | |||
293 | Selecting 'Annotate udhcpc_main', we get a detailed listing of | 320 | Selecting 'Annotate udhcpc_main', we get a detailed listing of |
294 | percentages by instruction for the udhcpc_main function. From the | 321 | percentages by instruction for the udhcpc_main function. From the |
295 | display, we can see that over 50% of the time spent in this function is | 322 | display, we can see that over 50% of the time spent in this function is |
296 | taken up by a couple tests and the move of a constant (1) to a register: | 323 | taken up by a couple tests and the move of a constant (1) to a register: |
297 | 324 | ||
325 | .. image:: figures/perf-wget-busybox-annotate-udhcpc.png | ||
326 | :align: center | ||
327 | |||
298 | As a segue into tracing, let's try another profile using a different | 328 | As a segue into tracing, let's try another profile using a different |
299 | counter, something other than the default 'cycles'. | 329 | counter, something other than the default 'cycles'. |
300 | 330 | ||
@@ -424,6 +454,9 @@ and tell perf to do a profile using it as the sampling event: | |||
424 | root@crownbay:~# perf record -g -e sched:sched_wakeup wget | 454 | root@crownbay:~# perf record -g -e sched:sched_wakeup wget |
425 | http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 | 455 | http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 |
426 | 456 | ||
457 | .. image:: figures/sched-wakeup-profile.png | ||
458 | :align: center | ||
459 | |||
427 | The screenshot above shows the results of running a profile using | 460 | The screenshot above shows the results of running a profile using |
428 | sched:sched_switch tracepoint, which shows the relative costs of various | 461 | sched:sched_switch tracepoint, which shows the relative costs of various |
429 | paths to sched_wakeup (note that sched_wakeup is the name of the | 462 | paths to sched_wakeup (note that sched_wakeup is the name of the |
@@ -653,6 +686,9 @@ linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k | |||
653 | 0:00:00 ETA Here we see entries not only for our wget load, but for | 686 | 0:00:00 ETA Here we see entries not only for our wget load, but for |
654 | other processes running on the system as well: | 687 | other processes running on the system as well: |
655 | 688 | ||
689 | .. image:: figures/perf-systemwide.png | ||
690 | :align: center | ||
691 | |||
656 | In the snapshot above, we can see callchains that originate in libc, and | 692 | In the snapshot above, we can see callchains that originate in libc, and |
657 | a callchain from Xorg that demonstrates that we're using a proprietary X | 693 | a callchain from Xorg that demonstrates that we're using a proprietary X |
658 | driver in userspace (notice the presence of 'PVR' and some other | 694 | driver in userspace (notice the presence of 'PVR' and some other |
@@ -665,6 +701,9 @@ record a profile: root@crownbay:~# perf record -g -a -e cycles:u ^C[ | |||
665 | perf record: Woken up 2 times to write data ] [ perf record: Captured | 701 | perf record: Woken up 2 times to write data ] [ perf record: Captured |
666 | and wrote 0.376 MB perf.data (~16443 samples) ] | 702 | and wrote 0.376 MB perf.data (~16443 samples) ] |
667 | 703 | ||
704 | .. image:: figures/perf-report-cycles-u.png | ||
705 | :align: center | ||
706 | |||
668 | Notice in the screenshot above, we see only userspace entries ([.]) | 707 | Notice in the screenshot above, we see only userspace entries ([.]) |
669 | 708 | ||
670 | Finally, we can press 'enter' on a leaf node and select the 'Zoom into | 709 | Finally, we can press 'enter' on a leaf node and select the 'Zoom into |
@@ -672,6 +711,9 @@ DSO' menu item to show only entries associated with a specific DSO. In | |||
672 | the screenshot below, we've zoomed into the 'libc' DSO which shows all | 711 | the screenshot below, we've zoomed into the 'libc' DSO which shows all |
673 | the entries associated with the libc-xxx.so DSO. | 712 | the entries associated with the libc-xxx.so DSO. |
674 | 713 | ||
714 | .. image:: figures/perf-systemwide-libc.png | ||
715 | :align: center | ||
716 | |||
675 | We can also use the system-wide -a switch to do system-wide tracing. | 717 | We can also use the system-wide -a switch to do system-wide tracing. |
676 | Here we'll trace a couple of scheduler events: root@crownbay:~# perf | 718 | Here we'll trace a couple of scheduler events: root@crownbay:~# perf |
677 | record -a -e sched:sched_switch -e sched:sched_wakeup ^C[ perf record: | 719 | record -a -e sched:sched_switch -e sched:sched_wakeup ^C[ perf record: |
@@ -838,6 +880,9 @@ gthumb 1300 [001] 34219.264551: do_fork: (c1028460) pcmanfm 1296 [000] | |||
838 | (c1028460) And using 'perf report' on the same file, we can see the | 880 | (c1028460) And using 'perf report' on the same file, we can see the |
839 | callgraphs from starting a few programs during those 30 seconds: | 881 | callgraphs from starting a few programs during those 30 seconds: |
840 | 882 | ||
883 | .. image:: figures/perf-probe-do_fork-profile.png | ||
884 | :align: center | ||
885 | |||
841 | .. container:: informalexample | 886 | .. container:: informalexample |
842 | 887 | ||
843 | Tying it Together: | 888 | Tying it Together: |
@@ -1312,6 +1357,9 @@ choosing from the kernelshark menu: Capture \| Record That will display | |||
1312 | the following dialog, which allows you to choose one or more events (or | 1357 | the following dialog, which allows you to choose one or more events (or |
1313 | even one or more complete subsystems) to trace: | 1358 | even one or more complete subsystems) to trace: |
1314 | 1359 | ||
1360 | .. image:: figures/kernelshark-choose-events.png | ||
1361 | :align: center | ||
1362 | |||
1315 | Note that these are exactly the same sets of events described in the | 1363 | Note that these are exactly the same sets of events described in the |
1316 | previous trace events subsystem section, and in fact is where trace-cmd | 1364 | previous trace events subsystem section, and in fact is where trace-cmd |
1317 | gets them for kernelshark. | 1365 | gets them for kernelshark. |
@@ -1324,6 +1372,9 @@ After doing that, we can start and stop the trace using the 'Run' and | |||
1324 | 'Stop' button on the lower right corner of the dialog (the same button | 1372 | 'Stop' button on the lower right corner of the dialog (the same button |
1325 | will turn into the 'Stop' button after the trace has started): | 1373 | will turn into the 'Stop' button after the trace has started): |
1326 | 1374 | ||
1375 | .. image:: figures/kernelshark-output-display.png | ||
1376 | :align: center | ||
1377 | |||
1327 | Notice that the right-hand pane shows the exact trace-cmd command-line | 1378 | Notice that the right-hand pane shows the exact trace-cmd command-line |
1328 | that's used to run the trace, along with the results of the trace-cmd | 1379 | that's used to run the trace, along with the results of the trace-cmd |
1329 | run. | 1380 | run. |
@@ -1332,9 +1383,15 @@ Once the 'Stop' button is pressed, the graphical view magically fills up | |||
1332 | with a colorful per-cpu display of the trace data, along with the | 1383 | with a colorful per-cpu display of the trace data, along with the |
1333 | detailed event listing below that: | 1384 | detailed event listing below that: |
1334 | 1385 | ||
1386 | .. image:: figures/kernelshark-i915-display.png | ||
1387 | :align: center | ||
1388 | |||
1335 | Here's another example, this time a display resulting from tracing 'all | 1389 | Here's another example, this time a display resulting from tracing 'all |
1336 | events': | 1390 | events': |
1337 | 1391 | ||
1392 | .. image:: figures/kernelshark-all.png | ||
1393 | :align: center | ||
1394 | |||
1338 | The tool is pretty self-explanatory, but for more detailed information | 1395 | The tool is pretty self-explanatory, but for more detailed information |
1339 | on navigating through the data, see the `kernelshark | 1396 | on navigating through the data, see the `kernelshark |
1340 | website <http://rostedt.homelinux.com/kernelshark/>`__. | 1397 | website <http://rostedt.homelinux.com/kernelshark/>`__. |
@@ -1543,6 +1600,9 @@ press the 'Profile' button. | |||
1543 | Once you've pressed the profile button, the three panes will fill up | 1600 | Once you've pressed the profile button, the three panes will fill up |
1544 | with profiling data: | 1601 | with profiling data: |
1545 | 1602 | ||
1603 | .. image:: figures/sysprof-copy-to-user.png | ||
1604 | :align: center | ||
1605 | |||
1546 | The left pane shows a list of functions and processes. Selecting one of | 1606 | The left pane shows a list of functions and processes. Selecting one of |
1547 | those expands that function in the right pane, showing all its callees. | 1607 | those expands that function in the right pane, showing all its callees. |
1548 | Note that this caller-oriented display is essentially the inverse of | 1608 | Note that this caller-oriented display is essentially the inverse of |
@@ -1554,6 +1614,9 @@ looking up the callchain we can see that one of the callers of | |||
1554 | Notice that this is essentially a portion of the same information we saw | 1614 | Notice that this is essentially a portion of the same information we saw |
1555 | in the perf display shown in the perf section of this page. | 1615 | in the perf display shown in the perf section of this page. |
1556 | 1616 | ||
1617 | .. image:: figures/sysprof-copy-from-user.png | ||
1618 | :align: center | ||
1619 | |||
1557 | Similarly, the above is a snapshot of the Sysprof display of a | 1620 | Similarly, the above is a snapshot of the Sysprof display of a |
1558 | copy-from-user callchain. | 1621 | copy-from-user callchain. |
1559 | 1622 | ||
@@ -1562,6 +1625,9 @@ a list of all the callers of a particular function selected in the top | |||
1562 | left pane. In this case, the lower pane is showing all the callers of | 1625 | left pane. In this case, the lower pane is showing all the callers of |
1563 | \__mark_inode_dirty: | 1626 | \__mark_inode_dirty: |
1564 | 1627 | ||
1628 | .. image:: figures/sysprof-callers.png | ||
1629 | :align: center | ||
1630 | |||
1565 | Double-clicking on one of those functions will in turn change the focus | 1631 | Double-clicking on one of those functions will in turn change the focus |
1566 | to the selected function, and so on. | 1632 | to the selected function, and so on. |
1567 | 1633 | ||