summaryrefslogtreecommitdiffstats
path: root/documentation/profile-manual/profile-manual-usage.rst
diff options
context:
space:
mode:
Diffstat (limited to 'documentation/profile-manual/profile-manual-usage.rst')
-rw-r--r--documentation/profile-manual/profile-manual-usage.rst66
1 files changed, 66 insertions, 0 deletions
diff --git a/documentation/profile-manual/profile-manual-usage.rst b/documentation/profile-manual/profile-manual-usage.rst
index b97a1c6e96..85b78d72c5 100644
--- a/documentation/profile-manual/profile-manual-usage.rst
+++ b/documentation/profile-manual/profile-manual-usage.rst
@@ -160,6 +160,9 @@ and wrote 0.176 MB perf.data (~7700 samples) ] To see the results in a
160perf.data file in the current working directory and display the results 160perf.data file in the current working directory and display the results
161in an interactive UI: root@crownbay:~# perf report 161in an interactive UI: root@crownbay:~# perf report
162 162
163.. image:: figures/perf-wget-flat-stripped.png
164 :align: center
165
163The above screenshot displays a 'flat' profile, one entry for each 166The above screenshot displays a 'flat' profile, one entry for each
164'bucket' corresponding to the functions that were profiled during the 167'bucket' corresponding to the functions that were profiled during the
165profiling run, ordered from the most popular to the least (perf has 168profiling run, ordered from the most popular to the least (perf has
@@ -189,6 +192,9 @@ linux-2.6.19.2.tar.b 100%
189and wrote 0.652 MB perf.data (~28476 samples) ] root@crownbay:~# perf 192and wrote 0.652 MB perf.data (~28476 samples) ] root@crownbay:~# perf
190report 193report
191 194
195.. image:: figures/perf-wget-g-copy-to-user-expanded-stripped.png
196 :align: center
197
192Using the callgraph view, we can actually see not only which functions 198Using the callgraph view, we can actually see not only which functions
193took the most time, but we can also see a summary of how those functions 199took the most time, but we can also see a summary of how those functions
194were called and learn something about how the program interacts with the 200were called and learn something about how the program interacts with the
@@ -222,6 +228,9 @@ displayed in the callstack, here in the expanded sys_clock_gettime()
222function. Later we'll see it resolve to a userspace function call in 228function. Later we'll see it resolve to a userspace function call in
223busybox. 229busybox.
224 230
231.. image:: figures/perf-wget-g-copy-from-user-expanded-stripped.png
232 :align: center
233
225The above screenshot shows the other half of the journey for the data - 234The above screenshot shows the other half of the journey for the data -
226from the wget program's userspace buffers to disk. To get the buffers to 235from the wget program's userspace buffers to disk. To get the buffers to
227disk, the wget program issues a write(2), which does a copy-from-user to 236disk, the wget program issues a write(2), which does a copy-from-user to
@@ -236,6 +245,9 @@ actually implemented as an applet in busybox, so while the process name
236is 'wget', the executable we're actually interested in is busybox. So 245is 'wget', the executable we're actually interested in is busybox. So
237let's expand the first entry containing busybox: 246let's expand the first entry containing busybox:
238 247
248.. image:: figures/perf-wget-busybox-expanded-stripped.png
249 :align: center
250
239Again, before we expanded we saw that the function was labeled with a 251Again, before we expanded we saw that the function was labeled with a
240hex value instead of a symbol as with most of the kernel entries. 252hex value instead of a symbol as with most of the kernel entries.
241Expanding the busybox entry doesn't make it any better. 253Expanding the busybox entry doesn't make it any better.
@@ -270,14 +282,23 @@ busybox-dbg-1.20.2-r2.core2_32.rpm Now that the debuginfo is installed,
270we see that the busybox entries now display their functions 282we see that the busybox entries now display their functions
271symbolically: 283symbolically:
272 284
285.. image:: figures/perf-wget-busybox-debuginfo.png
286 :align: center
287
273If we expand one of the entries and press 'enter' on a leaf node, we're 288If we expand one of the entries and press 'enter' on a leaf node, we're
274presented with a menu of actions we can take to get more information 289presented with a menu of actions we can take to get more information
275related to that entry: 290related to that entry:
276 291
292.. image:: figures/perf-wget-busybox-dso-zoom-menu.png
293 :align: center
294
277One of these actions allows us to show a view that displays a 295One of these actions allows us to show a view that displays a
278busybox-centric view of the profiled functions (in this case we've also 296busybox-centric view of the profiled functions (in this case we've also
279expanded all the nodes using the 'E' key): 297expanded all the nodes using the 'E' key):
280 298
299.. image:: figures/perf-wget-busybox-dso-zoom.png
300 :align: center
301
281Finally, we can see that now that the busybox debuginfo is installed, 302Finally, we can see that now that the busybox debuginfo is installed,
282the previously unresolved symbol in the sys_clock_gettime() entry 303the previously unresolved symbol in the sys_clock_gettime() entry
283mentioned previously is now resolved, and shows that the 304mentioned previously is now resolved, and shows that the
@@ -285,16 +306,25 @@ sys_clock_gettime system call that was the source of 6.75% of the
285copy-to-user overhead was initiated by the handle_input() busybox 306copy-to-user overhead was initiated by the handle_input() busybox
286function: 307function:
287 308
309.. image:: figures/perf-wget-g-copy-to-user-expanded-debuginfo.png
310 :align: center
311
288At the lowest level of detail, we can dive down to the assembly level 312At the lowest level of detail, we can dive down to the assembly level
289and see which instructions caused the most overhead in a function. 313and see which instructions caused the most overhead in a function.
290Pressing 'enter' on the 'udhcpc_main' function, we're again presented 314Pressing 'enter' on the 'udhcpc_main' function, we're again presented
291with a menu: 315with a menu:
292 316
317.. image:: figures/perf-wget-busybox-annotate-menu.png
318 :align: center
319
293Selecting 'Annotate udhcpc_main', we get a detailed listing of 320Selecting 'Annotate udhcpc_main', we get a detailed listing of
294percentages by instruction for the udhcpc_main function. From the 321percentages by instruction for the udhcpc_main function. From the
295display, we can see that over 50% of the time spent in this function is 322display, we can see that over 50% of the time spent in this function is
296taken up by a couple tests and the move of a constant (1) to a register: 323taken up by a couple tests and the move of a constant (1) to a register:
297 324
325.. image:: figures/perf-wget-busybox-annotate-udhcpc.png
326 :align: center
327
298As a segue into tracing, let's try another profile using a different 328As a segue into tracing, let's try another profile using a different
299counter, something other than the default 'cycles'. 329counter, something other than the default 'cycles'.
300 330
@@ -424,6 +454,9 @@ and tell perf to do a profile using it as the sampling event:
424root@crownbay:~# perf record -g -e sched:sched_wakeup wget 454root@crownbay:~# perf record -g -e sched:sched_wakeup wget
425http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 455http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
426 456
457.. image:: figures/sched-wakeup-profile.png
458 :align: center
459
427The screenshot above shows the results of running a profile using 460The screenshot above shows the results of running a profile using
428sched:sched_switch tracepoint, which shows the relative costs of various 461sched:sched_switch tracepoint, which shows the relative costs of various
429paths to sched_wakeup (note that sched_wakeup is the name of the 462paths to sched_wakeup (note that sched_wakeup is the name of the
@@ -653,6 +686,9 @@ linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k
6530:00:00 ETA Here we see entries not only for our wget load, but for 6860:00:00 ETA Here we see entries not only for our wget load, but for
654other processes running on the system as well: 687other processes running on the system as well:
655 688
689.. image:: figures/perf-systemwide.png
690 :align: center
691
656In the snapshot above, we can see callchains that originate in libc, and 692In the snapshot above, we can see callchains that originate in libc, and
657a callchain from Xorg that demonstrates that we're using a proprietary X 693a callchain from Xorg that demonstrates that we're using a proprietary X
658driver in userspace (notice the presence of 'PVR' and some other 694driver in userspace (notice the presence of 'PVR' and some other
@@ -665,6 +701,9 @@ record a profile: root@crownbay:~# perf record -g -a -e cycles:u ^C[
665perf record: Woken up 2 times to write data ] [ perf record: Captured 701perf record: Woken up 2 times to write data ] [ perf record: Captured
666and wrote 0.376 MB perf.data (~16443 samples) ] 702and wrote 0.376 MB perf.data (~16443 samples) ]
667 703
704.. image:: figures/perf-report-cycles-u.png
705 :align: center
706
668Notice in the screenshot above, we see only userspace entries ([.]) 707Notice in the screenshot above, we see only userspace entries ([.])
669 708
670Finally, we can press 'enter' on a leaf node and select the 'Zoom into 709Finally, we can press 'enter' on a leaf node and select the 'Zoom into
@@ -672,6 +711,9 @@ DSO' menu item to show only entries associated with a specific DSO. In
672the screenshot below, we've zoomed into the 'libc' DSO which shows all 711the screenshot below, we've zoomed into the 'libc' DSO which shows all
673the entries associated with the libc-xxx.so DSO. 712the entries associated with the libc-xxx.so DSO.
674 713
714.. image:: figures/perf-systemwide-libc.png
715 :align: center
716
675We can also use the system-wide -a switch to do system-wide tracing. 717We can also use the system-wide -a switch to do system-wide tracing.
676Here we'll trace a couple of scheduler events: root@crownbay:~# perf 718Here we'll trace a couple of scheduler events: root@crownbay:~# perf
677record -a -e sched:sched_switch -e sched:sched_wakeup ^C[ perf record: 719record -a -e sched:sched_switch -e sched:sched_wakeup ^C[ perf record:
@@ -838,6 +880,9 @@ gthumb 1300 [001] 34219.264551: do_fork: (c1028460) pcmanfm 1296 [000]
838(c1028460) And using 'perf report' on the same file, we can see the 880(c1028460) And using 'perf report' on the same file, we can see the
839callgraphs from starting a few programs during those 30 seconds: 881callgraphs from starting a few programs during those 30 seconds:
840 882
883.. image:: figures/perf-probe-do_fork-profile.png
884 :align: center
885
841.. container:: informalexample 886.. container:: informalexample
842 887
843 Tying it Together: 888 Tying it Together:
@@ -1312,6 +1357,9 @@ choosing from the kernelshark menu: Capture \| Record That will display
1312the following dialog, which allows you to choose one or more events (or 1357the following dialog, which allows you to choose one or more events (or
1313even one or more complete subsystems) to trace: 1358even one or more complete subsystems) to trace:
1314 1359
1360.. image:: figures/kernelshark-choose-events.png
1361 :align: center
1362
1315Note that these are exactly the same sets of events described in the 1363Note that these are exactly the same sets of events described in the
1316previous trace events subsystem section, and in fact is where trace-cmd 1364previous trace events subsystem section, and in fact is where trace-cmd
1317gets them for kernelshark. 1365gets them for kernelshark.
@@ -1324,6 +1372,9 @@ After doing that, we can start and stop the trace using the 'Run' and
1324'Stop' button on the lower right corner of the dialog (the same button 1372'Stop' button on the lower right corner of the dialog (the same button
1325will turn into the 'Stop' button after the trace has started): 1373will turn into the 'Stop' button after the trace has started):
1326 1374
1375.. image:: figures/kernelshark-output-display.png
1376 :align: center
1377
1327Notice that the right-hand pane shows the exact trace-cmd command-line 1378Notice that the right-hand pane shows the exact trace-cmd command-line
1328that's used to run the trace, along with the results of the trace-cmd 1379that's used to run the trace, along with the results of the trace-cmd
1329run. 1380run.
@@ -1332,9 +1383,15 @@ Once the 'Stop' button is pressed, the graphical view magically fills up
1332with a colorful per-cpu display of the trace data, along with the 1383with a colorful per-cpu display of the trace data, along with the
1333detailed event listing below that: 1384detailed event listing below that:
1334 1385
1386.. image:: figures/kernelshark-i915-display.png
1387 :align: center
1388
1335Here's another example, this time a display resulting from tracing 'all 1389Here's another example, this time a display resulting from tracing 'all
1336events': 1390events':
1337 1391
1392.. image:: figures/kernelshark-all.png
1393 :align: center
1394
1338The tool is pretty self-explanatory, but for more detailed information 1395The tool is pretty self-explanatory, but for more detailed information
1339on navigating through the data, see the `kernelshark 1396on navigating through the data, see the `kernelshark
1340website <http://rostedt.homelinux.com/kernelshark/>`__. 1397website <http://rostedt.homelinux.com/kernelshark/>`__.
@@ -1543,6 +1600,9 @@ press the 'Profile' button.
1543Once you've pressed the profile button, the three panes will fill up 1600Once you've pressed the profile button, the three panes will fill up
1544with profiling data: 1601with profiling data:
1545 1602
1603.. image:: figures/sysprof-copy-to-user.png
1604 :align: center
1605
1546The left pane shows a list of functions and processes. Selecting one of 1606The left pane shows a list of functions and processes. Selecting one of
1547those expands that function in the right pane, showing all its callees. 1607those expands that function in the right pane, showing all its callees.
1548Note that this caller-oriented display is essentially the inverse of 1608Note that this caller-oriented display is essentially the inverse of
@@ -1554,6 +1614,9 @@ looking up the callchain we can see that one of the callers of
1554Notice that this is essentially a portion of the same information we saw 1614Notice that this is essentially a portion of the same information we saw
1555in the perf display shown in the perf section of this page. 1615in the perf display shown in the perf section of this page.
1556 1616
1617.. image:: figures/sysprof-copy-from-user.png
1618 :align: center
1619
1557Similarly, the above is a snapshot of the Sysprof display of a 1620Similarly, the above is a snapshot of the Sysprof display of a
1558copy-from-user callchain. 1621copy-from-user callchain.
1559 1622
@@ -1562,6 +1625,9 @@ a list of all the callers of a particular function selected in the top
1562left pane. In this case, the lower pane is showing all the callers of 1625left pane. In this case, the lower pane is showing all the callers of
1563\__mark_inode_dirty: 1626\__mark_inode_dirty:
1564 1627
1628.. image:: figures/sysprof-callers.png
1629 :align: center
1630
1565Double-clicking on one of those functions will in turn change the focus 1631Double-clicking on one of those functions will in turn change the focus
1566to the selected function, and so on. 1632to the selected function, and so on.
1567 1633