From 6e4788bb6798d18f40299a3ae203ad5afd91931e Mon Sep 17 00:00:00 2001
From: Nicolas Dechesne
Date: Wed, 29 Jul 2020 10:58:36 +0200
Subject: sphinx: profile-manual: add figures

(From yocto-docs rev: bfa1c146265d165ddfa1a70488d0f043ec9df2ee)

Signed-off-by: Nicolas Dechesne
Signed-off-by: Richard Purdie
---
 .../profile-manual/profile-manual-usage.rst | 66 ++++++++++++++++++++++
 1 file changed, 66 insertions(+)

(limited to 'documentation/profile-manual')

diff --git a/documentation/profile-manual/profile-manual-usage.rst b/documentation/profile-manual/profile-manual-usage.rst
index b97a1c6e96..85b78d72c5 100644
--- a/documentation/profile-manual/profile-manual-usage.rst
+++ b/documentation/profile-manual/profile-manual-usage.rst
@@ -160,6 +160,9 @@ and wrote 0.176 MB perf.data (~7700 samples) ]
 To see the results in a perf.data file in the current working directory
 and display the results in an interactive UI:
 root@crownbay:~# perf report
+.. image:: figures/perf-wget-flat-stripped.png
+   :align: center
+
 The above screenshot displays a 'flat' profile, one entry for each
 'bucket' corresponding to the functions that were profiled during the
 profiling run, ordered from the most popular to the least (perf has
@@ -189,6 +192,9 @@ linux-2.6.19.2.tar.b 100%
 and wrote 0.652 MB perf.data (~28476 samples) ]
 root@crownbay:~# perf report
+.. image:: figures/perf-wget-g-copy-to-user-expanded-stripped.png
+   :align: center
+
 Using the callgraph view, we can actually see not only which functions
 took the most time, but we can also see a summary of how those functions
 were called and learn something about how the program interacts with the
@@ -222,6 +228,9 @@ displayed in the callstack, here in the expanded
 sys_clock_gettime() function. Later we'll see it resolve to a userspace
 function call in busybox.
+.. image:: figures/perf-wget-g-copy-from-user-expanded-stripped.png
+   :align: center
+
 The above screenshot shows the other half of the journey for the data -
 from the wget program's userspace buffers to disk. To get the buffers to
 disk, the wget program issues a write(2), which does a copy-from-user to
@@ -236,6 +245,9 @@ actually implemented as an applet in busybox, so while the process name
 is 'wget', the executable we're actually interested in is busybox. So
 let's expand the first entry containing busybox:
+.. image:: figures/perf-wget-busybox-expanded-stripped.png
+   :align: center
+
 Again, before we expanded we saw that the function was labeled with a
 hex value instead of a symbol as with most of the kernel entries.
 Expanding the busybox entry doesn't make it any better.
@@ -270,14 +282,23 @@ busybox-dbg-1.20.2-r2.core2_32.rpm
 Now that the debuginfo is installed, we see that the busybox entries
 now display their functions symbolically:
+.. image:: figures/perf-wget-busybox-debuginfo.png
+   :align: center
+
 If we expand one of the entries and press 'enter' on a leaf node, we're
 presented with a menu of actions we can take to get more information
 related to that entry:
+.. image:: figures/perf-wget-busybox-dso-zoom-menu.png
+   :align: center
+
 One of these actions allows us to show a view that displays a
 busybox-centric view of the profiled functions (in this case we've also
 expanded all the nodes using the 'E' key):
+.. image:: figures/perf-wget-busybox-dso-zoom.png
+   :align: center
+
 Finally, we can see that now that the busybox debuginfo is installed,
 the previously unresolved symbol in the sys_clock_gettime() entry
 mentioned previously is now resolved, and shows that the
@@ -285,16 +306,25 @@ sys_clock_gettime system call that was the source of 6.75% of the
 copy-to-user overhead was initiated by the handle_input() busybox
 function:
+.. image:: figures/perf-wget-g-copy-to-user-expanded-debuginfo.png
+   :align: center
+
 At the lowest level of detail, we can dive down to the assembly level
 and see which instructions caused the most overhead in a function.
 Pressing 'enter' on the 'udhcpc_main' function, we're again presented
 with a menu:
+.. image:: figures/perf-wget-busybox-annotate-menu.png
+   :align: center
+
 Selecting 'Annotate udhcpc_main', we get a detailed listing of
 percentages by instruction for the udhcpc_main function. From the
 display, we can see that over 50% of the time spent in this function is
 taken up by a couple tests and the move of a constant (1) to a register:
+.. image:: figures/perf-wget-busybox-annotate-udhcpc.png
+   :align: center
+
 As a segue into tracing, let's try another profile using a different
 counter, something other than the default 'cycles'.
@@ -424,6 +454,9 @@ and tell perf to do a profile using it as the sampling event:
 root@crownbay:~# perf record -g -e sched:sched_wakeup wget
 http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
+.. image:: figures/sched-wakeup-profile.png
+   :align: center
+
 The screenshot above shows the results of running a profile using
 sched:sched_switch tracepoint, which shows the relative costs of various
 paths to sched_wakeup (note that sched_wakeup is the name of the
@@ -653,6 +686,9 @@ linux-2.6.19.2.tar.b 100%
 \|*******************************\| 41727k 0:00:00 ETA
 Here we see entries not only for our wget load, but for other processes
 running on the system as well:
+.. image:: figures/perf-systemwide.png
+   :align: center
+
 In the snapshot above, we can see callchains that originate in libc, and
 a callchain from Xorg that demonstrates that we're using a proprietary X
 driver in userspace (notice the presence of 'PVR' and some other
@@ -665,6 +701,9 @@ record a profile:
 root@crownbay:~# perf record -g -a -e cycles:u
 ^C[ perf record: Woken up 2 times to write data ]
 [ perf record: Captured and wrote 0.376 MB perf.data (~16443 samples) ]
+.. image:: figures/perf-report-cycles-u.png
+   :align: center
+
 Notice in the screenshot above, we see only userspace entries ([.])
 Finally, we can press 'enter' on a leaf node and select the 'Zoom into
@@ -672,6 +711,9 @@ DSO' menu item to show only entries associated with a specific DSO. In
 the screenshot below, we've zoomed into the 'libc' DSO which shows all
 the entries associated with the libc-xxx.so DSO.
+.. image:: figures/perf-systemwide-libc.png
+   :align: center
+
 We can also use the system-wide -a switch to do system-wide tracing.
 Here we'll trace a couple of scheduler events:
 root@crownbay:~# perf record -a -e sched:sched_switch -e sched:sched_wakeup
 ^C[ perf record:
@@ -838,6 +880,9 @@ gthumb 1300 [001] 34219.264551: do_fork: (c1028460)
 pcmanfm 1296 [000] (c1028460)
 And using 'perf report' on the same file, we can see the callgraphs from
 starting a few programs during those 30 seconds:
+.. image:: figures/perf-probe-do_fork-profile.png
+   :align: center
+
 .. container:: informalexample

    Tying it Together:
@@ -1312,6 +1357,9 @@ choosing from the kernelshark menu: Capture \| Record
 That will display the following dialog, which allows you to choose one
 or more events (or even one or more complete subsystems) to trace:
+.. image:: figures/kernelshark-choose-events.png
+   :align: center
+
 Note that these are exactly the same sets of events described in the
 previous trace events subsystem section, and in fact is where trace-cmd
 gets them for kernelshark.
@@ -1324,6 +1372,9 @@ After doing that, we can start and stop the trace using the 'Run' and
 'Stop' button on the lower right corner of the dialog (the same button
 will turn into the 'Stop' button after the trace has started):
+.. image:: figures/kernelshark-output-display.png
+   :align: center
+
 Notice that the right-hand pane shows the exact trace-cmd command-line
 that's used to run the trace, along with the results of the trace-cmd
 run.
@@ -1332,9 +1383,15 @@ Once the 'Stop' button is pressed, the graphical view magically fills up
 with a colorful per-cpu display of the trace data, along with the
 detailed event listing below that:
+.. image:: figures/kernelshark-i915-display.png
+   :align: center
+
 Here's another example, this time a display resulting from tracing 'all
 events':
+.. image:: figures/kernelshark-all.png
+   :align: center
+
 The tool is pretty self-explanatory, but for more detailed information
 on navigating through the data, see the `kernelshark website `__.
@@ -1543,6 +1600,9 @@ press the 'Profile' button.
 Once you've pressed the profile button, the three panes will fill up
 with profiling data:
+.. image:: figures/sysprof-copy-to-user.png
+   :align: center
+
 The left pane shows a list of functions and processes. Selecting one of
 those expands that function in the right pane, showing all its callees.
 Note that this caller-oriented display is essentially the inverse of
@@ -1554,6 +1614,9 @@ looking up the callchain we can see that one of the callers of
 Notice that this is essentially a portion of the same information we
 saw in the perf display shown in the perf section of this page.
+.. image:: figures/sysprof-copy-from-user.png
+   :align: center
+
 Similarly, the above is a snapshot of the Sysprof display of a
 copy-from-user callchain.
@@ -1562,6 +1625,9 @@ a list of all the callers of a particular function selected in the top
 left pane. In this case, the lower pane is showing all the callers of
 \__mark_inode_dirty:
+.. image:: figures/sysprof-callers.png
+   :align: center
+
 Double-clicking on one of those functions will in turn change the focus
 to the selected function, and so on.
-- 
cgit v1.2.3-54-g00ecf