summaryrefslogtreecommitdiffstats
path: root/documentation/profile-manual/profile-manual-usage.rst
diff options
context:
space:
mode:
authorNicolas Dechesne <nicolas.dechesne@linaro.org>2020-06-26 19:10:51 +0200
committerRichard Purdie <richard.purdie@linuxfoundation.org>2020-09-17 10:09:33 +0100
commit9bd69b1f1d71a9692189beeac75af9dfbad816cc (patch)
tree305347fca899074aed5610e0e82eaec180bf630c /documentation/profile-manual/profile-manual-usage.rst
parentc40a8d5904c29046f1cbbeb998e6cd7c24f9b206 (diff)
downloadpoky-9bd69b1f1d71a9692189beeac75af9dfbad816cc.tar.gz
sphinx: initial sphinx support
This commit is autogenerated pandoc to generate an inital set of reST files based on DocBook XML files. A .rst file is generated for each .xml files in all manuals with this command: cd <manual> for i in *.xml; do \ pandoc -f docbook -t rst --shift-heading-level-by=-1 \ $i -o $(basename $i .xml).rst \ done The conversion was done with: pandoc 2.9.2.1-91 (Arch Linux). Also created an initial top level index file for each document, and added all 'books' to the top leve index.rst file. The YP manuals layout is organized as: Book Chapter Section Section Section Sphinx uses section headers to create the document structure. ReStructuredText defines sections headers like that: To break longer text up into sections, you use section headers. These are a single line of text (one or more words) with adornment: an underline alone, or an underline and an overline together, in dashes "-----", equals "======", tildes "~~~~~~" or any of the non-alphanumeric characters = - ` : ' " ~ ^ _ * + # < > that you feel comfortable with. An underline-only adornment is distinct from an overline-and-underline adornment using the same character. The underline/overline must be at least as long as the title text. Be consistent, since all sections marked with the same adornment style are deemed to be at the same level: Let's define the following convention when converting from Docbook: Book => overline === (Title) Chapter => overline *** (1.) Section => ==== (1.1) Section => ---- (1.1.1) Section => ~~~~ (1.1.1.1) Section => ^^^^ (1.1.1.1.1) During the conversion with pandoc, we used --shift-heading-level=-1 to convert most of DocBook headings automatically. However with this setting, the Chapter header was removed, so I added it back manually. Without this setting all headings were off by one, which was more difficult to manually fix. At least with this change, we now have the same TOC with Sphinx and DocBook. (From yocto-docs rev: 3c73d64a476d4423ee4c6808c685fa94d88d7df8) Signed-off-by: Nicolas Dechesne <nicolas.dechesne@linaro.org> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Diffstat (limited to 'documentation/profile-manual/profile-manual-usage.rst')
-rw-r--r--documentation/profile-manual/profile-manual-usage.rst2018
1 files changed, 2018 insertions, 0 deletions
diff --git a/documentation/profile-manual/profile-manual-usage.rst b/documentation/profile-manual/profile-manual-usage.rst
new file mode 100644
index 0000000000..0f9f8925ea
--- /dev/null
+++ b/documentation/profile-manual/profile-manual-usage.rst
@@ -0,0 +1,2018 @@
1***************************************************************
2Basic Usage (with examples) for each of the Yocto Tracing Tools
3***************************************************************
4
5This chapter presents basic usage examples for each of the tracing
6tools.
7
8.. _profile-manual-perf:
9
10perf
11====
12
13The 'perf' tool is the profiling and tracing tool that comes bundled
14with the Linux kernel.
15
16Don't let the fact that it's part of the kernel fool you into thinking
17that it's only for tracing and profiling the kernel - you can indeed use
18it to trace and profile just the kernel, but you can also use it to
19profile specific applications separately (with or without kernel
20context), and you can also use it to trace and profile the kernel and
21all applications on the system simultaneously to gain a system-wide view
22of what's going on.
23
24In many ways, perf aims to be a superset of all the tracing and
25profiling tools available in Linux today, including all the other tools
26covered in this HOWTO. The past couple of years have seen perf subsume a
27lot of the functionality of those other tools and, at the same time,
28those other tools have removed large portions of their previous
29functionality and replaced it with calls to the equivalent functionality
30now implemented by the perf subsystem. Extrapolation suggests that at
31some point those other tools will simply become completely redundant and
32go away; until then, we'll cover those other tools in these pages and in
33many cases show how the same things can be accomplished in perf and the
34other tools when it seems useful to do so.
35
36The coverage below details some of the most common ways you'll likely
37want to apply the tool; full documentation can be found either within
38the tool itself or in the man pages at
39`perf(1) <http://linux.die.net/man/1/perf>`__.
40
41.. _perf-setup:
42
43Setup
44-----
45
46For this section, we'll assume you've already performed the basic setup
47outlined in the General Setup section.
48
49In particular, you'll get the most mileage out of perf if you profile an
50image built with the following in your ``local.conf`` file:
51`INHIBIT_PACKAGE_STRIP <&YOCTO_DOCS_REF_URL;#var-INHIBIT_PACKAGE_STRIP>`__
52= "1"
53
54perf runs on the target system for the most part. You can archive
55profile data and copy it to the host for analysis, but for the rest of
56this document we assume you've ssh'ed to the host and will be running
57the perf commands on the target.
58
59.. _perf-basic-usage:
60
61Basic Usage
62-----------
63
64The perf tool is pretty much self-documenting. To remind yourself of the
65available commands, simply type 'perf', which will show you basic usage
66along with the available perf subcommands: root@crownbay:~# perf usage:
67perf [--version] [--help] COMMAND [ARGS] The most commonly used perf
68commands are: annotate Read perf.data (created by perf record) and
69display annotated code archive Create archive with object files with
70build-ids found in perf.data file bench General framework for benchmark
71suites buildid-cache Manage build-id cache. buildid-list List the
72buildids in a perf.data file diff Read two perf.data files and display
73the differential profile evlist List the event names in a perf.data file
74inject Filter to augment the events stream with additional information
75kmem Tool to trace/measure kernel memory(slab) properties kvm Tool to
76trace/measure kvm guest os list List all symbolic event types lock
77Analyze lock events probe Define new dynamic tracepoints record Run a
78command and record its profile into perf.data report Read perf.data
79(created by perf record) and display the profile sched Tool to
80trace/measure scheduler properties (latencies) script Read perf.data
81(created by perf record) and display trace output stat Run a command and
82gather performance counter statistics test Runs sanity tests. timechart
83Tool to visualize total system behavior during a workload top System
84profiling tool. See 'perf help COMMAND' for more information on a
85specific command.
86
87Using perf to do Basic Profiling
88~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
89
90As a simple test case, we'll profile the 'wget' of a fairly large file,
91which is a minimally interesting case because it has both file and
92network I/O aspects, and at least in the case of standard Yocto images,
93it's implemented as part of busybox, so the methods we use to analyze it
94can be used in a very similar way to the whole host of supported busybox
95applets in Yocto. root@crownbay:~# rm linux-2.6.19.2.tar.bz2; \\ wget
96http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
97The quickest and easiest way to get some basic overall data about what's
98going on for a particular workload is to profile it using 'perf stat'.
99'perf stat' basically profiles using a few default counters and displays
100the summed counts at the end of the run: root@crownbay:~# perf stat wget
101http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
102Connecting to downloads.yoctoproject.org (140.211.169.59:80)
103linux-2.6.19.2.tar.b 100%
104\|***************************************************\| 41727k 0:00:00
105ETA Performance counter stats for 'wget
106http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2':
1074597.223902 task-clock # 0.077 CPUs utilized 23568 context-switches #
1080.005 M/sec 68 CPU-migrations # 0.015 K/sec 241 page-faults # 0.052
109K/sec 3045817293 cycles # 0.663 GHz <not supported>
110stalled-cycles-frontend <not supported> stalled-cycles-backend 858909167
111instructions # 0.28 insns per cycle 165441165 branches # 35.987 M/sec
11219550329 branch-misses # 11.82% of all branches 59.836627620 seconds
113time elapsed Many times such a simple-minded test doesn't yield much of
114interest, but sometimes it does (see Real-world Yocto bug (slow
115loop-mounted write speed)).
116
117Also, note that 'perf stat' isn't restricted to a fixed set of counters
118- basically any event listed in the output of 'perf list' can be tallied
119by 'perf stat'. For example, suppose we wanted to see a summary of all
120the events related to kernel memory allocation/freeing along with cache
121hits and misses: root@crownbay:~# perf stat -e kmem:\* -e
122cache-references -e cache-misses wget
123http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
124Connecting to downloads.yoctoproject.org (140.211.169.59:80)
125linux-2.6.19.2.tar.b 100%
126\|***************************************************\| 41727k 0:00:00
127ETA Performance counter stats for 'wget
128http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2':
1295566 kmem:kmalloc 125517 kmem:kmem_cache_alloc 0 kmem:kmalloc_node 0
130kmem:kmem_cache_alloc_node 34401 kmem:kfree 69920 kmem:kmem_cache_free
131133 kmem:mm_page_free 41 kmem:mm_page_free_batched 11502
132kmem:mm_page_alloc 11375 kmem:mm_page_alloc_zone_locked 0
133kmem:mm_page_pcpu_drain 0 kmem:mm_page_alloc_extfrag 66848602
134cache-references 2917740 cache-misses # 4.365 % of all cache refs
13544.831023415 seconds time elapsed So 'perf stat' gives us a nice easy
136way to get a quick overview of what might be happening for a set of
137events, but normally we'd need a little more detail in order to
138understand what's going on in a way that we can act on in a useful way.
139
140To dive down into a next level of detail, we can use 'perf record'/'perf
141report' which will collect profiling data and present it to use using an
142interactive text-based UI (or simply as text if we specify --stdio to
143'perf report').
144
145As our first attempt at profiling this workload, we'll simply run 'perf
146record', handing it the workload we want to profile (everything after
147'perf record' and any perf options we hand it - here none - will be
148executed in a new shell). perf collects samples until the process exits
149and records them in a file named 'perf.data' in the current working
150directory. root@crownbay:~# perf record wget
151http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
152Connecting to downloads.yoctoproject.org (140.211.169.59:80)
153linux-2.6.19.2.tar.b 100%
154\|************************************************\| 41727k 0:00:00 ETA
155[ perf record: Woken up 1 times to write data ] [ perf record: Captured
156and wrote 0.176 MB perf.data (~7700 samples) ] To see the results in a
157'text-based UI' (tui), simply run 'perf report', which will read the
158perf.data file in the current working directory and display the results
159in an interactive UI: root@crownbay:~# perf report
160
161The above screenshot displays a 'flat' profile, one entry for each
162'bucket' corresponding to the functions that were profiled during the
163profiling run, ordered from the most popular to the least (perf has
164options to sort in various orders and keys as well as display entries
165only above a certain threshold and so on - see the perf documentation
166for details). Note that this includes both userspace functions (entries
167containing a [.]) and kernel functions accounted to the process (entries
168containing a [k]). (perf has command-line modifiers that can be used to
169restrict the profiling to kernel or userspace, among others).
170
171Notice also that the above report shows an entry for 'busybox', which is
172the executable that implements 'wget' in Yocto, but that instead of a
173useful function name in that entry, it displays a not-so-friendly hex
174value instead. The steps below will show how to fix that problem.
175
176Before we do that, however, let's try running a different profile, one
177which shows something a little more interesting. The only difference
178between the new profile and the previous one is that we'll add the -g
179option, which will record not just the address of a sampled function,
180but the entire callchain to the sampled function as well:
181root@crownbay:~# perf record -g wget
182http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
183Connecting to downloads.yoctoproject.org (140.211.169.59:80)
184linux-2.6.19.2.tar.b 100%
185\|************************************************\| 41727k 0:00:00 ETA
186[ perf record: Woken up 3 times to write data ] [ perf record: Captured
187and wrote 0.652 MB perf.data (~28476 samples) ] root@crownbay:~# perf
188report
189
190Using the callgraph view, we can actually see not only which functions
191took the most time, but we can also see a summary of how those functions
192were called and learn something about how the program interacts with the
193kernel in the process.
194
195Notice that each entry in the above screenshot now contains a '+' on the
196left-hand side. This means that we can expand the entry and drill down
197into the callchains that feed into that entry. Pressing 'enter' on any
198one of them will expand the callchain (you can also press 'E' to expand
199them all at the same time or 'C' to collapse them all).
200
201In the screenshot above, we've toggled the \__copy_to_user_ll() entry
202and several subnodes all the way down. This lets us see which callchains
203contributed to the profiled \__copy_to_user_ll() function which
204contributed 1.77% to the total profile.
205
206As a bit of background explanation for these callchains, think about
207what happens at a high level when you run wget to get a file out on the
208network. Basically what happens is that the data comes into the kernel
209via the network connection (socket) and is passed to the userspace
210program 'wget' (which is actually a part of busybox, but that's not
211important for now), which takes the buffers the kernel passes to it and
212writes it to a disk file to save it.
213
214The part of this process that we're looking at in the above call stacks
215is the part where the kernel passes the data it's read from the socket
216down to wget i.e. a copy-to-user.
217
218Notice also that here there's also a case where the hex value is
219displayed in the callstack, here in the expanded sys_clock_gettime()
220function. Later we'll see it resolve to a userspace function call in
221busybox.
222
223The above screenshot shows the other half of the journey for the data -
224from the wget program's userspace buffers to disk. To get the buffers to
225disk, the wget program issues a write(2), which does a copy-from-user to
226the kernel, which then takes care via some circuitous path (probably
227also present somewhere in the profile data), to get it safely to disk.
228
229Now that we've seen the basic layout of the profile data and the basics
230of how to extract useful information out of it, let's get back to the
231task at hand and see if we can get some basic idea about where the time
232is spent in the program we're profiling, wget. Remember that wget is
233actually implemented as an applet in busybox, so while the process name
234is 'wget', the executable we're actually interested in is busybox. So
235let's expand the first entry containing busybox:
236
237Again, before we expanded we saw that the function was labeled with a
238hex value instead of a symbol as with most of the kernel entries.
239Expanding the busybox entry doesn't make it any better.
240
241The problem is that perf can't find the symbol information for the
242busybox binary, which is actually stripped out by the Yocto build
243system.
244
245One way around that is to put the following in your ``local.conf`` file
246when you build the image:
247`INHIBIT_PACKAGE_STRIP <&YOCTO_DOCS_REF_URL;#var-INHIBIT_PACKAGE_STRIP>`__
248= "1" However, we already have an image with the binaries stripped, so
249what can we do to get perf to resolve the symbols? Basically we need to
250install the debuginfo for the busybox package.
251
252To generate the debug info for the packages in the image, we can add
253dbg-pkgs to EXTRA_IMAGE_FEATURES in local.conf. For example:
254EXTRA_IMAGE_FEATURES = "debug-tweaks tools-profile dbg-pkgs"
255Additionally, in order to generate the type of debuginfo that perf
256understands, we also need to set
257```PACKAGE_DEBUG_SPLIT_STYLE`` <&YOCTO_DOCS_REF_URL;#var-PACKAGE_DEBUG_SPLIT_STYLE>`__
258in the ``local.conf`` file: PACKAGE_DEBUG_SPLIT_STYLE =
259'debug-file-directory' Once we've done that, we can install the
260debuginfo for busybox. The debug packages once built can be found in
261build/tmp/deploy/rpm/\* on the host system. Find the busybox-dbg-...rpm
262file and copy it to the target. For example: [trz@empanada core2]$ scp
263/home/trz/yocto/crownbay-tracing-dbg/build/tmp/deploy/rpm/core2_32/busybox-dbg-1.20.2-r2.core2_32.rpm
264root@192.168.1.31: root@192.168.1.31's password:
265busybox-dbg-1.20.2-r2.core2_32.rpm 100% 1826KB 1.8MB/s 00:01 Now install
266the debug rpm on the target: root@crownbay:~# rpm -i
267busybox-dbg-1.20.2-r2.core2_32.rpm Now that the debuginfo is installed,
268we see that the busybox entries now display their functions
269symbolically:
270
271If we expand one of the entries and press 'enter' on a leaf node, we're
272presented with a menu of actions we can take to get more information
273related to that entry:
274
275One of these actions allows us to show a view that displays a
276busybox-centric view of the profiled functions (in this case we've also
277expanded all the nodes using the 'E' key):
278
279Finally, we can see that now that the busybox debuginfo is installed,
280the previously unresolved symbol in the sys_clock_gettime() entry
281mentioned previously is now resolved, and shows that the
282sys_clock_gettime system call that was the source of 6.75% of the
283copy-to-user overhead was initiated by the handle_input() busybox
284function:
285
286At the lowest level of detail, we can dive down to the assembly level
287and see which instructions caused the most overhead in a function.
288Pressing 'enter' on the 'udhcpc_main' function, we're again presented
289with a menu:
290
291Selecting 'Annotate udhcpc_main', we get a detailed listing of
292percentages by instruction for the udhcpc_main function. From the
293display, we can see that over 50% of the time spent in this function is
294taken up by a couple tests and the move of a constant (1) to a register:
295
296As a segue into tracing, let's try another profile using a different
297counter, something other than the default 'cycles'.
298
299The tracing and profiling infrastructure in Linux has become unified in
300a way that allows us to use the same tool with a completely different
301set of counters, not just the standard hardware counters that
302traditional tools have had to restrict themselves to (of course the
303traditional tools can also make use of the expanded possibilities now
304available to them, and in some cases have, as mentioned previously).
305
306We can get a list of the available events that can be used to profile a
307workload via 'perf list': root@crownbay:~# perf list List of pre-defined
308events (to be used in -e): cpu-cycles OR cycles [Hardware event]
309stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
310stalled-cycles-backend OR idle-cycles-backend [Hardware event]
311instructions [Hardware event] cache-references [Hardware event]
312cache-misses [Hardware event] branch-instructions OR branches [Hardware
313event] branch-misses [Hardware event] bus-cycles [Hardware event]
314ref-cycles [Hardware event] cpu-clock [Software event] task-clock
315[Software event] page-faults OR faults [Software event] minor-faults
316[Software event] major-faults [Software event] context-switches OR cs
317[Software event] cpu-migrations OR migrations [Software event]
318alignment-faults [Software event] emulation-faults [Software event]
319L1-dcache-loads [Hardware cache event] L1-dcache-load-misses [Hardware
320cache event] L1-dcache-prefetch-misses [Hardware cache event]
321L1-icache-loads [Hardware cache event] L1-icache-load-misses [Hardware
322cache event] . . . rNNN [Raw hardware event descriptor]
323cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor] (see
324'perf list --help' on how to encode it) mem:<addr>[:access] [Hardware
325breakpoint] sunrpc:rpc_call_status [Tracepoint event]
326sunrpc:rpc_bind_status [Tracepoint event] sunrpc:rpc_connect_status
327[Tracepoint event] sunrpc:rpc_task_begin [Tracepoint event]
328skb:kfree_skb [Tracepoint event] skb:consume_skb [Tracepoint event]
329skb:skb_copy_datagram_iovec [Tracepoint event] net:net_dev_xmit
330[Tracepoint event] net:net_dev_queue [Tracepoint event]
331net:netif_receive_skb [Tracepoint event] net:netif_rx [Tracepoint event]
332napi:napi_poll [Tracepoint event] sock:sock_rcvqueue_full [Tracepoint
333event] sock:sock_exceed_buf_limit [Tracepoint event]
334udp:udp_fail_queue_rcv_skb [Tracepoint event] hda:hda_send_cmd
335[Tracepoint event] hda:hda_get_response [Tracepoint event]
336hda:hda_bus_reset [Tracepoint event] scsi:scsi_dispatch_cmd_start
337[Tracepoint event] scsi:scsi_dispatch_cmd_error [Tracepoint event]
338scsi:scsi_eh_wakeup [Tracepoint event] drm:drm_vblank_event [Tracepoint
339event] drm:drm_vblank_event_queued [Tracepoint event]
340drm:drm_vblank_event_delivered [Tracepoint event] random:mix_pool_bytes
341[Tracepoint event] random:mix_pool_bytes_nolock [Tracepoint event]
342random:credit_entropy_bits [Tracepoint event] gpio:gpio_direction
343[Tracepoint event] gpio:gpio_value [Tracepoint event]
344block:block_rq_abort [Tracepoint event] block:block_rq_requeue
345[Tracepoint event] block:block_rq_issue [Tracepoint event]
346block:block_bio_bounce [Tracepoint event] block:block_bio_complete
347[Tracepoint event] block:block_bio_backmerge [Tracepoint event] . .
348writeback:writeback_wake_thread [Tracepoint event]
349writeback:writeback_wake_forker_thread [Tracepoint event]
350writeback:writeback_bdi_register [Tracepoint event] . .
351writeback:writeback_single_inode_requeue [Tracepoint event]
352writeback:writeback_single_inode [Tracepoint event] kmem:kmalloc
353[Tracepoint event] kmem:kmem_cache_alloc [Tracepoint event]
354kmem:mm_page_alloc [Tracepoint event] kmem:mm_page_alloc_zone_locked
355[Tracepoint event] kmem:mm_page_pcpu_drain [Tracepoint event]
356kmem:mm_page_alloc_extfrag [Tracepoint event]
357vmscan:mm_vmscan_kswapd_sleep [Tracepoint event]
358vmscan:mm_vmscan_kswapd_wake [Tracepoint event]
359vmscan:mm_vmscan_wakeup_kswapd [Tracepoint event]
360vmscan:mm_vmscan_direct_reclaim_begin [Tracepoint event] . .
361module:module_get [Tracepoint event] module:module_put [Tracepoint
362event] module:module_request [Tracepoint event] sched:sched_kthread_stop
363[Tracepoint event] sched:sched_wakeup [Tracepoint event]
364sched:sched_wakeup_new [Tracepoint event] sched:sched_process_fork
365[Tracepoint event] sched:sched_process_exec [Tracepoint event]
366sched:sched_stat_runtime [Tracepoint event] rcu:rcu_utilization
367[Tracepoint event] workqueue:workqueue_queue_work [Tracepoint event]
368workqueue:workqueue_execute_end [Tracepoint event]
369signal:signal_generate [Tracepoint event] signal:signal_deliver
370[Tracepoint event] timer:timer_init [Tracepoint event] timer:timer_start
371[Tracepoint event] timer:hrtimer_cancel [Tracepoint event]
372timer:itimer_state [Tracepoint event] timer:itimer_expire [Tracepoint
373event] irq:irq_handler_entry [Tracepoint event] irq:irq_handler_exit
374[Tracepoint event] irq:softirq_entry [Tracepoint event] irq:softirq_exit
375[Tracepoint event] irq:softirq_raise [Tracepoint event] printk:console
376[Tracepoint event] task:task_newtask [Tracepoint event] task:task_rename
377[Tracepoint event] syscalls:sys_enter_socketcall [Tracepoint event]
378syscalls:sys_exit_socketcall [Tracepoint event] . . .
379syscalls:sys_enter_unshare [Tracepoint event] syscalls:sys_exit_unshare
380[Tracepoint event] raw_syscalls:sys_enter [Tracepoint event]
381raw_syscalls:sys_exit [Tracepoint event]
382
383.. container:: informalexample
384
385 Tying it Together:
386 These are exactly the same set of events defined by the trace event
387 subsystem and exposed by ftrace/tracecmd/kernelshark as files in
388 /sys/kernel/debug/tracing/events, by SystemTap as
389 kernel.trace("tracepoint_name") and (partially) accessed by LTTng.
390
391Only a subset of these would be of interest to us when looking at this
392workload, so let's choose the most likely subsystems (identified by the
393string before the colon in the Tracepoint events) and do a 'perf stat'
394run using only those wildcarded subsystems: root@crownbay:~# perf stat
395-e skb:\* -e net:\* -e napi:\* -e sched:\* -e workqueue:\* -e irq:\* -e
396syscalls:\* wget
397http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
398Performance counter stats for 'wget
399http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2':
40023323 skb:kfree_skb 0 skb:consume_skb 49897 skb:skb_copy_datagram_iovec
4016217 net:net_dev_xmit 6217 net:net_dev_queue 7962 net:netif_receive_skb
4022 net:netif_rx 8340 napi:napi_poll 0 sched:sched_kthread_stop 0
403sched:sched_kthread_stop_ret 3749 sched:sched_wakeup 0
404sched:sched_wakeup_new 0 sched:sched_switch 29 sched:sched_migrate_task
4050 sched:sched_process_free 1 sched:sched_process_exit 0
406sched:sched_wait_task 0 sched:sched_process_wait 0
407sched:sched_process_fork 1 sched:sched_process_exec 0
408sched:sched_stat_wait 2106519415641 sched:sched_stat_sleep 0
409sched:sched_stat_iowait 147453613 sched:sched_stat_blocked 12903026955
410sched:sched_stat_runtime 0 sched:sched_pi_setprio 3574
411workqueue:workqueue_queue_work 3574 workqueue:workqueue_activate_work 0
412workqueue:workqueue_execute_start 0 workqueue:workqueue_execute_end
41316631 irq:irq_handler_entry 16631 irq:irq_handler_exit 28521
414irq:softirq_entry 28521 irq:softirq_exit 28728 irq:softirq_raise 1
415syscalls:sys_enter_sendmmsg 1 syscalls:sys_exit_sendmmsg 0
416syscalls:sys_enter_recvmmsg 0 syscalls:sys_exit_recvmmsg 14
417syscalls:sys_enter_socketcall 14 syscalls:sys_exit_socketcall . . .
41816965 syscalls:sys_enter_read 16965 syscalls:sys_exit_read 12854
419syscalls:sys_enter_write 12854 syscalls:sys_exit_write . . .
42058.029710972 seconds time elapsed Let's pick one of these tracepoints
421and tell perf to do a profile using it as the sampling event:
422root@crownbay:~# perf record -g -e sched:sched_wakeup wget
423http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
424
425The screenshot above shows the results of running a profile using
426sched:sched_switch tracepoint, which shows the relative costs of various
427paths to sched_wakeup (note that sched_wakeup is the name of the
428tracepoint - it's actually defined just inside ttwu_do_wakeup(), which
429accounts for the function name actually displayed in the profile: /\* \*
430Mark the task runnable and perform wakeup-preemption. \*/ static void
431ttwu_do_wakeup(struct rq \*rq, struct task_struct \*p, int wake_flags) {
432trace_sched_wakeup(p, true); . . . } A couple of the more interesting
433callchains are expanded and displayed above, basically some network
434receive paths that presumably end up waking up wget (busybox) when
435network data is ready.
436
437Note that because tracepoints are normally used for tracing, the default
438sampling period for tracepoints is 1 i.e. for tracepoints perf will
439sample on every event occurrence (this can be changed using the -c
440option). This is in contrast to hardware counters such as for example
441the default 'cycles' hardware counter used for normal profiling, where
442sampling periods are much higher (in the thousands) because profiling
443should have as low an overhead as possible and sampling on every cycle
444would be prohibitively expensive.
445
446Using perf to do Basic Tracing
447~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
448
449Profiling is a great tool for solving many problems or for getting a
450high-level view of what's going on with a workload or across the system.
451It is however by definition an approximation, as suggested by the most
452prominent word associated with it, 'sampling'. On the one hand, it
453allows a representative picture of what's going on in the system to be
454cheaply taken, but on the other hand, that cheapness limits its utility
455when that data suggests a need to 'dive down' more deeply to discover
456what's really going on. In such cases, the only way to see what's really
457going on is to be able to look at (or summarize more intelligently) the
458individual steps that go into the higher-level behavior exposed by the
459coarse-grained profiling data.
460
461As a concrete example, we can trace all the events we think might be
462applicable to our workload: root@crownbay:~# perf record -g -e skb:\* -e
463net:\* -e napi:\* -e sched:sched_switch -e sched:sched_wakeup -e irq:\*
464-e syscalls:sys_enter_read -e syscalls:sys_exit_read -e
465syscalls:sys_enter_write -e syscalls:sys_exit_write wget
466http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
467We can look at the raw trace output using 'perf script' with no
468arguments: root@crownbay:~# perf script perf 1262 [000] 11624.857082:
469sys_exit_read: 0x0 perf 1262 [000] 11624.857193: sched_wakeup:
470comm=migration/0 pid=6 prio=0 success=1 target_cpu=000 wget 1262 [001]
47111624.858021: softirq_raise: vec=1 [action=TIMER] wget 1262 [001]
47211624.858074: softirq_entry: vec=1 [action=TIMER] wget 1262 [001]
47311624.858081: softirq_exit: vec=1 [action=TIMER] wget 1262 [001]
47411624.858166: sys_enter_read: fd: 0x0003, buf: 0xbf82c940, count: 0x0200
475wget 1262 [001] 11624.858177: sys_exit_read: 0x200 wget 1262 [001]
47611624.858878: kfree_skb: skbaddr=0xeb248d80 protocol=0
477location=0xc15a5308 wget 1262 [001] 11624.858945: kfree_skb:
478skbaddr=0xeb248000 protocol=0 location=0xc15a5308 wget 1262 [001]
47911624.859020: softirq_raise: vec=1 [action=TIMER] wget 1262 [001]
48011624.859076: softirq_entry: vec=1 [action=TIMER] wget 1262 [001]
48111624.859083: softirq_exit: vec=1 [action=TIMER] wget 1262 [001]
48211624.859167: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
483wget 1262 [001] 11624.859192: sys_exit_read: 0x1d7 wget 1262 [001]
48411624.859228: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
485wget 1262 [001] 11624.859233: sys_exit_read: 0x0 wget 1262 [001]
48611624.859573: sys_enter_read: fd: 0x0003, buf: 0xbf82c580, count: 0x0200
487wget 1262 [001] 11624.859584: sys_exit_read: 0x200 wget 1262 [001]
48811624.859864: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
489wget 1262 [001] 11624.859888: sys_exit_read: 0x400 wget 1262 [001]
49011624.859935: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
491wget 1262 [001] 11624.859944: sys_exit_read: 0x400 This gives us a
492detailed timestamped sequence of events that occurred within the
493workload with respect to those events.
494
495In many ways, profiling can be viewed as a subset of tracing -
496theoretically, if you have a set of trace events that's sufficient to
497capture all the important aspects of a workload, you can derive any of
498the results or views that a profiling run can.
499
500Another aspect of traditional profiling is that while powerful in many
501ways, it's limited by the granularity of the underlying data. Profiling
502tools offer various ways of sorting and presenting the sample data,
503which make it much more useful and amenable to user experimentation, but
504in the end it can't be used in an open-ended way to extract data that
505just isn't present as a consequence of the fact that conceptually, most
506of it has been thrown away.
507
508Full-blown detailed tracing data does however offer the opportunity to
509manipulate and present the information collected during a tracing run in
510an infinite variety of ways.
511
512Another way to look at it is that there are only so many ways that the
513'primitive' counters can be used on their own to generate interesting
514output; to get anything more complicated than simple counts requires
515some amount of additional logic, which is typically very specific to the
516problem at hand. For example, if we wanted to make use of a 'counter'
517that maps to the value of the time difference between when a process was
518scheduled to run on a processor and the time it actually ran, we
519wouldn't expect such a counter to exist on its own, but we could derive
520one called say 'wakeup_latency' and use it to extract a useful view of
521that metric from trace data. Likewise, we really can't figure out from
522standard profiling tools how much data every process on the system reads
523and writes, along with how many of those reads and writes fail
524completely. If we have sufficient trace data, however, we could with the
525right tools easily extract and present that information, but we'd need
526something other than pre-canned profiling tools to do that.
527
528Luckily, there is a general-purpose way to handle such needs, called
529'programming languages'. Making programming languages easily available
530to apply to such problems given the specific format of data is called a
531'programming language binding' for that data and language. Perf supports
532two programming language bindings, one for Python and one for Perl.
533
534.. container:: informalexample
535
536 Tying it Together:
537 Language bindings for manipulating and aggregating trace data are of
538 course not a new idea. One of the first projects to do this was IBM's
539 DProbes dpcc compiler, an ANSI C compiler which targeted a low-level
540 assembly language running on an in-kernel interpreter on the target
541 system. This is exactly analogous to what Sun's DTrace did, except
542 that DTrace invented its own language for the purpose. Systemtap,
543 heavily inspired by DTrace, also created its own one-off language,
544 but rather than running the product on an in-kernel interpreter,
545 created an elaborate compiler-based machinery to translate its
546 language into kernel modules written in C.
547
548Now that we have the trace data in perf.data, we can use 'perf script
549-g' to generate a skeleton script with handlers for the read/write
550entry/exit events we recorded: root@crownbay:~# perf script -g python
551generated Python script: perf-script.py The skeleton script simply
552creates a python function for each event type in the perf.data file. The
553body of each function simply prints the event name along with its
554parameters. For example: def net__netif_rx(event_name, context,
555common_cpu, common_secs, common_nsecs, common_pid, common_comm, skbaddr,
556len, name): print_header(event_name, common_cpu, common_secs,
557common_nsecs, common_pid, common_comm) print "skbaddr=%u, len=%u,
558name=%s\n" % (skbaddr, len, name), We can run that script directly to
559print all of the events contained in the perf.data file:
560root@crownbay:~# perf script -s perf-script.py in trace_begin
561syscalls__sys_exit_read 0 11624.857082795 1262 perf nr=3, ret=0
562sched__sched_wakeup 0 11624.857193498 1262 perf comm=migration/0, pid=6,
563prio=0, success=1, target_cpu=0 irq__softirq_raise 1 11624.858021635
5641262 wget vec=TIMER irq__softirq_entry 1 11624.858074075 1262 wget
565vec=TIMER irq__softirq_exit 1 11624.858081389 1262 wget vec=TIMER
566syscalls__sys_enter_read 1 11624.858166434 1262 wget nr=3, fd=3,
567buf=3213019456, count=512 syscalls__sys_exit_read 1 11624.858177924 1262
568wget nr=3, ret=512 skb__kfree_skb 1 11624.858878188 1262 wget
569skbaddr=3945041280, location=3243922184, protocol=0 skb__kfree_skb 1
57011624.858945608 1262 wget skbaddr=3945037824, location=3243922184,
571protocol=0 irq__softirq_raise 1 11624.859020942 1262 wget vec=TIMER
572irq__softirq_entry 1 11624.859076935 1262 wget vec=TIMER
573irq__softirq_exit 1 11624.859083469 1262 wget vec=TIMER
574syscalls__sys_enter_read 1 11624.859167565 1262 wget nr=3, fd=3,
575buf=3077701632, count=1024 syscalls__sys_exit_read 1 11624.859192533
5761262 wget nr=3, ret=471 syscalls__sys_enter_read 1 11624.859228072 1262
577wget nr=3, fd=3, buf=3077701632, count=1024 syscalls__sys_exit_read 1
57811624.859233707 1262 wget nr=3, ret=0 syscalls__sys_enter_read 1
57911624.859573008 1262 wget nr=3, fd=3, buf=3213018496, count=512
580syscalls__sys_exit_read 1 11624.859584818 1262 wget nr=3, ret=512
581syscalls__sys_enter_read 1 11624.859864562 1262 wget nr=3, fd=3,
582buf=3077701632, count=1024 syscalls__sys_exit_read 1 11624.859888770
5831262 wget nr=3, ret=1024 syscalls__sys_enter_read 1 11624.859935140 1262
584wget nr=3, fd=3, buf=3077701632, count=1024 syscalls__sys_exit_read 1
58511624.859944032 1262 wget nr=3, ret=1024 That in itself isn't very
586useful; after all, we can accomplish pretty much the same thing by
587simply running 'perf script' without arguments in the same directory as
588the perf.data file.
589
590We can however replace the print statements in the generated function
591bodies with whatever we want, and thereby make it infinitely more
592useful.
593
594As a simple example, let's just replace the print statements in the
595function bodies with a simple function that does nothing but increment a
596per-event count. When the program is run against a perf.data file, each
597time a particular event is encountered, a tally is incremented for that
598event. For example: def net__netif_rx(event_name, context, common_cpu,
599common_secs, common_nsecs, common_pid, common_comm, skbaddr, len, name):
600inc_counts(event_name) Each event handler function in the generated code
601is modified to do this. For convenience, we define a common function
602called inc_counts() that each handler calls; inc_counts() simply tallies
603a count for each event using the 'counts' hash, which is a specialized
604hash function that does Perl-like autovivification, a capability that's
605extremely useful for kinds of multi-level aggregation commonly used in
606processing traces (see perf's documentation on the Python language
607binding for details): counts = autodict() def inc_counts(event_name):
608try: counts[event_name] += 1 except TypeError: counts[event_name] = 1
609Finally, at the end of the trace processing run, we want to print the
610result of all the per-event tallies. For that, we use the special
611'trace_end()' function: def trace_end(): for event_name, count in
612counts.iteritems(): print "%-40s %10s\n" % (event_name, count) The end
613result is a summary of all the events recorded in the trace:
614skb__skb_copy_datagram_iovec 13148 irq__softirq_entry 4796
615irq__irq_handler_exit 3805 irq__softirq_exit 4795
616syscalls__sys_enter_write 8990 net__net_dev_xmit 652 skb__kfree_skb 4047
617sched__sched_wakeup 1155 irq__irq_handler_entry 3804 irq__softirq_raise
6184799 net__net_dev_queue 652 syscalls__sys_enter_read 17599
619net__netif_receive_skb 1743 syscalls__sys_exit_read 17598 net__netif_rx
6202 napi__napi_poll 1877 syscalls__sys_exit_write 8990 Note that this is
621pretty much exactly the same information we get from 'perf stat', which
622goes a little way to support the idea mentioned previously that given
623the right kind of trace data, higher-level profiling-type summaries can
624be derived from it.
625
626Documentation on using the `'perf script' python
627binding <http://linux.die.net/man/1/perf-script-python>`__.
628
629System-Wide Tracing and Profiling
630~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
631
632The examples so far have focused on tracing a particular program or
633workload - in other words, every profiling run has specified the program
634to profile in the command-line e.g. 'perf record wget ...'.
635
636It's also possible, and more interesting in many cases, to run a
637system-wide profile or trace while running the workload in a separate
638shell.
639
640To do system-wide profiling or tracing, you typically use the -a flag to
641'perf record'.
642
643To demonstrate this, open up one window and start the profile using the
644-a flag (press Ctrl-C to stop tracing): root@crownbay:~# perf record -g
645-a ^C[ perf record: Woken up 6 times to write data ] [ perf record:
646Captured and wrote 1.400 MB perf.data (~61172 samples) ] In another
647window, run the wget test: root@crownbay:~# wget
648http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
649Connecting to downloads.yoctoproject.org (140.211.169.59:80)
650linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k
6510:00:00 ETA Here we see entries not only for our wget load, but for
652other processes running on the system as well:
653
654In the snapshot above, we can see callchains that originate in libc, and
655a callchain from Xorg that demonstrates that we're using a proprietary X
656driver in userspace (notice the presence of 'PVR' and some other
657unresolvable symbols in the expanded Xorg callchain).
658
659Note also that we have both kernel and userspace entries in the above
660snapshot. We can also tell perf to focus on userspace but providing a
661modifier, in this case 'u', to the 'cycles' hardware counter when we
662record a profile: root@crownbay:~# perf record -g -a -e cycles:u ^C[
663perf record: Woken up 2 times to write data ] [ perf record: Captured
664and wrote 0.376 MB perf.data (~16443 samples) ]
665
666Notice in the screenshot above, we see only userspace entries ([.])
667
668Finally, we can press 'enter' on a leaf node and select the 'Zoom into
669DSO' menu item to show only entries associated with a specific DSO. In
670the screenshot below, we've zoomed into the 'libc' DSO which shows all
671the entries associated with the libc-xxx.so DSO.
672
673We can also use the system-wide -a switch to do system-wide tracing.
674Here we'll trace a couple of scheduler events: root@crownbay:~# perf
675record -a -e sched:sched_switch -e sched:sched_wakeup ^C[ perf record:
676Woken up 38 times to write data ] [ perf record: Captured and wrote
6779.780 MB perf.data (~427299 samples) ] We can look at the raw output
678using 'perf script' with no arguments: root@crownbay:~# perf script perf
6791383 [001] 6171.460045: sched_wakeup: comm=kworker/1:1 pid=21 prio=120
680success=1 target_cpu=001 perf 1383 [001] 6171.460066: sched_switch:
681prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==>
682next_comm=kworker/1:1 next_pid=21 next_prio=120 kworker/1:1 21 [001]
6836171.460093: sched_switch: prev_comm=kworker/1:1 prev_pid=21
684prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383
685next_prio=120 swapper 0 [000] 6171.468063: sched_wakeup:
686comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000 swapper 0
687[000] 6171.468107: sched_switch: prev_comm=swapper/0 prev_pid=0
688prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209
689next_prio=120 kworker/0:3 1209 [000] 6171.468143: sched_switch:
690prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==>
691next_comm=swapper/0 next_pid=0 next_prio=120 perf 1383 [001]
6926171.470039: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1
693target_cpu=001 perf 1383 [001] 6171.470058: sched_switch: prev_comm=perf
694prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1
695next_pid=21 next_prio=120 kworker/1:1 21 [001] 6171.470082:
696sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120
697prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120 perf 1383
698[001] 6171.480035: sched_wakeup: comm=kworker/1:1 pid=21 prio=120
699success=1 target_cpu=001
700
701.. _perf-filtering:
702
703Filtering
704^^^^^^^^^
705
706Notice that there are a lot of events that don't really have anything to
707do with what we're interested in, namely events that schedule 'perf'
708itself in and out or that wake perf up. We can get rid of those by using
709the '--filter' option - for each event we specify using -e, we can add a
710--filter after that to filter out trace events that contain fields with
711specific values: root@crownbay:~# perf record -a -e sched:sched_switch
712--filter 'next_comm != perf && prev_comm != perf' -e sched:sched_wakeup
713--filter 'comm != perf' ^C[ perf record: Woken up 38 times to write data
714] [ perf record: Captured and wrote 9.688 MB perf.data (~423279 samples)
715] root@crownbay:~# perf script swapper 0 [000] 7932.162180:
716sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R
717==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 kworker/0:3 1209
718[000] 7932.162236: sched_switch: prev_comm=kworker/0:3 prev_pid=1209
719prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0
720next_prio=120 perf 1407 [001] 7932.170048: sched_wakeup:
721comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 perf 1407
722[001] 7932.180044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120
723success=1 target_cpu=001 perf 1407 [001] 7932.190038: sched_wakeup:
724comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 perf 1407
725[001] 7932.200044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120
726success=1 target_cpu=001 perf 1407 [001] 7932.210044: sched_wakeup:
727comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 perf 1407
728[001] 7932.220044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120
729success=1 target_cpu=001 swapper 0 [001] 7932.230111: sched_wakeup:
730comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 swapper 0
731[001] 7932.230146: sched_switch: prev_comm=swapper/1 prev_pid=0
732prev_prio=120 prev_state=R ==> next_comm=kworker/1:1 next_pid=21
733next_prio=120 kworker/1:1 21 [001] 7932.230205: sched_switch:
734prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==>
735next_comm=swapper/1 next_pid=0 next_prio=120 swapper 0 [000]
7367932.326109: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1
737target_cpu=000 swapper 0 [000] 7932.326171: sched_switch:
738prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==>
739next_comm=kworker/0:3 next_pid=1209 next_prio=120 kworker/0:3 1209 [000]
7407932.326214: sched_switch: prev_comm=kworker/0:3 prev_pid=1209
741prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0
742next_prio=120 In this case, we've filtered out all events that have
743'perf' in their 'comm' or 'comm_prev' or 'comm_next' fields. Notice that
744there are still events recorded for perf, but notice that those events
745don't have values of 'perf' for the filtered fields. To completely
746filter out anything from perf will require a bit more work, but for the
747purpose of demonstrating how to use filters, it's close enough.
748
749.. container:: informalexample
750
751 Tying it Together:
752 These are exactly the same set of event filters defined by the trace
753 event subsystem. See the ftrace/tracecmd/kernelshark section for more
754 discussion about these event filters.
755
756.. container:: informalexample
757
758 Tying it Together:
759 These event filters are implemented by a special-purpose
760 pseudo-interpreter in the kernel and are an integral and
761 indispensable part of the perf design as it relates to tracing.
762 kernel-based event filters provide a mechanism to precisely throttle
763 the event stream that appears in user space, where it makes sense to
764 provide bindings to real programming languages for postprocessing the
765 event stream. This architecture allows for the intelligent and
766 flexible partitioning of processing between the kernel and user
767 space. Contrast this with other tools such as SystemTap, which does
768 all of its processing in the kernel and as such requires a special
769 project-defined language in order to accommodate that design, or
770 LTTng, where everything is sent to userspace and as such requires a
771 super-efficient kernel-to-userspace transport mechanism in order to
772 function properly. While perf certainly can benefit from for instance
773 advances in the design of the transport, it doesn't fundamentally
774 depend on them. Basically, if you find that your perf tracing
775 application is causing buffer I/O overruns, it probably means that
776 you aren't taking enough advantage of the kernel filtering engine.
777
778Using Dynamic Tracepoints
779~~~~~~~~~~~~~~~~~~~~~~~~~
780
781perf isn't restricted to the fixed set of static tracepoints listed by
782'perf list'. Users can also add their own 'dynamic' tracepoints anywhere
783in the kernel. For instance, suppose we want to define our own
784tracepoint on do_fork(). We can do that using the 'perf probe' perf
785subcommand: root@crownbay:~# perf probe do_fork Added new event:
786probe:do_fork (on do_fork) You can now use it in all perf tools, such
787as: perf record -e probe:do_fork -aR sleep 1 Adding a new tracepoint via
788'perf probe' results in an event with all the expected files and format
789in /sys/kernel/debug/tracing/events, just the same as for static
790tracepoints (as discussed in more detail in the trace events subsystem
791section: root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork#
792ls -al drwxr-xr-x 2 root root 0 Oct 28 11:42 . drwxr-xr-x 3 root root 0
793Oct 28 11:42 .. -rw-r--r-- 1 root root 0 Oct 28 11:42 enable -rw-r--r--
7941 root root 0 Oct 28 11:42 filter -r--r--r-- 1 root root 0 Oct 28 11:42
795format -r--r--r-- 1 root root 0 Oct 28 11:42 id
796root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# cat format
797name: do_fork ID: 944 format: field:unsigned short common_type;
798offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2;
799size:1; signed:0; field:unsigned char common_preempt_count; offset:3;
800size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1;
801field:int common_padding; offset:8; size:4; signed:1; field:unsigned
802long \__probe_ip; offset:12; size:4; signed:0; print fmt: "(%lx)",
803REC->__probe_ip We can list all dynamic tracepoints currently in
804existence: root@crownbay:~# perf probe -l probe:do_fork (on do_fork)
805probe:schedule (on schedule) Let's record system-wide ('sleep 30' is a
806trick for recording system-wide but basically do nothing and then wake
807up after 30 seconds): root@crownbay:~# perf record -g -a -e
808probe:do_fork sleep 30 [ perf record: Woken up 1 times to write data ] [
809perf record: Captured and wrote 0.087 MB perf.data (~3812 samples) ]
810Using 'perf script' we can see each do_fork event that fired:
811root@crownbay:~# perf script # ======== # captured on: Sun Oct 28
81211:55:18 2012 # hostname : crownbay # os release : 3.4.11-yocto-standard
813# perf version : 3.4.11 # arch : i686 # nrcpus online : 2 # nrcpus avail
814: 2 # cpudesc : Intel(R) Atom(TM) CPU E660 @ 1.30GHz # cpuid :
815GenuineIntel,6,38,1 # total memory : 1017184 kB # cmdline :
816/usr/bin/perf record -g -a -e probe:do_fork sleep 30 # event : name =
817probe:do_fork, type = 2, config = 0x3b0, config1 = 0x0, config2 = 0x0,
818excl_usr = 0, excl_kern = 0, id = { 5, 6 } # HEADER_CPU_TOPOLOGY info
819available, use -I to display # ======== # matchbox-deskto 1197 [001]
82034211.378318: do_fork: (c1028460) matchbox-deskto 1295 [001]
82134211.380388: do_fork: (c1028460) pcmanfm 1296 [000] 34211.632350:
822do_fork: (c1028460) pcmanfm 1296 [000] 34211.639917: do_fork: (c1028460)
823matchbox-deskto 1197 [001] 34217.541603: do_fork: (c1028460)
824matchbox-deskto 1299 [001] 34217.543584: do_fork: (c1028460) gthumb 1300
825[001] 34217.697451: do_fork: (c1028460) gthumb 1300 [001] 34219.085734:
826do_fork: (c1028460) gthumb 1300 [000] 34219.121351: do_fork: (c1028460)
827gthumb 1300 [001] 34219.264551: do_fork: (c1028460) pcmanfm 1296 [000]
82834219.590380: do_fork: (c1028460) matchbox-deskto 1197 [001]
82934224.955965: do_fork: (c1028460) matchbox-deskto 1306 [001]
83034224.957972: do_fork: (c1028460) matchbox-termin 1307 [000]
83134225.038214: do_fork: (c1028460) matchbox-termin 1307 [001]
83234225.044218: do_fork: (c1028460) matchbox-termin 1307 [000]
83334225.046442: do_fork: (c1028460) matchbox-deskto 1197 [001]
83434237.112138: do_fork: (c1028460) matchbox-deskto 1311 [001]
83534237.114106: do_fork: (c1028460) gaku 1312 [000] 34237.202388: do_fork:
836(c1028460) And using 'perf report' on the same file, we can see the
837callgraphs from starting a few programs during those 30 seconds:
838
839.. container:: informalexample
840
841 Tying it Together:
842 The trace events subsystem accommodate static and dynamic tracepoints
843 in exactly the same way - there's no difference as far as the
844 infrastructure is concerned. See the ftrace section for more details
845 on the trace event subsystem.
846
847.. container:: informalexample
848
849 Tying it Together:
850 Dynamic tracepoints are implemented under the covers by kprobes and
851 uprobes. kprobes and uprobes are also used by and in fact are the
852 main focus of SystemTap.
853
854.. _perf-documentation:
855
856Documentation
857-------------
858
859Online versions of the man pages for the commands discussed in this
860section can be found here:
861
862- The `'perf stat' manpage <http://linux.die.net/man/1/perf-stat>`__.
863
864- The `'perf record'
865 manpage <http://linux.die.net/man/1/perf-record>`__.
866
867- The `'perf report'
868 manpage <http://linux.die.net/man/1/perf-report>`__.
869
870- The `'perf probe' manpage <http://linux.die.net/man/1/perf-probe>`__.
871
872- The `'perf script'
873 manpage <http://linux.die.net/man/1/perf-script>`__.
874
875- Documentation on using the `'perf script' python
876 binding <http://linux.die.net/man/1/perf-script-python>`__.
877
878- The top-level `perf(1) manpage <http://linux.die.net/man/1/perf>`__.
879
880Normally, you should be able to invoke the man pages via perf itself
881e.g. 'perf help' or 'perf help record'.
882
883However, by default Yocto doesn't install man pages, but perf invokes
884the man pages for most help functionality. This is a bug and is being
885addressed by a Yocto bug: `Bug 3388 - perf: enable man pages for basic
886'help'
887functionality <https://bugzilla.yoctoproject.org/show_bug.cgi?id=3388>`__.
888
889The man pages in text form, along with some other files, such as a set
890of examples, can be found in the 'perf' directory of the kernel tree:
891tools/perf/Documentation There's also a nice perf tutorial on the perf
892wiki that goes into more detail than we do here in certain areas: `Perf
893Tutorial <https://perf.wiki.kernel.org/index.php/Tutorial>`__
894
895.. _profile-manual-ftrace:
896
897ftrace
898======
899
900'ftrace' literally refers to the 'ftrace function tracer' but in reality
901this encompasses a number of related tracers along with the
902infrastructure that they all make use of.
903
904.. _ftrace-setup:
905
906Setup
907-----
908
909For this section, we'll assume you've already performed the basic setup
910outlined in the General Setup section.
911
912ftrace, trace-cmd, and kernelshark run on the target system, and are
913ready to go out-of-the-box - no additional setup is necessary. For the
914rest of this section we assume you've ssh'ed to the host and will be
915running ftrace on the target. kernelshark is a GUI application and if
916you use the '-X' option to ssh you can have the kernelshark GUI run on
917the target but display remotely on the host if you want.
918
919Basic ftrace usage
920------------------
921
922'ftrace' essentially refers to everything included in the /tracing
923directory of the mounted debugfs filesystem (Yocto follows the standard
924convention and mounts it at /sys/kernel/debug). Here's a listing of all
925the files found in /sys/kernel/debug/tracing on a Yocto system:
926root@sugarbay:/sys/kernel/debug/tracing# ls README kprobe_events trace
927available_events kprobe_profile trace_clock available_filter_functions
928options trace_marker available_tracers per_cpu trace_options
929buffer_size_kb printk_formats trace_pipe buffer_total_size_kb
930saved_cmdlines tracing_cpumask current_tracer set_event tracing_enabled
931dyn_ftrace_total_info set_ftrace_filter tracing_on enabled_functions
932set_ftrace_notrace tracing_thresh events set_ftrace_pid free_buffer
933set_graph_function The files listed above are used for various purposes
934- some relate directly to the tracers themselves, others are used to set
935tracing options, and yet others actually contain the tracing output when
936a tracer is in effect. Some of the functions can be guessed from their
937names, others need explanation; in any case, we'll cover some of the
938files we see here below but for an explanation of the others, please see
939the ftrace documentation.
940
941We'll start by looking at some of the available built-in tracers.
942
943cat'ing the 'available_tracers' file lists the set of available tracers:
944root@sugarbay:/sys/kernel/debug/tracing# cat available_tracers blk
945function_graph function nop The 'current_tracer' file contains the
946tracer currently in effect: root@sugarbay:/sys/kernel/debug/tracing# cat
947current_tracer nop The above listing of current_tracer shows that the
948'nop' tracer is in effect, which is just another way of saying that
949there's actually no tracer currently in effect.
950
951echo'ing one of the available_tracers into current_tracer makes the
952specified tracer the current tracer:
953root@sugarbay:/sys/kernel/debug/tracing# echo function > current_tracer
954root@sugarbay:/sys/kernel/debug/tracing# cat current_tracer function The
955above sets the current tracer to be the 'function tracer'. This tracer
956traces every function call in the kernel and makes it available as the
957contents of the 'trace' file. Reading the 'trace' file lists the
958currently buffered function calls that have been traced by the function
959tracer: root@sugarbay:/sys/kernel/debug/tracing# cat trace \| less #
960tracer: function # # entries-in-buffer/entries-written: 310629/766471
961#P:8 # # \_-----=> irqs-off # / \_----=> need-resched # \| / \_---=>
962hardirq/softirq # \|\| / \_--=> preempt-depth # \||\| / delay # TASK-PID
963CPU# \|||\| TIMESTAMP FUNCTION # \| \| \| \|||\| \| \| <idle>-0 [004]
964d..1 470.867169: ktime_get_real <-intel_idle <idle>-0 [004] d..1
965470.867170: getnstimeofday <-ktime_get_real <idle>-0 [004] d..1
966470.867171: ns_to_timeval <-intel_idle <idle>-0 [004] d..1 470.867171:
967ns_to_timespec <-ns_to_timeval <idle>-0 [004] d..1 470.867172:
968smp_apic_timer_interrupt <-apic_timer_interrupt <idle>-0 [004] d..1
969470.867172: native_apic_mem_write <-smp_apic_timer_interrupt <idle>-0
970[004] d..1 470.867172: irq_enter <-smp_apic_timer_interrupt <idle>-0
971[004] d..1 470.867172: rcu_irq_enter <-irq_enter <idle>-0 [004] d..1
972470.867173: rcu_idle_exit_common.isra.33 <-rcu_irq_enter <idle>-0 [004]
973d..1 470.867173: local_bh_disable <-irq_enter <idle>-0 [004] d..1
974470.867173: add_preempt_count <-local_bh_disable <idle>-0 [004] d.s1
975470.867174: tick_check_idle <-irq_enter <idle>-0 [004] d.s1 470.867174:
976tick_check_oneshot_broadcast <-tick_check_idle <idle>-0 [004] d.s1
977470.867174: ktime_get <-tick_check_idle <idle>-0 [004] d.s1 470.867174:
978tick_nohz_stop_idle <-tick_check_idle <idle>-0 [004] d.s1 470.867175:
979update_ts_time_stats <-tick_nohz_stop_idle <idle>-0 [004] d.s1
980470.867175: nr_iowait_cpu <-update_ts_time_stats <idle>-0 [004] d.s1
981470.867175: tick_do_update_jiffies64 <-tick_check_idle <idle>-0 [004]
982d.s1 470.867175: \_raw_spin_lock <-tick_do_update_jiffies64 <idle>-0
983[004] d.s1 470.867176: add_preempt_count <-_raw_spin_lock <idle>-0 [004]
984d.s2 470.867176: do_timer <-tick_do_update_jiffies64 <idle>-0 [004] d.s2
985470.867176: \_raw_spin_lock <-do_timer <idle>-0 [004] d.s2 470.867176:
986add_preempt_count <-_raw_spin_lock <idle>-0 [004] d.s3 470.867177:
987ntp_tick_length <-do_timer <idle>-0 [004] d.s3 470.867177:
988\_raw_spin_lock_irqsave <-ntp_tick_length . . . Each line in the trace
989above shows what was happening in the kernel on a given cpu, to the
990level of detail of function calls. Each entry shows the function called,
991followed by its caller (after the arrow).
992
993The function tracer gives you an extremely detailed idea of what the
994kernel was doing at the point in time the trace was taken, and is a
995great way to learn about how the kernel code works in a dynamic sense.
996
997.. container:: informalexample
998
999 Tying it Together:
1000 The ftrace function tracer is also available from within perf, as the
1001 ftrace:function tracepoint.
1002
1003It is a little more difficult to follow the call chains than it needs to
1004be - luckily there's a variant of the function tracer that displays the
1005callchains explicitly, called the 'function_graph' tracer:
1006root@sugarbay:/sys/kernel/debug/tracing# echo function_graph >
1007current_tracer root@sugarbay:/sys/kernel/debug/tracing# cat trace \|
1008less tracer: function_graph CPU DURATION FUNCTION CALLS \| \| \| \| \|
1009\| \| 7) 0.046 us \| pick_next_task_fair(); 7) 0.043 us \|
1010pick_next_task_stop(); 7) 0.042 us \| pick_next_task_rt(); 7) 0.032 us
1011\| pick_next_task_fair(); 7) 0.030 us \| pick_next_task_idle(); 7) \|
1012\_raw_spin_unlock_irq() { 7) 0.033 us \| sub_preempt_count(); 7) 0.258
1013us \| } 7) 0.032 us \| sub_preempt_count(); 7) + 13.341 us \| } /\*
1014\__schedule \*/ 7) 0.095 us \| } /\* sub_preempt_count \*/ 7) \|
1015schedule() { 7) \| \__schedule() { 7) 0.060 us \| add_preempt_count();
10167) 0.044 us \| rcu_note_context_switch(); 7) \| \_raw_spin_lock_irq() {
10177) 0.033 us \| add_preempt_count(); 7) 0.247 us \| } 7) \|
1018idle_balance() { 7) \| \_raw_spin_unlock() { 7) 0.031 us \|
1019sub_preempt_count(); 7) 0.246 us \| } 7) \| update_shares() { 7) 0.030
1020us \| \__rcu_read_lock(); 7) 0.029 us \| \__rcu_read_unlock(); 7) 0.484
1021us \| } 7) 0.030 us \| \__rcu_read_lock(); 7) \| load_balance() { 7) \|
1022find_busiest_group() { 7) 0.031 us \| idle_cpu(); 7) 0.029 us \|
1023idle_cpu(); 7) 0.035 us \| idle_cpu(); 7) 0.906 us \| } 7) 1.141 us \| }
10247) 0.022 us \| msecs_to_jiffies(); 7) \| load_balance() { 7) \|
1025find_busiest_group() { 7) 0.031 us \| idle_cpu(); . . . 4) 0.062 us \|
1026msecs_to_jiffies(); 4) 0.062 us \| \__rcu_read_unlock(); 4) \|
1027\_raw_spin_lock() { 4) 0.073 us \| add_preempt_count(); 4) 0.562 us \| }
10284) + 17.452 us \| } 4) 0.108 us \| put_prev_task_fair(); 4) 0.102 us \|
1029pick_next_task_fair(); 4) 0.084 us \| pick_next_task_stop(); 4) 0.075 us
1030\| pick_next_task_rt(); 4) 0.062 us \| pick_next_task_fair(); 4) 0.066
1031us \| pick_next_task_idle(); ------------------------------------------
10324) kworker-74 => <idle>-0 ------------------------------------------ 4)
1033\| finish_task_switch() { 4) \| \_raw_spin_unlock_irq() { 4) 0.100 us \|
1034sub_preempt_count(); 4) 0.582 us \| } 4) 1.105 us \| } 4) 0.088 us \|
1035sub_preempt_count(); 4) ! 100.066 us \| } . . . 3) \| sys_ioctl() { 3)
10360.083 us \| fget_light(); 3) \| security_file_ioctl() { 3) 0.066 us \|
1037cap_file_ioctl(); 3) 0.562 us \| } 3) \| do_vfs_ioctl() { 3) \|
1038drm_ioctl() { 3) 0.075 us \| drm_ut_debug_printk(); 3) \|
1039i915_gem_pwrite_ioctl() { 3) \| i915_mutex_lock_interruptible() { 3)
10400.070 us \| mutex_lock_interruptible(); 3) 0.570 us \| } 3) \|
1041drm_gem_object_lookup() { 3) \| \_raw_spin_lock() { 3) 0.080 us \|
1042add_preempt_count(); 3) 0.620 us \| } 3) \| \_raw_spin_unlock() { 3)
10430.085 us \| sub_preempt_count(); 3) 0.562 us \| } 3) 2.149 us \| } 3)
10440.133 us \| i915_gem_object_pin(); 3) \|
1045i915_gem_object_set_to_gtt_domain() { 3) 0.065 us \|
1046i915_gem_object_flush_gpu_write_domain(); 3) 0.065 us \|
1047i915_gem_object_wait_rendering(); 3) 0.062 us \|
1048i915_gem_object_flush_cpu_write_domain(); 3) 1.612 us \| } 3) \|
1049i915_gem_object_put_fence() { 3) 0.097 us \|
1050i915_gem_object_flush_fence.constprop.36(); 3) 0.645 us \| } 3) 0.070 us
1051\| add_preempt_count(); 3) 0.070 us \| sub_preempt_count(); 3) 0.073 us
1052\| i915_gem_object_unpin(); 3) 0.068 us \| mutex_unlock(); 3) 9.924 us
1053\| } 3) + 11.236 us \| } 3) + 11.770 us \| } 3) + 13.784 us \| } 3) \|
1054sys_ioctl() { As you can see, the function_graph display is much easier
1055to follow. Also note that in addition to the function calls and
1056associated braces, other events such as scheduler events are displayed
1057in context. In fact, you can freely include any tracepoint available in
1058the trace events subsystem described in the next section by simply
1059enabling those events, and they'll appear in context in the function
1060graph display. Quite a powerful tool for understanding kernel dynamics.
1061
1062Also notice that there are various annotations on the left hand side of
1063the display. For example if the total time it took for a given function
1064to execute is above a certain threshold, an exclamation point or plus
1065sign appears on the left hand side. Please see the ftrace documentation
1066for details on all these fields.
1067
1068The 'trace events' Subsystem
1069----------------------------
1070
1071One especially important directory contained within the
1072/sys/kernel/debug/tracing directory is the 'events' subdirectory, which
1073contains representations of every tracepoint in the system. Listing out
1074the contents of the 'events' subdirectory, we see mainly another set of
1075subdirectories: root@sugarbay:/sys/kernel/debug/tracing# cd events
1076root@sugarbay:/sys/kernel/debug/tracing/events# ls -al drwxr-xr-x 38
1077root root 0 Nov 14 23:19 . drwxr-xr-x 5 root root 0 Nov 14 23:19 ..
1078drwxr-xr-x 19 root root 0 Nov 14 23:19 block drwxr-xr-x 32 root root 0
1079Nov 14 23:19 btrfs drwxr-xr-x 5 root root 0 Nov 14 23:19 drm -rw-r--r--
10801 root root 0 Nov 14 23:19 enable drwxr-xr-x 40 root root 0 Nov 14 23:19
1081ext3 drwxr-xr-x 79 root root 0 Nov 14 23:19 ext4 drwxr-xr-x 14 root root
10820 Nov 14 23:19 ftrace drwxr-xr-x 8 root root 0 Nov 14 23:19 hda
1083-r--r--r-- 1 root root 0 Nov 14 23:19 header_event -r--r--r-- 1 root
1084root 0 Nov 14 23:19 header_page drwxr-xr-x 25 root root 0 Nov 14 23:19
1085i915 drwxr-xr-x 7 root root 0 Nov 14 23:19 irq drwxr-xr-x 12 root root 0
1086Nov 14 23:19 jbd drwxr-xr-x 14 root root 0 Nov 14 23:19 jbd2 drwxr-xr-x
108714 root root 0 Nov 14 23:19 kmem drwxr-xr-x 7 root root 0 Nov 14 23:19
1088module drwxr-xr-x 3 root root 0 Nov 14 23:19 napi drwxr-xr-x 6 root root
10890 Nov 14 23:19 net drwxr-xr-x 3 root root 0 Nov 14 23:19 oom drwxr-xr-x
109012 root root 0 Nov 14 23:19 power drwxr-xr-x 3 root root 0 Nov 14 23:19
1091printk drwxr-xr-x 8 root root 0 Nov 14 23:19 random drwxr-xr-x 4 root
1092root 0 Nov 14 23:19 raw_syscalls drwxr-xr-x 3 root root 0 Nov 14 23:19
1093rcu drwxr-xr-x 6 root root 0 Nov 14 23:19 rpm drwxr-xr-x 20 root root 0
1094Nov 14 23:19 sched drwxr-xr-x 7 root root 0 Nov 14 23:19 scsi drwxr-xr-x
10954 root root 0 Nov 14 23:19 signal drwxr-xr-x 5 root root 0 Nov 14 23:19
1096skb drwxr-xr-x 4 root root 0 Nov 14 23:19 sock drwxr-xr-x 10 root root 0
1097Nov 14 23:19 sunrpc drwxr-xr-x 538 root root 0 Nov 14 23:19 syscalls
1098drwxr-xr-x 4 root root 0 Nov 14 23:19 task drwxr-xr-x 14 root root 0 Nov
109914 23:19 timer drwxr-xr-x 3 root root 0 Nov 14 23:19 udp drwxr-xr-x 21
1100root root 0 Nov 14 23:19 vmscan drwxr-xr-x 3 root root 0 Nov 14 23:19
1101vsyscall drwxr-xr-x 6 root root 0 Nov 14 23:19 workqueue drwxr-xr-x 26
1102root root 0 Nov 14 23:19 writeback Each one of these subdirectories
1103corresponds to a 'subsystem' and contains yet again more subdirectories,
1104each one of those finally corresponding to a tracepoint. For example,
1105here are the contents of the 'kmem' subsystem:
1106root@sugarbay:/sys/kernel/debug/tracing/events# cd kmem
1107root@sugarbay:/sys/kernel/debug/tracing/events/kmem# ls -al drwxr-xr-x
110814 root root 0 Nov 14 23:19 . drwxr-xr-x 38 root root 0 Nov 14 23:19 ..
1109-rw-r--r-- 1 root root 0 Nov 14 23:19 enable -rw-r--r-- 1 root root 0
1110Nov 14 23:19 filter drwxr-xr-x 2 root root 0 Nov 14 23:19 kfree
1111drwxr-xr-x 2 root root 0 Nov 14 23:19 kmalloc drwxr-xr-x 2 root root 0
1112Nov 14 23:19 kmalloc_node drwxr-xr-x 2 root root 0 Nov 14 23:19
1113kmem_cache_alloc drwxr-xr-x 2 root root 0 Nov 14 23:19
1114kmem_cache_alloc_node drwxr-xr-x 2 root root 0 Nov 14 23:19
1115kmem_cache_free drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_alloc
1116drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_alloc_extfrag drwxr-xr-x 2
1117root root 0 Nov 14 23:19 mm_page_alloc_zone_locked drwxr-xr-x 2 root
1118root 0 Nov 14 23:19 mm_page_free drwxr-xr-x 2 root root 0 Nov 14 23:19
1119mm_page_free_batched drwxr-xr-x 2 root root 0 Nov 14 23:19
1120mm_page_pcpu_drain Let's see what's inside the subdirectory for a
1121specific tracepoint, in this case the one for kmalloc:
1122root@sugarbay:/sys/kernel/debug/tracing/events/kmem# cd kmalloc
1123root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# ls -al
1124drwxr-xr-x 2 root root 0 Nov 14 23:19 . drwxr-xr-x 14 root root 0 Nov 14
112523:19 .. -rw-r--r-- 1 root root 0 Nov 14 23:19 enable -rw-r--r-- 1 root
1126root 0 Nov 14 23:19 filter -r--r--r-- 1 root root 0 Nov 14 23:19 format
1127-r--r--r-- 1 root root 0 Nov 14 23:19 id The 'format' file for the
1128tracepoint describes the event in memory, which is used by the various
1129tracing tools that now make use of these tracepoint to parse the event
1130and make sense of it, along with a 'print fmt' field that allows tools
1131like ftrace to display the event as text. Here's what the format of the
1132kmalloc event looks like:
1133root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# cat format
1134name: kmalloc ID: 313 format: field:unsigned short common_type;
1135offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2;
1136size:1; signed:0; field:unsigned char common_preempt_count; offset:3;
1137size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1;
1138field:int common_padding; offset:8; size:4; signed:1; field:unsigned
1139long call_site; offset:16; size:8; signed:0; field:const void \* ptr;
1140offset:24; size:8; signed:0; field:size_t bytes_req; offset:32; size:8;
1141signed:0; field:size_t bytes_alloc; offset:40; size:8; signed:0;
1142field:gfp_t gfp_flags; offset:48; size:4; signed:0; print fmt:
1143"call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s",
1144REC->call_site, REC->ptr, REC->bytes_req, REC->bytes_alloc,
1145(REC->gfp_flags) ? \__print_flags(REC->gfp_flags, "|", {(unsigned
1146long)(((( gfp_t)0x10u) \| (( gfp_t)0x40u) \| (( gfp_t)0x80u) \| ((
1147gfp_t)0x20000u) \| (( gfp_t)0x02u) \| (( gfp_t)0x08u)) \| ((
1148gfp_t)0x4000u) \| (( gfp_t)0x10000u) \| (( gfp_t)0x1000u) \| ((
1149gfp_t)0x200u) \| (( gfp_t)0x400000u)), "GFP_TRANSHUGE"}, {(unsigned
1150long)((( gfp_t)0x10u) \| (( gfp_t)0x40u) \| (( gfp_t)0x80u) \| ((
1151gfp_t)0x20000u) \| (( gfp_t)0x02u) \| (( gfp_t)0x08u)),
1152"GFP_HIGHUSER_MOVABLE"}, {(unsigned long)((( gfp_t)0x10u) \| ((
1153gfp_t)0x40u) \| (( gfp_t)0x80u) \| (( gfp_t)0x20000u) \| ((
1154gfp_t)0x02u)), "GFP_HIGHUSER"}, {(unsigned long)((( gfp_t)0x10u) \| ((
1155gfp_t)0x40u) \| (( gfp_t)0x80u) \| (( gfp_t)0x20000u)), "GFP_USER"},
1156{(unsigned long)((( gfp_t)0x10u) \| (( gfp_t)0x40u) \| (( gfp_t)0x80u)
1157\| (( gfp_t)0x80000u)), GFP_TEMPORARY"}, {(unsigned long)(((
1158gfp_t)0x10u) \| (( gfp_t)0x40u) \| (( gfp_t)0x80u)), "GFP_KERNEL"},
1159{(unsigned long)((( gfp_t)0x10u) \| (( gfp_t)0x40u)), "GFP_NOFS"},
1160{(unsigned long)((( gfp_t)0x20u)), "GFP_ATOMIC"}, {(unsigned long)(((
1161gfp_t)0x10u)), "GFP_NOIO"}, {(unsigned long)(( gfp_t)0x20u),
1162"GFP_HIGH"}, {(unsigned long)(( gfp_t)0x10u), "GFP_WAIT"}, {(unsigned
1163long)(( gfp_t)0x40u), "GFP_IO"}, {(unsigned long)(( gfp_t)0x100u),
1164"GFP_COLD"}, {(unsigned long)(( gfp_t)0x200u), "GFP_NOWARN"}, {(unsigned
1165long)(( gfp_t)0x400u), "GFP_REPEAT"}, {(unsigned long)(( gfp_t)0x800u),
1166"GFP_NOFAIL"}, {(unsigned long)(( gfp_t)0x1000u), "GFP_NORETRY"},
1167{(unsigned long)(( gfp_t)0x4000u), "GFP_COMP"}, {(unsigned long)((
1168gfp_t)0x8000u), "GFP_ZERO"}, {(unsigned long)(( gfp_t)0x10000u),
1169"GFP_NOMEMALLOC"}, {(unsigned long)(( gfp_t)0x20000u), "GFP_HARDWALL"},
1170{(unsigned long)(( gfp_t)0x40000u), "GFP_THISNODE"}, {(unsigned long)((
1171gfp_t)0x80000u), "GFP_RECLAIMABLE"}, {(unsigned long)(( gfp_t)0x08u),
1172"GFP_MOVABLE"}, {(unsigned long)(( gfp_t)0), "GFP_NOTRACK"}, {(unsigned
1173long)(( gfp_t)0x400000u), "GFP_NO_KSWAPD"}, {(unsigned long)((
1174gfp_t)0x800000u), "GFP_OTHER_NODE"} ) : "GFP_NOWAIT" The 'enable' file
1175in the tracepoint directory is what allows the user (or tools such as
1176trace-cmd) to actually turn the tracepoint on and off. When enabled, the
1177corresponding tracepoint will start appearing in the ftrace 'trace' file
1178described previously. For example, this turns on the kmalloc tracepoint:
1179root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 1 >
1180enable At the moment, we're not interested in the function tracer or
1181some other tracer that might be in effect, so we first turn it off, but
1182if we do that, we still need to turn tracing on in order to see the
1183events in the output buffer: root@sugarbay:/sys/kernel/debug/tracing#
1184echo nop > current_tracer root@sugarbay:/sys/kernel/debug/tracing# echo
11851 > tracing_on Now, if we look at the the 'trace' file, we see nothing
1186but the kmalloc events we just turned on:
1187root@sugarbay:/sys/kernel/debug/tracing# cat trace \| less # tracer: nop
1188# # entries-in-buffer/entries-written: 1897/1897 #P:8 # # \_-----=>
1189irqs-off # / \_----=> need-resched # \| / \_---=> hardirq/softirq # \|\|
1190/ \_--=> preempt-depth # \||\| / delay # TASK-PID CPU# \|||\| TIMESTAMP
1191FUNCTION # \| \| \| \|||\| \| \| dropbear-1465 [000] ...1 18154.620753:
1192kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048
1193bytes_alloc=2048 gfp_flags=GFP_KERNEL <idle>-0 [000] ..s3 18154.621640:
1194kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512
1195bytes_alloc=512 gfp_flags=GFP_ATOMIC <idle>-0 [000] ..s3 18154.621656:
1196kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512
1197bytes_alloc=512 gfp_flags=GFP_ATOMIC matchbox-termin-1361 [001] ...1
119818154.755472: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f0e00
1199bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT Xorg-1264
1200[002] ...1 18154.755581: kmalloc: call_site=ffffffff8141abe8
1201ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192
1202gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY Xorg-1264 [002] ...1
120318154.755583: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520
1204bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO Xorg-1264
1205[002] ...1 18154.755589: kmalloc: call_site=ffffffff81419edb
1206ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64
1207gfp_flags=GFP_KERNEL|GFP_ZERO matchbox-termin-1361 [001] ...1
120818155.354594: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db35400
1209bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT Xorg-1264
1210[002] ...1 18155.354703: kmalloc: call_site=ffffffff8141abe8
1211ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192
1212gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY Xorg-1264 [002] ...1
121318155.354705: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520
1214bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO Xorg-1264
1215[002] ...1 18155.354711: kmalloc: call_site=ffffffff81419edb
1216ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64
1217gfp_flags=GFP_KERNEL|GFP_ZERO <idle>-0 [000] ..s3 18155.673319: kmalloc:
1218call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512
1219bytes_alloc=512 gfp_flags=GFP_ATOMIC dropbear-1465 [000] ...1
122018155.673525: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000
1221bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL <idle>-0 [000] ..s3
122218155.674821: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800
1223bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC <idle>-0 [000] ..s3
122418155.793014: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800
1225bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC dropbear-1465 [000]
1226...1 18155.793219: kmalloc: call_site=ffffffff816650d4
1227ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048
1228gfp_flags=GFP_KERNEL <idle>-0 [000] ..s3 18155.794147: kmalloc:
1229call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512
1230bytes_alloc=512 gfp_flags=GFP_ATOMIC <idle>-0 [000] ..s3 18155.936705:
1231kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512
1232bytes_alloc=512 gfp_flags=GFP_ATOMIC dropbear-1465 [000] ...1
123318155.936910: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000
1234bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL <idle>-0 [000] ..s3
123518155.937869: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800
1236bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC matchbox-termin-1361
1237[001] ...1 18155.953667: kmalloc: call_site=ffffffff81614050
1238ptr=ffff88006d5f2000 bytes_req=512 bytes_alloc=512
1239gfp_flags=GFP_KERNEL|GFP_REPEAT Xorg-1264 [002] ...1 18155.953775:
1240kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168
1241bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY Xorg-1264
1242[002] ...1 18155.953777: kmalloc: call_site=ffffffff814192a3
1243ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32
1244gfp_flags=GFP_KERNEL|GFP_ZERO Xorg-1264 [002] ...1 18155.953783:
1245kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64
1246bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO <idle>-0 [000] ..s3
124718156.176053: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800
1248bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC dropbear-1465 [000]
1249...1 18156.176257: kmalloc: call_site=ffffffff816650d4
1250ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048
1251gfp_flags=GFP_KERNEL <idle>-0 [000] ..s3 18156.177717: kmalloc:
1252call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512
1253bytes_alloc=512 gfp_flags=GFP_ATOMIC <idle>-0 [000] ..s3 18156.399229:
1254kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512
1255bytes_alloc=512 gfp_flags=GFP_ATOMIC dropbear-1465 [000] ...1
125618156.399434: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000
1257bytes_http://rostedt.homelinux.com/kernelshark/req=2048 bytes_alloc=2048
1258gfp_flags=GFP_KERNEL <idle>-0 [000] ..s3 18156.400660: kmalloc:
1259call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512
1260bytes_alloc=512 gfp_flags=GFP_ATOMIC matchbox-termin-1361 [001] ...1
126118156.552800: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db34800
1262bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT To again
1263disable the kmalloc event, we need to send 0 to the enable file:
1264root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 0 >
1265enable You can enable any number of events or complete subsystems (by
1266using the 'enable' file in the subsystem directory) and get an
1267arbitrarily fine-grained idea of what's going on in the system by
1268enabling as many of the appropriate tracepoints as applicable.
1269
1270A number of the tools described in this HOWTO do just that, including
1271trace-cmd and kernelshark in the next section.
1272
1273.. container:: informalexample
1274
1275 Tying it Together:
1276 These tracepoints and their representation are used not only by
1277 ftrace, but by many of the other tools covered in this document and
1278 they form a central point of integration for the various tracers
1279 available in Linux. They form a central part of the instrumentation
1280 for the following tools: perf, lttng, ftrace, blktrace and SystemTap
1281
1282.. container:: informalexample
1283
1284 Tying it Together:
1285 Eventually all the special-purpose tracers currently available in
1286 /sys/kernel/debug/tracing will be removed and replaced with
1287 equivalent tracers based on the 'trace events' subsystem.
1288
1289.. _trace-cmd-kernelshark:
1290
1291trace-cmd/kernelshark
1292---------------------
1293
1294trace-cmd is essentially an extensive command-line 'wrapper' interface
1295that hides the details of all the individual files in
1296/sys/kernel/debug/tracing, allowing users to specify specific particular
1297events within the /sys/kernel/debug/tracing/events/ subdirectory and to
1298collect traces and avoid having to deal with those details directly.
1299
1300As yet another layer on top of that, kernelshark provides a GUI that
1301allows users to start and stop traces and specify sets of events using
1302an intuitive interface, and view the output as both trace events and as
1303a per-CPU graphical display. It directly uses 'trace-cmd' as the
1304plumbing that accomplishes all that underneath the covers (and actually
1305displays the trace-cmd command it uses, as we'll see).
1306
1307To start a trace using kernelshark, first start kernelshark:
1308root@sugarbay:~# kernelshark Then bring up the 'Capture' dialog by
1309choosing from the kernelshark menu: Capture \| Record That will display
1310the following dialog, which allows you to choose one or more events (or
1311even one or more complete subsystems) to trace:
1312
1313Note that these are exactly the same sets of events described in the
1314previous trace events subsystem section, and in fact is where trace-cmd
1315gets them for kernelshark.
1316
1317In the above screenshot, we've decided to explore the graphics subsystem
1318a bit and so have chosen to trace all the tracepoints contained within
1319the 'i915' and 'drm' subsystems.
1320
1321After doing that, we can start and stop the trace using the 'Run' and
1322'Stop' button on the lower right corner of the dialog (the same button
1323will turn into the 'Stop' button after the trace has started):
1324
1325Notice that the right-hand pane shows the exact trace-cmd command-line
1326that's used to run the trace, along with the results of the trace-cmd
1327run.
1328
1329Once the 'Stop' button is pressed, the graphical view magically fills up
1330with a colorful per-cpu display of the trace data, along with the
1331detailed event listing below that:
1332
1333Here's another example, this time a display resulting from tracing 'all
1334events':
1335
1336The tool is pretty self-explanatory, but for more detailed information
1337on navigating through the data, see the `kernelshark
1338website <http://rostedt.homelinux.com/kernelshark/>`__.
1339
1340.. _ftrace-documentation:
1341
1342Documentation
1343-------------
1344
1345The documentation for ftrace can be found in the kernel Documentation
1346directory: Documentation/trace/ftrace.txt The documentation for the
1347trace event subsystem can also be found in the kernel Documentation
1348directory: Documentation/trace/events.txt There is a nice series of
1349articles on using ftrace and trace-cmd at LWN:
1350
1351- `Debugging the kernel using Ftrace - part
1352 1 <http://lwn.net/Articles/365835/>`__
1353
1354- `Debugging the kernel using Ftrace - part
1355 2 <http://lwn.net/Articles/366796/>`__
1356
1357- `Secrets of the Ftrace function
1358 tracer <http://lwn.net/Articles/370423/>`__
1359
1360- `trace-cmd: A front-end for
1361 Ftrace <https://lwn.net/Articles/410200/>`__
1362
1363There's more detailed documentation kernelshark usage here:
1364`KernelShark <http://rostedt.homelinux.com/kernelshark/>`__
1365
1366An amusing yet useful README (a tracing mini-HOWTO) can be found in
1367/sys/kernel/debug/tracing/README.
1368
1369.. _profile-manual-systemtap:
1370
1371systemtap
1372=========
1373
1374SystemTap is a system-wide script-based tracing and profiling tool.
1375
1376SystemTap scripts are C-like programs that are executed in the kernel to
1377gather/print/aggregate data extracted from the context they end up being
1378invoked under.
1379
1380For example, this probe from the `SystemTap
1381tutorial <http://sourceware.org/systemtap/tutorial/>`__ simply prints a
1382line every time any process on the system open()s a file. For each line,
1383it prints the executable name of the program that opened the file, along
1384with its PID, and the name of the file it opened (or tried to open),
1385which it extracts from the open syscall's argstr. probe syscall.open {
1386printf ("%s(%d) open (%s)\n", execname(), pid(), argstr) } probe
1387timer.ms(4000) # after 4 seconds { exit () } Normally, to execute this
1388probe, you'd simply install systemtap on the system you want to probe,
1389and directly run the probe on that system e.g. assuming the name of the
1390file containing the above text is trace_open.stp: # stap trace_open.stp
1391What systemtap does under the covers to run this probe is 1) parse and
1392convert the probe to an equivalent 'C' form, 2) compile the 'C' form
1393into a kernel module, 3) insert the module into the kernel, which arms
1394it, and 4) collect the data generated by the probe and display it to the
1395user.
1396
1397In order to accomplish steps 1 and 2, the 'stap' program needs access to
1398the kernel build system that produced the kernel that the probed system
1399is running. In the case of a typical embedded system (the 'target'), the
1400kernel build system unfortunately isn't typically part of the image
1401running on the target. It is normally available on the 'host' system
1402that produced the target image however; in such cases, steps 1 and 2 are
1403executed on the host system, and steps 3 and 4 are executed on the
1404target system, using only the systemtap 'runtime'.
1405
1406The systemtap support in Yocto assumes that only steps 3 and 4 are run
1407on the target; it is possible to do everything on the target, but this
1408section assumes only the typical embedded use-case.
1409
1410So basically what you need to do in order to run a systemtap script on
1411the target is to 1) on the host system, compile the probe into a kernel
1412module that makes sense to the target, 2) copy the module onto the
1413target system and 3) insert the module into the target kernel, which
1414arms it, and 4) collect the data generated by the probe and display it
1415to the user.
1416
1417.. _systemtap-setup:
1418
1419Setup
1420-----
1421
1422Those are a lot of steps and a lot of details, but fortunately Yocto
1423includes a script called 'crosstap' that will take care of those
1424details, allowing you to simply execute a systemtap script on the remote
1425target, with arguments if necessary.
1426
1427In order to do this from a remote host, however, you need to have access
1428to the build for the image you booted. The 'crosstap' script provides
1429details on how to do this if you run the script on the host without
1430having done a build:
1431
1432.. note::
1433
1434 SystemTap, which uses 'crosstap', assumes you can establish an ssh
1435 connection to the remote target. Please refer to the crosstap wiki
1436 page for details on verifying ssh connections at
1437 . Also, the ability to ssh into the target system is not enabled by
1438 default in \*-minimal images.
1439
1440$ crosstap root@192.168.1.88 trace_open.stp Error: No target kernel
1441build found. Did you forget to create a local build of your image?
1442'crosstap' requires a local sdk build of the target system (or a build
1443that includes 'tools-profile') in order to build kernel modules that can
1444probe the target system. Practically speaking, that means you need to do
1445the following: - If you're running a pre-built image, download the
1446release and/or BSP tarballs used to build the image. - If you're working
1447from git sources, just clone the metadata and BSP layers needed to build
1448the image you'll be booting. - Make sure you're properly set up to build
1449a new image (see the BSP README and/or the widely available basic
1450documentation that discusses how to build images). - Build an -sdk
1451version of the image e.g.: $ bitbake core-image-sato-sdk OR - Build a
1452non-sdk image but include the profiling tools: [ edit local.conf and add
1453'tools-profile' to the end of the EXTRA_IMAGE_FEATURES variable ] $
1454bitbake core-image-sato Once you've build the image on the host system,
1455you're ready to boot it (or the equivalent pre-built image) and use
1456'crosstap' to probe it (you need to source the environment as usual
1457first): $ source oe-init-build-env $ cd ~/my/systemtap/scripts $
1458crosstap root@192.168.1.xxx myscript.stp So essentially what you need to
1459do is build an SDK image or image with 'tools-profile' as detailed in
1460the "`General Setup <#profile-manual-general-setup>`__" section of this
1461manual, and boot the resulting target image.
1462
1463.. note::
1464
1465 If you have a build directory containing multiple machines, you need
1466 to have the MACHINE you're connecting to selected in local.conf, and
1467 the kernel in that machine's build directory must match the kernel on
1468 the booted system exactly, or you'll get the above 'crosstap' message
1469 when you try to invoke a script.
1470
1471Running a Script on a Target
1472----------------------------
1473
1474Once you've done that, you should be able to run a systemtap script on
1475the target: $ cd /path/to/yocto $ source oe-init-build-env ### Shell
1476environment set up for builds. ### You can now run 'bitbake <target>'
1477Common targets are: core-image-minimal core-image-sato meta-toolchain
1478meta-ide-support You can also run generated qemu images with a command
1479like 'runqemu qemux86-64' Once you've done that, you can cd to whatever
1480directory contains your scripts and use 'crosstap' to run the script: $
1481cd /path/to/my/systemap/script $ crosstap root@192.168.7.2
1482trace_open.stp If you get an error connecting to the target e.g.: $
1483crosstap root@192.168.7.2 trace_open.stp error establishing ssh
1484connection on remote 'root@192.168.7.2' Try ssh'ing to the target and
1485see what happens: $ ssh root@192.168.7.2 A lot of the time, connection
1486problems are due specifying a wrong IP address or having a 'host key
1487verification error'.
1488
1489If everything worked as planned, you should see something like this
1490(enter the password when prompted, or press enter if it's set up to use
1491no password): $ crosstap root@192.168.7.2 trace_open.stp
1492root@192.168.7.2's password: matchbox-termin(1036) open
1493("/tmp/vte3FS2LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600)
1494matchbox-termin(1036) open ("/tmp/vteJMC7LW",
1495O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600)
1496
1497.. _systemtap-documentation:
1498
1499Documentation
1500-------------
1501
1502The SystemTap language reference can be found here: `SystemTap Language
1503Reference <http://sourceware.org/systemtap/langref/>`__
1504
1505Links to other SystemTap documents, tutorials, and examples can be found
1506here: `SystemTap documentation
1507page <http://sourceware.org/systemtap/documentation.html>`__
1508
1509.. _profile-manual-sysprof:
1510
1511Sysprof
1512=======
1513
1514Sysprof is a very easy to use system-wide profiler that consists of a
1515single window with three panes and a few buttons which allow you to
1516start, stop, and view the profile from one place.
1517
1518.. _sysprof-setup:
1519
1520Setup
1521-----
1522
1523For this section, we'll assume you've already performed the basic setup
1524outlined in the General Setup section.
1525
1526Sysprof is a GUI-based application that runs on the target system. For
1527the rest of this document we assume you've ssh'ed to the host and will
1528be running Sysprof on the target (you can use the '-X' option to ssh and
1529have the Sysprof GUI run on the target but display remotely on the host
1530if you want).
1531
1532.. _sysprof-basic-usage:
1533
1534Basic Usage
1535-----------
1536
1537To start profiling the system, you simply press the 'Start' button. To
1538stop profiling and to start viewing the profile data in one easy step,
1539press the 'Profile' button.
1540
1541Once you've pressed the profile button, the three panes will fill up
1542with profiling data:
1543
1544The left pane shows a list of functions and processes. Selecting one of
1545those expands that function in the right pane, showing all its callees.
1546Note that this caller-oriented display is essentially the inverse of
1547perf's default callee-oriented callchain display.
1548
1549In the screenshot above, we're focusing on \__copy_to_user_ll() and
1550looking up the callchain we can see that one of the callers of
1551\__copy_to_user_ll is sys_read() and the complete callpath between them.
1552Notice that this is essentially a portion of the same information we saw
1553in the perf display shown in the perf section of this page.
1554
1555Similarly, the above is a snapshot of the Sysprof display of a
1556copy-from-user callchain.
1557
1558Finally, looking at the third Sysprof pane in the lower left, we can see
1559a list of all the callers of a particular function selected in the top
1560left pane. In this case, the lower pane is showing all the callers of
1561\__mark_inode_dirty:
1562
1563Double-clicking on one of those functions will in turn change the focus
1564to the selected function, and so on.
1565
1566.. container:: informalexample
1567
1568 Tying it Together:
1569 If you like sysprof's 'caller-oriented' display, you may be able to
1570 approximate it in other tools as well. For example, 'perf report' has
1571 the -g (--call-graph) option that you can experiment with; one of the
1572 options is 'caller' for an inverted caller-based callgraph display.
1573
1574.. _sysprof-documentation:
1575
1576Documentation
1577-------------
1578
1579There doesn't seem to be any documentation for Sysprof, but maybe that's
1580because it's pretty self-explanatory. The Sysprof website, however, is
1581here: `Sysprof, System-wide Performance Profiler for
1582Linux <http://sysprof.com/>`__
1583
1584LTTng (Linux Trace Toolkit, next generation)
1585============================================
1586
1587.. _lttng-setup:
1588
1589Setup
1590-----
1591
1592For this section, we'll assume you've already performed the basic setup
1593outlined in the General Setup section. LTTng is run on the target system
1594by ssh'ing to it.
1595
1596Collecting and Viewing Traces
1597-----------------------------
1598
1599Once you've applied the above commits and built and booted your image
1600(you need to build the core-image-sato-sdk image or use one of the other
1601methods described in the General Setup section), you're ready to start
1602tracing.
1603
1604Collecting and viewing a trace on the target (inside a shell)
1605~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1606
1607First, from the host, ssh to the target: $ ssh -l root 192.168.1.47 The
1608authenticity of host '192.168.1.47 (192.168.1.47)' can't be established.
1609RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e.
1610Are you sure you want to continue connecting (yes/no)? yes Warning:
1611Permanently added '192.168.1.47' (RSA) to the list of known hosts.
1612root@192.168.1.47's password: Once on the target, use these steps to
1613create a trace: root@crownbay:~# lttng create Spawning a session daemon
1614Session auto-20121015-232120 created. Traces will be written in
1615/home/root/lttng-traces/auto-20121015-232120 Enable the events you want
1616to trace (in this case all kernel events): root@crownbay:~# lttng
1617enable-event --kernel --all All kernel events are enabled in channel
1618channel0 Start the trace: root@crownbay:~# lttng start Tracing started
1619for session auto-20121015-232120 And then stop the trace after awhile or
1620after running a particular workload that you want to trace:
1621root@crownbay:~# lttng stop Tracing stopped for session
1622auto-20121015-232120 You can now view the trace in text form on the
1623target: root@crownbay:~# lttng view [23:21:56.989270399] (+?.?????????)
1624sys_geteuid: { 1 }, { } [23:21:56.989278081] (+0.000007682)
1625exit_syscall: { 1 }, { ret = 0 } [23:21:56.989286043] (+0.000007962)
1626sys_pipe: { 1 }, { fildes = 0xB77B9E8C } [23:21:56.989321802]
1627(+0.000035759) exit_syscall: { 1 }, { ret = 0 } [23:21:56.989329345]
1628(+0.000007543) sys_mmap_pgoff: { 1 }, { addr = 0x0, len = 10485760, prot
1629= 3, flags = 131362, fd = 4294967295, pgoff = 0 } [23:21:56.989351694]
1630(+0.000022349) exit_syscall: { 1 }, { ret = -1247805440 }
1631[23:21:56.989432989] (+0.000081295) sys_clone: { 1 }, { clone_flags =
16320x411, newsp = 0xB5EFFFE4, parent_tid = 0xFFFFFFFF, child_tid = 0x0 }
1633[23:21:56.989477129] (+0.000044140) sched_stat_runtime: { 1 }, { comm =
1634"lttng-consumerd", tid = 1193, runtime = 681660, vruntime = 43367983388
1635} [23:21:56.989486697] (+0.000009568) sched_migrate_task: { 1 }, { comm
1636= "lttng-consumerd", tid = 1193, prio = 20, orig_cpu = 1, dest_cpu = 1 }
1637[23:21:56.989508418] (+0.000021721) hrtimer_init: { 1 }, { hrtimer =
16383970832076, clockid = 1, mode = 1 } [23:21:56.989770462] (+0.000262044)
1639hrtimer_cancel: { 1 }, { hrtimer = 3993865440 } [23:21:56.989771580]
1640(+0.000001118) hrtimer_cancel: { 0 }, { hrtimer = 3993812192 }
1641[23:21:56.989776957] (+0.000005377) hrtimer_expire_entry: { 1 }, {
1642hrtimer = 3993865440, now = 79815980007057, function = 3238465232 }
1643[23:21:56.989778145] (+0.000001188) hrtimer_expire_entry: { 0 }, {
1644hrtimer = 3993812192, now = 79815980008174, function = 3238465232 }
1645[23:21:56.989791695] (+0.000013550) softirq_raise: { 1 }, { vec = 1 }
1646[23:21:56.989795396] (+0.000003701) softirq_raise: { 0 }, { vec = 1 }
1647[23:21:56.989800635] (+0.000005239) softirq_raise: { 0 }, { vec = 9 }
1648[23:21:56.989807130] (+0.000006495) sched_stat_runtime: { 1 }, { comm =
1649"lttng-consumerd", tid = 1193, runtime = 330710, vruntime = 43368314098
1650} [23:21:56.989809993] (+0.000002863) sched_stat_runtime: { 0 }, { comm
1651= "lttng-sessiond", tid = 1181, runtime = 1015313, vruntime =
165236976733240 } [23:21:56.989818514] (+0.000008521) hrtimer_expire_exit: {
16530 }, { hrtimer = 3993812192 } [23:21:56.989819631] (+0.000001117)
1654hrtimer_expire_exit: { 1 }, { hrtimer = 3993865440 }
1655[23:21:56.989821866] (+0.000002235) hrtimer_start: { 0 }, { hrtimer =
16563993812192, function = 3238465232, expires = 79815981000000, softexpires
1657= 79815981000000 } [23:21:56.989822984] (+0.000001118) hrtimer_start: {
16581 }, { hrtimer = 3993865440, function = 3238465232, expires =
165979815981000000, softexpires = 79815981000000 } [23:21:56.989832762]
1660(+0.000009778) softirq_entry: { 1 }, { vec = 1 } [23:21:56.989833879]
1661(+0.000001117) softirq_entry: { 0 }, { vec = 1 } [23:21:56.989838069]
1662(+0.000004190) timer_cancel: { 1 }, { timer = 3993871956 }
1663[23:21:56.989839187] (+0.000001118) timer_cancel: { 0 }, { timer =
16643993818708 } [23:21:56.989841492] (+0.000002305) timer_expire_entry: { 1
1665}, { timer = 3993871956, now = 79515980, function = 3238277552 }
1666[23:21:56.989842819] (+0.000001327) timer_expire_entry: { 0 }, { timer =
16673993818708, now = 79515980, function = 3238277552 } [23:21:56.989854831]
1668(+0.000012012) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd",
1669tid = 1193, runtime = 49237, vruntime = 43368363335 }
1670[23:21:56.989855949] (+0.000001118) sched_stat_runtime: { 0 }, { comm =
1671"lttng-sessiond", tid = 1181, runtime = 45121, vruntime = 36976778361 }
1672[23:21:56.989861257] (+0.000005308) sched_stat_sleep: { 1 }, { comm =
1673"kworker/1:1", tid = 21, delay = 9451318 } [23:21:56.989862374]
1674(+0.000001117) sched_stat_sleep: { 0 }, { comm = "kworker/0:0", tid = 4,
1675delay = 9958820 } [23:21:56.989868241] (+0.000005867) sched_wakeup: { 0
1676}, { comm = "kworker/0:0", tid = 4, prio = 120, success = 1, target_cpu
1677= 0 } [23:21:56.989869358] (+0.000001117) sched_wakeup: { 1 }, { comm =
1678"kworker/1:1", tid = 21, prio = 120, success = 1, target_cpu = 1 }
1679[23:21:56.989877460] (+0.000008102) timer_expire_exit: { 1 }, { timer =
16803993871956 } [23:21:56.989878577] (+0.000001117) timer_expire_exit: { 0
1681}, { timer = 3993818708 } . . . You can now safely destroy the trace
1682session (note that this doesn't delete the trace - it's still there in
1683~/lttng-traces): root@crownbay:~# lttng destroy Session
1684auto-20121015-232120 destroyed at /home/root Note that the trace is
1685saved in a directory of the same name as returned by 'lttng create',
1686under the ~/lttng-traces directory (note that you can change this by
1687supplying your own name to 'lttng create'): root@crownbay:~# ls -al
1688~/lttng-traces drwxrwx--- 3 root root 1024 Oct 15 23:21 . drwxr-xr-x 5
1689root root 1024 Oct 15 23:57 .. drwxrwx--- 3 root root 1024 Oct 15 23:21
1690auto-20121015-232120
1691
1692Collecting and viewing a userspace trace on the target (inside a shell)
1693~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1694
1695For LTTng userspace tracing, you need to have a properly instrumented
1696userspace program. For this example, we'll use the 'hello' test program
1697generated by the lttng-ust build.
1698
1699The 'hello' test program isn't installed on the rootfs by the lttng-ust
1700build, so we need to copy it over manually. First cd into the build
1701directory that contains the hello executable: $ cd
1702build/tmp/work/core2_32-poky-linux/lttng-ust/2.0.5-r0/git/tests/hello/.libs
1703Copy that over to the target machine: $ scp hello root@192.168.1.20: You
1704now have the instrumented lttng 'hello world' test program on the
1705target, ready to test.
1706
1707First, from the host, ssh to the target: $ ssh -l root 192.168.1.47 The
1708authenticity of host '192.168.1.47 (192.168.1.47)' can't be established.
1709RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e.
1710Are you sure you want to continue connecting (yes/no)? yes Warning:
1711Permanently added '192.168.1.47' (RSA) to the list of known hosts.
1712root@192.168.1.47's password: Once on the target, use these steps to
1713create a trace: root@crownbay:~# lttng create Session
1714auto-20190303-021943 created. Traces will be written in
1715/home/root/lttng-traces/auto-20190303-021943 Enable the events you want
1716to trace (in this case all userspace events): root@crownbay:~# lttng
1717enable-event --userspace --all All UST events are enabled in channel
1718channel0 Start the trace: root@crownbay:~# lttng start Tracing started
1719for session auto-20190303-021943 Run the instrumented hello world
1720program: root@crownbay:~# ./hello Hello, World! Tracing... done. And
1721then stop the trace after awhile or after running a particular workload
1722that you want to trace: root@crownbay:~# lttng stop Tracing stopped for
1723session auto-20190303-021943 You can now view the trace in text form on
1724the target: root@crownbay:~# lttng view [02:31:14.906146544]
1725(+?.?????????) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, {
1726intfield = 0, intfield2 = 0x0, longfield = 0, netintfield = 0,
1727netintfieldhex = 0x0, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ],
1728arrfield2 = "test", \_seqfield1_length = 4, seqfield1 = [ [0] = 116, [1]
1729= 101, [2] = 115, [3] = 116 ], \_seqfield2_length = 4, seqfield2 =
1730"test", stringfield = "test", floatfield = 2222, doublefield = 2,
1731boolfield = 1 } [02:31:14.906170360] (+0.000023816) hello:1424
1732ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 1, intfield2 = 0x1,
1733longfield = 1, netintfield = 1, netintfieldhex = 0x1, arrfield1 = [ [0]
1734= 1, [1] = 2, [2] = 3 ], arrfield2 = "test", \_seqfield1_length = 4,
1735seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ],
1736\_seqfield2_length = 4, seqfield2 = "test", stringfield = "test",
1737floatfield = 2222, doublefield = 2, boolfield = 1 } [02:31:14.906183140]
1738(+0.000012780) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, {
1739intfield = 2, intfield2 = 0x2, longfield = 2, netintfield = 2,
1740netintfieldhex = 0x2, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ],
1741arrfield2 = "test", \_seqfield1_length = 4, seqfield1 = [ [0] = 116, [1]
1742= 101, [2] = 115, [3] = 116 ], \_seqfield2_length = 4, seqfield2 =
1743"test", stringfield = "test", floatfield = 2222, doublefield = 2,
1744boolfield = 1 } [02:31:14.906194385] (+0.000011245) hello:1424
1745ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 3, intfield2 = 0x3,
1746longfield = 3, netintfield = 3, netintfieldhex = 0x3, arrfield1 = [ [0]
1747= 1, [1] = 2, [2] = 3 ], arrfield2 = "test", \_seqfield1_length = 4,
1748seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ],
1749\_seqfield2_length = 4, seqfield2 = "test", stringfield = "test",
1750floatfield = 2222, doublefield = 2, boolfield = 1 } . . . You can now
1751safely destroy the trace session (note that this doesn't delete the
1752trace - it's still there in ~/lttng-traces): root@crownbay:~# lttng
1753destroy Session auto-20190303-021943 destroyed at /home/root
1754
1755.. _lltng-documentation:
1756
1757Documentation
1758-------------
1759
1760You can find the primary LTTng Documentation on the `LTTng
1761Documentation <https://lttng.org/docs/>`__ site. The documentation on
1762this site is appropriate for intermediate to advanced software
1763developers who are working in a Linux environment and are interested in
1764efficient software tracing.
1765
1766For information on LTTng in general, visit the `LTTng
1767Project <http://lttng.org/lttng2.0>`__ site. You can find a "Getting
1768Started" link on this site that takes you to an LTTng Quick Start.
1769
1770.. _profile-manual-blktrace:
1771
1772blktrace
1773========
1774
1775blktrace is a tool for tracing and reporting low-level disk I/O.
1776blktrace provides the tracing half of the equation; its output can be
1777piped into the blkparse program, which renders the data in a
1778human-readable form and does some basic analysis:
1779
1780.. _blktrace-setup:
1781
1782Setup
1783-----
1784
1785For this section, we'll assume you've already performed the basic setup
1786outlined in the "`General Setup <#profile-manual-general-setup>`__"
1787section.
1788
1789blktrace is an application that runs on the target system. You can run
1790the entire blktrace and blkparse pipeline on the target, or you can run
1791blktrace in 'listen' mode on the target and have blktrace and blkparse
1792collect and analyze the data on the host (see the "`Using blktrace
1793Remotely <#using-blktrace-remotely>`__" section below). For the rest of
1794this section we assume you've ssh'ed to the host and will be running
1795blkrace on the target.
1796
1797.. _blktrace-basic-usage:
1798
1799Basic Usage
1800-----------
1801
1802To record a trace, simply run the 'blktrace' command, giving it the name
1803of the block device you want to trace activity on: root@crownbay:~#
1804blktrace /dev/sdc In another shell, execute a workload you want to
1805trace. root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget
1806http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2;
1807sync Connecting to downloads.yoctoproject.org (140.211.169.59:80)
1808linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k
18090:00:00 ETA Press Ctrl-C in the blktrace shell to stop the trace. It
1810will display how many events were logged, along with the per-cpu file
1811sizes (blktrace records traces in per-cpu kernel buffers and simply
1812dumps them to userspace for blkparse to merge and sort later). ^C=== sdc
1813=== CPU 0: 7082 events, 332 KiB data CPU 1: 1578 events, 74 KiB data
1814Total: 8660 events (dropped 0), 406 KiB data If you examine the files
1815saved to disk, you see multiple files, one per CPU and with the device
1816name as the first part of the filename: root@crownbay:~# ls -al
1817drwxr-xr-x 6 root root 1024 Oct 27 22:39 . drwxr-sr-x 4 root root 1024
1818Oct 26 18:24 .. -rw-r--r-- 1 root root 339938 Oct 27 22:40
1819sdc.blktrace.0 -rw-r--r-- 1 root root 75753 Oct 27 22:40 sdc.blktrace.1
1820To view the trace events, simply invoke 'blkparse' in the directory
1821containing the trace files, giving it the device name that forms the
1822first part of the filenames: root@crownbay:~# blkparse sdc 8,32 1 1
18230.000000000 1225 Q WS 3417048 + 8 [jbd2/sdc-8] 8,32 1 2 0.000025213 1225
1824G WS 3417048 + 8 [jbd2/sdc-8] 8,32 1 3 0.000033384 1225 P N [jbd2/sdc-8]
18258,32 1 4 0.000043301 1225 I WS 3417048 + 8 [jbd2/sdc-8] 8,32 1 0
18260.000057270 0 m N cfq1225 insert_request 8,32 1 0 0.000064813 0 m N
1827cfq1225 add_to_rr 8,32 1 5 0.000076336 1225 U N [jbd2/sdc-8] 1 8,32 1 0
18280.000088559 0 m N cfq workload slice:150 8,32 1 0 0.000097359 0 m N
1829cfq1225 set_active wl_prio:0 wl_type:1 8,32 1 0 0.000104063 0 m N
1830cfq1225 Not idling. st->count:1 8,32 1 0 0.000112584 0 m N cfq1225 fifo=
1831(null) 8,32 1 0 0.000118730 0 m N cfq1225 dispatch_insert 8,32 1 0
18320.000127390 0 m N cfq1225 dispatched a request 8,32 1 0 0.000133536 0 m
1833N cfq1225 activate rq, drv=1 8,32 1 6 0.000136889 1225 D WS 3417048 + 8
1834[jbd2/sdc-8] 8,32 1 7 0.000360381 1225 Q WS 3417056 + 8 [jbd2/sdc-8]
18358,32 1 8 0.000377422 1225 G WS 3417056 + 8 [jbd2/sdc-8] 8,32 1 9
18360.000388876 1225 P N [jbd2/sdc-8] 8,32 1 10 0.000397886 1225 Q WS
18373417064 + 8 [jbd2/sdc-8] 8,32 1 11 0.000404800 1225 M WS 3417064 + 8
1838[jbd2/sdc-8] 8,32 1 12 0.000412343 1225 Q WS 3417072 + 8 [jbd2/sdc-8]
18398,32 1 13 0.000416533 1225 M WS 3417072 + 8 [jbd2/sdc-8] 8,32 1 14
18400.000422121 1225 Q WS 3417080 + 8 [jbd2/sdc-8] 8,32 1 15 0.000425194
18411225 M WS 3417080 + 8 [jbd2/sdc-8] 8,32 1 16 0.000431968 1225 Q WS
18423417088 + 8 [jbd2/sdc-8] 8,32 1 17 0.000435251 1225 M WS 3417088 + 8
1843[jbd2/sdc-8] 8,32 1 18 0.000440279 1225 Q WS 3417096 + 8 [jbd2/sdc-8]
18448,32 1 19 0.000443911 1225 M WS 3417096 + 8 [jbd2/sdc-8] 8,32 1 20
18450.000450336 1225 Q WS 3417104 + 8 [jbd2/sdc-8] 8,32 1 21 0.000454038
18461225 M WS 3417104 + 8 [jbd2/sdc-8] 8,32 1 22 0.000462070 1225 Q WS
18473417112 + 8 [jbd2/sdc-8] 8,32 1 23 0.000465422 1225 M WS 3417112 + 8
1848[jbd2/sdc-8] 8,32 1 24 0.000474222 1225 I WS 3417056 + 64 [jbd2/sdc-8]
18498,32 1 0 0.000483022 0 m N cfq1225 insert_request 8,32 1 25 0.000489727
18501225 U N [jbd2/sdc-8] 1 8,32 1 0 0.000498457 0 m N cfq1225 Not idling.
1851st->count:1 8,32 1 0 0.000503765 0 m N cfq1225 dispatch_insert 8,32 1 0
18520.000512914 0 m N cfq1225 dispatched a request 8,32 1 0 0.000518851 0 m
1853N cfq1225 activate rq, drv=2 . . . 8,32 0 0 58.515006138 0 m N cfq3551
1854complete rqnoidle 1 8,32 0 2024 58.516603269 3 C WS 3156992 + 16 [0]
18558,32 0 0 58.516626736 0 m N cfq3551 complete rqnoidle 1 8,32 0 0
185658.516634558 0 m N cfq3551 arm_idle: 8 group_idle: 0 8,32 0 0
185758.516636933 0 m N cfq schedule dispatch 8,32 1 0 58.516971613 0 m N
1858cfq3551 slice expired t=0 8,32 1 0 58.516982089 0 m N cfq3551 sl_used=13
1859disp=6 charge=13 iops=0 sect=80 8,32 1 0 58.516985511 0 m N cfq3551
1860del_from_rr 8,32 1 0 58.516990819 0 m N cfq3551 put_queue CPU0 (sdc):
1861Reads Queued: 0, 0KiB Writes Queued: 331, 26,284KiB Read Dispatches: 0,
18620KiB Write Dispatches: 485, 40,484KiB Reads Requeued: 0 Writes Requeued:
18630 Reads Completed: 0, 0KiB Writes Completed: 511, 41,000KiB Read Merges:
18640, 0KiB Write Merges: 13, 160KiB Read depth: 0 Write depth: 2 IO
1865unplugs: 23 Timer unplugs: 0 CPU1 (sdc): Reads Queued: 0, 0KiB Writes
1866Queued: 249, 15,800KiB Read Dispatches: 0, 0KiB Write Dispatches: 42,
18671,600KiB Reads Requeued: 0 Writes Requeued: 0 Reads Completed: 0, 0KiB
1868Writes Completed: 16, 1,084KiB Read Merges: 0, 0KiB Write Merges: 40,
1869276KiB Read depth: 0 Write depth: 2 IO unplugs: 30 Timer unplugs: 1
1870Total (sdc): Reads Queued: 0, 0KiB Writes Queued: 580, 42,084KiB Read
1871Dispatches: 0, 0KiB Write Dispatches: 527, 42,084KiB Reads Requeued: 0
1872Writes Requeued: 0 Reads Completed: 0, 0KiB Writes Completed: 527,
187342,084KiB Read Merges: 0, 0KiB Write Merges: 53, 436KiB IO unplugs: 53
1874Timer unplugs: 1 Throughput (R/W): 0KiB/s / 719KiB/s Events (sdc): 6,592
1875entries Skips: 0 forward (0 - 0.0%) Input file sdc.blktrace.0 added
1876Input file sdc.blktrace.1 added The report shows each event that was
1877found in the blktrace data, along with a summary of the overall block
1878I/O traffic during the run. You can look at the
1879`blkparse <http://linux.die.net/man/1/blkparse>`__ manpage to learn the
1880meaning of each field displayed in the trace listing.
1881
1882.. _blktrace-live-mode:
1883
1884Live Mode
1885~~~~~~~~~
1886
1887blktrace and blkparse are designed from the ground up to be able to
1888operate together in a 'pipe mode' where the stdout of blktrace can be
1889fed directly into the stdin of blkparse: root@crownbay:~# blktrace
1890/dev/sdc -o - \| blkparse -i - This enables long-lived tracing sessions
1891to run without writing anything to disk, and allows the user to look for
1892certain conditions in the trace data in 'real-time' by viewing the trace
1893output as it scrolls by on the screen or by passing it along to yet
1894another program in the pipeline such as grep which can be used to
1895identify and capture conditions of interest.
1896
1897There's actually another blktrace command that implements the above
1898pipeline as a single command, so the user doesn't have to bother typing
1899in the above command sequence: root@crownbay:~# btrace /dev/sdc
1900
1901Using blktrace Remotely
1902~~~~~~~~~~~~~~~~~~~~~~~
1903
1904Because blktrace traces block I/O and at the same time normally writes
1905its trace data to a block device, and in general because it's not really
1906a great idea to make the device being traced the same as the device the
1907tracer writes to, blktrace provides a way to trace without perturbing
1908the traced device at all by providing native support for sending all
1909trace data over the network.
1910
1911To have blktrace operate in this mode, start blktrace on the target
1912system being traced with the -l option, along with the device to trace:
1913root@crownbay:~# blktrace -l /dev/sdc server: waiting for connections...
1914On the host system, use the -h option to connect to the target system,
1915also passing it the device to trace: $ blktrace -d /dev/sdc -h
1916192.168.1.43 blktrace: connecting to 192.168.1.43 blktrace: connected!
1917On the target system, you should see this: server: connection from
1918192.168.1.43 In another shell, execute a workload you want to trace.
1919root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget
1920http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2;
1921sync Connecting to downloads.yoctoproject.org (140.211.169.59:80)
1922linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k
19230:00:00 ETA When it's done, do a Ctrl-C on the host system to stop the
1924trace: ^C=== sdc === CPU 0: 7691 events, 361 KiB data CPU 1: 4109
1925events, 193 KiB data Total: 11800 events (dropped 0), 554 KiB data On
1926the target system, you should also see a trace summary for the trace
1927just ended: server: end of run for 192.168.1.43:sdc === sdc === CPU 0:
19287691 events, 361 KiB data CPU 1: 4109 events, 193 KiB data Total: 11800
1929events (dropped 0), 554 KiB data The blktrace instance on the host will
1930save the target output inside a hostname-timestamp directory: $ ls -al
1931drwxr-xr-x 10 root root 1024 Oct 28 02:40 . drwxr-sr-x 4 root root 1024
1932Oct 26 18:24 .. drwxr-xr-x 2 root root 1024 Oct 28 02:40
1933192.168.1.43-2012-10-28-02:40:56 cd into that directory to see the
1934output files: $ ls -l -rw-r--r-- 1 root root 369193 Oct 28 02:44
1935sdc.blktrace.0 -rw-r--r-- 1 root root 197278 Oct 28 02:44 sdc.blktrace.1
1936And run blkparse on the host system using the device name: $ blkparse
1937sdc 8,32 1 1 0.000000000 1263 Q RM 6016 + 8 [ls] 8,32 1 0 0.000036038 0
1938m N cfq1263 alloced 8,32 1 2 0.000039390 1263 G RM 6016 + 8 [ls] 8,32 1
19393 0.000049168 1263 I RM 6016 + 8 [ls] 8,32 1 0 0.000056152 0 m N cfq1263
1940insert_request 8,32 1 0 0.000061600 0 m N cfq1263 add_to_rr 8,32 1 0
19410.000075498 0 m N cfq workload slice:300 . . . 8,32 0 0 177.266385696 0
1942m N cfq1267 arm_idle: 8 group_idle: 0 8,32 0 0 177.266388140 0 m N cfq
1943schedule dispatch 8,32 1 0 177.266679239 0 m N cfq1267 slice expired t=0
19448,32 1 0 177.266689297 0 m N cfq1267 sl_used=9 disp=6 charge=9 iops=0
1945sect=56 8,32 1 0 177.266692649 0 m N cfq1267 del_from_rr 8,32 1 0
1946177.266696560 0 m N cfq1267 put_queue CPU0 (sdc): Reads Queued: 0, 0KiB
1947Writes Queued: 270, 21,708KiB Read Dispatches: 59, 2,628KiB Write
1948Dispatches: 495, 39,964KiB Reads Requeued: 0 Writes Requeued: 0 Reads
1949Completed: 90, 2,752KiB Writes Completed: 543, 41,596KiB Read Merges: 0,
19500KiB Write Merges: 9, 344KiB Read depth: 2 Write depth: 2 IO unplugs: 20
1951Timer unplugs: 1 CPU1 (sdc): Reads Queued: 688, 2,752KiB Writes Queued:
1952381, 20,652KiB Read Dispatches: 31, 124KiB Write Dispatches: 59,
19532,396KiB Reads Requeued: 0 Writes Requeued: 0 Reads Completed: 0, 0KiB
1954Writes Completed: 11, 764KiB Read Merges: 598, 2,392KiB Write Merges:
195588, 448KiB Read depth: 2 Write depth: 2 IO unplugs: 52 Timer unplugs: 0
1956Total (sdc): Reads Queued: 688, 2,752KiB Writes Queued: 651, 42,360KiB
1957Read Dispatches: 90, 2,752KiB Write Dispatches: 554, 42,360KiB Reads
1958Requeued: 0 Writes Requeued: 0 Reads Completed: 90, 2,752KiB Writes
1959Completed: 554, 42,360KiB Read Merges: 598, 2,392KiB Write Merges: 97,
1960792KiB IO unplugs: 72 Timer unplugs: 1 Throughput (R/W): 15KiB/s /
1961238KiB/s Events (sdc): 9,301 entries Skips: 0 forward (0 - 0.0%) You
1962should see the trace events and summary just as you would have if you'd
1963run the same command on the target.
1964
1965Tracing Block I/O via 'ftrace'
1966~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1967
1968It's also possible to trace block I/O using only `trace events
1969subsystem <#the-trace-events-subsystem>`__, which can be useful for
1970casual tracing if you don't want to bother dealing with the userspace
1971tools.
1972
1973To enable tracing for a given device, use /sys/block/xxx/trace/enable,
1974where xxx is the device name. This for example enables tracing for
1975/dev/sdc: root@crownbay:/sys/kernel/debug/tracing# echo 1 >
1976/sys/block/sdc/trace/enable Once you've selected the device(s) you want
1977to trace, selecting the 'blk' tracer will turn the blk tracer on:
1978root@crownbay:/sys/kernel/debug/tracing# cat available_tracers blk
1979function_graph function nop root@crownbay:/sys/kernel/debug/tracing#
1980echo blk > current_tracer Execute the workload you're interested in:
1981root@crownbay:/sys/kernel/debug/tracing# cat /media/sdc/testfile.txt And
1982look at the output (note here that we're using 'trace_pipe' instead of
1983trace to capture this trace - this allows us to wait around on the pipe
1984for data to appear): root@crownbay:/sys/kernel/debug/tracing# cat
1985trace_pipe cat-3587 [001] d..1 3023.276361: 8,32 Q R 1699848 + 8 [cat]
1986cat-3587 [001] d..1 3023.276410: 8,32 m N cfq3587 alloced cat-3587 [001]
1987d..1 3023.276415: 8,32 G R 1699848 + 8 [cat] cat-3587 [001] d..1
19883023.276424: 8,32 P N [cat] cat-3587 [001] d..2 3023.276432: 8,32 I R
19891699848 + 8 [cat] cat-3587 [001] d..1 3023.276439: 8,32 m N cfq3587
1990insert_request cat-3587 [001] d..1 3023.276445: 8,32 m N cfq3587
1991add_to_rr cat-3587 [001] d..2 3023.276454: 8,32 U N [cat] 1 cat-3587
1992[001] d..1 3023.276464: 8,32 m N cfq workload slice:150 cat-3587 [001]
1993d..1 3023.276471: 8,32 m N cfq3587 set_active wl_prio:0 wl_type:2
1994cat-3587 [001] d..1 3023.276478: 8,32 m N cfq3587 fifo= (null) cat-3587
1995[001] d..1 3023.276483: 8,32 m N cfq3587 dispatch_insert cat-3587 [001]
1996d..1 3023.276490: 8,32 m N cfq3587 dispatched a request cat-3587 [001]
1997d..1 3023.276497: 8,32 m N cfq3587 activate rq, drv=1 cat-3587 [001]
1998d..2 3023.276500: 8,32 D R 1699848 + 8 [cat] And this turns off tracing
1999for the specified device: root@crownbay:/sys/kernel/debug/tracing# echo
20000 > /sys/block/sdc/trace/enable
2001
2002.. _blktrace-documentation:
2003
2004Documentation
2005-------------
2006
2007Online versions of the man pages for the commands discussed in this
2008section can be found here:
2009
2010- http://linux.die.net/man/8/blktrace
2011
2012- http://linux.die.net/man/1/blkparse
2013
2014- http://linux.die.net/man/8/btrace
2015
2016The above manpages, along with manpages for the other blktrace utilities
2017(btt, blkiomon, etc) can be found in the /doc directory of the blktrace
2018tools git repo: $ git clone git://git.kernel.dk/blktrace.git