diff options
Diffstat (limited to 'documentation/profile-manual')
-rw-r--r-- | documentation/profile-manual/profile-manual-usage.xml | 353 |
1 files changed, 353 insertions, 0 deletions
diff --git a/documentation/profile-manual/profile-manual-usage.xml b/documentation/profile-manual/profile-manual-usage.xml index 71feb418fd..cfe916a795 100644 --- a/documentation/profile-manual/profile-manual-usage.xml +++ b/documentation/profile-manual/profile-manual-usage.xml | |||
@@ -981,10 +981,363 @@ | |||
981 | </para> | 981 | </para> |
982 | </section> | 982 | </section> |
983 | 983 | ||
984 | <section id='system-wide-tracing-and-profiling'> | ||
985 | <title>System-Wide Tracing and Profiling</title> | ||
984 | 986 | ||
987 | <para> | ||
988 | The examples so far have focused on tracing a particular program or | ||
989 | workload - in other words, every profiling run has specified the | ||
990 | program to profile in the command-line e.g. 'perf record wget ...'. | ||
991 | </para> | ||
992 | |||
993 | <para> | ||
994 | It's also possible, and more interesting in many cases, to run a | ||
995 | system-wide profile or trace while running the workload in a | ||
996 | separate shell. | ||
997 | </para> | ||
998 | |||
999 | <para> | ||
1000 | To do system-wide profiling or tracing, you typically use | ||
1001 | the -a flag to 'perf record'. | ||
1002 | </para> | ||
1003 | |||
1004 | <para> | ||
1005 | To demonstrate this, open up one window and start the profile | ||
1006 | using the -a flag (press Ctrl-C to stop tracing): | ||
1007 | <literallayout class='monospaced'> | ||
1008 | root@crownbay:~# perf record -g -a | ||
1009 | ^C[ perf record: Woken up 6 times to write data ] | ||
1010 | [ perf record: Captured and wrote 1.400 MB perf.data (~61172 samples) ] | ||
1011 | </literallayout> | ||
1012 | In another window, run the wget test: | ||
1013 | <literallayout class='monospaced'> | ||
1014 | root@crownbay:~# wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink> | ||
1015 | Connecting to downloads.yoctoproject.org (140.211.169.59:80) | ||
1016 | linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA | ||
1017 | </literallayout> | ||
1018 | Here we see entries not only for our wget load, but for other | ||
1019 | processes running on the system as well: | ||
1020 | </para> | ||
1021 | |||
1022 | <para> | ||
1023 | <imagedata fileref="figures/perf-systemwide.png" width="6in" depth="7in" align="center" scalefit="1" /> | ||
1024 | </para> | ||
1025 | |||
1026 | <para> | ||
1027 | In the snapshot above, we can see callchains that originate in | ||
1028 | libc, and a callchain from Xorg that demonstrates that we're | ||
1029 | using a proprietary X driver in userspace (notice the presence | ||
1030 | of 'PVR' and some other unresolvable symbols in the expanded | ||
1031 | Xorg callchain). | ||
1032 | </para> | ||
1033 | |||
1034 | <para> | ||
1035 | Note also that we have both kernel and userspace entries in the | ||
1036 | above snapshot. We can also tell perf to focus on userspace but | ||
1037 | providing a modifier, in this case 'u', to the 'cycles' hardware | ||
1038 | counter when we record a profile: | ||
1039 | <literallayout class='monospaced'> | ||
1040 | root@crownbay:~# perf record -g -a -e cycles:u | ||
1041 | ^C[ perf record: Woken up 2 times to write data ] | ||
1042 | [ perf record: Captured and wrote 0.376 MB perf.data (~16443 samples) ] | ||
1043 | </literallayout> | ||
1044 | </para> | ||
1045 | |||
1046 | <para> | ||
1047 | <imagedata fileref="figures/perf-report-cycles-u.png" width="6in" depth="7in" align="center" scalefit="1" /> | ||
1048 | </para> | ||
1049 | |||
1050 | <para> | ||
1051 | Notice in the screenshot above, we see only userspace entries ([.]) | ||
1052 | </para> | ||
1053 | |||
1054 | <para> | ||
1055 | Finally, we can press 'enter' on a leaf node and select the 'Zoom | ||
1056 | into DSO' menu item to show only entries associated with a | ||
1057 | specific DSO. In the screenshot below, we've zoomed into the | ||
1058 | 'libc' DSO which shows all the entries associated with the | ||
1059 | libc-xxx.so DSO. | ||
1060 | </para> | ||
1061 | |||
1062 | <para> | ||
1063 | <imagedata fileref="figures/perf-systemwide-libc.png" width="6in" depth="7in" align="center" scalefit="1" /> | ||
1064 | </para> | ||
1065 | |||
1066 | <para> | ||
1067 | We can also use the system-wide -a switch to do system-wide | ||
1068 | tracing. Here we'll trace a couple of scheduler events: | ||
1069 | <literallayout class='monospaced'> | ||
1070 | root@crownbay:~# perf record -a -e sched:sched_switch -e sched:sched_wakeup | ||
1071 | ^C[ perf record: Woken up 38 times to write data ] | ||
1072 | [ perf record: Captured and wrote 9.780 MB perf.data (~427299 samples) ] | ||
1073 | </literallayout> | ||
1074 | We can look at the raw output using 'perf script' with no | ||
1075 | arguments: | ||
1076 | <literallayout class='monospaced'> | ||
1077 | root@crownbay:~# perf script | ||
1078 | |||
1079 | perf 1383 [001] 6171.460045: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 | ||
1080 | perf 1383 [001] 6171.460066: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120 | ||
1081 | kworker/1:1 21 [001] 6171.460093: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120 | ||
1082 | swapper 0 [000] 6171.468063: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000 | ||
1083 | swapper 0 [000] 6171.468107: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 | ||
1084 | kworker/0:3 1209 [000] 6171.468143: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 | ||
1085 | perf 1383 [001] 6171.470039: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 | ||
1086 | perf 1383 [001] 6171.470058: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120 | ||
1087 | kworker/1:1 21 [001] 6171.470082: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120 | ||
1088 | perf 1383 [001] 6171.480035: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 | ||
1089 | </literallayout> | ||
1090 | </para> | ||
985 | 1091 | ||
1092 | <section id='perf-filtering'> | ||
1093 | <title>Filtering</title> | ||
986 | 1094 | ||
1095 | <para> | ||
1096 | Notice that there are a lot of events that don't really have | ||
1097 | anything to do with what we're interested in, namely events | ||
1098 | that schedule 'perf' itself in and out or that wake perf up. | ||
1099 | We can get rid of those by using the '--filter' option - | ||
1100 | for each event we specify using -e, we can add a --filter | ||
1101 | after that to filter out trace events that contain fields | ||
1102 | with specific values: | ||
1103 | <literallayout class='monospaced'> | ||
1104 | root@crownbay:~# perf record -a -e sched:sched_switch --filter 'next_comm != perf && prev_comm != perf' -e sched:sched_wakeup --filter 'comm != perf' | ||
1105 | ^C[ perf record: Woken up 38 times to write data ] | ||
1106 | [ perf record: Captured and wrote 9.688 MB perf.data (~423279 samples) ] | ||
987 | 1107 | ||
1108 | |||
1109 | root@crownbay:~# perf script | ||
1110 | |||
1111 | swapper 0 [000] 7932.162180: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 | ||
1112 | kworker/0:3 1209 [000] 7932.162236: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 | ||
1113 | perf 1407 [001] 7932.170048: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 | ||
1114 | perf 1407 [001] 7932.180044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 | ||
1115 | perf 1407 [001] 7932.190038: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 | ||
1116 | perf 1407 [001] 7932.200044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 | ||
1117 | perf 1407 [001] 7932.210044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 | ||
1118 | perf 1407 [001] 7932.220044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 | ||
1119 | swapper 0 [001] 7932.230111: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 | ||
1120 | swapper 0 [001] 7932.230146: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/1:1 next_pid=21 next_prio=120 | ||
1121 | kworker/1:1 21 [001] 7932.230205: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120 | ||
1122 | swapper 0 [000] 7932.326109: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000 | ||
1123 | swapper 0 [000] 7932.326171: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 | ||
1124 | kworker/0:3 1209 [000] 7932.326214: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 | ||
1125 | </literallayout> | ||
1126 | In this case, we've filtered out all events that have 'perf' | ||
1127 | in their 'comm' or 'comm_prev' or 'comm_next' fields. Notice | ||
1128 | that there are still events recorded for perf, but notice | ||
1129 | that those events don't have values of 'perf' for the filtered | ||
1130 | fields. To completely filter out anything from perf will | ||
1131 | require a bit more work, but for the purpose of demonstrating | ||
1132 | how to use filters, it's close enough. | ||
1133 | </para> | ||
1134 | |||
1135 | <note> | ||
1136 | Tying It Together: These are exactly the same set of event | ||
1137 | filters defined by the trace event subsystem. See the | ||
1138 | ftrace/tracecmd/kernelshark section for more discussion about | ||
1139 | these event filters. | ||
1140 | </note> | ||
1141 | |||
1142 | <note> | ||
1143 | Tying It Together: These event filters are implemented by a | ||
1144 | special-purpose pseudo-interpreter in the kernel and are an | ||
1145 | integral and indispensable part of the perf design as it | ||
1146 | relates to tracing. kernel-based event filters provide a | ||
1147 | mechanism to precisely throttle the event stream that appears | ||
1148 | in user space, where it makes sense to provide bindings to real | ||
1149 | programming languages for postprocessing the event stream. | ||
1150 | This architecture allows for the intelligent and flexible | ||
1151 | partitioning of processing between the kernel and user space. | ||
1152 | Contrast this with other tools such as SystemTap, which does | ||
1153 | all of its processing in the kernel and as such requires a | ||
1154 | special project-defined language in order to accommodate that | ||
1155 | design, or LTTng, where everything is sent to userspace and | ||
1156 | as such requires a super-efficient kernel-to-userspace | ||
1157 | transport mechanism in order to function properly. While | ||
1158 | perf certainly can benefit from for instance advances in | ||
1159 | the design of the transport, it doesn't fundamentally depend | ||
1160 | on them. Basically, if you find that your perf tracing | ||
1161 | application is causing buffer I/O overruns, it probably | ||
1162 | means that you aren't taking enough advantage of the | ||
1163 | kernel filtering engine. | ||
1164 | </note> | ||
1165 | </section> | ||
1166 | </section> | ||
1167 | |||
1168 | <section id='using-dynamic-tracepoints'> | ||
1169 | <title>Using Dynamic Tracepoints</title> | ||
1170 | |||
1171 | <para> | ||
1172 | perf isn't restricted to the fixed set of static tracepoints | ||
1173 | listed by 'perf list'. Users can also add their own 'dynamic' | ||
1174 | tracepoints anywhere in the kernel. For instance, suppose we | ||
1175 | want to define our own tracepoint on do_fork(). We can do that | ||
1176 | using the 'perf probe' perf subcommand: | ||
1177 | <literallayout class='monospaced'> | ||
1178 | root@crownbay:~# perf probe do_fork | ||
1179 | Added new event: | ||
1180 | probe:do_fork (on do_fork) | ||
1181 | |||
1182 | You can now use it in all perf tools, such as: | ||
1183 | |||
1184 | perf record -e probe:do_fork -aR sleep 1 | ||
1185 | </literallayout> | ||
1186 | Adding a new tracepoint via 'perf probe' results in an event | ||
1187 | with all the expected files and format in | ||
1188 | /sys/kernel/debug/tracing/events, just the same as for static | ||
1189 | tracepoints (as discussed in more detail in the trace events | ||
1190 | subsystem section: | ||
1191 | <literallayout class='monospaced'> | ||
1192 | root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# ls -al | ||
1193 | drwxr-xr-x 2 root root 0 Oct 28 11:42 . | ||
1194 | drwxr-xr-x 3 root root 0 Oct 28 11:42 .. | ||
1195 | -rw-r--r-- 1 root root 0 Oct 28 11:42 enable | ||
1196 | -rw-r--r-- 1 root root 0 Oct 28 11:42 filter | ||
1197 | -r--r--r-- 1 root root 0 Oct 28 11:42 format | ||
1198 | -r--r--r-- 1 root root 0 Oct 28 11:42 id | ||
1199 | |||
1200 | root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# cat format | ||
1201 | name: do_fork | ||
1202 | ID: 944 | ||
1203 | format: | ||
1204 | field:unsigned short common_type; offset:0; size:2; signed:0; | ||
1205 | field:unsigned char common_flags; offset:2; size:1; signed:0; | ||
1206 | field:unsigned char common_preempt_count; offset:3; size:1; signed:0; | ||
1207 | field:int common_pid; offset:4; size:4; signed:1; | ||
1208 | field:int common_padding; offset:8; size:4; signed:1; | ||
1209 | |||
1210 | field:unsigned long __probe_ip; offset:12; size:4; signed:0; | ||
1211 | |||
1212 | print fmt: "(%lx)", REC->__probe_ip | ||
1213 | </literallayout> | ||
1214 | We can list all dynamic tracepoints currently in existence: | ||
1215 | <literallayout class='monospaced'> | ||
1216 | root@crownbay:~# perf probe -l | ||
1217 | probe:do_fork (on do_fork) | ||
1218 | probe:schedule (on schedule) | ||
1219 | </literallayout> | ||
1220 | Let's record system-wide ('sleep 30' is a trick for recording | ||
1221 | system-wide but basically do nothing and then wake up after | ||
1222 | 30 seconds): | ||
1223 | <literallayout class='monospaced'> | ||
1224 | root@crownbay:~# perf record -g -a -e probe:do_fork sleep 30 | ||
1225 | [ perf record: Woken up 1 times to write data ] | ||
1226 | [ perf record: Captured and wrote 0.087 MB perf.data (~3812 samples) ] | ||
1227 | </literallayout> | ||
1228 | Using 'perf script' we can see each do_fork event that fired: | ||
1229 | <literallayout class='monospaced'> | ||
1230 | root@crownbay:~# perf script | ||
1231 | |||
1232 | # ======== | ||
1233 | # captured on: Sun Oct 28 11:55:18 2012 | ||
1234 | # hostname : crownbay | ||
1235 | # os release : 3.4.11-yocto-standard | ||
1236 | # perf version : 3.4.11 | ||
1237 | # arch : i686 | ||
1238 | # nrcpus online : 2 | ||
1239 | # nrcpus avail : 2 | ||
1240 | # cpudesc : Intel(R) Atom(TM) CPU E660 @ 1.30GHz | ||
1241 | # cpuid : GenuineIntel,6,38,1 | ||
1242 | # total memory : 1017184 kB | ||
1243 | # cmdline : /usr/bin/perf record -g -a -e probe:do_fork sleep 30 | ||
1244 | # event : name = probe:do_fork, type = 2, config = 0x3b0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern | ||
1245 | = 0, id = { 5, 6 } | ||
1246 | # HEADER_CPU_TOPOLOGY info available, use -I to display | ||
1247 | # ======== | ||
1248 | # | ||
1249 | matchbox-deskto 1197 [001] 34211.378318: do_fork: (c1028460) | ||
1250 | matchbox-deskto 1295 [001] 34211.380388: do_fork: (c1028460) | ||
1251 | pcmanfm 1296 [000] 34211.632350: do_fork: (c1028460) | ||
1252 | pcmanfm 1296 [000] 34211.639917: do_fork: (c1028460) | ||
1253 | matchbox-deskto 1197 [001] 34217.541603: do_fork: (c1028460) | ||
1254 | matchbox-deskto 1299 [001] 34217.543584: do_fork: (c1028460) | ||
1255 | gthumb 1300 [001] 34217.697451: do_fork: (c1028460) | ||
1256 | gthumb 1300 [001] 34219.085734: do_fork: (c1028460) | ||
1257 | gthumb 1300 [000] 34219.121351: do_fork: (c1028460) | ||
1258 | gthumb 1300 [001] 34219.264551: do_fork: (c1028460) | ||
1259 | pcmanfm 1296 [000] 34219.590380: do_fork: (c1028460) | ||
1260 | matchbox-deskto 1197 [001] 34224.955965: do_fork: (c1028460) | ||
1261 | matchbox-deskto 1306 [001] 34224.957972: do_fork: (c1028460) | ||
1262 | matchbox-termin 1307 [000] 34225.038214: do_fork: (c1028460) | ||
1263 | matchbox-termin 1307 [001] 34225.044218: do_fork: (c1028460) | ||
1264 | matchbox-termin 1307 [000] 34225.046442: do_fork: (c1028460) | ||
1265 | matchbox-deskto 1197 [001] 34237.112138: do_fork: (c1028460) | ||
1266 | matchbox-deskto 1311 [001] 34237.114106: do_fork: (c1028460) | ||
1267 | gaku 1312 [000] 34237.202388: do_fork: (c1028460) | ||
1268 | </literallayout> | ||
1269 | And using 'perf report' on the same file, we can see the | ||
1270 | callgraphs from starting a few programs during those 30 seconds: | ||
1271 | </para> | ||
1272 | |||
1273 | <para> | ||
1274 | <imagedata fileref="figures/perf-probe-do_fork-profile.png" width="6in" depth="7in" align="center" scalefit="1" /> | ||
1275 | </para> | ||
1276 | |||
1277 | <note> | ||
1278 | Tying It Together: The trace events subsystem accomodate static | ||
1279 | and dynamic tracepoints in exactly the same way - there's no | ||
1280 | difference as far as the infrastructure is concerned. See the | ||
1281 | ftrace section for more details on the trace event subsystem. | ||
1282 | </note> | ||
1283 | |||
1284 | <note> | ||
1285 | Tying It Together: Dynamic tracepoints are implemented under the | ||
1286 | covers by kprobes and uprobes. kprobes and uprobes are also used | ||
1287 | by and in fact are the main focus of SystemTap. | ||
1288 | </note> | ||
1289 | </section> | ||
1290 | |||
1291 | <section id='perf-documentation'> | ||
1292 | <title>Documentation</title> | ||
1293 | |||
1294 | <para> | ||
1295 | Online versions of the man pages for the commands discussed in this | ||
1296 | section can be found here: | ||
1297 | <itemizedlist> | ||
1298 | <listitem><para>The <ulink url='http://linux.die.net/man/1/perf-stat'>'perf stat' manpage</ulink>. | ||
1299 | </para></listitem> | ||
1300 | <listitem><para>The <ulink url='http://linux.die.net/man/1/perf-record'>'perf record' manpage</ulink>. | ||
1301 | </para></listitem> | ||
1302 | <listitem><para>The <ulink url='http://linux.die.net/man/1/perf-report'>'perf report' manpage</ulink>. | ||
1303 | </para></listitem> | ||
1304 | <listitem><para>The <ulink url='http://linux.die.net/man/1/perf-probe'>'perf probe' manpage</ulink>. | ||
1305 | </para></listitem> | ||
1306 | <listitem><para>The <ulink url='http://linux.die.net/man/1/perf-script'>'perf script' manpage</ulink>. | ||
1307 | </para></listitem> | ||
1308 | <listitem><para>Documentation on using the | ||
1309 | <ulink url='http://linux.die.net/man/1/perf-script-python'>'perf script' python binding</ulink>. | ||
1310 | </para></listitem> | ||
1311 | <listitem><para>The top-level | ||
1312 | <ulink url='http://linux.die.net/man/1/perf'>perf(1) manpage</ulink>. | ||
1313 | </para></listitem> | ||
1314 | </itemizedlist> | ||
1315 | </para> | ||
1316 | |||
1317 | <para> | ||
1318 | Normally, you should be able to invoke the man pages via perf | ||
1319 | itself e.g. 'perf help' or 'perf help record'. | ||
1320 | </para> | ||
1321 | |||
1322 | <para> | ||
1323 | However, by default Yocto doesn't install man pages, but perf | ||
1324 | invokes the man pages for most help functionality. This is a bug | ||
1325 | and is being addressed by a Yocto bug: | ||
1326 | <ulink url='https://bugzilla.yoctoproject.org/show_bug.cgi?id=3388'>Bug 3388 - perf: enable man pages for basic 'help' functionality</ulink>. | ||
1327 | </para> | ||
1328 | |||
1329 | <para> | ||
1330 | The man pages in text form, along with some other files, such as | ||
1331 | a set of examples, can be found in the 'perf' directory of the | ||
1332 | kernel tree: | ||
1333 | <literallayout class='monospaced'> | ||
1334 | tools/perf/Documentation | ||
1335 | </literallayout> | ||
1336 | There's also a nice perf tutorial on the perf wiki that goes | ||
1337 | into more detail than we do here in certain areas: | ||
1338 | <ulink url='https://perf.wiki.kernel.org/index.php/Tutorial'>Perf Tutorial</ulink> | ||
1339 | </para> | ||
1340 | </section> | ||
988 | </section> | 1341 | </section> |
989 | </chapter> | 1342 | </chapter> |
990 | <!-- | 1343 | <!-- |