# ENEA LINUX NETWORKING PROFILE

Technology trends show that Linux-based operating systems have increased their
presence in the area of high-performance networking applications. While there
is no standardized way of programmatically accessing hardware offload
capabilities, several paradigms co-exist in the Linux ecosystem to address this
specific need (e.g. USDPAA, DPDK, ODP). The Networking Profile in Enea Linux is
a framework for anyone implementing high-performance networking
applications on various hardware platforms. It aims to provide all the
building blocks needed for efficient development of Linux-based
solutions on top of network-accelerated hardware platforms. As different
hardware platforms have distinct data-path acceleration solutions, the
Networking Profile implementation depends heavily on the underlying hardware
capabilities.

This document describes the implementation details, changes, additions,
kernel configurations and tunings Enea made in order to achieve highly
optimized Linux distributions for networking applications.

The following sections focus on the Enea Linux Networking Profile on DPAA-based
QorIQ platforms, illustrating the implementation and changes on the NXP
p2041rdb target.

    Table of Contents
    -------------------------------------------
    1. Supported Targets
    2. Improving Real-Time Performance
    ------- 2.1 Kernel Modifications
    ------- 2.2 CPU Isolation with partrt
    ------- 2.3 Latency Benchmarks
    3. USDPAA Usage
    ------- 3.1 Packages
    ------- 3.2 Prepare Target
    ------- 3.3 Device Trees
    ------- 3.4 Boot Parameters
    ------- 3.5 Boot Instructions P2041RDB
    ------- 3.6 Run Reflector
    ------- 3.7 Run SRA
    ------- 3.8 Throughput using USDPAA

## 1. Supported Targets
The Enea Linux Networking Profile has initially been tested on p2041rdb.
<!--
    Table 1.1 Functionally verified targets
| Target | Reflector App | SRIO | SRIO RCW available?
| --- | --- | --- | --- |
| p2041rdb | OK | Yes | Yes*

\* RCW that supports SRIO has to be created in code warrior or copied from the
   SRA User Guide [NXP-SDK], see section 3.1.
-->
## 2. Improving Real-Time Performance
<!--
FIXME/WIP! need to add nohz and test.
-->

### 2.1 Kernel Modifications
When modifying a kernel for high-performance and low-latency applications there
are several aspects to take into consideration. The [Enea Linux Real-Time
Guide](http://linuxrealtime.org/index.php/Main_Page) gives a thorough
investigation and explanation of how to optimize Linux for low latency. Below
is a short description of the kernel features added specifically to the Enea
Linux Networking Profile in order to enhance real-time performance.

    Table 2.1 Added kernel features and their properties.

| Change | Reason |
| --- | --- |
| RCU priority boosting -> cfg/rcu_boost.cfg | Give low-priority RCU readers a temporarily higher priority to keep them from blocking tasks of higher priority. [RCU] |
| Offload RCU callback processing -> cfg/rcu_nocb | To reduce OS jitter, enable offloading of RCU callback processing to kernel threads. The rcu_nocbs boot parameter defines the set of CPUs to be offloaded. |
| Hotplug CPU -> cfg/hotplug_cpu.cfg | Allows CPUs to be added to or removed from a live kernel. [HOTPLUG] |
<!-- Adaptive-ticks CPU - cfg/nohz.cfg Avoid scheduling clock interrupts for CPUs running a single task. [NOHZ] -->

#### 2.1.1 Boot Parameters
<!--
**TBD/FIXME:** Boot parameters for nohz_full and isolcpus.
-->
**From [KERN-PARA]:**
<!--
nohz=           [KNL] Boottime enable/disable dynamic ticks
                Valid arguments: on, off
                Default: on
-->
```
rcu_nocbs=      [KNL]
                In kernels built with CONFIG_RCU_NOCB_CPU=y, set
                the specified list of CPUs to be no-callback CPUs.
                Invocation of these CPUs' RCU callbacks will
                be offloaded to "rcuox/N" kthreads created for
                that purpose, where "x" is "b" for RCU-bh, "p"
                for RCU-preempt, and "s" for RCU-sched, and "N"
                is the CPU number. This reduces OS jitter on the
                offloaded CPUs, which can be useful for HPC and
                real-time workloads. It can also improve energy
                efficiency for asymmetric multiprocessors.

isolcpus=       [KNL,SMP] Isolate CPUs from the general scheduler.
                Format:
                <cpu number>,...,<cpu number>
                or
                <cpu number>-<cpu number>
                (must be a positive range in ascending order)
                or a mixture
                <cpu number>,...,<cpu number>-<cpu number>

                This option can be used to specify one or more CPUs
                to isolate from the general SMP balancing and scheduling
                algorithms. You can move a process onto or off an
                "isolated" CPU via the CPU affinity syscalls or cpuset.
                <cpu number> begins at 0 and the maximum value is
                "number of CPUs in system - 1".

                This option is the preferred way to isolate CPUs. The
                alternative -- manually setting the CPU mask of all
                tasks in the system -- can cause problems and
                suboptimal load balancer performance.
```

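As a rough illustration of the list format quoted above, the sketch below checks a CPU list and prints matching `isolcpus=` and `rcu_nocbs=` arguments. The helper names `valid_cpulist` and `rt_bootargs` are made up for this example, and using the same list for both parameters is an assumption rather than a setting taken from the networking profile.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Enea Linux.

# Succeeds if $1 is a comma-separated list of CPU numbers and
# non-descending ranges, e.g. "1", "1-3" or "0,2-3".
valid_cpulist() {
    [ -n "$1" ] || return 1
    # Only digits, optionally paired with '-', separated by single commas.
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$' || return 1
    # Every range must not be descending.
    old_ifs=$IFS; IFS=,
    for item in $1; do
        case $item in
            *-*)
                if [ "${item%-*}" -gt "${item#*-}" ]; then
                    IFS=$old_ifs
                    return 1
                fi ;;
        esac
    done
    IFS=$old_ifs
    return 0
}

# Print the boot-argument fragment for a valid CPU list.
# Assumption: the same CPUs are isolated and RCU-offloaded.
rt_bootargs() {
    valid_cpulist "$1" || { echo "invalid CPU list: $1" >&2; return 1; }
    printf 'isolcpus=%s rcu_nocbs=%s\n' "$1" "$1"
}

rt_bootargs 1-3
```

Once a system is booted with such arguments, a task can be placed on an isolated CPU through the affinity interface, for example with `taskset -c 2 <command>`.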
### 2.2 CPU Isolation with partrt
A tool called partrt is included in the networking profile to divide an SMP
Linux system into partitions. A description of the tool can be found in the
[Linux Real-Time
Guide](http://linuxrealtime.org/index.php/Improving_the_Real-Time_Properties#The_CPU_Partitioning_Tool_-_partrt).

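CPU partitions are commonly described as a hexadecimal bitmask with one bit per CPU. `taskset` accepts such a mask, and partrt, as documented in the guide linked above, is assumed here to do the same. The sketch below converts a CPU list into that mask; `cpulist_to_mask` is a hypothetical helper and not part of partrt.

```shell
#!/bin/sh
# Sketch only: convert a CPU list such as "2-3" or "0,2-3" into a
# hexadecimal CPU mask. The function name is hypothetical.
cpulist_to_mask() {
    mask=0
    old_ifs=$IFS; IFS=,
    for item in $1; do
        first=${item%-*}    # "2-3" -> 2, "1" -> 1
        last=${item#*-}     # "2-3" -> 3, "1" -> 1
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            mask=$(( mask | (1 << cpu) ))   # set the bit for this CPU
            cpu=$(( cpu + 1 ))
        done
    done
    IFS=$old_ifs
    printf '0x%x\n' "$mask"
}

cpulist_to_mask 2-3    # mask for CPUs 2 and 3
```

For CPUs 2 and 3 the result is `0xc`. Before passing such a mask to partrt, check the exact argument form against the tool's own usage text.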
### 2.3 Latency Benchmarks
Cyclictest [CYCLIC] is a latency benchmark used in many projects. For
comparison, it was run on both Enea Linux 6.0 Standard and the Enea Linux
Networking Profile to investigate the impact of the changes on the system.

The tables below present the average and maximum latency measured by
*cyclictest* on the two systems. Columns with the suffix `_std` refer to the
standard profile and columns with the suffix `_net` to the networking profile.
The test is also combined with *stress* [STRESS] to show system performance
under different types of load. In each table, P is the thread priority, I the
interval in microseconds and C the number of measured cycles, as reported by
cyclictest.

<!--
FIXME!
 **Enea Linux 6.0 Standard System Info**
 - Kernel size:
 - Root-fs size:

**Enea Linux 6.0 Networking Profile System Info**
 - Kernel size:
 - Root-fs size:
-->

    Command: cyclictest with no stress

| CPU \# | P | I | C_std | Avg_std (us) | Max_std (us) | C_net | Avg_net (us) | Max_net (us) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 99 | 1000 | 100000 | 9 | 24 | 100000 | 6 | 11 |
| 1 | 99 | 1500 | 66817 | 9 | 21 | 66812 | 6 | 12 |
| 2 | 99 | 2000 | 50208 | 9 | 23 | 50106 | 6 | 18 |
| 3 | 99 | 2500 | 40083 | 9 | 14 | 40082 | 6 | 10 |

    Command: cyclictest with hdd stress:
    # stress -d 4 --hdd-bytes 1M &

| CPU \# | P | I | C_std | Avg_std (us) | Max_std (us) | C_net | Avg_net (us) | Max_net (us) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 99 | 1000 | 100000 | 14 | 223 | 100000 | 11 | 77 |
| 1 | 99 | 1500 | 66820 | 14 | 231 | 66745 | 11 | 100 |
| 2 | 99 | 2000 | 50109 | 14 | 186 | 50055 | 11 | 76 |
| 3 | 99 | 2500 | 40083 | 14 | 176 | 40041 | 11 | 81 |

    Command: cyclictest with vm stress:
    # stress -m 4 --vm-bytes 4096 &

| CPU \# | P | I | C_std | Avg_std (us) | Max_std (us) | C_net | Avg_net (us) | Max_net (us) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 99 | 1000 | 100000 | 5 | 20 | 100000 | 3 | 15 |
| 1 | 99 | 1500 | 66818 | 6 | 14 | 66739 | 3 | 7 |
| 2 | 99 | 2000 | 50109 | 6 | 14 | 50103 | 3 | 9 |
| 3 | 99 | 2500 | 40079 | 6 | 14 | 40081 | 3 | 6 |

    Command: cyclictest with full stress trial 1:
    # stress -c 4 -i 4 -m 4 --vm-bytes 4096 -d 4 --hdd-bytes 4096 &

| CPU \# | P | I | C_std | Avg_std (us) | Max_std (us) | C_net | Avg_net (us) | Max_net (us) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 99 | 1000 | 99808 | 7 | 93 | 99815 | 6 | 58 |
| 1 | 99 | 1500 | 66733 | 9 | 105 | 66739 | 6 | 54 |
| 2 | 99 | 2000 | 50039 | 9 | 79 | 50039 | 7 | 61 |
| 3 | 99 | 2500 | 40016 | 10 | 83 | 40032 | 6 | 57 |

    Command: cyclictest with full stress trial 2:
    # stress -c 4 -i 4 -m 4 --vm-bytes 4096 -d 4 --hdd-bytes 1M &

| CPU \# | P | I | C_std | Avg_std (us) | Max_std (us) | C_net | Avg_net (us) | Max_net (us) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 99 | 1000 | 100000 | 13 | 201 | 100000 | 10 | 87 |
| 1 | 99 | 1500 | 66646 | 11 | 186 | 66685 | 9 | 85 |
| 2 | 99 | 2000 | 49969 | 10 | 195 | 49998 | 10 | 70 |
| 3 | 99 | 2500 | 39960 | 11 | 112 | 39992 | 10 | 90 |

In every test the networking profile shows a lower maximum latency, and an
equal or lower average latency, than the standard profile.

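To put the difference in numbers, the relative reduction can be computed from any pair of `_std` and `_net` values in the tables of this section. A minimal sketch using awk follows; `reduction_pct` is a name chosen for this example only.

```shell
#!/bin/sh
# Sketch only: relative reduction in percent (one decimal) from a
# standard-profile value ($1) to a networking-profile value ($2).
reduction_pct() {
    awk -v std="$1" -v net="$2" \
        'BEGIN { printf "%.1f\n", (std - net) * 100 / std }'
}

# CPU 0, full stress trial 2: maximum latency 201 us -> 87 us
reduction_pct 201 87

# CPU 0, no stress: maximum latency 24 us -> 11 us
reduction_pct 24 11
```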
## 3. USDPAA Usage
Predictable, high performance is critical for networking systems. One way of
achieving greater performance is for user space to avoid interactions with the
kernel. Normally the kernel is responsible for allocating and scheduling
hardware acceleration resources. By using frameworks such as USDPAA [NXP-SDK]
and DPDK [DPDK], control over certain hardware can be given to user space.
USDPAA is specific to NXP/Freescale's QorIQ platforms; for more information
please see their guide to USDPAA [NXP-SDK].

The guides below give an example of how to prepare a p2041rdb target with SRIO
and Ethernet through USDPAA.


### 3.1 Packages
The networking profile supports and includes all packages necessary for
software support of USDPAA. If another image is created, the packages listed
below (all available in meta-fsl-ppc and meta-freescale) should be included in
order to add support for USDPAA and some example applications.

 * usdpaa
 * usdpaa-apps
 * fmc
 * fmlib
 * flib
 * eth-config

### 3.2 Prepare Target
The SRIO application requires booting with an RCW and a board configuration
that allow use of the PCI extender port. The examples below are specific to
p2041rdb, but similar steps can be taken for other targets where SRIO is not
enabled by default.

#### 3.2.1 Reset Configuration Word (RCW)
The reset configuration word must configure the serializer/deserializer
(SerDes) lanes for SRIO. The RCW can either be taken from a predefined binary
or be created in CodeWarrior [CW].

The RCW used in this example is taken from the NXP/Freescale SRA User Guide in
[NXP-SDK].

To program the RCW to the target from U-Boot, follow the steps below:

```
=> tftp 1000000 <path-2-file>/RR_RS_0x02.bin

=> md 0xec000000
ec000000: aa55aa55 010e0100 12600000 00000000    .U.U.....`......
ec000010: 241c0000 00000000 248e40c0 c3c02000    $.......$.@... .
ec000020: de800000 40000000 00000000 00000000    ....@...........
ec000030: 00000000 d0030f07 00000000 00000000    ................
ec000040: 00000000 00000000 091380c0 000009c4    ................
ec000050: 09000010 00000000 091380c0 000009c4    ................
ec000060: 09000014 00000000 091380c0 000009c4    ................
ec000070: 09000018 81d00000 091380c0 000009c4    ................
ec000080: 890b0050 00000002 091380c0 000009c4    ...P............
ec000090: 890b0054 00000002 091380c0 000009c4    ...T............
ec0000a0: 890b0058 00000002 091380c0 000009c4    ...X............
ec0000b0: 890b005c 00000002 091380c0 000009c4    ...\............
ec0000c0: 890b0090 00000002 091380c0 000009c4    ................
ec0000d0: 890b0094 00000002 091380c0 000009c4    ................
ec0000e0: 890b0098 00000002 091380c0 000009c4    ................
ec0000f0: 890b009c 00000002 091380c0 000009c4    ................

=> protect off 0xec000000 +$filesize
Un-Protected 1 sectors
=> erase 0xec000000 +$filesize

. done
Erased 1 sectors

=> cp.b 1000000 0xec000000 $filesize
Copy to Flash... 9done

=> protect on 0xec000000 +$filesize
Protected 1 sectors

=> md 0xec000000
ec000000: aa55aa55 010e0100 12600000 00000000    .U.U.....`......
ec000010: 241c0000 00000000 088040c0 c3c02000    $.........@... .
ec000020: de800000 40000000 00000000 00000000    ....@...........
ec000030: 00000000 d0030f07 00000000 00000000    ................
ec000040: 00000000 00000000 091380c0 000009c4    ................
ec000050: 09000010 00000000 091380c0 000009c4    ................
ec000060: 09000014 00000000 091380c0 000009c4    ................
ec000070: 09000018 81d00000 091380c0 000009c4    ................
ec000080: 890b0050 00000002 091380c0 000009c4    ...P............
ec000090: 890b0054 00000002 091380c0 000009c4    ...T............
ec0000a0: 890b0058 00000002 091380c0 000009c4    ...X............
ec0000b0: 890b005c 00000002 091380c0 000009c4    ...\............
ec0000c0: 890b0090 00000002 091380c0 000009c4    ................
ec0000d0: 890b0094 00000002 091380c0 000009c4    ................
ec0000e0: 890b0098 00000002 091380c0 000009c4    ................
ec0000f0: 890b009c 00000002 091380c0 000009c4    ................
=>
```

To select the required hardware configuration, some multiplexers need to be
set. Descriptions of these can be obtained by typing **cpld -h** in U-Boot;
information about the peripherals is also available in the board's user guide.
For SRIO on the p2041rdb target the following settings are necessary.

```
cpld_cmd lane_mux 6 0
cpld_cmd lane_mux a 0
cpld_cmd lane_mux c 0
cpld_cmd lane_mux d 0
```
### 3.3 Device Trees
USDPAA-enabled targets need specific device trees, because the hardware is
managed by the DPAA framework instead of being handed over to the Linux
kernel.

The networking profile includes a custom device tree
(`<machine>-usdpaa-enea.dtb`), currently defined for p2041rdb. It gives one
Ethernet interface to the kernel while the rest belong to DPAA.

Available and tested device trees for p2041rdb:

* uImage-p2041rdb-usdpaa-enea.dtb: Enea Linux custom device tree, gives one interface to the Linux kernel and the remaining ones to DPAA.

### 3.4 Boot Parameters
USDPAA requires some custom boot arguments. If they are missing or set
incorrectly, the USDPAA applications will not be usable. The NXP/Freescale
manual covers these arguments, but it can be misleading: in several places the
instructions are not target-agnostic and apply only to certain targets. If
unsure, consult the benchmarking chapter of the NXP/Freescale SDK
documentation, which gives more exact steps for each tested target.

For our purposes of testing SRIO and the reflector application, only the
'usdpaa_mem' boot argument is needed. If the reserved memory is too large, it
will cause segmentation faults.

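A quick sanity check of a boot-argument string against the limit can look like the sketch below. It assumes that `usdpaa_mem` accepts the usual K, M and G size suffixes and uses the 64M value given for p2041rdb in this section; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Convert a size such as 64M to bytes (K, M, G as powers of 1024).
size_to_bytes() {
    case $1 in
        *[Kk]) echo $(( ${1%?} * 1024 )) ;;
        *[Mm]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( $1 )) ;;
    esac
}

# Succeeds if the argument string $1 contains usdpaa_mem=<size> and
# <size> does not exceed the limit $2. Fails if the argument is missing.
usdpaa_mem_ok() {
    args=$1
    limit=$2
    for arg in $args; do
        case $arg in
            usdpaa_mem=*)
                [ "$(size_to_bytes "${arg#usdpaa_mem=}")" -le "$(size_to_bytes "$limit")" ]
                return ;;
        esac
    done
    return 1
}

usdpaa_mem_ok "root=/dev/ram rw console=ttyS0,115200 usdpaa_mem=64M" 64M &&
    echo "usdpaa_mem within limit"
```

On a running target the same check could be applied to the active command line with `usdpaa_mem_ok "$(cat /proc/cmdline)" 64M`.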
    Table 3.1 'usdpaa_mem' value per target

| TARGET | usdpaa_mem |
| --- | --- |
| p2041rdb | <= 64M |

### 3.5 Boot Instructions P2041RDB
Below are instructions on how to boot a p2041rdb board with USDPAA enabled.

#### 3.5.1 Boot over NFS server
```
tftp 1000000 uImage-p2041rdb.bin
tftp c00000 uImage-p2041rdb-usdpaa-enea.dtb

setenv bootargs root=/dev/nfs nfsroot=172.21.3.8:/unix/enea_linux_rootfs/<folder-path> rw ip=dhcp console=ttyS0,115200 memmap=16M$0xf7000000 mem=4080M max_addr=f6ffffff usdpaa_mem=64M

bootm 1000000 - c00000
```
#### 3.5.2 RAM Boot
```
tftp 1000000 uImage-p2041rdb.bin
tftp 2000000 uImage-p2041rdb-usdpaa.dtb
tftp 5000000 enea-image-networking-p2041rdb.ext2.gz.u-boot

setenv bootargs root=/dev/ram rw console=ttyS0,115200 ramdisk_size=10000000 log_buf_len=128K usdpaa_mem=64M

bootm 0x1000000 0x5000000 0x2000000
```

### 3.6 Run Reflector
Reflector is a demo application from NXP/Freescale that receives a packet over
Ethernet and sends it back with the source and destination IP addresses
swapped.

To test reflector, an Ethernet cable must be connected either between two
p2041rdb targets or between a workstation PC and the p2041rdb target.

Connect two targets by Ethernet and do the following steps to test the
connection with USDPAA:

1. Boot Board A with uImage-p2041rdb-usdpaa-enea.dtb
2. Boot Board B with uImage-p2041rdb.dtb
3. On board A do the following:
```
 # Configure which Ethernet ports should be used by DPAA
 $ vi config-p2041rdb.xml
 <cfgdata>
   <config>
     <engine name="fm0">
       <port type="MAC" number="2" policy="hash_ipsec_src_dst_spi_policy1"/>
       <port type="MAC" number="3" policy="hash_ipsec_src_dst_spi_policy2"/>
       <port type="MAC" number="4" policy="hash_ipsec_src_dst_spi_policy3"/>
       <port type="MAC" number="5" policy="hash_ipsec_src_dst_spi_policy4"/>
     </engine>
   </config>
 </cfgdata>

 $ fmc -c config-p2041rdb.xml -p /usr/etc/usdpaa_policy_hash_ipv4.xml -a

 # Start reflector and check the Ethernet ports:
 $ reflector
 reflector> ifconfig
```
4. On board B do the following:
```
 # Configure the network interface to which the Ethernet cable is
 # connected by assigning it an IP address.
 $ ifconfig -a
 $ ifconfig <eth-x> 192.168.0.10 netmask 255.255.255.0 up

 # Add a static ARP entry mapping Board A's IP address to its
 # hardware (MAC) address
 $ arp -s 192.168.0.11 <hw-address> -i <eth-x>

 # Ping Board A
 $ ping 192.168.0.11
```
### 3.7 Run SRA
The user-space drivers from NXP/Freescale support usage of SRIO from Linux user
space. The SRA application is a demo application from NXP/Freescale that can
write from one SRIO interface to another, avoiding kernel interaction by using
DMA (direct memory access) memory management. More information on the drivers
can be found in the NXP/Freescale SDK [NXP-SDK].

In the test run below two boards are prepared with SRIO interfaces. Both boards
are initialized with a memory setting that sets the different SRIO memory
spaces to different values. Then data from Board A's *write preparation space*
is written to Board B's *map space*.

```
 # start the srio application
 $ sra

 # setup board B (receiver)
 sra> sra -attr port1 win_attr 1 nwrite nread

 # set local memory to data predefined in 0x100000
 sra> sra -op port1 1 0 0 s 0x100000

 # read to view what is written to port 1
 sra> sra -op port1 1 0 0 p 0x100000


 # setup board A (transmitter)
 sra> sra -attr port1 win_attr 1 nwrite nread

 # set local memory and then read to confirm
 sra> sra -op port1 1 0 0 s 0x100000
 sra> sra -op port1 1 0 0 p 0x100000

 # write what is in 'write preparation space' to outbound
 sra> sra -op port1 1 0 0 w 0x100000

 # confirm on board B that 'write preparation space' from board A is written in
 # 'map space'
 sra> sra -op port1 1 0 0 p 0x100000
```

```
       Board A                                 Board B
+-------------------+                   +-------------------+
|     Map space     |           +------>|     Map space     |
+-------------------+           |       +-------------------+
|  Read data space  |           |       |  Read data space  |
+-------------------+ outbound  |       +-------------------+
| write preparation |-----------+       | write preparation |
|       space       |                   |       space       |
+-------------------+                   +-------------------+
|  reserved space   |                   |  reserved space   |
+-------------------+                   +-------------------+

Image 3.1 SRIO memory space
```

### 3.8 Throughput using USDPAA
To show some of the power of the USDPAA framework, the throughput over a 10G
Ethernet link was measured; the results are presented below.

Performance was measured with Spirent Test Center, used as a packet generator
together with the “Spirent Test Center” application, version 4.33. The test
targets are connected to the Spirent Test Center through 10G Ethernet ports
(XAUI riser card). On Enea Linux 6.0 the USDPAA reflector was used as the
packet forwarding application. The resulting throughput measured with Spirent
is shown in image 3.2 below.

<!--![alt text](./throughput_p2041rdb_networking.png =250x)-->
<img src="img/throughput_p2041rdb_networking.png" alt="Drawing" style="width: 500px;"/>

    Image 3.2 Throughput on p2041rdb over a 10G Ethernet port using USDPAA.
    The x-axis shows the frame size in bytes, and the y-axis the aggregated
    throughput in megabits per second.



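For reference when reading such a graph, the theoretical ceiling for a given frame size follows from the link speed and the fixed per-frame overhead on the wire (preamble, start-of-frame delimiter and inter-frame gap, 20 bytes in total). The sketch below is a generic Ethernet calculation, not a reproduction of the Spirent measurement; whether the measured throughput counts the same bytes is an assumption to verify. The function names are made up for this example, and the shell must support 64-bit arithmetic.

```shell
#!/bin/sh
# Sketch only: theoretical Ethernet line-rate figures.
# $1 = link speed in bit/s, $2 = frame size in bytes (including FCS).

# Maximum frames per second. Each frame occupies 20 extra bytes on the
# wire: 7 preamble + 1 start-of-frame delimiter + 12 inter-frame gap.
max_fps() {
    echo $(( $1 / (($2 + 20) * 8) ))
}

# Throughput in Mbit/s counted over the frame bytes only.
max_mbps() {
    echo $(( $(max_fps "$1" "$2") * $2 * 8 / 1000000 ))
}

link=10000000000    # 10G Ethernet
for size in 64 512 1518; do
    echo "$size bytes: $(max_fps "$link" "$size") frames/s," \
         "$(max_mbps "$link" "$size") Mbit/s"
done
```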
--------------
[NXP-SDK] QorIQ SDK 1.9 Documentation,
https://freescale.sdlproducts.com/LiveContent/web/pub.xql?c=t&action=home&pub=QorIQ_SDK_1.9&lang=en-US#addHistory=true&filename=GUID-81837065-81AD-449B-8572-E96C3EED636F.xml&docid=GUID-81837065-81AD-449B-8572-E96C3EED636F&inner_id=&tid=&query=&scope=&resource=&toc=false&eventType=lcContent.loadHome

[KERN-PARA] Kernel Parameters, https://www.kernel.org/doc/Documentation/kernel-parameters.txt

[RCU] Paul McKenney, *Priority-Boosting RCU Read-Side Critical Sections*, https://lwn.net/Articles/220677/
<!--
[NOHZ] https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt
-->

[HOTPLUG] https://www.kernel.org/doc/Documentation/cpu-hotplug.txt

[DPDK] http://dpdk.org/

[CYCLIC] https://rt.wiki.kernel.org/index.php/Cyclictest

[STRESS] http://linux.die.net/man/1/stress

[CW] http://www.nxp.com/products/software-and-tools/software-development-tools/codewarrior-development-tools:CW_HOME