author Nora Björklund <nora.bjorklund@enea.com> 2016-07-07 13:53:12 +0200
committer Martin Borg <martin.borg@enea.com> 2016-07-07 13:58:07 +0200
commit c1c3126d5667c8e8c5a2a994070110155f374a33
tree cf1880598433fc26dae4296857c67c84412a9d87
parent 99f038c526ad5e52fe82df5bc1c91fe21008cd41
doc: add networking profile documentation (krogoth)

Add the following:
- networking-profile.md - markdown document describing the networking profile
- networking-profile.html - generated from the .md file
- image in img/ with the IPv4 throughput performance

Signed-off-by: Nora Björklund <nora.bjorklund@enea.com>
Signed-off-by: Martin Borg <martin.borg@enea.com>
-rw-r--r-- doc/img/throughput_p2041rdb_networking.png bin 0 -> 7335 bytes
-rw-r--r-- doc/networking-profile.html 1121
-rw-r--r-- doc/networking-profile.md 508
3 files changed, 1629 insertions, 0 deletions
diff --git a/doc/img/throughput_p2041rdb_networking.png b/doc/img/throughput_p2041rdb_networking.png
new file mode 100644
index 0000000..999506e
--- /dev/null
+++ b/doc/img/throughput_p2041rdb_networking.png
Binary files differ
diff --git a/doc/networking-profile.html b/doc/networking-profile.html
new file mode 100644
index 0000000..b72a025
--- /dev/null
+++ b/doc/networking-profile.html
@@ -0,0 +1,1121 @@
1<!DOCTYPE html>
2<html>
3<head>
4<meta charset="UTF-8">
5<title>networking-profile.html</title>
6<style>
7@import 'https://fonts.googleapis.com/css?family=Droid+Sans+Mono|Open+Sans:400,400i,600,600i';
8
9* {
10 overflow: visible !important;
11 -webkit-text-size-adjust: 100%;
12 -webkit-font-smoothing: antialiased;
13 box-decoration-break: clone;
14}
15
16html, body {
17 background: #FFF;
18 font-family: "Open Sans", "Segoe UI", Arial, freesans, sans-serif;
19 font-size: 10px;
20 line-height: 1.4;
21 color: #333;
22 word-wrap: break-word;
23 hyphens: auto;
24}
25
26hr {
27 margin-top: 20px;
28 margin-bottom: 20px;
29 border: 0;
30 border-top: 4px solid #EEE;
31}
32
33code {
34 padding: .1em .4em;
35 display: inline-block;
36 background-color: #f9f2f4;
37 color: #c7254e;
38 border-radius: 3px;
39 border: 0 none;
40 font: 9px "Droid Sans Mono", Consolas, "Liberation Mono", Menlo, "Courier New", Courier, monospace;
41 line-height: 1.2;
42 hyphens: manual;
43}
44
45pre code {
46 padding: 15px;
47 display: block;
48 background-color: #F9F9F9;
49 color: #555;
50 font-size: 10px;
51 box-shadow: inset -1px -1px 0 rgba(0, 0, 0, .08);
52 word-wrap: normal;
53}
54
55/* replace light background for some hljs themes */
56code.github-css,
57code.github-gist-css,
58code.tomorrow-css,
59code.default-css,
60code.googlecode-css,
61code.ascetic-css,
62code.color-brewer-css,
63code.grayscale-css,
64code.idea-css,
65code.vs-css,
66code.xcode-css {
67 background-color: #F9F9F9 !important;
68}
69
70blockquote {
71 padding: 10px;
72 margin-left: 0;
73 color: #666;
74 border: 0 none;
75 border-left: 4px solid #EEE;
76}
77
78blockquote p:last-child,
79blockquote ul:last-child,
80blockquote ol:last-child {
81 margin-bottom: 0;
82}
83
84table {
85 border-collapse: collapse;
86 border-spacing: 0;
87 background-color: #FFF;
88 width: 100%;
89 max-width: 100%;
90 margin-bottom: 20px;
91 border: 1px solid #DDD;
92}
93
94table div {
95 page-break-inside: avoid;
96}
97
98th, td {
99 text-align: left;
100}
101
102table > thead > tr > th,
103table > tbody > tr > th,
104table > tfoot > tr > th,
105table > thead > tr > td,
106table > tbody > tr > td,
107table > tfoot > tr > td {
108 padding: 8px 14px;
109 vertical-align: top;
110 border-top: 1px solid #DDD;
111}
112
113table > caption + thead > tr:first-child > th,
114table > colgroup + thead > tr:first-child > th,
115table > thead:first-child > tr:first-child > th,
116table > caption + thead > tr:first-child > td,
117table > colgroup + thead > tr:first-child > td,
118table > thead:first-child > tr:first-child > td {
119 border-top: 0;
120}
121
122table > tbody + tbody {
123 border-top: 2px solid #DDD;
124}
125
126table table {
127 background-color: #FFF;
128}
129
130table > thead > tr > th,
131table > tbody > tr > th,
132table > tfoot > tr > th,
133table > thead > tr > td,
134table > tbody > tr > td,
135table > tfoot > tr > td {
136 border: 1px solid #DDD;
137}
138
139table > thead > tr > th,
140table > thead > tr > td {
141 border-bottom-width: 2px;
142 text-align: center;
143 vertical-align: middle;
144 font-weight: bold;
145 padding-top: 6px;
146 padding-bottom: 6px;
147 font-size: 90%;
148}
149
150table > tbody > tr:nth-of-type(odd) {
151 background-color: #F9F9F9;
152}
153
154img {
155 max-width: 100%;
156 height: auto;
157 vertical-align: middle;
158}
159
160h1,
161h2,
162h3,
163h4,
164h5,
165h6 {
166 font-family: inherit;
167 font-weight: normal;
168 line-height: 1.1;
169 color: #111;
170 margin-top: 20px;
171 margin-bottom: 10px;
172 padding: 0;
173 page-break-after: avoid;
174}
175
176h1, h2 {
177 border-bottom: 1px solid #EEE;
178 padding-bottom: 7px;
179 margin-top: 10px;
180 margin-bottom: 12px;
181}
182
183h1 {
184 font-size: 22px;
185}
186
187h2 {
188 font-size: 17px;
189}
190
191h3 {
192 font-size: 15px;
193}
194
195h4 {
196 font-size: 11px;
197}
198
199h5, h6 {
200 font-size: 10px;
201 font-weight: bold;
202 color: #666;
203}
204
205p {
206 margin: 0 0 10px;
207}
208
209input[type="checkbox"] {
210 margin-right: 6px;
211 position: relative;
212 bottom: 1px;
213}
214
215ul,
216ol {
217 margin-top: 0;
218 margin-bottom: 10px;
219 padding-left: 20px;
220}
221
222ul li,
223ol li {
224 margin-bottom: 2px;
225}
226
227dl {
228 margin-top: 0;
229 margin-bottom: 20px;
230}
231
232dt {
233 font-weight: bold;
234}
235
236dd {
237 margin-left: 0;
238}
239
240a,
241a:visited {
242 text-decoration: none;
243 color: #4078C0;
244}
245
246.new-page,
247.page-break,
248.next-page,
249.page-end {
250 page-break-before: always;
251}
252
253#pageHeader,
254#pageHeader a,
255#pageHeader a:visited {
256 color: #777;
257}
258
259#pageHeader span {
260 vertical-align: middle;
261}
262
263#pageFooter {
264 border-top: 1px solid #EEE;
265 padding-top: 5px;
266 color: #777;
267 font-size: 80%;
268}
269#pageFooter a,
270#pageFooter a:visited {
271 color: #777;
272}
273
274/**
275 * Your markdown-themeable-pdf custom styles
276 *
277 * The default file can be found in the folder ~/.atom/packages/markdown-themeable-pdf/templates
278 * The base css file can be found in the folder ~/.atom/packages/markdown-themeable-pdf/css
279 * The current highlight.js css file can be found in the folder ~/.atom/packages/markdown-themeable-pdf/node_modules/highlight.js/styles
280 */
281
282/*
283html, body {
284 color: red;
285}
286*/
287
288/**
289 * GitHub Gist Theme
290 * Author : Louis Barranqueiro - https://github.com/LouisBarranqueiro
291 */
292
293.hljs {
294 display: block;
295 background: white;
296 padding: 0.5em;
297 color: #333333;
298 overflow-x: auto;
299}
300
301.hljs-comment,
302.hljs-meta {
303 color: #969896;
304}
305
306.hljs-string,
307.hljs-variable,
308.hljs-template-variable,
309.hljs-strong,
310.hljs-emphasis,
311.hljs-quote {
312 color: #df5000;
313}
314
315.hljs-keyword,
316.hljs-selector-tag,
317.hljs-type {
318 color: #a71d5d;
319}
320
321.hljs-literal,
322.hljs-symbol,
323.hljs-bullet,
324.hljs-attribute {
325 color: #0086b3;
326}
327
328.hljs-section,
329.hljs-name {
330 color: #63a35c;
331}
332
333.hljs-tag {
334 color: #333333;
335}
336
337.hljs-title,
338.hljs-attr,
339.hljs-selector-id,
340.hljs-selector-class,
341.hljs-selector-attr,
342.hljs-selector-pseudo {
343 color: #795da3;
344}
345
346.hljs-addition {
347 color: #55a532;
348 background-color: #eaffea;
349}
350
351.hljs-deletion {
352 color: #bd2c00;
353 background-color: #ffecec;
354}
355
356.hljs-link {
357 text-decoration: underline;
358}
359
360pre { white-space: pre-wrap !important; word-break: break-word !important; overflow: hidden !important;}
361</style>
362</head>
363<body>
364<div id="pageContent">
365<h1 id="enea-linux-networking-profile">ENEA LINUX NETWORKING PROFILE</h1>
366<p>Technology trends show that Linux-based operating systems have increased their
367presence in the area of high-performance networking applications. While there
368are no standardized ways of programmatically accessing hardware offload
369capabilities, several paradigms coexist in the Linux ecosystem to address this
370specific need (e.g. USDPAA, DPDK, ODP). The Networking Profile in Enea Linux is
371a framework for anyone attempting to implement high-performance networking
372applications on various hardware platforms. It aims to put in place all the
373necessary building blocks that facilitate efficient development of Linux-based
374solutions on top of network-accelerated hardware platforms. As different
375hardware platforms have distinct data-path acceleration solutions, the Networking
376Profile implementation depends heavily on the underlying hardware capabilities.</p>
377<p>This document describes the implementation details, changes, additions,
378kernel configurations and tunings Enea made in order to achieve highly
379optimized Linux distributions for networking applications.</p>
380<p>The following sections focus on the Enea Linux Networking Profile on DPAA-based
381QorIQ platforms, illustrating the implementation and changes on the NXP P2041RDB
382target.</p>
383<pre><code>Table of Contents
384-------------------------------------------
3851. Supported Targets
3862. Improving Real-Time Performance
387------- 2.1 Kernel Modifications
388------- 2.2 CPU-Isolation with partrt
389------- 2.3 Latency Benchmarks
3903. USDPAA Usage
391------- 3.1 Packages
392------- 3.2 Prepare Target
393------- 3.3 Device Trees
394------- 3.4 Boot Parameters
395------- 3.5 Boot Instructions P2041RDB
396------- 3.6 Run Reflector
397------- 3.7 Run SRA
398------- 3.8 Throughput using USDPAA
399</code></pre>
400<h2 id="1-supported-targets">1. Supported Targets</h2>
401<p>Enea Linux Networking Profile has initially been tested on p2041rdb.</p>
402<!--
403 Table 1.1 Functionally verified targets
404| Target | Reflector App | SRIO | SRIO RCW available?
405| --- | --- | --- | --- |
406| p2041rdb | OK | Yes | Yes*
407
408\* RCW that supports SRIO has to be created in code warrior or copied from the
409 SRA User Guide [NXP-SDK], see section 3.1.
410-->
411<h2 id="2-improving-real-time-performance">2. Improving Real-Time Performance</h2>
412<!--
413FIXME/WIP! need to add nohz and test.
414-->
415<h3 id="21-kernel-modifications">2.1 Kernel Modifications</h3>
416<p>When modifying a kernel for high-performance and low-latency applications there
417are several aspects to take into consideration. The <a href="http://linuxrealtime.org/index.php/Main_Page">Enea Linux Real-Time
418Guide</a> gives a thorough investigation
419and explanation of how to optimize Linux for low latency. Below is a
420short description of the kernel features added specifically to the Enea Linux
421Networking Profile in order to enhance real-time performance.</p>
422<pre><code> Table 2.1 Added kernel features and their properties.
423</code></pre>
424<table>
425<thead>
426<tr>
427<th>Change</th>
428<th>Reason</th>
429</tr>
430</thead>
431<tbody>
432<tr>
433<td>RCU priority boosting -&gt; cfg/rcu_boost.cfg</td>
434<td>Give low-priority readers a higher priority to keep them from blocking tasks of higher priority. [RCU]</td>
435</tr>
436<tr>
437<td>Offload RCU callback Processing -&gt; cfg/rcu_nocb</td>
438<td>To reduce OS jitter, enable offloading of RCU callback processing to kernel threads. The rcu_nocbs boot parameter is used to define the set of CPUs to be offloaded.</td>
439</tr>
440<tr>
441<td>Hotplug CPU -&gt; cfg/hotplug_cpu.cfg</td>
442<td>Allows CPUs to be added to/removed from a live kernel. [HOTPLUG]</td>
443</tr>
444</tbody>
445</table>
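<p>As a rough illustration of how the features in Table 2.1 could be verified against a kernel configuration, the sketch below scans the text of a kernel .config. The symbol names CONFIG_RCU_BOOST and CONFIG_HOTPLUG_CPU are assumptions about what the .cfg fragments enable; only CONFIG_RCU_NOCB_CPU is named in this document. The helper <code>missing_options</code> is hypothetical and not part of Enea Linux.</p>

```python
import re

# Kconfig symbols assumed to back the three features in Table 2.1.
# CONFIG_RCU_NOCB_CPU is named in this document; the other two are assumptions.
REQUIRED_OPTIONS = ("CONFIG_RCU_BOOST", "CONFIG_RCU_NOCB_CPU", "CONFIG_HOTPLUG_CPU")


def missing_options(config_text, required=REQUIRED_OPTIONS):
    """Return the required options that are not set to 'y' in a .config text."""
    enabled = set(re.findall(r"^(CONFIG_\w+)=y$", config_text, flags=re.MULTILINE))
    return [opt for opt in required if opt not in enabled]


# Hypothetical configuration excerpt used only to exercise the helper.
sample = """\
CONFIG_RCU_BOOST=y
CONFIG_RCU_NOCB_CPU=y
# CONFIG_HOTPLUG_CPU is not set
"""

print(missing_options(sample))
```

<p>On a running target the configuration text could be read from /proc/config.gz, provided the kernel was built with CONFIG_IKCONFIG_PROC; whether the networking profile enables that option is not stated here.</p>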
446<!-- Adaptive-ticks CPU - cfg/nohz.cfg Avoid scheduling clock interruptd for CPUs running a single task. [NOHZ] -->
447<h4 id="211-boot-parameters">2.1.1 Boot Parameters</h4>
448<!--
449**TBD/FIXME:** Boot parameters for nohz_full and isolcpus.
450-->
451<p><strong>From [KERN-PARA]:</strong></p>
452<!--
453nohz= [KNL] Boottime enable/disable dynamic ticks
454 Valid arguments: on, off
455 Default: on
456-->
457<pre><code>
458rcu_nocbs= [KNL]
459 In kernels built with CONFIG_RCU_NOCB_CPU=y, set
460 the specified list of CPUs to be no-callback CPUs.
461 Invocation of these CPUs' RCU callbacks will
462 be offloaded to &quot;rcuox/N&quot; kthreads created for
463 that purpose, where &quot;x&quot; is &quot;b&quot; for RCU-bh, &quot;p&quot;
464 for RCU-preempt, and &quot;s&quot; for RCU-sched, and &quot;N&quot;
465 is the CPU number. This reduces OS jitter on the
466 offloaded CPUs, which can be useful for HPC and
467 real-time workloads. It can also improve energy
468 efficiency for asymmetric multiprocessors.
469
470isolcpus= [KNL,SMP] Isolate CPUs from the general scheduler.
471 Format:
472 &lt;cpu number&gt;,...,&lt;cpu number&gt;
473 or
474 &lt;cpu number&gt;-&lt;cpu number&gt;
475 (must be a positive range in ascending order)
476 or a mixture
477 &lt;cpu number&gt;,...,&lt;cpu number&gt;-&lt;cpu number&gt;
478
479 This option can be used to specify one or more CPUs
480 to isolate from the general SMP balancing and scheduling
481 algorithms. You can move a process onto or off an
482 &quot;isolated&quot; CPU via the CPU affinity syscalls or cpuset.
483 &lt;cpu number&gt; begins at 0 and the maximum value is
484 &quot;number of CPUs in system - 1&quot;.
485
486 This option is the preferred way to isolate CPUs. The
487 alternative -- manually setting the CPU mask of all
488 tasks in the system -- can cause problems and
489 suboptimal load balancer performance.
490</code></pre>
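<p>The CPU list format described above (single numbers, ascending ranges, or a mixture) can be sketched as a small parser. This is an illustration of the format only, not the kernel's implementation; the function names and the example value <code>1-3</code> are hypothetical. The four-CPU limit matches the CPUs 0 to 3 that appear in the latency tables of this document.</p>

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "0,2-3" into a sorted list of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(n) for n in part.split("-", 1))
            if first > last:
                # The documentation requires ranges in ascending order.
                raise ValueError(f"range must be ascending: {part!r}")
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def check_cpu_list(spec, cpu_count):
    """Parse spec and reject CPU numbers above 'number of CPUs in system - 1'."""
    cpus = parse_cpu_list(spec)
    if cpus and cpus[-1] > cpu_count - 1:
        raise ValueError(f"CPU {cpus[-1]} does not exist on a {cpu_count}-CPU system")
    return cpus


# Hypothetical example: isolate CPUs 1-3 on a four-CPU system, leaving CPU 0
# for general-purpose work (e.g. isolcpus=1-3 rcu_nocbs=1-3).
print(check_cpu_list("1-3", cpu_count=4))
```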
491<h3 id="22-cpu-isolation-with-partrt">2.2 CPU-Isolation with partrt</h3>
492<p>A tool called partrt is included in the networking profile to divide an SMP
493Linux system into partitions. A description of the tool can be found in the
494<a href="http://linuxrealtime.org/index.php/Improving_the_Real-Time_Properties#The_CPU_Partitioning_Tool_-_partrt">Linux Real-Time
495Guide</a>.</p>
496<h3 id="23-latency-benchmarks">2.3 Latency Benchmarks</h3>
497<p>Cyclictest [CYCLIC] is a tool for measuring system latency that is used in many
498projects. As a comparison, the measurement was applied to Enea Linux 6.0
499Standard and to the Enea Linux Networking Profile to investigate the impact of the
500changes to the system.</p>
501<p>Below, <em>cyclictest</em> is run on the two systems. Each table presents the average and
502maximum latency, first for the standard profile (columns suffixed <em>_std</em>) and then for
503the networking profile (columns suffixed <em>_net</em>). P, I and C are the priority, interval (in us) and cycle count reported by cyclictest. The test
504is also combined with <em>stress</em> [STRESS] to show system performance under
505different types of load.</p>
506<!--
507FIXME!
508 **Enea Linux 6.0 Standard System Info**
509 - Kernel size:
510 - Root-fs size:
511
512**Enea Linux 6.0 Networking Profile System Info**
513 - Kernel size:
514 - Root-fs size:
515-->
516<pre><code> Command: Cyclictest with no stress
517</code></pre>
518<table>
519<thead>
520<tr>
521<th>CPU #</th>
522<th>P</th>
523<th>I</th>
524<th>C_std</th>
525<th>Avg_std (us)</th>
526<th>Max_std (us)</th>
527<th>C_net</th>
528<th>Avg_net (us)</th>
529<th>Max_net (us)</th>
530</tr>
531</thead>
532<tbody>
533<tr>
534<td>0</td>
535<td>99</td>
536<td>1000</td>
537<td>100000</td>
538<td>9</td>
539<td>24</td>
540<td>100000</td>
541<td>6</td>
542<td>11</td>
543</tr>
544<tr>
545<td>1</td>
546<td>99</td>
547<td>1500</td>
548<td>66817</td>
549<td>9</td>
550<td>21</td>
551<td>66812</td>
552<td>6</td>
553<td>12</td>
554</tr>
555<tr>
556<td>2</td>
557<td>99</td>
558<td>2000</td>
559<td>50208</td>
560<td>9</td>
561<td>23</td>
562<td>50106</td>
563<td>6</td>
564<td>18</td>
565</tr>
566<tr>
567<td>3</td>
568<td>99</td>
569<td>2500</td>
570<td>40083</td>
571<td>9</td>
572<td>14</td>
573<td>40082</td>
574<td>6</td>
575<td>10</td>
576</tr>
577</tbody>
578</table>
579<pre><code> Command: cyclictest with hdd stress:
580 # stress -d 4 --hdd-bytes 1M &amp;
581</code></pre>
582<table>
583<thead>
584<tr>
585<th>CPU #</th>
586<th>P</th>
587<th>I</th>
588<th>C_std</th>
589<th>Avg_std (us)</th>
590<th>Max_std (us)</th>
591<th>C_net</th>
592<th>Avg_net (us)</th>
593<th>Max_net (us)</th>
594</tr>
595</thead>
596<tbody>
597<tr>
598<td>0</td>
599<td>99</td>
600<td>1000</td>
601<td>100000</td>
602<td>14</td>
603<td>223</td>
604<td>100000</td>
605<td>11</td>
606<td>77</td>
607</tr>
608<tr>
609<td>1</td>
610<td>99</td>
611<td>1500</td>
612<td>66820</td>
613<td>14</td>
614<td>231</td>
615<td>66745</td>
616<td>11</td>
617<td>100</td>
618</tr>
619<tr>
620<td>2</td>
621<td>99</td>
622<td>2000</td>
623<td>50109</td>
624<td>14</td>
625<td>186</td>
626<td>50055</td>
627<td>11</td>
628<td>76</td>
629</tr>
630<tr>
631<td>3</td>
632<td>99</td>
633<td>2500</td>
634<td>40083</td>
635<td>14</td>
636<td>176</td>
637<td>40041</td>
638<td>11</td>
639<td>81</td>
640</tr>
641</tbody>
642</table>
643<pre><code> Command: cyclictest with vm stress:
644 # stress -m 4 --vm-bytes 4096 &amp;
645</code></pre>
646<table>
647<thead>
648<tr>
649<th>CPU #</th>
650<th>P</th>
651<th>I</th>
652<th>C_std</th>
653<th>Avg_std (us)</th>
654<th>Max_std (us)</th>
655<th>C_net</th>
656<th>Avg_net (us)</th>
657<th>Max_net (us)</th>
658</tr>
659</thead>
660<tbody>
661<tr>
662<td>0</td>
663<td>99</td>
664<td>1000</td>
665<td>100000</td>
666<td>5</td>
667<td>20</td>
668<td>100000</td>
669<td>3</td>
670<td>15</td>
671</tr>
672<tr>
673<td>1</td>
674<td>99</td>
675<td>1500</td>
676<td>66818</td>
677<td>6</td>
678<td>14</td>
679<td>66739</td>
680<td>3</td>
681<td>7</td>
682</tr>
683<tr>
684<td>2</td>
685<td>99</td>
686<td>2000</td>
687<td>50109</td>
688<td>6</td>
689<td>14</td>
690<td>50103</td>
691<td>3</td>
692<td>9</td>
693</tr>
694<tr>
695<td>3</td>
696<td>99</td>
697<td>2500</td>
698<td>40079</td>
699<td>6</td>
700<td>14</td>
701<td>40081</td>
702<td>3</td>
703<td>6</td>
704</tr>
705</tbody>
706</table>
707<pre><code> Command: cyclictest with full stress trial 1:
708 # stress -c 4 -i 4 -m 4 --vm-bytes 4096 -d 4 --hdd-bytes 4096 &amp;
709</code></pre>
710<table>
711<thead>
712<tr>
713<th>CPU #</th>
714<th>P</th>
715<th>I</th>
716<th>C_std</th>
717<th>Avg_std (us)</th>
718<th>Max_std (us)</th>
719<th>C_net</th>
720<th>Avg_net (us)</th>
721<th>Max_net (us)</th>
722</tr>
723</thead>
724<tbody>
725<tr>
726<td>0</td>
727<td>99</td>
728<td>1000</td>
729<td>99808</td>
730<td>7</td>
731<td>93</td>
732<td>99815</td>
733<td>6</td>
734<td>58</td>
735</tr>
736<tr>
737<td>1</td>
738<td>99</td>
739<td>1500</td>
740<td>66733</td>
741<td>9</td>
742<td>105</td>
743<td>66739</td>
744<td>6</td>
745<td>54</td>
746</tr>
747<tr>
748<td>2</td>
749<td>99</td>
750<td>2000</td>
751<td>50039</td>
752<td>9</td>
753<td>79</td>
754<td>50039</td>
755<td>7</td>
756<td>61</td>
757</tr>
758<tr>
759<td>3</td>
760<td>99</td>
761<td>2500</td>
762<td>40016</td>
763<td>10</td>
764<td>83</td>
765<td>40032</td>
766<td>6</td>
767<td>57</td>
768</tr>
769</tbody>
770</table>
771<pre><code> Command: cyclictest with full stress trial 2:
772 # stress -c 4 -i 4 -m 4 --vm-bytes 4096 -d 4 --hdd-bytes 1M &amp;
773</code></pre>
774<table>
775<thead>
776<tr>
777<th>CPU #</th>
778<th>P</th>
779<th>I</th>
780<th>C_std</th>
781<th>Avg_std (us)</th>
782<th>Max_std (us)</th>
783<th>C_net</th>
784<th>Avg_net (us)</th>
785<th>Max_net (us)</th>
786</tr>
787</thead>
788<tbody>
789<tr>
790<td>0</td>
791<td>99</td>
792<td>1000</td>
793<td>100000</td>
794<td>13</td>
795<td>201</td>
796<td>100000</td>
797<td>10</td>
798<td>87</td>
799</tr>
800<tr>
801<td>1</td>
802<td>99</td>
803<td>1500</td>
804<td>66646</td>
805<td>11</td>
806<td>186</td>
807<td>66685</td>
808<td>9</td>
809<td>85</td>
810</tr>
811<tr>
812<td>2</td>
813<td>99</td>
814<td>2000</td>
815<td>49969</td>
816<td>10</td>
817<td>195</td>
818<td>49998</td>
819<td>10</td>
820<td>70</td>
821</tr>
822<tr>
823<td>3</td>
824<td>99</td>
825<td>2500</td>
826<td>39960</td>
827<td>11</td>
828<td>112</td>
829<td>39992</td>
830<td>10</td>
831<td>90</td>
832</tr>
833</tbody>
834</table>
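<p>To make the five tables easier to compare, the sketch below takes the Max_std and Max_net columns exactly as listed above and reports the worst case across all four CPUs for each load scenario. The scenario labels and the helper <code>worst_case</code> are names chosen for this illustration.</p>

```python
# Max_std and Max_net columns (microseconds) copied from the tables above,
# one value per CPU 0-3.
MAX_LATENCY_US = {
    "no stress":     {"std": [24, 21, 23, 14],     "net": [11, 12, 18, 10]},
    "hdd stress":    {"std": [223, 231, 186, 176], "net": [77, 100, 76, 81]},
    "vm stress":     {"std": [20, 14, 14, 14],     "net": [15, 7, 9, 6]},
    "full stress 1": {"std": [93, 105, 79, 83],    "net": [58, 54, 61, 57]},
    "full stress 2": {"std": [201, 186, 195, 112], "net": [87, 85, 70, 90]},
}


def worst_case(scenario):
    """Return (standard, networking) worst-case latency over all CPUs."""
    row = MAX_LATENCY_US[scenario]
    return max(row["std"]), max(row["net"])


for name in MAX_LATENCY_US:
    std, net = worst_case(name)
    reduction = 100 * (std - net) / std
    print(f"{name:14s} std={std:4d} us  net={net:4d} us  reduction={reduction:.0f}%")
```

<p>In every scenario the networking profile shows a lower worst-case latency than the standard profile; under HDD stress, for example, it drops from 231 us to 100 us.</p>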
835<h2 id="3-usdpaa-usage">3. USDPAA Usage</h2>
836<p>Predictable, high performance is critical for networking systems.
837One way of achieving greater performance is for user space to avoid
838interactions with the kernel. The kernel is responsible for allocating and
839scheduling hardware acceleration resources. By using frameworks such as
840USDPAA [NXP-SDK] and DPDK [DPDK], control over certain hardware can be given
841to user space. USDPAA is specific to NXP/Freescale's QorIQ platforms; for more
842information please see their guide to USDPAA [NXP-SDK].</p>
843<p>In the guides below, an example of how to prepare a p2041rdb target with SRIO
844and ethernet through USDPAA is given.</p>
845<h3 id="31-packages">3.1 Packages</h3>
846<p>The networking profile supports and includes all packages necessary for
847software support of USDPAA. If another image is created, the packages listed
848below (all available in meta-fsl-ppc and
849meta-freescale) should be included to add support for USDPAA and some example applications.</p>
850<pre><code> * usdpaa
851 * usdpaa-apps
852 * fmc
853 * fmlib
854 * flib
855 * eth-config
856</code></pre>
857<h3 id="32-prepare-target">3.2 Prepare Target</h3>
858<p>The SRIO application requires booting with an RCW and a board configuration that
859allow usage of the PCI extender port. The examples below are specific to
860p2041rdb, but similar steps can be taken for other targets where SRIO is not
861enabled by default.</p>
862<h4 id="321-reset-control-word-rcw">3.2.1 Reset Control Word (RCW)</h4>
863<p>The reset control word must configure the serializer/deserializer (SerDes) bus for
864SRIO. This can be done either with a predefined binary/setting or with an RCW created
865in CodeWarrior [CW].</p>
866<p>The RCW used in this example is taken from the NXP/Freescale SRA User Guide in
867[NXP-SDK].</p>
868<p>To program the RCW to the target from u-boot, follow the steps below:</p>
869<pre><code>=&gt; tftp 1000000 &lt;path-2-file&gt;/RR_RS_0x02.bin
870
871=&gt; md 0xec000000
872ec000000: aa55aa55 010e0100 12600000 00000000 .U.U.....`......
873ec000010: 241c0000 00000000 248e40c0 c3c02000 $.......$.@... .
874ec000020: de800000 40000000 00000000 00000000 ....@...........
875ec000030: 00000000 d0030f07 00000000 00000000 ................
876ec000040: 00000000 00000000 091380c0 000009c4 ................
877ec000050: 09000010 00000000 091380c0 000009c4 ................
878ec000060: 09000014 00000000 091380c0 000009c4 ................
879ec000070: 09000018 81d00000 091380c0 000009c4 ................
880ec000080: 890b0050 00000002 091380c0 000009c4 ...P............
881ec000090: 890b0054 00000002 091380c0 000009c4 ...T............
882ec0000a0: 890b0058 00000002 091380c0 000009c4 ...X............
883ec0000b0: 890b005c 00000002 091380c0 000009c4 ...\............
884ec0000c0: 890b0090 00000002 091380c0 000009c4 ................
885ec0000d0: 890b0094 00000002 091380c0 000009c4 ................
886ec0000e0: 890b0098 00000002 091380c0 000009c4 ................
887ec0000f0: 890b009c 00000002 091380c0 000009c4 ................
888
889=&gt; protect off 0xec000000 +$filesize
890Un-Protected 1 sectors
891=&gt; erase 0xec000000 +$filesize
892
893. done
894Erased 1 sectors
895
896=&gt; cp.b 1000000 0xec000000 $filesize
897Copy to Flash... 9done
898
899=&gt; protect on 0xec000000 +$filesize
900Protected 1 sectors
901
902=&gt; md 0xec000000
903ec000000: aa55aa55 010e0100 12600000 00000000 .U.U.....`......
904ec000010: 241c0000 00000000 088040c0 c3c02000 $.........@... .
905ec000020: de800000 40000000 00000000 00000000 ....@...........
906ec000030: 00000000 d0030f07 00000000 00000000 ................
907ec000040: 00000000 00000000 091380c0 000009c4 ................
908ec000050: 09000010 00000000 091380c0 000009c4 ................
909ec000060: 09000014 00000000 091380c0 000009c4 ................
910ec000070: 09000018 81d00000 091380c0 000009c4 ................
911ec000080: 890b0050 00000002 091380c0 000009c4 ...P............
912ec000090: 890b0054 00000002 091380c0 000009c4 ...T............
913ec0000a0: 890b0058 00000002 091380c0 000009c4 ...X............
914ec0000b0: 890b005c 00000002 091380c0 000009c4 ...\............
915ec0000c0: 890b0090 00000002 091380c0 000009c4 ................
916ec0000d0: 890b0094 00000002 091380c0 000009c4 ................
917ec0000e0: 890b0098 00000002 091380c0 000009c4 ................
918ec0000f0: 890b009c 00000002 091380c0 000009c4 ................
919=&gt;
920</code></pre>
921<p>In order to obtain specific hardware settings, some multiplexers need to be
922set. Descriptions of these can be obtained by typing <strong>cpld -h</strong> in u-boot;
923information about the peripherals is also available in the board's
924user guide. For SRIO on the p2041rdb target the following settings are
925necessary.</p>
926<pre><code>cpld_cmd lane_mux 6 0
927cpld_cmd lane_mux a 0
928cpld_cmd lane_mux c 0
929cpld_cmd lane_mux d 0
930</code></pre>
931<h3 id="33-device-trees">3.3 Device Trees</h3>
932<p>USDPAA-enabled targets have very specific device trees because, instead
933of being handed over to the Linux kernel, the hardware is managed by the DPAA
934framework.</p>
935<p>The networking profile includes a custom device tree
936(&lt;machine&gt;-usdpaa-enea.dtb), currently defined for p2041rdb. It gives one
937Ethernet interface to the kernel while the rest belong to DPAA.</p>
938<p>Available and tested device trees for p2041rdb:</p>
939<ul>
940<li>uImage-p2041rdb-usdpaa-enea.dtb: EL custom device tree that gives one interface to the Linux kernel and the remaining ones to DPAA.</li>
941</ul>
942<h3 id="34-boot-parameters">3.4 Boot Parameters</h3>
943<p>USDPAA requires some custom boot arguments. If they are missing or given improperly,
944the USDPAA applications will not be usable. The NXP/Freescale manual covers these arguments, but it can be misleading: in several places the documentation gives instructions that apply only to certain targets rather than being target-agnostic. If unsure, consult the benchmarking
945chapter of the NXP/Freescale SDK documentation, which includes more exact steps for each
946tested target.</p>
947<p>For our purposes of testing SRIO and the reflector application we only need the
948'usdpaa_mem' boot argument. If the reserved memory is too large, it will cause
949segmentation faults.</p>
950<pre><code> Table 3.1 'usdpaa_mem=?'
951</code></pre>
952<table>
953<thead>
954<tr>
955<th>TARGET</th>
956<th>usdpaa_mem</th>
957</tr>
958</thead>
959<tbody>
960<tr>
961<td>p2041rdb</td>
962<td>&lt;= 64M</td>
963</tr>
964</tbody>
965</table>
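<p>The limit in the table above can be checked before a boot command line is committed to the u-boot environment. The following is a minimal sketch under stated assumptions: it accepts only the K, M and G size suffixes, treats them as powers of 1024, and uses the 64M limit for p2041rdb from the table. The helpers <code>parse_size</code> and <code>usdpaa_mem_ok</code> are hypothetical and not part of any NXP or Enea tool.</p>

```python
import re

# Size suffixes handled by this sketch (assumed to be powers of 1024).
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}

# Upper limit for usdpaa_mem per target, taken from the table above.
USDPAA_MEM_LIMIT = {"p2041rdb": 64 * 1024**2}


def parse_size(text):
    """Convert a size such as '64M' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def usdpaa_mem_ok(bootargs, target="p2041rdb"):
    """Return True if bootargs sets usdpaa_mem within the limit for target."""
    for arg in bootargs.split():
        key, _, value = arg.partition("=")
        if key == "usdpaa_mem":
            return parse_size(value) <= USDPAA_MEM_LIMIT[target]
    return False  # the argument is required, so a missing one is an error


print(usdpaa_mem_ok("root=/dev/ram rw console=ttyS0,115200 usdpaa_mem=64M"))
```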
966<h3 id="35-boot-instructions-p2041rdb">3.5 Boot Instructions P2041RDB</h3>
967<p>Below are instructions on how to boot a p2041rdb board with USDPAA enabled.</p>
968<h4 id="351-boot-over-nfs-server">3.5.1 Boot over NFS server</h4>
969<pre><code>tftp 1000000 uImage-p2041rdb.bin
970tftp c00000 uImage-p2041rdb-usdpaa-enea.dtb
971
972setenv bootargs root=/dev/nfs
973nfsroot=172.21.3.8:/unix/enea_linux_rootfs/&lt;folder-path&gt; rw ip=dhcp
974console=ttyS0,115200 memmap=16M$0xf7000000 mem=4080M max_addr=f6ffffff
975usdpaa_mem=64M
976
977bootm 1000000 - c00000
978</code></pre>
979<h4 id="352-ram-boot">3.5.2 RAM Boot</h4>
980<pre><code>tftp 1000000 uImage-p2041rdb.bin
981tftp 2000000 uImage-p2041rdb-usdpaa.dtb
982tftp 5000000 enea-image-networking-p2041rdb.ext2.gz.u-boot
983
984setenv bootargs root=/dev/ram rw console=ttyS0,115200 ramdisk_size=10000000
985log_buf_len=128K usdpaa_mem=64M
986
987bootm 0x1000000 0x5000000 0x2000000
988</code></pre>
989<h3 id="36-run-reflector">3.6 Run Reflector</h3>
990<p>Reflector is a demo application from NXP/Freescale that receives a packet
991over Ethernet and sends it back with the source and destination IP
992addresses swapped.</p>
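<p>The address swap that a reflector performs can be illustrated on a raw IPv4 header. This is a sketch of the idea only, not the code of the NXP/Freescale reflector; the function name and the sample header values are made up for the example. In an IPv4 header the source address occupies bytes 12 to 15 and the destination address bytes 16 to 19.</p>

```python
import socket
import struct


def swap_ipv4_addresses(header):
    """Return a copy of an IPv4 header with source and destination swapped."""
    if len(header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    src, dst = header[12:16], header[16:20]
    return header[:12] + dst + src + header[20:]


# Hypothetical 20-byte header: version/IHL, TOS, total length, id, flags/offset,
# TTL, protocol (17 = UDP), checksum (left at 0 here), source, destination.
header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 20, 0, 0, 64, 17, 0,
    socket.inet_aton("192.168.0.10"),
    socket.inet_aton("192.168.0.11"),
)

reflected = swap_ipv4_addresses(header)
print(socket.inet_ntoa(reflected[12:16]), "->", socket.inet_ntoa(reflected[16:20]))
```

<p>Because the IPv4 header checksum is a one's-complement sum over 16-bit words, swapping the two addresses leaves a valid checksum unchanged, so no recomputation is needed for this step.</p>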
993<p>In order to test reflector an ethernet cable must be connected between either
994two p2041rdb targets or between a work-PC and the p2041rdb target.</p>
995<p>Connect two targets by ethernet and do the following steps to test connection
996with usdpaa:</p>
997<ol>
998<li>Boot Board A with uImage-p2041rdb-usdpaa-enea.dtb</li>
999<li>Boot Board B with uImage-p2041rdb.dtb</li>
1000<li>On board A do the following:</li>
1001</ol>
1002<pre><code> # Configure what ethernet ports that should be used by dpaa
1003 $ vi config-p2041rdb.xml
1004 &lt;cfgdata&gt;
1005 &lt;config&gt;
1006 &lt;engine name=&quot;fm0&quot;&gt;
1007 &lt;port type=&quot;MAC&quot; number=&quot;2&quot;
1008policy=&quot;hash_ipsec_src_dst_spi_policy1&quot;/&gt;
1009 &lt;port type=&quot;MAC&quot; number=&quot;3&quot;
1010policy=&quot;hash_ipsec_src_dst_spi_policy2&quot;/&gt;
1011 &lt;port type=&quot;MAC&quot; number=&quot;4&quot;
1012policy=&quot;hash_ipsec_src_dst_spi_policy3&quot;/&gt;
1013 &lt;port type=&quot;MAC&quot; number=&quot;5&quot;
1014policy=&quot;hash_ipsec_src_dst_spi_policy4&quot;/&gt;
1015 &lt;/engine&gt;
1016 &lt;/config&gt;
1017 &lt;/cfgdata&gt;
1018
1019 $ fmc -c config-p2041rdb.xml -p /usr/etc/usdpaa_policy_hash_ipv4.xml -a
1020
1021 # start reflector and check the ethernet ports:
1022 $ reflector
1023 reflector&gt; ifconfig
1024</code></pre>
1025<ol start="4">
1026<li>On board B do the following:</li>
1027</ol>
1028<pre><code> # Configure the network interface to which you connected the ethernet
1029 cable, by choosing an ip- and MAC address.
1030 $ ifconfig -a
1031 $ ifconfig &lt;eth-x&gt; 192.168.0.10 netmask 255.255.255.0 up
1032
1033 # Configure to connect to the hw address
1034 $ arp -s 192.168.0.11 &lt;hw-address&gt; -i &lt;eth-x&gt;
1035
1036 # Ping Board A
1037 $ ping 192.168.0.11
1038</code></pre>
1039<h3 id="37-run-sra">3.7 Run SRA</h3>
1040<p>The user-space drivers from NXP/Freescale support usage of SRIO from Linux user
1041space. The SRA application is a demo application from NXP/Freescale that
1042writes from one SRIO interface to another, avoiding kernel
1043interaction by using DMA (direct memory access) memory management. More
1044information on the drivers can be found in the NXP/Freescale SDK [NXP-SDK].</p>
1045<p>In the test run below two boards are prepared with SRIO interfaces. Both boards
1046are initialized with a memory setting that sets the different SRIO memory
1047spaces to different values. Then data from Board A's <em>Write-preparation space</em>
1048is written to Board B's <em>Map space</em>.</p>
1049<pre><code> # start the srio application
1050 $ sra
1051
1052 # setup board B (receiver)
1053 sra&gt; sra -attr port1 win_attr 1 nwrite nread
1054
1055 # set local memory to data predefined in 0x100000
1056 sra&gt; sra -op port1 1 0 0 s 0x100000
1057
1058 # read to view what is written to port 1
1059 sra&gt; sra -op port1 1 0 0 p 0x100000
1060
1061
1062 # setup board A (transmitter)
1063 sra&gt; sra -attr port1 win_attr 1 nwrite nread
1064
1065 # set local memory and then read to confirm
1066 sra&gt; sra -op port1 1 0 0 s 0x100000
1067 sra&gt; sra -op port1 1 0 0 p 0x100000
1068
1069 # write what is in 'write preparation space' to outbound
1070 sra&gt; sra -op port1 1 0 0 w 0x100000
1071
1072 # confirm on board B that 'write preparation space' from board A is written in
1073 # 'map space'
1074 sra&gt; sra -op port1 1 0 0 p 0x100000
1075</code></pre>
1076<pre><code> Board A Board B
1077 +-------------------+ +-------------------+
1078 | Map space | +------&gt;| Map space |
1079 +-------------------+ | +-------------------+
1080 | Read data space | | | Read data space |
1081 +-------------------+ outbound | +-------------------+
1082 | write preparation |-----------+ | write preparation |
1083 | space | | space |
1084 +-------------------+ +-------------------+
1085 | reserved space | | reserved space |
1086 +-------------------+ +-------------------+
1087
1088Image 3.1 SRIO memory space
1089</code></pre>
1090<h3 id="38-throughput-using-usdpaa">3.8 Throughput using USDPAA</h3>
1091<p>To illustrate the performance achievable with the USDPAA framework, the
1092throughput over a 10G Ethernet link was measured; the results are presented
1093below.</p>
1094<p>Performance was measured with a Spirent Test Center, used as
1095a packet generator, together with the “Spirent Test Center” application, version
10964.33. The test targets are connected to the Spirent Test Center through 10G
1097Ethernet ports (XAUI-RISER card). On Enea Linux 6.0, the USDPAA reflector was
1098used as the packet forwarding application. The resulting throughput measured with Spirent is shown in Image 3.2 below.</p>
1099<!--![alt text](./throughput_p2041rdb_networking.png =250x)-->
1100<img src="img/throughput_p2041rdb_networking.png" alt="Drawing" style="width: 500px;"/>
1101<pre><code>Image 3.2 Throughput on p2041rdb over a 10G ethernet port using USDPAA.
1102x-axis shows the frame size in bytes, and y-axis the aggregated throughput in
1103megabits per second.
1104</code></pre>
1105<hr>
1106<hr>
1107<p>[NXP-SDK] QorIQ SDK 1.9 Documentation -
1108https://freescale.sdlproducts.com/LiveContent/web/pub.xql?c=t&amp;action=home&amp;pub=QorIQ_SDK_1.9&amp;lang=en-US#addHistory=true&amp;filename=GUID-81837065-81AD-449B-8572-E96C3EED636F.xml&amp;docid=GUID-81837065-81AD-449B-8572-E96C3EED636F&amp;inner_id=&amp;tid=&amp;query=&amp;scope=&amp;resource=&amp;toc=false&amp;eventType=lcContent.loadHome</p>
1109<p>[KERN-PARA] Kernel Parameters, https://www.kernel.org/doc/Documentation/kernel-parameters.txt</p>
1110<p>[RCU] Paul McKenney, <em>Priority-Boosting RCU Read-Side Critical Section</em>, https://lwn.net/Articles/220677/</p>
1111<!--
1112[NOHZ] https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt
1113-->
1114<p>[HOTPLUG] https://www.kernel.org/doc/Documentation/cpu-hotplug.txt</p>
1115<p>[DPDK] http://dpdk.org/</p>
1116<p>[CYCLIC] https://rt.wiki.kernel.org/index.php/Cyclictest</p>
1117<p>[STRESS] http://linux.die.net/man/1/stress</p>
1118<p>[CW] http://www.nxp.com/products/software-and-tools/software-development-tools/codewarrior-development-tools:CW_HOME</p>
1119
1120</div></body>
1121</html>
diff --git a/doc/networking-profile.md b/doc/networking-profile.md
new file mode 100644
index 0000000..00c5f40
--- /dev/null
+++ b/doc/networking-profile.md
@@ -0,0 +1,508 @@
1# ENEA LINUX NETWORKING PROFILE
2
3Technology trends show that Linux-based Operating Systems have increased their
4presence in the area of high-performance networking applications. While there
5are no standardized ways of programmatically accessing hardware offload
6capabilities, several paradigms co-exist in the Linux ecosystem to address this
7specific need (e.g. USDPAA, DPDK, ODP etc.). Networking Profile in Enea Linux is
8a framework for anyone attempting to implement high-performance networking
9applications on various hardware platforms. It aims to bring in place all
10necessary building blocks which facilitate efficient development of Linux-based
11solutions on top of network accelerated hardware platforms. As different
12hardware platforms have distinct data-path acceleration solutions, Networking
13Profile implementation is very dependent on underlying hardware capabilities.
14
15This document describes the implementation details, changes, additions,
16kernel configurations and tunings Enea made in order to achieve highly
17optimized Linux distributions for networking applications.
18
19The following paragraphs focus on Enea Linux Networking Profile on DPAA-based
20QorIQ platforms, illustrating the implementation and changes on the NXP P2041rdb
21target.
22
23 Table of Contents
24 -------------------------------------------
25 1. Supported Targets
26 2. Improving Real-Time Performance
27 ------- 2.1 Kernel Modifications
28 ------- 2.2 CPU-Isolation with partrt
29 ------- 2.3 Latency Benchmarks
30 3. USDPAA Usage
31 ------- 3.1 Packages
32 ------- 3.2 Prepare Target
33 ------- 3.3 Device Trees
34 ------- 3.4 Boot Parameters
35 ------- 3.5 Boot Instructions P2041RDB
36 ------- 3.6 Run Reflector
37 ------- 3.7 Run SRA
38 ------- 3.8 Throughput using USDPAA
39
40## 1. Supported Targets
41Enea Linux Networking Profile has initially been tested on p2041rdb.
42<!--
43 Table 1.1 Functionally verified targets
44| Target | Reflector App | SRIO | SRIO RCW available?
45| --- | --- | --- | --- |
46| p2041rdb | OK | Yes | Yes*
47
48\* RCW that supports SRIO has to be created in code warrior or copied from the
49 SRA User Guide [NXP-SDK], see section 3.1.
50-->
51## 2. Improving Real-Time Performance
52<!--
53FIXME/WIP! need to add nohz and test.
54-->
55
56### 2.1 Kernel Modifications
57When modifying a kernel for high-performance and low-latency applications there
58are several aspects to take into consideration. In the [Enea Linux Real-Time
59Guide](http://linuxrealtime.org/index.php/Main_Page) a thorough investigation
60and explanation of how to optimize Linux for low latency is given. Below is a
61short description of kernel features added specifically to Enea Linux
62Networking Profile in order to enhance real-time performance.
63
64 Table 2.1 Added kernel features and their properties.
65| Change | Reason
66| --- | --- |
67| RCU priority boosting -> cfg/rcu_boost.cfg | Boost the priority of low-priority RCU readers to keep them from blocking higher-priority tasks. [RCU] |
68| Offload RCU callback Processing -> cfg/rcu_nocb | To reduce OS jitter, enable offloading of RCU callback processing to kernel threads. The rcu_nocbs boot parameter is used to define the set of CPUs to be offloaded. |
69| Hotplug CPU -> cfg/hotplug_cpu.cfg | Allows CPUs to be added to/removed from a live kernel. [HOTPLUG] |
70<!-- Adaptive-ticks CPU - cfg/nohz.cfg Avoid scheduling clock interrupts for CPUs running a single task. [NOHZ] -->
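
As a rough way to confirm that the features in Table 2.1 ended up in a built kernel, the sketch below scans kernel configuration text for the corresponding options. The mapping from the .cfg fragments to `CONFIG_RCU_BOOST` and `CONFIG_HOTPLUG_CPU` is an assumption (only `CONFIG_RCU_NOCB_CPU` is named in this document), and reading `/proc/config.gz` only works on kernels built with `CONFIG_IKCONFIG_PROC`.

```python
import gzip
from pathlib import Path

# Kconfig symbols assumed to correspond to the features in Table 2.1.
REQUIRED = ("CONFIG_RCU_BOOST", "CONFIG_RCU_NOCB_CPU", "CONFIG_HOTPLUG_CPU")


def missing_options(config_text, required=REQUIRED):
    """Return the required symbols that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_FOO is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value == "y":
            enabled.add(key)
    return [name for name in required if name not in enabled]


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's configuration as text."""
    with gzip.open(path, "rt") as handle:
        return handle.read()


if __name__ == "__main__":
    if Path("/proc/config.gz").exists():
        print("missing:", missing_options(read_running_config()))
    else:
        print("/proc/config.gz is not available on this kernel")
```

An empty list means all three options are enabled; any symbol that is listed would need to be checked against the kernel configuration actually used for the image.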
71
72#### 2.1.1 Boot Parameters
73<!--
74**TBD/FIXME:** Boot parameters for nohz_full and isolcpus.
75-->
76**From [KERN-PARA]:**
77<!--
78nohz= [KNL] Boottime enable/disable dynamic ticks
79 Valid arguments: on, off
80 Default: on
81-->
82```
83
84rcu_nocbs= [KNL]
85 In kernels built with CONFIG_RCU_NOCB_CPU=y, set
86 the specified list of CPUs to be no-callback CPUs.
87 Invocation of these CPUs' RCU callbacks will
88 be offloaded to "rcuox/N" kthreads created for
89 that purpose, where "x" is "b" for RCU-bh, "p"
90 for RCU-preempt, and "s" for RCU-sched, and "N"
91 is the CPU number. This reduces OS jitter on the
92 offloaded CPUs, which can be useful for HPC and
93 real-time workloads. It can also improve energy
94 efficiency for asymmetric multiprocessors.
95
96isolcpus= [KNL,SMP] Isolate CPUs from the general scheduler.
97 Format:
98 <cpu number>,...,<cpu number>
99 or
100 <cpu number>-<cpu number>
101 (must be a positive range in ascending order)
102 or a mixture
103 <cpu number>,...,<cpu number>-<cpu number>
104
105 This option can be used to specify one or more CPUs
106 to isolate from the general SMP balancing and scheduling
107 algorithms. You can move a process onto or off an
108 "isolated" CPU via the CPU affinity syscalls or cpuset.
109 <cpu number> begins at 0 and the maximum value is
110 "number of CPUs in system - 1".
111
112 This option is the preferred way to isolate CPUs. The
113 alternative -- manually setting the CPU mask of all
114 tasks in the system -- can cause problems and
115 suboptimal load balancer performance.
116```
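
The CPU list format quoted above (single numbers, ascending ranges, or a mixture) can be illustrated with a small parser. This is only a sketch for checking a value before it is placed on the kernel command line; the function name `parse_cpu_list` and the optional `num_cpus` check are made up for this example and are not part of the kernel or of partrt.

```python
def parse_cpu_list(spec, num_cpus=None):
    """Expand a list such as '0,2-3' into a sorted list of CPU numbers.

    Follows the format described for isolcpus=: comma-separated CPU
    numbers and ascending ranges. If num_cpus is given, CPU numbers
    must be below it (CPU numbering starts at 0).
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError("empty entry in CPU list")
        if "-" in part:
            first_text, last_text = part.split("-", 1)
            first, last = int(first_text), int(last_text)
            if first > last:
                raise ValueError(f"range {part!r} must be in ascending order")
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    if num_cpus is not None and max(cpus) >= num_cpus:
        raise ValueError(f"CPU {max(cpus)} does not exist on a {num_cpus}-CPU system")
    return sorted(cpus)


if __name__ == "__main__":
    # Hypothetical value for a four-CPU target: CPU 0 is left to the scheduler.
    print(parse_cpu_list("1-3", num_cpus=4))
```

The same list syntax is what one would pass to both `isolcpus=` and `rcu_nocbs=`; which CPUs to isolate on a given target is a tuning decision that this sketch does not make.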
117
118### 2.2 CPU-Isolation with partrt
119A tool called partrt is included in the networking profile to divide an SMP
120Linux system into partitions. A description of the tool can be found in the
121[Linux Real-Time
122Guide](http://linuxrealtime.org/index.php/Improving_the_Real-Time_Properties#The_CPU_Partitioning_Tool_-_partrt).
123
124### 2.3 Latency Benchmarks
125The cyclictest tool [CYCLIC] measures system latency and is used in many
126projects. For comparison, the measurement was run on both the Enea Linux 6.0
127Standard profile and the Enea Linux Networking Profile to investigate the
128impact of the changes on the system.
129
130Below, *cyclictest* is run on the two systems. Each table presents the average
131and maximum latency, first for the standard profile (columns marked *std*) and
132then for the networking profile (columns marked *net*). P is the thread
133priority, I the interval in microseconds and C the cycle count. The test is also
134combined with *stress* [STRESS] to show system performance under different types of load.
135
136<!--
137FIXME!
138 **Enea Linux 6.0 Standard System Info**
139 - Kernel size:
140 - Root-fs size:
141
142**Enea Linux 6.0 Networking Profile System Info**
143 - Kernel size:
144 - Root-fs size:
145-->
146
147 Command: Cyclictest with no stress
148
149| CPU \# | P | I | C_std | Avg_std (us) | Max_std (us)| C_net | Avg_net (us)| Max_net (us)
150| --- | --- | --- | --- | --- | --- | --- | --- | --- |
151| 0 | 99 | 1000 | 100000 | 9 | 24 | 100000 | 6 | 11 |
152| 1 | 99 | 1500 | 66817 | 9 | 21 | 66812 | 6 | 12 |
153| 2 | 99 | 2000 | 50208 | 9 | 23 | 50106 | 6 | 18 |
154| 3 | 99 | 2500 | 40083 | 9 | 14 | 40082 | 6 | 10 |
155
156 Command: cyclictest with hdd stress:
157 # stress -d 4 --hdd-bytes 1M &
158
159| CPU \# | P | I | C_std | Avg_std (us) | Max_std (us)| C_net | Avg_net (us)| Max_net (us)
160| --- | --- | --- | --- | --- | --- | --- | --- | --- |
161| 0 | 99 | 1000 | 100000 | 14 | 223 | 100000 | 11 | 77 |
162| 1 | 99 | 1500 | 66820 | 14 | 231 | 66745 | 11 | 100 |
163| 2 | 99 | 2000 | 50109 | 14 | 186 | 50055 | 11 | 76 |
164| 3 | 99 | 2500 | 40083 | 14 | 176 | 40041 | 11 | 81 |
165
166 Command: cyclictest with vm stress:
167 # stress -m 4 --vm-bytes 4096 &
168
169| CPU \# | P | I | C_std | Avg_std (us) | Max_std (us)| C_net | Avg_net (us)| Max_net (us)
170| --- | --- | --- | --- | --- | --- | --- | --- | --- |
171| 0 | 99 | 1000 | 100000 | 5 | 20 | 100000 | 3 | 15 |
172| 1 | 99 | 1500 | 66818 | 6 | 14 | 66739 | 3 | 7 |
173| 2 | 99 | 2000 | 50109 | 6 | 14 | 50103 | 3 | 9 |
174| 3 | 99 | 2500 | 40079 | 6 | 14 | 40081 | 3 | 6 |
175
176 Command: cyclictest with full stress trial 1:
177 # stress -c 4 -i 4 -m 4 --vm-bytes 4096 -d 4 --hdd-bytes 4096 &
178
179| CPU \# | P | I | C_std | Avg_std (us) | Max_std (us)| C_net | Avg_net (us)| Max_net (us)
180| --- | --- | --- | --- | --- | --- | --- | --- | --- |
181| 0 | 99 | 1000 | 99808 | 7 | 93 | 99815 | 6 | 58 |
182| 1 | 99 | 1500 | 66733 | 9 | 105 | 66739 | 6 | 54 |
183| 2 | 99 | 2000 | 50039 | 9 | 79 | 50039 | 7 | 61 |
184| 3 | 99 | 2500 | 40016 | 10 | 83 | 40032 | 6 | 57 |
185
186 Command: cyclictest with full stress trial 2:
187 # stress -c 4 -i 4 -m 4 --vm-bytes 4096 -d 4 --hdd-bytes 1M &
188
189| CPU \# | P | I | C_std | Avg_std (us) | Max_std (us)| C_net | Avg_net (us)| Max_net (us)
190| --- | --- | --- | --- | --- | --- | --- | --- | --- |
191| 0 | 99 | 1000 | 100000 | 13 | 201 | 100000 | 10 | 87 |
192| 1 | 99 | 1500 | 66646 | 11 | 186 | 66685 | 9 | 85 |
193| 2 | 99 | 2000 | 49969 | 10 | 195 | 49998 | 10 | 70 |
194| 3 | 99 | 2500 | 39960 | 11 | 112 | 39992 | 10 | 90 |
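
To make the tables above easier to compare, the following sketch computes the worst-case maximum latency per load type and the relative reduction between the standard and the networking profile. The numbers are copied from the Max_std and Max_net columns above; the helper names are chosen for this example only.

```python
# (Max_std, Max_net) in microseconds for CPU 0..3, copied from the tables above.
MAX_LATENCY_US = {
    "no stress": [(24, 11), (21, 12), (23, 18), (14, 10)],
    "hdd stress": [(223, 77), (231, 100), (186, 76), (176, 81)],
    "vm stress": [(20, 15), (14, 7), (14, 9), (14, 6)],
    "full stress trial 1": [(93, 58), (105, 54), (79, 61), (83, 57)],
    "full stress trial 2": [(201, 87), (186, 85), (195, 70), (112, 90)],
}


def worst_case(rows):
    """Return the largest maximum latency over all CPUs as (std, net)."""
    return max(std for std, _ in rows), max(net for _, net in rows)


def reduction_percent(std_us, net_us):
    """Relative reduction of net compared to std, in percent."""
    return 100.0 * (std_us - net_us) / std_us


if __name__ == "__main__":
    for load, rows in MAX_LATENCY_US.items():
        std_us, net_us = worst_case(rows)
        print(f"{load}: {std_us} us -> {net_us} us "
              f"({reduction_percent(std_us, net_us):.0f}% lower)")
```

Worst-case figures from a single run are noisy, so the percentages are best read as an indication of the trend rather than as guaranteed bounds.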
195
196## 3. USDPAA Usage
197Predictable, high performance is critical for networking systems. One way of
198achieving greater performance is for user-space to avoid
199interactions with the kernel. Normally the kernel is responsible for allocating
200and scheduling hardware accelerators. By using frameworks such as
201USDPAA [NXP-SDK] and DPDK [DPDK], control over certain hardware can be given
202to user-space. USDPAA is specific to NXP/Freescale's QorIQ platforms; for more
203information please see their guide to USDPAA [NXP-SDK].
204
205In the guides below, an example of how to prepare a p2041rdb target with SRIO
206and ethernet through USDPAA is given.
207
208
209### 3.1 Packages
210The networking profile includes all packages necessary for software support
211of USDPAA. If a different image is built, include the packages listed below
212(all available in meta-fsl-ppc and meta-freescale) to add support for USDPAA
213and some example applications.
214
215 * usdpaa
216 * usdpaa-apps
217 * fmc
218 * fmlib
219 * flib
220 * eth-config
221
222### 3.2 Prepare Target
223The SRIO application requires booting with an RCW and a board configuration that
224allow usage of the PCI extender port. The examples below are specific to
225p2041rdb, but similar steps can be taken for other targets where SRIO is not
226enabled by default.
227
228#### 3.2.1 Reset Control Word (RCW)
229The reset control word must configure the serializer/deserializer (SERDES) bus for
230SRIO. This can be done either with a predefined binary/setting or with an RCW created
231in CodeWarrior [CW].
232
233The RCW used in this example is taken from the NXP/Freescale SRA User Guide in
234[NXP-SDK].
235
236To program the RCW to the target from U-Boot, follow the steps below:
237
238```
239=> tftp 1000000 <path-2-file>/RR_RS_0x02.bin
240
241=> md 0xec000000
242ec000000: aa55aa55 010e0100 12600000 00000000 .U.U.....`......
243ec000010: 241c0000 00000000 248e40c0 c3c02000 $.......$.@... .
244ec000020: de800000 40000000 00000000 00000000 ....@...........
245ec000030: 00000000 d0030f07 00000000 00000000 ................
246ec000040: 00000000 00000000 091380c0 000009c4 ................
247ec000050: 09000010 00000000 091380c0 000009c4 ................
248ec000060: 09000014 00000000 091380c0 000009c4 ................
249ec000070: 09000018 81d00000 091380c0 000009c4 ................
250ec000080: 890b0050 00000002 091380c0 000009c4 ...P............
251ec000090: 890b0054 00000002 091380c0 000009c4 ...T............
252ec0000a0: 890b0058 00000002 091380c0 000009c4 ...X............
253ec0000b0: 890b005c 00000002 091380c0 000009c4 ...\............
254ec0000c0: 890b0090 00000002 091380c0 000009c4 ................
255ec0000d0: 890b0094 00000002 091380c0 000009c4 ................
256ec0000e0: 890b0098 00000002 091380c0 000009c4 ................
257ec0000f0: 890b009c 00000002 091380c0 000009c4 ................
258
259=> protect off 0xec000000 +$filesize
260Un-Protected 1 sectors
261=> erase 0xec000000 +$filesize
262
263. done
264Erased 1 sectors
265
266=> cp.b 1000000 0xec000000 $filesize
267Copy to Flash... 9done
268
269=> protect on 0xec000000 +$filesize
270Protected 1 sectors
271
272=> md 0xec000000
273ec000000: aa55aa55 010e0100 12600000 00000000 .U.U.....`......
274ec000010: 241c0000 00000000 088040c0 c3c02000 $.........@... .
275ec000020: de800000 40000000 00000000 00000000 ....@...........
276ec000030: 00000000 d0030f07 00000000 00000000 ................
277ec000040: 00000000 00000000 091380c0 000009c4 ................
278ec000050: 09000010 00000000 091380c0 000009c4 ................
279ec000060: 09000014 00000000 091380c0 000009c4 ................
280ec000070: 09000018 81d00000 091380c0 000009c4 ................
281ec000080: 890b0050 00000002 091380c0 000009c4 ...P............
282ec000090: 890b0054 00000002 091380c0 000009c4 ...T............
283ec0000a0: 890b0058 00000002 091380c0 000009c4 ...X............
284ec0000b0: 890b005c 00000002 091380c0 000009c4 ...\............
285ec0000c0: 890b0090 00000002 091380c0 000009c4 ................
286ec0000d0: 890b0094 00000002 091380c0 000009c4 ................
287ec0000e0: 890b0098 00000002 091380c0 000009c4 ................
288ec0000f0: 890b009c 00000002 091380c0 000009c4 ................
289=>
290```
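
The two `md` dumps above differ in a single word, which is easy to miss by eye. As an illustration, the sketch below parses captured `md` output and lists the words that changed. It assumes the default layout shown above (an address, a colon, four 32-bit words, then an ASCII column); the function names are invented for this example.

```python
import re

WORD = re.compile(r"^[0-9a-fA-F]{8}$")


def parse_md_dump(text):
    """Map address -> 32-bit word (hex string) for each 'md' output line."""
    words = {}
    for line in text.splitlines():
        address_text, separator, rest = line.partition(":")
        if not separator:
            continue  # prompts and status messages carry no address
        try:
            base = int(address_text.strip(), 16)
        except ValueError:
            continue
        # Only the first four tokens are data; the rest is the ASCII column.
        for offset, token in enumerate(rest.split()[:4]):
            if not WORD.match(token):
                break
            words[base + 4 * offset] = token.lower()
    return words


def changed_words(before, after):
    """Return (address, old, new) for every word that differs."""
    old, new = parse_md_dump(before), parse_md_dump(after)
    return [(address, old[address], new[address])
            for address in sorted(old.keys() & new.keys())
            if old[address] != new[address]]


if __name__ == "__main__":
    before = "ec000010: 241c0000 00000000 248e40c0 c3c02000 $.......$.@... ."
    after = "ec000010: 241c0000 00000000 088040c0 c3c02000 $.........@... ."
    for address, old_word, new_word in changed_words(before, after):
        print(f"{address:08x}: {old_word} -> {new_word}")
```

Feeding it the complete dumps from a serial console log gives a quick confirmation that the flash contents changed after the `cp.b` step.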
291
292To obtain specific hardware settings, some multiplexers need to be
293set. Descriptions of these can be obtained by typing **cpld -h** in U-Boot;
294information about the peripherals is also available in the board's
295user guide. For SRIO on the p2041rdb target the following settings are
296necessary.
297
298```
299cpld_cmd lane_mux 6 0
300cpld_cmd lane_mux a 0
301cpld_cmd lane_mux c 0
302cpld_cmd lane_mux d 0
303```
304### 3.3 Device Trees
305USDPAA-enabled targets need very specific device trees because, instead
306of being handed over to the Linux kernel, the hardware is managed by the DPAA
307framework.
308
309The networking profile includes a custom device-tree
310(<machine>-usdpaa-enea.dtb), currently defined for p2041rdb. It gives one
311Ethernet interface to the kernel while the rest belong to DPAA.
312
313Available and tested device-trees for p2041rdb:
314
315* uImage-p2041rdb-usdpaa-enea.dtb: EL custom interface, gives one interface to the Linux kernel and the remaining ones to DPAA.
316
317### 3.4 Boot Parameters
318USDPAA requires some custom boot arguments. If they are missing or incorrect,
319the USDPAA applications will not be usable. The NXP/Freescale manual covers these arguments, but it can be misleading: in several places the instructions are not target agnostic and apply only to certain targets. If unsure, consult the benchmarking
320chapter of the NXP/Freescale SDK documentation, which includes more exact steps for each
321tested target.
322
323For our purposes of testing SRIO and the reflector application we only need the
324'usdpaa_mem' boot argument. If the reserved memory is too large, it will cause
325segmentation faults.
326
327 Table 3.1 'usdpaa_mem=?'
328
329| TARGET | usdpaa_mem
330| --- | --- |
331| p2041rdb | <= 64M |
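
Since a missing or oversized `usdpaa_mem` value makes the USDPAA applications unusable, it can be worth checking the kernel command line before starting them. The sketch below does this for the 64M limit given in the table for p2041rdb. It assumes the value uses an optional K, M or G suffix, as in the boot examples in this document; the function names are made up for this example.

```python
import re

SIZE_SUFFIXES = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
P2041RDB_LIMIT = 64 << 20  # 64M, from the table above


def parse_size(text):
    """Convert a value such as '64M' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * SIZE_SUFFIXES[match.group(2)]


def usdpaa_mem_bytes(cmdline):
    """Return the usdpaa_mem value in bytes, or None if it is absent."""
    for token in cmdline.split():
        if token.startswith("usdpaa_mem="):
            return parse_size(token.split("=", 1)[1])
    return None


def check_usdpaa_mem(cmdline, limit=P2041RDB_LIMIT):
    """Return 'ok' or a short description of the problem."""
    size = usdpaa_mem_bytes(cmdline)
    if size is None:
        return "usdpaa_mem is missing"
    if size > limit:
        return f"usdpaa_mem is {size} bytes, above the limit of {limit}"
    return "ok"


if __name__ == "__main__":
    with open("/proc/cmdline") as handle:
        print(check_usdpaa_mem(handle.read()))
```

On a running target the script reads `/proc/cmdline`; the functions can equally be applied to a bootargs string before it is set in U-Boot.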
332
333### 3.5 Boot Instructions P2041RDB
334Below are instructions on how to boot a p2041rdb board with USDPAA enabled.
335
336#### 3.5.1 Boot over NFS server
337```
338tftp 1000000 uImage-p2041rdb.bin
339tftp c00000 uImage-p2041rdb-usdpaa-enea.dtb
340
341setenv bootargs root=/dev/nfs
342nfsroot=172.21.3.8:/unix/enea_linux_rootfs/<folder-path> rw ip=dhcp
343console=ttyS0,115200 memmap=16M$0xf7000000 mem=4080M max_addr=f6ffffff
344usdpaa_mem=64M
345
346bootm 1000000 - c00000
347```
348#### 3.5.2 RAM Boot
349```
350tftp 1000000 uImage-p2041rdb.bin
351tftp 2000000 uImage-p2041rdb-usdpaa.dtb
352tftp 5000000 enea-image-networking-p2041rdb.ext2.gz.u-boot
353
354setenv bootargs root=/dev/ram rw console=ttyS0,115200 ramdisk_size=10000000
355log_buf_len=128K usdpaa_mem=64M
356
357bootm 0x1000000 0x5000000 0x2000000
358```
359
360### 3.6 Run Reflector
361Reflector is a demo application from NXP/Freescale that receives a packet
362over Ethernet and sends it back with the source and destination IP
363addresses switched.
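
The address switch that reflector performs can be illustrated on a bare IPv4 header. The sketch below is a toy model written for this document, not the reflector source code: it swaps the source and destination address fields and shows that the header checksum stays valid, because the checksum is a sum over the same 16-bit words regardless of their order.

```python
import struct


def ipv4_checksum(header):
    """Ones' complement checksum over an IPv4 header.

    With the checksum field zeroed this yields the value to store;
    over a header with a correct checksum it yields 0.
    """
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def reflect_ipv4(packet):
    """Return the packet with source and destination addresses swapped."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    source, destination = packet[12:16], packet[16:20]
    return packet[:12] + destination + source + packet[20:]


if __name__ == "__main__":
    source = bytes([192, 168, 0, 10])
    destination = bytes([192, 168, 0, 11])
    # Version/IHL, TOS, total length, id, flags, TTL, protocol (UDP), checksum.
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 17, 0,
                         source, destination)
    header = header[:10] + struct.pack("!H", ipv4_checksum(header)) + header[12:]
    reflected = reflect_ipv4(header)
    print("checksum still valid:", ipv4_checksum(reflected[:20]) == 0)
```

The real application works on frames handed to it by the DPAA hardware; only the address swap itself is modelled here.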
364
365In order to test reflector an ethernet cable must be connected between either
366two p2041rdb targets or between a work-PC and the p2041rdb target.
367
368Connect two targets by ethernet and do the following steps to test connection
369with usdpaa:
370
3711. Boot Board A with uImage-p2041rdb-usdpaa-enea.dtb
3722. Boot Board B with uImage-p2041rdb.dtb
3733. On board A do the following:
374```
375 # Configure which Ethernet ports should be used by DPAA
376 $ vi config-p2041rdb.xml
377 <cfgdata>
378 <config>
379 <engine name="fm0">
380 <port type="MAC" number="2"
381policy="hash_ipsec_src_dst_spi_policy1"/>
382 <port type="MAC" number="3"
383policy="hash_ipsec_src_dst_spi_policy2"/>
384 <port type="MAC" number="4"
385policy="hash_ipsec_src_dst_spi_policy3"/>
386 <port type="MAC" number="5"
387policy="hash_ipsec_src_dst_spi_policy4"/>
388 </engine>
389 </config>
390 </cfgdata>
391
392 $ fmc -c config-p2041rdb.xml -p /usr/etc/usdpaa_policy_hash_ipv4.xml -a
393
394 # start reflector and check the ethernet ports:
395 $ reflector
396 reflector> ifconfig
397```
3984. On board B do the following:
399```
400 # Configure the network interface to which you connected the Ethernet
401 # cable, by choosing an IP and MAC address.
402 $ ifconfig -a
403 $ ifconfig <eth-x> 192.168.0.10 netmask 255.255.255.0 up
404
405 # Add a static ARP entry for Board A's hardware (MAC) address
406 $ arp -s 192.168.0.11 <hw-address> -i <eth-x>
407
408 # Ping Board A
409 $ ping 192.168.0.11
410```
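
The port configuration file edited in step 3 has a regular structure, so it can also be generated. The following sketch produces the same elements and attributes as the `config-p2041rdb.xml` example above for a given list of MAC numbers. The function name and its parameters are invented for this example, and the numbering of the policy names simply follows the pattern visible in the example file.

```python
import xml.etree.ElementTree as ET


def build_fmc_config(mac_ports, engine="fm0",
                     policy_prefix="hash_ipsec_src_dst_spi_policy"):
    """Return an fmc port configuration as an XML string.

    Each MAC number in mac_ports gets one <port> element; policies are
    numbered 1, 2, ... in the order the ports are listed.
    """
    root = ET.Element("cfgdata")
    config = ET.SubElement(root, "config")
    engine_node = ET.SubElement(config, "engine", name=engine)
    for index, number in enumerate(mac_ports, start=1):
        ET.SubElement(engine_node, "port", type="MAC", number=str(number),
                      policy=f"{policy_prefix}{index}")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # The four MACs used in the example above.
    print(build_fmc_config([2, 3, 4, 5]))
```

The output is written on a single line without indentation, which is still well-formed XML; whether a reduced port list works depends on the device tree and policy file in use.
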
411### 3.7 Run SRA
412The user space drivers from NXP/Freescale support usage of SRIO from Linux user
413space. The SRA application is a demo application from NXP/Freescale that can
414write from one SRIO interface to another, avoiding kernel
415interaction by using DMA (direct memory access) memory management. More
416information on the drivers can be found in the NXP/Freescale SDK [NXP-SDK].
417
418In the test run below two boards are prepared with SRIO interfaces. Both boards
419are initialized with a memory setting that sets the different SRIO memory
420spaces to different values. Then data from Board A's *Write-preparation space*
421is written to Board B's *Map space*.
422
423```
424 # start the srio application
425 $ sra
426
427 # setup board B (receiver)
428 sra> sra -attr port1 win_attr 1 nwrite nread
429
430 # set local memory to data predefined in 0x100000
431 sra> sra -op port1 1 0 0 s 0x100000
432
433 # read to view what is written to port 1
434 sra> sra -op port1 1 0 0 p 0x100000
435
436
437 # setup board A (transmitter)
438 sra> sra -attr port1 win_attr 1 nwrite nread
439
440 # set local memory and then read to confirm
441 sra> sra -op port1 1 0 0 s 0x100000
442 sra> sra -op port1 1 0 0 p 0x100000
443
444 # write what is in 'write preparation space' to outbound
445 sra> sra -op port1 1 0 0 w 0x100000
446
447 # confirm on board B that 'write preparation space' from board A is written in
448 # 'map space'
449 sra> sra -op port1 1 0 0 p 0x100000
450```
451
452```
453 Board A Board B
454 +-------------------+ +-------------------+
455 | Map space | +------>| Map space |
456 +-------------------+ | +-------------------+
457 | Read data space | | | Read data space |
458 +-------------------+ outbound | +-------------------+
459 | write preparation |-----------+ | write preparation |
460 | space | | space |
461 +-------------------+ +-------------------+
462 | reserved space | | reserved space |
463 +-------------------+ +-------------------+
464
465Image 3.1 SRIO memory space
466```
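
The data flow in Image 3.1 can be mimicked with a toy model in which each board holds its spaces as byte arrays and an outbound write copies Board A's write preparation space into Board B's map space. This is purely illustrative: the class, the function names and the 16-byte space size are assumptions made for this example and say nothing about how SRA or the SRIO hardware is implemented.

```python
from dataclasses import dataclass, field

SPACE_SIZE = 16  # toy size in bytes; real SRIO windows are far larger


def _blank():
    return bytearray(SPACE_SIZE)


@dataclass
class Board:
    name: str
    map_space: bytearray = field(default_factory=_blank)
    read_data_space: bytearray = field(default_factory=_blank)
    write_preparation_space: bytearray = field(default_factory=_blank)


def fill(space, value):
    """Set every byte of a space to one value (loosely the 'set' step)."""
    space[:] = bytes([value]) * len(space)


def outbound_write(sender, receiver):
    """Copy the sender's write preparation space to the receiver's map space."""
    receiver.map_space[:] = sender.write_preparation_space


if __name__ == "__main__":
    board_a, board_b = Board("A"), Board("B")
    fill(board_a.write_preparation_space, 0xA5)
    outbound_write(board_a, board_b)
    print("Board B map space:", board_b.map_space.hex())
```

After the write, only Board B's map space has changed, which corresponds to the final check in the test run above.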
467
468### 3.8 Throughput using USDPAA
469To illustrate the performance achievable with the USDPAA framework, the
470throughput over a 10G Ethernet link was measured; the results are presented
471below.
472
473Performance was measured with a Spirent Test Center, used as
474a packet generator, together with the “Spirent Test Center” application, version
4754.33. The test targets are connected to the Spirent Test Center through 10G
476Ethernet ports (XAUI-RISER card). On Enea Linux 6.0, the USDPAA reflector was
477used as the packet forwarding application. The resulting throughput measured with Spirent is shown in Image 3.2 below.
478
479<!--![alt text](./throughput_p2041rdb_networking.png =250x)-->
480<img src="img/throughput_p2041rdb_networking.png" alt="Drawing" style="width: 500px;"/>
481
482 Image 3.2 Throughput on p2041rdb over a 10G ethernet port using USDPAA.
483 x-axis shows the frame size in bytes, and y-axis the aggregated throughput in
484 megabits per second.
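
When reading a throughput-versus-frame-size plot such as Image 3.2, it helps to know the theoretical ceiling of the link. Every Ethernet frame occupies 20 extra bytes on the wire (7 bytes preamble, 1 byte start-of-frame delimiter and 12 bytes inter-frame gap), so small frames can never reach the full 10 Gbit/s. The sketch below computes this ceiling; it assumes the frame size includes the frame check sequence, and it is not derived from the measured data.

```python
LINK_BPS = 10_000_000_000  # 10G Ethernet
WIRE_OVERHEAD_BYTES = 20   # 7 preamble + 1 SFD + 12 inter-frame gap


def max_frames_per_second(frame_bytes, link_bps=LINK_BPS):
    """Theoretical maximum frame rate for a given frame size."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def max_throughput_mbps(frame_bytes, link_bps=LINK_BPS):
    """Theoretical maximum throughput of frame data in Mbit/s."""
    return max_frames_per_second(frame_bytes, link_bps) * frame_bytes * 8 / 1e6


if __name__ == "__main__":
    for size in (64, 128, 256, 512, 1024, 1280, 1518):
        print(f"{size:5d} bytes: {max_frames_per_second(size):12.0f} frames/s, "
              f"{max_throughput_mbps(size):8.0f} Mbit/s")
```

Measured curves that approach these values at a given frame size are limited by the link rather than by the forwarding application.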
485
486
487
488--------------
489--------------
490[NXP-SDK] QorIQ SDK 1.9 Documentation -
491https://freescale.sdlproducts.com/LiveContent/web/pub.xql?c=t&action=home&pub=QorIQ_SDK_1.9&lang=en-US#addHistory=true&filename=GUID-81837065-81AD-449B-8572-E96C3EED636F.xml&docid=GUID-81837065-81AD-449B-8572-E96C3EED636F&inner_id=&tid=&query=&scope=&resource=&toc=false&eventType=lcContent.loadHome
492
493[KERN-PARA] Kernel Parameters, https://www.kernel.org/doc/Documentation/kernel-parameters.txt
494
495[RCU] Paul McKenney, *Priority-Boosting RCU Read-Side Critical Section*, https://lwn.net/Articles/220677/
496<!--
497[NOHZ] https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt
498-->
499
500[HOTPLUG] https://www.kernel.org/doc/Documentation/cpu-hotplug.txt
501
502[DPDK] http://dpdk.org/
503
504[CYCLIC] https://rt.wiki.kernel.org/index.php/Cyclictest
505
506[STRESS] http://linux.die.net/man/1/stress
507
508[CW] http://www.nxp.com/products/software-and-tools/software-development-tools/codewarrior-development-tools:CW_HOME