diff --git a/doc/book-enea-nfv-access-guide/doc/hypervisor_virtualization.xml b/doc/book-enea-nfv-access-guide/doc/hypervisor_virtualization.xml
new file mode 100644
index 0000000..f7f186c
--- /dev/null
+++ b/doc/book-enea-nfv-access-guide/doc/hypervisor_virtualization.xml
@@ -0,0 +1,741 @@
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<chapter id="hypervisor_virt">
  <title>Hypervisor Virtualization</title>

  <para>KVM (Kernel-based Virtual Machine) is a virtualization
  infrastructure for the Linux kernel which turns it into a hypervisor. KVM
  requires a processor with hardware virtualization extensions.</para>

  <para>KVM uses QEMU, an open source machine emulator and virtualizer, to
  virtualize a complete system. With KVM it is possible to run multiple guests
  of a variety of operating systems, each with a complete set of virtualized
  hardware.</para>
  <section id="launch_virt_machine">
    <title>Launching a Virtual Machine</title>

    <para>QEMU can make use of KVM when running a target architecture that is
    the same as the host architecture. For instance, when running
    qemu-system-x86_64 on an x86-64 compatible processor (with the Intel VT or
    AMD-V virtualization extensions), you can take advantage of KVM
    acceleration, to the benefit of both the host and the guest
    system.</para>

    <para>Enea Linux includes an optimized version of QEMU with KVM-only
    support. To use KVM, pass <command>--enable-kvm</command> to QEMU.</para>

    <para>The following is an example of starting a guest:</para>

    <programlisting>taskset -c 0,1 qemu-system-x86_64 \
-cpu host -M q35 -smp cores=2,sockets=1 \
-vcpu 0,affinity=0 -vcpu 1,affinity=1 \
-enable-kvm -nographic \
-kernel bzImage \
-drive file=enea-image-virtualization-guest-qemux86-64.ext4,if=virtio,format=raw \
-append 'root=/dev/vda console=ttyS0,115200' \
-m 4096 \
-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc</programlisting>
  </section>

  <section id="qemu_boot">
    <title>Main QEMU boot options</title>

    <para>The pertinent boot options for the QEMU emulator are detailed
    below:</para>

    <itemizedlist>
      <listitem>
        <para>SMP - at least 2 cores should be enabled in order to isolate
        application(s) running in virtual machine(s) on specific cores, for
        better performance.</para>

        <programlisting>-smp cores=2,threads=1,sockets=1 \</programlisting>
      </listitem>

      <listitem>
        <para>CPU affinity - associate virtual CPUs with physical CPUs and
        optionally assign a default real time priority to the virtual CPU
        process in the host kernel. This option allows you to start qemu vCPUs
        on isolated physical CPUs.</para>

        <programlisting>-vcpu 0,affinity=0 \</programlisting>
      </listitem>

      <listitem>
        <para>Hugepages - KVM guests can be deployed with huge page memory
        support in order to reduce memory consumption and improve performance
        by reducing CPU cache usage. With huge pages, less memory is used for
        page tables and TLB (Translation Lookaside Buffer) misses are reduced,
        significantly increasing performance, especially in memory-intensive
        situations.</para>

        <programlisting>-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on \</programlisting>
      </listitem>

      <listitem>
        <para>Memory preallocation - preallocating huge pages at startup can
        improve performance, but may increase QEMU boot time.</para>

        <programlisting>-mem-prealloc \</programlisting>
      </listitem>

      <listitem>
        <para>Enable realtime characteristics - run qemu with realtime
        features. Although <command>-realtime</command> on its own suggests
        broader functionality, it is simply an umbrella for options that are
        partially realtime. In a realtime or low latency environment, pages
        must not be swapped out, which is what <literal>mlock=on</literal>
        ensures. If VM density is the priority instead, swappable VMs may be
        preferred, hence <literal>mlock=off</literal>.</para>

        <programlisting>-realtime mlock=on \</programlisting>
      </listitem>
    </itemizedlist>

    <para>If the hardware does not have an IOMMU (known as "Intel VT-d" on
    Intel-based machines and "AMD I/O Virtualization Technology" on AMD-based
    machines), it will not be possible to assign devices in KVM.
    Virtualization Technology features (VT-d, VT-x, etc.) must be enabled in
    the BIOS of the host target before starting a virtual machine.</para>
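    <para>A quick way to confirm that the host exposes these features is to
    inspect the standard Linux interfaces shown below. These checks are
    generic Linux commands, not specific to Enea NFV Access:</para>

    <programlisting># CPU virtualization extensions: vmx = Intel VT-x, svm = AMD-V
egrep -c '(vmx|svm)' /proc/cpuinfo
# KVM is usable once /dev/kvm exists
ls -l /dev/kvm
# IOMMU groups are populated when VT-d/AMD-Vi is active
ls /sys/kernel/iommu_groups/</programlisting>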
  </section>

  <section id="net_in_guest">
    <title>Networking in guest</title>

    <section id="vhost-user-support">
      <title>Using vhost-user support</title>

      <para>The goal of vhost-user is to implement a Virtio transport, staying
      as close as possible to the vhost paradigm of using shared memory,
      ioeventfds and irqfds. A UNIX domain socket based mechanism allows the
      setup of resources used by a number of Vrings shared between two
      userspace processes, which will be placed in shared memory.</para>

      <para>To run QEMU with the vhost-user backend, you have to provide the
      named UNIX domain socket, which needs to be already opened by the
      backend:</para>

      <programlisting>-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on \
-chardev socket,id=char0,path=/var/run/openvswitch/vhost-user1 \
-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
-device virtio-net-pci,netdev=mynet1,mac=52:54:00:00:00:01 \</programlisting>

      <para>The vhost-user standard uses a client-server model. The server
      creates and manages the vhost-user sockets, and the client connects to
      the sockets created by the server. It is recommended to use QEMU as the
      server, so that a vhost-user client can be restarted without affecting
      the server; otherwise, if the server side dies, all clients need to be
      restarted.</para>

      <para>Using vhost-user in QEMU as server offers the flexibility to stop
      and start the virtual machine with no impact on the virtual switch on
      the host (vhost-user-client):</para>

      <programlisting>-chardev socket,id=char0,path=/var/run/openvswitch/vhost-user1,server \</programlisting>
    </section>

    <section id="tap-interface">
      <title>Using TAP Interfaces</title>

      <para>QEMU can use TAP interfaces to provide full networking capability
      for the guest OS:</para>

      <programlisting>-netdev tap,id=net0,ifname=tap0,script=no,downscript=no \
-device virtio-net-pci,netdev=net0,mac=22:EA:FB:A8:25:AE \</programlisting>
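      <para>With <literal>script=no</literal>, QEMU does not configure the
      host side of the interface, so the TAP device must be created and
      brought up on the host beforehand. A minimal sketch using iproute2 (the
      interface and bridge names here are examples):</para>

      <programlisting>ip tuntap add dev tap0 mode tap
ip link set tap0 up
# optionally attach the TAP device to an existing Linux bridge
ip link set tap0 master br0</programlisting>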
    </section>

    <section id="vfio-passthrough">
      <title>VFIO passthrough VF (SR-IOV) to guest</title>

      <para>The KVM hypervisor supports attaching PCI devices on the host
      system to guests. PCI passthrough allows guests to have exclusive access
      to PCI devices for a range of tasks, and allows PCI devices to appear
      and behave as if they were physically attached to the guest operating
      system.</para>

      <para>Preparing an Intel system for PCI passthrough:</para>

      <itemizedlist>
        <listitem>
          <para>Enable the Intel VT-d extensions in BIOS</para>
        </listitem>

        <listitem>
          <para>Activate Intel VT-d in the kernel by using
          <literal>intel_iommu=on</literal> as a kernel boot parameter</para>
        </listitem>

        <listitem>
          <para>Allow unsafe interrupts in case the system doesn't support
          interrupt remapping. This can be done using
          <literal>vfio_iommu_type1.allow_unsafe_interrupts=1</literal> as a
          kernel boot parameter.</para>
        </listitem>
      </itemizedlist>

      <para>Create the guest with direct passthrough via the VFIO framework
      like so:</para>

      <programlisting>-device vfio-pci,host=0000:03:10.2 \</programlisting>

      <para>On the host, one or more Virtual Functions (VFs) must be created
      and bound to the vfio-pci driver before starting QEMU, so that they can
      be allocated to the guest:</para>

      <programlisting>$ echo 2 > /sys/class/net/eno3/device/sriov_numvfs
$ modprobe vfio_pci
$ dpdk-devbind.py --bind=vfio-pci 0000:03:10.2</programlisting>
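      <para>Before launching QEMU it is worth confirming that the VFs exist
      and were picked up by the vfio-pci driver. The grep pattern below
      matches the device name typical Intel NICs report for their VFs:</para>

      <programlisting>lspci -nn | grep "Virtual Function"
dpdk-devbind.py --status</programlisting>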
    </section>

    <section id="multiqueue">
      <title>Multi-queue</title>

      <section id="qemu-multiqueue-support">
        <title>QEMU multi-queue support configuration</title>

        <programlisting>-chardev socket,id=char0,path=/var/run/openvswitch/vhost-user1 \
-netdev type=vhost-user,id=net0,chardev=char0,queues=2 \
-device virtio-net-pci,netdev=net0,mac=22:EA:FB:A8:25:AE,mq=on,vectors=6</programlisting>

        <para>where <literal>vectors</literal> is calculated as: 2 + 2 *
        number of queues.</para>
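        <para>For instance, with <literal>queues=2</literal> as above, the
        formula gives 2 + 2 * 2 = 6, matching
        <literal>vectors=6</literal>. A quick sanity check using plain shell
        arithmetic:</para>

        <programlisting>queues=2
echo $((2 + 2 * queues))    # prints 6</programlisting>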
      </section>

      <section id="inside-guest">
        <title>Inside guest</title>

        <para>Linux kernel virtio-net driver (one queue is enabled by
        default):</para>

        <programlisting>$ ethtool -L eth0 combined 2</programlisting>

        <para>DPDK Virtio PMD:</para>

        <programlisting>$ testpmd -c 0x7 -- -i --rxq=2 --txq=2 --nb-cores=2 ...</programlisting>

        <para>For QEMU documentation please see: <ulink
        url="https://qemu.weilnetz.de/doc/qemu-doc.html">https://qemu.weilnetz.de/doc/qemu-doc.html</ulink>.</para>
      </section>
    </section>
  </section>

  <section id="libvirt">
    <title>Libvirt</title>

    <para>One way to manage guests in Enea NFV Access is by using
    <literal>libvirt</literal>. Libvirt is used in conjunction with a daemon
    (<literal>libvirtd</literal>) and a command line utility
    (<literal>virsh</literal>) to manage virtualized environments.</para>

    <para>The libvirt library is a hypervisor-independent virtualization API
    and toolkit that is able to interact with the virtualization capabilities
    of a range of operating systems. Libvirt provides a common, generic and
    stable layer to securely manage domains on a node. As nodes may be
    remotely located, libvirt provides all methods required to provision,
    create, modify, monitor, control, migrate and stop the domains, within the
    limits of hypervisor support for these operations.</para>

    <para>The libvirt daemon runs on the Enea NFV Access host. All tools built
    on the libvirt API connect to the daemon to request the desired operation,
    and to collect information about the configuration and resources of the
    host system and guests. <literal>virsh</literal> is a command line
    interface tool for managing guests and the hypervisor. The virsh tool is
    built on the libvirt management API.</para>
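    <para>For orientation, a few commonly used <command>virsh</command>
    commands are shown below; the guest name is an example:</para>

    <programlisting>virsh list --all          # list defined and running guests
virsh dominfo my-guest    # show basic information about a guest
virsh shutdown my-guest   # request a clean guest shutdown
virsh destroy my-guest    # force-stop a guest</programlisting>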

    <para><emphasis role="bold">Major functionality provided by
    libvirt</emphasis></para>

    <para>The following is a summary from the libvirt <ulink
    url="http://wiki.libvirt.org/page/FAQ#What_is_libvirt.3F">home
    page</ulink> describing the major libvirt features:</para>

    <itemizedlist>
      <listitem>
        <para><emphasis role="bold">VM management:</emphasis> Various domain
        lifecycle operations such as start, stop, pause, save, restore, and
        migrate. Hotplug operations for many device types including disk and
        network interfaces, memory, and cpus.</para>
      </listitem>

      <listitem>
        <para><emphasis role="bold">Remote machine support:</emphasis> All
        libvirt functionality is accessible on any machine running the libvirt
        daemon, including remote machines. A variety of network transports are
        supported for connecting remotely, with the simplest being
        <literal>SSH</literal>, which requires no extra explicit
        configuration. For more information, see: <ulink
        url="http://libvirt.org/remote.html">http://libvirt.org/remote.html</ulink>.</para>
      </listitem>

      <listitem>
        <para><emphasis role="bold">Network interface management:</emphasis>
        Any host running the libvirt daemon can be used to manage physical and
        logical network interfaces. Enumerate existing interfaces, as well as
        configure (and create) interfaces, bridges, vlans, and bond devices.
        For more details see: <ulink
        url="https://fedorahosted.org/netcf/">https://fedorahosted.org/netcf/</ulink>.</para>
      </listitem>

      <listitem>
        <para><emphasis role="bold">Virtual NAT and Route based
        networking:</emphasis> Any host running the libvirt daemon can manage
        and create virtual networks. Libvirt virtual networks use firewall
        rules to act as a router, providing VMs transparent access to the host
        machine's network. For more information, see: <ulink
        url="http://libvirt.org/archnetwork.html">http://libvirt.org/archnetwork.html</ulink>.</para>
      </listitem>

      <listitem>
        <para><emphasis role="bold">Storage management:</emphasis> Any host
        running the libvirt daemon can be used to manage various types of
        storage: create file images of various formats (raw, qcow2, etc.),
        mount NFS shares, enumerate existing LVM volume groups, create new LVM
        volume groups and logical volumes, partition raw disk devices, mount
        iSCSI shares, and much more. For more details, see: <ulink
        url="http://libvirt.org/storage.html">http://libvirt.org/storage.html</ulink>.</para>
      </listitem>

      <listitem>
        <para><emphasis role="bold">Libvirt Configuration:</emphasis> A
        properly running libvirt requires that the following elements be in
        place:</para>

        <itemizedlist>
          <listitem>
            <para>Configuration files, located in the directory
            <literal>/etc/libvirt</literal>. They include the daemon's
            configuration file <literal>libvirtd.conf</literal>, and
            hypervisor-specific configuration files, like
            <literal>qemu.conf</literal> for QEMU.</para>
          </listitem>

          <listitem>
            <para>A running libvirtd daemon. The daemon is started
            automatically on the Enea NFV Access host.</para>
          </listitem>

          <listitem>
            <para>Configuration files for the libvirt domains, or guests, to
            be managed by the KVM host. The specifics for guest domains shall
            be defined in an XML file of a format specified at <ulink
            url="http://libvirt.org/formatdomain.html">http://libvirt.org/formatdomain.html</ulink>.
            XML formats for other structures are specified at <ulink
            url="http://libvirt.org/format.html">http://libvirt.org/format.html</ulink>.</para>
          </listitem>
        </itemizedlist>
      </listitem>
    </itemizedlist>

    <section id="boot-kvm-guest">
      <title>Booting a KVM Guest</title>

      <para>There are several ways to boot a KVM guest. Here we describe how
      to boot using a raw image. A direct kernel boot can be performed by
      transferring the guest kernel and the file system files to the host and
      specifying a <literal><kernel></literal> and an
      <literal><initrd></literal> element inside the
      <literal><os></literal> element of the guest XML file, as in the
      following example:</para>

      <programlisting><os>
  <kernel>bzImage</kernel>
</os>
<devices>
  <disk type='file' device='disk'>
    <driver name='qemu' type='raw' cache='none'/>
    <source file='enea-image-virtualization-guest-qemux86-64.ext4'/>
    <target dev='vda' bus='virtio'/>
  </disk>
</devices></programlisting>
    </section>

    <section id="start-guest">
      <title>Starting a Guest</title>

      <para>The <command>virsh create</command> command starts a guest:</para>

      <programlisting>virsh create example-guest-x86.xml</programlisting>

      <para>If further configuration is needed before the guest is reachable
      through <literal>ssh</literal>, a console can be started using the
      <command>virsh console</command> command. The example below shows how to
      start a console, where kvm-example-guest is the name of the guest
      defined in the guest XML file:</para>

      <programlisting>virsh console kvm-example-guest</programlisting>

      <para>This requires that the guest domain has a console configured in
      the guest XML file:</para>

      <programlisting><os>
  <cmdline>console=ttyS0,115200</cmdline>
</os>
<devices>
  <console type='pty'>
    <target type='serial' port='0'/>
  </console>
</devices></programlisting>
    </section>

    <section id="isolation">
      <title>Isolation</title>

      <para>It may be desirable to isolate execution in a guest to a specific
      guest core. It might also be desirable to run a guest on a specific host
      core.</para>

      <para>To pin the virtual CPUs of the guest to specific cores, configure
      the <literal><cputune></literal> contents as follows:</para>

      <orderedlist>
        <listitem>
          <para>First explicitly state on which host core each guest core
          shall run, by mapping <literal>vcpu</literal> to
          <literal>cpuset</literal> in the <literal><vcpupin></literal>
          tag.</para>
        </listitem>

        <listitem>
          <para>In the <literal><cputune></literal> tag it is further
          possible to specify on which CPU the emulator shall run, by adding
          the cpuset to the <literal><emulatorpin></literal> tag.</para>

          <programlisting><vcpu placement='static'>2</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='2'/>
  <vcpupin vcpu='1' cpuset='3'/>
  <emulatorpin cpuset='2'/>
</cputune></programlisting>

          <para><literal>libvirt</literal> will group all threads belonging to
          a qemu instance into cgroups created for that purpose. It is
          possible to supply a base name for those cgroups using the
          <literal><resource></literal> tag:</para>

          <programlisting><resource>
  <partition>/rt</partition>
</resource></programlisting>
        </listitem>
      </orderedlist>
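      <para>The resulting placement can be inspected, and adjusted at
      runtime, with <command>virsh</command>; the guest name below is an
      example:</para>

      <programlisting>virsh vcpuinfo kvm-example-guest     # show vCPU to physical CPU placement
virsh vcpupin kvm-example-guest 0 2  # pin vCPU 0 to host CPU 2</programlisting>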
    </section>

    <section id="network-libvirt">
      <title>Networking using libvirt</title>

      <para>The <command>virsh net-create</command> command starts a network.
      If any networks are listed in the guest XML file, those networks must be
      started before the guest is started. As an example, if the network is
      defined in a file named example-net.xml, it is started as
      follows:</para>

      <programlisting>virsh net-create example-net.xml</programlisting>

      <para>where example-net.xml contains:</para>

      <programlisting><network>
  <name>sriov</name>
  <forward mode='hostdev' managed='yes'>
    <pf dev='eno3'/>
  </forward>
</network></programlisting>

      <para><literal>libvirt</literal> is a virtualization API that supports
      virtual network creation. These networks can be connected to guests and
      containers by referencing the network in the guest XML file. It is
      possible to have a virtual network persistently running on the host by
      starting the network with the <command>virsh net-define</command>
      command instead of the previously mentioned <command>virsh
      net-create</command>.</para>

      <para>An example for the sample network defined in
      <literal>meta-vt/recipes-example/virt-example/files/example-net.xml</literal>:</para>

      <programlisting>virsh net-define example-net.xml</programlisting>

      <para>The <command>virsh net-autostart</command> command enables a
      persistent network to start automatically when the libvirt daemon
      starts:</para>

      <programlisting>virsh net-autostart example-net</programlisting>

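      <para>Whether a network is active, persistent, and marked for autostart
      can then be verified with:</para>

      <programlisting>virsh net-list --all</programlisting>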
      <para>The guest configuration file (XML) must be updated to access the
      newly created network, like so:</para>

      <programlisting>    <interface type='network'>
      <source network='sriov'/>
    </interface></programlisting>

      <para>Presented below are a few modes of network access from the guest
      using <command>virsh</command>:</para>

      <itemizedlist>
        <listitem>
          <para><emphasis role="bold">vhost-user interface</emphasis></para>

          <para>See the Open vSwitch chapter on how to create a vhost-user
          interface using Open vSwitch. Currently there is no Open vSwitch
          support for networks that are managed by libvirt (e.g. NAT). As of
          now, only bridged networks are supported (those where the user has
          to manually create the bridge).</para>

          <programlisting>    <interface type='vhostuser'>
      <mac address='00:00:00:00:00:01'/>
      <source type='unix' path='/var/run/openvswitch/vhost-user1' mode='client'/>
      <model type='virtio'/>
      <driver queues='1'>
        <host mrg_rxbuf='off'/>
      </driver>
    </interface></programlisting>
        </listitem>

        <listitem>
          <para><emphasis role="bold">PCI passthrough
          (SR-IOV)</emphasis></para>

          <para>The KVM hypervisor supports attaching PCI devices on the host
          system to guests. PCI passthrough allows guests to have exclusive
          access to PCI devices for a range of tasks, and allows PCI devices
          to appear and behave as if they were physically attached to the
          guest operating system.</para>

          <para>Preparing an Intel system for PCI passthrough is done like
          so:</para>

          <itemizedlist>
            <listitem>
              <para>Enable the Intel VT-d extensions in BIOS</para>
            </listitem>

            <listitem>
              <para>Activate Intel VT-d in the kernel by using
              <literal>intel_iommu=on</literal> as a kernel boot
              parameter</para>
            </listitem>

            <listitem>
              <para>Allow unsafe interrupts in case the system doesn't support
              interrupt remapping. This can be done using
              <literal>vfio_iommu_type1.allow_unsafe_interrupts=1</literal> as
              a kernel boot parameter.</para>
            </listitem>
          </itemizedlist>

          <para>VFs must be created on the host before starting the
          guest:</para>

          <programlisting>$ echo 2 > /sys/class/net/eno3/device/sriov_numvfs
$ modprobe vfio_pci
$ dpdk-devbind.py --bind=vfio-pci 0000:03:10.0</programlisting>

          <para>The guest is then given access to the VF through a
          <literal>hostdev</literal> interface:</para>

          <programlisting>    <interface type='hostdev' managed='yes'>
      <source>
        <address type='pci' domain='0x0' bus='0x03' slot='0x10' function='0x0'/>
      </source>
      <mac address='52:54:00:6d:90:02'/>
    </interface></programlisting>
        </listitem>

        <listitem>
          <para><emphasis role="bold">Bridge interface</emphasis></para>

          <para>In case an OVS bridge exists on the host, it can be used to
          connect the guest:</para>

          <programlisting>    <interface type='bridge'>
      <mac address='52:54:00:71:b1:b6'/>
      <source bridge='ovsbr0'/>
      <virtualport type='openvswitch'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface></programlisting>

          <para>For further details on the network XML format, see <ulink
          url="http://libvirt.org/formatnetwork.html">http://libvirt.org/formatnetwork.html</ulink>.</para>
        </listitem>
      </itemizedlist>
    </section>

    <section id="libvirt-guest-config-ex">
      <title>Libvirt guest configuration examples</title>

      <section id="guest-config-vhost-user-interface">
        <title>Guest configuration with vhost-user interface</title>

        <programlisting><domain type='kvm'>
  <name>vm_vhost</name>
  <uuid>4a9b3f53-fa2a-47f3-a757-dd87720d9d1d</uuid>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1' unit='G' nodeset='0'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <shares>4096</shares>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <emulatorpin cpuset='4,5'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc'>hvm</type>
    <kernel>/mnt/qemu/bzImage</kernel>
    <cmdline>root=/dev/vda console=ttyS0,115200</cmdline>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model'>
    <model fallback='allow'/>
    <topology sockets='2' cores='1' threads='1'/>
    <numa>
      <cell id='0' cpus='0-1' memory='4194304' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/mnt/qemu/enea-image-virtualization-guest-qemux86-64.ext4'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='vhostuser'>
      <mac address='00:00:00:00:00:01'/>
      <source type='unix' path='/var/run/openvswitch/vhost-user1' mode='client'/>
      <model type='virtio'/>
      <driver queues='1'>
        <host mrg_rxbuf='off'/>
      </driver>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
  </devices>
</domain></programlisting>
      </section>

      <section id="guest-config-pci-passthrough">
        <title>Guest configuration with PCI passthrough</title>

        <programlisting><domain type='kvm'>
  <name>vm_sriov1</name>
  <uuid>4a9b3f53-fa2a-47f3-a757-dd87720d9d1d</uuid>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1' unit='G' nodeset='0'/>
    </hugepages>
  </memoryBacking>
  <vcpu>2</vcpu>
  <os>
    <type arch='x86_64' machine='q35'>hvm</type>
    <kernel>/mnt/qemu/bzImage</kernel>
    <cmdline>root=/dev/vda console=ttyS0,115200</cmdline>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model'>
    <model fallback='allow'/>
    <topology sockets='1' cores='2' threads='1'/>
    <numa>
      <cell id='0' cpus='0' memory='4194304' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/mnt/qemu/enea-image-virtualization-guest-qemux86-64.ext4'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='hostdev' managed='yes'>
      <source>
        <address type='pci' domain='0x0' bus='0x03' slot='0x10' function='0x0'/>
      </source>
      <mac address='52:54:00:6d:90:02'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
  </devices>
</domain></programlisting>
      </section>

      <section id="guest-config-bridge-interface">
        <title>Guest configuration with bridge interface</title>

        <programlisting><domain type='kvm'>
  <name>vm_bridge</name>
  <uuid>4a9b3f53-fa2a-47f3-a757-dd87720d9d1d</uuid>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1' unit='G' nodeset='0'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <shares>4096</shares>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <emulatorpin cpuset='4,5'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='q35'>hvm</type>
    <kernel>/mnt/qemu/bzImage</kernel>
    <cmdline>root=/dev/vda console=ttyS0,115200</cmdline>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model'>
    <model fallback='allow'/>
    <topology sockets='2' cores='1' threads='1'/>
    <numa>
      <cell id='0' cpus='0-1' memory='4194304' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/mnt/qemu/enea-image-virtualization-guest-qemux86-64.ext4'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='bridge'>
      <mac address='52:54:00:71:b1:b6'/>
      <source bridge='ovsbr0'/>
      <virtualport type='openvswitch'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
  </devices>
</domain></programlisting>
      </section>
    </section>
  </section>
</chapter>