diff options
author | Miruna Paun <Miruna.Paun@enea.com> | 2017-09-25 13:57:48 +0200 |
---|---|---|
committer | Miruna Paun <Miruna.Paun@enea.com> | 2017-09-25 13:57:48 +0200 |
commit | 380e975b1b93e83705c8ed30197b1c23f8193814 (patch) | |
tree | 72e98d39867886b77c6008109080b4edb5ee410c /book-enea-nfv-core-installation-guide/doc/high_availability.xml | |
parent | 2df2d1adbab4c4fbfda61700945d85ca3ce53d74 (diff) | |
download | doc-enea-nfv-380e975b1b93e83705c8ed30197b1c23f8193814.tar.gz |
Create new version of NFV Core 1.0 Installation Guide
USERDOCAP-240
Signed-off-by: Miruna Paun <Miruna.Paun@enea.com>
Diffstat (limited to 'book-enea-nfv-core-installation-guide/doc/high_availability.xml')
-rw-r--r-- | book-enea-nfv-core-installation-guide/doc/high_availability.xml | 794 |
1 files changed, 794 insertions, 0 deletions
diff --git a/book-enea-nfv-core-installation-guide/doc/high_availability.xml b/book-enea-nfv-core-installation-guide/doc/high_availability.xml new file mode 100644 index 0000000..e489101 --- /dev/null +++ b/book-enea-nfv-core-installation-guide/doc/high_availability.xml | |||
@@ -0,0 +1,794 @@ | |||
1 | <?xml version="1.0" encoding="ISO-8859-1"?> | ||
2 | <chapter id="high_availability"> | ||
3 | <title>High Availability Guide</title> | ||
4 | |||
5 | <para>ENEA NFV Core 1.0 has been designed to provide high availability | ||
6 | characteristics that are needed for developing and deploying telco-grade NFV | ||
7 | solutions on top of our OPNFV based platform.</para> | ||
8 | |||
<para>High Availability is a broad subject and remains an important focus both in open source communities and on the market for independent/proprietary solutions. ENEA NFV Core 1.0 initially aims to leverage the efforts of the upstream OPNFV and OpenStack open source projects, combining solutions from both worlds to provide flexibility and sufficiently wide use case coverage. ENEA has long-standing expertise and proprietary solutions addressing High Availability for telco applications, which may eventually be integrated with NFV based solutions; however, the initial scope of ENEA NFV Core is to leverage the OPNFV Reference Platform and open source projects in general as much as possible, as will be seen further ahead in this chapter.</para>
20 | |||
21 | <section id="levels"> | ||
22 | <title>High Availability Levels</title> | ||
23 | |||
24 | <para>The base for the feature set in ENEA NFV Core is divided into three | ||
25 | levels:</para> | ||
26 | |||
27 | <itemizedlist> | ||
28 | <listitem> | ||
29 | <para>Hardware Fault</para> | ||
30 | </listitem> | ||
31 | |||
32 | <listitem> | ||
33 | <para>NFV Platform HA</para> | ||
34 | </listitem> | ||
35 | |||
36 | <listitem> | ||
37 | <para>VNF High Availability</para> | ||
38 | </listitem> | ||
39 | </itemizedlist> | ||
40 | |||
<para>The same division into levels of fault management can be seen in the scope of the High Availability for OPNFV (Availability) project. OPNFV also hosts the Doctor project, a fault management and maintenance project that develops the corresponding implementation for the OPNFV reference platform.</para>
46 | |||
47 | <para>These two projects complement each other.</para> | ||
48 | |||
<para>The Availability project addresses HA requirements and solutions from the perspective of the three levels mentioned above. It produces high level requirements and API definitions for High Availability of OPNFV and an HA Gap Analysis Report for OpenStack; more recently it works on optimizing existing OPNFV test frameworks, such as Yardstick, and develops test cases which realize HA specific use cases and scenarios derived from the HA requirements.</para>
56 | |||
<para>The Doctor project, on the other hand, aims to build a fault management and maintenance framework for high availability of Network Services on top of virtualized infrastructure; the key feature is immediate notification from the VIM when virtualized resources become unavailable, so that recovery of the VNFs running on them can be processed. The Doctor project has also collaborated with the Availability project on identifying gaps in upstream projects, mainly but not exclusively OpenStack, and has worked towards implementing missing features or improving functionality, one good example being the Aodh event based alarms, which allow for fast notifications when certain predefined events occur. The Doctor project also produced an architecture design and a reference implementation based on open source components, which will be presented later on in this document.</para>
69 | </section> | ||
70 | |||
71 | <section id="doctor_arch"> | ||
72 | <title>Doctor Architecture</title> | ||
73 | |||
<para>The Doctor documentation shows the detailed architecture for Fault Management and NFVI Maintenance. The two are very similar, so we will focus on Fault Management.</para>
77 | |||
78 | <para>The architecture specifies a set of functional blocks:</para> | ||
79 | |||
80 | <itemizedlist> | ||
81 | <listitem> | ||
<para>Monitor - monitors the virtualized infrastructure, capturing fault events in software and hardware. For this component we chose Zabbix, which is integrated into the platform by means of the Fuel Zabbix Plugin, available upstream.</para>
86 | </listitem> | ||
87 | |||
88 | <listitem> | ||
<para>Inspector - receives notifications from Monitor components and from OpenStack core components, which allows it to create logical relationships between entities, identify affected resources when faults occur, and communicate with Controllers to update the states of the virtual and physical resources. For this component ENEA NFV Core 1.0 makes use of Vitrage, an OpenStack related project used for Root Cause Analysis, which has been adapted to serve as a Doctor Inspector. The integration into the platform is realized with the help of a Fuel plugin developed internally by ENEA.</para>
99 | </listitem> | ||
100 | |||
101 | <listitem> | ||
<para>Controller - OpenStack core components act as Controllers. They are responsible for maintaining the resource map between physical and virtual resources, they accept update requests from the Inspector, and they send failure event notifications to the Notifier. Components such as Nova, Neutron, Glance and Heat act as Controllers in the Doctor Architecture.</para>
108 | </listitem> | ||
109 | |||
110 | <listitem> | ||
111 | <para>Notifier - the focus of this component is on selecting and | ||
112 | aggregating failure events received from the controller based on | ||
113 | policies mandated by the Consumer. The role of the Notifier is | ||
114 | accomplished by the Aodh component in OpenStack.</para> | ||
115 | </listitem> | ||
116 | </itemizedlist> | ||
117 | |||
118 | <para>Besides the Doctor components there are a couple other blocks | ||
119 | mentioned in the architecture:</para> | ||
120 | |||
121 | <itemizedlist> | ||
122 | <listitem> | ||
<para>Administrator - represents the human role of administering the platform by means of dedicated interfaces, either visual dashboards, such as OpenStack Horizon or the Fuel Dashboard, or CLI tools, such as the OpenStack unified CLI, traditionally accessed from one of the servers acting as OpenStack Controller nodes (a brief example follows this list). In ENEA NFV Core 1.0 the Administrator can also access the Zabbix dashboard for further configuration. The same applies to the Vitrage tool, which comes with its own Horizon dashboard that enables the user to visually inspect the faults reported by the monitoring tools, as well as visual representations of the virtual and physical resources, the relationships between them and the fault correlation. For Vitrage, users will usually want to configure additional use cases and describe relationships between components via template files written in YAML format. More information about using Vitrage is presented in a later section.</para>
139 | </listitem> | ||
140 | |||
141 | <listitem> | ||
<para>Consumer - this block is only loosely described in the Doctor Architecture and is out of its scope. Doctor only deals with fault detection and management: it makes sure faults are handled as soon as possible after detection, identifies the affected virtual resources and updates their states. Since the actual VNFs are managed, according to the ETSI architecture, by a different entity, Doctor does not deal with recovery actions for the VNFs. The role of the Consumer thus falls to a VNF Manager and Orchestrator. ENEA NFV Core 1.0 provides VNF management capabilities using Tacker, an OpenStack project that implements a generic VNF Manager and Orchestrator according to the ETSI MANO Architectural Framework.</para>
154 | </listitem> | ||
155 | </itemizedlist> | ||
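
<para>As a brief example of the CLI access mentioned above, an administrator can source the OpenStack credentials file on one of the controller nodes and query services and alarms. The host name below is an example, and the exact Aodh and Vitrage client commands may differ slightly between releases:</para>

<programlisting>root@node-1:~# source /root/openrc
root@node-1:~# openstack service list
root@node-1:~# openstack alarm list     # event and threshold alarms managed by Aodh
root@node-1:~# vitrage alarm list       # alarms as correlated by Vitrage</programlisting>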
156 | |||
157 | <para>The functional blocks overview in the picture below has been | ||
158 | complemented to show the components used for realizing the Doctor | ||
159 | Architecture:</para> | ||
160 | |||
161 | <mediaobject> | ||
162 | <imageobject role="fo"> | ||
163 | <imagedata contentwidth="600" fileref="images/functional_blocks.svg" | ||
164 | format="SVG" /> | ||
165 | </imageobject> | ||
166 | </mediaobject> | ||
167 | |||
168 | <section id="dr_fault_mg"> | ||
169 | <title>Doctor Fault Management</title> | ||
170 | |||
171 | <para>The architecture described in the Doctor project has been | ||
172 | demonstrated in various PoCs and demos, but always using sample | ||
173 | components for either the consumer or the monitor. ENEA has worked with | ||
174 | upstream projects, Doctor and Vitrage, to realize the goals of the | ||
175 | Doctor project by using real components, as described before.</para> | ||
176 | |||
177 | <para>The two pictures below show a typical fault management scenario, | ||
178 | as described in the Doctor documentation.</para> | ||
179 | |||
180 | <mediaobject> | ||
181 | <imageobject> | ||
182 | <imagedata contentwidth="600" fileref="images/dr_fault_mg.svg" /> | ||
183 | </imageobject> | ||
184 | </mediaobject> | ||
185 | |||
186 | <mediaobject> | ||
187 | <imageobject> | ||
188 | <imagedata contentwidth="600" fileref="images/dr_fault_mg_2.svg" /> | ||
189 | </imageobject> | ||
190 | </mediaobject> | ||
191 | |||
<para>ENEA NFV Core 1.0 uses the same approach described above, but it is worth going through the steps in more detail.</para>
194 | |||
195 | <orderedlist> | ||
196 | <listitem> | ||
<para>When creating a VNF, the user has to enable the monitoring capabilities of Tacker by passing a template which specifies that an alarm will be created when the VM represented by this VNF changes state. The support for alarm monitoring in Tacker is captured in the Alarm Monitoring Framework spec in the OpenStack documentation. In short, Tacker should be able to create a VNF and then create an Aodh alarm of type event which triggers when the instance is in state ERROR. The action to take when this event triggers is to perform an HTTP call to a URL managed by Tacker. As a result of this action, Tacker can detect when an instance has failed (for whatever reason) and will respawn it somewhere else (a sketch of such a monitoring policy is shown after this list).</para>
209 | </listitem> | ||
210 | |||
211 | <listitem> | ||
<para>The subscribe response in this case is an empty operation; the Notifier (Aodh) only has to confirm that the alarm has been created.</para>
215 | </listitem> | ||
216 | |||
217 | <listitem> | ||
<para>The NFVI sends monitoring events for resources the VIM has been subscribed to. Note: this subscription message exchange between the VIM and NFVI is not shown in this message flow. This step is related to Vitrage's capability of receiving notifications from OpenStack services; at this moment Vitrage supports notifications from the nova.host, nova.instance, nova.zone, cinder.volume, neutron.network, neutron.port and heat.stack OpenStack datasources.</para>
226 | </listitem> | ||
227 | |||
228 | <listitem> | ||
<para>This step describes faults detected by Zabbix being sent to the Inspector (Vitrage) as soon as they are detected, using a push approach, by sending an AMQP message to a dedicated message queue managed by Vitrage. For example, if nova-compute fails on one of the compute nodes, Zabbix will format a message specifying all the details needed for processing the fault, e.g. a timestamp, which host failed, what event occurred and so on.</para>
236 | </listitem> | ||
237 | |||
238 | <listitem> | ||
<para>Database lookup to find the virtual resources affected by the detected fault. In this step Vitrage performs various calculations to determine which virtual resources are affected by the raw failure reported by Zabbix. Vitrage can be configured via templates to correlate instances with the physical hosts they are running on, so that if a compute node fails, the instances running on that host are considered affected. A typical use case is to mark the compute node down (a.k.a. mark_host_down) and update the states of all instances running on it, by issuing Nova API calls for each of these instances. Step 5c) shows the Controller (Nova in this case) acting upon the state change of the instance and issuing an event alarm to Aodh.</para>
251 | </listitem> | ||
252 | |||
253 | <listitem> | ||
254 | <para>The Notifier will acknowledge the alarm event request from | ||
255 | Nova and will trigger the alarm(s) created by Tacker in step 1). | ||
256 | Since Tacker has configured the alarm to send an HTTP request, Aodh | ||
257 | will perform that HTTP call at the URL managed by Tacker.</para> | ||
258 | </listitem> | ||
259 | |||
260 | <listitem> | ||
261 | <para>The Consumer (Tacker) will react to the HTTP call and perform | ||
262 | the action configured by the user (e.g. respawn the VNF).</para> | ||
263 | </listitem> | ||
264 | |||
265 | <listitem> | ||
266 | <para>The action is sent to the Controller (Nova) so that the VNF is | ||
267 | recreated.</para> | ||
268 | </listitem> | ||
269 | </orderedlist> | ||
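
<para>The listing below is a minimal sketch of the kind of Tacker monitoring policy referred to in step 1). It is loosely modelled on the Alarm Monitoring Framework examples; the policy and trigger names are purely illustrative and the exact keys and supported event types depend on the Tacker version in use. In the Doctor scenario above, the resulting Aodh alarm is an event alarm on the instance state rather than a metric threshold:</para>

<programlisting>policies:
  - vdu1_monitoring_policy:                  # illustrative policy name
      type: tosca.policies.tacker.Alarming
      triggers:
        vdu1_down_respawn:                   # illustrative trigger name
          event_type:
            type: tosca.events.resource.utilization
            implementation: ceilometer
          metric: cpu_util
          condition:
            threshold: 50
            comparison_operator: gt
          action: [respawn]</programlisting>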
270 | |||
271 | <note> | ||
272 | <para>The ENEA NFV Core 1.0 Pre-Release fully covers the required | ||
273 | Doctor functionality only for the Vitrage and Zabbix | ||
274 | components.</para> | ||
275 | </note> | ||
276 | </section> | ||
277 | |||
278 | <section id="zabbix"> | ||
279 | <title>Zabbix Configuration for Push Notifications</title> | ||
280 | |||
<para>Vitrage supports a Zabbix datasource by means of regularly polling the Zabbix agents, which need to be configured in advance. The Vitrage plugin developed internally by ENEA can automatically configure Zabbix so that everything works as expected.</para>
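
<para>For reference, enabling the Zabbix datasource amounts to a configuration similar to the sketch below in /etc/vitrage/vitrage.conf on the nodes with the Vitrage role; the credentials and URL are placeholders and the exact option names may vary between Vitrage releases:</para>

<programlisting>[datasources]
types = zabbix,nova.host,nova.instance,nova.zone,cinder.volume,neutron.network,neutron.port

# placeholder Zabbix frontend URL and credentials
[zabbix]
url = http://127.0.0.1/zabbix
user = admin
password = zabbix</programlisting>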
285 | |||
<para>However, polling is not fast enough for a telco use case, so it is necessary to configure push notifications for Zabbix. This requires manual configuration on one of the controller nodes; since Zabbix uses a centralized database, the configuration becomes available on all the other nodes.</para>
291 | |||
292 | <para>The Zabbix configuration dashboard is available at the same IP | ||
293 | address where OpenStack can be reached, e.g. | ||
294 | http://<vip__zbx_vip_mgmt>/zabbix.</para> | ||
295 | |||
<para>To forward Zabbix events to Vitrage, a new media script needs to be created and associated with a user. Follow the steps below as a Zabbix Admin user:</para>
299 | |||
300 | <orderedlist> | ||
301 | <listitem> | ||
<para>Create a new media type [Administration > Media Types > Create Media Type]</para>
304 | |||
305 | <itemizedlist> | ||
306 | <listitem> | ||
307 | <para>Name: Vitrage Notifications</para> | ||
308 | </listitem> | ||
309 | |||
310 | <listitem> | ||
311 | <para>Type: Script</para> | ||
312 | </listitem> | ||
313 | |||
314 | <listitem> | ||
315 | <para>Script name: zabbix_vitrage.py</para> | ||
316 | </listitem> | ||
317 | </itemizedlist> | ||
318 | </listitem> | ||
319 | |||
320 | <listitem> | ||
<para>Modify the Media for the Admin user [Administration > Users]</para>
323 | |||
324 | <itemizedlist> | ||
325 | <listitem> | ||
326 | <para>Type: Vitrage Notifications</para> | ||
327 | </listitem> | ||
328 | |||
329 | <listitem> | ||
<para>Send to: rabbit://rabbit_user:rabbit_pass@127.0.0.1:5672/ - the Vitrage message bus URL (look up the transport_url value in /etc/vitrage/vitrage.conf or /etc/nova/nova.conf)</para>
334 | </listitem> | ||
335 | |||
336 | <listitem> | ||
337 | <para>When active: 1-7,00:00-24:00</para> | ||
338 | </listitem> | ||
339 | |||
340 | <listitem> | ||
341 | <para>Use if severity: (all)</para> | ||
342 | </listitem> | ||
343 | |||
344 | <listitem> | ||
345 | <para>Status: Enabled</para> | ||
346 | </listitem> | ||
347 | </itemizedlist> | ||
348 | </listitem> | ||
349 | |||
350 | <listitem> | ||
<para>Configure an Action [Configuration > Actions > Create Action > Action]</para>
353 | |||
354 | <itemizedlist> | ||
355 | <listitem> | ||
356 | <para>Name: Forward to Vitrage</para> | ||
357 | </listitem> | ||
358 | |||
359 | <listitem> | ||
360 | <para>Default Subject: {TRIGGER.STATUS}</para> | ||
361 | </listitem> | ||
362 | |||
363 | <listitem> | ||
364 | <para>Default Message: host={HOST.NAME1} hostid={HOST.ID1} | ||
365 | hostip={HOST.IP1} triggerid={TRIGGER.ID} | ||
366 | description={TRIGGER.NAME} rawtext={TRIGGER.NAME.ORIG} | ||
367 | expression={TRIGGER.EXPRESSION} value={TRIGGER.VALUE} | ||
368 | priority={TRIGGER.NSEVERITY} lastchange={EVENT.DATE} | ||
369 | {EVENT.TIME}</para> | ||
370 | </listitem> | ||
371 | </itemizedlist> | ||
372 | </listitem> | ||
373 | |||
374 | <listitem> | ||
<para>To send events, add the condition "Maintenance status not in maintenance" under the Conditions tab.</para>
377 | </listitem> | ||
378 | |||
379 | <listitem> | ||
380 | <para>Finally, add an operation:</para> | ||
381 | |||
382 | <itemizedlist> | ||
383 | <listitem> | ||
384 | <para>Send to Users: Admin</para> | ||
385 | </listitem> | ||
386 | |||
387 | <listitem> | ||
388 | <para>Send only to: Vitrage Notifications</para> | ||
389 | </listitem> | ||
390 | </itemizedlist> | ||
391 | </listitem> | ||
392 | </orderedlist> | ||
393 | |||
<para>Using these instructions, Zabbix will call the zabbix_vitrage.py script, which is made readily available by the Fuel Vitrage Plugin, passing the arguments described in step 3). The zabbix_vitrage.py script will then interpret the parameters, format an AMQP message and send it to the vitrage.notifications queue, which is managed by the vitrage-graph service.</para>
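
<para>As an illustration, for a nova-compute failure on a compute node the Default Message template from step 3) expands into a payload similar to the following (all values are examples only):</para>

<programlisting>host=node-4.domain.tld hostid=10105 hostip=192.168.0.6 triggerid=13500 /
description=Nova Compute process is not running on node-4.domain.tld /
rawtext=Nova Compute process is not running on {HOST.NAME} /
expression={node-4.domain.tld:proc.num[nova-compute].last()}=0 /
value=1 priority=4 lastchange=2017.08.24 12:00:21</programlisting>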
400 | </section> | ||
401 | |||
402 | <section id="vitrage_config"> | ||
403 | <title>Vitrage Configuration</title> | ||
404 | |||
<para>The Vitrage team has been collaborating with the OPNFV Doctor project in order to support Vitrage as an Inspector component. The Doctor use case for Vitrage is described in an OpenStack blueprint. Additionally, ENEA NFV Core has complemented Vitrage with the capability of setting the states of failed instances, by implementing an action type in Vitrage which calls Nova APIs to set instances in error state. There is also an action type which allows fencing failed hosts.</para>
412 | |||
<para>In order to make use of these features, Vitrage supports additional configuration via YAML templates that must be placed in /etc/vitrage/templates on the nodes that have the Vitrage role.</para>
416 | |||
417 | <para>The example below shows how to program Vitrage to mark failed | ||
418 | compute hosts as down and then to change the state of the instances to | ||
419 | Error, by creating Vitrage deduced alarms.</para> | ||
420 | |||
421 | <programlisting>metadata: | ||
422 | name: test_nova_mark_instance_err | ||
423 | description: test description | ||
424 | definitions: | ||
425 | entities: | ||
426 | - entity: | ||
427 | category: ALARM | ||
428 | type: zabbix | ||
429 | rawtext: Nova Compute process is not running on {HOST.NAME} | ||
430 | template_id: zabbix_alarm | ||
431 | - entity: | ||
432 | category: RESOURCE | ||
433 | type: nova.host | ||
434 | template_id: host | ||
435 | - entity: | ||
436 | category: RESOURCE | ||
437 | type: nova.instance | ||
438 | template_id: instance | ||
439 | relationships: | ||
440 | - relationship: | ||
441 | source: zabbix_alarm | ||
442 | relationship_type: on | ||
443 | target: host | ||
444 | template_id: nova_process_not_running | ||
445 | - relationship: | ||
446 | source: host | ||
447 | target: instance | ||
448 | relationship_type: contains | ||
449 | template_id : host_contains_instance | ||
450 | scenarios: | ||
451 | - scenario: | ||
452 | condition: nova_process_not_running and host_contains_instance | ||
453 | actions: | ||
454 | - action: | ||
455 | action_type: mark_down | ||
456 | action_target: | ||
457 | target: host | ||
458 | - action: | ||
459 | action_type: set_instance_state | ||
460 | action_target: | ||
461 | target: instance | ||
462 | - action: | ||
463 | action_type: set_state | ||
464 | action_target: | ||
465 | target: instance | ||
466 | properties: | ||
467 | state: ERROR</programlisting> | ||
468 | |||
<para>For the fencing action type, a similar scenario must be added; its critical_problem_on_host condition refers to definitions in the same template (a sketch of these follows the listing below):</para>
471 | |||
472 | <programlisting>- scenario: | ||
473 | condition: critical_problem_on_host | ||
474 | actions: | ||
475 | - action: | ||
476 | action_type: fence | ||
477 | action_target: | ||
478 | target: host</programlisting> | ||
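
<para>The critical_problem_on_host condition used above must be backed by definitions of the same form as in the previous template, for instance (the Zabbix trigger text is illustrative):</para>

<programlisting>definitions:
  entities:
    - entity:
        category: ALARM
        type: zabbix
        rawtext: Critical failure detected on {HOST.NAME}   # illustrative trigger text
        template_id: zabbix_critical_alarm
    - entity:
        category: RESOURCE
        type: nova.host
        template_id: host
  relationships:
    - relationship:
        source: zabbix_critical_alarm
        relationship_type: on
        target: host
        template_id: critical_problem_on_host</programlisting>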
479 | |||
<para>After a template is added or modified, the vitrage-api and vitrage-graph services must be restarted:</para>
482 | |||
483 | <programlisting>root@node-6:~# systemctl restart vitrage-api | ||
484 | root@node-6:~# systemctl restart vitrage-graph</programlisting> | ||
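
<para>Optionally, if the installed python-vitrageclient supports it, a template can be checked for errors before restarting the services; the file name below is an example:</para>

<programlisting>root@node-6:~# vitrage template validate --path /etc/vitrage/templates/test_nova_mark_instance_err.yaml
root@node-6:~# vitrage template list</programlisting>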
485 | </section> | ||
486 | |||
487 | <section id="vitrage_custom"> | ||
488 | <title>Vitrage Customizations</title> | ||
489 | |||
<para>ENEA NFV Core 1.0 has added custom features to Vitrage which allow two kinds of actions:</para>
492 | |||
493 | <orderedlist> | ||
494 | <listitem> | ||
495 | <para>Perform actions Northbound of the VIM</para> | ||
496 | |||
497 | <itemizedlist> | ||
498 | <listitem> | ||
499 | <para>Nova force host down on compute</para> | ||
500 | </listitem> | ||
501 | |||
502 | <listitem> | ||
<para>Setting the instance state to error in Nova; used in conjunction with an alarm created by Tacker, as described before, this allows Tacker to detect when an instance is affected and take the proper actions (the equivalent CLI calls are sketched after this list).</para>
507 | </listitem> | ||
508 | </itemizedlist> | ||
509 | </listitem> | ||
510 | |||
511 | <listitem> | ||
512 | <para>Perform actions Southbound of the VIM.</para> | ||
513 | |||
<para>Vitrage templates allow us to program fencing actions for hosts with failed services. In the event that systemd is unable to recover a critical process, or another type of software error occurs on the hardware supporting it, we can program fencing of that node, which performs a reboot in an attempt to recover the failed node.</para>
520 | </listitem> | ||
521 | </orderedlist> | ||
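
<para>For reference, the Northbound actions listed in item 1 correspond roughly to the following Nova operations, shown here as CLI calls an operator could also issue manually (a recent compute API microversion may be required for service-force-down); the host name is an example and INSTANCE_ID stands for the affected instance:</para>

<programlisting>root@node-1:~# source /root/openrc
root@node-1:~# nova service-force-down node-4.domain.tld nova-compute
root@node-1:~# nova reset-state INSTANCE_ID</programlisting>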
522 | </section> | ||
523 | </section> | ||
524 | |||
525 | <section id="pm_high_avail"> | ||
526 | <title>Pacemaker High Availability</title> | ||
527 | |||
<para>Many of the OpenStack solutions that offer High Availability characteristics employ Pacemaker for achieving highly available OpenStack services. Traditionally Pacemaker has been used for managing only the control plane services, so it can effectively provide redundancy and recovery for the Controller nodes only. One reason for this is that Controller nodes and Compute nodes have very different High Availability requirements that need to be considered. Typically, the services that run on Controller nodes are stateless, with a few exceptions where only one instance of a given service is allowed but redundancy is still desired, one good example being an AMQP service (e.g. RabbitMQ). Compute node HA requirements depend on the type of services that run on them, but typically it is desired that failures on these nodes are detected as soon as possible, so that the instances running on them can be migrated, resurrected or restarted. Another aspect is that failures on the physical hosts do not necessarily cause a failure of the services (VNFs) themselves, but having the hosts incapacitated can prevent accessing and controlling those services.</para>
545 | |||
<para>Controller High Availability is thus a subject which is in general well understood and well exercised, and the basis for achieving it is Pacemaker, with Corosync underneath.</para>
549 | |||
<para>Extending the use of Pacemaker to Compute nodes has been considered as a possible solution for providing VNF high availability, but this turns out to be a problem which is not easy to solve. On one hand, Pacemaker as a clustering tool can only scale properly up to a limited number of nodes, usually fewer than 128. This poses a problem for large scale deployments where hundreds of compute nodes are required. On the other hand, Compute node HA requires other considerations and calls for specially designed solutions.</para>
558 | |||
559 | <section id="pm_remote"> | ||
560 | <title>Pacemaker Remote</title> | ||
561 | |||
<para>As mentioned earlier, Pacemaker and Corosync do not scale well over a large cluster, because each node has to talk to every other node, essentially creating a mesh configuration. One solution to this problem could be partitioning the cluster into smaller groups, but this has its limitations and is generally difficult to manage.</para>
567 | |||
<para>A better solution is pacemaker-remote, a feature of Pacemaker which allows extending the cluster beyond the usual limits by using the Pacemaker monitoring capabilities, essentially creating a new type of resource which enables adding lightweight nodes to the cluster. More information about pacemaker-remote can be found on the official ClusterLabs website.</para>
574 | |||
<para>Please note that at this moment pacemaker-remote must be configured manually after deployment. The manual steps for doing so are:</para>
578 | |||
579 | <orderedlist> | ||
580 | <listitem> | ||
<para>Log on to the Fuel Master using the default credentials, if not changed (root/r00tme)</para>
583 | </listitem> | ||
584 | |||
585 | <listitem> | ||
586 | <para>Type fuel node to obtain the list of nodes, their roles and | ||
587 | the IP addresses</para> | ||
588 | |||
589 | <programlisting>[root@fuel ~]# fuel node | ||
590 | id | status | name | cluster | ip | mac | roles / | ||
591 | | pending_roles | online | group_id | ||
592 | ---+--------+------------------+---------+-----------+-------------------+----------/ | ||
593 | -----------------+---------------+--------+--------- | ||
594 | 1 | ready | Untitled (8c:d4) | 1 | 10.20.0.4 | 68:05:ca:46:8c:d4 | ceph-osd,/ | ||
595 | controller | | 1 | 1 | ||
596 | 4 | ready | Untitled (8c:c2) | 1 | 10.20.0.6 | 68:05:ca:46:8c:c2 | ceph-osd,/ | ||
597 | compute | | 1 | 1 | ||
598 | 5 | ready | Untitled (8c:c9) | 1 | 10.20.0.7 | 68:05:ca:46:8c:c9 | ceph-osd,/ | ||
599 | compute | | 1 | 1 | ||
600 | 2 | ready | Untitled (8b:64) | 1 | 10.20.0.3 | 68:05:ca:46:8b:64 | / | ||
601 | controller, mongo, tacker | | 1 | 1 | ||
602 | 3 | ready | Untitled (8c:45) | 1 | 10.20.0.5 | 68:05:ca:46:8c:45 | / | ||
603 | controller, vitrage | | 1 | 1</programlisting> | ||
604 | </listitem> | ||
605 | |||
606 | <listitem> | ||
<para>Each controller has a unique Pacemaker authkey; we need to keep one and propagate it to the other servers. Assuming node-1, node-2 and node-3 are the controllers, execute the following from the Fuel console:</para>
611 | |||
<programlisting>[root@fuel ~]# scp node-1:/etc/pacemaker/authkey .
[root@fuel ~]# scp authkey node-2:/etc/pacemaker/
[root@fuel ~]# scp authkey node-3:/etc/pacemaker/
[root@fuel ~]# scp authkey node-4:~
[root@fuel ~]# scp authkey node-5:~</programlisting>
618 | </listitem> | ||
619 | |||
620 | <listitem> | ||
621 | <para>For each compute node, log on to it using the corresponding | ||
622 | IP.</para> | ||
623 | </listitem> | ||
624 | |||
625 | <listitem> | ||
626 | <para>Install the required packages:</para> | ||
627 | |||
628 | <programlisting>root@node-4:~# apt-get install pacemaker-remote resource-agents crmsh</programlisting> | ||
629 | </listitem> | ||
630 | |||
631 | <listitem> | ||
632 | <para>Copy the authkey from the Fuel master and make sure the right | ||
633 | permissions are set:</para> | ||
634 | |||
635 | <programlisting>[root@node-4:~]# cp authkey /etc/pacemaker | ||
636 | [root@node-4:~]# chown root:haclient /etc/pacemaker/authkey</programlisting> | ||
637 | </listitem> | ||
638 | |||
639 | <listitem> | ||
<para>Add an iptables rule for the default port (3121) and save it to /etc/iptables/rules.v4 to make it persistent:</para>
642 | |||
<programlisting>root@node-4:~# iptables -A INPUT -s 192.168.0.0/24 -p tcp -m multiport /
--dports 3121 -m comment --comment "pacemaker_remoted from 192.168.0.0/24" -j ACCEPT
root@node-4:~# iptables-save > /etc/iptables/rules.v4</programlisting>
645 | </listitem> | ||
646 | |||
647 | <listitem> | ||
648 | <para>Start the pacemaker-remote service</para> | ||
649 | |||
650 | <programlisting>[root@node-4:~]# systemctl start pacemaker-remote.service</programlisting> | ||
651 | </listitem> | ||
652 | |||
653 | <listitem> | ||
<para>Log on to one of the controller nodes and configure the pacemaker-remote resources:</para>
656 | |||
657 | <programlisting>[root@node-1:~]# pcs resource create node-4.domain.tld remote | ||
658 | [root@node-1:~]# pcs constraint location node-4.domain.tld prefers / | ||
659 | node-1.domain.tld=100 node-2.domain.tld=100 node-3.domain.tld=100 | ||
660 | [root@node-1:~]# pcs constraint location node-4.domain.tld avoids node-5.domain.tld | ||
661 | [root@node-1:~]# pcs resource create node-5.domain.tld remote | ||
662 | [root@node-1:~]# pcs constraint location node-5.domain.tld prefers / | ||
663 | node-1.domain.tld=100 node-2.domain.tld=100 node-3.domain.tld=100 | ||
664 | [root@node-1:~]# pcs constraint location node-5.domain.tld avoids node-4.domain.tld</programlisting> | ||
665 | </listitem> | ||
666 | |||
667 | <listitem> | ||
668 | <para>Remote nodes should now appear online:</para> | ||
669 | |||
670 | <programlisting>[root@node-1:~]# pcs status | ||
671 | Cluster name: OpenStack | ||
672 | Last updated: Thu Aug 24 12:00:21 2017 Last change: Thu Aug 24 11:57:32 2017 / | ||
673 | by root via cibadmin on node-1.domain.tld | ||
674 | Stack: corosync | ||
675 | Current DC: node-1.domain.tld (version 1.1.14-70404b0) - partition with quorum | ||
676 | 5 nodes and 78 resources configured | ||
677 | |||
678 | Online: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ] | ||
679 | RemoteOnline: [ node-4.domain.tld node-5.domain.tld ]</programlisting> | ||
680 | </listitem> | ||
681 | </orderedlist> | ||
682 | </section> | ||
683 | |||
684 | <section id="pm_fencing"> | ||
685 | <title>Pacemaker Fencing</title> | ||
686 | |||
687 | <para>ENEA NFV Core 1.0 makes use of the fencing capabilities of | ||
688 | Pacemaker to isolate faulty nodes and trigger recovery actions by means | ||
689 | of power cycling the failed nodes. Fencing is configured by creating | ||
690 | STONITH type resources for each of the servers in the cluster, both | ||
691 | Controller nodes and Compute nodes. The STONITH adapter for fencing the | ||
692 | nodes is fence_ipmilan, which makes use of the IPMI capabilities of the | ||
693 | Cavium ThunderX servers.</para> | ||
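
<para>Before configuring the fencing resources it is worth verifying that each BMC is reachable with the credentials that fence_ipmilan will use; for example, with the addresses used further below:</para>

<programlisting>[root@node-1:~]# ipmitool -I lanplus -H 10.0.100.154 -U ADMIN -P ADMIN chassis power status</programlisting>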
694 | |||
695 | <para>Here are the steps for enabling fencing capabilities in the | ||
696 | cluster:</para> | ||
697 | |||
698 | <orderedlist> | ||
699 | <listitem> | ||
<para>Log on to the Fuel Master using the default credentials, if not changed (root/r00tme).</para>
702 | </listitem> | ||
703 | |||
704 | <listitem> | ||
705 | <para>Type fuel node to obtain the list of nodes, their roles and | ||
706 | the IP addresses:</para> | ||
707 | |||
708 | <programlisting>[root@fuel ~]# fuel node | ||
709 | id | status | name | cluster | ip | mac | roles / | ||
710 | | pending_roles | online | group_id | ||
711 | ---+--------+------------------+---------+-----------+-------------------+----------/ | ||
712 | -----------------+---------------+--------+--------- | ||
713 | 1 | ready | Untitled (8c:d4) | 1 | 10.20.0.4 | 68:05:ca:46:8c:d4 | ceph-osd,/ | ||
714 | controller | | 1 | 1 | ||
715 | 4 | ready | Untitled (8c:c2) | 1 | 10.20.0.6 | 68:05:ca:46:8c:c2 | ceph-osd,/ | ||
716 | compute | | 1 | 1 | ||
717 | 5 | ready | Untitled (8c:c9) | 1 | 10.20.0.7 | 68:05:ca:46:8c:c9 | ceph-osd,/ | ||
718 | compute | | 1 | 1 | ||
719 | 2 | ready | Untitled (8b:64) | 1 | 10.20.0.3 | 68:05:ca:46:8b:64 | / | ||
720 | controller, mongo, tacker | | 1 | 1 | ||
721 | 3 | ready | Untitled (8c:45) | 1 | 10.20.0.5 | 68:05:ca:46:8c:45 | / | ||
722 | controller, vitrage | | 1 | 1 | ||
723 | </programlisting> | ||
724 | </listitem> | ||
725 | |||
726 | <listitem> | ||
<para>Log on to each server and install the additional packages:</para>
728 | |||
729 | <programlisting>[root@node-1:~]# apt-get install fence-agents ipmitool</programlisting> | ||
730 | </listitem> | ||
731 | |||
732 | <listitem> | ||
<para>Configure the Pacemaker fencing resources; this needs to be done only once, on one of the controllers. The parameters will vary depending on the BMC address and credentials of each node.</para>
736 | |||
737 | <programlisting>[root@node-1:~]# crm configure primitive ipmi-fencing-node-1 / | ||
738 | stonith::fence_ipmilan params pcmk_host_list="node-1.domain.tld" / | ||
739 | ipaddr=10.0.100.151 login=ADMIN passwd=ADMIN op monitor interval="60s" | ||
740 | [root@node-1:~]# crm configure primitive ipmi-fencing-node-2 / | ||
741 | stonith::fence_ipmilan params pcmk_host_list="node-2.domain.tld" / | ||
742 | ipaddr=10.0.100.152 login=ADMIN passwd=ADMIN op monitor interval="60s" | ||
743 | [root@node-1:~]# crm configure primitive ipmi-fencing-node-3 / | ||
744 | stonith::fence_ipmilan params pcmk_host_list="node-3.domain.tld" / | ||
745 | ipaddr=10.0.100.153 login=ADMIN passwd=ADMIN op monitor interval="60s" | ||
746 | [root@node-1:~]# crm configure primitive ipmi-fencing-node-4 / | ||
747 | stonith::fence_ipmilan params pcmk_host_list="node-4.domain.tld" / | ||
748 | ipaddr=10.0.100.154 login=ADMIN passwd=ADMIN op monitor interval="60s" | ||
749 | [root@node-1:~]# crm configure primitive ipmi-fencing-node-5 / | ||
750 | stonith::fence_ipmilan params pcmk_host_list="node-5.domain.tld" / | ||
751 | ipaddr=10.0.100.155 login=ADMIN passwd=ADMIN op monitor interval="60s"</programlisting> | ||
752 | </listitem> | ||
753 | |||
754 | <listitem> | ||
<para>Activate fencing by enabling the stonith property in Pacemaker (it is disabled by default); this also needs to be done only once, on one of the controllers. A short verification sketch follows this procedure.</para>
758 | |||
759 | <programlisting>[root@node-1:~]# pcs property set stonith-enabled=true</programlisting> | ||
760 | </listitem> | ||
761 | </orderedlist> | ||
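
<para>Once fencing is enabled, the STONITH resources should be reported as started in the cluster status, and a controlled test can be performed on a node that carries no critical workload. The commands below are only a suggestion; the second one will power cycle the target node:</para>

<programlisting>[root@node-1:~]# pcs stonith show
[root@node-1:~]# stonith_admin --reboot node-4.domain.tld</programlisting>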
762 | </section> | ||
763 | </section> | ||
764 | |||
765 | <section id="ops_resources_agents"> | ||
766 | <title>OpenStack Resource Agents</title> | ||
767 | |||
<para>The OpenStack community has been working for some time on identifying possible solutions for enabling High Availability for Compute nodes, although initially the subject of HA on compute nodes was controversial, being seen as something that should not concern the cloud platform. Over time it became obvious that even on a true cloud platform, where services are designed to run without being affected by the availability of the cloud platform, fault management and recovery are still very important and desirable. This is very much the case for NFV applications, where, in the good tradition of telecom applications, operators must have complete engineering control over the resources they own and manage.</para>
779 | |||
780 | <para>The work for compute node high availability is captured in an | ||
781 | OpenStack user story and documented upstream, showing proposed solutions, | ||
782 | summit talks and presentations.</para> | ||
783 | |||
<para>A number of these solutions make use of the OpenStack Resource Agents, essentially a set of specialized Pacemaker resources capable of identifying failures on compute nodes and performing automatic evacuation of the instances affected by these failures.</para>
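
<para>As a rough illustration of that approach (not part of the current release; all parameter values are examples only), such a setup typically adds an evacuation resource agent such as ocf:openstack:NovaEvacuate on the controllers, along the lines of:</para>

<programlisting>[root@node-1:~]# pcs resource create nova-evacuate ocf:openstack:NovaEvacuate /
auth_url=http://192.168.0.2:5000/ username=admin password=admin tenant_name=admin</programlisting>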
788 | |||
789 | <para>ENEA NFV Core 1.0 aims to validate and integrate this work and to | ||
790 | make this feature available in the platform to be used as an alternative | ||
791 | to the Doctor framework, where simple, autonomous recovery of the running | ||
792 | instances is desired.</para> | ||
793 | </section> | ||
794 | </chapter> \ No newline at end of file | ||