Diffstat (limited to 'book-enea-nfv-core-installation-guide/doc/high_availability.xml')
-rw-r--r--  book-enea-nfv-core-installation-guide/doc/high_availability.xml | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/book-enea-nfv-core-installation-guide/doc/high_availability.xml b/book-enea-nfv-core-installation-guide/doc/high_availability.xml
index 4fe02fe..6d1a9c7 100644
--- a/book-enea-nfv-core-installation-guide/doc/high_availability.xml
+++ b/book-enea-nfv-core-installation-guide/doc/high_availability.xml
@@ -305,7 +305,7 @@
 
 <para>The Zabbix configuration dashboard is available at the same IP
 address where OpenStack can be reached, e.g.
-<literal>http://&lt;vip__zbx_vip_mgmt&gt;/zabbix</literal>.</para>
+<literal>http://10.0.6.42/zabbix</literal>.</para>
 
 <para>To forward zabbix events to Vitrage, a new media script needs to
 be created and associated with a user. Follow the steps below as a
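[Note: the media script referenced above is typically Vitrage's zabbix_vitrage.py helper. A rough sketch of installing it on the Zabbix server is shown below; the install path and node prompt are assumptions, and the guide's own steps continue in the file past this hunk.]

root@node-6:~# cp zabbix_vitrage.py /usr/lib/zabbix/alertscripts/
root@node-6:~# chmod +x /usr/lib/zabbix/alertscripts/zabbix_vitrage.py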
@@ -550,8 +550,8 @@ root@node-6:~# systemctl restart vitrage-graph</programlisting>
 <title>Pacemaker High Availability</title>
 
 <para>Many of the OpenStack solutions which offer High Availability
-characteristics employ pacemaker for achieving highly available OpenStack
-services. Traditionally pacemaker has been used for managing only the
+characteristics employ Pacemaker for achieving highly available OpenStack
+services. Traditionally Pacemaker has been used for managing only the
 control plane services, so it can effectively provide redundancy and
 recovery for the Controller nodes only. A reason for this is that
 Controller nodes and Compute nodes essentially have very different High
@@ -572,9 +572,9 @@ root@node-6:~# systemctl restart vitrage-graph</programlisting>
 understood and experimented with, and the basis for this is Pacemaker
 using Corosync underneath.</para>
 
-<para>Extending the use of pacemaker to Compute nodes was thought as a
+<para>Extending the use of Pacemaker to Compute nodes was thought as a
 possible solution for providing VNF high availability, but the problem
-turned out to be more complicated. On one hand, pacemaker as a clustering
+turned out to be more complicated. On one hand, Pacemaker as a clustering
 tool, can only scale properly up to a limited number of nodes, usually
 less than 128. This poses a problem for large scale deployments where
 hundreds of compute nodes are required. On the other hand, Compute node
@@ -584,20 +584,20 @@ root@node-6:~# systemctl restart vitrage-graph</programlisting>
 <section id="pm_remote">
 <title>Pacemaker Remote</title>
 
-<para>As mentioned earlier, pacemaker and corosync do not scale well
+<para>As mentioned earlier, Pacemaker and corosync do not scale well
 over a large cluster, since each node has to talk to every other,
 essentially creating a mesh configuration. A solution to this problem
 could be partitioning the cluster into smaller groups, but this has its
 limitations and it is generally difficult to manage.</para>
 
 <para>A better solution is using <literal>pacemaker-remote</literal>, a
-feature of pacemaker, which allows for extending the cluster beyond the
-usual limits by using the pacemaker monitoring capabilities. It
+feature of Pacemaker, which allows for extending the cluster beyond the
+usual limits by using the Pacemaker monitoring capabilities. It
 essentially creates a new type of resource which enables adding light
 weight nodes to the cluster. More information about pacemaker-remote can
 be found on the official clusterlabs website.</para>
 
-<para>Please note that at this moment pacemaker remote must be
+<para>Please note that at this moment Pacemaker remote must be
 configured manually after deployment. Here are the manual steps for
 doing so:</para>
 
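[Note: for orientation, a remote node is attached to the cluster through an ocf:pacemaker:remote resource. The line below is only a minimal sketch; the resource name is an assumption (the host name echoes the RemoteOnline output visible in a later hunk header), and the guide's actual steps follow further down in the file.]

[root@node-1:~]# pcs resource create node-4.domain.tld ocf:pacemaker:remote server=node-4.domain.tld op monitor interval=30s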
@@ -629,7 +629,7 @@ controller, vitrage | | 1 | 1</programlisting>
 </listitem>
 
 <listitem>
-<para>Each controller has a unique pacemaker authkey. One needs to
+<para>Each controller has a unique Pacemaker authkey. One needs to
 be kept and propagated to the other servers. Assuming node-1, node-2
 and node-3 are the controllers, execute the following from the Fuel
 console:</para>
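[Note: as a rough illustration of what that propagation amounts to, the key from one controller, conventionally /etc/pacemaker/authkey, is copied to every other cluster member. The prompt, path and loop below are assumptions based on stock Pacemaker Remote setups, not the exact commands in the guide.]

[root@fuel ~]# scp node-1:/etc/pacemaker/authkey .
[root@fuel ~]# for n in node-2 node-3 node-4 node-5; do scp authkey $n:/etc/pacemaker/authkey; done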
@@ -711,7 +711,7 @@ RemoteOnline: [ node-4.domain.tld node-5.domain.tld ]</programlisting>
 <title>Pacemaker Fencing</title>
 
 <para>ENEA NFV Core 1.0 makes use of the fencing capabilities of
-pacemaker to isolate faulty nodes and trigger recovery actions by means
+Pacemaker to isolate faulty nodes and trigger recovery actions by means
 of power cycling the failed nodes. Fencing is configured by creating
 <literal>STONITH</literal> type resources for each of the servers in the
 cluster, both Controller nodes and Compute nodes. The
@@ -756,7 +756,7 @@ controller, vitrage | | 1 | 1</programlisting>
 </listitem>
 
 <listitem>
-<para>Configure pacemaker fencing resources. This needs to be done
+<para>Configure Pacemaker fencing resources. This needs to be done
 once on one of the controllers. The parameters will vary, depending
 on the BMC addresses of each node and credentials.</para>
 
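[Note: a sketch of what one such fencing resource can look like. The fence agent and resource names below are assumptions; the ipaddr/login/passwd fragment echoes the command tail visible in the next hunk header.]

[root@node-1:~]# pcs stonith create fence-node-4 fence_ipmilan pcmk_host_list="node-4.domain.tld" \
ipaddr=10.0.100.155 login=ADMIN passwd=ADMIN op monitor interval="60s"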
@@ -779,7 +779,7 @@ ipaddr=10.0.100.155 login=ADMIN passwd=ADMIN op monitor interval="60s"</programl
 
 <listitem>
 <para>Activate fencing by enabling the <literal>stonith</literal>
-property in pacemaker (disabled by default). This also needs to be
+property in Pacemaker (disabled by default). This also needs to be
 done only once, on one of the controllers.</para>
 
 <programlisting>[root@node-1:~]# pcs property set stonith-enabled=true</programlisting>
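[Note: a quick way to confirm the change afterwards, not part of the diff; the command syntax assumes the pcs 0.9 series in use at the time.]

[root@node-1:~]# pcs property show stonith-enabled
[root@node-1:~]# pcs stonith show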
@@ -805,7 +805,7 @@ ipaddr=10.0.100.155 login=ADMIN passwd=ADMIN op monitor interval="60s"</programl
 <para>The work for Compute node High Availability is captured in an
 OpenStack user story and documented upstream, showing proposed solutions,
 summit talks and presentations. A number of these solutions make use of
-OpenStack Resource Agents, which are a set of specialized pacemaker
+OpenStack Resource Agents, which are a set of specialized Pacemaker
 resources capable of identifying failures in compute nodes and can perform
 automatic evacuation of the instances affected by these failures.</para>
 