Diffstat (limited to 'book-enea-nfv-core-installation-guide/doc/high_availability.xml')
-rw-r--r-- | book-enea-nfv-core-installation-guide/doc/high_availability.xml | 28
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/book-enea-nfv-core-installation-guide/doc/high_availability.xml b/book-enea-nfv-core-installation-guide/doc/high_availability.xml
index 4fe02fe..6d1a9c7 100644
--- a/book-enea-nfv-core-installation-guide/doc/high_availability.xml
+++ b/book-enea-nfv-core-installation-guide/doc/high_availability.xml
@@ -305,7 +305,7 @@
 
 <para>The Zabbix configuration dashboard is available at the same IP
 address where OpenStack can be reached, e.g.
-<literal>http://<vip__zbx_vip_mgmt>/zabbix</literal>.</para>
+<literal>http://10.0.6.42/zabbix</literal>.</para>
 
 <para>To forward zabbix events to Vitrage, a new media script needs to
 be created and associated with a user. Follow the steps below as a
@@ -550,8 +550,8 @@ root@node-6:~# systemctl restart vitrage-graph</programlisting>
 <title>Pacemaker High Availability</title>
 
 <para>Many of the OpenStack solutions which offer High Availability
-characteristics employ pacemaker for achieving highly available OpenStack
-services. Traditionally pacemaker has been used for managing only the
+characteristics employ Pacemaker for achieving highly available OpenStack
+services. Traditionally Pacemaker has been used for managing only the
 control plane services, so it can effectively provide redundancy and
 recovery for the Controller nodes only. A reason for this is that
 Controller nodes and Compute nodes essentially have very different High
@@ -572,9 +572,9 @@ root@node-6:~# systemctl restart vitrage-graph</programlisting>
 understood and experimented with, and the basis for this is Pacemaker
 using Corosync underneath.</para>
 
-<para>Extending the use of pacemaker to Compute nodes was thought as a
+<para>Extending the use of Pacemaker to Compute nodes was thought as a
 possible solution for providing VNF high availability, but the problem
-turned out to be more complicated. On one hand, pacemaker as a clustering
+turned out to be more complicated. On one hand, Pacemaker as a clustering
 tool, can only scale properly up to a limited number of nodes, usually
 less than 128. This poses a problem for large scale deployments where
 hundreds of compute nodes are required. On the other hand, Compute node
@@ -584,20 +584,20 @@ root@node-6:~# systemctl restart vitrage-graph</programlisting>
 <section id="pm_remote">
 <title>Pacemaker Remote</title>
 
-<para>As mentioned earlier, pacemaker and corosync do not scale well
+<para>As mentioned earlier, Pacemaker and corosync do not scale well
 over a large cluster, since each node has to talk to every other,
 essentially creating a mesh configuration. A solution to this problem
 could be partitioning the cluster into smaller groups, but this has its
 limitations and it is generally difficult to manage.</para>
 
 <para>A better solution is using <literal>pacemaker-remote</literal>, a
-feature of pacemaker, which allows for extending the cluster beyond the
-usual limits by using the pacemaker monitoring capabilities. It
+feature of Pacemaker, which allows for extending the cluster beyond the
+usual limits by using the Pacemaker monitoring capabilities. It
 essentially creates a new type of resource which enables adding light
 weight nodes to the cluster. More information about pacemaker-remote can
 be found on the official clusterlabs website.</para>
 
-<para>Please note that at this moment pacemaker remote must be
+<para>Please note that at this moment Pacemaker remote must be
 configured manually after deployment. Here are the manual steps for
 doing so:</para>
 
@@ -629,7 +629,7 @@ controller, vitrage | | 1 | 1</programlisting>
 </listitem>
 
 <listitem>
-<para>Each controller has a unique pacemaker authkey. One needs to
+<para>Each controller has a unique Pacemaker authkey. One needs to
 be kept and propagated to the other servers. Assuming node-1, node-2
 and node-3 are the controllers, execute the following from the Fuel
 console:</para>
@@ -711,7 +711,7 @@ RemoteOnline: [ node-4.domain.tld node-5.domain.tld ]</programlisting>
 <title>Pacemaker Fencing</title>
 
 <para>ENEA NFV Core 1.0 makes use of the fencing capabilities of
-pacemaker to isolate faulty nodes and trigger recovery actions by means
+Pacemaker to isolate faulty nodes and trigger recovery actions by means
 of power cycling the failed nodes. Fencing is configured by creating
 <literal>STONITH</literal> type resources for each of the servers in the
 cluster, both Controller nodes and Compute nodes. The
@@ -756,7 +756,7 @@ controller, vitrage | | 1 | 1</programlisting>
 </listitem>
 
 <listitem>
-<para>Configure pacemaker fencing resources. This needs to be done
+<para>Configure Pacemaker fencing resources. This needs to be done
 once on one of the controllers. The parameters will vary, depending
 on the BMC addresses of each node and credentials.</para>
 
@@ -779,7 +779,7 @@ ipaddr=10.0.100.155 login=ADMIN passwd=ADMIN op monitor interval="60s"</programl
 
 <listitem>
 <para>Activate fencing by enabling the <literal>stonith</literal>
-property in pacemaker (disabled by default). This also needs to be
+property in Pacemaker (disabled by default). This also needs to be
 done only once, on one of the controllers.</para>
 
 <programlisting>[root@node-1:~]# pcs property set stonith-enabled=true</programlisting>
@@ -805,7 +805,7 @@ ipaddr=10.0.100.155 login=ADMIN passwd=ADMIN op monitor interval="60s"</programl
 <para>The work for Compute node High Availability is captured in an
 OpenStack user story and documented upstream, showing proposed solutions,
 summit talks and presentations. A number of these solutions make use of
-OpenStack Resource Agents, which are a set of specialized pacemaker
+OpenStack Resource Agents, which are a set of specialized Pacemaker
 resources capable of identifying failures in compute nodes and can perform
 automatic evacuation of the instances affected by these failures.</para>
 