High Availability Cluster Plugin User Guide
v2.5x    3000-hac-v2.5-000023-B
Copyright © 2012 Nexenta® Systems, ALL RIGHTS RESERVED

Notice: No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or stored in a database or retrieval system for any purpose, without the express written permission of Nexenta Systems (hereinafter referred to as "Nexenta").

Nexenta reserves the right to make changes to this documentation at any time without notice and assumes no responsibility for its use. Nexenta products and services can be ordered only under the terms and conditions of Nexenta Systems' applicable agreements. Not all of the features described in this documentation may be currently available. Refer to the latest product announcement or contact your local Nexenta Systems sales office for information on feature and product availability. This documentation includes the latest information available at the time of publication.

Nexenta is a registered trademark of Nexenta Systems in the United States and other countries. All other trademarks, service marks, and company names in this documentation are properties of their respective owners.

Product Versions Applicable to this Documentation:
  Product: NexentaStor    Versions supported: NexentaStor v3.x

Contents

  Preface
  1  Introduction
  2  Installation & Setup
  3  Configuring the HA Cluster
  4  Configuring the Cluster's Shared Volumes
  5  Heartbeat and Network Interfaces
  6  Configuring Storage Failover
  7  Advanced Setup
  8  System Operations
  9  Testing and Troubleshooting
Preface

This documentation presents information specific to Nexenta Systems, Inc. products. The information is for reference purposes and is subject to change.

Intended Audience

The information in this document is confidential and may not be disclosed to any third parties without the prior written consent of Nexenta Systems, Inc. This documentation is intended for Network Storage Administrators. It assumes that you have experience with NexentaStor and with data storage concepts, such as NAS, SAN, NFS, and ZFS.

Documentation History

The following table lists the released revisions of this documentation.

Table 1: Documentation Revision History
  Revision: 3000-hac-v2.5-000023-B    Date: December, 2012    Description: GA

Contacting Support

Visit the Nexenta Systems, Inc. customer portal at http://www.nexenta.com/corp/support/support-overview/account-management. Log in and browse the customer knowledge base.

Choose a method for contacting support:
• Using the NexentaStor user interface, NMV (Nexenta Management View):
  a. Click Support.
  b. Complete the request form.
  c. Click Send Request.
• Using the NexentaStor command line, NMC (Nexenta Management Console):
  a. At the command line, type support.
  b. Complete the support wizard.

Comments

Your comments and suggestions to improve this documentation are greatly appreciated. Send any feedback to [email protected] and include the documentation title, number, and revision. Refer to specific pages, sections, and paragraphs whenever possible.

1  Introduction

This section includes the following topics:
• About Introduction
• Product Features
• Server Monitoring and Failover
• Storage Failover
• Exclusive Access to Storage
• Service Failover
• Additional Resources

About Introduction

The following section explains high-availability and failover concepts.

Product Features

The Nexenta High Availability Cluster (HAC) plugin provides a storage volume-sharing service. It makes one or more shared volumes highly available by detecting system failures and transferring ownership of the shared volumes to the other server in the cluster pair.

An HA Cluster consists of two NexentaStor appliances. Neither system is designated as the primary or secondary system. You can manage both systems actively for shared storage, although only one system owns each volume at a time.

HA Cluster is based on RSF-1 (Resilient Server Facility), an industry-leading high-availability and cluster middleware application that ensures critical applications and services continue to run in the event of system failures.

Server Monitoring and Failover

HA Cluster provides server monitoring and failover. Protection of services, such as iSCSI, involves cooperation with other modules such as the SCSI Target plugin. You can execute NMC commands on all appliances in the group.

An HA Cluster consists of:
• NexentaStor Appliances — Run a defined set of services and monitor each other for failures. HAC connects these NexentaStor appliances through various communication channels, through which they exchange heartbeats that provide information about their states and the services that run on them.
• RSF-1 Cluster Service — A transferable unit that consists of:
  • Application start-up and shutdown code
  • Network identity and appliance data

You can migrate services between cluster appliances manually, or automatically if one appliance fails.

To view the existing groups of appliances, using NMC:
  Type: nmc:/$ show group

To view the existing groups of appliances, using NMV:
  1. Click Status > General.
  2. In the Appliances panel, click <host_name>.

Storage Failover

The primary benefit of HA Cluster is to detect storage system failures and transfer ownership of shared volumes to the alternate NexentaStor appliance. All configured services fail over to the other server. HA Cluster ensures service continuity during exceptional events, including power outages, disk failures, appliances that run out of memory or crash, and other failures.

Currently, the minimum time to detect that an appliance has failed is approximately 10 seconds. The failover and recovery time is largely dependent on the amount of time it takes to re-import the data volume on the alternate appliance. Best practices to reduce the failover time include using fewer zvols and file systems for each data volume. When using fewer file systems, you may want to use other properties, such as reservations and quotas, to control resource contention between multiple applications.

In the default configuration, HA Cluster fails over storage services if network connectivity is lost. HA Cluster automatically determines which network device to monitor based on the services that are bound to an interface. It checks all nodes in the cluster, so even if a node is not running any services, HA Cluster continues to monitor the unused interfaces. If the state of one changes to offline, it prevents failover to this node for services that are bound to that interface. When the interface recovers, HA Cluster enables failover for that interface again. Other types of failure protection include link aggregation for network interfaces and MPxIO for protection against SAS link failures.

Note that HA Cluster does not fail over configuration that is local to an appliance. For example, it does not move local users that were configured for the NexentaStor appliance. Nexenta highly recommends that you use a directory service such as LDAP in that case.

Exclusive Access to Storage

You access a shared volume exclusively through the appliance that currently provides the corresponding volume-sharing service. To ensure this exclusivity, HA Cluster provides reliable fencing through the use of multiple types of heartbeats. Fencing is the process of isolating a node in an HA cluster, and/or protecting shared resources when a node malfunctions. Heartbeats, or pinging, allow for constant communication between the servers. The most important of these is the disk heartbeat, used in conjunction with any other type. Generally, additional heartbeat mechanisms increase the reliability of the cluster's fencing logic; the disk heartbeats, however, are essential.

HA Cluster can reboot the failed appliance in certain cases, for example on failure to export the shared volume from an appliance that has failed to provide the volume-sharing service. This functionality is analogous to STONITH, the technique for fencing in computer clusters.

In addition, the NexentaStor RSF-1 cluster provides a number of other fail-safe mechanisms:
• When you start a volume-sharing service, the IP address associated with that service must NOT already be attached to any interface. The cluster automatically detects and reports if an interface is using the IP address; if it is, the local service does not perform the start-up sequence. (A quick manual check follows this list.)
• On disk systems that support SCSI reservations, the cluster can place a reservation on a disk before accessing its file systems, and can be set to panic if it loses the reservation. This feature also serves to protect the data on such a disk system. HA Cluster also supports SCSI-2 reservations.
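As a quick manual check before starting a volume-sharing service, you can list the plumbed interfaces from a root shell on either node and confirm that the service's failover address is not already in use; the address below is a placeholder taken from the examples later in this guide.

  # ifconfig -a | grep 172.16.3.22

If the address appears in the output, it is already attached to an interface on that node and should be released before the service is started. The cluster performs the equivalent check automatically, as described above.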
SCSI-2 PGR for Additional Protection

HA Cluster employs SCSI-2 PGR (persistent group reservations) for additional protection. PGR enables access to a device for multiple nodes while simultaneously blocking access for other nodes. You can enable PGR by issuing SCSI reservations on the devices in a volume before you import them. This feature enforces data integrity by preventing the pool from being imported on two nodes at any one time.

Nexenta recommends that you always deploy the HA Cluster with a shared disk (quorum device) and at least one or more heartbeat channels (Ethernet or serial). This configuration ensures that the cluster always has exclusive access that is independent of the storage interconnects used in the cluster.

Note: SCSI-3 PGR is not supported for HA Cluster because it does not work with SATA drives and has certain other limitations.

Service Failover

As discussed previously, system failures result in the failover of ownership of the shared volume to the alternate node. As part of the failover process, HA Cluster migrates the storage services that are associated with the shared volume(s) and restarts the services on the alternate node.

Additional Resources

Nexenta has various professional services offerings to assist with managing HA Cluster. Nexenta strongly encourages a services engagement to plan and install the plugin. Nexenta also offers training courses on high availability and other features.

For service and training offerings, go to our website at: http://www.nexenta.com
For troubleshooting cluster issues, contact: [email protected]
For licensing questions, email: [email protected]

HA Cluster is based on the Resilient Server Facility from HighAvailability.com. For additional information on cluster concepts and theory of operations, visit their website at: http://www.high-availability.com

For more advanced questions that are related to the product, check our FAQ for the latest information: http://www.nexenta.com/corp/frequently-asked-questions

2  Installation & Setup

This section includes the following topics:
• About Installation and Setup
• Prerequisites
• Adding Plugins
• Sample Network Architecture

About Installation and Setup

The HA Cluster plugin provides high-availability functionality for NexentaStor. You must install this software on each NexentaStor appliance in the cluster. This section describes how to set up and install HAC on both appliances.

Prerequisites

HA Cluster requires shared storage between the clustered NexentaStor appliances. You must also set up:
• One IP address for each cluster service unit (zvol, NAS folder, or iSCSI LUN)
• Multiple NICs (Ethernet cards) on different subnets for cluster heartbeat and NMV management (a good practice, but not mandatory)
• A DNS entry for each service name in the cluster

NexentaStor supports the use of a separate device as a transaction log for committed writes. HA Cluster requires that you make the ZFS Intent Log (ZIL) part of the same storage system as the shared volume.

Note: SCSI and iSCSI failover services use the SCSI Target plugin, which is included with the NexentaStor software.
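To confirm that a separate log device meets the ZIL requirement above, you can inspect the pool from a root shell on the node that currently owns it; the pool name below is a placeholder. The devices listed under the logs section of the output must reside on the shared storage, not on disks that only one appliance can see.

  # zpool status ha-test

The output lists the pool's data devices, any log (ZIL) devices, and any cache devices, along with their states.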
Adding Plugins

The HAC plugin installs just as any other NexentaStor plugin installs.

Installing Plugins

You can install plugins through both NMC and NMV.

To install a plugin, using NMC:
  1. Type: nmc:/$ setup plugin <plugin_name>
     Example: nmc:/$ setup plugin rsf-cluster
  2. Confirm or cancel the installation.
  3. Repeat Step 1 — Step 2 for the other node.

To install a plugin, using NMV:
  1. Click Settings > Appliance.
  2. In the Administration panel, click Plugins.
  3. Click Add Plugin in the Remotely-available plugins section.
  4. Confirm or cancel the installation.
  5. Repeat Step 1 — Step 4 for the other node.

Note: The plugins may not be immediately available from your NexentaStor repository. It can take up to six hours before the plugins become available.

Sample Network Architecture

The cluster hardware setup needs:
• Two x86/64-bit systems with a SAS-connected JBOD
• Two network interface cards (not mandatory, but good practice)

The following illustration is an example of an HA Cluster deployment in a Nexenta iSCSI environment. The host server attaches to iSCSI LUNs which are connected to the Nexenta appliances node A and node B. The Nexenta appliances use the Active/Passive function of the HA cluster. Node A services one group of iSCSI LUNs while node B presents a NAS storage LUN.

Figure 2-1: High Availability Configuration (diagram not reproduced in this transcription)

3  Configuring the HA Cluster

This section includes the following topics:
• About Configuring the HA Cluster
• Binding the Nodes Together with SSH
• Configuring the HA Cluster

About Configuring the HA Cluster

You can configure and manage the HA cluster through the appliance's web interface, the Nexenta Management View (NMV), or the Nexenta Management Console (NMC).

Binding the Nodes Together with SSH

You must bind the two HA nodes together with the SSH protocol so that they can communicate.

To bind the two nodes, using NMC:
  1. Type the following on node A: nmc:/$ setup network ssh-bind root@<IP_address_nodeB>
  2. When prompted, type the Super User password.
  3. Type the following on node B: nmc:/$ setup network ssh-bind root@<IP_address_nodeA>
  4. When prompted, type the Super User password.

Note: If ssh-binding fails, you can manually configure the /etc/hosts file, which contains the Internet host table. (Type setup appliance hosts to access the file.)
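For illustration, a complete binding session with hypothetical addresses (172.16.3.20 for node A and 172.16.3.21 for node B, matching the example addresses used later in Testing and Troubleshooting) looks like this; each command prompts for the Super User password of the remote node.

  On node A:  nmc:/$ setup network ssh-bind [email protected]
  On node B:  nmc:/$ setup network ssh-bind [email protected]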
Configuring the HA Cluster

You need to configure multiple options for the HA cluster before you can use it successfully.

Note: You cannot assign a NexentaStor appliance to more than one HA cluster.

To configure an HA cluster, using NMV:
  1. Select Settings > HA Cluster.
  2. Type the Admin name and password.
  3. Click Initialize.
  4. Type or change the Cluster name.
  5. Type a description (optional).
  6. Select the following parameters:
     • Enable Network Monitoring — The cluster monitors the network for nodes.
     • Configure Serial Heartbeat — The nodes exchange serial heartbeat packets through a dedicated RS232 serial link between the appliances, using a custom protocol.
  7. Click Configure.
  8. Repeat Step 1 — Step 7 for the second node.

To configure an HA cluster, using NMC:
  Type: nmc:/$ create group rsf-cluster
  Use the following options when prompted:
  • Group name: <cluster_name>
  • Appliances: nodeA, nodeB
  • Description: <write a description>
  • Heartbeat disk: <disk_name>
  • Enable inter-appliance heartbeat through primary interfaces?: Yes
  • Enable inter-appliance heartbeat through serial ports?: No

4  Configuring the Cluster's Shared Volumes

This section includes the following topics:
• About Configuring the Cluster's Shared Volumes
• Configuring the Cluster's Shared Volumes

About Configuring the Cluster's Shared Volumes

After setting up the HA cluster, you must create one or more shared volumes.

Configuring the Cluster's Shared Volumes

After cluster initialization, NexentaStor automatically redirects you to the adding volume services page. The shared logical hostname is a name associated with the failover IP interface that is moved to the alternate node as part of the failover.

Note: If you receive an error indicating that the shared logical hostname is not resolvable, see Resolving Name Conflicts.

Once the volume is shared, in the event of a system failure the volume remains accessible to users as long as one of the systems continues to run.

Importing the Current Node

Although the appliances can access all of the shared volumes, only the volumes that have been imported to the current appliance display in the list. If you want to create a new cluster service with a specific shared volume, you must import this volume to the current node.

To import the volume to the current node, using NMC:
  1. Type: setup group rsf-cluster <cluster_name> shared volume add
     System response: Scanning for volumes accessible from all appliances
  2. Validate the cluster interconnect and share the volume:
     • ...verify appliances interconnect: Yes
     • Initial timeout: 60
     • Standard timeout: 60

Adding a Virtual IP or Hostname

You can add virtual IPs/hostnames per volume service.

To add a virtual IP address, using NMC:
  1. Type: nmc:/$ setup group rsf-cluster <service_name> vips add
     System response: nmc:/$ VIP____
  2. Type the virtual IP address.

To add a virtual IP address, using NMV:
  1. In the Cluster Settings panel, click Advanced.
  2. Click Additional Virtual Hostnames.
  3. Click Add a new virtual hostname.
  4. Type values for the following:
     • Virtual Hostname
     • Netmask
     • Interface on node A
     • Interface on node B
  5. Click Add.
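Once the volume service is running, a quick way to confirm that the new virtual IP or hostname answers on the network is an ICMP ping from a client or from the other node; rsf-data below stands in for your shared logical hostname.

  $ ping rsf-data

If the volume service is stopped, the shared logical hostname is not present on the network and the ping fails; this is expected behavior rather than a fault.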
5  Heartbeat and Network Interfaces

This section includes the following topics:
• About Heartbeat and Network Interfaces
• Heartbeat Mechanism
• Configuring the Cluster and Heartbeat Interfaces
• Serial Link

About Heartbeat and Network Interfaces

NexentaStor appliances in the HA Cluster constantly monitor the states and status of the other appliances in the cluster through heartbeats. Because HA Cluster servers must determine that an appliance (a member of the cluster) has failed before taking over its services, you configure the cluster to use several communication channels through which to exchange heartbeats.

Heartbeat Mechanism

In Nexenta, the VDEV labels of devices in the shared volume perform the heartbeat function. If a shared volume consists of several disks, NexentaStor uses the VDEV labels of two of those disks for the heartbeat mechanism; you can specify which disks. Though the quorum disk option still remains in the configuration file, Nexenta recommends using the shared volume's labels. The heartbeat mechanism uses sectors 512 and 518 in the blank 8K space of the VDEV label on each of the shared disks.

The loss of all heartbeat channels represents a failure. If an appliance wrongly detects a failure, it may attempt to start a service that is already running on another server, leading to so-called "split brain" syndrome. This can result in confusion and data corruption. Multiple, redundant heartbeats prevent this from occurring.

HA Cluster supports the following types of heartbeat communication channels:
• Shared Disk/Quorum Device — A device accessible and writable from all appliances in the cluster, or the VDEV labels of the devices in the shared volume.
• Network Interfaces — Including configured interfaces, unconfigured interfaces, and link aggregates.
• Serial Links

If two NexentaStor appliances do not share any services, then they do not require direct heartbeats between them. However, each member of a cluster must transmit at least one heartbeat to propagate control and monitoring requests.

The heartbeat monitoring logic is defined by two parameters, X and Y, where:
• X equals the number of failed heartbeats the interface monitors before taking any action
• Y represents the number of active heartbeats an interface monitors before making it available again to the cluster.
The current heartbeat defaults are 3 and 2 heartbeats, respectively.

NexentaStor also provides protection for network interfaces through link aggregation. You can set up aggregated network interfaces using NMC or NMV (a command-line sketch appears at the end of this section, before Serial Link).

Configuring the Cluster and Heartbeat Interfaces

When you define the cluster, note that a NexentaStor appliance cannot belong to more than one cluster.

To define the HA cluster, using NMC:
  Type: nmc:/$ create group rsf-cluster
  System response:
    Group name        : cluster-example
    Appliances        : nodeA, nodeB
    Description       : some description
    Scanning for disks accessible from all appliances ...
    Heartbeat disk    : c2t4d0
    Enable inter-appliance heartbeat through dedicated heartbeats disk?  No
    Enable inter-appliance heartbeat through primary interfaces?  Yes
    Enable inter-appliance heartbeat through serial ports?  No
    Custom properties :
    Bringing up the cluster nodes, please wait ...
    Jun 20 12:18:39 nodeA RSF-1[23402]: [ID 702911 local0.alert] RSF-1 cold restart: All services stopped.
    RSF-1 cluster 'cluster-example' created. Initializing ..... done.

To configure the cluster, using NMV:
  1. Click Settings > HA Cluster > Initialize.
  2. Type in a Cluster Name and Description.
  3. Select the following options:
     • Enable Network Monitoring
     • Configure Serial Heartbeat
  4. Click Yes to create the initial configuration. Click OK.
  The cluster is initialized. You can add shared volumes to the cluster.

To change heartbeat properties, using NMC:
  Type: nmc:/$ setup group rsf-cluster <cluster_name> hb_properties
  Respond to the prompts:
  • Enable inter-appliance heartbeat through primary interfaces?: Yes
  • Enable inter-appliance heartbeat through serial ports?: No
  • Proceed: Yes

To add additional hostnames to a volume service:
  1. Click Advanced > Additional Virtual Hostnames.
  2. Click Add a new virtual hostname.
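Link aggregation, mentioned earlier in this chapter as additional protection for network interfaces, is configured at the operating-system level rather than by the HA Cluster plugin. The sketch below is illustrative only: it assumes root shell access and the dladm utility of the illumos-based kernel underlying NexentaStor, the exact dladm syntax varies between releases, the link and aggregate names are placeholders, and the NMC/NMV network setup wizards remain the supported way to do this.

  # dladm create-aggr -l e1000g0 -l e1000g1 aggr1
  # dladm show-aggr

The resulting aggregate can then be used like any other network interface for cluster services and heartbeats.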
Serial Link

HAC exchanges serial heartbeat packets through a dedicated RS232 serial link between any two appliances, using a custom protocol. To prevent routing problems affecting this type of heartbeat, do not use IP on this link.

The serial link requires:
• Spare RS232 serial ports on each HA Cluster server
• A crossover, or null modem, RS232 cable with an appropriate connector on each end

You can use null modem cables to connect pieces of Data Terminal Equipment (DTE) together or to attach console terminals. On each server, enable the relevant serial port devices but disable any login, modem, or printer services running on them. Do not use the serial port for any other purpose.

To configure serial port heartbeats:
  Type Yes to the following question during HA cluster group creation:
  Enable inter-appliance heartbeat through serial ports?

6  Configuring Storage Failover

This section includes the following topics:
• About Configuring Storage Failover
• Cluster Configuration Data
• Mapping Information
• NFS/CIFS Failover
• Configuring iSCSI Targets for Failover
• Configuring Fibre Channel Targets for Failover

About Configuring Storage Failover

HA Cluster detects storage system failures and transfers ownership of shared volumes to the alternate NexentaStor appliance. HA Cluster ensures service continuity in the presence of service-level exceptional events, including power outages, disk failures, an appliance running out of memory or crashing, and so on.

Cluster Configuration Data

When you configure SCSI targets in a cluster environment, make sure that the configurations and mappings are consistent across the cluster members. HAC automatically propagates all SCSI Target operations. However, if the alternate node is not available at the time of the configuration change, problems can occur. By default, the operation results in a warning to the user that the remote update failed. You can also set HA Cluster to synchronous mode; in this case, the action fails completely if the remote update fails.

To protect local configuration information that did not migrate, periodically save this configuration to a remote site (perhaps the alternate node) and then use NMC commands to restore it in the event of a failover.

To save the cluster configuration, using NMC:
  Type: setup application configuration -save

To restore the cluster configuration, using NMC:
  Type: setup application configuration -restore

Following are examples of using NMC commands to synchronize the cluster configuration if one of the nodes is not current. In the following examples, node A contains the latest configuration and node B needs updating.

Example: To run this command from node A:
  Type: nmc:/$ setup iscsi config restore-to nodeB
  The above command:
  • Saves the configuration of node A
  • Copies it to node B
  • Restores it on node B

Note: Restore operations are destructive. Only perform them during a planned downtime window.

Example: To run this command from node B:
  Type: nmc:/$ setup iscsi config restore-from nodeA

The restore command saves key configuration data that includes:
• Target groups
• Host groups (stmf.config)
• Targets
• Initiators
• Target portal groups (iscsi.conf)

If you use CHAP authentication, and you configured CHAP through NMC or NMV, then you can safely save and restore the configuration.
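One way to make the periodic configuration save recommended above routine is to schedule it rather than rely on memory. The sketch below is an assumption-laden illustration: it presumes that your NMC release supports non-interactive execution with nmc -c "<command>" and that a root crontab entry is acceptable on the appliance; adjust or drop it if either assumption does not hold.

  # root crontab entry: save the appliance configuration every Sunday at 02:00
  0 2 * * 0 nmc -c "setup application configuration -save"

Copy the saved configuration to the alternate node or another remote site afterwards, as recommended above.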
Mapping Information

Use SCSI Target to map zvols from the cluster nodes to client systems. It is critical that the cluster nodes contain the same mapping information. Mapping information is specific to the volume and is stored with the volume itself. You can perform manual maintenance tasks on the mapping information using the mapmgr command.

NFS/CIFS Failover

You can use HA Cluster to ensure the availability of NFS shares to users. However, note that HA Cluster does not detect the failure of the NFS server software.

Configuring iSCSI Targets for Failover

You can use HA Cluster to fail over iSCSI volumes from one cluster node to another. The target IQN moves as part of the failover. Setting up iSCSI failover involves setting up a zvol in the shared volume. Note that you perform the process of creating a zvol and sharing it through iSCSI separately from the HA Cluster configuration.

If you create iSCSI zvols before marking the zvols' volume as a shared cluster volume, active iSCSI sessions may experience some delays when you then share the cluster volume. Depending on the network, application environment, and active workload, you may also see command-level failures or disconnects during this period.

When you add a shared volume to a cluster that has zvols created as backing storage for iSCSI targets, it is vital that you configure all client iSCSI initiators, regardless of the operating system, to access those targets using the shared logical hostname that was specified when the volume service was created, rather than a real hostname associated with one of the appliances.

Note that the cluster manages all aspects of the shared logical hostname configuration. Therefore, do not configure the shared logical hostname manually. Furthermore, unless the shared volume service is running, the shared logical hostname is not present on the network; you can verify this with the ICMP ping command.

To configure iSCSI targets on the active appliance, using NMV:
  1. Click Data Management > SCSI Target > zvols.
  2. Create a virtual block device using the shared volume. Make the virtual block device >200MB.
     HAC automatically migrates the newly created zvol to the other appliance on failover. Therefore, you do not have to duplicate it manually.
  3. From the iSCSI pane, click iSCSI > Target Portal Groups and define a target portal group.

Note: It is critical that the IPv4 portal address is the shared logical hostname specified when the volume service was created, instead of a real hostname associated with one of the appliances.

HAC automatically replicates the newly created target portal group to the other appliance.

To create an iSCSI target and add it to the target portal group, using NMV:
  1. Click iSCSI > Targets.
  2. Type a name and an alias, and add the target to the target portal group. Adding the target to the target portal group limits zvol visibility from client initiators to that group.
  The newly created iSCSI target displays in the Targets page and is automatically replicated to the other appliance.

To create a LUN mapping to the zvol, using NMV:
  1. From the SCSI Target pane, click Mappings and create a LUN mapping to the zvol for use as backing storage for the iSCSI target.
     The newly created LUN mapping is automatically migrated to the other appliance on failover.
  2. On the client, configure the iSCSI initiator to use both the IQN of the iSCSI target you created and the shared logical hostname associated with both the volume service and the target portal group to access the zvol through iSCSI.
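As a concrete illustration of the client-side step, the commands below assume a Linux client running the open-iscsi initiator, with rsf-data standing in for the shared logical hostname and a placeholder IQN for the target; other initiators (Windows, VMware, Solaris) have equivalent settings.

  # discover targets through the shared logical hostname, never a node's real hostname
  iscsiadm -m discovery -t sendtargets -p rsf-data
  # log in to the discovered target
  iscsiadm -m node -T iqn.1986-03.com.sun:02:example-target -p rsf-data --login

Because the initiator only ever refers to the shared logical hostname, its sessions follow the volume service when it fails over to the other appliance.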
Failover time varies depending on the environment. As an example, when initiating failover for a pool containing six zvols, the observed failover time is 32 seconds. Nodes may stall while the failover occurs, but otherwise recover quickly.

See Also:
• "Managing SCSI Targets" in the NexentaStor User Guide

Configuring Fibre Channel Targets for Failover

As a prerequisite for configuring Fibre Channel targets, change the HBA port modes of both appliances from Initiator mode to Target mode.

To change the HBA port mode, using NMV:
  1. Click Data Management > SCSI Target Plus > Fibre Channel > Ports.
  2. Select Target from the dropdown menu.
  3. Once you change the HBA port modes of both appliances from Initiator mode to Target mode, reboot both appliances so the Target mode changes can take effect.

To configure Fibre Channel targets on the appliance, in NMV:
  1. Click Data Management > SCSI Target on the appliance where the volume service is currently running.
  2. In the zvols pane, click Create.
  3. Create a zvol with the following characteristics:
     • Virtual block device: 200 MB
     • Use the shared volume: Yes
     The newly created zvol is automatically migrated to the other appliance on failover.
  4. From the SCSI Target pane, click Mappings and create a LUN mapping to the zvol for use as backing storage for the target.
     The newly created LUN mapping is automatically migrated to the other appliance on failover.

7  Advanced Setup

This section includes the following topics:
• About Advanced Setup
• Setting Failover Mode
• Adding Additional Virtual Hostnames

About Advanced Setup

This section describes advanced functions of HA Cluster, such as setting the failover mode, adding virtual hostnames and volumes, and other miscellaneous options.

Setting Failover Mode

The failover mode defines whether or not an appliance attempts to start a service when the service is not running. There are separate failover mode settings for each appliance that can run a service. You can set the failover modes to automatic or manual.

In automatic mode, the appliance attempts to start the service when it detects that there is no available parallel appliance running it in the cluster. In manual mode, it does not attempt to start the service, but it generates warnings when the service is not available. If the appliance cannot obtain a definitive answer about the state of the service, or the service is not running anywhere else, the appropriate timeout must expire before you can take any action.

The primary service failover modes are typically set to automatic to ensure that an appliance starts its primary service(s) on boot-up. Note that putting a service into manual mode when the service is already running does not stop that service; it only prevents the service from starting on that appliance.

To set the failover mode to manual, using NMV:
  1. Click Advanced Setup > Cluster Operations > Set all Manual.
  2. Click Yes to confirm.
Note: Until you select Set All Services to manual, automatic, or stopped, the Restore and Clear Saved State buttons do not display.

Before HAC performs an operation, it saves the state of the services in the cluster, which you can later re-apply to the cluster using the Restore button. Once HAC restores the service state, HAC clears the saved state.

To set the failover mode to manual, using NMC:
  Type: nmc:/$ setup group rsf-cluster <cluster_name> sharedvolume <volume_name> manual

To set the failover mode to automatic, using NMV:
  1. Click Advanced Setup > Cluster Operations > Set all Automatic.
  2. Click Yes to confirm.

To set the failover mode to automatic, using NMC:
  Type: nmc:/$ setup group rsf-cluster <cluster_name> sharedvolume <volume_name> automatic

To stop all services in the cluster, using NMV:
  1. Click Stop All Services.
  2. Click Yes to confirm.

Adding Additional Virtual Hostnames

After you have initialized the cluster, you can add shared volumes and virtual hostnames to the cluster.

See Also:
• Adding a Virtual IP or Hostname

8  System Operations

This section includes the following topics:
• About System Operations
• Checking Status of Cluster
• Checking Cluster Failover Mode
• Failure Events
• Service Repair
• Replacing a Faulted Node
• Maintenance
• System Upgrades

About System Operations

There are a variety of commands and GUI screens to help you with daily cluster operations, including a set of cluster-specific commands that supplement NMC.

Checking Status of Cluster

You can check the status of the cluster at any time.

To see a list of available commands, using NMC:
  Type: help keyword cluster

Note: Although both cluster nodes can access a shared volume, the show volume command only shows the volume on the node that currently owns it.

You can use NMV to check the overall cluster status.

To check the status of the cluster, using NMV:
  Click Settings > HA Cluster.

To check the status of the cluster, using NMC:
  Type: show group rsf-cluster

Checking Cluster Failover Mode

You can configure HA Cluster to detect failures and alert the user, or to fail over the shared volumes automatically.

To check the current appliance configuration mode:
  Type: nmc:/$ show group rsf-cluster <cluster_name>
  System response:
    PROPERTY          : VALUE
    name              : HA-Cluster
    appliances        : [nodeB nodeA]
    hbipifs           : nodeB:nodeA: nodeA:nodeB:
    netmon            : 1
    info              : Nexenta HA Cluster
    generation        : 1
    refresh_timestamp : 1297423688.31731
    hbdisks           : nodeA:c2t1d0 nodeB:c2t0d0
    type              : rsf-cluster
    creation          : Feb 11 03:28:08 2011

    SHARED VOLUME: ha-test
    svc-ha-test-ipdevs          : rsf-data nodeB:e1000g1 nodeA:e1000g1
    svc-ha-test-main-node       : nodeA
    svc-ha-test-shared-vol-name : ha-test

    HA CLUSTER STATUS: HA-Cluster
    nodeA: ha-test running auto unblocked rsf-data e1000g1 60 60
    nodeB: ha-test stopped auto unblocked rsf-data e1000g1 60 60

Failure Events

NexentaStor tracks various appliance components and their state. If and when failover occurs (or any service changes to a broken state), NexentaStor sends an email to the administrator describing the event.

Note: During the NexentaStor installation, you set up the SMTP configuration and test that you can receive emails from the appliance.
Service Repair

There are two broken states:
• Broken_Safe — A problem occurred while starting the service on the server, but it was stopped safely and you can run it elsewhere.
• Broken_Unsafe — A fatal problem occurred while starting or stopping the service on the server. The service cannot run on any other server in the cluster until it is repaired.

To repair a shared volume which is in a broken state, using NMC:
  nmc:/$ setup group rsf-cluster shared-volume repair
  This initiates and runs the repair process.

To mark a service as repaired, using NMV:
  1. Click Settings > HA Cluster.
  2. In the Action column, set the action to repaired.
  3. Click Confirm.

Replacing a Faulted Node

NexentaStor provides advanced capabilities to restore a node in a cluster in case its state changes to out of service. There is no need to delete the cluster group on the other node and reconfigure it and all of the cluster services.

To replace a faulted node, using NMC:
  nmc:/$ setup group rsf-cluster <group_name> replace_node

After executing the command, the system asks you to choose which node to exclude from the cluster and which to use instead. The system checks the host parameters of the new node and, if they match the requirements of the cluster group, replaces the old one.

Note: Before performing a replace-node operation, you must set up the identical configuration on the new or restored hardware, which HAC uses to replace the old faulted node. Otherwise, the operation fails. Make the serial port heartbeat configuration the same, as well.

Maintenance

You can stop HAC from triggering a failover so you can perform maintenance.

To stop HAC from triggering a failover, using NMC:
  Type: nmc:/$ setup group rsf-cluster <cluster_name> sharedvolume <volume_name> manual

System Upgrades

Occasionally, you may need to upgrade the NexentaStor software on the appliance. Since this may require a reboot, manage it carefully in a cluster environment. HAC reminds the user that the cluster service is not available during the upgrade.

Upgrade Procedure

When upgrading, you have an active and a passive node.

To upgrade both nodes, using NMC:
  1. Log in to the passive node and type:
     nmc:/$ setup appliance upgrade
  2. After the upgrade successfully finishes, log in to the active node and type the following to fail over to the passive node:
     nmc:/$ setup group rsf-cluster <group_name> <shared_volume_name> <passive_node_name> failover
  3. After the failover finishes, the nodes swap: the active node becomes the passive node and vice versa. Type the following command on the current passive node:
     nmc:/$ setup appliance upgrade
  4. Type the following to run the failover command on the current active node and thereby make it passive again:
     nmc:/$ setup group rsf-cluster <group name> <shared volume name> <active_node_name> failover
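After both nodes have been upgraded and failed back, it is worth confirming that they report the same software version before resuming normal operation. One way to do this is with NMC's show appliance version command, run on each node so the outputs can be compared; the exact sub-command name is an assumption here, and any appliance-version display in NMV serves the same purpose.

  nmc:/$ show appliance version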
9  Testing and Troubleshooting

This section includes the following topics:
• About Testing and Troubleshooting
• Resolving Name Conflicts
• Specifying Cache Devices
• Manually Triggering a Failover
• Verifying DNS Entries
• Verifying Moving Resources Between Nodes
• Verifying Failing Service Back to Original Node
• Gathering Support Logs

About Testing and Troubleshooting

The following section contains various testing and troubleshooting tips.

Resolving Name Conflicts

You must make the appliances in the HA cluster group resolvable to each other. This means they must be able to detect each other on the network and communicate. To achieve this, you can either configure your DNS server accordingly or add records to /etc/hosts. If you do not want to edit /etc/hosts manually, you can set this up when you create the shared volumes. (You have to enter a virtual shared service hostname and a virtual IP address.) Defining these parameters allows the software to modify the /etc/hosts tables automatically on all HA cluster group members.

Note: You can use a virtual IP address instead of a shared logical hostname.

These host records can be added automatically when you create the shared volumes. If a shared logical hostname cannot be resolved for a server, you need to define the IP address for that server; the record is then added on that server automatically. Alternatively, you can choose to manually configure your DNS server or the local hosts tables on the appliances.

To see more information, using NMC:
  Type: nmc:/$ setup appliance hosts -h

Alternatively, you can allow this cluster configuration logic to update your local hosts records automatically.

To create a shared service, using NMC:
  Type: nmc:/$ setup group rsf-cluster HA-Cluster shared-volume add
  System response: Scanning for volumes accessible from all appliances ...

To configure the shared cluster volume manually:
  1. Using a text editor, open the /etc/hosts file on node A.
  2. Add the IP addresses for each node:
     Example:
     172.16.3.20 nodeA nodeA.mydomain
     172.16.3.21 nodeB nodeB.mydomain
     172.16.3.22 rsf-data
     172.16.3.23 rsf-data2
  3. Repeat Step 1 — Step 2 for node B.

Specifying Cache Devices

NexentaStor allows you to configure specific devices in a data volume as cache devices. For example, using solid-state disks as cache devices can improve performance for random read workloads of mostly static content.

To specify cache devices when you create or grow the volume, using NMC:
  Type: nmc:/$ setup volume grow

Cache devices are also available for shared volumes in the HA Cluster. However, note that if you use local disks as cache, they cannot fail over, since they are not accessible by the alternate node. After failover, therefore, the volume is marked "Degraded" because of the missing devices.

If local cache is critical for performance, then set up local cache devices for the shared volume on each cluster node when you first configure the volume. This involves setting up local cache on one node, and then manually failing over the volume to the alternate node so that you can add local cache there as well. This ensures the volume has cache devices available automatically after a failover; however, the data volume changes to "Degraded" and remains in that state because one node's cache devices are always unavailable.

Additionally, users can control the cache settings for zvols within the data volume. In a cluster, set the zvol cache policy to "write-through", not "writeback".

To administer the cache policy, using NMC:
  Type: nmc:/$ setup volume <volume_name> zvol cache
        nmc:/$ setup zvol cache
        nmc:/$ show zvol cache
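For reference, adding a local read-cache device to a pool corresponds, underneath NMC, to adding a cache vdev with ZFS; the sketch below is illustrative only, with a placeholder pool name and disk, and the setup volume grow wizard described above remains the supported path.

  # zpool add ha-test cache c3t2d0

Repeat the equivalent step on the alternate node after manually failing the volume over, as described above, and expect the volume to report a degraded state after any failover because the other node's local cache devices are absent.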
Manually Triggering a Failover

You can manually trigger a failover between systems when needed. Performing a failover from the current appliance to the specified appliance causes the volume-sharing service to stop on the current appliance, while the opposite actions take place on the passive appliance. Additionally, the volume is exported to the other node.

To manually trigger a failover, using NMC:
  Type: nmc:/$ setup group rsf-cluster <cluster_name> <appliance_name> failover

Verifying DNS Entries

There is a name associated with the cluster that is referred to as a shared logical hostname. This hostname must be resolvable by the clients that access the cluster. One way to manage this is with DNS. Install a DNS management application; through this software, you can view the host record of the shared cluster hostname to verify that it was set up properly.

Verifying Moving Resources Between Nodes

Use a manual failover to move a resource from one NexentaStor node to another node in the cluster.

To move a shared volume from node A to node B:
  Type: nmc@nodeA:/$ setup group <group_name> <cluster_name> shared-volume show
  System response:
    HA CLUSTER STATUS: HA-Cluster
    nodeA: vol1-114 stopped manual unblocked 10.3.60.134 e1000g0 20 8
    nodeB: vol1-114 running auto unblocked 10.3.60.134 e1000g0 20 8

Verifying Failing Service Back to Original Node

The first system response below illustrates a failed failover. The second illustrates a successful failover.

To verify failover back to the original node:
  Type: nmc:/$ setup group <group_name> <cluster_name> sharedvolume <volume_name> failover <appliance_name>
  System response:
    SystemCallError: (HA Cluster HA-Cluster): cannot set mode for cluster node 'nodeA': Service ha-test2 is already running on nodeA (172.16.3.20)

  Type: nmc:/$ setup group <group_name> <cluster_name> sharedvolume <volume_name> failover <appliance_name>
  System response:
    Waiting for failover operation to complete......... done.
    nodeB: ha-test2 running auto unblocked rsf-data2 e1000g1 60 60

Gathering Support Logs

To view support logs, using NMV:
  Click View Log.

To view support logs, using NMC:
  Type: nmc:/$ show log