This is the second post in a series of 3 related to geographical clusters (the first post was focusing on Solaris RAC, the last will be about cluster features such as fencing, quorum, cluster lock, etc.). It would be beneficial to read the first geo cluster post which aimed at Solaris as many issues, common to HP-UX and Solaris, will not be rehashed.
Solaris: to make my point straight, I think that the two major differences between the group of generic clusters Sun Cluster Services/ Veritas Cluster Services and the Oracle CRS clusterware are storage and membership. Their nature is however not the same: the storage chapter is essentially related to available options, with ASM being used as the sole option for mirroring in CRS only clusters. This may be a major drawback for large databases. Membership is a different story: to put it simply, Sun says that SCSI3-PR (persistent reservation) closes the door to data base corruptions in case of node eviction.
HP-UX: The story is not much different on HP-UX, although I would add one more major feature Service Guard has that does not exists on Solaris systems: the concept of a tie-breaking node.
HP-UX RAC storage options
As with Solaris, we have Service Guard, the generic cluster software from the hardware provider, Veritas Cluster Services and the Oracle clusterware alone.
10gR1 : The lack of mirroring for the OCR and voting disks makes a third party cluster almost compulsory (the “almost” is a word of caution because SAN mirroring technologies, based for example on virtualized luns might be used).
- RAC+ Service Guard (with Service Guard Extension for RAC):
- SLVM, the shared volume option of the legacy HP-UX LVM had been the only option since the days of Oracle Parallel Server. It only allows data file raw access; hence archivelogs must be on “local” file systems (non shared volumes). A number of limitations, such as having to bring down all nodes but one for Volume Group maintenance exists.
- Since HP-UX 11iv2 and Service Guard 11.16, and after announcing for 2 years a port of the acclaimed True Cluster Advanced File Systems, HP finally closed an agreement with Veritas for the integration of the Veritas Cluster File Systems (CFS) and Cluster Volume Manager (CVM) into Service Guard. This is essentially supported by some extra SG commands (cfsdgadm, cfsdgadm and cfsmntadm), but does not provide much more significant functionality on top of the Veritas Storage Foundation for HA in the storage options. All of the Oracle files, including OCR and vote can be put on the CFS. This is because the CFS can be brought up before the CRS starts.
- RAC + Veritas Cluster Services (VCS): The clusterware is Veritas based, but as you can guess, CFS and CVM are used in this option.
- RAC without any third party cluster: As on Solaris, the only storage options with the Oracle clusterware alone is the ASM, which conveys for large data bases the same mirroring problems as found on Solaris.
Membership, split brain and amnesia
Split brain and tie breaking
Just like Sun Cluster Services and the Oracle clusterware, Service Guard has a tiebreaker cluster lock to avoid a split of two subclusters of equal size, a situation bound to arise when network connectivity is completely lost between some of the nodes. This tiebreaker is storage based (either a physical disk for SCS, an LVM volume for SG or a file for the CRS). However, geo clusters made of two physical sites make this tiebreaker configuration impossible: on which site do you put the extra volume, disk or files? A failure of the site chosen for 2 of the 3 volumes will cause the loss of the entire cluster. A third site HAS to be used. Assuming Fiber Channel is not extended to the 3rd site, an iscsi or NFS configuration (on a supported nfs provider such as NetApp) may be the most convenient option if an IP link exists.
With the Quorum Server, Service Guard can provide a tie breaking service: this can be done either as a stand-alone Quorum Server or as a Quorum Service package running on a cluster which is external to the cluster for which Quorum Service is being provided. Although the 3rd site is still needed, a stand alone server reachable via IP will resolve the split brain. The CRS is nonetheless still a cause of concern: the voting disk on the 3rd site must be seen by the surviving nodes. The Quorum Server may be a cluster service which belongs to a second SG cluster, but its CFS storage is not accessible from the first cluster. As the end of the day, although the Quorum Server is a nice SG feature, it cannot solve the CRS voting disks which must still be accessed through isci or NFS.
The same paragraph I wrote for Solaris can be copied and pasted:
Both VCS and SG use SCSI3 persistent reservation via ioctl, and I/O fencing to prevent corruption. Each node registers a key (it is the same for all the node paths). Once node membership is established, the registration keys of all the nodes that do not form part of the cluster are removed by the surviving nodes of the cluster. This blocks write access to the shared storage from evicted nodes.