Christian Bilien’s Oracle performance and tuning blog

April 17, 2007

RAC geographical clusters and 3rd party clusters (Sun Solaris) (1/3)

Filed under: Oracle,RAC,Solaris — christianbilien @ 9:06 pm

By way of introduction, a geographical RAC cluster is a RAC in which at least one node is physically located at a remote site, so that database access remains available should one of the sites go down.

I have found that many customers wishing to implement a RAC geo cluster get confused by vendors when it comes to the relationships (or should I say dependencies) between RAC and third party clusters. I also have the impression that some Oracle sales reps add to this confusion by steering uncertain prospects one way or another, depending on their particular interest in a third party hardware/cluster provider.

Let me first say that I am only addressing the RAC options here. If other applications need clustering services, a third party cluster will be necessary (although some provisions, still in their infancy, exist within the CRS to “clusterize” non-RAC services). I will also deliberately not discuss NAS storage, as I have never had the opportunity to work with or even consider a RAC/NAS option (Pillar, NetApp, and a few others are trying to get into this market).

This first post is about RAC geo clusters on Solaris. RAC geo clusters on HP-UX will be covered in the second post of this series.

The Solaris compatibility matrix is located at https://metalink.oracle.com/metalink/plsql/f?p=140:1:2790593111784622179

I consider two cluster areas to be strongly impacted by the “third party cluster or not” choice: storage and membership strategy. Some may also argue about protecting the private interconnect against failure, but since IPMP may be used for the RAC-only option, and although some technical differences exist, I think this matters much less than storage and membership.

Storage:

  • 10gR1 was very special as it offered no Oracle protection for the voting disk and OCR volumes. This lack of functionality had a big impact on geo clusters, as some third party storage clustering was required to mirror the voting disk and OCR.
  • 10gR2: The options may not be the same for the OCR/voting disk, database files, binaries and archivelog files. Although putting archivelog files on a clustered file system saves NFS mounts, binaries and archivelogs can usually be located on their own “local” file system, which may be on the array but is only seen from one node. The real issues are the DB files on the one hand, and the OCR and voting disk on the other; the latter are peculiar because they must be visible when the CRS starts, BEFORE ASM or any Oracle-dependent process can be started.
  • RAC + Sun Cluster (SCS): The storage can be either Solaris Volume Manager with raw devices, or QFS; GFS is not supported. ASM may be used but offers little, in my opinion, compared to a volume manager. ASM used for mirroring suffers from the mirror reconstruction that has to be performed when one of the sites is lost, and from the lack of any feature that copies only the modified blocks (the way storage mirroring does).
  • RAC + Veritas Cluster Services (VCS): the Veritas cluster file system (the VxFS cluster version), running over the Cluster Volume Manager (the VxVM cluster version), is certainly a good solution for those averse to raw devices/ASM. All of the Oracle files, including the OCR and voting disk, can be put on the CFS, because the CFS can be brought up before the CRS starts.
  • RAC without any third party cluster: ASM has to be used for storage mirroring. This is easier to manage and cheaper, although mirrored disk group reconstruction is a concern when data volumes are large. I also like to avoid the coexistence of two clusters (RAC on top of SCS or VCS).
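
To get a feel for why mirror reconstruction matters on large volumes, here is a back-of-the-envelope sketch in Python. It only does the arithmetic of "copy everything" versus "copy the modified blocks only"; the function name, the data size, the inter-site throughput and the fraction of changed blocks are all illustrative assumptions, not measurements from any real ASM or array configuration.

```python
def resync_hours(data_gb: float, link_mb_per_s: float,
                 changed_fraction: float = 1.0) -> float:
    """Hours needed to copy `changed_fraction` of `data_gb` at `link_mb_per_s`.

    changed_fraction=1.0 models a full mirror rebuild; a smaller value
    models a resync that copies only the blocks modified during the outage.
    """
    if link_mb_per_s <= 0:
        raise ValueError("link throughput must be positive")
    if not 0.0 <= changed_fraction <= 1.0:
        raise ValueError("changed_fraction must be between 0 and 1")
    megabytes = data_gb * 1024 * changed_fraction
    return megabytes / link_mb_per_s / 3600


# Hypothetical figures: 2 TB mirrored across sites, 100 MB/s sustained
# between the arrays, 2% of the blocks modified while one site was down.
full = resync_hours(data_gb=2048, link_mb_per_s=100)
delta = resync_hours(data_gb=2048, link_mb_per_s=100, changed_fraction=0.02)
print(f"full rebuild: {full:.1f} h, modified blocks only: {delta:.2f} h")
```

The absolute numbers mean little; the point is that the gap between the two approaches scales with the ratio of total data to changed data, which is why it becomes a concern as volumes grow.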

Membership, split brain and amnesia

A number of membership issues are addressed differently by SCS/VCS and by the CRS/CSS. It is beyond the scope of this post to explain fencing, split brain and amnesia. There are really two worlds here. On one hand, Oracle has a generic clusterware membership system across platforms, which avoids system and storage dependencies. On the other hand, VCS and SCS take advantage of SCSI persistent reservation ioctls. Veritas and Sun both argue that Oracle’s node eviction strategy may create situations in which a node is evicted from the cluster but not yet forced to reboot. Other instances may then start instance recovery while the failed instance still writes to the shared storage. Oracle says that database corruption is prevented by using the voting disk, the network, and the control file to determine when a remote node is down, and that this is done in different, parallel, independent ways. I am not going to enter the war on one side or the other; let’s just recall the basic strategies:

  • CSS: this process uses both the interconnects and the voting disks to monitor remote nodes. A node must be able to access strictly more than half of the voting disks at any time (this is the reason for the odd number of voting disks), which prevents split brain. The CSS misscount is 30s: this is how long a node may fail to answer the network heartbeat before it is evicted.
  • Both VCS and SCS use SCSI-3 persistent reservations via ioctl, and I/O fencing, to prevent corruption. Each node registers a key (the same key for all of the node’s paths). Once node membership is established, the surviving nodes remove the registration keys of all the nodes that are not part of the cluster. This blocks write access to the shared storage from evicted nodes.
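
The two strategies can be sketched as a toy Python model. This is only an illustration of the rules described above, not Oracle, Veritas or Sun code: `has_voting_quorum` and `SharedDisk` are hypothetical names, and the class mimics key registration and removal in memory without issuing any real SCSI-3 command.

```python
def has_voting_quorum(accessible: int, configured: int) -> bool:
    """CSS-style rule: a node must see strictly more than half of the voting disks."""
    if configured <= 0 or not 0 <= accessible <= configured:
        raise ValueError("need 0 <= accessible <= configured and configured > 0")
    return 2 * accessible > configured


class SharedDisk:
    """In-memory stand-in for a LUN that only accepts writes from registered keys."""

    def __init__(self) -> None:
        self.keys: set[str] = set()
        self.blocks: dict[int, bytes] = {}

    def register(self, key: str) -> None:
        """A node registers its key (one key for all of its paths)."""
        self.keys.add(key)

    def preempt(self, survivor_key: str, victim_key: str) -> None:
        """A surviving, registered node removes the key of an evicted node."""
        if survivor_key not in self.keys:
            raise PermissionError(f"{survivor_key} is not registered")
        self.keys.discard(victim_key)

    def write(self, key: str, block: int, data: bytes) -> None:
        """Writes from a node whose key has been removed are rejected."""
        if key not in self.keys:
            raise PermissionError(f"{key} is fenced off")
        self.blocks[block] = data


# Voting disks: with 3 disks, seeing 2 is a majority; with 2 disks, seeing 1 is not.
print(has_voting_quorum(2, 3), has_voting_quorum(1, 2))

# Fencing: node2 is evicted, node1 removes its key, node2 can no longer write.
disk = SharedDisk()
disk.register("node1")
disk.register("node2")
disk.preempt("node1", "node2")
disk.write("node1", 0, b"ok")
try:
    disk.write("node2", 1, b"stale")
except PermissionError as err:
    print("rejected:", err)
```

The second call to `has_voting_quorum` shows why an even number of voting disks buys nothing: with two disks, a node that loses one of them already fails the strict-majority test.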

One last bit: although it is not a mainstream technology (and that won’t improve now that RDS over InfiniBand is an option on Linux and soon on Solaris), I believe SCS is needed to allow RSM over SCI/Sun Fire Link to be used. The specs show quite an impressive latency of a few microseconds.

6 Comments »

  1. […] @ 12:38 pm This is the second post in a series of 3 related to geographical clusters (the first post was focusing on Solaris RAC, the last will be about cluster features such as fencing, quorum, cluster lock, etc.). It would be […]

    Pingback by RAC geographical clusters and 3rd party clusters (HP-UX) (2/3) « Christian Bilien’s Oracle performance and tuning blog — April 21, 2007 @ 7:27 pm

  2. Hi,
    Do you think Sun Cluster 3.2 for RAC has a future? I know that with Oracle 9i RAC it was popular to configure RAC with Sun Cluster, but with Oracle 10gR2 RAC, Oracle has its own clusterware. So I am confused: do we really need Oracle 10gR2 RAC (Oracle 10g Clusterware) and Sun Cluster 3.2 installed together? Or, as someone said, should we just use Oracle 10g Clusterware to keep it simple, easy to manage, and cheaper? What do you recommend? Could you provide me with some real world examples of Sun Cluster 3.2 with 10gR2 RAC?

    I know Sun Cluster 3.2 provides cluster services to those non-Oracle applications, but in case of Oracle RAC, is the Sun Cluster necessary? Thanks, I look forward to hearing from you.

    Comment by Amos — May 7, 2007 @ 7:47 pm

  3. Hi Amos,

    Thank you for your comment. If you are only interested in RAC and have no other clustering needs:

    Geo Cluster (at least 2 nodes AND 2 storage arrays):

    I believe that at the end of the day it all boils down to whether you are comfortable with ASM or not, especially when failure groups have to be reconstructed. ASM is not very intuitive for the beginner, and it is not so widespread, so expertise may be lacking. If you would rather avoid ASM, I favor the Veritas Cluster File System. The main feature SCS and VCS would bring is fencing/membership, but I have several years of RAC/Solaris experience without a 3rd party cluster, with several failures and no data corruption.

    Local cluster: As raw devices can be used and mirroring is usually offloaded to the storage array, ASM can be avoided.

    Christian

    Comment by christianbilien — May 8, 2007 @ 7:44 am

  4. […] Oracle ISM and DISM: more than a no paging scheme (2/2)… but be careful with Solaris 8 | Oracle ISM and DISM: more than a no paging scheme (1/2) | Two useful hidden parameters: _smm_max_size and _pga_max_size. | RAC geographical clusters and 3rd party clusters (HP-UX) (2/3) | RAC geographical clusters and 3rd party clusters (Sun Solaris) (1/3) […]

    Pingback by Spotlight on data base and storage array replication options (1/2) « Christian Bilien’s Oracle performance and tuning blog — June 14, 2007 @ 7:41 pm

  5. Hi Christian-

    I found your blog entry re: geographic RAC clusters very useful, as it’s something that we’re looking at currently.

    I am reviewing options right now, but it seems like the Symantec solution is the only one that works. My environment will consist of two sites about 3000 miles apart, and I’d like to be able to implement a failure aware solution for Oracle & Siebel. A cluster would be the ideal environment, but I’m worried about the distance causing heartbeat communication issues.

    What are your thoughts on Symantec HA solution? Have you worked with any other solutions that would offer high availability for an Oracle environment across such long distances?

    Cheers,

    Khoa

    Comment by Khoa — October 10, 2007 @ 4:42 am

  6. Khoa,

    Frankly, I would be extremely cautious before setting up an infrastructure where a heartbeat is used over such a distance.

    However, what you are describing may actually be a continental cluster (2 local clusters, with manual or semi-automatic switchover between the 2 sites). I know Symantec/VCS and HP MC/ServiceGuard have such an option, but I have never used them, nor do I know anyone using either of them.

    Another option would be Data Guard without the broker: it is a cheap replication, but you may potentially lose some data (because you would use asynchronous replication). Manual intervention would also be required for the switchover.

    Christian

    Comment by christianbilien — October 10, 2007 @ 2:58 pm

