Friday 15 October 2010

All about Websphere MQ Clustering

How WMQ Clusters work
>>Role of the CLUSRCVR channel
Every queue manager in the cluster must have a cluster receiver definition, known as a CLUSRCVR.   The CLUSRCVR is very different to a standard WMQ receiver channel as it has more fields, such as the connection name and port that you would normally find in a sender channel.  This is because the CLUSRCVR channel is also used to build the auto-defined CLUSSDR channel when the queue managers start communicating with each other.
The CLUSRCVR definition advertises the queue manager within the cluster. It is used by other queue managers in the cluster to auto-define their corresponding cluster sender (CLUSSDR) channel definitions to that queue manager.  Consequently, once a connection has been established between 2 queue managers in a cluster, any changes to a CLUSSDR channel need to be made on the corresponding CLUSRCVR channel
Note: If you alter a cluster receiver channel, the changes are not immediately reflected in the corresponding automatically defined cluster sender channels. If the sender channels are running, they will not pick up the changes until the next time they restart. If the sender channels are in retry, the next time they hit a retry interval, they will pick up the changes.
>>Role of the CLUSSDR channel
The important thing to remember about CLUSSDR channels is that every queue manager must have at least one manually defined CLUSSDR and this must point to one of the full repository queue managers.   It doesn’t matter which repository queue manager it makes its initial connection with.  Once the initial connection has been made auto-defined cluster channels are defined as necessary to communicate within the cluster.
Note.  Even the manually defined cluster sender channels attributes are overwritten with information from the corresponding CLUSRCVR channel once communication has been established.  Manually defined cluster sender channels that have also been altered this way with auto definitions are shown as CLUSSDRB channels in the output of the DISPLAY CLUSQMGR(*) command. Channels that have been only been defined automatically are displayed as CLUSSDRA channels.
The manually defined CLUSSDR channels to the full repositories are used to transmit cluster information (as well as user data).  For this reason the repository queue managers must fully interconnected with a manually defined CLUSSDR channel to each of the repository queue managers in the cluster.  Under normal circumstances there should only be 2 repository queue managers in the cluster.
>>Role of the REPOSITORY queue managers.
Repository queue managers maintain a complete set of information about all of the queue managers and clustered objects within the cluster.  Partial repository queue managers build up their repositories by making enquiries to a full repository when they need to access a queue or queue manager within a cluster.  The full repository provides the partial repository with the information required to build an auto-defined cluster sender channel to the required queue manager.
You should always have at least 2 full repositories in the cluster so that in the event of the failure of a full repository, the cluster can still operate. If you only have one full repository and it loses its information about the cluster, then manual intervention on all queue managers within the cluster will be required in order to get the cluster working again. If there are two or more full repositories, then because information is always published to 2 full repositories, the failed full repository can be recovered with the minimum of effort.
It is possible to have more than 2 full repositories, but is not recommended unless there is a compelling reason to do so (such as a geographically dispersed cluster).
Subscriptions to cluster objects from partial repositories are only ever made to 2 full repositories, therefore adding additional repositories doesn’t give any benefit and can actually result in additional administration (as extra manual cluster sender channels are required to connect the full repositories) and there is a higher likelihood that the repositories become out of sync. 
Once a cluster has been setup, the amount of messages that are sent to the full
repositories from the partial repositories in the cluster is very small. Partial repositories will re-subscribe for cluster queue and cluster queue manager information every 30 days. Other than this, internal messages are not sent between the full and partial repositories unless a change occurs to a resource within the cluster, in which case the full repositories will notify the partial repositories that have previously registered an interest in that resource.
>>The Repository Process
The repository process, (amqrrmfa) is started with the queue manager and is essential for any cluster processing to take place.  It is not possible to restart the process independently of the queue manager and you should never end the process manually unless you are manually ending the whole queue manager.
When the repository process starts up the contents of the SYSTEM.CLUSTER.REPOSITORY.QUEUE is loaded into memory (cache).  It also processes messages on the SYSTEM.CLUSTER.COMMAND.QUEUE and the SYSTEM.CLUSTER.TRANSMIT.QUEUE.
If the process encounters an error reading from any of these queues it will try to restart itself unless the error is classified as severe.   If the process does end abnormally, the first thing to check for is for any preceding errors in the error log which will often confirm what the issue is e.g
AMQ9511: Messages cannot be put to a queue.
The attempt to put messages to queue 'SYSTEM.CLUSTER.COMMAND.QUEUE'
on queue manager 'MYQMGR' failed with reason code 2087.
>>What happens when a queue is opened for the first time?
>An application connects to a queue manager and issues an MQOPEN against a cluster queue.  The local repository cache is checked to see if there is already an entry for this queue.  In this scenario there is no information in the cache as it is the first time an MQOPEN request has been made for this queue on this queue manager.
>A message is put to the SYSTEM.CLUSTER.COMMAND.QUEUE requesting the repository task to subscribe for the queue.
>When the repository task is running, it has the SYSTEM.CLUSTER.COMMAND.QUEUE open for input and is waiting for messages to arrive. It reads the request from the queue.
>The repository task creates a subscription request. It places a record in the
repository cache indicating that a subscription has been made and this record is also hardened to the SYSTEM.CLUSTER.REPOSITORY.QUEUE. This queue is where the hardened version of the cache is kept and is used when the repository task starts, to repopulate the cache.
>The subscription request is sent to 2 full repositories. It is put to the SYSTEM.CLUSTER.TRANSMIT.QUEUE awaiting delivery to the
SYSTEM.CLUSTER.COMMAND.QUEUE on the full repository queue managers.
>The channel to the full repository queue manager is started automatically and the
message is delivered to the full repository. The full repository processes the message and stores the subscription request.
>The full repository queue manager sends back the information about the queue being opened to the SYSTEM.CLUSTER.COMMAND.QUEUE on the partial repository queue manager.
>The message is read from the SYSTEM.CLUSTER.COMMAND.QUEUE by the
repository task.
>The information about the queue is stored in the repository cache and hardened to the SYSTEM.CLUSTER.REPOSITORY.QUEUE
At this point the partial repository knows which queue managers host the queue. What it would then need to find out is information on the channels that the hosts of the queue have advertised to the cluster, so that it can create auto-defined cluster sender channels to them. To do this, more subscriptions would be made (if necessary) to the full repositories.

No comments:

Post a Comment