Cluster Manager Configuration

The Cluster Manager controls processing and publishing servers. It monitors the local cluster for availability, fences off problematic servers, and initiates the shutdown and switchover of a processing cluster when server quorum is lost. The parameters in create_config.py mainly apply to processing servers.

Table 1, Cluster Manager Configuration Parameters, lists the Cluster Manager parameters.

For more information about the MATRIXX environment variables, see the discussion about container directories and environment variables in MATRIXX Installation and Upgrade.

Table 1. Cluster Manager Configuration Parameters
Each entry lists the parameter, its description, the corresponding create_config.info question, and the default value.
Engine Failure Wait Time
The amount of time that the Cluster Manager waits after a heartbeat message is not received before declaring an engine failure. At the end of the wait time, the Cluster Manager initiates failover operations.

create_config.info question: ClusterMgr:How long (in seconds) should the Cluster Manager wait before declaring an engine failure?

Default value: 12
Cluster Failure Wait Time
The amount of time that the Cluster Manager waits after a heartbeat message is not received before declaring an intra-engine cluster failure. At the end of the wait time, the Cluster Manager initiates failover operations.

create_config.info question: ClusterMgr:How long (in seconds) should the Cluster Manager wait before declaring an intra-engine cluster failure?

Default value: 10
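
Both wait-time parameters follow the same pattern: the Cluster Manager tracks the most recent heartbeat and declares a failure only after the configured window elapses without one. The following sketch is illustrative only; the class and variable names are hypothetical, and this is not the Cluster Manager implementation.

    import time

    # Illustrative sketch: how a heartbeat wait time such as the
    # Engine Failure Wait Time (12 seconds by default) could be applied.
    ENGINE_FAILURE_WAIT_TIME = 12  # seconds

    class HeartbeatMonitor:
        def __init__(self, wait_time):
            self.wait_time = wait_time
            self.last_heartbeat = time.monotonic()

        def record_heartbeat(self):
            # Called whenever a heartbeat message arrives.
            self.last_heartbeat = time.monotonic()

        def failure_declared(self):
            # A failure is declared only after wait_time seconds with no
            # heartbeat; at that point failover operations would begin.
            return (time.monotonic() - self.last_heartbeat) > self.wait_time

    monitor = HeartbeatMonitor(ENGINE_FAILURE_WAIT_TIME)
    if monitor.failure_declared():
        print("No heartbeat within the wait time: initiate failover")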
Intra-Cluster Fencing Agent
(Processing and publishing servers) The name of the program that implements the fencing agent within one cluster, fence_agent_kill. This fencing agent kills the MTX service processes on the specified fenced servers. It uses SSH to access each remote fenced server and invokes a kill command locally on that server. The agent depends on SSH being configured for passwordless access between any two hosts in the cluster. It is located in the ${MTX_BIN_DIR}/resource_agent.d/ directory.

create_config.info question: ClusterMgr:What is the name of the intra-cluster fencing agent to use?

Default value: ${MTX_BIN_DIR}/resource_agent.d/fence_agent_kill
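
The following sketch shows what a fencing agent of this kind could do: connect to each fenced server over passwordless SSH and kill its MTX service processes. It is a minimal illustration only, with a hypothetical process-name pattern, and does not reflect the actual fence_agent_kill implementation.

    import subprocess
    import sys

    # Illustrative sketch only; assumes passwordless SSH between cluster hosts.
    def fence_host(hostname, process_pattern="mtx"):
        # process_pattern is a hypothetical placeholder for the MTX service
        # processes that the agent stops on the fenced server.
        result = subprocess.run(
            ["ssh", hostname, "pkill", "-9", "-f", process_pattern],
            capture_output=True,
            text=True,
        )
        # pkill exits 0 when processes were signaled and 1 when none matched.
        return result.returncode in (0, 1)

    if __name__ == "__main__":
        for host in sys.argv[1:]:
            status = "fenced" if fence_host(host) else "fencing failed"
            print(f"{host}: {status}")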
Processing Server Quorum
(Processing pods) Sets the minimum number of processing pods that must be active for the engine not to fail over. The allowed values are:
  • 0 - No quorum number is set. This is the default value. This value is required if starting the engine with the Fast Start option. See the discussion about start_engine.py for more information.
  • 1 - A quorum of 1 processing pod.
  • 2 - A quorum of 2 processing pods. This is valid for two or three processing pods.
  • 3 - A quorum of 3 processing pods.
  • -1 - This setting is retained for historical compatibility. It sets the quorum to the total number of pods divided by 2, plus one ((n/2) + 1), using integer division. The result is that one pod is a quorum of 1, and two or three pods are a quorum of 2. A worked example follows this table entry.

In Engine Operator-based deployments, configure a quorum of 2 (or any value greater than 1 that is appropriate for the number of processing pods in the deployment). Configure a quorum of 0 in Topology Operator-based deployments, where Cluster Monitor is enabled by default.

create_config.info question: ClusterMgr:What is the required number of blades for quorum?

Note: If the number of processing pods matches the specified quorum number, then only leader nodes write to transaction logs, and non-leader nodes do not write to the logs. This prevents multiple processing pods from logging duplicate transactions.
Default value: 0
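
As a worked example of the quorum values above, including the legacy -1 formula, the following sketch is illustrative only; the function names are hypothetical and are not part of create_config.py or the Cluster Manager.

    # Illustrative sketch: evaluating the Processing Server Quorum values.
    def effective_quorum(configured_quorum, total_pods):
        if configured_quorum == -1:
            # Legacy setting: (n / 2) + 1 with integer division, so
            # 1 pod -> quorum of 1, and 2 or 3 pods -> quorum of 2.
            return (total_pods // 2) + 1
        return configured_quorum

    def quorum_lost(active_pods, configured_quorum, total_pods):
        quorum = effective_quorum(configured_quorum, total_pods)
        # A quorum of 0 (the default) means no quorum check is applied.
        return quorum > 0 and active_pods < quorum

    print(effective_quorum(-1, 3))   # 2
    print(quorum_lost(1, 2, 3))      # True: fewer active pods than the quorum of 2
    print(quorum_lost(1, 0, 3))      # False: quorum checking is disabled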
Standby Cluster Startup Wait Time
(Processing servers) When a STANDBY cluster starts up, the amount of time to wait for its peer cluster to become ACTIVE before considering the peer failed and starting itself in the ACTIVE HA state. A value of 0 disables this feature; in that case, the secondary engine cluster waits indefinitely and does not start in the ACTIVE HA state.

create_config.info question: ClusterMgr:How long (in seconds) should the Cluster Manager on a secondary cluster wait for a primary cluster?

Default value: 0
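
The following sketch illustrates how a startup wait of this kind could behave, including the wait-forever case when the value is 0. It is illustrative only, with hypothetical helper names, and does not reflect the Cluster Manager implementation.

    import time

    # Illustrative sketch: a STANDBY cluster waiting for its peer to become ACTIVE.
    def standby_startup(peer_is_active, wait_time, poll_interval=1.0):
        # peer_is_active is a callable that returns True once the peer is ACTIVE.
        started = time.monotonic()
        while not peer_is_active():
            if wait_time > 0 and (time.monotonic() - started) >= wait_time:
                # The peer did not become ACTIVE in time, so this cluster
                # starts in the ACTIVE HA state instead.
                return "ACTIVE"
            # A wait_time of 0 disables the timeout: the standby waits indefinitely.
            time.sleep(poll_interval)
        return "STANDBY"

    print(standby_startup(lambda: False, wait_time=2))  # prints "ACTIVE" after about 2 seconds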