Cluster Manager Configuration
The Cluster Manager controls processing and publishing servers. It monitors the local cluster for availability, fences off problematic servers, and initiates the shutdown and switch-over of a processing cluster when server quorum is lost. The parameters in create_config.py mainly apply to processing servers.
The following table, Cluster Manager Configuration Parameters, lists the Cluster Manager parameters.
For more information about the MATRIXX environment variables, see the discussion about container directories and environment variables in MATRIXX Installation and Upgrade.
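For orientation, the sketch below collects the ClusterMgr questions from the table that follows, paired with their documented default values as answers. The question text and defaults are verbatim from this section; the question-then-answer file layout is an assumption made for illustration and may not match the actual create_config.info syntax.

```
# Illustrative sketch only: the real create_config.info layout may differ.
# Questions and default answers are taken from the table below.
ClusterMgr:How long (in seconds) should the Cluster Manager wait before declaring an engine failure?
12
ClusterMgr:How long (in seconds) should the Cluster Manager wait before declaring an intra-engine cluster failure?
10
ClusterMgr:What is the name of the intra-cluster fencing agent to use?
${MTX_BIN_DIR}/resource_agent.d/fence_agent_kill
ClusterMgr:What is the required number of blades for quorum?
0
ClusterMgr:How long (in seconds) should the Cluster Manager on a secondary cluster wait for a primary cluster?
0
```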
Parameter | Description | Default Value |
---|---|---|
Engine Failure Wait Time | The amount of time that the Cluster Manager waits to declare an engine failure after a heartbeat message has not been received. At the end of the wait time, the Cluster Manager initiates failover operations. (A sketch of this heartbeat-timeout behavior follows the table.) create_config.info question: ClusterMgr:How long (in seconds) should the Cluster Manager wait before declaring an engine failure? | 12 |
Cluster Failure Wait Time | The amount of time that the Cluster Manager waits to declare an intra-engine cluster failure after a heartbeat message has not been received. At the end of the wait time, the Cluster Manager initiates failover operations. create_config.info question: ClusterMgr:How long (in seconds) should the Cluster Manager wait before declaring an intra-engine cluster failure? | 10 |
Intra-Cluster Fencing Agent | (Processing and publishing servers) The name of the program that implements the fencing agent within one cluster, fence_agent_kill. This fencing agent kills the MTX service processes on the specified fenced servers. It uses SSH to access the remote fenced server and invokes a kill command to stop the server locally. The agent depends on SSH being configured for password-less access between any two hosts in the cluster, and is located in the ${MTX_BIN_DIR}/resource_agent.d/ directory. (An illustrative sketch of this SSH fencing mechanism follows the table.) create_config.info question: ClusterMgr:What is the name of the intra-cluster fencing agent to use? | ${MTX_BIN_DIR}/resource_agent.d/fence_agent_kill |
Processing Server Quorum | (Processing pods) Sets the minimum number of processing pods that must be active for the engine not to fail over. In Engine Operator-based deployments, configure a quorum of 2 (or any value greater than 1 that is appropriate for the number of processing pods in the deployment). In Topology Operator-based deployments, where the Cluster Monitor is enabled by default, configure a quorum of 0. Note: If the number of processing pods matches the specified quorum number, only leader nodes write to transaction logs; non-leader nodes do not. This prevents multiple processing pods from logging duplicate transactions. create_config.info question: ClusterMgr:What is the required number of blades for quorum? | 0 |
Standby Cluster Startup Wait Time | (Processing servers) When a STANDBY cluster starts up, the amount of time it waits for its peer cluster to become ACTIVE before considering the peer failed and starting itself in ACTIVE HA state. A value of 0 disables this feature; in that case, the secondary engine cluster waits indefinitely and does not start in an ACTIVE HA state. create_config.info question: ClusterMgr:How long (in seconds) should the Cluster Manager on a secondary cluster wait for a primary cluster? | 0 |
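The two wait-time parameters above describe a heartbeat-based failure detector: when no heartbeat has arrived within the configured window, the Cluster Manager declares the failure and initiates failover. The Python sketch below illustrates only that timeout logic; the class, method names, and structure are hypothetical and do not reflect MATRIXX internals.

```python
import time

# Hypothetical illustration of the heartbeat timeout behavior described above.
# Names and structure are invented for clarity; they do not reflect MATRIXX code.
class HeartbeatMonitor:
    def __init__(self, engine_failure_wait=12.0, cluster_failure_wait=10.0):
        self.engine_failure_wait = engine_failure_wait      # seconds, "Engine Failure Wait Time"
        self.cluster_failure_wait = cluster_failure_wait    # seconds, "Cluster Failure Wait Time"
        now = time.monotonic()
        self.last_engine_heartbeat = now
        self.last_cluster_heartbeat = now

    def record_engine_heartbeat(self):
        self.last_engine_heartbeat = time.monotonic()

    def record_cluster_heartbeat(self):
        self.last_cluster_heartbeat = time.monotonic()

    def check(self):
        """Return the failures to declare, if any, based on elapsed time."""
        now = time.monotonic()
        failures = []
        if now - self.last_cluster_heartbeat > self.cluster_failure_wait:
            failures.append("intra-engine cluster failure")  # triggers failover
        if now - self.last_engine_heartbeat > self.engine_failure_wait:
            failures.append("engine failure")                # triggers failover
        return failures
```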
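fence_agent_kill itself ships with MATRIXX in ${MTX_BIN_DIR}/resource_agent.d/; the Python sketch below only illustrates the mechanism the table attributes to it: reach the fenced host over password-less SSH and kill the MTX service processes there. The function name, the pkill pattern, and the exact ssh options are assumptions for illustration, not the agent's actual implementation.

```python
import subprocess

def fence_host_via_ssh(host: str) -> bool:
    """Hypothetical sketch of what an intra-cluster fencing agent might do:
    use password-less SSH to reach the fenced server and kill its MTX
    service processes locally. The 'pkill -f mtx' pattern is an assumption."""
    # BatchMode=yes makes ssh fail fast instead of prompting for a password,
    # matching the requirement that password-less SSH be configured
    # between any two hosts in the cluster.
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, "pkill", "-f", "mtx"],
        capture_output=True,
        text=True,
        timeout=30,
    )
    # pkill exits 0 when it signalled at least one matching process.
    return result.returncode == 0
```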