cluster_mgr_cli.py
The cluster_mgr_cli.py script provides a simple command-line client that can be run on an off-engine server, such as one in a Network Operations Center (NOC), to manage certain cluster operations. The script can retrieve cluster and peer cluster HA states, low-level information about cluster status, the Cluster Manager leader, and the cluster schema version. It can also shut down a target cluster.
State | Description | SNMP ID |
---|---|---|
UNKNOWN | The state of the remote cluster is unknown. Either a connection to the remote peer was never established, or a connection was established and then lost within the failure detection timeout period (it is not yet considered failed). | 0 |
START | A cluster is in the process of starting, which involves waiting for a cluster quorum and other conditions to be met. | 1 |
PRE_INIT | A cluster is waiting for the latest pricing version to be loaded into the newly upgraded engine. This HA state is only transitioned to during a MATRIXX Engine software upgrade. | 2 |
INIT | A cluster is in the process of initializing its databases from a checkpoint file or a running cluster. The cluster then transitions into an active or standby cluster. Note: When a standby cluster is in the INIT state, a pending transactions message might be written to the mtx_debug.log file. This message is informational and does not indicate a problem. It can occur when a server in the standby cluster receives a parallel balance transaction to replay but has not yet received the checkpoint transaction for the same balance set object. Because the parallel balance transaction has no absolute balance to which the difference can be applied, the transaction is saved as pending for a short time until the checkpoint transaction for that balance set object arrives. During this period, the message is recorded and indicates that the standby cluster is synchronizing with the active cluster. The number of pending transactions resolves as the synchronization completes. | 3 |
POST_INIT | A cluster was upgraded to a new software version, and it is undergoing the schema upgrade and conversion transformations to handle any caveats before entering a standby state. This HA state is only transitioned to during a MATRIXX Engine software upgrade. | 4 |
STANDBY_SYNC | For a standby cluster, the servers are synchronizing their databases by replaying transactions. This state occurs during an engine start-up, switchover, or failover. For an active cluster, the servers are replaying transactions after an engine switchover to synchronize their databases. The STANDBY_SYNC state precedes the STANDBY state. | 5 |
STANDBY | A cluster is ready to replay transaction logs. During typical runtime operations, the cluster in the secondary engine is in a STANDBY HA state. | 6 |
ACTIVE_SYNC | The cluster was selected as the active cluster and is in the process of synchronizing its databases in real time from its queued replay transactions. This state is transitional from STANDBY to ACTIVE. If an engine in a FAILED state is detected, a STANDBY engine transitions to ACTIVE_SYNC. If the FAILED engine has a processing pod that is still able to process requests, the ACTIVE_SYNC engine never detects that all transactions have completed. In that case, the ACTIVE_SYNC engine shuts down after a configurable timeout period. The duration of the timeout period is the product of the | 7 |
ACTIVE | A cluster is actively processing incoming network traffic. During typical runtime operations, the cluster in the primary engine is in an ACTIVE HA state. | 8 |
EXIT | The servers in a cluster are exiting so the cluster can be stopped without causing quorum issues. | 10 |
STOP | A cluster is stopping. | 11 |
FINAL | A cluster is stopped. | 12 |
FAILED | A cluster had a connection to a remote peer cluster and lost the connection permanently. This condition occurs when a connection cannot be restored within the failure-detection timeout period. The peer cluster is viewed by the cluster as failed. | 13 |
NONE | A pseudo state added for the Traffic Routing Agent to identify an engine cluster. This value is used when no peer cluster is configured for a MATRIXX Engine environment. | 14 |
OFFLINE | A cluster is not stopped, but after the process of replaying transactions is completed, ports on Traffic Routing Agent load-balancing instances (TRA-PROCs) are blocked, so that the cluster is isolated from the rest of the topology. A peer cluster in a STANDBY HA state, if present, transitions to an ACTIVE HA state, as if the cluster in the OFFLINE state has been stopped. | 15 |
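When a monitoring tool receives one of these SNMP IDs, it helps to translate the integer back into a state name. The following is a minimal sketch of such a lookup, with the values copied from the table above. The names `HaState` and `describe_ha_state` are hypothetical and are not part of cluster_mgr_cli.py or any MATRIXX API.

```python
from enum import IntEnum


class HaState(IntEnum):
    """HA states and their SNMP IDs, copied from the table above."""
    UNKNOWN = 0
    START = 1
    PRE_INIT = 2
    INIT = 3
    POST_INIT = 4
    STANDBY_SYNC = 5
    STANDBY = 6
    ACTIVE_SYNC = 7
    ACTIVE = 8
    EXIT = 10
    STOP = 11
    FINAL = 12
    FAILED = 13
    NONE = 14
    OFFLINE = 15


def describe_ha_state(snmp_id: int) -> str:
    """Return the state name for an SNMP ID, or a marker for unlisted IDs."""
    try:
        return HaState(snmp_id).name
    except ValueError:
        # The table lists no state for ID 9, so it also ends up here.
        return f"UNLISTED({snmp_id})"


if __name__ == "__main__":
    print(describe_ha_state(8))
    print(describe_ha_state(9))
```

Returning a marker string for unlisted IDs, rather than raising, keeps a polling loop running if a newer engine version reports an ID that this table does not cover.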
Syntax
/opt/mtx/bin/cluster_mgr_cli.py [-h] -t target [ get cluster_state | get cluster_ha_state | get excluded_nodes | clear excluded_nodes | get schema_version | get peer_clusters | shutdown cluster | switchover active_cluster ]
Supported Options
- -h, --help
- Prints help information about this script.
- -t, --target ipaddress:cli_port
- The Traffic Routing Agent virtual IP address (VIP) and port of the cluster_control virtual server for an engine cluster.
- get cluster_state
- Prints the HA state (SNMP ID) of the target cluster as the integer ID defined in the SNMP MIB file.
- get cluster_ha_state
- An alias for the get cluster_state command for backward compatibility.
- get excluded_nodes
- Prints the node IDs of the servers that could not be fenced off from the cluster and were therefore added to the Cluster Management Protocol (CMP) block list.
- clear excluded_nodes
- Removes the list of nodes from the CMP block list. This option enables them to rejoin the cluster and CMP. This option returns 0 upon success.
- get schema_version
- Prints the schema version of the target cluster.
- setto offline_cluster
- Puts the cluster in an offline, isolated state without fully stopping the cluster. A standby cluster, if present, becomes active. During a multi-cluster upgrade, restoring a cluster from the offline state is faster than restarting it.
- clear offline_cluster
- Restores the cluster to an online, non-isolated state.
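For scripted use from a NOC host, the supported options above can be wrapped in a small helper that rejects unsupported commands before anything is executed. This is a sketch only: the function names `build_command` and `run_command` are hypothetical, and the address and port in the usage example are placeholders rather than values from a real deployment.

```python
import subprocess
from typing import List

SCRIPT = "/opt/mtx/bin/cluster_mgr_cli.py"

# Command pairs taken from the "Supported Options" list above.
SUPPORTED_COMMANDS = {
    ("get", "cluster_state"),
    ("get", "cluster_ha_state"),
    ("get", "excluded_nodes"),
    ("clear", "excluded_nodes"),
    ("get", "schema_version"),
    ("setto", "offline_cluster"),
    ("clear", "offline_cluster"),
}


def build_command(target: str, verb: str, noun: str) -> List[str]:
    """Build the argument list for one supported command.

    target must have the form ipaddress:cli_port, as described for -t.
    """
    if (verb, noun) not in SUPPORTED_COMMANDS:
        raise ValueError(f"not a supported command: {verb} {noun}")
    host, sep, port = target.rpartition(":")
    if not sep or not host or not port.isdecimal():
        raise ValueError(f"target must be ipaddress:cli_port, got {target!r}")
    return [SCRIPT, "-t", target, verb, noun]


def run_command(target: str, verb: str, noun: str) -> subprocess.CompletedProcess:
    """Run the command and capture its output without raising on failure."""
    return subprocess.run(
        build_command(target, verb, noun),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # Placeholder target; substitute the VIP and port of your cluster_control
    # virtual server.
    print(build_command("192.0.2.10:4800", "get", "schema_version"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems and keeps the target value from being interpreted by a shell.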
Unsupported Options
- get peer_clusters
- Prints information about each peer cluster in an active-standby HA configuration, including the cluster ID, cluster state, cluster substate, schema version, and peer cluster ID.
- shutdown cluster
- Requests an orderly shutdown of the target cluster. Returns 0 upon success. Warning: Stopping a cluster results in an engine failover.
- switchover active_cluster
- Requests a switchover of the active and standby peer clusters. Returns 0 upon success.
Use the activate_engine.py or activate_cluster.py script instead of this command. These scripts perform more checks before initiating a switchover operation.
Important: If your production environment has three running engines, you cannot switch the active and standby states of two clusters. You must first stop the engine that is not part of the switchover operation.
Display the HA State of a Cluster
Display the HA state of the cluster using VIP 10.10.1.1:8
This output shows that the cluster is in the ACTIVE HA state.
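Because get cluster_state reports the state as an integer SNMP ID, a wrapper script can check for the ACTIVE state by comparing the printed value with 8. The sketch below assumes that the command's standard output contains only that integer, possibly surrounded by whitespace. The exact output format is not shown here, so treat that as an assumption. The function names are hypothetical.

```python
ACTIVE_SNMP_ID = 8  # SNMP ID of the ACTIVE state in the HA state table


def parse_ha_state_id(stdout_text: str) -> int:
    """Extract the SNMP ID from captured output.

    Assumes the output is a bare non-negative integer; anything else is
    reported as an error instead of being guessed at.
    """
    text = stdout_text.strip()
    if not text.isdecimal():
        raise ValueError(f"unexpected output: {stdout_text!r}")
    return int(text)


def is_active(stdout_text: str) -> bool:
    """Return True when the captured output reports the ACTIVE state."""
    return parse_ha_state_id(stdout_text) == ACTIVE_SNMP_ID


if __name__ == "__main__":
    print(is_active("8\n"))
    print(is_active("6\n"))
```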