Monitor Transaction Replay Progress on the Standby Engine

Run the print_blade_stats.py script with the -R option to display the number of outstanding checkpoint log files during local transaction replay and the number of outstanding replay batches during remote replay when completing an InitDatabase request.

About this task

When the standby cluster is running, the number of outstanding batches to replay should be less than or equal to the number of processing servers, and the number of outstanding checkpoint files to process should be zero. After a failover operation or engine startup, there can be one or more outstanding checkpoints to process. The checkpoint value returns to zero after the cluster restores the databases from a checkpoint. Note that when a standby cluster is configured but not running, the outstanding batch value is zero.

If you are monitoring replay statistics during runtime operations, perform this task on either server in the active processing cluster. If you are monitoring replay statistics when you are first starting the standby engine, perform this task on the server in the active processing cluster with the lowest server ID. This is the server that receives the InitDatabase request from the standby cluster.

Procedure

In a terminal, enter the following command to view the transaction replay statistics, where bladeId is the ID of the processing server in the active cluster.

print_blade_stats.py -b bladeId -R
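
For example, to display the replay statistics for the processing server with blade ID 1 (an illustrative value):

print_blade_stats.py -b 1 -R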

Results

For an example of the output, see the discussion about print_blade_stats.py.

For an engine to change from STANDBY to ACTIVE, the checkpoint replay file count must be 0. If you try to activate the engine before the count reaches 0, the operation is rejected.
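
If you prefer to watch the counters rather than rerun the command by hand, the following is a minimal sketch of a polling loop. It assumes print_blade_stats.py is on the PATH of the server where you run it; the blade ID and polling interval are illustrative only, and the output must still be read against the format described in the print_blade_stats.py discussion.

#!/usr/bin/env python3
# Sketch: periodically display transaction replay statistics for one blade.
# Assumes print_blade_stats.py is on the PATH; the blade ID and interval are examples.
import subprocess
import time

BLADE_ID = "1"          # example only; use the ID of a processing server in the active cluster
INTERVAL_SECONDS = 30   # example polling interval

while True:
    # Run the documented command and show its output unchanged.
    subprocess.run(["print_blade_stats.py", "-b", BLADE_ID, "-R"], check=False)
    time.sleep(INTERVAL_SECONDS)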

Important: If transaction replay has failed on a STANDBY cluster, the failed transactions are logged to the ${MTX_SHARED_DIR}/bad directory and an error containing the string "failed to replay transaction" is written to mtx_debug.log. Monitor this directory because these transactions must be reprocessed. To reprocess failed transactions, restart the STANDBY cluster to re-sync its data with the ACTIVE cluster.
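
As a convenience, the following is a minimal sketch that checks for failed replay programmatically. It assumes the MTX_SHARED_DIR environment variable is set where it runs; the mtx_debug.log path used below is an example only, not a documented location.

#!/usr/bin/env python3
# Sketch: report failed transaction replay on a STANDBY cluster.
# Assumes MTX_SHARED_DIR is set; the mtx_debug.log path is an example, not a documented location.
import os
from pathlib import Path

bad_dir = Path(os.environ["MTX_SHARED_DIR"]) / "bad"
log_path = Path("mtx_debug.log")  # replace with the actual location of mtx_debug.log

# List any failed transaction files in the bad directory.
bad_files = [entry for entry in bad_dir.iterdir() if entry.is_file()] if bad_dir.is_dir() else []
if bad_files:
    print(f"{len(bad_files)} failed transaction file(s) found in {bad_dir}")

# Report log lines that record replay failures.
if log_path.is_file():
    with log_path.open(errors="replace") as log:
        for line in log:
            if "failed to replay transaction" in line:
                print(line.rstrip())

If either check reports failures, restart the STANDBY cluster to re-sync its data with the ACTIVE cluster.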

For more information about the MATRIXX environment variables, see the discussion about container directories and environment variables in MATRIXX Installation and Upgrade.