Transaction Replay

Transactions are replayed first from all MATRIXX Engine processing pods in an engine to the active publishing pod in that engine. This creates the first redundant set of transaction events.

In multi-engine installations, transactions are then streamed (replayed) from the active engine processing cluster to the first standby engine processing cluster and then to the active publishing pod on that engine. If a third engine is configured, the first standby processing cluster streams transactions to the second standby processing cluster and then to the active publishing pod on that engine. This process keeps the data in all engines synchronized so that processing can be switched to a standby engine if the active engine fails. Each streamed transaction is transactionally consistent with the original transaction message that was processed.
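
The streaming order described above can be sketched as a simple chain. The following Python sketch is illustrative only: the engine labels and the `stream_hops` function are hypothetical and are not part of MATRIXX. It lists the hops a transaction takes, assuming the engines are given in order (active engine first, then the standby engines).

```python
def stream_hops(engines):
    """Return the ordered (source, destination) hops for one transaction.

    `engines` is an ordered list: the active engine first, then the
    standby engines. The labels are hypothetical, not MATRIXX identifiers.
    """
    hops = []
    for i, engine in enumerate(engines):
        if i > 0:
            # The previous engine's processing cluster streams to this
            # engine's processing cluster.
            hops.append((f"{engines[i - 1]}/processing", f"{engine}/processing"))
        # Within each engine, the processing pods replay transactions
        # to that engine's active publishing pod.
        hops.append((f"{engine}/processing", f"{engine}/publishing"))
    return hops


if __name__ == "__main__":
    for source, destination in stream_hops(["E1", "E2", "E3"]):
        print(f"{source} -> {destination}")
```

For a three-engine list, the sketch yields five hops: one replay to the publishing pod inside each engine, plus one stream between each pair of neighboring processing clusters.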

Transactions are replayed immediately in the following situations:

During runtime, to stream event transactions from the active engine to the standby engines.
The processing pod then streams the transactions to the other engine pods to create a nearly duplicate set of transaction events. If the active publishing pod fails, the standby publishing pod can continue processing normally. If the next standby engine in line is missing any transactions, it requests them from the processing pod.
During runtime, to ensure resiliency against network instability between sites.
During transaction streaming, if a network connection between an active engine and standby engine becomes unstable, the standby engine requests the transactions for any missing GTCs once the network connection is restored.
After a standby engine starts, to sync its databases with the engine from which it receives transactions.
When a standby engine starts, it sends a request to its supporting engine to initialize its database set. In a two-engine environment, the request is sent to the active engine. In a three-engine environment, the first standby engine sends the request to the active engine, and the second standby engine sends the request to the first standby engine. The request causes the supporting engine to send batches of its in-memory database image (an in-memory checkpoint) to the starting engine. This operation is much faster than replaying a checkpoint file.
After a total system failure, when performing a cold restart of the primary engine to recover the database.
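
Two of the situations above can be illustrated with a short Python sketch: finding the transactions a standby engine is missing, and choosing which engine a starting standby engine asks for its database image. This is a sketch under stated assumptions, not MATRIXX code. It assumes that GTCs behave like consecutive integer sequence numbers, which the text does not state, and the function names and engine labels are hypothetical.

```python
def find_missing_gtcs(received, first, last):
    """Return the GTCs in the range first..last that were not received.

    Assumption: GTCs are consecutive integers, so a gap in the sequence
    means a transaction was not received and must be requested again.
    """
    seen = set(received)
    return [gtc for gtc in range(first, last + 1) if gtc not in seen]


def supporting_engine(engines, starting):
    """Return the engine that a starting standby engine requests data from.

    `engines` is ordered: the active engine first, then the standby
    engines. Each standby engine is supported by the engine before it.
    """
    index = engines.index(starting)
    if index == 0:
        raise ValueError("the active engine has no supporting engine")
    return engines[index - 1]


if __name__ == "__main__":
    # GTCs 3 and 4 were lost while the network connection was unstable.
    print(find_missing_gtcs([1, 2, 5, 6], first=1, last=6))
    # The second standby engine initializes from the first standby engine.
    print(supporting_engine(["active", "standby1", "standby2"], "standby2"))
```

In this sketch, the standby engine would request only the GTCs returned by `find_missing_gtcs` once the connection is restored, rather than replaying the full stream.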

Figure 1 shows the transaction stream flow for processing and publishing clusters for transaction replay.

Figure 1. Transaction Stream Flow for Transaction Replay

Transaction stream flow for transaction replay

For information about analyzing transaction replay performance, see MATRIXX Administration.