Checkpoint and Transaction Replay Configuration

Use the configuration questions in create_config.info to configure database checkpoint and transaction replay behavior.

Table 1 lists the information required for checkpointing on the checkpointing pod, real-time transaction replay on a standby cluster, and database object replay on a standby cluster when that cluster is restarted.

Table 1. Checkpoint and Transaction Replay Configuration Parameters
Parameter Description Default Value
Maximum Number of Checkpoints to Maintain The number of checkpoints to save to the shared storage before removing them. Checkpoints are removed in order of when they were created, starting with the oldest checkpoint.

create_config.info question: Engine 1:Cluster 1:What is the maximum number of checkpoints to maintain?

2
Database Checkpoint Interval The number of minutes to wait between creating live checkpoints.

create_config.info question: Engine 1:Cluster 1:What is the interval between database checkpoints (in minutes)?

The default value is adequate for most MATRIXX environments. If you change this value, ensure that it is at least your checkpoint creation time plus 30 minutes. To calculate your checkpoint creation time, compare the checkpoint start and end times in your debug log (for example, by running grep handleCheckpointMsg | egrep "start |end" on the log). Checkpoint creation time increases with the size of your database.

60
Network Interface for Transaction Messages The interface used for sending transaction messages and other multicast messages to cluster members.

create_config.info question: Engine 1:Cluster 1:What network interface do you want to use for the transaction data plant?

None
Cluster Management Virtual IP Address The VIP address defined in the Traffic Routing Agent for handling cluster management and transaction replay management.

create_config.info question: Engine 1:Cluster 1: What is the cluster management virtual service address used on this cluster?

None
Delay Time in Seconds for Replaying a Transaction Log The number of seconds to delay any timeouts during replay that might occur due to slow system processing. This parameter is used for testing on slower systems and must not be used in production environments because it sets a delay in the synchronization between primary and secondary clusters.

create_config.info question: Engine 1:Cluster 1:What is the delay time in seconds to use for replaying a transaction log?

0
Timeout in Seconds for Replaying a Transaction Log The number of seconds to wait before replaying transactions. This value is used when replaying transactions during a cold restart of an engine, an in-memory database (IMDB) replay by a standby cluster, and a real-time replay by the publishing pod in the active cluster. If a transaction log or a replay batch does not complete within the timeout period, the log or replay batch is retried. During database initialization for a cold restart replay or an IMDB replay, if the replay operation makes no further progress after twice the timeout period (which allows one retry after the first timeout), the operation fails. For a cold restart, the pod is shut down. For an IMDB replay, a failed InitDatabase response is sent to the publishing pod in the peer engine so that it can shut itself down.

The timeout is also used when initiating the InitDatabase request from a standby cluster. If the connection to the active cluster is not open within the timeout period, the publishing pod on the standby cluster fails the InitDatabase process and shuts itself down. In such cases, you must restart the pod manually.

To keep a standby cluster as up to date as possible with the cluster it supports (either the active cluster or another standby cluster, as in a three-engine configuration), this value must be small.

create_config.info question: Engine 1:Cluster 1:What is the timeout in seconds to use for replaying a transaction log?

30
Maximum Number of Retries in Replaying a Transaction Log The maximum number of times to retry a transaction log in a checkpoint operation or a real-time replay operation. If a transaction log is retried more than this number of times and still fails, the Transaction Server logs a critical error and shuts down the cluster to avoid data inconsistencies between engines. In such cases, the engine must be restarted manually after the issue is investigated and fixed.

A value of 0 disables this feature.

Note: If a single transaction record from a transaction log in a checkpoint operation or a real-time replay operation is retried more than the maximum number of times and still fails, the Transaction Server logs a critical error and the warning "Creating a possibly in-consistent checkpoint", and then continues processing.

create_config.info question: Engine 1:Cluster 1:What is the maximum number of retries to use for replaying a transaction log?

10
Transaction Log Replay Compression The compression level applied to transaction data before it is replayed on a standby cluster. By default, data in transaction logs is compressed before it is sent from an active cluster to a standby cluster, which slightly delays the replay operation. These are the same batches that are logged to disk, but you can set a different compression level for them. Each higher compression level increases the delay in real-time replay to the standby cluster because of compression and decompression time. Valid values are 0–9, with 9 being the highest level of compression. A value of 0 disables this feature. The default is 5.

create_config.info question: Engine 1:Cluster 1:What is the compression level to use for replaying a transaction log?

5
Enable Binary Checkpoint Creation Specifies whether to enable binary checkpoints. A value of y enables this feature.

create_config.info question: Do you want to enable binary checkpoint creation (y/n)?

n
Number of Writing Threads for Binary Checkpoint Creation The number of writing threads to use for binary checkpoint creation.

create_config.info question: How many writing threads do you want to use for binary checkpoint creation?

8
Enable Binary Checkpoint Restore Specifies whether to enable binary checkpoint restore. A value of y enables this feature.

create_config.info question: Do you want to enable binary checkpoint restore (y/n)?

n
Number of Reading Threads for Binary Checkpoint Restore The number of reading threads to use for binary checkpoint restore.

create_config.info question: How many reading threads do you want to use for binary checkpoint restore?

8
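
The sizing rule for the Database Checkpoint Interval parameter (interval at least the checkpoint creation time plus 30 minutes) can be sketched as a small calculation. This is an illustrative helper only: the function name is hypothetical, and the timestamp layout is an assumption about how the start and end times appear in your debug log.

```python
import math
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # assumed debug log timestamp layout
SAFETY_MARGIN_MINUTES = 30  # margin stated in the parameter description


def minimum_checkpoint_interval(start: str, end: str) -> int:
    """Return the smallest interval, in whole minutes, that satisfies
    interval >= checkpoint creation time + 30 minutes."""
    started = datetime.strptime(start, TIMESTAMP_FORMAT)
    finished = datetime.strptime(end, TIMESTAMP_FORMAT)
    if finished < started:
        raise ValueError("checkpoint end time precedes start time")
    creation_minutes = (finished - started).total_seconds() / 60
    # Round up so the configured value never falls below the rule.
    return math.ceil(creation_minutes + SAFETY_MARGIN_MINUTES)


# Hypothetical timestamps: a checkpoint that took 42 minutes 30 seconds.
print(minimum_checkpoint_interval("2015-06-30 14:00:00.000000",
                                  "2015-06-30 14:42:30.000000"))  # prints 73
```

With these example timestamps the default of 60 minutes would be too short, so a value of 73 or higher would be the answer to the interval question.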

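The trade-off described for the Transaction Log Replay Compression parameter can be explored with a generic compressor. The sketch below uses Python's zlib module purely as a stand-in, because it also accepts levels 0 through 9; this document does not state which compression algorithm MATRIXX uses, and the payload is synthetic.

```python
import time
import zlib

# Synthetic, highly repetitive payload standing in for a replay batch.
payload = b"txn_id=14901;account=575;status=committed;" * 20000


def compressed_size(data: bytes, level: int) -> int:
    """Return the size in bytes of data compressed at the given zlib level."""
    return len(zlib.compress(data, level))


for level in (0, 1, 5, 9):
    started = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed_ms = (time.perf_counter() - started) * 1000
    # Round-trip check: decompression must restore the original bytes.
    assert zlib.decompress(compressed) == payload
    print(f"level {level}: {len(compressed):>8} bytes, {elapsed_ms:.2f} ms")
```

In zlib, level 0 stores the data without compressing it, which is analogous to a value of 0 disabling compression here. Timings vary by machine and by data, so measure with representative transaction data before changing the default.
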
Resolving Pending Transactions During Runtime

The following commands change how each Transaction Server on the Parallel-MATRIXX protocol proactively tries to resolve any pending transactions during runtime:

  • s/<resolve_pending_transaction_interval_in_micros>.*</<resolve_pending_transaction_interval_in_micros>1000000</
  • s/<resolve_pending_transaction_maximum_retries>.*</<resolve_pending_transaction_maximum_retries>10</

Each Transaction Server looks for any transaction that has been idle for more than the configured time interval (the default is 60 seconds) and tries to resolve any that are pending. If the same transaction cannot be resolved after the maximum number of retries (the default is 3), the Transaction Server logs a critical message and stops trying to resolve the transaction.
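
To see what a substitution of this kind does to a configuration element, the following sketch applies an equivalent regular expression in Python. The XML fragment and the helper name are hypothetical; the real configuration file layout is not shown in this document, and the starting values are chosen only to match the defaults described above.

```python
import re

# Hypothetical fragment: one element per line, interval expressed in microseconds.
config = (
    "<resolve_pending_transaction_interval_in_micros>60000000"
    "</resolve_pending_transaction_interval_in_micros>\n"
    "<resolve_pending_transaction_maximum_retries>3"
    "</resolve_pending_transaction_maximum_retries>\n"
)


def set_element(text: str, tag: str, value: int) -> str:
    """Replace the content of <tag>...</tag> on its line with value.

    The pattern matches from the opening tag up to the last "<" on the
    same line, which is the start of the closing tag.
    """
    pattern = rf"<{re.escape(tag)}>.*<"
    replacement = f"<{tag}>{value}<"
    return re.sub(pattern, replacement, text)


# 1000000 microseconds is 1 second.
updated = set_element(config, "resolve_pending_transaction_interval_in_micros", 1000000)
updated = set_element(updated, "resolve_pending_transaction_maximum_retries", 10)
print(updated)
```

Because the interval element is expressed in microseconds, a value of 1000000 corresponds to 1 second.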

When a pending transaction is resolved, the following message is written to the mtx_debug.log file:

LM_INFO 19090|19138 2015-06-30 14:39:08.008947 [transaction_server_1:1:1:1(4700.33153)] | TransactionCtxFactory::Release: pending transaction with transaction ID: [6:-:1:14901]|[1:1:2:1:0:1]|575|0 is resolved after 2 retries

If the same transaction cannot be resolved after the maximum number of retries, the Transaction Server logs a critical message and stops trying to resolve the transaction. This might require a manual restart of the pod that has not approved the transaction commit.
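
When reviewing mtx_debug.log, a small script can pull out the transactions that were resolved and how many retries each one needed. The sketch below is based only on the sample message shown above; the function name is hypothetical, and the wording of the critical message for unresolved transactions is not given in this document, so it is not matched here.

```python
import re
from typing import Optional, Tuple

# Pattern derived from the sample "is resolved after N retries" message.
RESOLVED = re.compile(
    r"pending transaction with transaction ID: (?P<txn_id>.+?) "
    r"is resolved after (?P<retries>\d+) retries"
)


def parse_resolved(line: str) -> Optional[Tuple[str, int]]:
    """Return (transaction ID, retry count) for a resolved-transaction line,
    or None if the line is some other kind of message."""
    match = RESOLVED.search(line)
    if match is None:
        return None
    return match.group("txn_id"), int(match.group("retries"))


sample = (
    "LM_INFO 19090|19138 2015-06-30 14:39:08.008947 "
    "[transaction_server_1:1:1:1(4700.33153)] | TransactionCtxFactory::Release: "
    "pending transaction with transaction ID: [6:-:1:14901]|[1:1:2:1:0:1]|575|0 "
    "is resolved after 2 retries"
)
print(parse_resolved(sample))
```

Transactions whose retry counts approach the configured maximum are worth investigating before they reach the point where the Transaction Server stops trying to resolve them.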