Database Checkpoints
Database checkpoints contain an exact snapshot of the in-memory databases (IMDBs) and configuration files at a specific time. Periodic database checkpoints are used to recover data from whole cluster failures and to reinstate an active processing cluster from a standby processing cluster after a disaster recovery operation. Use on-demand checkpoints for data analysis, setting up test environments, and troubleshooting. You validate checkpoints using the validateCheckpoint.jar utility.
Database checkpoints are created by the checkpointing pod from its local IMDB and configuration files from /opt/mtx/conf and /opt/mtx/custom. Database checkpoints are not created by the Parallel-MATRIXX™ protocol, so creating checkpoints does not affect transaction processing.
A cluster restores its data by using the following:
- The latest database checkpoint on the shared storage.
- Replaying input transaction log files on its shared storage that were not included in the database checkpoint.
The checkpointing pod replays input transaction log files from its local SSD until it creates a database checkpoint. During checkpoint creation, the checkpointing pod suspends transaction replay so that the database checkpoint is consistent, and it resumes transaction replay immediately after the database checkpoint is created on its local SSD. The database checkpoint is then moved to shared storage. The checkpointing pod continuously replays transactions and creates database checkpoints while it runs.
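The suspend-snapshot-resume behavior described above can be sketched as follows (illustrative Python, not MATRIXX code; the class and method names are ours):

```python
import copy
import threading

class CheckpointingPod:
    """Illustrative sketch only: replay is suspended while a checkpoint
    snapshot is taken, then resumed immediately afterwards."""

    def __init__(self):
        self.db = {}                       # stands in for the in-memory database
        self._replay_lock = threading.Lock()

    def replay(self, txn_id, value):
        # Each replayed transaction holds the lock, so a checkpoint never
        # observes a half-applied transaction.
        with self._replay_lock:
            self.db[txn_id] = value

    def create_checkpoint(self):
        # Suspend replay just long enough to copy the database state.
        # The copy can then be written to local SSD and moved to shared
        # storage while replay continues.
        with self._replay_lock:
            snapshot = copy.deepcopy(self.db)
        return snapshot
```

Because the snapshot is taken under the same lock that guards replay, transactions applied after `create_checkpoint` returns do not appear in the snapshot.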
If the checkpointing pod detects that pricing is being replayed while the pod creates a database checkpoint, the pod delays checkpoint creation and tries again later after pricing completes replaying.
Checkpoint Replay File Count
If the checkpoint replay file count is less than 3, create an on-demand database checkpoint (most of the time, the value is 0).
A database checkpoint includes configuration files that match the following patterns:
- asn1_dictionary*
- cdr_dictionary*
- create_config*
- diameter_dictionary*
- mdc_config*
- mtx_config*
- mtx_pricing*
- process_control*
- sysmon_config*
- topology*
- version*
Understanding Database Checkpoint Files
MATRIXX Engine writes database checkpoint files to the ${MTX_SHARED_DIR}/checkpoints directory and appends the filename with the software version and the time the checkpoint finished writing, for example, mtx_ckpt_v5050.1.1358493520.
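For illustration, the version and timestamp can be split out of such a name. The `mtx_ckpt_v<version>.<epoch-seconds>` pattern is inferred from the example above, not a documented contract:

```python
import re
from datetime import datetime, timezone

def parse_checkpoint_name(name):
    """Split a checkpoint directory name such as
    mtx_ckpt_v5050.1.1358493520 into (version, completion time).
    The naming pattern is inferred from the example, so treat this
    as a sketch rather than a parser for all releases."""
    m = re.fullmatch(r"mtx_ckpt_v(.+)\.(\d+)", name)
    if not m:
        raise ValueError(f"unexpected checkpoint name: {name}")
    version, epoch = m.group(1), int(m.group(2))
    # The trailing component is interpreted as seconds since the epoch.
    return version, datetime.fromtimestamp(epoch, tz=timezone.utc)
```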
The contents.gz file in the ${MTX_SHARED_DIR}/checkpoints directory records the transaction count of each database in the IMDB and the total transaction count for all files of the database checkpoint.
Each database might have many checkpoint files that use a database-name prefix; for example, subscriber_db_xxx.log.gz.
The following shows example content of a contents.gz file in MDC format:
DataContainer:
containerId=MtxCheckpointContent(393,5050,1)
idx name type L A M P value
0 CheckpointFileCount UINT32 0 0 0 1 7
1 TotalCheckpointTxnCount UINT64 0 0 0 1 38484
2 CheckpointFileList STRUCT 1 0 0 1 {
DataContainer:
containerId=MtxTxnLogFileStats(392,5050,1)
idx name type L A M P value
0 FileName STRING 0 0 0 1 subscriber_db
1 TxnCount UINT64 0 0 0 1 20000
,
DataContainer:
containerId=MtxTxnLogFileStats(392,5050,1)
idx name type L A M P value
0 FileName STRING 0 0 0 1 balance_set_db
1 TxnCount UINT64 0 0 0 1 5000
,
DataContainer:
containerId=MtxTxnLogFileStats(392,5050,1)
idx name type L A M P value
0 FileName STRING 0 0 0 1 activity_db
1 TxnCount UINT64 0 0 0 1 10002
,
DataContainer:
containerId=MtxTxnLogFileStats(392,5050,1)
idx name type L A M P value
0 FileName STRING 0 0 0 1 sched_db
1 TxnCount UINT64 0 0 0 1 0
,
DataContainer:
containerId=MtxTxnLogFileStats(392,5050,1)
idx name type L A M P value
0 FileName STRING 0 0 0 1 event_db
1 TxnCount UINT64 0 0 0 1 2
,
DataContainer:
containerId=MtxTxnLogFileStats(392,5050,1)
idx name type L A M P value
0 FileName STRING 0 0 0 1 alert_db
1 TxnCount UINT64 0 0 0 1 0
,
DataContainer:
containerId=MtxTxnLogFileStats(392,5050,1)
idx name type L A M P value
0 FileName STRING 0 0 0 1 pricing_db
1 TxnCount UINT64 0 0 0 1 3480
}
6 GlobalTxnCounter UINT64 0 0 0 1 43045
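As a quick consistency check (illustrative Python; the counts are copied from the example output above), the per-file TxnCount values sum to the TotalCheckpointTxnCount field:

```python
# Per-database transaction counts from the contents.gz example above.
per_db_counts = {
    "subscriber_db": 20000,
    "balance_set_db": 5000,
    "activity_db": 10002,
    "sched_db": 0,
    "event_db": 2,
    "alert_db": 0,
    "pricing_db": 3480,
}

# CheckpointFileCount is 7 and TotalCheckpointTxnCount is 38484
# in the example, matching the entries listed here.
assert len(per_db_counts) == 7
assert sum(per_db_counts.values()) == 38484
```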
Using Binary Database Checkpoints
Pods can restore from binary checkpoints in the following ways:
- All pods restore from a binary checkpoint from shared storage.
- Some pods use a binary checkpoint to restore in parallel. Other pods start using DBSync, which ensures that all pods that come up late are restored.
A binary checkpoint file name includes the parts 52, 6, and transaction_server.1.database.event.storage, where:
- 52 is the memory pool ID.
- 6 is the number of the memory segment (and files for this pool).
- transaction_server.1.database.event.storage is the shared memory name needed to match the database memory pool with files on the disk.
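For illustration only, if the three parts were joined with underscores, splitting on the first two underscores would recover them. The real on-disk naming scheme may differ, so treat this as a sketch:

```python
def split_binary_checkpoint_name(file_name):
    """Illustrative only: assumes the pool ID, segment number, and shared
    memory name are joined with underscores, for example
    '52_6_transaction_server.1.database.event.storage'.
    Splitting on the first two underscores keeps the shared memory
    name intact even though it contains underscores itself."""
    pool_id, segment, shm_name = file_name.split("_", 2)
    return int(pool_id), int(segment), shm_name
```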
You can enable and configure binary checkpointing by answering create_config.info file questions. For information about configuring binary checkpointing, see the discussion about checkpoint and transaction replay configuration in MATRIXX Configuration.
Using Parallel Checkpointing
You can use MDC checkpointing and binary checkpointing in parallel by configuring the create_config.info questions shown in the following example, per engine, per cluster:
Engine 1:Cluster 3:Do you want to enable binary checkpoint creation (y/n)? y
Engine 1:Cluster 3:How many writing threads do you want to use for binary checkpoint creation? 8
Engine 1:Cluster 3:Do you want to enable binary checkpoint restore (y/n)? y
Engine 1:Cluster 3:How many reading threads do you want to use for binary checkpoint restore? 8
Analyzing Database Checkpoints
You use the validateCheckpoint.jar utility to analyze a MATRIXX checkpoint, find any errors in the database, and produce a validation report listing internal database statistics and any errors. For MATRIXX Engine in production, MATRIXX Support recommends that you run the checkpoint validation process daily to verify that the output is free of errors, which ensures database integrity. If validateCheckpoint.jar reports any errors, contact a MATRIXX Support representative to help troubleshoot them.
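For a scheduled daily run, a thin wrapper can assemble the validation command. The jar path and the bare checkpoint-directory argument are assumptions, not documented usage; confirm the utility's actual arguments before scheduling it:

```python
def build_validation_command(checkpoint_dir,
                             jar_path="validateCheckpoint.jar"):
    """Assemble a java invocation for the checkpoint validation utility.
    Both the jar path and the positional checkpoint-directory argument
    are assumptions for illustration."""
    return ["java", "-jar", jar_path, checkpoint_dir]

# Example: the command a daily cron wrapper might run via subprocess.
cmd = build_validation_command("/shared/checkpoints/mtx_ckpt_v5050.1.1358493520")
```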
Configuring Checkpoint Intervals
You configure periodic checkpoints by setting the create_config.info file parameters for configuring database checkpoints. The questions in this file specify the interval for automatic checkpoint creation and the number of checkpoints to save for rerating and disaster recovery scenarios. For more information about configuring checkpoints, see the discussion about checkpoint and transaction replay configuration in MATRIXX Installation and Upgrade.
If automatic database checkpoint creation fails, MATRIXX Engine waits one fourth the time specified for the database checkpoint interval before trying another database checkpoint. For example, if the database checkpoint interval is set to 60 minutes (the default), when a database checkpoint fails, MATRIXX Engine waits 15 minutes and then tries again to create a database checkpoint.
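The retry rule above is simple enough to express directly (a sketch; the function name is ours, not an engine API):

```python
def checkpoint_retry_delay(interval_minutes):
    """Return the wait before retrying a failed checkpoint:
    one fourth of the configured checkpoint interval."""
    return interval_minutes / 4

# With the default 60-minute interval, a failed checkpoint is
# retried after 15 minutes.
assert checkpoint_retry_delay(60) == 15
```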
You create on-demand checkpoints by running the create_checkpoint.py script. For details, see the discussion about creating a database checkpoint manually.
Exporting Data to Other Formats
Use the MATRIXX data_export.jar utility to export checkpoint and MATRIXX Event File (MEF) data to another data format. Once exported, that data is available for post-processing operations and analytics. The data_export.jar utility transforms the MDC data to comma-separated value (CSV) files. It also generates files that create SQL RDBMS tables and that can load the data from the CSV files into the RDBMS tables.
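A rough sketch of the transformation shape follows. This is not the data_export.jar implementation; the field names and the MySQL-style LOAD syntax are illustrative assumptions:

```python
import csv
import io

def export_to_csv(records, field_names):
    """Write MDC-like records to CSV text, roughly mirroring the shape
    of an exported file (field names here are illustrative)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=field_names)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def load_table_sql(table, csv_file):
    """Generate a statement that loads the exported CSV into an RDBMS
    table. LOAD syntax varies by RDBMS; this uses a generic
    MySQL-style LOAD DATA purely as an illustration."""
    return (f"LOAD DATA INFILE '{csv_file}' INTO TABLE {table} "
            "FIELDS TERMINATED BY ',' IGNORE 1 LINES;")
```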
For more information about exporting data, see the discussion about exporting subscription data in MATRIXX Integration.
For more information about the MATRIXX environment variables, see the discussion about container directories and environment variables in MATRIXX Installation and Upgrade.