Address Data Consistency

Determine whether checkpoint validation must be completed before lifting the traffic bypass. If the steps in this procedure do not have to be completed before the bypass is lifted, complete them as time and resource availability allow, so that they do not interfere with service restoration.

About this task

Most of these tasks can be performed in parallel to optimize recovery time.
Important: The time it takes to validate a checkpoint is specific to your installation and varies according to database size. The checkpoint validation process can be time-consuming for large deployments. If you determine that validation must be completed before the bypass is lifted, you must accept responsibility for any resulting impact on meeting the service level agreement (SLA).

Procedure

  1. Verify that automated checkpoint execution has started by checking the status of the checkpointing pod, the debug log, or the presence of the process at the command line.
  2. Verify that the checkpoint has completed by checking the pod, the debug log, or the checkpoints directory on shared storage.
  3. Following your local operational practices, and accounting for the large memory footprint that validation can have in a large deployment, validate the checkpoint with the following command:
    java -jar /opt/mtx/bin/validateCheckpoint.jar -s path_to_checkpoint

    Notify the team of the estimated and actual execution times.

  4. Share the outputs with MATRIXX Support, who can advise on whether it is safe to proceed and provide fixes where required. Lower-priority fixes should not interfere with the critical path to service restoration and can be deferred until after service is restored.
  5. Repeat these steps as needed until all validations have been completed and any critical fixes have been implemented, then return to the lower-priority fixes.
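
Steps 3 and 4 ask for the actual execution time and for the validation output to be shared. One possible way to capture both in a single run is sketched below. The run_timed helper and the log file location are hypothetical names chosen for this example and are not part of the product. The validation command itself is the one given in step 3, with path_to_checkpoint left as a placeholder.

```shell
# Sketch only: run a command, save its output to a log file, and report
# the exit status and wall-clock duration. run_timed is a hypothetical
# helper written for this example, not a MATRIXX-supplied tool.
run_timed() {
    # Usage: run_timed <log_file> <command> [args...]
    local log_file="$1"; shift
    local start end status
    start=$(date +%s)
    # Capture both stdout and stderr so the full output can be shared later.
    "$@" >"$log_file" 2>&1
    status=$?
    end=$(date +%s)
    echo "exit_status=$status elapsed_seconds=$((end - start)) log=$log_file"
    return "$status"
}

# Example invocation (commented out; the log path is an assumption and
# path_to_checkpoint must be replaced with your checkpoint location):
# run_timed /tmp/validateCheckpoint.out \
#     java -jar /opt/mtx/bin/validateCheckpoint.jar -s path_to_checkpoint
```

The summary line gives the elapsed seconds to report as the actual execution time, and the log file holds the output to share with MATRIXX Support.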