recover_transaction_logs.py

The recover_transaction_logs.py script cleans up orphaned transaction log files left behind by a total system outage and helps replay the most recent checkpoint.

Important: Run this script from the publishing server that has the lowest server ID in the publishing cluster.
You can run the recover_transaction_logs.py script in interactive mode or best effort mode.
  • In interactive mode, if errors are encountered at any step during the recovery process and the script cannot correct them, the user is prompted to correct the errors and then resume the restore operation from where it failed. In this mode, all relevant information is written to the /var/log/mtx/recover_transaction_logs.log file and to the terminal to guide the user. Interactive mode is the default mode.
  • In best effort mode, the script does not try to correct any errors. The recovery process exits at the step where it encountered an error or, if no errors are encountered, before writing the checkpoint restart file and starting the engine. Most information is redirected to the recover_transaction_logs.log file instead of being written to the terminal. If the script encounters an error, the user must check the log messages to identify the step that failed. After correcting the errors, the user can resume execution from the point of failure. You run the recover_transaction_logs.py script in best effort mode by specifying the -B (--best-effort) option. Use this mode only when you expect the recovery process to be successful and cannot check the terminal for errors.
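The best effort workflow above (run, check the result, fix the errors, resume) can be sketched as a small Python wrapper. This is an illustrative sketch only, not part of the script. It assumes, as the "-f $?" idiom in the -f option description suggests, that a nonzero exit status is the number of the step that failed and that an exit status of 0 means no error. The helper names best_effort_command, resume_command, and run are hypothetical.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

SCRIPT = "recover_transaction_logs.py"


def best_effort_command() -> List[str]:
    """Command line for a noninteractive (best effort) run."""
    return [SCRIPT, "--best-effort"]


def resume_command(exit_status: int) -> Optional[List[str]]:
    """Command line to resume after a failed run, or None if nothing failed.

    ASSUMPTION: a nonzero exit status equals the failed step number,
    mirroring the documented "recover_transaction_logs.py -f $?" usage.
    """
    if exit_status == 0:
        return None
    return [SCRIPT, "-f", str(exit_status)]


def run(argv: Sequence[str],
        runner: Callable[[Sequence[str]], int] = subprocess.call) -> int:
    """Run a command and return its exit status.

    The runner is injectable so the logic can be exercised without
    the real script being installed.
    """
    return runner(argv)
```

In practice the errors reported in recover_transaction_logs.log must be corrected before the command returned by resume_command is run. Add -B to that command if the resumed run should also be noninteractive.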

To recover the database, the recover_transaction_logs.py script calls the check_engine_start_prereqs.py script to verify the startup requirements for the local transaction log file directories. This includes running the fsck utility to check the validity of the file systems. The script then collects all orphaned and in-progress transaction logs, cleans them up, and analyzes them for consistency and gaps. After the analysis is complete, the script can be instructed to write a checkpoint restart file and to start the engine from that checkpoint to recover the database set. These completion steps are performed only when the script runs in interactive mode and the user confirms the operations by answering yes when prompted.
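The gap analysis mentioned above can be illustrated with a short Python sketch. This document does not describe how the script identifies or orders transaction logs, so the sketch assumes that each log carries an integer sequence number. The function name find_gaps and the sample numbers are hypothetical and are not taken from the script.

```python
from typing import Iterable, List, Tuple


def find_gaps(sequence_numbers: Iterable[int]) -> List[Tuple[int, int]]:
    """Return the missing ranges as inclusive (first, last) pairs.

    ASSUMPTION: transaction logs are identified by consecutive integer
    sequence numbers; the real script may use a different scheme.
    """
    ordered = sorted(set(sequence_numbers))
    gaps = []
    # Compare each sequence number with its successor.
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > 1:
            gaps.append((previous + 1, current - 1))
    return gaps
```

For example, find_gaps([1, 2, 3, 6, 7, 10]) returns [(4, 5), (8, 9)], meaning that logs 4 through 5 and 8 through 9 are missing.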

Syntax

recover_transaction_logs.py [-h | -f startFrom | -l localTxnLogDir | -s sharedDir | -t | -C | -B | -v | -D | -m doStat | -n checkAndMount]

Options

-h, --help
Show this help message and exit.
-f startFrom, --start-from=startFrom
Specify the step from which to resume the database recovery process. This option is required to resume after the previous run of the script failed. After the errors have been fixed, specify this option with the step that failed. Default=1 (from the beginning). To resume the recovery process from the point of failure immediately after a failed run, use the command "recover_transaction_logs.py -f $?", where $? is the shell variable that holds the exit status of the previous command.
-l localTxnLogDir, --local-txn-log-dir=localTxnLogDir
Specify the local transaction log file directory. Default=./local
-s sharedDir, --shared-dir=sharedDir
Specify the shared directory. Default=/mtx/mtx/shared
-t, --dry-run
Perform a dry run. The default value is False.
-C, --clean-up
Clean up the transient files and logs generated during the recovery process. This option cannot be used with the -f (--start-from) option and is typically required only during or after a run in best effort mode (--best-effort). In interactive mode (the default), the script cleans up all temporary files after it starts the engine and the processing cluster enters an HA ACTIVE state. The default value is False.
-B, --best-effort
Run noninteractively and exit on failure or completion. The default value is False.
-v, --verbose
Show verbose output.
-D, --DEBUG
Write debugging output.
-m doStat, --do-stat=doStat
Check if the specified log directory path exists and, if it does not, create it.
-n checkAndMount, --check-and-mount=checkAndMount
Check if the shared directory is mounted and, if it is not, mount it. Default=None.
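
The options above can be modeled with Python's standard argparse module. The following is a minimal sketch for illustration, not the script's actual implementation. The option names and defaults come from the list above. The function names build_parser and parse_options, the treatment of the -m and -n arguments as plain strings, and the way the -C/-f restriction is enforced are assumptions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # argparse adds -h/--help automatically.
    p = argparse.ArgumentParser(prog="recover_transaction_logs.py")
    # default=None lets parse_options detect whether -f was given explicitly.
    p.add_argument("-f", "--start-from", type=int, default=None,
                   metavar="startFrom")
    p.add_argument("-l", "--local-txn-log-dir", default="./local",
                   metavar="localTxnLogDir")
    p.add_argument("-s", "--shared-dir", default="/mtx/mtx/shared",
                   metavar="sharedDir")
    p.add_argument("-t", "--dry-run", action="store_true")
    p.add_argument("-C", "--clean-up", action="store_true")
    p.add_argument("-B", "--best-effort", action="store_true")
    p.add_argument("-v", "--verbose", action="store_true")
    p.add_argument("-D", "--DEBUG", dest="debug", action="store_true")
    # ASSUMPTION: the -m and -n arguments are treated as plain strings.
    p.add_argument("-m", "--do-stat", default=None, metavar="doStat")
    p.add_argument("-n", "--check-and-mount", default=None,
                   metavar="checkAndMount")
    return p


def parse_options(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # -C cannot be combined with -f, as stated in the option descriptions.
    if args.clean_up and args.start_from is not None:
        parser.error("-C/--clean-up cannot be used with -f/--start-from")
    if args.start_from is None:
        args.start_from = 1  # documented default: start from the beginning
    return args
```

With this sketch, parse_options(["-B", "-f", "3"]) yields best_effort=True and start_from=3, while parse_options(["-C", "-f", "2"]) exits with a usage error.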