restart_engine.py

The restart_engine.py script stops the servers in an engine and then starts them.

By default, the restart_engine.py script stops the engine as a whole, so that all servers are aware that their cluster is stopping and coordinate an orderly exit in unison. It then restarts each processing server, then each publishing server, and then the checkpointing server. This restart order ensures correct engine monitoring and checkpoint operations.
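
The default start order can be illustrated with a short sketch. This is not the script's source: the server names and the role labels are hypothetical, and only the ordering (processing, then publishing, then checkpointing) comes from the description above.

```python
# Illustrative only: role labels and server names are hypothetical.
START_ORDER = ("processing", "publishing", "checkpointing")


def start_sequence(servers):
    """Return server names in the documented start order.

    `servers` is a list of (name, role) tuples. Servers that share a role
    keep their relative input order because sorted() is stable.
    """
    rank = {role: position for position, role in enumerate(START_ORDER)}
    ordered = sorted(servers, key=lambda server: rank[server[1]])
    return [name for name, _role in ordered]


servers = [
    ("ckpt-1", "checkpointing"),
    ("pub-1", "publishing"),
    ("proc-1", "processing"),
    ("proc-2", "processing"),
    ("pub-2", "publishing"),
]
print(start_sequence(servers))
# ['proc-1', 'proc-2', 'pub-1', 'pub-2', 'ckpt-1']
```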

Note: Before performing the restart operation, the restart_engine.py script prompts you to confirm whether to continue, because the restart can cause a loss of service when the target engine is running the ACTIVE processing cluster. Enter y or Y to continue.

This script:

Runs the check_engine_start_prereqs.py script on the first publishing server in the cluster to check the validity of the shared storage before allowing the engine to start. If the shared storage validation fails, the check_engine_start_prereqs.py FAILED error is written to the mtx_debug.log file. In such cases, specific issues and possible workarounds are written to the ${MTX_LOG_DIR}/check_engine_start_prereqs.log file. If you are certain the file system is valid, you can specify the --no_fsck option when running the script.
Probes for any stale NFS mounts on processing servers before starting. If this script finds a stale NFS mount, it logs an error and keeps the engine from starting. The stale NFS mount error is similar to this:
```
'LM_ERROR 2022-02-02 11:31:12.746 15901|15901 [check_engine_start_prereqs.py] | Nfs mount /mnt/mtx/shared-e1 is stale. Check if Publishing cluster NFS export shared
```
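
One common way to detect a stale NFS mount is to call stat() on the mount point and check for the ESTALE error code. The sketch below shows that general technique only; it is not taken from check_engine_start_prereqs.py, and the function name and the injectable `stat` parameter are assumptions made so the check can be exercised without a real NFS mount.

```python
import errno
import os


def is_stale_mount(path, stat=os.stat):
    """Return True if stat() on `path` fails with ESTALE.

    `stat` is injectable for testing. Other errors, such as ENOENT,
    are not treated as a stale mount.
    """
    try:
        stat(path)
    except OSError as exc:
        return exc.errno == errno.ESTALE
    return False


def fake_stale(path):
    # Simulates the error a stale NFS handle raises on Linux.
    raise OSError(errno.ESTALE, os.strerror(errno.ESTALE), path)


print(is_stale_mount("/mnt/mtx/shared-e1", stat=fake_stale))  # True
```

A stat() call on an unresponsive hard-mounted NFS share can block instead of failing, so a production probe would typically run a check like this with a timeout.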

If the engine being restarted is the last standby engine in an engine chain, the processing cluster does not shut down after a publishing cluster failure. Therefore, start_engine.py stops the checkpoint server if it is running, starts the publishing cluster, and then starts the checkpoint server. For example:

In a two-engine chain (E1–>E2), the processing cluster of the second (standby) engine (E2) does not shut down when the publishing cluster stops responding.
In a three-engine chain (E1–>E2–>E3), the second standby engine (E2) processing cluster shuts down when the E2 publishing cluster stops responding. However, the E3 standby engine processing cluster does not shut down when the E3 publishing cluster shuts down because it is the last standby engine in the chain.

In those cases, the following warning message appears in mtx_debug.log:

The publishing cluster is not responsive and there is no protecting engine available. The processing cluster will not be shut down.

If the engine being restarted is not the last standby engine, the processing cluster is shut down in the course of restarting the engine and the following message appears in mtx_debug.log:

The publishing cluster is not responsive but there is a protecting engine. The processing cluster will be shut down.
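
The rule described above can be summarized in a small sketch. It assumes an engine chain can be modeled as an ordered list of engine names in which the last entry has no protecting engine; the function name and that model are illustrative and are not part of the script. The two message strings are the ones quoted above.

```python
NO_PROTECTING_ENGINE = (
    "The publishing cluster is not responsive and there is no protecting "
    "engine available. The processing cluster will not be shut down."
)
HAS_PROTECTING_ENGINE = (
    "The publishing cluster is not responsive but there is a protecting "
    "engine. The processing cluster will be shut down."
)


def publishing_failure_outcome(chain, engine):
    """Return (shuts_down, log_message) for `engine` in `chain`.

    `chain` is an ordered list such as ["E1", "E2", "E3"]. The last
    engine in the chain has no protecting engine behind it.
    """
    is_last = chain.index(engine) == len(chain) - 1
    if is_last:
        return False, NO_PROTECTING_ENGINE
    return True, HAS_PROTECTING_ENGINE


print(publishing_failure_outcome(["E1", "E2"], "E2")[0])        # False
print(publishing_failure_outcome(["E1", "E2", "E3"], "E2")[0])  # True
print(publishing_failure_outcome(["E1", "E2", "E3"], "E3")[0])  # False
```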

Syntax

restart_engine.py [-h] [--delay_in_seconds] [--by_blade] [--rolling] [--no_fsck] [--no_prompt] [-d] [--debug_config_file_parsing] [--debug_function_calls] [--debug_output_formatting] [--debug_threading_code] -e engineId [-p {0|1}] [--plain] [--skip_local] [--ssh_debug] [-u username]

Options

The restart_engine.py script has the following options in addition to the common engine administration script options.

--delay_in_seconds: The amount of time, in seconds, to wait between restarting each logical server when the --rolling option is specified. The default is 0 seconds.
--by_blade: Init.d-based shutdown flag. When specified, the engine is stopped before being restarted by using OS-level /etc/init.d scripts to stop the servers individually. The default is to stop the engine by using a cluster management command to stop its clusters. In the default case, the Cluster Manager initiates the shutdown of all servers at one time.
--rolling: Rolling restart flag. The default is False. When specified, this script restarts a cluster by stopping and starting one server before stopping and starting the next server in the cluster, until all servers have been restarted. If not specified, this script stops all logical servers before starting all logical servers. To use this option, the cluster must have more servers than the configured quorum; otherwise, the rolling restart causes the cluster to lose quorum and shut down.
--no_fsck: The check_engine_start_prereqs.py script does not check the validity of the file systems before they are mounted.
--no_prompt: Runs the script without prompting the user to confirm whether to continue with the operation. The prompt exists to keep you from unintentionally running the script on an engine that is running the HA ACTIVE processing cluster. Using the --no_prompt option might cause a loss of service. See the preceding note.
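
The difference between the default restart and a rolling restart, including the quorum requirement and the delay between servers, can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: the function name, the `stop`, `start`, and `sleep` callbacks, and the server names are all hypothetical.

```python
import time


def restart_cluster(servers, quorum, stop, start, rolling=False,
                    delay_in_seconds=0, sleep=time.sleep):
    """Restart `servers` either all at once or one at a time.

    `stop` and `start` are callables that take a server name.
    """
    if not rolling:
        # Default: stop every server, then start every server.
        for server in servers:
            stop(server)
        for server in servers:
            start(server)
        return
    # Rolling: one server is down at a time, so the cluster needs more
    # servers than the quorum to stay up during the restart.
    if len(servers) <= quorum:
        raise ValueError("rolling restart would cause the cluster to lose quorum")
    for position, server in enumerate(servers):
        stop(server)
        start(server)
        if position < len(servers) - 1:
            sleep(delay_in_seconds)  # wait before moving to the next server


events = []
restart_cluster(
    ["s1", "s2", "s3"], quorum=2,
    stop=lambda s: events.append(("stop", s)),
    start=lambda s: events.append(("start", s)),
    rolling=True, sleep=lambda seconds: None,
)
print(events)
# [('stop', 's1'), ('start', 's1'), ('stop', 's2'), ('start', 's2'), ('stop', 's3'), ('start', 's3')]
```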

Restart Without Running fsck

Restart the local engine without running fsck on the file systems:

restart_engine.py --no_fsck
********************************************************************************
WARNING: This action could cause a loss of service.
******************************************************************************** 
Do you want to continue (y/n)?y

Restart Engine ID 1

restart_engine.py -e 1
********************************************************************************
WARNING: This action could cause a loss of service.
******************************************************************************** 
Do you want to continue (y/n)?y
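
When the script is driven from automation, the command line can be assembled programmatically. The sketch below is a hypothetical helper, not part of the product; it uses only options listed in the Syntax section and returns an argument list that could be passed to a process launcher such as subprocess.run. Because --no_prompt suppresses the confirmation shown above, a wrapper like this should add it only when a loss of service is acceptable.

```python
def build_restart_command(engine_id, rolling=False, no_fsck=False,
                          no_prompt=False):
    """Build an argument list for restart_engine.py (hypothetical helper)."""
    command = ["restart_engine.py", "-e", str(engine_id)]
    if rolling:
        command.append("--rolling")
    if no_fsck:
        command.append("--no_fsck")
    if no_prompt:
        # Skips the confirmation prompt; might cause a loss of service.
        command.append("--no_prompt")
    return command


print(build_restart_command(1, no_fsck=True))
# ['restart_engine.py', '-e', '1', '--no_fsck']
```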