restart_engine.py
The restart_engine.py script stops the servers in an engine and then starts them.
By default, the restart_engine.py script stops the engine as a whole, so that all servers are aware that their cluster is stopping and coordinate an orderly exit in unison. It then restarts each processing server, then each publishing server, and finally the checkpointing server. This restart order ensures correct engine monitoring and checkpoint operations.
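The start order described above can be illustrated with a short sketch. This is not the script's actual implementation; the server names and the `(name, role)` tuple layout are hypothetical and used only to show the ordering.

```python
# Illustrative sketch only: the role names follow the text above; the
# (name, role) tuples and server names are hypothetical.
START_ORDER = ("processing", "publishing", "checkpointing")


def order_for_start(servers):
    """Return servers sorted into the documented start order.

    `servers` is an iterable of (name, role) tuples. sorted() is stable,
    so servers that share a role keep their original relative order.
    """
    rank = {role: index for index, role in enumerate(START_ORDER)}
    return sorted(servers, key=lambda server: rank[server[1]])


if __name__ == "__main__":
    engine = [
        ("ckpt1", "checkpointing"),
        ("pub1", "publishing"),
        ("proc1", "processing"),
        ("proc2", "processing"),
    ]
    for name, role in order_for_start(engine):
        print(f"start {name} ({role})")
```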
Note: Before performing the restart operation, the restart_engine.py script prompts the user whether to continue because the result can cause a loss of service when the target engine is running the ACTIVE processing cluster. Enter a y or Y to continue.
This script:
- Runs the check_engine_start_prereqs.py script on the first publishing server in the cluster to check the validity of the shared storage before allowing the engine to start. If the shared storage validation fails, the check_engine_start_prereqs.py FAILED error is written to the mtx_debug.log file. In such cases, specific issues and possible workarounds are written to the ${MTX_LOG_DIR}/check_engine_start_prereqs.log file. If you are certain the file system is valid, you can specify the --no_fsck option when running the script.
- Probes for any stale NFS mounts on processing servers before starting. If this script finds a stale NFS mount, it logs an error and keeps the engine from starting. The stale NFS mount error is similar to this:
'LM_ERROR 2022-02-02 11:31:12.746 15901|15901 [check_engine_start_prereqs.py] | Nfs mount /mnt/mtx/shared-e1 is stale. Check if Publishing cluster NFS export shared
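As a rough aid for spotting this error, the sketch below scans log lines for the stale-mount wording shown in the sample and extracts the mount path. It assumes the phrase "Nfs mount <path> is stale" appears as in the sample; the exact wording in a given release may differ, and the function name is hypothetical.

```python
import re

# Assumption: the message wording matches the sample error above.
STALE_MOUNT_RE = re.compile(r"Nfs mount (\S+) is stale")


def find_stale_mounts(lines):
    """Return the mount paths reported as stale in the given log lines."""
    mounts = []
    for line in lines:
        match = STALE_MOUNT_RE.search(line)
        if match:
            mounts.append(match.group(1))
    return mounts


if __name__ == "__main__":
    sample = (
        "LM_ERROR 2022-02-02 11:31:12.746 15901|15901 "
        "[check_engine_start_prereqs.py] | Nfs mount /mnt/mtx/shared-e1 is stale."
    )
    print(find_stale_mounts([sample]))
```

The same function could be fed the lines of mtx_debug.log, for example with `open(path)` as the iterable.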
- In a two-engine chain (E1–>E2), the second (standby) engine (E2) processing cluster does not shut down when the publishing cluster stops responding.
- In a three-engine chain (E1–>E2–>E3), the second standby engine (E2) processing cluster shuts down when the E2 publishing cluster stops responding. However, the E3 standby engine processing cluster does not shut down when the E3 publishing cluster shuts down because it is the last standby engine in the chain.
The publishing cluster is not responsive and there is no protecting engine available. The processing cluster will not be shut down.
If the engine being restarted is not the last standby engine, the processing cluster is shut down in the course of restarting the engine and the following message appears in mtx_debug.log:
The publishing cluster is not responsive but there is a protecting engine. The processing cluster will be shut down.
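The rule described above can be summarized in a small sketch. It assumes that an engine has a protecting engine exactly when another engine follows it in the chain; that reading, the 1-based `position` numbering, and the function names are assumptions for illustration, not the product's code. The two message strings are quoted from this page.

```python
SHUT_DOWN_MSG = (
    "The publishing cluster is not responsive but there is a protecting "
    "engine. The processing cluster will be shut down."
)
KEEP_RUNNING_MSG = (
    "The publishing cluster is not responsive and there is no protecting "
    "engine available. The processing cluster will not be shut down."
)


def has_protecting_engine(position: int, chain_length: int) -> bool:
    """Assumption: an engine is protected when it is not last in the chain.

    `position` is 1-based, so E2 in a chain E1->E2->E3 is position 2.
    """
    if not 1 <= position <= chain_length:
        raise ValueError("position must be between 1 and chain_length")
    return position < chain_length


def expected_log_message(position: int, chain_length: int) -> str:
    """Return the mtx_debug.log message expected for this engine."""
    if has_protecting_engine(position, chain_length):
        return SHUT_DOWN_MSG
    return KEEP_RUNNING_MSG


if __name__ == "__main__":
    print(expected_log_message(2, 2))  # E2 in E1->E2
    print(expected_log_message(2, 3))  # E2 in E1->E2->E3
```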
Syntax
restart_engine.py [-h] [--delay_in_seconds] [--by_blade] [--rolling] [--no_fsck] [--no_prompt] [-d] [--debug_config_file_parsing] [--debug_function_calls] [--debug_output_formatting] [--debug_threading_code] -e engineId [-p {0|1}] [--plain] [--skip_local] [--ssh_debug] [-u username]
Options
The restart_engine.py script has the following options in addition to the common engine administration script options.
- --delay_in_seconds
- The amount of time to wait in seconds between restarting each logical server when the --rolling option is specified. The default is 0 seconds.
- --by_blade
- Init.d-based shutdown flag. When specified, the engine is stopped, before being restarted, by using OS-level /etc/init.d scripts to stop the servers individually. The default is to stop the engine by using a cluster management command to stop its clusters. In this case, the Cluster Manager initiates the shutdown of all servers at one time.
- --rolling
- Rolling restart flag. default=False. When specified, this script restarts a cluster by stopping and starting a server before stopping and starting the next server in the cluster, until all servers have been restarted. If not specified, this script stops all logical servers before starting all logical servers. To use this option, there must be more servers in the cluster than the configured quorum, otherwise it causes the cluster to lose quorum and shut down.
- --no_fsck
- The validity of the file systems is not checked by the check_engine_start_prereqs.py script before they are mounted.
- --no_prompt
- Run the script without prompting the user to confirm whether to continue with the operation. The prompt exists to prevent accidentally running the script on an engine that is running the HA ACTIVE processing cluster. Using the --no_prompt option might cause a loss of service. See the preceding note.
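The quorum condition stated for the --rolling option can be checked with simple arithmetic before running the script. The sketch below is only an illustration of that condition; the function name is hypothetical, and how the server count and configured quorum are obtained is left to the administrator.

```python
def rolling_restart_keeps_quorum(server_count: int, quorum: int) -> bool:
    """Return True if one server can be stopped without losing quorum.

    A rolling restart takes one server down at a time, leaving
    server_count - 1 running. That must still be at least `quorum`,
    which is the same as requiring server_count > quorum.
    """
    return server_count - 1 >= quorum


if __name__ == "__main__":
    print(rolling_restart_keeps_quorum(3, 2))  # three servers, quorum of two
    print(rolling_restart_keeps_quorum(2, 2))  # two servers, quorum of two
```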
Restart Without Running fsck
Restart the local engine without running fsck on the file systems:
restart_engine.py --no_fsck
********************************************************************************
WARNING: This action could cause a loss of service.
********************************************************************************
Do you want to continue (y/n)?y
Restart Engine ID 1
restart_engine.py -e 1
********************************************************************************
WARNING: This action could cause a loss of service.
********************************************************************************
Do you want to continue (y/n)?y
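For automation, the command line can be assembled programmatically. The following sketch builds an argument list using only flags listed in the Syntax section; the helper name and its keyword parameters are hypothetical. It does not run anything. Note that adding --no_prompt skips the confirmation shown above and might cause a loss of service.

```python
import shlex


def build_restart_command(engine_id, rolling=False, by_blade=False,
                          no_fsck=False, no_prompt=False):
    """Build an argv list for restart_engine.py (hypothetical helper).

    Only flags documented on this page are used. The command is returned
    as a list, not executed.
    """
    argv = ["restart_engine.py", "-e", str(engine_id)]
    if rolling:
        argv.append("--rolling")
    if by_blade:
        argv.append("--by_blade")
    if no_fsck:
        argv.append("--no_fsck")
    if no_prompt:
        argv.append("--no_prompt")
    return argv


if __name__ == "__main__":
    command = build_restart_command(1, no_fsck=True)
    # shlex.join (Python 3.8+) renders the list as a shell-safe string.
    print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run` on a host where the script is installed and on the PATH.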