start_engine.py
The start_engine.py script starts all servers in an engine at the same time. Run it as the mtx user. By default, this script uses the --fast_start option. If necessary, you can have the script prompt you for confirmation before it performs a fast start.
Before starting the engine, the script:
- Runs the check_engine_start_prereqs.py script on the first publishing server in the cluster to check the validity of the shared storage before allowing the engine to start. If the shared storage validation fails, the check_engine_start_prereqs.py FAILED error is written to the mtx_debug.log file, and the specific issues and possible workarounds are written to the ${MTX_LOG_DIR}/check_engine_start_prereqs.log file. If you are certain the file system is valid, you can specify the --no_fsck option when running the script.
- Probes for any stale NFS mounts on processing servers. If the script finds a stale NFS mount, it logs an error and keeps the engine from starting. The stale NFS mount error is similar to this:
LM_ERROR 2022-02-02 11:31:12.746 15901|15901 [check_engine_start_prereqs.py] | Nfs mount /mnt/mtx/shared-e1 is stale. Check if Publishing cluster NFS export shared
- In a two-engine chain (E1 -> E2), the processing cluster of the second (standby) engine (E2) does not shut down when the publishing cluster stops responding.
- In a three-engine chain (E1 -> E2 -> E3), the processing cluster of the second (standby) engine (E2) shuts down when the E2 publishing cluster stops responding. However, the processing cluster of the E3 standby engine does not shut down when the E3 publishing cluster shuts down, because E3 is the last standby engine in the chain.
If the engine being restarted is the last standby engine, the processing cluster is not shut down and the following message appears in mtx_debug.log:
The publishing cluster is not responsive and there is no protecting engine available. The processing cluster will not be shut down.
If the engine being restarted is not the last standby engine, the processing cluster is shut down in the course of restarting the engine and the following message appears in mtx_debug.log:
The publishing cluster is not responsive but there is a protecting engine. The processing cluster will be shut down.
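When reviewing mtx_debug.log after a failed or unusual start, it can help to sort lines by which of the conditions described above they report. The following is a minimal sketch, not part of the product: the category names and the classify function are hypothetical, and the patterns are only the substrings quoted in this section, so the real log lines may carry additional prefixes or wording.

```python
import re

# Substrings taken from the messages quoted in this section.
# The category names are invented for this illustration.
PATTERNS = {
    "prereqs_failed": re.compile(r"check_engine_start_prereqs\.py FAILED"),
    "stale_nfs": re.compile(r"Nfs mount (?P<mount>\S+) is stale"),
    "processing_kept": re.compile(r"no protecting engine available"),
    "processing_shut_down": re.compile(r"there is a protecting engine\b"),
}


def classify(line):
    """Return (category, match) for the first pattern found in line, else None."""
    for name, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            return name, match
    return None


if __name__ == "__main__":
    sample = ("LM_ERROR 2022-02-02 11:31:12.746 15901|15901 "
              "[check_engine_start_prereqs.py] | Nfs mount /mnt/mtx/shared-e1 is stale.")
    category, match = classify(sample)
    print(category, match.group("mount"))
```

For a stale NFS line, the mount point is available through the match object's mount group, which is usually the first thing to check on the publishing cluster.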
The TRA-PROC and TRA-PUB servers must be started before MATRIXX Engine is started. Starting these Traffic Routing Agent (TRA) servers also starts the CCF Network Enablers, if installed.
Syntax
start_engine.py [-h] [-d] [--fast_start={0 | 1}] [--no_fast_start_prompt={true | false}] [--debug_config_file_parsing] [--debug_function_calls] [--debug_output_formatting] [--debug_threading_code] -e engineId [-w 0|1 -t seconds] [--plain] [-s] [-u username] [--ssh_debug] [--no_fsck]
Options
The start_engine.py script has the following options in addition to the common engine administration script options:
- -f { 0 | 1 }, --fast_start={ 0 | 1 }
- Fast Start option; true by default. 1 is true; 0 is false. Starts the processing server running immediately on the in-memory databases in the checkpointing server, as opposed to rebuilding its own in-memory database. Before executing, it reports the highest GTC it finds in the checkpointing server and the highest GTC it finds in the processing cluster.
Note: Engine Fast Start requires that the Cluster Manager quorum setting be disabled. For more information, see the discussion about Cluster Manager configuration.
- --no_fast_start_prompt={ true | false }
- Used with --fast_start. Default is true (no prompt). Ignored in container environments because the question is never asked. False causes the script to prompt you before it performs the fast start. This can safeguard against running the script on an engine that is running the HA ACTIVE processing cluster. Example prompt:
Checkpoint server GTC=111045 and left-over txn log max GTC=120340 Do you want to continue Engine fast start (y/n)?
- --no_fsck
- The check_engine_start_prereqs.py script does not check the validity of the file systems before they are mounted.
For descriptions of the server command line options, see the discussion about command line options for server scripts in MATRIXX Administration.
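If the fast start prompt is captured by a wrapper or an operator runbook, the two GTC values in it can be extracted for logging before an answer is given. The sketch below is illustrative only: parse_gtc_prompt and the gap field are hypothetical, and the pattern assumes the prompt wording matches the example prompt in the option description exactly.

```python
import re

# Pattern based on the example prompt text; the real prompt may differ.
GTC_PROMPT = re.compile(
    r"Checkpoint server GTC=(?P<checkpoint>\d+) "
    r"and left-over txn log max GTC=(?P<txn_log>\d+)"
)


def parse_gtc_prompt(text):
    """Extract both GTC values from a fast start prompt, or return None."""
    match = GTC_PROMPT.search(text)
    if match is None:
        return None
    checkpoint = int(match.group("checkpoint"))
    txn_log = int(match.group("txn_log"))
    return {
        "checkpoint_gtc": checkpoint,
        "txn_log_max_gtc": txn_log,
        # Plain difference between the two values, for display only.
        "gap": txn_log - checkpoint,
    }


if __name__ == "__main__":
    prompt = ("Checkpoint server GTC=111045 and left-over txn log max GTC=120340 "
              "Do you want to continue Engine fast start (y/n)?")
    print(parse_gtc_prompt(prompt))
```

The sketch only reports the values. Whether a given difference is acceptable is an operational decision that this example does not attempt to make.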
Start the Local Engine With Checks
start_engine.py
Start the Local Engine Without Checks
start_engine.py -e 1 --no_fsck
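When start_engine.py is called from automation, assembling the argument list in one place keeps the option spelling consistent with the syntax shown earlier. This is a minimal sketch under stated assumptions: build_start_engine_argv is a hypothetical helper, it covers only the options documented in this section (-e, --fast_start, --no_fast_start_prompt, --no_fsck), and it assumes start_engine.py is on the PATH of the mtx user.

```python
def build_start_engine_argv(engine_id=None, fast_start=None,
                            no_fast_start_prompt=None, no_fsck=False):
    """Build an argv list for start_engine.py from the documented options.

    Options left as None are omitted so the script's own defaults apply.
    """
    argv = ["start_engine.py"]
    if engine_id is not None:
        argv += ["-e", str(engine_id)]
    if fast_start is not None:
        # Documented values are 1 (true) and 0 (false).
        argv.append("--fast_start=%d" % (1 if fast_start else 0))
    if no_fast_start_prompt is not None:
        # Documented values are true and false.
        value = "true" if no_fast_start_prompt else "false"
        argv.append("--no_fast_start_prompt=" + value)
    if no_fsck:
        argv.append("--no_fsck")
    return argv


if __name__ == "__main__":
    # Same command as the "Without Checks" example.
    print(" ".join(build_start_engine_argv(engine_id=1, no_fsck=True)))
```

The resulting list can be passed to subprocess.run without a shell, which avoids quoting problems if an engine ID or user name ever comes from external input.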
Sample Output
On an active engine, you would see output like this:
2020-12-08 15:03:43| 10.10.186.63| LogicalBlade(3:3:1)| Starting mtx (via systemctl): [ OK ]
2020-12-08 15:03:43| 10.10.186.63| LogicalBlade(3:3:1)| LogicalBlade(3:3:1) is starting.
2020-12-08 15:03:43| 10.10.186.63| Cluster(3:3)| Cluster(3:3) is starting.
Cluster(3:3) is starting.
... waiting for the ['init', 'post-init', 'active', 'standby'] Cluster HA state. Try 25 of 120
... waiting for the ['init', 'post-init', 'active', 'standby'] Cluster HA state. Try 30 of 120
... waiting for the ['init', 'post-init', 'active', 'standby'] Cluster HA state. Try 35 of 120
Cluster(3:1) HA state is now init
Cluster(3:1) restores database locally
Cluster(3:1) will be started as Active.
18 files are waiting to be processed.
14 files are waiting to be processed.
10 files are waiting to be processed. Try 1 of 120
10 files are waiting to be processed. Try 4 of 120
5 files are waiting to be processed.
4 files are waiting to be processed.
0 files are waiting to be processed.
Cluster(3:1) has completed database initialization.
Cluster(3:1) HA state is now active
Cluster(3:2) HA state is now active
Cluster(3:3) HA state is now active
Engine(3) has been started.
On a standby engine you would see output like this:
2020-12-08 15:03:43| 10.10.186.63| LogicalBlade(3:3:1)| Starting mtx (via systemctl): [ OK ]
2020-12-08 15:03:43| 10.10.186.63| LogicalBlade(3:3:1)| LogicalBlade(3:3:1) is starting.
2020-12-08 15:03:43| 10.10.186.63| Cluster(3:3)| Cluster(3:3) is starting.
Cluster(3:3) is starting.
... waiting for the ['init', 'post-init', 'active', 'standby'] Cluster HA state. Try 5 of 120
... waiting for the ['init', 'post-init', 'active', 'standby'] Cluster HA state. Try 10 of 120
... waiting for the ['init', 'post-init', 'active', 'standby'] Cluster HA state. Try 15 of 120
... waiting for the ['init', 'post-init', 'active', 'standby'] Cluster HA state. Try 20 of 120
Cluster(3:1) HA state is now init
LogicalBlade(2:1:3) System schema version: 5211
LogicalBlade(2:1:3) System schema version: 5211
Cluster(2:1) sends database initialization data to Cluster(3:1).
Cluster(3:1) will be started as Standby.
1863478 objects are waiting to be processed.
1682963 objects are waiting to be processed.
1504377 objects are waiting to be processed.
1328323 objects are waiting to be processed.
1133609 objects are waiting to be processed.
950928 objects are waiting to be processed.
770590 objects are waiting to be processed.
586355 objects are waiting to be processed.
407797 objects are waiting to be processed.
232674 objects are waiting to be processed.
59586 objects are waiting to be processed.
Cluster(3:1) has completed database initialization.
Cluster(3:1) HA state is now standby
Cluster(3:2) HA state is now init
LogicalBlade(3:1:1) System schema version: 5211
Cluster(3:1) sends database initialization data to Cluster(3:2).
Cluster(3:2) will be started as Active.
Cluster(3:2) has completed database initialization.
Cluster(3:2) HA state is now active
Cluster(3:3) HA state is now active
Engine(3) has been started.
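Output like the samples above can be reduced to the final HA state of each cluster, which is usually what a post-start check needs. The following sketch assumes the line formats match the samples exactly; summarize_start_output is a hypothetical helper, not part of the product.

```python
import re

# Patterns based on the sample output lines; real output may vary.
HA_STATE = re.compile(
    r"Cluster\((?P<engine>\d+):(?P<cluster>\d+)\) HA state is now (?P<state>[\w-]+)"
)
STARTED = re.compile(r"Engine\((?P<engine>\d+)\) has been started\.")


def summarize_start_output(output):
    """Return ({(engine, cluster): last_state}, [started_engine_ids])."""
    states = {}
    started = []
    for line in output.splitlines():
        match = HA_STATE.search(line)
        if match:
            key = (int(match.group("engine")), int(match.group("cluster")))
            # Later lines overwrite earlier ones, so the last state wins.
            states[key] = match.group("state")
            continue
        match = STARTED.search(line)
        if match:
            started.append(int(match.group("engine")))
    return states, started


if __name__ == "__main__":
    sample = "\n".join([
        "Cluster(3:1) HA state is now init",
        "Cluster(3:1) HA state is now standby",
        "Cluster(3:2) HA state is now active",
        "Engine(3) has been started.",
    ])
    print(summarize_start_output(sample))
```

Because each cluster passes through init before reaching active or standby, keeping only the last state seen per cluster gives the state at the end of the run.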