start_engine.py

The start_engine.py script starts all servers in an engine at the same time. Run it as the mtx user. By default, this script uses the --fast_start option. If necessary, you can enable a prompt that warns you before a fast start is performed.
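For example, assuming an engine ID of 1 (as in the examples later in this topic), the following commands start the engine with fast start disabled, and with the fast-start warning prompt enabled, respectively:

start_engine.py -e 1 --fast_start=0
start_engine.py -e 1 --no_fast_start_prompt=false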

This script:
  • Runs the check_engine_start_prereqs.py script on the first publishing server in the cluster to check the validity of the shared storage before allowing the engine to start. If the shared storage validation fails, a check_engine_start_prereqs.py FAILED error is written to the mtx_debug.log file. In such cases, the specific issues and possible workarounds are written to the ${MTX_LOG_DIR}/check_engine_start_prereqs.log file. If you are certain the file system is valid, you can specify the --no_fsck option when running the script.
  • Probes for stale NFS mounts on the processing servers before starting. If this script finds a stale NFS mount, it logs an error and prevents the engine from starting, as illustrated in the sketch after this list. The stale NFS mount error is similar to this:
    LM_ERROR 2022-02-02 11:31:12.746 15901|15901 [check_engine_start_prereqs.py] | Nfs mount /mnt/mtx/shared-e1 is stale. Check if Publishing cluster NFS export shared
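The following is a minimal sketch of how such a stale-mount probe can be implemented. It is illustrative only; the mount point, timeout, and use of stat are assumptions, not details taken from check_engine_start_prereqs.py.

import subprocess

def nfs_mount_is_stale(mount_point, timeout_sec=5):
    # A stale NFS handle typically makes stat hang or fail with ESTALE,
    # so run the probe in a child process bounded by a timeout.
    try:
        subprocess.run(["stat", "-f", mount_point],
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                       timeout=timeout_sec, check=True)
        return False
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return True

# Hypothetical mount point, matching the example log line above.
if nfs_mount_is_stale("/mnt/mtx/shared-e1"):
    print("Nfs mount /mnt/mtx/shared-e1 is stale.")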
If the engine being restarted is the last standby engine in an engine chain, the processing cluster does not shut down after a publishing cluster failure. Therefore, start_engine.py stops the checkpoint server if it is running, starts the publishing cluster, and then starts the checkpoint server. For example:
  • In a two-engine chain (E1->E2), the second (standby) engine (E2) processing cluster does not shut down when the publishing cluster stops responding.
  • In a three-engine chain (E1->E2->E3), the second standby engine (E2) processing cluster shuts down when the E2 publishing cluster stops responding. However, the E3 standby engine processing cluster does not shut down when the E3 publishing cluster shuts down because it is the last standby engine in the chain.
In those cases, the following warning message appears in mtx_debug.log:
The publishing cluster is not responsive and there is no protecting engine available. The processing cluster will not be shut down.
If the engine being restarted is not the last standby engine, the processing cluster is shut down in the course of restarting the engine and the following message appears in mtx_debug.log:
The publishing cluster is not responsive but there is a protecting engine. The processing cluster will be shut down.
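The following is a minimal sketch of this behavior. All helper names are hypothetical; the sketch only illustrates the start ordering and the two log messages quoted above, not the script's real internals.

def log(msg):
    print(msg)

def stop_checkpoint_server():
    log("Stopping checkpoint server")

def start_publishing_cluster():
    log("Starting publishing cluster")

def start_checkpoint_server():
    log("Starting checkpoint server")

def start_last_standby_engine(checkpoint_server_running):
    # Ordering described above: stop the checkpoint server if it is running,
    # start the publishing cluster, then start the checkpoint server.
    if checkpoint_server_running:
        stop_checkpoint_server()
    start_publishing_cluster()
    start_checkpoint_server()

def handle_publishing_cluster_failure(has_protecting_engine):
    # Mirrors the two mtx_debug.log messages quoted above.
    if has_protecting_engine:
        log("The publishing cluster is not responsive but there is a "
            "protecting engine. The processing cluster will be shut down.")
    else:
        log("The publishing cluster is not responsive and there is no "
            "protecting engine available. The processing cluster will not "
            "be shut down.")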
Note: The start_engine.py script does not start the engine if an active peer engine has a later system schema version.
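For illustration, this version guard amounts to a comparison such as the following hypothetical check:

def schema_allows_start(local_schema_version, peer_schema_version):
    # Hypothetical: refuse to start when an active peer engine reports a
    # later system schema version (for example, 5211 in the sample output below).
    return peer_schema_version <= local_schema_version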

The TRA-PROC and TRA-PUB servers must be started before MATRIXX Engine is started. Starting these Traffic Routing Agent servers also starts CCF Network Enablers, if installed.

Syntax

start_engine.py [-h] [-d] [--fast_start={0|1}] [--no_fast_start_prompt={true|false}] [--debug_config_file_parsing] [--debug_function_calls] [--debug_output_formatting] [--debug_threading_code] -e engineId [-w 0|1 -t seconds] [--plain] [-s] [-u username] [--ssh_debug] [--no_fsck]

Options

The start_engine.py script has the following options in addition to the common engine administration script options:

-f {0|1}, --fast_start={0|1}
Fast Start option; true (1) by default. 1 is true; 0 is false. Starts the processing server immediately on the in-memory databases in the checkpoint server, as opposed to rebuilding its own in-memory database. Before executing, the script reports the highest GTC it finds in the checkpoint server and the highest GTC it finds in the processing cluster.
Note: Engine Fast Start requires that the Cluster Manager quorum setting be disabled. For more information, see the discussion about Cluster Manager configuration.
--no_fast_start_prompt={true|false}
Used with --fast_start. Default is true (no prompt). Ignored in container environments because the question is never asked. Setting this option to false causes the script to prompt you before performing a fast start. This can safeguard against running the script on an engine that is running the HA ACTIVE processing cluster. Example prompt:
Checkpoint server GTC=111045 and left-over txn log max GTC=120340
Do you want to continue Engine fast start (y/n)?
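As a minimal sketch, this confirmation gate can be thought of as follows. The function and its parameters are hypothetical and only mirror the prompt shown above:

def confirm_fast_start(ckpt_gtc, txn_log_max_gtc, no_prompt):
    # With --no_fast_start_prompt=true (the default), proceed without asking.
    if no_prompt:
        return True
    # Otherwise show the GTC values and ask, as in the example prompt above.
    print(f"Checkpoint server GTC={ckpt_gtc} and left-over txn log max GTC={txn_log_max_gtc}")
    return input("Do you want to continue Engine fast start (y/n)? ").strip().lower() == "y"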
--no_fsck
The check_engine_start_prereqs.py script does not check the validity of the file systems before they are mounted.

For descriptions of the server command line options, see the discussion about command line options for server scripts in MATRIXX Administration.

Start the Local Engine With Checks

start_engine.py

Start the Local Engine Without Checks

start_engine.py -e 1 --no_fsck

Sample Output

On an active engine, you would see output like this:

2020-12-08 15:03:43| 10.10.186.63| LogicalBlade(3:3:1)| Starting mtx (via systemctl):  [  OK  ]
2020-12-08 15:03:43| 10.10.186.63| LogicalBlade(3:3:1)| LogicalBlade(3:3:1) is starting.
2020-12-08 15:03:43| 10.10.186.63| Cluster(3:3)| Cluster(3:3) is starting.
Cluster(3:3) is starting. 
... waiting for the ['init', 'post-init', 'active', 'standby'] Cluster HA state. Try 25 of 120
... waiting for the ['init', 'post-init', 'active', 'standby'] Cluster HA state. Try 30 of 120
... waiting for the ['init', 'post-init', 'active', 'standby'] Cluster HA state. Try 35 of 120
Cluster(3:1) HA state is now init
Cluster(3:1) restores database locally
Cluster(3:1) will be started as Active.
18 files are waiting to be processed.
14 files are waiting to be processed.
10 files are waiting to be processed. Try 1 of 120
10 files are waiting to be processed. Try 4 of 120
5 files are waiting to be processed.
4 files are waiting to be processed.
0 files are waiting to be processed.
Cluster(3:1) has completed database initialization.
Cluster(3:1) HA state is now active
Cluster(3:2) HA state is now active
Cluster(3:3) HA state is now active
Engine(3) has been started.

On a standby engine, you would see output like this:

2020-12-08 15:03:43| 10.10.186.63| LogicalBlade(3:3:1)| Starting mtx (via systemctl):  [  OK  ]
2020-12-08 15:03:43| 10.10.186.63| LogicalBlade(3:3:1)| LogicalBlade(3:3:1) is starting.
2020-12-08 15:03:43| 10.10.186.63| Cluster(3:3)| Cluster(3:3) is starting.
Cluster(3:3) is starting.
... waiting for the ['init', 'post-init', 'active', 'standby'] Cluster HA state. Try 5 of 120
... waiting for the ['init', 'post-init', 'active', 'standby'] Cluster HA state. Try 10 of 120
... waiting for the ['init', 'post-init', 'active', 'standby'] Cluster HA state. Try 15 of 120
... waiting for the ['init', 'post-init', 'active', 'standby'] Cluster HA state. Try 20 of 120
Cluster(3:1) HA state is now init
LogicalBlade(2:1:3) System schema version: 5211
LogicalBlade(2:1:3) System schema version: 5211
Cluster(2:1) sends database initialization data to Cluster(3:1).
Cluster(3:1) will be started as Standby.
1863478 objects are waiting to be processed.
1682963 objects are waiting to be processed.
1504377 objects are waiting to be processed.
1328323 objects are waiting to be processed.
1133609 objects are waiting to be processed.
950928 objects are waiting to be processed.
770590 objects are waiting to be processed.
586355 objects are waiting to be processed.
407797 objects are waiting to be processed.
232674 objects are waiting to be processed.
59586 objects are waiting to be processed.
Cluster(3:1) has completed database initialization.
Cluster(3:1) HA state is now standby
Cluster(3:2) HA state is now init
LogicalBlade(3:1:1) System schema version: 5211
Cluster(3:1) sends database initialization data to Cluster(3:2).
Cluster(3:2) will be started as Active.
Cluster(3:2) has completed database initialization.
Cluster(3:2) HA state is now active
Cluster(3:3) HA state is now active
Engine(3) has been started.