GTC Sync Health Check
The gtc-sync-health-check container continuously checks the GTC replay between engines in the sub-domain, and between the processing and publishing clusters of each MATRIXX Engine.
A GTC out-of-sync condition is not detected unless one of the following conditions is met:
- All engines have been started.
- An active engine is running and any other engines have been disabled after the maximum configured number of auto-heal attempts.
The GTC sync health check reports an error if either of the following conditions arises:
- A constant GTC value for the configured period of time. All values detected during the period must be identical and nonzero, and at least one of the current GTC and the last replayed GTC must be static.
- A GTC value that exceeds the configured maximum value and does not decrease.
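The two error conditions above can be sketched in Python as follows. This is only an illustrative reading, not the container's actual logic: GtcSample, evaluate_window, the error labels, and the definition of the monitored value as current GTC minus last replayed GTC are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class GtcSample:
    """One observation of a monitored pair (hypothetical structure)."""
    current_gtc: int
    last_replayed_gtc: int

    @property
    def gap(self) -> int:
        # Assumption: the monitored value is the difference between the two counters.
        return self.current_gtc - self.last_replayed_gtc


def evaluate_window(samples: Sequence[GtcSample], max_gap: int) -> List[str]:
    """Return the error labels raised by one detection period."""
    errors: List[str] = []
    if len(samples) < 2:
        return errors  # not enough data to judge
    gaps = [s.gap for s in samples]

    # Condition 1: identical, nonzero values with at least one static counter.
    constant = len(set(gaps)) == 1 and gaps[0] != 0
    current_static = len({s.current_gtc for s in samples}) == 1
    replayed_static = len({s.last_replayed_gtc for s in samples}) == 1
    if constant and (current_static or replayed_static):
        errors.append("constant_gap")

    # Condition 2: the value exceeds the maximum and never decreases afterwards.
    over = [i for i, g in enumerate(gaps) if g > max_gap]
    if over:
        tail = gaps[over[0]:]
        if len(tail) >= 2 and all(b >= a for a, b in zip(tail, tail[1:])):
            errors.append("gap_exceeds_max")
    return errors
```

In this sketch, two engines that advance in lockstep with a small constant gap are not flagged, because neither counter is static.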
Multiple errors can be detected during the same period of time, for example in different engines. The errors are reported to the sub-domain health check brain container, which decides what action to take, if any.
Once the health check has established that an active engine and a standby engine are present, and that the transactions per second are not zero, it performs a five-minute cycle of GTC monitoring. (All time intervals are configurable.) The following GTC detections are performed, where engine 1 is the active engine and engine 2 is the standby engine.
When one engine is available:
- Engine 1 processing — engine 1 publishing
When two engines are available:
- Engine 1 processing — engine 2 processing
- Engine 1 processing — engine 1 publishing
- Engine 2 processing — engine 2 publishing
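The comparisons listed above can be enumerated with a small helper. This is a minimal sketch: the function name detection_pairs and the string labels are hypothetical and chosen only to mirror the lists.

```python
from typing import List, Tuple


def detection_pairs(engines: List[str]) -> List[Tuple[str, str]]:
    """List the (source, target) cluster pairs to compare.

    `engines` holds the available engines, active engine first.
    Only the one-engine and two-engine cases from the text are covered.
    """
    if not engines:
        return []
    active = engines[0]
    pairs: List[Tuple[str, str]] = []
    if len(engines) >= 2:
        standby = engines[1]
        # Replay between the processing clusters of the two engines.
        pairs.append((f"{active} processing", f"{standby} processing"))
    # Replay from processing to publishing inside the active engine.
    pairs.append((f"{active} processing", f"{active} publishing"))
    if len(engines) >= 2:
        # Replay from processing to publishing inside the standby engine.
        pairs.append((f"{standby} processing", f"{standby} publishing"))
    return pairs
```

Calling detection_pairs(["engine 1"]) yields one pair, and detection_pairs(["engine 1", "engine 2"]) yields three pairs in the order shown in the list.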
The gap between the current GTC and the last replayed GTC of the pods is checked every 10 seconds for 5 minutes. A gap that exceeds the configured maximum value during the detection period and does not decrease indicates an out-of-sync error. A gap that does not change for the entire detection period, where either the current GTC or the last replayed GTC is not moving, also indicates an out-of-sync error.
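With the default intervals, one detection period consists of 300 / 10 = 30 samples. The sampling schedule might look like the sketch below. The names sample_gaps and read_gap are hypothetical, and how the gap is actually read from the pods is not described here, so it is passed in as a callable.

```python
import time
from typing import Callable, List


def sample_gaps(
    read_gap: Callable[[], int],
    interval_s: float = 10.0,
    period_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Collect one detection period of gap readings.

    `read_gap` is a caller-supplied function returning the current gap.
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    count = int(period_s // interval_s)  # 30 samples with the defaults
    gaps: List[int] = []
    for i in range(count):
        gaps.append(read_gap())
        if i < count - 1:
            sleep(interval_s)  # wait between samples, not after the last one
    return gaps
```

Both intervals are parameters because all time intervals are configurable.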
The two components of the health check communicate using a state file created and updated by the gtc-sync-health-check container. This gtc_sync_state file is stored in shared logging storage. If persistent storage is not enabled, the state file is saved within the sub-domain health checker pod. The file is empty if there have been no GTC out-of-sync detections.
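Because the state file is empty when nothing has been detected, a consumer can test for detections without parsing it. The sketch below assumes only that property. The function name is hypothetical, the file's location and internal format are not specified here, and treating a missing file as "no detections" is an assumption made for the example.

```python
from pathlib import Path


def has_out_of_sync_detections(state_file: Path) -> bool:
    """Return True if the state file holds any content.

    Assumption: a missing or whitespace-only file is treated like an empty one.
    """
    try:
        return bool(state_file.read_text().strip())
    except FileNotFoundError:
        return False
```

A caller would pass the path of the gtc_sync_state file in whichever storage location applies to the deployment.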