MATRIXX Traffic Routing Agent DR Features

MATRIXX Traffic Routing Agent (TRA) provides an isolation layer between the network and the MATRIXX Engines configured as a Disaster Recovery (DR) group. It is installed on a pod located between the network and the MATRIXX Engine DR group and provides a single virtual IP address to which the network sends traffic independent of which engine is active at any given time.

The TRA-SI, the TRA-DR, or a TRA-RT with SI/DR functionality performs message-based load balancing of Diameter requests over a pool of active engine nodes (pods). The TRA-PROC and the TRA-PROC with RCP perform load balancing between the processing pods of an engine.
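
As an illustration of message-based load balancing, the following minimal sketch picks a target node for each individual request rather than pinning a whole connection to one node. The round-robin policy and the names EngineNode, MessageBalancer, and route are assumptions made for this example; they are not the TRA implementation or its configuration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class EngineNode:
    """Hypothetical stand-in for one engine node (pod)."""
    name: str
    active: bool = True


class MessageBalancer:
    """Chooses a node per message, round-robin over the active nodes only.

    Round-robin is an assumed policy used purely for illustration.
    """

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._counter = count()  # one tick per routed message

    def route(self, message):
        # Rebuild the pool on every message so a node that has just become
        # inactive stops receiving traffic immediately.
        pool = [n for n in self.nodes if n.active]
        if not pool:
            raise RuntimeError("no active engine nodes available")
        return pool[next(self._counter) % len(pool)]
```

Because the decision is made per message, consecutive requests arriving on the same connection can be served by different active nodes, and inactive nodes are skipped.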

The TRA-SI or TRA-DR constantly polls the Cluster Manager leader in each engine for the engine DR state. When it detects a state change, for example after a failover or switchover of the active engine, it switches the network traffic to the newly active engine while keeping the same incoming address that the network device uses to send traffic. The network device therefore remains unaware of the change and does not need to reroute traffic.
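
The polling behavior described above can be sketched as a single poll cycle that compares the reported DR states with the current routing target. This is only a sketch under stated assumptions: the state names "ACTIVE" and "STANDBY", the poll_state callback, and the DrRouter class are hypothetical and do not reflect the actual Cluster Manager interface.

```python
from typing import Callable, Iterable, Optional


class DrRouter:
    """Tracks which engine receives traffic, based on polled DR states."""

    def __init__(self, poll_state: Callable[[str], str], engines: Iterable[str]):
        self.poll_state = poll_state      # hypothetical: returns the DR state of an engine
        self.engines = list(engines)
        self.target: Optional[str] = None  # engine currently receiving traffic

    def poll_once(self) -> bool:
        """Run one poll cycle; return True if the routing target changed."""
        states = {e: self.poll_state(e) for e in self.engines}
        active = [e for e, s in states.items() if s == "ACTIVE"]
        if len(active) != 1:
            # Zero or several active engines: leave the target untouched.
            # Conflict handling is outside the scope of this sketch.
            return False
        if active[0] != self.target:
            # The incoming address stays the same; only the internal
            # destination of the traffic changes.
            self.target = active[0]
            return True
        return False
```

In a running system, poll_once would be called on a timer. The upstream device keeps sending to the same address throughout, which is why it never observes the change of target.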

The TRA-SI or TRA-DR also resolves split-brain scenarios, in which two engines are in the active state at the same time because of network connectivity issues. When a TRA-SI or TRA-DR detects a split-brain conflict between two engines, the active TRA node resolves the conflict and shuts down the appropriate engine. The TRA ensures that a single engine remains active to process traffic. This allows processing to continue uninterrupted and helps preserve the consistency of the database set.
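
To make the split-brain case concrete, the sketch below detects more than one active engine and decides which one survives. The text above does not state how the TRA selects the engine to shut down, so the fixed preference order used here is purely an assumption, as are the function name resolve_split_brain and the "ACTIVE" state label.

```python
def resolve_split_brain(states, preference):
    """Return (survivor, engines_to_shut_down).

    states:     mapping of engine name to its reported DR state
    preference: engine names in an ASSUMED order of precedence; the real
                selection rule used by the TRA is not described here
    """
    # Keep the preference order so the first active engine is the survivor.
    active = [e for e in preference if states.get(e) == "ACTIVE"]
    survivor = active[0] if active else None
    # Every other active engine must be shut down so that exactly one
    # engine keeps processing traffic.
    return survivor, active[1:]
```

With a single active engine, the list of engines to shut down is empty, so the same function can run on every poll cycle without side effects in the normal case.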

The network or BSS does not need to be configured to reroute processing requests to a newly active engine after a failover operation. All network requests, and all requests from MATRIXX gateways and web apps, are rerouted to this engine, which guarantees that subscriber management operations continue. If the active cluster (engine) fails completely without a graceful switchover by the Cluster Manager, the Cluster Manager on the standby engine recognizes the failure and switches its cluster to become the active cluster.

All TRA pods are configured into redundant active-standby pairs to provide their own high availability (HA) and to support automatic virtual IP (VIP) address migration upon failure. When the active node fails or is shut down, the first (or only) standby node becomes the new active node and the VIP is migrated to it. Therefore, a failover of a TRA-SI or TRA-DR does not require reconfiguration of the upstream network elements. Each TRA pair also has its own fencing mechanism: if one TRA node fails, the other node stops it and takes over as the active node.
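
The active-standby behavior with VIP migration and fencing can be sketched as follows. The classes TraNode and TraGroup, the healthy and running flags, and the example address are hypothetical names chosen for this illustration; real VIP migration and fencing are performed by the platform, not by code like this.

```python
from dataclasses import dataclass


@dataclass
class TraNode:
    """Hypothetical model of one TRA node."""
    name: str
    healthy: bool = True   # result of a health check
    running: bool = True   # set to False once the node has been fenced


class TraGroup:
    """One active node plus ordered standby nodes sharing a single VIP."""

    def __init__(self, vip, nodes):
        self.vip = vip
        self.nodes = list(nodes)         # first node starts as the active node
        self.vip_holder = self.nodes[0]  # node that currently owns the VIP

    def check(self):
        """Return the active node, failing over if the current one is down."""
        if self.vip_holder.healthy:
            return self.vip_holder
        failed = self.vip_holder
        failed.running = False           # fence: stop the failed node
        for node in self.nodes:          # first healthy standby takes over
            if node is not failed and node.healthy and node.running:
                self.vip_holder = node   # the VIP moves; the address is unchanged
                return node
        raise RuntimeError("no healthy standby node available")
```

Upstream elements keep using the same VIP before and after check() promotes a standby, which is why no reconfiguration is needed on their side.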

The TRA-SI and TRA-DR include an SNMP Agent that sends traps to the Network Operations Center (NOC), notifying it of any switchover operation within the TRA's own HA pair or between the engines of an HA set.