3G/4G MATRIXX Policy Failover
A MATRIXX Policy failover can be due to a hard failure or can be planned.
When a hard failure occurs, only one engine remains functional. During a planned failover, both engines remain functional.
Sy Hard Failure
During a hard failure of the primary site, the Policy and Charging Rules Function (PCRF) receives no responses from the MATRIXX Engine. Examples of a hard failure include a forced shutdown of the engine or a catastrophic failure. In this case, the Diameter Watchdog timers and session timers expire, and the Diameter client must direct new and existing Diameter messages to the secondary site. The secondary site holds the existing information from the failed primary site and responds to existing and new Diameter messages from the PCRF. While the primary site is failing or shutting down, the MATRIXX Engine may still send Diameter result code 5012 (DIAMETER_UNABLE_TO_COMPLY), and Diameter Watchdog messages may still be exchanged.
- The PCRF sends a Diameter SLR command and establishes an Sy session for the subscriber. The request also includes the SL-Request-Type AVP that is set to the value INITIAL_REQUEST (0).
- The MATRIXX Engine sends a Diameter SLA command to the PCRF. The MATRIXX Engine at Site A fails and is not functional.
- The PCRF decides to modify the list of subscribed policy counters and sends a Diameter SLR command with the SL-Request-Type AVP set to the value INTERMEDIATE_REQUEST (1).
- The Tx timer expires with no response and the PCRF sends the Diameter SLR to the MATRIXX Engine at Site B.
- The MATRIXX Engine sends a successful Diameter SLA command to the PCRF. When repeated watchdog requests (DWR) to Site A receive no response, the PCRF must initiate a failover to Site B.
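The hard-failure steps above can be sketched from the PCRF side as follows. This is a minimal illustration only, not MATRIXX or PCRF code: the SyClient class, the make_site helper, and the TxTimeout exception are hypothetical names, and each site is simulated by an in-process function instead of a real Diameter connection. The only values taken from the text are the SL-Request-Type values; 2001 is the standard Diameter success result code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# SL-Request-Type values quoted in the steps above.
INITIAL_REQUEST = 0
INTERMEDIATE_REQUEST = 1
DIAMETER_SUCCESS = 2001  # standard Diameter success result code


class TxTimeout(Exception):
    """Hypothetical signal that the Tx timer expired with no answer."""


def make_site(status: Dict[str, bool]) -> Callable[[str, int], int]:
    """Build a simulated site; it stops answering when status['up'] is False."""
    def handle_slr(session_id: str, request_type: int) -> int:
        if not status["up"]:
            raise TxTimeout(session_id)
        return DIAMETER_SUCCESS
    return handle_slr


@dataclass
class SyClient:
    """Hypothetical PCRF-side client with an ordered list of sites."""
    sites: List[Callable[[str, int], int]]
    active: int = 0  # index of the site currently receiving requests

    def send_slr(self, session_id: str, request_type: int) -> int:
        # Try the current site first, then each remaining site once.
        for _ in range(len(self.sites)):
            try:
                return self.sites[self.active](session_id, request_type)
            except TxTimeout:
                # No answer before the Tx timer expired: move to the next site.
                self.active = (self.active + 1) % len(self.sites)
        raise TxTimeout("no site answered")


site_a_status = {"up": True}
site_b_status = {"up": True}
client = SyClient([make_site(site_a_status), make_site(site_b_status)])

first = client.send_slr("sess-1", INITIAL_REQUEST)        # answered by Site A
site_a_status["up"] = False                               # Site A fails
second = client.send_slr("sess-1", INTERMEDIATE_REQUEST)  # retried at Site B
```

After the second call, client.active points at Site B, so later requests for the session go there directly without waiting for another timeout.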
Sy Planned Failover
During a planned failover, the secondary site (Site B) becomes the primary site and the primary site (Site A) becomes the secondary site. After the switch, the PCRF receives Diameter result code 3002 (DIAMETER_UNABLE_TO_DELIVER) from Site A. After receiving this result code, the PCRF must resend the Diameter message to Site B. The Site B MATRIXX Engine responds with a success message. Any SLR Initial messages sent to Site A for new sessions are rejected, so the PCRF should send all SLR Initial messages to Site B.
- The PCRF sends a Diameter SLR command and establishes an Sy session for the subscriber. The request includes the SL-Request-Type AVP set to the value INITIAL_REQUEST (0).
- The MATRIXX Engine sends a Diameter SLA command to the PCRF. Site A changes roles and is now a secondary site. It will respond successfully to Watchdog requests and CER messages but will respond with Diameter result code 3002 (DIAMETER_UNABLE_TO_DELIVER) to Diameter policy requests.
- The PCRF modifies the list of subscribed policy counters and sends a Diameter SLR command with the SL-Request-Type AVP set to the value INTERMEDIATE_REQUEST (1).
- The Site A MATRIXX Engine returns an SLA with Diameter result code 3002 (DIAMETER_UNABLE_TO_DELIVER).
- The PCRF resends the SLR command to the Site B MATRIXX Engine.
- The Site B MATRIXX Engine successfully responds with an SLA.
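The planned-failover steps above amount to a resend-on-3002 rule on the PCRF side, which might look like the sketch below. The Site class and the send_with_redirect function are hypothetical and simulate both engines in-process; only the 3002 result code and the SL-Request-Type values come from the text, and 2001 is the standard Diameter success result code.

```python
from typing import List, Tuple

INITIAL_REQUEST = 0
INTERMEDIATE_REQUEST = 1
DIAMETER_SUCCESS = 2001
DIAMETER_UNABLE_TO_DELIVER = 3002


class Site:
    """Simulated engine: only the primary site serves policy requests."""

    def __init__(self, name: str, role: str) -> None:
        self.name = name
        self.role = role  # "primary" or "secondary"

    def handle_slr(self, session_id: str, request_type: int) -> int:
        if self.role == "primary":
            return DIAMETER_SUCCESS
        # A secondary site answers policy requests with 3002.
        return DIAMETER_UNABLE_TO_DELIVER


def send_with_redirect(sites: List[Site], session_id: str,
                       request_type: int, start: int = 0) -> Tuple[int, int]:
    """Send an SLR, moving to the next site whenever 3002 is returned.

    Returns (result_code, index_of_the_site_that_answered).
    """
    idx = start
    for _ in range(len(sites)):
        code = sites[idx].handle_slr(session_id, request_type)
        if code != DIAMETER_UNABLE_TO_DELIVER:
            return code, idx
        idx = (idx + 1) % len(sites)
    # Every site answered 3002; report that to the caller.
    return DIAMETER_UNABLE_TO_DELIVER, idx


site_a = Site("Site A", "primary")
site_b = Site("Site B", "secondary")
sites = [site_a, site_b]

before = send_with_redirect(sites, "sess-1", INITIAL_REQUEST)

# Planned failover: the two sites swap roles.
site_a.role, site_b.role = "secondary", "primary"

after = send_with_redirect(sites, "sess-1", INTERMEDIATE_REQUEST)
```

Unlike the hard-failure case, the PCRF here gets an immediate answer from Site A, so the resend does not depend on a timer expiring.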
Diameter Gateway Disconnect
Configure the network connection to the Diameter Gateway to disconnect from the standby cluster during an engine switch-over operation, and then reconnect when the cluster becomes active. This disconnect/reconnect operation enables the Sy interface to re-establish a connection after an engine switch-over.
Figure 4 provides another illustration of the high availability Diameter Gateway disconnect failover. The black bar between the secondary MATRIXX Engine and the Traffic Routing Agent (TRA) indicates that all attempts to establish Diameter connections are rejected.
When Diameter connections on the STANDBY engine are disabled, only the ACTIVE engine accepts Diameter connections from the network. During a switch-over, when the ACTIVE engine becomes the STANDBY engine, the existing connections to the new STANDBY engine (the previously ACTIVE engine) are closed and a TCP RST is sent to the Diameter client. The RST sent to the network client is generated by the TRA.
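The connection behavior described above can be modeled as a small state machine. The sketch below is a toy model under stated assumptions, not MATRIXX configuration or TRA code: the GatewayModel class and switch_over function are hypothetical, connections are represented by client identifiers, and the TCP RST is represented only by the list of dropped clients.

```python
from typing import List, Set


class GatewayModel:
    """Toy model of one engine's Diameter connection handling."""

    def __init__(self, name: str, state: str) -> None:
        self.name = name
        self.state = state  # "ACTIVE" or "STANDBY"
        self.connections: Set[str] = set()

    def connect(self, client_id: str) -> bool:
        # Only the ACTIVE engine accepts Diameter connections.
        if self.state != "ACTIVE":
            return False
        self.connections.add(client_id)
        return True

    def set_state(self, state: str) -> List[str]:
        """Change state; return clients dropped if the engine became STANDBY."""
        self.state = state
        if state == "STANDBY":
            dropped = sorted(self.connections)
            self.connections.clear()
            return dropped
        return []


def switch_over(active: GatewayModel, standby: GatewayModel) -> List[str]:
    """Swap roles and return the clients that would receive a TCP RST."""
    dropped = active.set_state("STANDBY")
    standby.set_state("ACTIVE")
    return dropped


engine_a = GatewayModel("A", "ACTIVE")
engine_b = GatewayModel("B", "STANDBY")

accepted_by_a = engine_a.connect("pcrf-1")     # ACTIVE engine accepts
rejected_by_b = engine_b.connect("pcrf-1")     # STANDBY engine rejects
dropped = switch_over(engine_a, engine_b)      # A's connections are closed
reconnected = engine_b.connect("pcrf-1")       # client reconnects to new ACTIVE
```

In this model a dropped client learns about the switch-over from the closed connection itself, which is why the disconnect lets the Sy interface re-establish its connection to the newly active engine.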