External Diameter Gateway Prometheus Metrics

The following metrics are available for the External Diameter Gateway.

Diameter Inbound Messages

Diameter Inbound Message Metrics lists the metrics for live monitoring of Diameter messages.

Table 1. Diameter Inbound Message Metrics
Metric Name | Description
diameter_inbound_message_seconds | Latency of inbound Diameter messages, in seconds. Each observation is recorded, and the metric is exposed as the p50, p95, and p99 quantiles, one series per value of the quantile tag.
diameter_inbound_message_seconds_count | Total number of recorded inbound Diameter messages.
diameter_inbound_message_seconds_max | Maximum recorded latency of inbound Diameter messages, in seconds.
diameter_inbound_message_seconds_sum | Sum of the recorded latencies of inbound Diameter messages, in seconds.
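As a minimal sketch of how the _sum and _count series relate, the Python below parses a few lines of Prometheus text exposition output and divides _sum by _count to get the mean latency over the lifetime of the counters (typically since the gateway process started). The sample lines, the requestType value INITIAL, and the helper names parse and mean_latency_ms are hypothetical; the parser handles only simple lines without timestamps or escaped quotes.

```python
import re

# Hypothetical scrape output: label values and numbers are illustrative only.
SAMPLE = """\
diameter_inbound_message_seconds_count{app="Credit_Control",cmd="Credit_Control",requestType="INITIAL"} 400.0
diameter_inbound_message_seconds_sum{app="Credit_Control",cmd="Credit_Control",requestType="INITIAL"} 6.0
diameter_inbound_message_seconds_max{app="Credit_Control",cmd="Credit_Control",requestType="INITIAL"} 0.12
"""

# metric_name{label="value",...} sample_value  (no timestamp, no escaped quotes)
LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse(text):
    """Return {(metric_name, frozenset of (label, value) pairs): sample_value}."""
    series = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = LINE.match(line)
        if m is None:
            continue  # ignore lines this simple parser does not understand
        labels = frozenset(LABEL.findall(m.group("labels") or ""))
        series[(m.group("name"), labels)] = float(m.group("value"))
    return series


def mean_latency_ms(series, base, labels):
    """Lifetime mean latency = _sum / _count, converted from seconds to milliseconds."""
    key = frozenset(labels.items())
    total = series[(base + "_sum", key)]
    count = series[(base + "_count", key)]
    return 1000 * total / count if count else 0.0
```

With the sample above, 6.0 seconds spread over 400 messages gives a mean of 15 ms. For a recent rather than lifetime average, apply a rate function to both series before dividing, as the example queries later in this section do.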

Each metric is tagged to create unique metric entries for different types of Diameter messages. Inbound Metric Tags lists the relevant tags.

Table 2. Inbound Metric Tags
Tag | Description
app | The ApplicationId of the Diameter message.
cmd | The command code of the Diameter message.
code | The ResultCode of the Diameter message.
originHost | The OriginHost of the Diameter message.
requestType | The request type of the Diameter message, relevant for the Gx, Gy, Ro, and Rx interfaces, where it allows CCR-Initial, CCR-Update, and CCR-Terminate to be monitored separately. On the Sy interface, it distinguishes Initial and Intermediate Spending-Limit-Request (SLR) messages.
serviceContextId | The service context ID of the Diameter message; relevant for Gy/Ro only.
ratingGroup | The rating group of the Diameter message, if present; relevant for Gy/Ro.
quantile | The latency quantile (p50, p95, or p99).
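Each distinct combination of tag values forms its own time series. The following Python sketch, using hypothetical label values and rates, models how a PromQL sum by(...) aggregation drops every tag not named in the by clause and adds up the series that then share the same label set. The function name sum_by is illustrative only.

```python
from collections import defaultdict


def sum_by(samples, by):
    """Model PromQL `sum by(<by>)`: keep only the labels listed in `by`
    (a missing label counts as empty) and add up the values of series
    that end up with the same label set."""
    out = defaultdict(float)
    for labels, value in samples:
        key = tuple((name, labels.get(name, "")) for name in by)
        out[key] += value
    return dict(out)


# Hypothetical per-series rates in messages per second; label values are illustrative.
samples = [
    ({"app": "Credit_Control", "cmd": "Credit_Control", "requestType": "INITIAL",
      "originHost": "pcef1", "code": "2001"}, 10.0),
    ({"app": "Credit_Control", "cmd": "Credit_Control", "requestType": "INITIAL",
      "originHost": "pcef2", "code": "2001"}, 5.0),
    ({"app": "Credit_Control", "cmd": "Credit_Control", "requestType": "UPDATE",
      "originHost": "pcef1", "code": "2001"}, 20.0),
]

totals = sum_by(samples, ["app", "cmd", "requestType"])
```

Here the two INITIAL series differ only in originHost, so grouping by app, cmd, and requestType merges them into a single series of 15 messages per second, while UPDATE stays separate at 20.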

Server Response Times

Server Response Time Metrics lists the metrics that the External Diameter Gateway exposes for internal performance monitoring of server response times.

Table 3. Server Response Time Metrics
Metric Name | Description
diameter_server_response_time_seconds | Latency of the server response, in seconds. Each observation is recorded, and the metric is exposed as the p50, p95, and p99 quantiles, one series per value of the quantile tag.
diameter_server_response_time_seconds_count | Total number of Diameter messages with a recorded server response time.
diameter_server_response_time_seconds_max | Maximum recorded latency of the server response, in seconds.
diameter_server_response_time_seconds_sum | Sum of the recorded server response latencies, in seconds.

Each metric is tagged to create unique metric entries for different types of Diameter messages. Server Response Tags lists the relevant tags.

Table 4. Server Response Tags
Tag | Description
app | The ApplicationId of the Diameter message.
cmd | The command code of the Diameter message.
code | The ResultCode of the Diameter message.
thread | The name of the Vert.x event loop thread.
quantile | The latency quantile (p50, p95, or p99).
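Because each thread produces its own series, an overall average has to be built from the summed _sum and _count values rather than by averaging the per-thread averages. A minimal Python sketch of the difference, with hypothetical thread names and values:

```python
def combined_mean(per_thread):
    """Correct overall mean: add _sum and _count across threads first,
    then divide once."""
    total = sum(s for s, _ in per_thread.values())
    count = sum(c for _, c in per_thread.values())
    return total / count if count else 0.0


def naive_mean_of_means(per_thread):
    """Incorrect shortcut: averages the per-thread means, so a thread that
    handled few messages gets the same weight as a busy one."""
    means = [s / c for s, c in per_thread.values() if c]
    return sum(means) / len(means) if means else 0.0


# Hypothetical (_sum in seconds, _count) pairs for two event loop threads.
per_thread = {
    "vert.x-eventloop-thread-0": (9.0, 900),  # busy thread, 10 ms average
    "vert.x-eventloop-thread-1": (0.5, 10),   # quiet thread, 50 ms average
}
```

With these numbers, combined_mean returns about 0.0104 s, while naive_mean_of_means returns 0.03 s because the lightly loaded thread is overweighted. In PromQL, the same idea corresponds to dividing sum by(app, cmd) (rate(diameter_server_response_time_seconds_sum[1m])) by sum by(app, cmd) (rate(diameter_server_response_time_seconds_count[1m])).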

Client Response Times

Client Response Time Metrics lists the metrics that the External Diameter Gateway exposes for internal performance monitoring of Diameter Peer Client response times.

Table 5. Client Response Time Metrics
Metric Name | Description
diameter_client_response_time_seconds | Latency of the Diameter Peer Client response, in seconds. Each observation is recorded, and the metric is exposed as the p50, p95, and p99 quantiles, one series per value of the quantile tag.
diameter_client_response_time_seconds_count | Total number of Diameter messages with a recorded Diameter Peer Client response time.
diameter_client_response_time_seconds_max | Maximum recorded latency of the Diameter Peer Client response, in seconds.
diameter_client_response_time_seconds_sum | Sum of the recorded Diameter Peer Client response latencies, in seconds.

Each metric is tagged to create unique metric entries for different types of Diameter messages. Client Response Tags lists the relevant tags.

Table 6. Client Response Tags
Tag | Description
app | The ApplicationId of the Diameter message.
cmd | The command code of the Diameter message.
code | The ResultCode of the Diameter message.
thread | The name of the Vert.x event loop thread.
quantile | The latency quantile (p50, p95, or p99).
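Quantile values are computed separately for each series, such as each thread, so they cannot be added or averaged to get a quantile for the combined traffic. The Python sketch below illustrates this with a nearest-rank percentile over hypothetical raw latencies; the gateway's own quantile estimation method is not described here and may differ.

```python
def percentile(values, p):
    """Nearest-rank p-th percentile of a non-empty list (p is an integer 1..100)."""
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100), 1-based
    return ordered[rank - 1]


# Hypothetical client response latencies in milliseconds, split by thread.
thread_a = [10] * 95 + [200] * 5  # 100 responses, a few slow ones
thread_b = [12] * 100             # 100 uniform responses

p99_a = percentile(thread_a, 99)
p99_b = percentile(thread_b, 99)
p99_all = percentile(thread_a + thread_b, 99)  # quantile of the combined traffic
```

The combined p99 is 200 ms, whereas adding the per-thread p99 values gives 212 ms and averaging them gives 106 ms; neither matches the real combined quantile.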

Example Prometheus Queries

The following examples describe Prometheus queries that gather various metrics.

Average Latency for a Specific Diameter Message

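This query calculates the average latency, in milliseconds, of inbound messages with the ApplicationId and command code of Credit Control over a 30-second window, grouped by the app, cmd, and requestType tags. It divides the per-second increase of the latency sum by the per-second increase of the message count, then multiplies the result by 1000 to convert seconds to milliseconds. Because irate uses only the two most recent samples in the window, the result reflects the latest scrape interval; rate gives a value averaged over the whole window.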

(sum by(app, cmd, requestType) (irate(diameter_inbound_message_seconds_sum{app="Credit_Control", cmd="Credit_Control"}[30s]))) / (sum by(app, cmd, requestType) (irate(diameter_inbound_message_seconds_count{app="Credit_Control", cmd="Credit_Control"}[30s]))) * 1000
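The arithmetic behind this query can be sketched in Python as follows. The irate helper mirrors PromQL irate on the last two samples of a counter, including its treatment of a decrease as a counter reset; the timestamps, the 15-second scrape interval, and the values are hypothetical.

```python
def irate(samples):
    """Per-second rate from the last two (timestamp_seconds, value) samples
    of a counter. As in PromQL irate, a decrease is treated as a counter
    reset, in which case the last value itself is taken as the increase."""
    if len(samples) < 2:
        return None  # irate needs at least two samples in the range
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    increase = v2 if v2 < v1 else v2 - v1
    return increase / (t2 - t1)


def avg_latency_ms(sum_samples, count_samples):
    """irate(_sum) / irate(_count) * 1000, as in the query above."""
    rate_sum = irate(sum_samples)
    rate_count = irate(count_samples)
    if rate_sum is None or not rate_count:
        # Not enough samples, or no messages in the last interval:
        # avoid dividing by zero.
        return None
    return rate_sum / rate_count * 1000
```

For example, if the _sum counter rises from 10.5 to 11.0 and the _count counter from 1025 to 1050 between two scrapes 15 seconds apart, avg_latency_ms returns approximately 20 ms (0.5 s spread over 25 messages).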

Figure 1 shows an example dashboard for average latency.
Figure 1. Average Latency (Grafana dashboard showing average latency statistics)

Quantiles for Latency

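This query reports the p50, p95, and p99 latency, in milliseconds, of inbound messages with the ApplicationId and command code of Credit Control, grouped by the app, cmd, and quantile tags. Quantile series hold latency values rather than counters, so they are read directly instead of through irate, which applies only to counters. Quantiles also cannot be meaningfully added across series, so max returns the highest value among the series in each group, such as series that differ by requestType or originHost.

max by(app, cmd, quantile) (diameter_inbound_message_seconds{app="Credit_Control", cmd="Credit_Control"}) * 1000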
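Quantile series hold latency values that can rise and fall rather than running counters, so they are aggregated directly instead of through a rate function. A minimal Python sketch of a max by(app, cmd, quantile) aggregation, which keeps the largest latency for each quantile across series that differ in other tags such as requestType. The quantile label values "0.5" and "0.99", the latencies, and the function name max_by are assumptions for illustration.

```python
def max_by(samples, by):
    """Model PromQL `max by(<by>)`: keep the largest value among series
    that share the same values for the `by` labels."""
    out = {}
    for labels, value in samples:
        key = tuple(labels.get(name, "") for name in by)
        out[key] = max(value, out.get(key, value))
    return out


CC = {"app": "Credit_Control", "cmd": "Credit_Control"}

# Hypothetical quantile values in seconds, one series per requestType and quantile.
samples = [
    ({**CC, "requestType": "INITIAL", "quantile": "0.99"}, 0.040),
    ({**CC, "requestType": "UPDATE", "quantile": "0.99"}, 0.025),
    ({**CC, "requestType": "INITIAL", "quantile": "0.5"}, 0.012),
    ({**CC, "requestType": "UPDATE", "quantile": "0.5"}, 0.008),
]

worst = max_by(samples, ["app", "cmd", "quantile"])
```

Taking max yields an upper bound across the grouped series, which suits alerting on worst-case latency; it is not the exact quantile of the combined traffic.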

Figure 2 shows an example dashboard for average latency with quantiles.
Figure 2. Average Latency with Quantiles

TPS Based on Request Type

This query monitors the current transactions per second (TPS) of the External Diameter Gateway for inbound messages with the ApplicationId and command code of Credit Control, grouped by the app, cmd, and requestType tags. The irate function computes the per-second rate from the two most recent samples in a 30-second window. Use this query to monitor TPS separately for CCR-I, CCR-U, and CCR-T.

sum by(requestType, app, cmd) (irate(diameter_inbound_message_seconds_count{app="Credit_Control", cmd="Credit_Control"}[30s]))
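The Python sketch below, with hypothetical series and values, applies irate to each count series first and then sums per requestType, the same order as the query. Computing the rate per series before aggregating lets a counter reset be detected on the series where it happened. Because the query already filters app and cmd to Credit_Control, the sketch groups by requestType alone; the function names are illustrative.

```python
from collections import defaultdict


def irate(samples):
    """PromQL-style irate over the last two (timestamp_seconds, value)
    samples of a counter; a decrease is treated as a counter reset."""
    if len(samples) < 2:
        raise ValueError("irate needs at least two samples")
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    increase = v2 if v2 < v1 else v2 - v1
    return increase / (t2 - t1)


def tps_by_request_type(series):
    """sum by(requestType) (irate(<count series>)): rate per series first,
    then add up the rates of series with the same requestType."""
    tps = defaultdict(float)
    for labels, samples in series:
        tps[labels["requestType"]] += irate(samples)
    return dict(tps)


# Hypothetical _count samples scraped 15 seconds apart.
series = [
    ({"requestType": "INITIAL", "originHost": "pcef1"}, [(0, 100.0), (15, 250.0)]),  # 10/s
    ({"requestType": "INITIAL", "originHost": "pcef2"}, [(0, 900.0), (15, 75.0)]),   # reset: 5/s
    ({"requestType": "UPDATE", "originHost": "pcef1"}, [(0, 0.0), (15, 300.0)]),     # 20/s
]

tps = tps_by_request_type(series)
```

Here INITIAL totals 15 TPS and UPDATE 20 TPS. Summing the raw counters first would have hidden the reset on the second INITIAL series.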

Figure 3 shows an example dashboard with metrics broken down by request type.
Figure 3. Average Latency by Request Type (Grafana dashboard with metrics by request type)

Total Error Rate within a Time Window

This query calculates the per-second rate of Diameter messages whose ResultCode is not 2001 (DIAMETER_SUCCESS), using the two most recent samples in a 30-second window. It groups the result by application and command.

sum by(app, cmd) (irate(diameter_inbound_message_seconds_count{code!="2001"}[30s]))
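A minimal Python sketch of the same filter and grouping, using hypothetical per-series rates: error_rate_by_app_cmd keeps series whose code tag is not 2001 and adds them up per (app, cmd), and error_ratio divides that by the total rate to express errors as a fraction of all messages. The result codes other than 2001 and the function names are illustrative.

```python
from collections import defaultdict

SUCCESS = "2001"  # DIAMETER_SUCCESS


def error_rate_by_app_cmd(rates):
    """sum by(app, cmd) over series whose code tag is not "2001".
    A missing code tag also counts, matching PromQL's code!="2001"."""
    out = defaultdict(float)
    for labels, rate in rates:
        if labels.get("code") != SUCCESS:
            out[(labels["app"], labels["cmd"])] += rate
    return dict(out)


def error_ratio(rates):
    """Fraction (0..1) of messages with a non-2001 code, per (app, cmd)."""
    errors = error_rate_by_app_cmd(rates)
    totals = defaultdict(float)
    for labels, rate in rates:
        totals[(labels["app"], labels["cmd"])] += rate
    return {key: errors.get(key, 0.0) / total for key, total in totals.items() if total}


# Hypothetical per-series rates in messages per second.
rates = [
    ({"app": "Credit_Control", "cmd": "Credit_Control", "code": "2001"}, 95.0),
    ({"app": "Credit_Control", "cmd": "Credit_Control", "code": "5030"}, 3.0),
    ({"app": "Credit_Control", "cmd": "Credit_Control", "code": "4012"}, 2.0),
]
```

A corresponding PromQL ratio would divide the query above by sum by(app, cmd) (irate(diameter_inbound_message_seconds_count[30s])). As with the query itself, any code other than 2001, including other 2xxx success codes, counts as an error.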

Figure 4 shows an example dashboard with metrics for error rate.
Figure 4. Total Error Rate (Grafana dashboard with metrics for total error rate within a specific window)