CAF Metrics

Prometheus can be used to monitor the CDR Aggregation Function (CAF) application and the Kafka cluster.

All metrics exposed by Kafka can be accessed through the Java Management Extensions (JMX) interface. The Prometheus JMX Exporter makes the Kafka and CAF metrics from the JVMs available to Prometheus.

Note: To retrieve Kafka cluster metrics, set KAFKA_OPTS to include the JMX Exporter as a javaagent before running the Kafka broker start script.

For example:

# Set KAFKA_OPTS
export KAFKA_OPTS="$KAFKA_OPTS -javaagent:${KAFKA_HOME}/libs/jmx_prometheus_javaagent-0.15.0.jar=localhost:9091:${KAFKA_HOME}/config/kafka-jmx-beans.yml"
 
# Start Kafka broker
${KAFKA_HOME}/bin/kafka-server-start ${KAFKA_HOME}/config/server.properties

In this example, kafka-jmx-beans.yml configures which metrics are exposed from the Kafka cluster. For usage details, see the Prometheus JMX Exporter documentation at the Prometheus website.
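As a rough illustration of what such a configuration file might contain, the following sketch writes a minimal JMX Exporter configuration with a single catch-all rule. The CONFIG_FILE variable and its default path are assumptions made for this example, and the catch-all pattern is not the configuration shipped with the product; a real file would normally restrict the rules to the MBeans of interest.

```shell
# Assumption: write to a scratch file first, then copy it to
# ${KAFKA_HOME}/config/kafka-jmx-beans.yml once reviewed.
CONFIG_FILE="${CONFIG_FILE:-./kafka-jmx-beans.yml}"

# Minimal sketch: a single rule whose pattern matches every MBean attribute.
cat > "$CONFIG_FILE" <<'EOF'
rules:
  - pattern: ".*"
EOF

echo "Wrote ${CONFIG_FILE}"
```

A catch-all rule is convenient for a first look at what the broker exposes, but it can produce a large number of time series.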

Once Prometheus has collected the metrics, they can be made available to Grafana dashboards by adding Prometheus as a data source. For more information about the metrics that Kafka exposes using JMX, see the Kafka metrics documentation on the Apache Kafka website.
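Prometheus collects these metrics by scraping the HTTP endpoint opened by the JMX Exporter. The following sketch writes a scrape configuration fragment for the Kafka example, which listens on localhost:9091. The PROM_FRAGMENT variable, its default path, and the job name are assumptions for illustration; the fragment would need to be merged into the scrape_configs section of your own Prometheus configuration.

```shell
# Assumption: the fragment is written to a scratch file for review.
PROM_FRAGMENT="${PROM_FRAGMENT:-./prometheus-kafka-scrape.yml}"

cat > "$PROM_FRAGMENT" <<'EOF'
scrape_configs:
  - job_name: "kafka"                  # hypothetical job name
    static_configs:
      - targets: ["localhost:9091"]    # host:port passed to the javaagent in KAFKA_OPTS
EOF

echo "Wrote ${PROM_FRAGMENT}"
```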

Table 1, CAF Custom Metrics, describes the available CAF Prometheus metrics.

Table 1. CAF Custom Metrics
Prometheus Metric Name Description
kafka_stream_CDR_Aggregation_processedRequests_total{CDR_Aggregation_id="ChargingDataRequests"} Total number of charging data requests processed by the CAF instance.
kafka_stream_CDR_Aggregation_processedRequests_rate{CDR_Aggregation_id="ChargingDataRequests"} Rate of charging data requests processed by the CAF instance per second.
kafka_stream_CDR_Aggregation_chargingNotifyRequests_processed_total{CDR_Aggregation_id="ChargingNotifyRequests"} Total number of charging notify requests processed by the CAF instance.
kafka_stream_CDR_Aggregation_chargingNotifyRequests_processed_rate{CDR_Aggregation_id="ChargingNotifyRequests"} Rate of charging notify requests processed by the CAF instance per second.
kafka_stream_CDR_Aggregation_chargingDataRequests_duplicateCount_total{CDR_Aggregation_id="ChargingDataRequests"} Total number of duplicate charging data request messages handled by the CAF instance.
kafka_stream_CDR_Aggregation_chargingDataRequests_duplicateCount_rate{CDR_Aggregation_id="ChargingDataRequests"} Rate of duplicate charging data request messages handled by the CAF instance per second.
kafka_stream_CDR_Aggregation_sessionReleased_total The total number of aggregate CDRs released by the CAF instance. This can be filtered using CDR_Aggregation_id for specific closure reasons.
kafka_stream_CDR_Aggregation_sessionReleased_rate Rate of aggregate CDRs released by the CAF instance per second. This can be filtered using CDR_Aggregation_id for specific closure reasons.
kafka_stream_CDR_Aggregation_state_store_size The in-memory size of the aggregated store, per task. Tags include: spring_id, task_id, thread_id, instance, and job.
kafka_stream_CDR_Aggregation_state_store_deleted The total number of records deleted from the state store, per task. Tags include: spring_id, task_id, thread_id, instance, and job.
kafka_stream_CDR_Aggregation_state_store_duration Time taken to iterate through the store when deleting old records or closing idle records, differentiated by the process tag. Tags include: spring_id, task_id, thread_id, instance, job, and process.
kafka_stream_CDR_Aggregation_state_store_thread_sleep_total The total number of times the stream thread has been paused, differentiated by the process tag. Tags include: spring_id, task_id, thread_id, instance, job, and process.

These metrics are available in the JMX console. To expose them to Prometheus, update the MTX_CAF_OPTS variable to include the Prometheus JMX Exporter.
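A minimal sketch of such an update follows, modeled on the KAFKA_OPTS example. It assumes that MTX_CAF_OPTS carries JVM options in the same way as KAFKA_OPTS. The JAR location, the configuration file name caf-jmx-beans.yml, and port 9404 are hypothetical values chosen for illustration; pick a port that does not clash with the Kafka exporter on 9091.

```shell
# Hypothetical locations and port; adjust to your installation.
JMX_EXPORTER_JAR="${JMX_EXPORTER_JAR:-/opt/jmx/jmx_prometheus_javaagent-0.15.0.jar}"
CAF_JMX_CONFIG="${CAF_JMX_CONFIG:-/opt/jmx/caf-jmx-beans.yml}"
CAF_JMX_PORT="${CAF_JMX_PORT:-9404}"

# Append the javaagent to any options that are already set.
# Assumption: MTX_CAF_OPTS is passed to the CAF JVM as command-line options.
export MTX_CAF_OPTS="${MTX_CAF_OPTS:-} -javaagent:${JMX_EXPORTER_JAR}=${CAF_JMX_PORT}:${CAF_JMX_CONFIG}"

echo "MTX_CAF_OPTS=${MTX_CAF_OPTS}"
```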

Recommended Metrics for Grafana

Table 2, CAF Dashboard Metrics, lists the recommended metrics for CAF dashboards:

Table 2. CAF Dashboard Metrics
Source                     Metric
CAF                        kafka_stream_CDR_Aggregation_processedRequests_total
                           kafka_stream_CDR_Aggregation_processedRequests_rate
                           kafka_stream_CDR_Aggregation_chargingNotifyRequests_processed_total
                           kafka_stream_CDR_Aggregation_chargingNotifyRequests_processed_rate
                           kafka_stream_CDR_Aggregation_chargingDataRequests_duplicateCount_total
                           kafka_stream_CDR_Aggregation_chargingDataRequests_duplicateCount_rate
                           kafka_stream_CDR_Aggregation_sessionReleased_total
                           kafka_stream_CDR_Aggregation_sessionReleased_rate
                           kafka_stream_CDR_Aggregation_state_store_size
                           kafka_stream_CDR_Aggregation_state_store_deleted
                           kafka_stream_CDR_Aggregation_state_store_duration
                           kafka_stream_CDR_Aggregation_state_store_thread_sleep_total
Kafka Client (Consumer)    kafka_consumer_fetch_manager_records_lag_max
                           kafka_consumer_fetch_manager_records_lag
                           kafka_consumer_fetch_manager_bytes_consumed_rate
                           kafka_consumer_fetch_manager_records_consumed_rate
                           kafka_consumer_fetch_manager_fetch_rate
Kafka Client (Producer)    kafka_producer_record_send_total
                           kafka_producer_record_error_total
                           kafka_producer_record_retry_total
                           kafka_producer_request_latency_avg
                           kafka_producer_buffer_available_bytes
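
Before building a dashboard, it can be useful to confirm that a metric has actually reached Prometheus. The sketch below builds an instant-query URL for the Prometheus HTTP API (/api/v1/query). The PROM_URL default of http://localhost:9090 and the helper name query_url are assumptions for this example, and the helper only handles bare metric names without label selectors.

```shell
# Assumption: Prometheus is reachable at this address.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Hypothetical helper: print the instant-query URL for a bare metric name.
query_url() {
  printf '%s/api/v1/query?query=%s\n' "$PROM_URL" "$1"
}

# Print the URLs for a few of the recommended metrics.
for metric in \
  kafka_stream_CDR_Aggregation_processedRequests_total \
  kafka_consumer_fetch_manager_records_lag_max \
  kafka_producer_record_error_total
do
  query_url "$metric"
done

# With a running Prometheus server, a URL can be fetched with, for example:
#   curl -s "$(query_url kafka_producer_record_error_total)"
```

An empty result list in the JSON response usually means the metric is not being scraped or the exporter rules do not produce that name.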