Recommended Kafka Producer Metrics
Use these standard Kafka producer metrics to monitor the 5G event streaming performance.
The following table, Producer Metrics, describes the recommended producer metrics. The MBean for these metrics is kafka.producer:type=producer-metrics,client-id=([-.\w]+).
Name | Description |
---|---|
response-rate | The average number of responses received per second. |
request-rate | The average number of requests sent per second. |
request-latency-avg | The average request latency, in milliseconds. |
outgoing-byte-rate | The average number of outgoing bytes sent per second. |
io-wait-time-ns-avg | The average length of time the I/O thread spent waiting for a socket, in nanoseconds. |
batch-size-avg | The average number of bytes sent per partition per request. |
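When metrics are scraped from JMX, it is often useful to recover the client ID from the MBean name. The following is a minimal sketch, assuming the client-id portion of the name is matched by the character class `[-.\w]+`. The function name and the sample client ID `asn1-streamer-1` are hypothetical and used only for illustration.

```python
import re

# MBean name pattern for producer metrics, with the client-id captured.
MBEAN_RE = re.compile(
    r"^kafka\.producer:type=producer-metrics,client-id=([-.\w]+)$"
)


def client_id_from_mbean(name):
    """Return the client-id from a producer-metrics MBean name, or None."""
    match = MBEAN_RE.match(name)
    return match.group(1) if match else None


# Hypothetical MBean name used only for illustration.
sample = "kafka.producer:type=producer-metrics,client-id=asn1-streamer-1"
print(client_id_from_mbean(sample))
```

Names that belong to other MBean types (for example, consumer metrics) do not match and return `None`.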
Response Rate
For producers, the response-rate metric reports the rate of responses received from brokers. Brokers respond to producers when the data has been received. Depending on your configuration, received can mean one of three things:
- The message was received, but not committed (request.required.acks == 0).
- The leader has written the message to disk (request.required.acks == 1).
- The leader has received confirmation from all replicas that the data has been written to disk (request.required.acks == all).
Producer data is not available for consumption until the required number of acknowledgments has been received. To diagnose low response rates, check the request.required.acks configuration directive (named acks in current Java producer clients) on your producers.
Choosing the right value for request.required.acks is entirely use-case dependent. The tradeoff is between availability and consistency.
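The three acknowledgment levels described above can be summarized in a small lookup helper. This is only an illustrative sketch: the function name and the wording of each description are assumptions, and the value -1 is treated as a synonym for all.

```python
# Meaning of each acknowledgment level, paraphrasing the list above.
ACKS_MEANING = {
    "0": "message received, but not committed",
    "1": "leader has written the message to disk",
    "all": "leader has confirmation from all replicas that the data is written",
}


def describe_acks(value):
    """Return a short description of an acks setting (hypothetical helper)."""
    key = str(value).strip().lower()
    if key == "-1":  # -1 is treated as equivalent to "all"
        key = "all"
    if key not in ACKS_MEANING:
        raise ValueError(f"unsupported acks value: {value!r}")
    return ACKS_MEANING[key]


for setting in (0, 1, "all"):
    print(setting, "->", describe_acks(setting))
```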
Request Rate
The request-rate metric reports the rate at which producers send data to brokers. A request rate indicating issue-free operation varies depending on the use case. Check peaks and drops to ensure continuous service availability. If rate-limiting is not enabled, traffic spikes can cause brokers to slow down as they process a rapid influx of data.
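One simple way to check for peaks and drops is to compare each request-rate sample against a trailing average. The sketch below assumes the samples have already been collected at a fixed interval. The function name, the window size, and the 50% tolerance are hypothetical choices, not recommended values.

```python
from statistics import mean


def find_rate_anomalies(samples, window=5, tolerance=0.5):
    """Flag samples that deviate from the trailing mean by more than
    `tolerance` (a fraction of the baseline).

    Returns a list of (index, value, baseline) tuples.
    """
    anomalies = []
    for i in range(window, len(samples)):
        baseline = mean(samples[i - window:i])
        if baseline == 0:
            continue  # no meaningful baseline to compare against
        if abs(samples[i] - baseline) / baseline > tolerance:
            anomalies.append((i, samples[i], baseline))
    return anomalies


# Hypothetical request-rate samples (requests per second).
rates = [100, 102, 98, 101, 99, 250, 100, 20]
for index, value, baseline in find_rate_anomalies(rates):
    print(f"sample {index}: rate {value} vs. trailing mean {baseline:.1f}")
```

In this sample series, the spike to 250 and the drop to 20 are flagged. Note that the spike also raises the trailing mean for the next few samples, so a production check would probably use a more robust baseline such as a median.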
Request Latency Average
The request-latency-avg metric is the average amount of time between when KafkaProducer.send() is called and when the producer receives a response from the broker.
Producers do not necessarily send each message as soon as it is created. The linger.ms value for the producer determines the maximum wait time before sending a message batch. This can allow collection of a larger batch of messages before sending them in a single request. The default value of linger.ms is zero milliseconds. Setting this to a higher value can increase latency, but it can also help improve throughput as the producer can send multiple messages without incurring network overhead for each one.
Latency has a strong correlation with throughput. If you increase linger.ms to improve throughput, watch request latency to ensure it does not rise beyond an acceptable limit. Modifying the value of batch.size in your producer configuration can lead to significant gains in throughput. Determining an optimal batch size is largely use-case dependent, but in general, increase batch size if you have available memory.
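To reason about how linger.ms and batch.size interact, it can help to estimate which limit is reached first. The sketch below uses a deliberately simplified model: a steady per-partition byte rate, no compression, and no batching caused by back-pressure. The function name and the sample rate are hypothetical.

```python
def batch_trigger(bytes_per_sec_per_partition, batch_size=16384, linger_ms=0):
    """Estimate which limit sends a batch first under a steady byte rate.

    Returns (trigger, wait_ms, expected_batch_bytes).
    """
    if bytes_per_sec_per_partition <= 0:
        raise ValueError("byte rate must be positive")
    # Time needed to accumulate a full batch at the given rate.
    time_to_fill_ms = batch_size / bytes_per_sec_per_partition * 1000
    if time_to_fill_ms <= linger_ms:
        # The batch fills before the linger timer expires.
        return "batch.size", time_to_fill_ms, batch_size
    # The linger timer expires first, so the batch is sent partly full.
    expected_bytes = bytes_per_sec_per_partition * linger_ms / 1000
    return "linger.ms", linger_ms, expected_bytes


# Hypothetical rate of 1 MB/s per partition with the default 16 KB batch size.
print(batch_trigger(1_000_000, linger_ms=10))
print(batch_trigger(1_000_000, linger_ms=20))
```

Under these assumptions, a full 16 KB batch takes about 16.4 ms to accumulate, so a 10 ms linger sends partly filled batches while a 20 ms linger allows batches to fill first.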
Outgoing Byte Rate
As with Kafka brokers, watch the outgoing-byte-rate metric of Kafka producers to track producer network throughput. Observing traffic volume over time is essential for determining whether you must make network infrastructure changes. It also provides a perspective on the production rate of producers, making it easier to identify sources of excessive traffic.
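As a worked example of relating this metric to network capacity, the outgoing byte rate can be converted to a fraction of link bandwidth. The function name, the link capacity, and the sample rate below are hypothetical. The calculation ignores protocol overhead and any other traffic on the link.

```python
def link_utilization(outgoing_byte_rate, link_capacity_mbps):
    """Return the fraction of link capacity used by producer traffic.

    outgoing_byte_rate is in bytes per second; link capacity is in
    megabits per second (1 Mbps = 1,000,000 bits per second).
    """
    bits_per_sec = outgoing_byte_rate * 8
    return bits_per_sec / (link_capacity_mbps * 1_000_000)


# Hypothetical: 25 MB/s of producer traffic on a 1 Gbps link.
print(f"{link_utilization(25_000_000, 1000):.0%}")
```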
I/O Wait Time
If producers are producing more data than they can send, they end up waiting for network resources. But if producers are not rate-limited or reaching bandwidth maximums, issues become harder to identify. Because disk access tends to be the slowest segment of any processing task, checking the io-wait-time-ns-avg metric on your producers is a good place to start.
In general, I/O wait is the time spent performing I/O while the CPU is idle. The io-wait-time-ns-avg metric reports the average length of time, in nanoseconds, that the producer's I/O thread spent waiting for a socket. Excessive wait times might indicate that producers are unable to get the data fast enough. If you are using traditional hard drives for storage, consider SSDs instead.
Batch Size
To use network resources more efficiently, Kafka producers group messages into batches before sending them. The producer waits to accumulate an amount of data defined by batch.size (16 KB by default), up to the maximum wait time specified in linger.ms (0 milliseconds by default). The batch-size-avg metric shows fluctuations in the average size. If batches sent by a producer are consistently smaller than the value of batch.size, any time your producer spends lingering is wasted waiting for more data that never arrives. Consider reducing your linger.ms setting if the value of batch-size-avg is lower than your configured batch.size.
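The comparison described above can be expressed as a fill ratio: batch-size-avg divided by the configured batch.size. The sketch below is illustrative only. The function names and the 0.5 threshold are assumptions, not values taken from Kafka.

```python
def batch_fill_ratio(batch_size_avg, batch_size=16384):
    """Return batch-size-avg as a fraction of the configured batch.size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return batch_size_avg / batch_size


def lingering_is_wasted(batch_size_avg, batch_size=16384, linger_ms=0,
                        threshold=0.5):
    """Return True when the producer lingers but batches stay well below
    batch.size. The threshold is a hypothetical cut-off."""
    return linger_ms > 0 and batch_fill_ratio(batch_size_avg, batch_size) < threshold


# Hypothetical reading: batches average 4 KB against the 16 KB default.
print(batch_fill_ratio(4096))
print(lingering_is_wasted(4096, linger_ms=50))
```

A ratio that stays far below 1.0 while linger.ms is greater than zero suggests that the linger time adds latency without producing larger batches.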
Examples
Figure 1 shows a Grafana dashboard displaying production of ASN.1-encoded records by the ASN.1 Streamer application as reflected in producer metrics.
Figure 1 highlights the following events:
- The load test starts and the consumer starts fetching and consuming records.
- The load test stops. The rate of consumption of records and bytes has been relatively constant.
- The consumer stops processing all records.