Supported Alerts for Prometheus Metrics
The Supported Alerts table lists the alerts that snmp-notifier supports and the Prometheus metrics used to generate them.
Alert Name | Prometheus Metric Used for the Alert Rule | Description |
---|---|---|
sysClusterNodeJoinedCkpt | kube_pod_container_status_running | Indicates that the checkpointing pod is starting up. |
sysClusterNodeJoinedPubl | kube_pod_container_status_running | Indicates that the publishing pod is starting up. |
sysClusterNodeJoinedProc | kube_pod_container_status_running | Indicates that the processing pod is starting up. |
sysClusterNodeExitedCkpt | kube_pod_container_status_terminated | Indicates that the checkpointing pod is terminated. |
sysClusterNodeExitedPubl | kube_pod_container_status_terminated | Indicates that the publishing pod is terminated. |
sysClusterNodeExitedProc | kube_pod_container_status_terminated | Indicates that the processing pod is terminated. |
sysClusterNodeServiceUp | up | Indicates that all engine pods (processing, publishing, and checkpointing) are up. |
sysClusterNodeServiceDown | up | Indicates that all engine pods (processing, publishing, and checkpointing) are down. |
sysTraNodeServiceUpAlert | up | Indicates that all Traffic Routing Agent (TRA) pods (tra-ag, tralb) are up. |
sysTraNodeServiceDownAlert | up | Indicates that all TRA pods (tra-ag, tralb) are down. |
sysClusterPeerActiveError | sysPeerClusterClusterState | Indicates that the peer is also in the active state. |
sysClusterPeerConnected | sysPeerClusterClusterState | Indicates that the peer is in the standby state. |
sysClusterPeerDisconnected | sysPeerClusterClusterState | Indicates that the peer cluster is unavailable. |
sysProcessingErrorAlert | sysProcessingErrors | Indicates that processing errors have crossed a threshold. |
sysMemoryAvailableThresholdCrossingAlert | sysMemoryAvailableThresholdMb | Indicates that available memory is less than a threshold. |
txnDatabaseMemoryUsedThresholdCrossingAlert | txnDatabaseMemoryUsedKb, txnDatabaseMemoryFreeKb | Indicates that the percentage usage of txnDatabase available memory has crossed a threshold. |
txnGtcOutOfSyncAlert | txnReplayCurrentGlobalTxnCounter, txnReplayLastReplayGlobalTxnCounter | Indicates that the GTC value gap is more than one million (or a threshold specified for the system). |
MemoryUsageAlert | sysTotalMemoryPoolInUseMb, sysTotalMemoryPoolSizeMb | Indicates that the percentage usage of system memory dedicated to databases and buffer pools (mtxbufs) has crossed a threshold. |
EngineMemoryUsageAlert | statSysInfoPhysicalMemoryFreeMb, statSysInfoPhysicalMemoryCachedMb, statSysInfoPhysicalMemoryBuffersMb, statSysInfoPhysicalMemoryTotalMb | Indicates that overall engine memory usage, including TRA pods, is high. |
EngineDiskUsageHighAlert | statSysInfoDiskAvailablePct | Indicates that disk usage has crossed a threshold value for the engine and TRA pods. |
NodeHeartbeatMsgLostAlert | sysClusterNodeHeartbeatMsgReceivedCount, sysClusterNodeHeartbeatMsgSentCount | Indicates that the system is losing node heartbeat messages. |
EngineOneStateAlert | sysPeerClusterClusterState | Indicates that engine 1 is not in the active or the standby state. |
EngineTwoStateAlert | sysPeerClusterClusterState | Indicates that engine 2 is not in the active or the standby state. |
SiteStatusAlert | sysClusterEngineActiveDateTime | Indicates that the site is down. |
SecondaryEngineNotInStandbyAlert | sysPeerClusterClusterState | Indicates that the second engine is not in the standby state. |
SystemCpuUsageAlert | system_cpu_usage | CPU usage alert for non-engine application pods. |
TransactionThresholdAlert | txnMsgCount | Indicates that the number of transactions crossed a threshold over a specified duration. |
ActiveMQStatusAlert | org_apache_activemq_Broker_Active | Indicates that the ActiveMQ pod is down. |
diamConnectionStatsReceivedErrors | diamConnectionStatsReceivedErrorCount | Indicates that the number of received Diameter errors crossed a threshold over a specified duration. |
diamConnectionStatsSentError | diamConnectionStatsSentErrorCount | Indicates that the number of sent Diameter errors crossed a threshold over a specified duration. |
diamReceivedErrorLimit | diamConnectionStatsReceivedErrorCount, diamConnectionStatsReceivedMsgCount | Indicates that the percentage of received errors crossed a threshold. |
diamSentErrorLimit | diamConnectionStatsSentErrorCount, diamConnectionStatsSentMsgCount | Indicates that the percentage of sent errors crossed a threshold. |
GatewayProxyFailureAlert | mtx_proxy_error_count_total, mtx_proxy_request_count_total | Indicates that the Gateway Proxy error threshold has been reached. |
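
Several alerts in the table pair an error counter with a total counter and fire when the resulting percentage crosses a threshold (for example, diamReceivedErrorLimit). The following is a minimal Python sketch of that arithmetic only. The function names, the sample values, and the threshold are hypothetical and are not taken from the product's alert rules, which are evaluated by Prometheus rather than by code like this.

```python
def error_percentage(error_count: float, message_count: float) -> float:
    """Return errors as a percentage of messages (0.0 if no messages were seen)."""
    if message_count <= 0:
        return 0.0
    return 100.0 * error_count / message_count


def crosses_threshold(value: float, threshold: float) -> bool:
    """Return True when the value is strictly above the threshold."""
    return value > threshold


# Hypothetical sample values standing in for the two metrics that the
# diamReceivedErrorLimit row lists.
samples = {
    "diamConnectionStatsReceivedErrorCount": 30.0,
    "diamConnectionStatsReceivedMsgCount": 1000.0,
}

# Assumed threshold for illustration only; not a product default.
HYPOTHETICAL_THRESHOLD_PCT = 2.0

pct = error_percentage(
    samples["diamConnectionStatsReceivedErrorCount"],
    samples["diamConnectionStatsReceivedMsgCount"],
)
firing = crosses_threshold(pct, HYPOTHETICAL_THRESHOLD_PCT)
print(f"received error percentage: {pct:.1f}%, alert firing: {firing}")
```

The same shape could apply to the other paired-metric rows, such as diamSentErrorLimit, by substituting the corresponding counters.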
The Alert Mapping to OIDs table maps alert names to OIDs.
Alert Name | OID |
---|---|
sysClusterNodeJoinedCkpt | 1.3.6.1.4.1.35838.1.1.2.1.8 |
sysClusterNodeJoinedPubl | 1.3.6.1.4.1.35838.1.1.2.1.8 |
sysClusterNodeJoinedProc | 1.3.6.1.4.1.35838.1.1.2.1.8 |
sysClusterNodeExitedCkpt | 1.3.6.1.4.1.35838.1.1.2.1.9 |
sysClusterNodeExitedPubl | 1.3.6.1.4.1.35838.1.1.2.1.9 |
sysClusterNodeExitedProc | 1.3.6.1.4.1.35838.1.1.2.1.9 |
sysClusterNodeServiceUp | 1.3.6.1.4.1.35838.1.1.2.1.10 |
sysClusterNodeServiceDown | 1.3.6.1.4.1.35838.1.1.2.1.11 |
sysTraNodeServiceUpAlert | 1.3.6.1.4.1.35838.1.2.1.1.4.1 |
sysTraNodeServiceDownAlert | 1.3.6.1.4.1.35838.1.2.1.1.4.1 |
sysClusterPeerActiveError | 1.3.6.1.4.1.35838.1.1.2.1.12 |
sysClusterPeerConnected | 1.3.6.1.4.1.35838.1.1.2.1.13 |
sysClusterPeerDisconnected | 1.3.6.1.4.1.35838.1.1.2.1.14 |
sysProcessingErrorAlert | 1.3.6.1.4.1.35838.1.1.2.1.7 |
sysMemoryAvailableThresholdCrossingAlert | 1.3.6.1.4.1.35838.1.1.2.1.8 |
txnDatabaseMemoryUsedThresholdCrossingAlert | 1.3.6.1.4.1.35838.1.1.2.5.1 |
txnGtcOutOfSyncAlert | 1.3.6.1.4.1.35838.1.1.2.5.3 |
MemoryUsageAlert | 1.3.6.1.4.1.35838.1.4.2.1.9 |
EngineMemoryUsageAlert | 1.3.6.1.4.1.35838.1.4.2.1.9 |
EngineDiskUsageHighAlert | 1.3.6.1.4.1.35838.1.4.2.1.9 |
NodeHeartbeatMsgLostAlert | 1.3.6.1.4.1.35838.1.4.2.1.9 |
EngineOneStateAlert | 1.3.6.1.4.1.35838.1.4.2.1.9 |
EngineTwoStateAlert | 1.3.6.1.4.1.35838.1.4.2.1.9 |
SiteStatusAlert | 1.3.6.1.4.1.35838.1.4.2.1.9 |
SecondaryEngineNotInStandbyAlert | 1.3.6.1.4.1.35838.1.4.2.1.9 |
SystemCpuUsageAlert | 1.3.6.1.4.1.35838.1.4.2.1.9 |
TransactionThresholdAlert | 1.3.6.1.4.1.35838.1.4.2.1.9 |
ActiveMQStatusAlert | 1.3.6.1.4.1.35838.1.4.2.1.9 |
diamConnectionStatsReceivedErrors | 1.3.6.1.4.1.35838.1.4.2.1.9 |
diamConnectionStatsSentError | 1.3.6.1.4.1.35838.1.4.2.1.9 |
diamReceivedErrorLimit | 1.3.6.1.4.1.35838.1.4.2.1.9 |
diamSentErrorLimit | 1.3.6.1.4.1.35838.1.4.2.1.9 |
GatewayProxyFailureAlert | 1.3.6.1.4.1.35838.1.4.2.1.9 |
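
Because many alerts in the table share one OID, a trap receiver cannot identify an alert from the OID alone. The sketch below shows one way a receiving script might hold a subset of this mapping and group alert names by OID. The helper names `oid_for` and `alerts_by_oid` are hypothetical, and this is not how snmp-notifier itself resolves OIDs.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# A subset of the rows from the table above, copied as-is.
ALERT_OIDS: Dict[str, str] = {
    "sysClusterNodeServiceUp": "1.3.6.1.4.1.35838.1.1.2.1.10",
    "sysClusterNodeServiceDown": "1.3.6.1.4.1.35838.1.1.2.1.11",
    "sysTraNodeServiceUpAlert": "1.3.6.1.4.1.35838.1.2.1.1.4.1",
    "sysTraNodeServiceDownAlert": "1.3.6.1.4.1.35838.1.2.1.1.4.1",
    "MemoryUsageAlert": "1.3.6.1.4.1.35838.1.4.2.1.9",
    "EngineMemoryUsageAlert": "1.3.6.1.4.1.35838.1.4.2.1.9",
}


def oid_for(alert_name: str) -> Optional[str]:
    """Look up the OID for an alert name; None if the name is not in the subset."""
    return ALERT_OIDS.get(alert_name)


def alerts_by_oid(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Group alert names by OID, preserving the order of the input mapping."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name, oid in mapping.items():
        grouped[oid].append(name)
    return dict(grouped)


# Print each OID with the alert names that map to it.
for oid, names in alerts_by_oid(ALERT_OIDS).items():
    print(oid, "->", ", ".join(names))
```

Grouping by OID makes the shared entries visible, which suggests that a receiver would need the alert name carried in the trap contents to tell such alerts apart.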