Supported Alerts for Prometheus Metrics

Supported Alerts lists the alerts that snmp-notifier supports and the Prometheus metrics used to generate them.

Table 1. Supported Alerts
Alert Name Prometheus Metric Used for the Alert Rule Description
sysClusterNodeJoinedCkpt kube_pod_container_status_running Indicates that the checkpointing pod is starting up.
sysClusterNodeJoinedPubl kube_pod_container_status_running Indicates that the publishing pod is starting up.
sysClusterNodeJoinedProc kube_pod_container_status_running Indicates that the processing pod is starting up.
sysClusterNodeExitedCkpt kube_pod_container_status_terminated Indicates that the checkpointing pod is terminated.
sysClusterNodeExitedPubl kube_pod_container_status_terminated Indicates that the publishing pod is terminated.
sysClusterNodeExitedProc kube_pod_container_status_terminated Indicates that the processing pod is terminated.
sysClusterNodeServiceUp up Indicates that all engine pods (processing, publishing, and checkpointing) are up.
sysClusterNodeServiceDown up Indicates that all engine pods (processing, publishing, and checkpointing) are down.
sysTraNodeServiceUpAlert up Indicates that all Traffic Routing Agent (TRA) pods (tra-ag, tralb) are up.
sysTraNodeServiceDownAlert up Indicates that all TRA pods (tra-ag, tralb) are down.
sysClusterPeerActiveError sysPeerClusterClusterState Indicates that the peer is also in the active state.
sysClusterPeerConnected sysPeerClusterClusterState Indicates that the peer is in the standby state.
sysClusterPeerDisconnected sysPeerClusterClusterState Indicates that the peer cluster is unavailable.
sysProcessingErrorAlert sysProcessingErrors Indicates that processing errors have crossed a threshold.
sysMemoryAvailableThresholdCrossingAlert sysMemoryAvailableThresholdMb Indicates that available memory is less than a threshold.
txnDatabaseMemoryUsedThresholdCrossingAlert txnDatabaseMemoryUsedKb, txnDatabaseMemoryFreeKb Indicates that the txnDatabase memory usage percentage has crossed a threshold.
txnGtcOutOfSyncAlert txnReplayCurrentGlobalTxnCounter, txnReplayLastReplayGlobalTxnCounter Indicates that the GTC value gap is more than one million (or a threshold specified for the system).
MemoryUsageAlert sysTotalMemoryPoolInUseMb, sysTotalMemoryPoolSizeMb Indicates that the percentage usage of system memory dedicated to databases and buffer pools (mtxbufs) has crossed a threshold.
EngineMemoryUsageAlert statSysInfoPhysicalMemoryFreeMb, statSysInfoPhysicalMemoryCachedMb, statSysInfoPhysicalMemoryBuffersMb, statSysInfoPhysicalMemoryTotalMb Indicates that overall engine memory usage, including TRA pods, is high.
EngineDiskUsageHighAlert statSysInfoDiskAvailablePct Indicates that disk usage has crossed a threshold value for the engine and TRA pods.
NodeHeartbeatMsgLostAlert sysClusterNodeHeartbeatMsgReceivedCount, sysClusterNodeHeartbeatMsgSentCount Indicates that the system is losing node heartbeat messages.
EngineOneStateAlert sysPeerClusterClusterState Indicates that engine 1 is not in the active or the standby state.
EngineTwoStateAlert sysPeerClusterClusterState Indicates that engine 2 is not in the active or the standby state.
SiteStatusAlert sysClusterEngineActiveDateTime Indicates that the site is down.
SecondaryEngineNotInStandbyAlert sysPeerClusterClusterState Indicates that the second engine is not in the standby state.
SystemCpuUsageAlert system_cpu_usage CPU usage alert for non-engine application pods.
TransactionThresholdAlert txnMsgCount Indicates that the number of transactions crossed a threshold over a specified duration.
ActiveMQStatusAlert org_apache_activemq_Broker_Active Indicates that the ActiveMQ pod is down.
diamConnectionStatsReceivedErrors diamConnectionStatsReceivedErrorCount Indicates that the number of received Diameter errors crossed a threshold over a specified duration.
diamConnectionStatsSentError diamConnectionStatsSentErrorCount Indicates that the number of sent Diameter errors crossed a threshold over a specified duration.
diamReceivedErrorLimit diamConnectionStatsReceivedErrorCount, diamConnectionStatsReceivedMsgCount Indicates that the percentage of received errors crossed a threshold.
diamSentErrorLimit diamConnectionStatsSentErrorCount, diamConnectionStatsSentMsgCount Indicates that the percentage of sent errors crossed a threshold.
GatewayProxyFailureAlert mtx_proxy_error_count_total, mtx_proxy_request_count_total Indicates that the Gateway Proxy error threshold has been reached.
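
Several rows in Table 1 pair an error or usage metric with a total metric and describe the alert as a percentage crossing a threshold (for example, diamReceivedErrorLimit). The sketch below illustrates that kind of check only. It assumes the percentage is the first metric divided by the second, and it uses a hypothetical threshold of 5 percent and made-up sample values; the actual rule expressions and thresholds are not given in this document.

```python
def percent(part: float, total: float) -> float:
    """Return part as a percentage of total, or 0.0 when total is not positive."""
    if total <= 0:
        return 0.0
    return 100.0 * part / total


def crosses_threshold(part: float, total: float, threshold_pct: float) -> bool:
    """Return True when the derived percentage is strictly above the threshold."""
    return percent(part, total) > threshold_pct


# Hypothetical sample values; only the metric names come from Table 1.
samples = {
    "diamConnectionStatsReceivedErrorCount": 12.0,
    "diamConnectionStatsReceivedMsgCount": 200.0,
}

# Assumption for illustration, not a documented default.
ASSUMED_THRESHOLD_PCT = 5.0

fires = crosses_threshold(
    samples["diamConnectionStatsReceivedErrorCount"],
    samples["diamConnectionStatsReceivedMsgCount"],
    ASSUMED_THRESHOLD_PCT,
)
print(fires)
```

Guarding against a zero total avoids a division error when no messages have been counted yet; whether the real rules treat that case as "no alert" is also an assumption.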

Alert Mapping to OIDs maps alert names to OIDs.

Table 2. Alert Mapping to OIDs
Alert Name OID
sysClusterNodeJoinedCkpt 1.3.6.1.4.1.35838.1.1.2.1.8
sysClusterNodeJoinedPubl 1.3.6.1.4.1.35838.1.1.2.1.8
sysClusterNodeJoinedProc 1.3.6.1.4.1.35838.1.1.2.1.8
sysClusterNodeExitedCkpt 1.3.6.1.4.1.35838.1.1.2.1.9
sysClusterNodeExitedPubl 1.3.6.1.4.1.35838.1.1.2.1.9
sysClusterNodeExitedProc 1.3.6.1.4.1.35838.1.1.2.1.9
sysClusterNodeServiceUp 1.3.6.1.4.1.35838.1.1.2.1.10
sysClusterNodeServiceDown 1.3.6.1.4.1.35838.1.1.2.1.11
sysTraNodeServiceUpAlert 1.3.6.1.4.1.35838.1.2.1.1.4.1
sysTraNodeServiceDownAlert 1.3.6.1.4.1.35838.1.2.1.1.4.1
sysClusterPeerActiveError 1.3.6.1.4.1.35838.1.1.2.1.12
sysClusterPeerConnected 1.3.6.1.4.1.35838.1.1.2.1.13
sysClusterPeerDisconnected 1.3.6.1.4.1.35838.1.1.2.1.14
sysProcessingErrorAlert 1.3.6.1.4.1.35838.1.1.2.1.7
sysMemoryAvailableThresholdCrossingAlert 1.3.6.1.4.1.35838.1.1.2.1.8
txnDatabaseMemoryUsedThresholdCrossingAlert 1.3.6.1.4.1.35838.1.1.2.5.1
txnGtcOutOfSyncAlert 1.3.6.1.4.1.35838.1.1.2.5.3
MemoryUsageAlert 1.3.6.1.4.1.35838.1.4.2.1.9
EngineMemoryUsageAlert 1.3.6.1.4.1.35838.1.4.2.1.9
EngineDiskUsageHighAlert 1.3.6.1.4.1.35838.1.4.2.1.9
NodeHeartbeatMsgLostAlert 1.3.6.1.4.1.35838.1.4.2.1.9
EngineOneStateAlert 1.3.6.1.4.1.35838.1.4.2.1.9
EngineTwoStateAlert 1.3.6.1.4.1.35838.1.4.2.1.9
SiteStatusAlert 1.3.6.1.4.1.35838.1.4.2.1.9
SecondaryEngineNotInStandbyAlert 1.3.6.1.4.1.35838.1.4.2.1.9
SystemCpuUsageAlert 1.3.6.1.4.1.35838.1.4.2.1.9
TransactionThresholdAlert 1.3.6.1.4.1.35838.1.4.2.1.9
ActiveMQStatusAlert 1.3.6.1.4.1.35838.1.4.2.1.9
diamConnectionStatsReceivedErrors 1.3.6.1.4.1.35838.1.4.2.1.9
diamConnectionStatsSentError 1.3.6.1.4.1.35838.1.4.2.1.9
diamReceivedErrorLimit 1.3.6.1.4.1.35838.1.4.2.1.9
diamSentErrorLimit 1.3.6.1.4.1.35838.1.4.2.1.9
GatewayProxyFailureAlert 1.3.6.1.4.1.35838.1.4.2.1.9
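
Table 2 maps many alert names to the same OID, so the OID alone does not identify a single alert. As a minimal sketch, the lookup below copies a subset of Table 2 into a dictionary and groups alert names by OID. The function names `oid_for` and `alerts_by_oid` are hypothetical helpers for illustration and are not part of snmp-notifier.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Subset of Table 2, copied verbatim from the document.
ALERT_OIDS: Dict[str, str] = {
    "sysClusterNodeJoinedCkpt": "1.3.6.1.4.1.35838.1.1.2.1.8",
    "sysClusterNodeJoinedPubl": "1.3.6.1.4.1.35838.1.1.2.1.8",
    "sysClusterNodeExitedCkpt": "1.3.6.1.4.1.35838.1.1.2.1.9",
    "sysClusterNodeServiceUp": "1.3.6.1.4.1.35838.1.1.2.1.10",
    "sysClusterNodeServiceDown": "1.3.6.1.4.1.35838.1.1.2.1.11",
    "MemoryUsageAlert": "1.3.6.1.4.1.35838.1.4.2.1.9",
    "SiteStatusAlert": "1.3.6.1.4.1.35838.1.4.2.1.9",
}


def oid_for(alert_name: str) -> Optional[str]:
    """Return the OID for an alert name, or None if the name is not in the subset."""
    return ALERT_OIDS.get(alert_name)


def alerts_by_oid(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Group alert names by OID, keeping the insertion order of the mapping."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name, oid in mapping.items():
        grouped[oid].append(name)
    return dict(grouped)


if __name__ == "__main__":
    print(oid_for("SiteStatusAlert"))
    for oid, names in alerts_by_oid(ALERT_OIDS).items():
        print(oid, names)
```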