Using a Generic SNMP Trap
Any process can send a generic SNMP trap, not only the standard Process Controller, Cluster Manager, SNMP agent, or TRA. The trap carries error information, with specific alerts that map back to specific error numbers. Use the sysGenericErrorMessage SNMP trap to send a system-level alert with message text in the payload.
Generic Trap Locations
Configure the trap generation period in mtx_config.xml:
<snmp_agent>
  <trap_generate_period_msec>10000</trap_generate_period_msec>
</snmp_agent>
Change this value to control how often a trap can be generated; the default is 10000 milliseconds (10 seconds). A trap of a given type is generated at most once per period.
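The once-per-period behavior can be illustrated with a minimal Python sketch. It is not the SNMP agent's actual implementation, which this document does not describe. The class name `TrapThrottle`, the method `should_generate`, and the choice to measure the period from the last generated trap of each type are all assumptions made for illustration.

```python
import time


class TrapThrottle:
    """Hypothetical helper: allow at most one trap per type per period."""

    def __init__(self, period_msec=10000, clock=time.monotonic):
        # period_msec mirrors trap_generate_period_msec (default 10000 ms).
        self.period_sec = period_msec / 1000.0
        self._clock = clock
        self._last_sent = {}  # trap type -> time the last trap was generated

    def should_generate(self, trap_type):
        """Return True if a trap of this type may be generated now."""
        now = self._clock()
        last = self._last_sent.get(trap_type)
        if last is not None and now - last < self.period_sec:
            # A trap of the same type was already generated in this period.
            return False
        self._last_sent[trap_type] = now
        return True
```

With the default period, a second call for the same trap type within 10 seconds returns False, while a call for a different trap type is unaffected.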
For more information about mtx_config.xml, see the discussion about MATRIXX configuration specification (mtx_config.xml) in MATRIXX Installation and Upgrade.
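As a rough illustration of how a monitoring script might read the configured period, the following Python sketch parses an XML document and looks for trap_generate_period_msec under snmp_agent. The `<configuration>` wrapper element, the function name `read_trap_period_msec`, and the absence of XML namespaces are assumptions; the real layout of mtx_config.xml is defined in MATRIXX Installation and Upgrade.

```python
import xml.etree.ElementTree as ET

DEFAULT_PERIOD_MSEC = 10000  # default stated in the text (10 seconds)


def read_trap_period_msec(xml_text):
    """Return trap_generate_period_msec, or the default if it is not set."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree and also checks the root element itself.
    for agent in root.iter("snmp_agent"):
        value = agent.findtext("trap_generate_period_msec")
        if value is not None and value.strip():
            return int(value.strip())
    return DEFAULT_PERIOD_MSEC


# Hypothetical document: only snmp_agent and its child come from the text.
SAMPLE = """<configuration>
  <snmp_agent>
    <trap_generate_period_msec>5000</trap_generate_period_msec>
  </snmp_agent>
</configuration>"""

print(read_trap_period_msec(SAMPLE))  # prints 5000
```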
SNMP uses the generic trap locations specified in Generic Trap Locations.
Component/Module | Task::Function | Message | Action |
---|---|---|---|
MtxChrg | AbortThreadData.onThreadTimeout | Thread linuxThreadId_ has exceeded threadQuarantineTimeoutInMillis_ ms timeout while processing message. Placing server into quarantine. | Check messages/OIDs in the log on the pod for latency issues. |
MtxChrg | AbortThreadData.onThreadTimeout | Quarantining thread linuxThreadId_ would exceed server limit of threadQuarantineLimit_ quarantined threads. Terminating server. | Check messages/OIDs in the log on the pod for latency issues and check system health. |
MtxEventLoader | EventLoaderDispatcherTask::checkIdleGtcTimeouts | Have not received a GTC in the last idleGtcErrorTimeout_.count() minutes. | Check system health. |
MtxEventLoader | EventLoaderDispatcherTask::dispatcherLoop | Failed to read Event Repository for missing GTC ranges. This can happen when a publishing pod becomes active. The Dispatcher reads the LoaderTraceCollection for any gaps to fill. | Check MongoDB. |
MtxStream | MefV2GeneratorTask::publishMefv2FilesToTarget | Could not publish event files: ::strerror(savedErrno) savedErrno and Could not publish event files. Exit status= publishCommand.getExitStatus(). | Check the publishing target. |
MtxStream | MefV2GeneratorTask::createPublishedMefList | MEFv2 event recovery. Could not execute create_published_mef_list.py on publish target: ::strerror(savedErrno) savedErrno and MEFv2 event recovery. Could not execute create_published_mef_list.py on publish target publishTargetHostName_. Error due to errString. | Requires manual MEFv2 recovery. |
MtxStream | MefV2GeneratorTask::pubTriggerCallbackHandler | Mef V2 Publisher did not make any progress for kPubMonitorTimeoutMillis milliseconds. | Check system health. |
MtxTrafficMgr | CmpLeaderNodePool::getNextSvcStateOnNodeUp | "duplicate CMP " << str << " nodes, count=" << count << FQN | Restart the previous active publishing pod. |
MtxTxn | CheckpointWriterTask::writeCheckpoint | The checkpointing server is out-of-sync with the last system snapshot. Please check for other errors to determine why. A duplicate Checkpoint was created for GTC= prevCkptGtc_. | Check system health. |
MtxTxn | TransactionManagerTask::resolvePendingTransactionIfAny | Number of retries to resolve transaction ID txnID, GTC=txnCtxP->getGlobalTxnCounter() reaches maximum value resolveTxnMaxRetries_. | Restart the pod. |
MtxTxn | TransactionManagerTask::handleSharedStorageEvent | Failed to execute nfs unmount from Standby server= myBladeId. Note: Please unmount nfs and mount shared storage manually. | Unmount NFS and mount shared storage. |
MtxTxn | TransactionManagerTask::handleSharedStorageEvent | Failed to mount the shared storage even after fsck on Active publishing server= myBladeId. Note: Please manually mount the shared storage. | Mount shared storage. |
MtxTxn | TransactionSortedLoggingTask::logWriteBufferAbrtCbHandler | TransactionSortedLoggingTask::logWriteBufferAbrtCbHandler:atPtr->getStepString(), Step: atPtr->getStepString(). Timeout: timeoutMs msec. | Restart the publishing cluster. |
MtxTxn | TransactionSortedLoggingTask::diskWriteAbrtCbHandler | TransactionSortedLoggingTask::diskWriteAbrtCbHandler: atPtr->getStepString(), Step: atPtr->getStepString(). Timeout: timeoutMs msec. | Restart the engine. |
MtxTxn | TransactionStreamTask::peerClusterHaStateUpdated | Got Publishing cluster cl::name(toClusterHaState) state, aborting transaction stream. To start transaction stream need to restart the publishing cluster. Note: This can happen during high load when LogWriteBuffer is not available. | |
MtxTxn | TransactionStreamTask::handleTxnStreamClusterStateMsg | Got HA peer engine= haPeerEcbId cl::name(clusterState) state, aborting transaction stream. To start transaction stream need to restart the engine. Note: This means the sorted transaction log writing to the local disk is slow. Verify if any non-MATRIXX processes are writing to disk. | |
MtxTxn | TransactionManagerTask::coordinatorCommit | Fatal error in committing transaction, NACK this transaction.\n Note: Only when the server is not shut down. | Start the other server, engine, or cluster before restarting this server. |
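An operations script could use the table above to turn received trap message text into a suggested action. The Python sketch below shows one possible way to do that. It covers only a subset of the rows, matches on the fixed wording of each message template, and assumes that the runtime message keeps that wording once the variables are substituted. The names `TRAP_ACTIONS` and `suggest_action` and the sample message are hypothetical.

```python
import re

# (component, regex on the fixed part of the message, action from the table).
# Only a subset of the table rows is included in this sketch.
TRAP_ACTIONS = [
    ("MtxChrg", r"has exceeded .* ms timeout while processing message",
     "Check messages/OIDs in the log on the pod for latency issues."),
    ("MtxChrg", r"would exceed server limit of .* quarantined threads",
     "Check messages/OIDs in the log on the pod for latency issues "
     "and check system health."),
    ("MtxEventLoader", r"Have not received a GTC in the last",
     "Check system health."),
    ("MtxEventLoader", r"Failed to read Event Repository",
     "Check MongoDB."),
    ("MtxStream", r"Could not publish event files",
     "Check the publishing target."),
    ("MtxTxn", r"Failed to execute nfs unmount",
     "Unmount NFS and mount shared storage."),
    ("MtxTxn", r"Failed to mount the shared storage",
     "Mount shared storage."),
]


def suggest_action(message):
    """Return (component, action) for the first matching row, else None."""
    for component, pattern, action in TRAP_ACTIONS:
        if re.search(pattern, message):
            return component, action
    return None


# Hypothetical message text with the template variables filled in.
sample = ("Thread 1234 has exceeded 5000 ms timeout while processing "
          "message. Placing server into quarantine.")
print(suggest_action(sample))
```

Returning None for unknown messages keeps the lookup conservative: anything not covered by the table is left for manual triage instead of being mapped to a guessed action.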