Configuring SNMP Alarms for Log Files

When configuring SNMP alarms you must specify which log files to monitor, what patterns to look for, and what types of notifications to send when an event matches a log entry.

Table 1 (Log File Monitoring) shows the properties you can set to determine which log files in the configuration are monitored.

Table 1. Log File Monitoring
snmp-adapter.alarms.logFiles[0].fileName
    The full path and name of the file to monitor. (No default.)
snmp-adapter.alarms.logFiles[0].pollingFrequency
    The interval between reads of the file. Suffix the value with s to indicate seconds, m for minutes, or ms for milliseconds. Default: 5s
snmp-adapter.alarms.logFiles[0].startFromEndOfFile
    If the file already exists with content and this value is true, the SNMP adapter starts reading from the end of the file and ignores all previously written content. If this value is false, the file is always read from the beginning. Default: false

The following shows a sample alarm log file configuration in the YAML configuration file.

snmp-adapter:
  alarms:
    logFiles:
      - fileName: /var/log/mtx/mylog.log
        pollingFrequency: 5s

Notice that the properties include an array-style index ([0]). This allows multiple file monitoring configurations to be added.
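For example, a second file can be monitored by adding an entry at the next index. The file paths and settings below are illustrative, not part of a shipped configuration:

```yaml
snmp-adapter:
  alarms:
    logFiles:
      # First monitored file (index 0)
      - fileName: /var/log/mtx/mylog.log
        pollingFrequency: 5s
      # Second monitored file (index 1); path and values are illustrative
      - fileName: /var/log/mtx/another.log
        pollingFrequency: 10s
        startFromEndOfFile: true
```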

Matching Rules

The matching rules define regular expressions that are compared against each line of the monitored files. If a match is found, an SNMP notification is sent to the SNMP NMS. The matching rules determine which parameters are sent and how they are formed. Table 2 (Matching Rules) shows the properties you can set.

Table 2. Matching Rules
snmp-adapter.alarms.matchingRules[0].pattern
    The regular expression used to identify whether a line in the log file is a match. The regular expression can define groups that are extracted for use in the variables sent in the SNMP notification (trap). (No default.)
snmp-adapter.alarms.matchingRules[0].trapOid
    The object identifier of the SNMP notification. (No default.)
snmp-adapter.alarms.matchingRules[0].order
    A number representing the order in which the matching rules are processed. The lower the number, the earlier the rule is evaluated. The SNMP adapter sends an SNMP notification only for the first match. Default: 100
snmp-adapter.alarms.matchingRules[0].variables[0].oid
    The object identifier of the variable being sent in the SNMP notification. (No default.)
snmp-adapter.alarms.matchingRules[0].variables[0].type
    The data type of this variable. String and integer are supported. Default: string
snmp-adapter.alarms.matchingRules[0].variables[0].template
    The template used to construct the value for this variable. The template can reference the groups captured by the regular expression by using the group index prefixed with the $ symbol. Default: $1

The following shows sample matching rules in the YAML configuration file.

snmp-adapter:
  alarms:
    matchingRules:

      # Specific Error Trap
      - pattern: '^\[ERROR\s*\]\s+([0-9]{4}-[0-9]{2}-[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{0,6})\s+\[.*\]\s+.*\s-\sA very specific situation has arisen because (.*)'
        trapOid: '1.3.6.1.4.1.35838.1.999.1.1'
        variables:

          # Extract the timestamp out of the log message which is captured as the first group of the regular expression
          - oid: '1.3.6.1.4.1.35838.1.999.2.1'
            type: 'string'
            template: '$1'

          # Extract the description out of the log message which is captured as the second group of the regular expression
          - oid: '1.3.6.1.4.1.35838.1.999.2.2'
            type: 'string'
            template: 'Error Message was: $2'

      # Generic Catch All Error Trap (ordered to be last)
      - pattern: '^\[ERROR\s*\]\s+([0-9]{4}-[0-9]{2}-[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{0,6})\s+\[.*\]\s+.*\s-\s(.*)'
        trapOid: '1.3.6.1.4.1.35838.1.999.1.1'
        order: 999
        variables:

          # Extract the timestamp out of the log message which is captured as the first group of the regular expression
          - oid: '1.3.6.1.4.1.35838.1.999.2.1'
            type: 'string'
            template: '$1'

          # Extract the description out of the log message which is captured as the second group of the regular expression
          - oid: '1.3.6.1.4.1.35838.1.999.2.2'
            type: 'string'
            template: '$2'

Notice that the properties include an array-style index at two points. This allows multiple matching rules to be added, with each matching rule defining multiple variables.
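To illustrate how the capture groups feed the variable templates, the following Python sketch applies the first rule's pattern to a hypothetical log line (the log line itself is an assumption about the log format, not taken from a real system):

```python
import re

# The first matching rule's pattern, copied from the YAML sample above.
pattern = (r'^\[ERROR\s*\]\s+'
           r'([0-9]{4}-[0-9]{2}-[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{0,6})'
           r'\s+\[.*\]\s+.*\s-\s'
           r'A very specific situation has arisen because (.*)')

# A hypothetical log line in the format the pattern expects.
line = ('[ERROR  ] 2024-01-15 12:34:56.123456 [worker-1] '
        'com.example.Service - A very specific situation has arisen '
        'because the disk is full')

m = re.match(pattern, line)
# Group 1 supplies the value for template '$1' (the timestamp variable);
# group 2 fills the template 'Error Message was: $2' (the description).
timestamp = m.group(1)
description = 'Error Message was: ' + m.group(2)
print(timestamp)     # → 2024-01-15 12:34:56.123456
print(description)   # → Error Message was: the disk is full
```

These two strings are what the SNMP adapter would place in the notification variables with OIDs ...2.1 and ...2.2 in the sample above.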

Configuring SNMP Alarms for Queues and Buffers

You can configure alarm thresholds on all key buffers and queues, including MtxBuf, write buffers, the CAMEL Gateway input queue, and the queue from the CAMEL Gateway. Threshold breaches are reported through SNMP and to Prometheus. This feature is enabled by configuration: an event monitoring system such as Prometheus can be configured to raise alarms based on the value of SNMP statistics.

Table 3 (Queues) lists the queues and their descriptions.

Table 3. Queues
Service Queue Description
camel Analyzer:analyzer_task The second queue in the CAMEL Gateway.
camel Logging:camel_logging_task The third queue in the CAMEL Gateway.
camel Sccp:from_sccp_task The first queue into the CAMEL Gateway from network_enabler using the network.
camel UnitData:unitdata_sender_task A queue in the CAMEL Gateway. Messages are queued here before being sent to network_enabler.
camel blade_1_1_1_chrg.1.5.output The queue from the CAMEL Gateway into the Charging Server.
camel blade_1_1_1_sysctrl.1.3.output Inbound queue (shared memory) to CAMEL Gateway stats_task, which handles service state and cluster control messages. Messages typically sent from the Cluster Manager.
charging CHRG-Replay:replay_task Inbound queue to replay_task of Charging Server, which throttles replay messages.
charging CHRG-Upgrade:upgrade_task The priority queue into the Charging Server. Messages requiring database retry are queued here.
charging blade_1_1_1_chrg.1.1.input The queue from the MDC Gateway to the Charging Server.
charging blade_1_1_1_chrg.1.2.input The queue from Diameter Gateway to the Charging Server.
charging blade_1_1_1_chrg.1.3.input The queue from subscriber_db_scan_task to the Charging Server.
charging blade_1_1_1_chrg.1.4.input The queue from task_manager_handler_task back into the top of the Charging Server.
charging blade_1_1_1_chrg.1.5.input The queue from the CAMEL Gateway to the Charging Server.
charging blade_1_1_1_chrg.1.6.input The queue from the Cluster Manager to the Charging Server.
charging blade_1_1_1_chrg.1.7.input The queue from replay_task to the top of the Charging Server.
charging blade_1_1_1_chrg.1.priority.taskMgrHandlerQueue The queue from response_task to task_manager_handler_task.
charging blade_1_1_1_chrg.1.responseQueue The queue from the Transaction Server to the Charging Server.
charging blade_1_1_1_chrg.1.retryQueue The queue into retry_task.
charging blade_1_1_1_chrg.1.taskMgrHandlerQueue The queue from the Task Manager to the Charging Server.
diameter Analyzer:analyzer_task The second queue in the Diameter Gateway.
diameter Diameter-Worker:from_diameter_task The first queue in Diameter Gateway, after incoming DIAMETER messages are decoded.
diameter DiameterIO:diameter_sender_task:01 A queue in Diameter Gateway. Messages are queued here before being sent to network_enabler.
diameter Logging:diameter_logging_task The third queue in Diameter Gateway.
diameter blade_1_1_1_chrg.1.2.output The queue from Diameter Gateway into the Charging Server.
diameter blade_1_1_1_sysctrl.1.2.output Inbound queue (shared memory) to diameter stats_task, which handles service state and cluster control messages. Messages typically sent from the Cluster Manager.
mdc Network-Worker:from_network_task Inbound queue (local) to from_network_task of the MDC Gateway.
mdc NetworkIO:network_sender_task:01 Inbound queue (local) to network_sender_task, each instance serializes to configured endpoint. See configuration file for task number to destination details.
mdc NetworkIO:network_sender_task:02 Inbound queue (local) to network_sender_task, each instance serializes to configured endpoint. See configuration file for task number to destination details.
mdc NetworkIO:network_sender_task:03 Inbound queue (local) to network_sender_task, each instance serializes to configured endpoint. See configuration file for task number to destination details.
mdc blade_1_1_1_chrg.1.1.output Inbound queue (shared memory) for to_network_task. This is the main queue that feeds outbound MDCs from the Charging Server.
mdc blade_1_1_1_gw.1.input Inbound queue (shared memory) to the MDC Gateway stats task, which handles services state and cluster control messages. Messages typically sent from Cluster Manager.
task blade_1_1_1_TaskMgrResponse.1.input The queue from task_manager_handler_task in Charging Server to response_task in the Task Manager.
task blade_1_1_1_chrg.1.3.output The queue from Charging Server to response_task in the Task Manager.
task blade_1_1_1_task.1.input The queue to the Task Manager.
transaction blade_1_1_1_txn.1.1.input The queue from Charging Server to the Transaction Server.
transaction blade_1_1_1_txn.1.4.input The queue from checkpoint_manager_task to transaction_manager_task in the Transaction Server.
transaction blade_1_1_1_txn.1.5.input The queue from transaction_log_reader_task to transaction_manager_task in the Transaction Server.
transaction blade_1_1_1_txn.1.6.input The queue from p2p_sender_receiver_task to transaction_manager_task in the Transaction Server.
transaction blade_1_1_1_txn.1.controlChannelQueue The queue to control_channel_task in the Transaction Server.
transaction blade_1_1_1_txn.1.indexOrganizerQueue The queue from index_organizer_driver_task to index_organizer_task in the Transaction Server.
transaction blade_1_1_1_txn.1.priority.input The priority queue to the Transaction Server.
Note: All queues prefixed above by blade_1_1_1 use the syntax blade_<engine ID>_<cluster ID>_<blade ID>.
All SNMP statistic OIDs for service queue statistics take the form MATRIXX-COMMON-MIB::sysQueueStats<Stat Name>.<Service Name>.<Queue ID>. For example:
MATRIXX-COMMON-MIB::sysQueueStatsQueueName.charging.7 = blade_1_1_1_chrg.1.5.input

This means that the queue with service name charging and queue ID 7 has the name blade_1_1_1_chrg.1.5.input, which is the queue from the CAMEL Gateway to the Charging Server.

To find the number of times a queue has become full, see MATRIXX-COMMON-MIB::sysQueueStatsFullCount.<Service Name>.<Queue ID>. For example:
MATRIXX-COMMON-MIB::sysQueueStatsFullCount.charging.7 = 57
For each queue, the system monitor can be configured to watch the full count statistic and log an alarm if this count increases.
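As a sketch of such monitoring, assuming the SNMP statistics are scraped into Prometheus (for example via snmp_exporter) and exposed as a metric named sysQueueStatsFullCount with service and queue labels (all metric and label names here are illustrative assumptions, not a documented integration), a Prometheus alerting rule might look like this:

```yaml
groups:
  - name: matrixx-queue-alarms
    rules:
      # Fire when the full count for any charging queue increased in the
      # last 5 minutes; metric and label names are assumptions.
      - alert: ChargingQueueFull
        expr: increase(sysQueueStatsFullCount{service="charging"}[5m]) > 0
        labels:
          severity: warning
        annotations:
          summary: 'Queue {{ $labels.queue }} reported full events'
```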
The SNMP statistics for buffers are all prefixed with MATRIXX-MIB::sysBufferPoolStats. The name of each buffer pool is given by MATRIXX-MIB::sysBufferPoolStatsPoolName.<pool ID>. For example:
MATRIXX-MIB::sysBufferPoolStatsPoolName.6 = blade_1_1_1_shared_buffer_pool_large_6_freequeue
Here, pool ID 6 corresponds to the free queue of the large shared buffer pool.

The number of buffers in the pool is given by MATRIXX-MIB::sysBufferPoolStatsCurrentCount.<pool ID>. For example, MATRIXX-MIB::sysBufferPoolStatsCurrentCount.6 = 32683 means that there are 32683 free large buffers. You can configure the system monitoring tool to log an alarm when the available large buffers are nearly exhausted, for example when the value of this statistic drops below 5000.
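A low-free-buffers alarm can be sketched the same way, under the assumption that the SNMP statistics are scraped into Prometheus and exposed as a metric named sysBufferPoolStatsCurrentCount with a pool_id label (names are illustrative, not a documented integration):

```yaml
groups:
  - name: matrixx-buffer-alarms
    rules:
      # Fire when the free buffer count for pool 6 stays below 5000 for
      # five minutes; metric and label names are assumptions.
      - alert: LargeBufferPoolLow
        expr: sysBufferPoolStatsCurrentCount{pool_id="6"} < 5000
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: 'Large buffer pool nearly exhausted ({{ $value }} free)'
```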