print_blade_stats.py
The print_blade_stats.py script displays statistics for MATRIXX Engine, in-memory databases, cluster and server states, system memory, notifications, Task Manager, Diameter, MDC Gateway, CCF, checkpoints, system monitor, and others. Checkpoint statistics include the server ID, the state of the last or in-progress checkpoint, checkpoint start and end times, the type of checkpoint (ad hoc, fast restart), the related Global Transaction Counter (GTC), and the checkpoint name and path. For a description of all SNMP statistics, see the discussion about all MATRIXX SNMP statistics in MATRIXX Monitoring and Logging.
How to Use this Script
Use this script to collect debugging information for specific MATRIXX components that you think might have an issue. MATRIXX provides an automated mechanism that runs the capture_diagstats_tofile.py script on a configurable schedule to capture the same statistics that print_blade_stats.py does. This automated mechanism uses fewer system resources and is configured to run more efficiently, and the output is better suited to coordinating with other diagnostic tools.
Syntax
print_blade_stats.py [ -h | -a | -A | -B | -C | -E | -d | -D | -F | -G | -H | -I | -J | -K | -l | -L | -M | -N | -O | -P | --peer_manager | -Q | -r seconds | -R | --rca | --route_cache_agent | -S | --stream | -T | -U | -v | -V | -W | -X | -Y | -Z ]
An asterisk (*) character next to a pricing database indicates that the database is active.
Options
The print_blade_stats.py script has the following options in addition to the general server command line options.
When run without any of the following options, the print_blade_stats.py script prints all statistics for the server. It can be run with a subset of the following options to print specific statistics.
For descriptions of the server command line options, see the discussion about command line options for server scripts in MATRIXX Administration.
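Running the script with a subset of options can be wrapped in a small helper when the same combination is collected repeatedly. The following is a minimal sketch, not part of the MATRIXX tooling: the helper name build_command and the KNOWN_FLAGS set are hypothetical, and the flags are copied from the Syntax line above (-h and -v are left out, and -r is handled separately because it takes a value). It only builds the argument list and does not run the script.

```python
import shlex

# Flags copied from the Syntax line of this page (assumption: the list is complete).
KNOWN_FLAGS = {
    "-a", "-A", "-B", "-C", "-E", "-d", "-D", "-F", "-G", "-H", "-I", "-J",
    "-K", "-l", "-L", "-M", "-N", "-O", "-P", "-Q", "-R", "-S", "-T", "-U",
    "-V", "-W", "-X", "-Y", "-Z",
    "--peer_manager", "--rca", "--route_cache_agent", "--stream",
}


def build_command(flags=(), repeat_seconds=None, script="print_blade_stats.py"):
    """Return an argv list for the script (hypothetical helper)."""
    unknown = [flag for flag in flags if flag not in KNOWN_FLAGS]
    if unknown:
        raise ValueError(f"flags not in the Syntax line: {unknown}")
    argv = [script, *flags]
    if repeat_seconds is not None:
        if repeat_seconds <= 0:
            raise ValueError("repeat_seconds must be positive")
        # -r takes the repeat interval in seconds as its value.
        argv += ["-r", str(repeat_seconds)]
    return argv


# Diameter and queue statistics, reprinted every 10 seconds.
print(shlex.join(build_command(["-D", "-Q"], repeat_seconds=10)))
```

With no flags, the helper returns only the script name, which corresponds to printing all statistics for the server.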
- -h, --help
- Prints help information for this script.
- -a
- Print AbsoluteTimer service statistics for various system services and tasks. Statistics for each system task include the number of times AbsoluteTimer passed, the last delay time in microseconds and the related timestamp, and the maximum delay time in microseconds and the related timestamp. These statistics can help you monitor processing threads and whether system tasks run on time or if there might be a pattern of system latency.
- -A, --map_call_out
- Print Mobile Application Part (MAP) statistics for MAP-ATI and MAP-SRI call-out requests. The statistics include the number of requests made and successful responses returned, the number of timeouts, and the number of notifications for failed messages.
- -B, --database
- Print database segment, memory, object, index, timer index, and OID index statistics. If a database has compression enabled, this option also prints compression statistics for the database. The compression ratio is equal to the uncompressed size of an object divided by the compressed size. For example, if the wallet object compression ratio is 1.5, it means if the database was not compressed, the wallet would be 1.5 times larger than the compressed size.
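The compression ratio arithmetic described above can be sketched as follows. The function name and the byte counts are illustrative assumptions; only the formula (uncompressed size divided by compressed size) comes from the description.

```python
def compression_ratio(uncompressed_bytes, compressed_bytes):
    """Uncompressed size divided by compressed size."""
    if compressed_bytes <= 0:
        raise ValueError("compressed_bytes must be positive")
    return uncompressed_bytes / compressed_bytes


# Illustrative sizes: a wallet object that takes 200 bytes compressed
# and would take 300 bytes uncompressed.
ratio = compression_ratio(300, 200)
print(ratio)  # 1.5

# Going the other way: estimated uncompressed size from the reported ratio.
print(200 * ratio)  # 300.0
```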
- -C, --cluster_stats
- Print cluster-level information for the local cluster and peer clusters. Local cluster information includes the node ID, service role (processing, publishing, or checkpointing), node state, and IP address. Peer and processing cluster information includes the engine ID, cluster ID, cluster state, system schema version, and fully qualified cluster ID (engine:cluster). If this script is run from the processing cluster leader, it also identifies the cluster leaderID. If this script is run from a processing server that is not the cluster leader, the statistics header is returned with a note to indicate that the statistics are only available from the processing server that is the cluster leader. The cluster peer is the cluster that the local cluster is receiving transactions from and replaying. If the cluster does not have an HA peer or is not supporting another cluster (it is the active cluster), the fully qualified ID is 0:0.
- -d, --debug
- Debug flag. If this option is specified, extra messages are printed to help with debugging this script. By default, the script does not run in debug mode.
- -D, --diameter
- Print Diameter SNMP PDU table statistics, Diameter Gateway error result statistics, and latency and connection statistics. Diameter Gateway error result statistics include the total requests received, total responses sent, average response time, and maximum response time for each Diameter application and command-code combination.
Other information provided includes:
- Malformed Requests — The total number of malformed requests, for example, receiving a non-Diameter packet.
- Permanent Failures — The total number of permanent failures, for example, any 5xxx Result-Code (per RFC-6733). This total is not incremented.
- Protocol Errors — The total number of protocol errors, for example, any 3xxx Result-Code (per RFC-6733). This total is not incremented.
- Transient Failures — The total number of transient failures, for example, any 4xxx Result-Code (per RFC-6733).
- Transport Down — The total number of transport down errors. This total is not incremented.
- Unknown Types — The total number of unknown types errors. This total is incremented when a packet is received that is not mapped to a MATRIXX Data Container (MDC) in the diameter_dictionary.xml file.
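The Result-Code families named in the list above follow the thousands digit of the code, as defined in RFC 6733. The sketch below shows one way to bucket codes when reading the output or logs alongside it; the function name and category labels are hypothetical and are not how print_blade_stats.py itself computes its totals.

```python
from collections import Counter

# Thousands digit of the Result-Code mapped to its RFC 6733 class.
RESULT_CODE_FAMILIES = {
    1: "informational",
    2: "success",
    3: "protocol_error",
    4: "transient_failure",
    5: "permanent_failure",
}


def classify_result_code(code):
    """Return the RFC 6733 class for a Diameter Result-Code."""
    return RESULT_CODE_FAMILIES.get(code // 1000, "unknown")


# 2001 DIAMETER_SUCCESS, 3002 DIAMETER_UNABLE_TO_DELIVER,
# 4001 DIAMETER_AUTHENTICATION_REJECTED, 5012 DIAMETER_UNABLE_TO_COMPLY
codes = [2001, 3002, 4001, 5012, 5012]
tally = Counter(classify_result_code(code) for code in codes)
print(tally["permanent_failure"])  # 2
```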
Diameter Gateway latency statistics are recorded for latency buckets (which are time segments), maximum message latency per connection, and Diameter Gateway-related tasks. For each task, the statistics include the total and average latencies. Connection statistics include the number of bytes sent and received, number of messages sent and received, and number of errors that occurred.
When printing the Diameter statistics, print_blade_stats.py uses a hard-coded dictionary to get Diameter PDU statistics. The description refers to the IANA Diameter assignments for the Application ID and Command Codes. If a match is found, it is used. If a match is not found, the script looks for a match in the diameter_dictionary.xml file. You can add Application IDs or commands in the diameter_dictionary.xml file. print_blade_stats.py uses the Application ID and the command value as a key into the dictionary. The hard-coded dictionary has the following values:
'0:257': 'common:CE', '0:258': 'common:RA', '0:274': 'common:AS', '0:275': 'common:ST', '0:280': 'common:DW', '0:282': 'common:DP', '1:265': 'nasreq:AA', '3:271': 'accounting:AC', '4:258': 'credit-control:RA', '4:272': 'credit-control:CC', '16777217:306': 'Sh:UD', '16777217:307': 'Sh:PU', '16777217:308': 'Sh:SN', '16777217:309': 'Sh:PN', '16777236:258': 'Rx:RA', '16777236:265': 'Rx:AA', '16777236:274': 'Rx:AS', '16777236:275': 'Rx:ST', '16777238:258': 'Gx:RA', '16777238:272': 'Gx:CC', '16777302:8388635': 'Sy:SL', '16777302:8388636': 'Sy:SN', '16777302:275': 'Sy:ST', '33686018:430': 'private:mdc',
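The lookup order described above (hard-coded dictionary first, then diameter_dictionary.xml) can be sketched as follows. This is an illustration under assumptions, not the script's actual code: HARD_CODED holds only a few of the values listed above, xml_entries is a plain dict standing in for entries parsed from diameter_dictionary.xml (whose format is not described here), and returning the raw key when nothing matches is an assumed fallback.

```python
# A subset of the hard-coded values listed above.
HARD_CODED = {
    "0:257": "common:CE",
    "0:280": "common:DW",
    "4:272": "credit-control:CC",
    "16777238:272": "Gx:CC",
}


def describe_pdu(app_id, command_code, xml_entries=None):
    """Resolve a description for an Application ID and command code."""
    # The Application ID and command value together form the key.
    key = f"{app_id}:{command_code}"
    if key in HARD_CODED:
        return HARD_CODED[key]
    # Stand-in for the lookup in diameter_dictionary.xml.
    if xml_entries and key in xml_entries:
        return xml_entries[key]
    # Assumed fallback: show the raw key.
    return key


print(describe_pdu(4, 272))  # credit-control:CC
# Hypothetical custom entry, as if added through diameter_dictionary.xml.
print(describe_pdu(16777251, 316, {"16777251:316": "S6a:UL"}))  # S6a:UL
```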
- -E, --event_loader
- Print Event Loader statistics. The statistics include the number of database errors that were logged after the Event Loader started, the number of MATRIXX Event Files (MEFs) in the backlog that are ready to be loaded (but are not yet loaded) into the Event Repository, the number of MEFs loaded (Mef Loaded), number of MEFs rejected (Mef Rejected), number of events loaded (Events Loaded), the latest event time from the last loaded MEF (Last Event Time), and GTC statistics, including:
- Max Available GTC — The highest GTC available for reading and loading.
- Last Processed GTC — The GTC of the work order that was most recently processed.
- Last Loaded GTC — The GTC of the work order that was most recently loaded to the Event Repository.
Note: This option does not run on an active processing server.
- -F, --signalling
- Print Signaling Network statistics. The statistics include the signaling link name, state of the link, received rate limit and number of delivery errors that were logged, and number of messages sent and received.
- -G, --charging
- Print the Charging Server statistics, including average, minimum, and maximum latencies when processing messages, average number of transactions processed per second, number of duplicate messages encountered, number of transactions rejected due to collisions, and number of transactions retried. This information also includes message retry information, such as minimum, maximum, and average wait times and the message count for a given retry count.
- -H, --call_start
- Print callback call start statistics for the number of successful and failed callback call start attempts.
- -I, --ussd_call_back
- Print USSD callback statistics for the number of successful and failed callback requests.
- -J, --tcap
- Print TCAP (Transaction Capabilities Application Part) statistics, including the number of TCAP protocol messages sent and received, number of messages not sent due to an error, and number of messages rejected.
- -K, --task_manager
- Print Task Manager statistics for managing the schedule database, including notifications, recurring processing, event cleanup, and session cleanup.
- -l, --ldap
- Print LDAP (Lightweight Directory Access Protocol) Gateway statistics for the number of successful LDAP requests and responses. Statistics include the following:
- -L, --camel_gateway
- Print CAMEL (Customized Applications for Mobile network Enhanced Logic) Gateway statistics for the number of charging sessions started and ended.
- -M, --sms_charging
- Print CAP3 (CAMEL Application Part 3) SMS statistics, such as the number of valid and invalid SMS operations, SMS messages for which charging was applied immediately, SMS messages for which a reservation was made, and rejected and failed SMS operations.
- -N, --notifications
- Print notification processing statistics, such as the number of unique notification messages sent, acknowledgments received, and failures due to maximum retry timeouts, address failures, and socket failures.
- -O, --tsan
- Print TSAN (Temporary Subscriber Access Number) statistics for the CAP1 re-origination service, such as the number of TSAN requests, timeouts, successful releases, and bad messages.
- -P, --pools
- Print memory pool and shared buffer pool (large buffer and huge buffer) statistics for each database.
- --peer_manager
- The MATRIXX Peer Manager is a networking infrastructure function for managing TCP connectivity between MATRIXX services. Each running instance reports its current peers' connectivity state along with server side information. This option prints MATRIXX Peer Manager statistics, including the debug name, server address, and local peer ID. For each peer, it prints the debug name, peer ID, address, state, time when connected, and time when disconnected.
- -Q, --queues
- Print queue statistics for Charging Server, Transaction Server, Diameter Gateway, and MDC Gateway. The statistics include the queue sizes, maximum reached size, number of times the queues were full or empty, and information about the number of messages read in each queue.
- -r seconds, --repeat_seconds seconds
-
Reprint the statistics every specified number of seconds.
- -R, --replay
- Print the following transaction replay statistics.
These statistics are only meaningful when run on the active cluster. After the active cluster is started, the Destination Cluster ID column in the output shows its own processing cluster ID and has a nonzero Checkpoint Replay File Count. After the database restore completes, this column is not displayed.
Note: The Current Replay Batch Count and Current Replay Txn Count statistics are only useful during MATRIXX Engine start-up. These statistics indicate how many files or objects must still be replayed to start the engine. When you are performing a cold restart on an active engine, these statistics list the number of files that must be replayed before the engine is available to process transactions. When you are starting a standby engine, these statistics list the number of objects that must be replayed before the engine is available to process transactions.
- For real-time replay on a standby cluster:
- The GTC that the server is processing.
- The GTC that is being replayed.
- For synchronization of a standby cluster when it starts:
- The number of outstanding transaction batches to replay. This value must always be equal to or less than the number of processing servers if the engines are in perfect synchronization.
- The number of outstanding transactions to replay.
- The number of outstanding database objects that must be replayed on a standby cluster to get its databases up-to-date. Every database has a number of objects to replay. After all objects in one database are sent, a new count of another database's objects starts until all databases have completed synchronization. The new count adds to the existing object count that has not finished replay (the object count does not go to zero before replaying objects from the next database). The Database Replay Object Count is a nonzero value until the standby cluster finishes the database initialization process. The Checkpoint Replay File Count is not used when a standby cluster is starting.
- The GTC that the server is processing.
- The GTC that is being replayed.
- For a cold start-up of an engine (either after a complete system failure or start-up of a standalone system), prints the number of outstanding checkpoint files and transaction log files to replay when the engine restores its databases from a checkpoint. The Checkpoint Replay File Count is nonzero only when the cluster starts and restores from a checkpoint. After the database is restored, the value is always 0. This file count depends on the number of real-time replay batches that are outstanding at that moment, including those being replayed and those queued for replay on the publishing server. The Checkpoint Replay File Count value is always zero when the second engine is starting.
To support two standby clusters, the SNMP Object ID (OID) for monitoring real-time replay statistics to a standby cluster is "txnReplayCurrentTransactionBatchCount.engineId.clusterId", where engineId and clusterId are the engine ID and cluster ID of the standby cluster to watch.
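The OID naming pattern above can be filled in per standby cluster, for example when preparing a list of objects to watch. This is a minimal sketch: the function name is hypothetical, the engine and cluster IDs are illustrative, and the result is the symbolic name from the pattern, not a numeric OID.

```python
def replay_batch_count_oid(engine_id, cluster_id):
    """Build the symbolic OID name for one standby cluster."""
    return f"txnReplayCurrentTransactionBatchCount.{engine_id}.{cluster_id}"


# Illustrative standby clusters as (engine ID, cluster ID) pairs.
standby_clusters = [(2, 1), (3, 1)]
for engine_id, cluster_id in standby_clusters:
    print(replay_batch_count_oid(engine_id, cluster_id))
# txnReplayCurrentTransactionBatchCount.2.1
# txnReplayCurrentTransactionBatchCount.3.1
```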
- --rca, --route_cache_agent
- Print Route Cache Agent statistics, including the debug name, server address version, MPM name, and diagnostic counters.
- -S, --services
- Print MATRIXX service statistics, such as the service process ID, number of errors, memory usage, and CPU usage.
- --stream
- When event streaming is enabled on the engine, use this option to print the internal stream statistics of the Event Stream Server, including GTC Sorter, SEF Writer, Stream Publisher, and MEF Publisher.
- GTC Sorter Statistics:
- Low GTC — The transaction with the lowest GTC that the sorter is waiting to receive to process.
- High GTC — The transaction with the highest GTC the sorter has received and processed.
- Current Count — The number of transactions in the sorter.
When the current count is zero (0), the Low GTC column has no meaning.
- Max Count — The highest number of transactions that has ever been in the sorter at one time.
- Max Size — The maximum number of transactions that the sorter can hold. If this number is exceeded, the sorter does not process.
- SEF Writer Statistics:
- Last Processed GTC — The GTC of the work order most recently processed.
- Last Written GTC — The GTC of the work order most recently written to a Streamed Event File (SEF) or MEF.
- Stream Publisher Statistics:
- Max Available GTC — The highest GTC available for reading, writing, or publishing.
- Client Connections — The total number of current client connections.
- Buffer Count — The number of configured memory buffers. Buffers send and receive requests and responses, and they are shared across all connections.
- Free Buffer Count — The number of remaining buffers that are still available.
- Connection Statistics:
- Session ID — The session ID being read when this utility was run.
- Role — The Event Streaming Framework HA role (Leader/Non-leader).
- Cursor — The cursor being processed.
- Last time — The time of last event transmission by Event Stream Server to Event Streaming Framework.
- Last Count — The number of events included in the last transmission to Event Streaming Framework.
- Total Count — The total number of events sent by a specific stream.
- ReqEvents — The maximum number of events requested by Event Streaming Framework.
- ReqBytes — The maximum number of bytes requested by Event Streaming Framework.
- Filter — A string representing the stream event filter, for example, CancelEvent or ChargeEvent.
- MEF Publisher Statistics:
- Max Available GTC — The highest GTC available for reading, writing, or publishing.
- Last Processed GTC — The GTC of the work order most recently processed.
- Last Written GTC — The GTC of the work order most recently written to a SEF or MEF.
- Last To Be Published GTC — The GTC most recently transferred to a local directory from where the records can be published to a remote target.
- Last Published GTC — The GTC of the work order most recently published to the destination.
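When reading the GTC Sorter columns, two rules from the descriptions above matter most: Low GTC has no meaning when Current Count is zero, and the sorter does not process once Max Size is exceeded. The sketch below applies both to one row of values. The SorterStats class, its field names, and the sample numbers are hypothetical; the script's actual output format is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SorterStats:
    """One row of GTC Sorter values (hypothetical field names)."""
    low_gtc: int
    high_gtc: int
    current_count: int
    max_count: int
    max_size: int


def effective_low_gtc(stats: SorterStats) -> Optional[int]:
    """Low GTC is only meaningful while the sorter holds transactions."""
    return stats.low_gtc if stats.current_count > 0 else None


def fill_fraction(stats: SorterStats) -> float:
    """Share of the sorter capacity (Max Size) currently in use."""
    return stats.current_count / stats.max_size


idle = SorterStats(low_gtc=1200, high_gtc=1200, current_count=0,
                   max_count=40, max_size=1000)
busy = SorterStats(low_gtc=1500, high_gtc=1750, current_count=250,
                   max_count=250, max_size=1000)

print(effective_low_gtc(idle))  # None
print(effective_low_gtc(busy))  # 1500
print(fill_fraction(busy))      # 0.25
```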
- --system_monitor
- Displays the System Monitor current node usage level, node service state (ok or oos (out of service)), monitored objects, and the last transition time. For details about System Monitor, see the discussion about MATRIXX System Monitor in MATRIXX Architecture.
- -T, --txn
- Print transaction number and GTC statistics in different tables. These statistics must be monitored.
- -U, --ussd_call_out
- Print Unstructured Supplementary Service Data (USSD) statistics for MAP-USSD Notify call-out requests to send USSD notifications. The statistics include the number of requests made and successful responses returned, number of timeouts, and number of notifications for failed messages.
- -v, --print_version
- Print the RPM version number when printing other statistics. The default value is True. To omit the version number from the output, run the script with any options and specify -v 0.
- -V, --voice_charging
- Print Voice Charging statistics, including the number of valid and invalid IDPs received, number of ApplyCharging messages sent and ApplyChargingReport messages received, number of calls for which quota was granted, number of free calls, number of rejected calls, and the number of disconnected, busy, or abandoned calls. Voice Charging statistics also include announcement and VXML script statistics such as the number of announcements or scripts attempted, number of announcements or scripts completed, and number of failed announcements.
- -W, --mdc_gateway
- Print MDC Gateway statistics, including latency statistics and connection statistics. Latency statistics are recorded for latency buckets (which are time segments), maximum message latency per connection, and MDC Gateway-related tasks, such as Transaction Manager prepare and commit tasks. For each task, the statistics include the total and average latencies. Connection statistics include the number of bytes sent and received, number of messages sent and received, and number of errors.
- -X, --ussd_in
- Print USSD incoming service statistics for MAP Process-UnstructuredSS-Request messages. The statistics include the number of requests made and notifications sent.
- -Y, --system
- Print system-level information, including the logical server ID, monitoring interval, number of processing errors, average response time to the network, total amount of system memory allocated for MATRIXX databases and work buffers, amount of memory in use, and cluster heartbeat information. Heartbeat information includes the number of heartbeats sent, received, and missed by the server.
- -Z, --disk_usage
- Print disk usage statistics. This option directs print_blade_stats.py to display disk usage statistics for the SSD and SAN storage that it gets from the global.storage_layout.local_directory and global.storage_layout.shared_directory entries, respectively, defined in the mtx_config.xml file. If the output result is an empty table, those entries are missing or configured incorrectly.