Managing System Memory

Because the in-memory databases resize dynamically as needed, it is difficult to determine when an engine is almost out of free memory. Several statistics can indicate memory issues, and you can take several actions to free memory.

Total Shared Memory

Check the total shared memory for a server with the print_blade_stats.py -Y command. In the following example, the maximum memory available is 21744 MB, 3705 MB is in use, and a threshold notification is sent when the available memory falls to 50 MB:

Sys Stats
---------
        Monitoring                     Response Time                    Memory Pool
          Interval   Processing  Threshold in millis     in use       Max Threshold
NodeId   (seconds)       Errors        Avg       Max       (MB)      (MB)      (MB)
===================================================================================
     1           5            0         70       500       3705     21744        50

You can change the threshold at which a notification is sent by changing the answer to this create_config.info question: SNMP:What is the system memory notification threshold in MB?.
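The relationship between the memory columns can be illustrated with a minimal Python sketch. It parses the sample data row shown above and estimates the remaining headroom. The column order is taken from the sample output only, the field and function names are hypothetical, and the calculation assumes that available memory equals the Max column minus the in use column.

```python
# Data row copied from the sample Sys Stats output in this section.
SAMPLE_ROW = "     1           5            0         70       500       3705     21744        50"

# Hypothetical field names; the column order is assumed from the sample header.
FIELDS = (
    "node_id", "interval_s", "processing_errors",
    "resp_threshold_avg_ms", "resp_threshold_max_ms",
    "in_use_mb", "max_mb", "threshold_mb",
)


def parse_sys_stats_row(row):
    """Split one whitespace-separated data row into named integer fields."""
    values = [int(v) for v in row.split()]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return dict(zip(FIELDS, values))


def memory_headroom(stats):
    """Return (available MB, True if available memory is at or below the threshold).

    Assumption: available memory = Max (MB) - in use (MB).
    """
    available_mb = stats["max_mb"] - stats["in_use_mb"]
    return available_mb, available_mb <= stats["threshold_mb"]


stats = parse_sys_stats_row(SAMPLE_ROW)
available_mb, at_threshold = memory_headroom(stats)
print(f"node {stats['node_id']}: {available_mb} MB available, "
      f"at or below threshold: {at_threshold}")
# → node 1: 18039 MB available, at or below threshold: False
```

With the sample values, 18039 MB remains available, well above the 50 MB notification threshold.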

Database Memory

Check the number of objects in the Subscriber, Activity, Balance Set, and Event databases. If many objects exist, it might mean that expired product offers, sessions, and balances have not been removed by the automatic cleanup operations. To view database statistics, use the following command:

print_blade_stats.py -B

If the number of active sessions is nearing the maximum, you can increase the maximum by changing the answer to this create_config.info question: What is the maximum number of active sessions?.

If many sessions are open, you can tune several system configuration settings that might reduce the number. See Automatic Clean Up Operations.

Automatic Clean Up Operations

The following operations occur to free up memory on all servers:

  • Garbage Collection — A configurable, automatic garbage collection process ensures that allocated memory is returned to the available free memory pool. As objects in the database are deleted, reduced in size, or moved because they increased in size, the holes left behind are tagged for cleanup. The cleanup process runs when any of the following occurs: the percentage of fragmented memory reaches a configured threshold, the number of fragmentation holes reaches a configured threshold, or a scheduled, configurable interval elapses. When garbage collection runs, the holes are consolidated into larger blocks and returned to the free memory pool. The create_config.info question that sets the garbage collection interval and batch size is: Do you want to use the default database garbage collection settings?. To verify the garbage collection settings, view the mtx_debug.log file or mtx_config.xml file.

    By default, garbage collection is triggered when the size of fragmentation reaches 11 percent of the total database segment size.

  • Session cleanup — The following Task Manager configuration parameters affect session cleanup:
    • Global:How long after the last RAR retry should the session be torn down?
    • TaskMgr:At what interval (in seconds) should the activity database be scanned for operations that require processing?
    • TaskMgr:How many activity database task messages should be sent per second per blade?
    • TaskMgr:How many Activity database task messages should be sent per second per engine?
    • TaskMgr:How many outstanding Activity database cleanup requests should be sent per blade before pausing?
    • TaskMgr:How many outstanding Activity database cleanup requests should be sent per engine before pausing?

    For information about the last RAR retry parameter value, see the discussion about global system configuration in MATRIXX Installation and Upgrade. For information about the Task Manager parameter values, see the discussion about Task Manager configuration in MATRIXX Installation and Upgrade.

  • Expired purchased offer cleanup — By default, product offers that have expired, either because they have been canceled or their validity period has ended, are removed from the owning subscription, group, or device 45 days after they expire. The create_config.info question that sets this value is: Global:How long (in seconds) should expired offers be retained before being purged from the system?.
  • Expired balance cleanup — By default, expired balances are removed from the owner's wallet 45 days after they expire. The create_config.info question that sets the value is: Global:How long (in seconds) should expired balances be retained before being purged from the system?.
  • Event cleanup — The Task Manager scans the event database to initiate the removal of old event detail records (EDRs). Several questions guide the cleanup operation:
    • TaskMgr:How long (in micros) should the event cleanup scanner pause between each scan?
    • TaskMgr:How many outstanding event cleanup messages should be sent per blade before pausing?
    • TaskMgr:How many outstanding event cleanup messages should be sent in the engine before pausing?
    • TaskMgr:What is the maximum number of events that can be deleted in a single transaction?
    • TaskMgr:How large (in bytes) should the event database grow before events are deleted?
    The last question, TaskMgr:How large (in bytes) should the event database grow before events are deleted?, sets the event cleanup threshold. During system configuration, the create_config.py script verifies that this threshold does not exceed the maximum event database size minus one extended data segment size. If the event cleanup threshold is more than the allowed value, create_config.py exits with an error similar to the following:
    Error: based on event database configuration (maxSize=65127055360 bytes), event cleanup purge threshold cannot be more than 65063454720 bytes
    In such cases, administrators must change the answer to the last question to a value that is less than or equal to the value indicated in the error message.
  • EVENT_REQUEST record cleanup — EVENT_REQUEST records are stored in the activity database in case a one-shot usage event fails, for example, when an SMS cannot be sent because of a network failure. In such cases, the associated EVENT_REQUEST record is required to refund the account. By default, these records are retained for one day before being purged. The create_config.info question that sets the value is: Global:How long (in seconds) should EVENT_REQUEST records be retained before being purged from the system?.
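The event cleanup threshold check described in the Event cleanup item can be sketched in Python. This is only an illustration, not the create_config.py implementation. It assumes the allowed limit is the maximum event database size minus one extended data segment, and it infers the segment size from the two numbers in the sample error message, since the actual segment size is not stated here. All function and constant names are hypothetical.

```python
# Values copied from the sample error message.
MAX_EVENT_DB_BYTES = 65_127_055_360     # maxSize
MAX_ALLOWED_THRESHOLD = 65_063_454_720  # reported upper limit for the threshold

# Inferred, not documented: the gap between the two values above is treated
# as the size of one extended data segment.
SEGMENT_BYTES = MAX_EVENT_DB_BYTES - MAX_ALLOWED_THRESHOLD


def max_cleanup_threshold(max_db_bytes, segment_bytes):
    """Largest allowed threshold under the assumed rule: max size minus one segment."""
    return max_db_bytes - segment_bytes


def check_cleanup_threshold(threshold_bytes, max_db_bytes, segment_bytes):
    """Return the threshold if it is within the limit, otherwise raise ValueError."""
    limit = max_cleanup_threshold(max_db_bytes, segment_bytes)
    if threshold_bytes > limit:
        raise ValueError(
            f"event cleanup threshold {threshold_bytes} exceeds limit {limit} bytes"
        )
    return threshold_bytes


print(SEGMENT_BYTES)  # → 63600640
check_cleanup_threshold(60_000_000_000, MAX_EVENT_DB_BYTES, SEGMENT_BYTES)  # passes
```

A value equal to the limit passes the check, which matches the guidance to choose a value less than or equal to the number in the error message.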

For more information, see the discussion about the Task Manager configuration parameters in MATRIXX Installation and Upgrade.
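The retention questions for expired offers, expired balances, and EVENT_REQUEST records take their answers in seconds, while the defaults are described in days. The following Python sketch shows the conversion. The helper and dictionary names are hypothetical, and the exact default strings stored in create_config.info are not confirmed here; only the day counts come from this section.

```python
from datetime import timedelta

# Default retention periods in days, as described in this section.
DEFAULT_RETENTION_DAYS = {
    "expired offers": 45,
    "expired balances": 45,
    "EVENT_REQUEST records": 1,
}


def retention_seconds(days):
    """Convert a retention period in days to the seconds value the questions expect."""
    return int(timedelta(days=days).total_seconds())


for name, days in DEFAULT_RETENTION_DAYS.items():
    print(f"{name}: {days} day(s) = {retention_seconds(days)} seconds")
# expired offers: 45 day(s) = 3888000 seconds
# expired balances: 45 day(s) = 3888000 seconds
# EVENT_REQUEST records: 1 day(s) = 86400 seconds
```

For example, shortening expired-offer retention to 30 days would correspond to an answer of 2592000 seconds.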

Manual Cleanup Operations

If the available memory is nearing the low threshold, you can perform the following operations to increase the available amount:
  • Restart servers on a regular basis to clean up stale processes and memory.
  • Increase the size of the total system memory. The system uses shared memory for all databases, queues, and other internal structures. The create_config.info question that sets the maximum sizes of these structures is:

    What is the shared memory size in MB to use?