Autoscaling Configuration

The number of pods for MATRIXX gateways and web apps in ReplicaSets can scale up and down based on metrics. Metrics that can trigger scaling include but are not limited to memory usage, CPU usage, transactions per second (TPS), and latency.

The following MATRIXX gateways and web apps allow autoscaling:

  • SBA Gateway — Configure the CHF with Ingress enabled. An Ingress controller should be provided by the platform (for example, AWS ALB for an EKS deployment or Nginx for a private cloud). The CHF handles persistent HTTP/2 connections from the network over which individual requests are multiplexed. If a network load balancer is used, adding another CHF pod does not increase capacity until the network creates new connections. An Ingress load balances individual requests across the CHF pods so that requests are evenly distributed and new pods receive traffic.
  • Payment Service — This component is an ActiveMQ consumer. As this component scales up, it adds more instances of the consumer to the same ActiveMQ queue, and requests are distributed across instances.
  • RS Gateway — This component handles HTTP/1.1 requests from the network. These requests might be configured to reuse TCP connections using an HTTP Keep-Alive header. Therefore, MATRIXX Support recommends enabling Ingress for RS Gateway for the same reason as CHF.
  • Gateway Proxy — This component handles internal, persistent connections from upstream components such as RS Gateway. It scales up in response to increased traffic from upstream components. However, the distribution of traffic over Gateway Proxy instances depends on receiving new incoming connections. This situation occurs as a natural result of upstream components scaling up. Ingress controllers are, therefore, not required. A plain internal Kubernetes Service is used (so no extra configuration is required).
  • Notification Framework — This component is an ActiveMQ consumer. As this component scales up, it adds more instances of the consumer to the same ActiveMQ queue, and requests are distributed across instances.
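For the gateways that terminate persistent connections, the guidance above amounts to enabling Ingress in the sub-chart values. The following sketch shows what this might look like for the CHF; the sub-chart key (chf) and the ingress property names are illustrative and may differ in your chart:

```yaml
# Hypothetical values.yaml fragment: enable Ingress for the CHF so that
# individual HTTP/2 requests, not connections, are balanced across pods.
chf:
  ingress:
    enabled: true
    # Ingress class provided by the platform, for example "alb" on EKS
    # or "nginx" on a private cloud (illustrative values).
    className: nginx
```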
For components that support horizontal autoscaling, you must set resource limits for each component, for example:
resources:
    limits:
      cpu: 2
      memory: 1G
An example of an autoscaling definition follows:
autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 4
    scaleUp:
      stabilizationWindowSeconds: 300
    scaleDown:
      stabilizationWindowSeconds: 300
    metrics:
      - type: Resource
        name: cpu
        targetType: Utilization
        threshold: 60
Note: More complex scaling configuration and threshold parameters are available; however, basic CPU threshold-based scaling is recommended as a starting point.
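In a values file, the resource limits and the autoscaling definition shown above sit together under the sub-chart's key, which is what the sub_chart_name prefix in the property descriptions refers to. The following sketch uses a hypothetical sub-chart named rs-gateway; substitute the sub-chart names from your deployment:

```yaml
rs-gateway:            # hypothetical sub-chart name
  resources:
    limits:
      cpu: 2
      memory: 1G
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 4
    metrics:
      - type: Resource
        name: cpu
        targetType: Utilization
        threshold: 80   # 80% is the recommendation for RS Gateway
```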

Autoscaling Configuration Properties describes configuration properties related to autoscaling.

Table 1. Autoscaling Configuration Properties
Property Description
sub_chart_name.autoscaling.enabled When set to true, autoscaling is enabled. Set to false by default.
sub_chart_name.autoscaling.maxReplicas The maximum number of pods in service at any time. Set this property to a value that can handle an unexpected high load. A value twice that of minReplicas is recommended. The default value is 4.
sub_chart_name.autoscaling.metrics List of metrics that trigger scaling. Set to an empty list by default. For the CPU utilization metric, set the threshold to 50% for CHF so that peak throughput is handled at 50% usage, and to 80% for Payment Service, RS Gateway, Gateway Proxy, and Notification Framework.
sub_chart_name.autoscaling.metrics[m].name Name of the metric to use. Valid values are cpu, memory, or the name of a custom metric.
sub_chart_name.autoscaling.metrics[m].selector A Kubernetes label selector. For more information, see the discussion about label selectors in Kubernetes documentation.
sub_chart_name.autoscaling.metrics[m].targetType Target value type. Valid values are Utilization, Value, and AverageValue. Utilization can only be used for resource-based metrics.
sub_chart_name.autoscaling.metrics[m].threshold Quantity at which autoscaling activates. For more information, see the discussion about Quantity in Kubernetes documentation.
sub_chart_name.autoscaling.metrics[m].type Type of metric. Valid values are Resource for resource-based metrics or Pods for custom metrics.
sub_chart_name.autoscaling.minReplicas The minimum number of pods in service at any time. Set this parameter to the number of pods required to handle the expected traffic load. The default number is 2.
sub_chart_name.autoscaling.scaleDown.policies List of policies. Set to an empty list by default.
sub_chart_name.autoscaling.scaleDown.policies[n].periodSeconds Number of seconds for which conditions must apply before autoscaling occurs.
sub_chart_name.autoscaling.scaleDown.policies[n].selectPolicy The aggregation policy to apply. The default value is MaxPolicySelect.
sub_chart_name.autoscaling.scaleDown.policies[n].type The type of policy. The default value is Pods.
sub_chart_name.autoscaling.scaleDown.stabilizationWindowSeconds The number of seconds for which past recommendations are considered while scaling down. Set this parameter to account for the start-up and shutdown time of the application, high enough that the application has time to fully start or stop. The maximum value is 3600 seconds (1 hour). The default value is 300 seconds (5 minutes).
sub_chart_name.autoscaling.scaleUp.policies List of policies.
sub_chart_name.autoscaling.scaleUp.policies[n].periodSeconds Number of seconds for which conditions must apply before autoscaling occurs.
sub_chart_name.autoscaling.scaleUp.policies[n].selectPolicy The aggregation policy to apply. The default value is MaxPolicySelect.
sub_chart_name.autoscaling.scaleUp.policies[n].type The type of policy. The default value is Pods.
sub_chart_name.autoscaling.scaleUp.stabilizationWindowSeconds The number of seconds for which past recommendations are considered while scaling up. The maximum value is 3600 seconds (1 hour). The default value is 0.
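As an illustration of the metric parameters in the table, the following sketch adds a custom Pods metric alongside the CPU metric. The metric name (tps) and its label selector are hypothetical and must correspond to a metric actually exposed to the cluster's metrics pipeline:

```yaml
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 4
  metrics:
    - type: Resource
      name: cpu
      targetType: Utilization
      threshold: 60
    - type: Pods
      name: tps                # hypothetical custom metric
      selector:
        matchLabels:
          app: rs-gateway      # illustrative label selector
      targetType: AverageValue
      threshold: 500           # example target average TPS per pod
```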

By default, sub-charts scale with the following behavior:

  • Scale down to the number of pods specified with sub_chart_name.autoscaling.minReplicas, with a 300-second stabilization period.
  • Scale up, without a stabilization period, by the larger of the following every 60 seconds:
    • Four additional pods.
    • Double the current number of pods.
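The default scale-up behavior described above can be expressed with the policy parameters from the table. This sketch assumes each policy entry also accepts a value field, as in the Kubernetes HorizontalPodAutoscaler API, although that field is not listed in the table:

```yaml
autoscaling:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
      - type: Pods         # add at most four pods...
        value: 4
        periodSeconds: 60  # ...per 60-second period
      - type: Percent      # or double the pod count per period
        value: 100
        periodSeconds: 60
    # The default aggregation (MaxPolicySelect) applies the higher of the two.
```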
Note: Kubernetes clusters have a default tolerance of 10% beyond a threshold, which causes scaling to trigger above or below the configured value. For example, a CPU usage threshold of 30% triggers adding pods when usage is consistently above 33%, and removing pods when usage falls below 27%. Because services take time to start and stabilize, a high threshold can allow existing pods to reach their limits before new pods are ready to take traffic. Take this tolerance into account when choosing a resource-based threshold.

For more information, see the discussion about setting autoscaling thresholds in MATRIXX Configuration.