Autoscaling Configuration
The number of pods for MATRIXX gateways and web apps in ReplicaSets can scale up and down based on metrics. Metrics that can trigger scaling include but are not limited to memory usage, CPU usage, transactions per second (TPS), and latency.
The following MATRIXX gateways and web apps allow autoscaling:
- SBA Gateway — Configure the CHF with Ingress enabled. An Ingress controller should be provided by the platform (for example, AWS ALB for an EKS deployment or Nginx for a private cloud). The CHF handles persistent HTTP/2 connections from the network over which individual requests are multiplexed. If a network load balancer is used, adding another CHF pod does not increase capacity until the network creates a new connection. An Ingress controller load balances individual requests across the CHF pods so that requests are evenly distributed and new pods receive traffic.
- Payment Service — This component is an ActiveMQ consumer. As this component scales up, it adds more instances of the consumer to the same ActiveMQ queue, and requests are distributed across instances.
- RS Gateway — This component handles HTTP/1.1 requests from the network. These requests might be configured to reuse TCP connections using an HTTP Keep-Alive header. Therefore, MATRIXX Support recommends enabling Ingress for RS Gateway for the same reason as CHF.
- Gateway Proxy — This component handles internal, persistent connections from upstream components such as RS Gateway. It scales up in response to increased traffic from upstream components. However, the distribution of traffic over Gateway Proxy instances depends on receiving new incoming connections. This situation occurs as a natural result of upstream components scaling up. Ingress controllers are, therefore, not required. A plain internal Kubernetes Service is used (so no extra configuration is required).
- Notification Framework — This component is an ActiveMQ consumer. As this component scales up, it adds more instances of the consumer to the same ActiveMQ queue, and requests are distributed across instances.
For components that support horizontal autoscaling, you must set resource limits for each component, for example:
```yaml
resources:
  limits:
    cpu: 2
    memory: 1G
```
An example of an autoscaling definition follows:
```yaml
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 4
  scaleUp:
    stabilizationWindowSeconds: 300
  scaleDown:
    stabilizationWindowSeconds: 300
  metrics:
    - type: Resource
      name: cpu
      targetType: Utilization
      threshold: 60
```
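The metrics list can also reference custom, per-pod metrics. As a sketch only (the metric name transactions_per_second is hypothetical and would have to be exposed by a metrics adapter in your cluster), a Pods metric with an AverageValue target might look like this:

```yaml
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 4
  metrics:
    # Hypothetical custom metric; requires a metrics adapter
    # (such as a Prometheus adapter) that publishes it per pod.
    - type: Pods
      name: transactions_per_second
      targetType: AverageValue
      threshold: 500
```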
Note: More complex scaling configurations and threshold parameters are available; however, basic CPU threshold-based scaling is recommended as a starting point.
Autoscaling Configuration Properties describes configuration properties related to autoscaling.
Property | Description
---|---
sub_chart_name.autoscaling.enabled | When set to true, autoscaling is enabled. Set to false by default.
sub_chart_name.autoscaling.maxReplicas | The maximum number of pods in service at any time. Set this property to a value that can handle an unexpected high load. A value twice that of minReplicas is recommended. The default value is 4.
sub_chart_name.autoscaling.metrics | The list of metrics that trigger scaling, such as average CPU use across all pods of the application. Set the CPU threshold to 50% for CHF so that it can handle peak throughput at 50%, and to 80% for Payment Service, RS Gateway, Gateway Proxy, and Notification Framework. Set to an empty list by default.
sub_chart_name.autoscaling.metrics[m].name | Name of the metric to use. Valid values are cpu, memory, or the name of a custom metric.
sub_chart_name.autoscaling.metrics[m].selector | A Kubernetes label selector. For more information, see the discussion about label selectors in the Kubernetes documentation.
sub_chart_name.autoscaling.metrics[m].targetType | Target value type. Valid values are Utilization, Value, and AverageValue. Utilization can only be used for resource-based metrics.
sub_chart_name.autoscaling.metrics[m].threshold | Quantity at which autoscaling activates. For more information, see the discussion about Quantity in the Kubernetes documentation.
sub_chart_name.autoscaling.metrics[m].type | Type of metric. Valid values are Resource for resource-based metrics or Pods for custom metrics.
sub_chart_name.autoscaling.minReplicas | The minimum number of pods in service at any time. Set this property to the number of pods required to handle the expected traffic load. The default value is 2.
sub_chart_name.autoscaling.scaleDown.policies | List of scale-down policies. Set to an empty list by default.
sub_chart_name.autoscaling.scaleDown.policies[n].periodSeconds | Number of seconds for which conditions must apply before autoscaling occurs.
sub_chart_name.autoscaling.scaleDown.policies[n].selectPolicy | The aggregation policy to apply. The default value is MaxPolicySelect.
sub_chart_name.autoscaling.scaleDown.policies[n].type | The type of policy. The default value is Pods.
sub_chart_name.autoscaling.scaleDown.stabilizationWindowSeconds | The number of seconds for which past recommendations are considered while scaling down. Set this property to account for the start-up and shutdown time of the application; the value must be high enough to give the application time to fully start or stop. The maximum value is 3600 seconds (1 hour). The default value is 300 seconds (5 minutes).
sub_chart_name.autoscaling.scaleUp.policies | List of scale-up policies.
sub_chart_name.autoscaling.scaleUp.policies[n].periodSeconds | Number of seconds for which conditions must apply before autoscaling occurs.
sub_chart_name.autoscaling.scaleUp.policies[n].selectPolicy | The aggregation policy to apply. The default value is MaxPolicySelect.
sub_chart_name.autoscaling.scaleUp.policies[n].type | The type of policy. The default value is Pods.
sub_chart_name.autoscaling.scaleUp.stabilizationWindowSeconds | The number of seconds for which past recommendations are considered while scaling up. The maximum value is 3600 seconds (1 hour). The default value is 0.
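Putting the policy parameters above together, the following is a sketch of a deliberately conservative scale-down configuration. The values shown are illustrative, not recommendations, and the value field is the standard Kubernetes HPA policy field, assumed here to be passed through by the sub-chart:

```yaml
autoscaling:
  scaleDown:
    # Wait 10 minutes of sustained low load before removing pods.
    stabilizationWindowSeconds: 600
    policies:
      - type: Pods
        value: 1          # standard Kubernetes HPA field (assumed pass-through)
        periodSeconds: 120 # remove at most 1 pod every 2 minutes
```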
By default, sub-charts scale with the following behavior:
- Scale down to the number of pods specified with sub_chart_name.autoscaling.minReplicas, with a 300 second stabilization window.
- Scale up by the higher of the following, without a stabilization window:
  - Add no more than four pods per 60 seconds.
  - Double the number of pods per 60 seconds.
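Expressed as a standard Kubernetes HorizontalPodAutoscaler behavior block, the defaults described above translate roughly to the following sketch (assuming the sub-chart passes these values through to the HPA unchanged):

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
  scaleUp:
    stabilizationWindowSeconds: 0
    selectPolicy: Max      # take the higher result of the two policies below
    policies:
      - type: Pods         # add at most 4 pods per 60 seconds...
        value: 4
        periodSeconds: 60
      - type: Percent      # ...or double the pod count per 60 seconds
        value: 100
        periodSeconds: 60
```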
Note: Kubernetes clusters have a default tolerance of 10% beyond a threshold, so autoscaling triggers slightly above or below the configured value. For example, with a CPU usage threshold of 30%, pods are added when usage is consistently above 33%, and pods are removed when usage drops below 27%. Because of the time it takes for services to start and stabilize, a high threshold risks letting existing pods reach their limits before new pods are ready. Take this tolerance into consideration when choosing a resource-based threshold.
For more information, see the discussion about setting autoscaling thresholds in MATRIXX Configuration.