Setting Autoscaling Thresholds

Select resource limits and autoscaling thresholds carefully. If a pod exceeds a specified resource limit, Kubernetes restarts it, which causes a loss of service if it is the only pod backing that service. Scaling also has latency, because new pods need time to start and stabilize. Choose thresholds that leave existing pods enough headroom to avoid hitting their limits, and being restarted, while the service scales up.
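As a rough sanity check, you can compare the headroom between the point at which scale-up triggers and the point at which existing pods hit their limit against the time a new pod needs to become ready. The sketch below is purely illustrative: the growth rate, startup time, and percentages (expressed relative to the pod's CPU limit) are hypothetical placeholders, not measured values or recommendations.

```python
def headroom_seconds(trigger_pct, limit_pct, growth_pct_per_sec):
    """Seconds between the autoscaler deciding to add pods (at the trigger
    utilization) and existing pods reaching their limit, assuming load grows
    at a steady, hypothetical rate."""
    return (limit_pct - trigger_pct) / growth_pct_per_sec

# Hypothetical figures: a new pod takes ~90 s to start and stabilize, and CPU
# usage grows by 0.5 percentage points per second.
pod_startup_seconds = 90
for trigger_pct in (40, 60):
    headroom = headroom_seconds(trigger_pct, limit_pct=100, growth_pct_per_sec=0.5)
    print(trigger_pct, headroom, headroom > pod_startup_seconds)
# 40 120.0 True   -> enough headroom for new pods to become ready
# 60 80.0 False   -> existing pods may hit their limit and restart first
```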

The Kubernetes Horizontal Pod Autoscaler has a default tolerance of 10%, so autoscaling is triggered only when usage moves more than 10% above or below the configured threshold. For example, with a CPU usage threshold of 30%, pods are added only when usage is consistently above 33%, and pods are removed only when usage falls below 27%. Combined with the time that new pods need to start and stabilize, a high threshold increases the risk that existing pods reach their limits before the scale-up completes. Take this tolerance into account when choosing a resource-based threshold.
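To make the arithmetic concrete, the sketch below reproduces the scaling rule described under Algorithm details in the Kubernetes documentation: the autoscaler compares the ratio of current to desired utilization against the tolerance and scales only when that ratio falls outside the tolerance band. The function is an illustration of the documented rule, not the controller's actual code.

```python
import math

def desired_replicas(current_replicas, current_pct, target_pct, tolerance=0.10):
    """Illustrative sketch of the documented rule:
    desiredReplicas = ceil(currentReplicas * currentMetric / desiredMetric),
    skipped while the ratio stays within the tolerance (default 0.10)."""
    ratio = current_pct / target_pct
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas            # inside the tolerance band: no change
    return math.ceil(current_replicas * ratio)

# With a 30% CPU target and the default 10% tolerance:
print(desired_replicas(4, 32, 30))   # 4 -> 32% is still inside the 27-33% band
print(desired_replicas(4, 45, 30))   # 6 -> scale up
print(desired_replicas(10, 26, 30))  # 9 -> scale down
```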

You can change the cluster-wide tolerance with the --horizontal-pod-autoscaler-tolerance flag on the kube-controller-manager. For more information, see the Algorithm details section of the Horizontal Pod Autoscaler page in the Kubernetes documentation.
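Before changing the flag, you can estimate what a different tolerance value would mean by computing the band within which no scaling occurs for a given target. The 0.05 value below is only an illustrative alternative, not a recommendation.

```python
def no_scale_band(target_pct, tolerance):
    """Utilization band (in percent) within which the autoscaler leaves the
    replica count unchanged for a given tolerance."""
    return target_pct * (1 - tolerance), target_pct * (1 + tolerance)

for tolerance in (0.10, 0.05):
    low, high = no_scale_band(30, tolerance)
    print(f"tolerance {tolerance}: no scaling between {low:.1f}% and {high:.1f}%")
# tolerance 0.1: no scaling between 27.0% and 33.0%
# tolerance 0.05: no scaling between 28.5% and 31.5%
```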

Where you set this flag depends on the underlying implementation of the cluster, and some public cloud implementations do not expose it. See the documentation for your implementation for details.