Cloud Native Infrastructure Requirements
Cloud native MATRIXX deployments have infrastructure requirements for third-party software versions, Kubernetes pod characteristics, and container characteristics. Nodes running MATRIXX components have operating system, memory, networking, and storage requirements.
Helm
MATRIXX uses Helm for deploying and upgrading in a controlled manner. The MATRIXX Helm chart packages the required Kubernetes manifest files and provides templating for deployment customization.
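The exact chart name, release name, and values keys depend on the MATRIXX release; the following sketch only illustrates the general pattern of overriding chart values and applying them through a controlled Helm upgrade (all names shown are placeholders):

    # Hypothetical values override file (overrides.yaml); the keys below are
    # placeholders, not actual MATRIXX chart values.
    global:
      imageRegistry: registry.example.com/matrixx   # example registry
    engine:
      replicaCount: 2                               # example only
    #
    # Applied with a command such as:
    #   helm upgrade --install matrixx ./matrixx-helm-chart -n matrixx -f overrides.yaml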
For information about version requirements, see the discussion about cloud native third-party software requirements.
Kubernetes
Production deployments require a minimum of two Kubernetes clusters: one for the active MATRIXX Engine, gateways, and web apps, and another for the standby engine, gateways, and web apps. Each cluster requires three master etcd nodes for high availability (HA). To prevent performance degradation, disable memory swap on all master and worker nodes.
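As an illustration of the no-swap requirement, the kubelet refuses to start on a node with swap enabled when its default failSwapOn setting is left in place; swap itself is disabled at the operating system level. The following is a minimal sketch, not a complete kubelet configuration:

    # Minimal KubeletConfiguration fragment; keeping the default failSwapOn: true
    # enforces the no-swap requirement on every master and worker node.
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    failSwapOn: true
    # Swap is typically disabled at the OS level with `swapoff -a` and by
    # removing or commenting out swap entries in /etc/fstab.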
The Kubernetes environment should support geo-redundant sites with inter-site latency of no more than 50 ms.
Call Control Framework-Network Enabler (CCF-NE) and the Event Repository must reside outside the two MATRIXX clusters. If necessary, the Event Repository can be hosted on virtual machines (VMs) or bare metal servers entirely outside a Kubernetes environment.
For information about version requirements, see the discussion about cloud native third-party software requirements.
Kubernetes Pods
Kubernetes Deployments and StatefulSets ensure that the required number of pods for each function is available at all times. A Kubernetes Operator supports native auto-healing, allowing Kubernetes to automatically reinstate a pod that has disappeared or is not functioning correctly. The Operator uses liveness probes to detect the state of all pods in a deployment and ensures that a newly reinstated pod can rejoin the running cluster without manual intervention.
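For illustration only, a liveness probe is declared per container in the pod template; the names, image, port, and thresholds below are placeholders rather than the actual MATRIXX probe settings:

    # Sketch of a StatefulSet with a TCP liveness probe; all values are examples.
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: example-engine
    spec:
      serviceName: example-engine
      replicas: 3
      selector:
        matchLabels:
          app: example-engine
      template:
        metadata:
          labels:
            app: example-engine
        spec:
          containers:
          - name: engine
            image: registry.example.com/engine:example-tag   # placeholder image
            livenessProbe:
              tcpSocket:
                port: 4060            # placeholder port
              initialDelaySeconds: 30
              periodSeconds: 10
              failureThreshold: 3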
Engine processing pods require dedicated worker nodes. If SBA Gateway requires 80% or more of a worker node's vCPU capacity, run it on a dedicated worker node; at lower vCPU usage levels, SBA Gateway can run on a shared worker node. Other pods can share worker nodes, subject to affinity rules.
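One common way to reserve worker nodes for specific pods is a node taint with a matching toleration and node selector, as sketched below; the label and taint keys are illustrative, not MATRIXX-defined values:

    # Reserve a node, for example:
    #   kubectl taint nodes <node-name> dedicated=engine:NoSchedule
    #   kubectl label nodes <node-name> node-role=engine
    # Pod spec that targets and tolerates the reserved node:
    apiVersion: v1
    kind: Pod
    metadata:
      name: example-engine-pod
    spec:
      nodeSelector:
        node-role: engine
      tolerations:
      - key: dedicated
        operator: Equal
        value: engine
        effect: NoSchedule
      containers:
      - name: engine
        image: registry.example.com/engine:example-tag   # placeholder image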
Containers
Cloud native MATRIXX is delivered in Docker images based on Red Hat Universal Base Image 8 (UBI8) version 8.10. All containers require Linux kernel v4 or higher on worker nodes.
Node Hardware and Memory Requirements
Nodes require x86 hardware with at least two CPU sockets and physical cores running at a minimum of 2.3 GHz. Because of hyperthreading, two vCPUs are assumed to equal one physical core. Depending on TPS requirements, nodes must support 8 vCPUs or more.
Nodes require a minimum of 30 GB RAM. Depending on the number of subscribers, nodes might require 512 GB or more for large environments.
Node Networking
A 10 Gb Ethernet network with redundant physical links for HA is required. MATRIXX Support recommends single-root I/O virtualization (SR-IOV) and a separate, dedicated transaction network for engine pods. For cluster-wide inter-pod communication, MATRIXX Support recommends a pod network with overlay add-ons such as Calico or Flannel.
MATRIXX has verified that a Multus configuration can be defined to provide CHF pods with a 5G signalling network that is separate from the Kubernetes operation and maintenance (O&M) network.
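As an illustrative sketch only, a Multus NetworkAttachmentDefinition can describe such a secondary signalling network; the CNI plug-in, host interface, and subnet below are placeholders:

    # Example secondary network for 5G signalling traffic (placeholder values).
    apiVersion: k8s.cni.cncf.io/v1
    kind: NetworkAttachmentDefinition
    metadata:
      name: signalling-net
    spec:
      config: '{
        "cniVersion": "0.3.1",
        "type": "macvlan",
        "master": "ens224",
        "ipam": { "type": "host-local", "subnet": "192.168.50.0/24" }
      }'

Pods attach to such a secondary network through the k8s.v1.cni.cncf.io/networks annotation in their pod template.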
A network load balancer is required for higher-volume ingress traffic, such as Diameter and 5G network functions, along with an application load balancer (or a separate ingress controller and network load balancer) for HTTP traffic to and from the Business API gateway.
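As a minimal sketch, Layer-4 ingress for Diameter or 5G traffic can be exposed through a Kubernetes Service of type LoadBalancer; the name, selector, and ports below are placeholders:

    # Example Layer-4 load balancer Service (placeholder names and ports).
    apiVersion: v1
    kind: Service
    metadata:
      name: example-chf-lb
    spec:
      type: LoadBalancer
      selector:
        app: example-chf
      ports:
      - name: sbi
        port: 443
        targetPort: 8443
        protocol: TCP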
For information about setting operating system network parameters and kernel optimization requirements, see the discussion about configuring OS network parameters.
Storage
MATRIXX requires the following types of shared storage with ReadWriteMany (RWX) capability:
- Fast shared storage for engine transaction log files. This storage is SSD/flash-based and requires a 10 Gb network with a maximum storage transaction latency of 10 ms.
- Standard shared storage for general logging and for archiving transaction log files, checkpoints, and event records. This storage can be HDD-based.
Shared storage can be as simple as mounting a volume from a SAN/NAS to the cluster as a PersistentVolume using NFS, preferably over a dedicated network. Depending on requirements, the shared storage volumes should provide at least 200 GB of usable storage per site, up to ~4 TB per site for large environments. The nodes should also have at least 25 GB of local storage per pod for running containers.
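For illustration, an NFS-backed PersistentVolume and a matching claim with ReadWriteMany access might look like the following sketch; the server, export path, and sizes are placeholders:

    # Example NFS PersistentVolume and PersistentVolumeClaim with RWX access.
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: shared-storage-pv
    spec:
      capacity:
        storage: 200Gi            # placeholder size
      accessModes:
      - ReadWriteMany
      nfs:
        server: nfs.example.com   # placeholder NFS server
        path: /exports/matrixx    # placeholder export path
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: shared-storage-pvc
    spec:
      accessModes:
      - ReadWriteMany
      storageClassName: ""
      volumeName: shared-storage-pv
      resources:
        requests:
          storage: 200Gi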
Event Repository
Each MongoDB node must provide at least 200 GB of SSD storage (RAID 1) for indexes and 1 TB of SAS storage (RAID 10) for data, up to approximately 4 TB of SSD and 40 TB of SAS for very large environments and long retention periods, depending on Event Repository requirements. This storage can be local or SAN-mounted.
In public cloud deployments, MongoDB support is vendor-provided.
Optional Tools
Third-party container image repositories; monitoring software such as Prometheus and Grafana; log aggregation and analysis software such as Fluent Bit, Elasticsearch, or Kibana; and continuous integration/delivery (CI/CD) solutions might also be desirable. Exporting subscriber data for analysis might require a MySQL database.