inspect_engine_metrics.py

The inspect_engine_metrics.py script analyzes engine metrics against user-defined boundaries. It can be run inside a MATRIXX pod or in a Kubernetes cluster outside of a MATRIXX pod.

Syntax

The inspect_engine_metrics.py script has the following options, in addition to the standard engine command line options.
inspect_engine_metrics.py {--first-run | --get-labels | --calculate} [--create-pod]
The --create-pod option is used only when the script is run in a Kubernetes cluster (outside a MATRIXX pod).

Options

Note: One of these three options is required: --first-run, --get-labels, or --calculate.
--first-run
Creates a default metrics_config.csv file with commonly used metrics and services. It collects available labels for the specified pod and metric combinations from Prometheus and prompts you to enter the minimum, maximum, and upper limit percentage values you want for each metric.
--get-labels
Prepares a CSV file with all available labels for the specified pod and metric combinations from Prometheus. It allows you to update the metrics_config.csv file with the labels you want.
--calculate
  • Fetches metric values from Prometheus based on the metrics_config.csv file.
  • Calculates the status of each metric (ABOVE_MAX, BELOW_MIN, or IN_BOUNDS) based on user-defined boundaries.
  • Outputs the metrics that are outside the defined boundaries (use the --verbose flag to print all metrics).
  • Updates the metrics_config.csv file with metric values and status.
  • Creates a separate CSV file, all_remaining_metrics.csv, with all available metrics not included in the metrics_config.csv file.
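
The status calculation described above can be sketched as follows. This is a minimal illustration, not the script's actual implementation: the function names, the row keys (metric, value, min, max), and the choice to treat values equal to a boundary as IN_BOUNDS are all assumptions.

```python
from typing import Dict, List, Optional


def classify_metric(value: float,
                    minimum: Optional[float],
                    maximum: Optional[float]) -> str:
    """Return ABOVE_MAX, BELOW_MIN, or IN_BOUNDS for one metric value.

    Assumption: a value equal to a boundary counts as IN_BOUNDS, and a
    missing boundary (None) is not checked.
    """
    if maximum is not None and value > maximum:
        return "ABOVE_MAX"
    if minimum is not None and value < minimum:
        return "BELOW_MIN"
    return "IN_BOUNDS"


def report(rows: List[Dict], verbose: bool = False) -> List[Dict]:
    """Attach a status to each row and select the rows to print.

    The row keys used here (metric, value, min, max) are hypothetical.
    Without verbose, only rows outside the boundaries are returned.
    """
    selected = []
    for row in rows:
        status = classify_metric(row["value"], row.get("min"), row.get("max"))
        labelled = dict(row, status=status)
        if verbose or status != "IN_BOUNDS":
            selected.append(labelled)
    return selected
```

With verbose left at False, only the rows whose status is ABOVE_MAX or BELOW_MIN are returned, which mirrors the default output behavior described for --calculate.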
--dir
Defines the directory where metrics_config.csv is stored. The default is the current working directory.
--config-csv
Specifies the name of the metrics configuration CSV file. The default is metrics_config.csv.
--verbose
When set to true, prints metrics of all statuses, not just those outside the defined boundaries. The default is false (disabled).
--prom-svc-dns-name
The fully qualified domain name of the Prometheus server. The default is prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local.
--prom-svc-port
The Prometheus server port. The default is 9090.
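
To show how the --prom-svc-dns-name and --prom-svc-port values combine into a Prometheus address, the following sketch builds an instant-query URL for the standard Prometheus HTTP API endpoint /api/v1/query. The function name, the use of plain HTTP, and the assumption that the script queries this particular endpoint are illustrative only.

```python
from urllib.parse import urlencode

# Defaults taken from the option descriptions above.
DEFAULT_DNS_NAME = "prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local"
DEFAULT_PORT = 9090


def build_query_url(promql: str,
                    dns_name: str = DEFAULT_DNS_NAME,
                    port: int = DEFAULT_PORT) -> str:
    """Build a Prometheus instant-query URL (hypothetical helper).

    Assumption: the server is reachable over plain HTTP. The PromQL
    expression is URL-encoded into the `query` parameter.
    """
    return f"http://{dns_name}:{port}/api/v1/query?{urlencode({'query': promql})}"
```

For example, build_query_url("up") addresses the default server on port 9090, and passing dns_name or port overrides the defaults in the same way the command line options do.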
--create-pod
When set to true, creates a new pod that downloads data from the Prometheus server and saves the data outside of any pod. The default value is false.
--temp-pod-name
The name of the pod to create for downloading data from the Prometheus server. The default is temp-pod.
--namespace
The Kubernetes namespace to use for creating the pod. The default is matrixx.
--kube-config-path
The path to the Kubernetes configuration file. The default is ~/.kube/config.
--image-name
The name of the image to use for the pod.
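
The rule that exactly one of --first-run, --get-labels, or --calculate is required can be modeled with an argparse mutually exclusive group. The sketch below is not the script's real parser: it covers only a subset of the options, uses the defaults listed above, and assumes that --verbose and --create-pod are plain boolean flags.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a subset of the documented options."""
    parser = argparse.ArgumentParser(prog="inspect_engine_metrics.py")

    # Exactly one mode option must be supplied.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--first-run", action="store_true")
    mode.add_argument("--get-labels", action="store_true")
    mode.add_argument("--calculate", action="store_true")

    # Defaults as listed in the option descriptions.
    parser.add_argument("--dir", default=".")
    parser.add_argument("--config-csv", default="metrics_config.csv")
    parser.add_argument("--prom-svc-port", type=int, default=9090)
    parser.add_argument("--temp-pod-name", default="temp-pod")
    parser.add_argument("--namespace", default="matrixx")

    # Assumption: these are boolean flags that default to false.
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--create-pod", action="store_true")
    return parser
```

Parsing ["--calculate", "--verbose"] with this parser succeeds, while supplying no mode option, or more than one, makes argparse exit with a usage error.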

Usage

The inspect_engine_metrics.py script is included in the mtx-debug-kit image, which can be attached to a running pod.

Attach the image to a running pod. In the following example, the pod is publ-s1e1-0:
``` bash
kubectl debug -it -c debugger \
    --target ctr-1 \
    --image your-image-repository/mtx-debug-kit:1.0.0 \
    publ-s1e1-0
```
Where your-image-repository is the URL of your image repository.