Retrieving Core Dump Information

A core dump is a file that contains the complete memory contents of a process at a point in time. Core dump handling in MATRIXX must be enabled per namespace. This feature is disabled by default, and the commonly mounted /coredumps directory remains empty until it is enabled.

Enable core dumps with a DaemonSet that creates and runs a privileged pod on each node in the specified namespace. If no namespace is given, core dumps are enabled for the default namespace. The pod runs a command that tells the Linux kernel on the node where to save core dump files. MATRIXX Support suggests the shared mount point /coredumps because it persists even if the pod does not. Losing the pod is common with single-process pods or when the parent process of a pod crashes.
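As an illustration of how the namespace can be set, the following fragment is a minimal sketch of the metadata section of a DaemonSet manifest with an explicit namespace. The namespace name "matrixx" is a hypothetical placeholder and is not taken from this document. Substitute the namespace used in your deployment.

```yaml
# Sketch only: metadata section of the DaemonSet manifest with an explicit namespace.
metadata:
  name: "sysctl-enable-coredumps"
  namespace: "matrixx"   # hypothetical namespace name, replace with your own
```

Alternatively, the namespace can usually be supplied when the manifest is applied, using the -n (--namespace) option of kubectl, instead of being written into the manifest.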

Note: Applying a DaemonSet does not restart existing pods, and they do not need to be restarted.

Applying a DaemonSet has the following important considerations:

  • Any pods running on nodes where the DaemonSet is applied (or created after the DaemonSet is applied) write core dump information into the /coredumps directory. If that directory is missing, is not writable, or has no space left for core files, the Linux kernel does not create core dump files at all.
  • Any existing core dump location configured on the node (the kernel.core_pattern setting) is overwritten for all applications running on nodes where the DaemonSet is applied.
  • MATRIXX Support recommends using tolerations and node affinity to target specific nodes for MATRIXX applications.
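
The tolerations and node affinity recommendation can be sketched as the following fragment of the DaemonSet pod template. This is an example under stated assumptions, not a MATRIXX-supplied configuration: the node label key example.com/matrixx-node and the taint key example.com/dedicated, along with their values, are hypothetical and must be replaced with the labels and taints actually used on your nodes.

```yaml
# Sketch only: fields to merge into the DaemonSet under spec.template.spec.
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          # Schedule the DaemonSet pod only on nodes carrying the label below.
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: "example.com/matrixx-node"   # hypothetical node label
                    operator: "In"
                    values:
                      - "true"
      tolerations:
        # Allow scheduling onto nodes tainted for dedicated use.
        - key: "example.com/dedicated"                # hypothetical taint key
          operator: "Equal"
          value: "matrixx"                            # hypothetical taint value
          effect: "NoSchedule"
```

With node affinity in place, the core dump location is changed only on the selected nodes, which limits the effect on unrelated applications running elsewhere in the cluster.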

DaemonSets can be applied using a command similar to the following:

kubectl apply -f values.yaml

The following DaemonSet enables core dump file creation on the nodes where it is applied:

apiVersion: "apps/v1"
kind: "DaemonSet"
metadata:
  name: "sysctl-enable-coredumps"
spec:
  selector:
    matchLabels:
      app: sysctl-enable-coredumps
  template:
    metadata:
      labels:
        app: "sysctl-enable-coredumps"
    spec:
      initContainers:
        - name: "sysctl"
          image: "busybox:latest"
          securityContext:
            privileged: true
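          command:
            - "/bin/sh"
            - "-c"
            # fs.suid_dumpable=1 lets processes that changed credentials (for example, setuid programs) dump core.
            # kernel.core_pattern sets the core file path: %e = executable name, %p = PID, %h = hostname, %t = dump time in seconds since the Epoch.
            - "sysctl -w fs.suid_dumpable=1 kernel.core_uses_pid=1 kernel.core_pattern=/coredumps/core_%e_%p_%h_%t"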
      containers:
        - name: "pause"
          image: gcr.io/google_containers/pause