Configure Email Alerts in Kasten K10

In this guide we will review the installation and configuration of prometheus in order to obtain alerts via email using the federation, rules and alertmanager of prometheus in conjunction with the monitoring of Kasten K10.

Initial Steps

As usual, we will always review the official documentation of the solutions.ones that we will install and/or configure in this guide.

Prometheus: https://prometheus.io/

Kasten: https://docs.kasten.io/latest/operating/monitoring.html

We will carry out the configuration in a first part with the integration of K10 Multi-Cluster Manager and then we will see how to configure the rules when it is an installation without centralized administration.

What is Prometheus?

Prometheus is a solution focused on monitoring the resources of kubernetes based on time series metrics, that is, real-time monitoring of the actionsones that they are configured. For example with prometheus you could monitor, via rules, the use of CPU, Memory, Conexioneyes, sesionejust whatever you want to configure.

It is also the standard solution for monitoring clusters of kubernetes since it allows us to have a very detailed view of the resources to be monitored as well as helping us to solve any errors that exist.

Kasten K10 and Prometheus

Kasten also uses Prometheus for its internal monitoring of K10. In fact, in the previous link, we can see that there are many metrics that Kasten k10 export to Prometheus as for example:

  • catalog
  • jobs
  • actions
  • backup
  • restore
  • export
  • import
  • report
  • run

So, how can we configure Prometheus to send us an email when, for example, some backup policy does not work correctly?

Prometheus setup on Kasten K10

As we know, prometheus already comes pre-installed on Kasten K10, but for this instance it is not recommended that it be modified, since it is managed by helm and has certain settings.ones that work directly with the default reports of Kasten K10 and also for K10 Multi-Cluster Manager if you have it activated. Therefore the idea is to federate Prometheus that comes pre-installed (so as not to modify it) with a new instance of Prometheus:

K10 Federal Prometheus

Installation and Configuration Prometheus

Now we will proceed to create a namespace for our monitoring instance, in this case we will call it alerts, for this we will execute the following in our cluster of kubernetes:

kubectl create ns alertas
create namespace

And now we will add the repository for Prometheus helm:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
add helm repository

And we will create a new file named “kasten_prometheus_values_smtp.yaml”:

defaultRules:
  create: false
alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      repeat_interval: 30m
      receiver: 'email'
      routes:
      - receiver: 'email'
        match:
          severity: kasten
    receivers:
      - name: 'email'
        email_configs:
          - to: [email protected]
            from: [email protected]
            smarthost: smtp.24xsiempre.com:25
            auth_username: SuperDuperUserName
            auth_password: SuperDuperPassword
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: k10
        scrape_interval: 15s
        honor_labels: true
        scheme: http
        metrics_path: '/k10/prometheus/federate'
        params:
          'match[]':
            - '{__name__=~"jobs.*"}'
            - '{__name__=~"catalog.*"}'
        static_configs:
          - targets:
            - 'prometheus-server.kasten-io.svc.cluster.local'
            labels:
              app: "k10"
#Valores para deshabilitar componentes que no son necesarios 
grafana:
  enabled: false
kubeApiServer:
  enabled: false
kubelet:
  enabled: false
kubeStateMetrics:
  enabled: false
kubeControllerManager:
  enabled: false
kubeEtcd:
  enabled: false
kubeProxy:
  enabled: false
coreDns:
  enabled: false
kubeScheduler:
  enabled: false

Where the following lines should be edited:

Modify the above variables with the correct data and save “kasten_prometheus_values_smtp.yaml”:

Protheus installation

And now we install Prometheus with the following command:

helm install prometheus prometheus-community/kube-prometheus-stack -n alertas -f kasten_prometheus_values_smtp.yaml
prometheus

And to validate that it has been installed correctly, we execute:

kubectl --namespace alertas get pods -l "release=prometheus"
prometheus pods

Creation of Prometheus Rules for K10 Multi Cluster Manager

Here we will make the most important configuration, since we are using K10 Multi-Cluster Manager, we must know how to correctly identify each of the clusters that is being protected by Kasten K10Therefore, when reading the documentation, a key “Tip” appears, where it indicates that to identify the secondary clusters, the variable {cluster=”development”} must be added, where the name of the cluster is how we identify it in Multi-Cluster Manager. And for the primary cluster it should be {cluster=””}.

For this guide, I have 3 clusters configured:

  • production (primary cluster for K10 Multi-Cluster)
  • development (secondary cluster for K10 Multi-Cluster)
  • tanzu (secondary cluster for K10 Multi-Cluster)

Therefore, we will create the configuration file with the name “alertas_cluster.yaml” and copy the content:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: kube-prometheus-stack
    release: prometheus
  name: prometheus-kube-prometheus-kasten.rules
spec:
  groups:
  - name: kasten_alert
    rules:
    - alert: K10JobsFailsClusterProd
      expr: |-
        increase(catalog_actions_count{cluster="", status="failed"}[10m]) > 0
      for: 1m
      labels:
        severity: kasten
      annotations:
        summary: "Politicas de Kasten K10 con error hace 10 minutos"
        description: "Politica << {{ $labels.policy }} >> en cluster << produccion >> Ha fallado en los ultimos 10 minutos"
    - alert: K10JobsFailsClusterDev
      expr: |-
        increase(catalog_actions_count{cluster="desarrollo", status="failed"}[10m]) > 0
      for: 1m
      labels:
        severity: kasten
      annotations:
        summary: "Politicas de Kasten K10 con error hace 10 minutos"
        description: "Politica << {{ $labels.policy }} >> en cluster << {{ $labels.cluster }} >> Ha fallado en los ultimos 10 minutos"
    - alert: K10JobsFailsClusterTanzu
      expr: |-
        increase(catalog_actions_count{cluster="tanzu", status="failed"}[10m]) > 0
      for: 1m
      labels:
        severity: kasten
      annotations:
        summary: "Politicas de Kasten K10 con error hace 10 minutos"
        description: "Politica << {{ $labels.policy }} >> en cluster << {{ $labels.cluster }} >> Ha fallado en los ultimos 10 minutos"

The variables to edit are in each of the alerts:

  • alert: Name of the Alert
  • expr: ONLY edit the cluster name (remember the previous tip)
  • summary: Summary text without editing the variables
  • description: Descriptive text

The most important variable in the above file is "expr" which is the "query" or query to prometheus according to the metrics of Kasten K10 To detect the failure in the jobs, if you want to create new queries you can visit:

https://prometheus.io/docs/prometheus/latest/querying/basics/

And finally we will create the rules in our Prometheus instance:

kubectl apply -f alertas_cluster.yaml -n alertas
setup prometheus rules kasten

And to validate the creation of the rule:

kubectl get prometheusrules.monitoring.coreos.com -n alertas
list rules

Creation of Prometheus Rules without K10 multi-cluster

In case you have installed Kasten K10 without the use of K10 Multi-Cluster, also, it is possible to configure rules without the need to declare the name of the cluster, the only difference is the creation of the rule that must be like the following:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: kube-prometheus-stack
    release: prometheus
  name: prometheus-kube-prometheus-kasten.rules
spec:
  groups:
  - name: kasten_alert
    rules:
    - alert: K10JobsFail
      expr: |-
        increase(catalog_actions_count{status="failed"}[10m]) > 0
      for: 1m
      labels:
        severity: kasten
      annotations:
        summary: "Politicas de Kasten K10 con error hace 10 minutos"
        description: "Politica << {{ $labels.policy }} Ha fallado en los ultimos 10 minutos"

Testing Email Alerts

To test this configuration, we need to generate errors in the backup policies of the clusters, for that, in the existing backup policies we will eliminate the backup repositories of Kasten K10 o “Location Profiles” to force the error, where the policies will be shown like this:

kasten test fail

And finally we execute the policies that generate the error. With this we should wait for the alerts to arrive by email. If we run the following command:

kubectl port-forward service/prometheus-kube-prometheus-prometheus 9090:9090 -n alertas

We can enter the Prometheus web console to see the created rules:

prometheus rules

Running the backup policies with errors, prometheus, will detect through the rules that there are some errors:

prometheus rules

Where finally it marks it and the sending of the alerts by email is executed:

prometheus rules

Some examples of notificationsoneso alerts that arrive in the mail, where it includes the name of the backup policy and its respective cluster where the error exists:

email notification kasten

Recommendations

In some cases where instances of Prometheus already exist, it is possible, that adding the instance to monitor and federate prometheus from Kasten K10 didn't work correctly, for example in Rancher, with “cattle-monitoring” it is necessary to disable the prometheus operator, otherwise both instances will try to override with the other forcing the pods to restart.

Regarding the notificaciones or alerts of Kasten K10, it is possible to create new queries or queries to obtain other types of data, such as licenses, used space, etc.

 

add a comment

Your email address will not be published. Required fields are marked *