Data protection, Kubernetes, cybersecurity and AI. Hands-on guides from the trenches: Veeam, Kasten, VMware, Oracle, cloud, and whatever I’m breaking in the homelab this week.
Table of ContentsTable of Contents
In this guide we will review the installation and configuration of prometheus in order to obtain alerts via email using the federation, rules and alertmanager of prometheus in conjunction with the monitoring of Kasten K10.
We will carry out the configuration in a first part with the integration of K10 Multi-Cluster Manager and then we will see how to configure the rules when it is an installation without centralized administration.
Prometheus is a solution focused on monitoring the resources of kubernetes based on time series metrics, that is, real-time monitoring of the actions that have been configured. For example, with Prometheus you could monitor, via rules, the use of CPU, Memory, Connections, sessions or whatever you want to configure.
It is also the standard solution for monitoring clusters of kubernetes since it allows us to have a very detailed view of the resources to be monitored as well as helping us to solve any errors that exist.
Kasten also uses Prometheus for its internal monitoring of K10. In fact, in the previous link, we can see that there are many metrics that Kasten k10 export to Prometheus as for example:
catalog
jobs
actions
backup
restore
export
import
report
run
So, how can we configure Prometheus to send us an email when, for example, some backup policy does not work correctly?
As we know, prometheus already comes pre-installed on Kasten K10, but for this instance it is not recommended that it be modified, since it is managed by helm and has certain configurations that work directly with the default reports of Kasten K10 and also for K10 Multi-Cluster Manager if you have it activated. Therefore the idea is to federate Prometheus that comes pre-installed (so as not to modify it) with a new instance of Prometheus:
Now we will proceed to create a namespace for our monitoring instance, in this case we will call it alerts, for this we will execute the following in our cluster of kubernetes:
kubectl create ns alertas
```bash
And now we will add the repository for Prometheus helm:
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
```bash
And we will create a new file named “kasten\_prometheus\_values\_smtp.yaml”:
```bash
defaultRules:
create: falsealertmanager:
config:
global:
resolve_timeout: 5m
route:
repeat_interval: 30m
receiver: 'email' routes:
- receiver: 'email' match:
severity: kasten
receivers:
- name: 'email' email_configs:
- to: [email protected] from: [email protected] smarthost: smtp.24xsiempre.com:25
auth_username: SuperDuperUserName
auth_password: SuperDuperPassword
prometheus:
prometheusSpec:
additionalScrapeConfigs:
- job_name: k10
scrape_interval: 15s
honor_labels: true scheme: http
metrics_path: '/k10/prometheus/federate' params:
'match[]':
- '{__name__=~"jobs.*"}' - '{__name__=~"catalog.*"}' static_configs:
- targets:
- 'prometheus-server.kasten-io.svc.cluster.local' labels:
app: "k10"#Valores para deshabilitar componentes que no son necesariosgrafana:
enabled: falsekubeApiServer:
enabled: falsekubelet:
enabled: falsekubeStateMetrics:
enabled: falsekubeControllerManager:
enabled: falsekubeEtcd:
enabled: falsekubeProxy:
enabled: falsecoreDns:
enabled: falsekubeScheduler:
enabled: false```bash
Where the following lines should be edited:
- **to**: [email protected]- **from**: [email protected]- **smart host**: smtp.24xsiempre.com:25
- **auth\_username**: SuperDuperUserName
- **auth\_password**: SuperDuperPassword
Modify the above variables with the correct data and save “kasten\_prometheus\_values\_smtp.yaml”:
And now we install Prometheus with the following command:
```bash
helm install prometheus prometheus-community/kube-prometheus-stack -n alertas -f kasten_prometheus_values_smtp.yaml
```bash
And to validate that it has been installed correctly, we execute:
```bash
kubectl --namespace alertas get pods -l "release=prometheus"```bash
## Creation of Prometheus Rules for K10 Multi Cluster ManagerHere we will make the most important configuration, since we are using K10 Multi-Cluster Manager, we must know how to correctly identify each of the clusters that is being protected by Kasten K10Therefore, when reading the documentation, a key “Tip” appears, where it indicates that to identify the secondary clusters, the variable {cluster=”development”} must be added, where the name of the cluster is how we identify it in Multi-Cluster Manager. And for the primary cluster it should be {cluster=””}.
For this guide, I have 3 clusters configured:
- production (primary cluster for K10 Multi-Cluster)- development (secondary cluster for K10 Multi-Cluster)- tanzu (secondary cluster for K10 Multi-Cluster)Therefore, we will create the configuration file with the name “alertas\_cluster.yaml” and copy the content:
```bash
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
labels:
app: kube-prometheus-stack
release: prometheus
name: prometheus-kube-prometheus-kasten.rules
spec:
groups:
- name: kasten_alert
rules:
- alert: K10JobsFailsClusterProd
expr: |-
increase(catalog_actions_count{cluster="", status="failed"}[10m]) > 0for: 1m
labels:
severity: kasten
annotations:
summary: "Politicas de Kasten K10 con error hace 10 minutos" description: "Politica << {{ $labels.policy }} >> en cluster << produccion >> Ha fallado en los ultimos 10 minutos" - alert: K10JobsFailsClusterDev
expr: |-
increase(catalog_actions_count{cluster="desarrollo", status="failed"}[10m]) > 0for: 1m
labels:
severity: kasten
annotations:
summary: "Politicas de Kasten K10 con error hace 10 minutos" description: "Politica << {{ $labels.policy }} >> en cluster << {{ $labels.cluster }} >> Ha fallado en los ultimos 10 minutos" - alert: K10JobsFailsClusterTanzu
expr: |-
increase(catalog_actions_count{cluster="tanzu", status="failed"}[10m]) > 0for: 1m
labels:
severity: kasten
annotations:
summary: "Politicas de Kasten K10 con error hace 10 minutos" description: "Politica << {{ $labels.policy }} >> en cluster << {{ $labels.cluster }} >> Ha fallado en los ultimos 10 minutos"```bash
The variables to edit are in each of the alerts:
- alert: Name of the Alert
- expr: ONLY edit the cluster name (remember the previous tip)- summary: Summary text without editing the variables
- description: Descriptive text
The most important variable in the above file is "expr" which is the "query" or query to prometheus according to the metrics of Kasten K10 To detect the failure in the jobs, if you want to create new queries you can visit:
https://prometheus.io/docs/prometheus/latest/querying/basics
And finally we will create the rules in our Prometheus instance:
```bash
kubectl apply -f alertas_cluster.yaml -n alertas
```bash
And to validate the creation of the rule:
```bash
kubectl get prometheusrules.monitoring.coreos.com -n alertas
```bash
## Creation of Prometheus Rules without K10 multi-clusterIn case you have installed Kasten K10 without the use of K10 Multi-Cluster, also, it is possible to configure rules without the need to declare the name of the cluster, the only difference is the creation of the rule that must be like the following:
```bash
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
labels:
app: kube-prometheus-stack
release: prometheus
name: prometheus-kube-prometheus-kasten.rules
spec:
groups:
- name: kasten_alert
rules:
- alert: K10JobsFail
expr: |-
increase(catalog_actions_count{status="failed"}[10m]) > 0for: 1m
labels:
severity: kasten
annotations:
summary: "Politicas de Kasten K10 con error hace 10 minutos" description: "Politica << {{ $labels.policy }} Ha fallado en los ultimos 10 minutos"```bash
## Testing Email AlertsTo test this configuration, we need to generate errors in the backup policies of the clusters, for that, in the existing backup policies we will eliminate the backup repositories of Kasten K10 o “Location Profiles” to force the error, where the policies will be shown like this:
And finally we execute the policies that generate the error. With this we should waitfor the alerts to arrive by email. If we run the following command:
```bash
kubectl port-forward service/prometheus-kube-prometheus-prometheus 9090:9090 -n alertas
We can enter the Prometheus web console to see the created rules:
Running the backup policies with errors, prometheus, will detect through the rules that there are some errors:
Where finally it marks it and the sending of the alerts by email is executed:
Some examples of notifications or alerts that arrive by email, which include the name of the backup policy and its respective cluster where the error exists:
In some cases where instances of Prometheus already exist, it is possible, that adding the instance to monitor and federate prometheus from Kasten K10 didn’t work correctly, for example in Rancher, with “cattle-monitoring” it is necessary to disable the prometheus operator, otherwise both instances will try to override with the other forcing the pods to restart.
Regarding notifications or alerts Kasten K10, it is possible to create new queries or queries to obtain other types of data, such as licenses, used space, etc.