Insights & updates from our experts
[Kubernetes tip] Multi-Cluster Configurations with Prometheus

This tip is for those who are using Prometheus federation to monitor multiple clusters.
How should Alertmanager be configured for multiple clusters? For example, if there is an issue in cluster A, it should send an alert only for cluster A.
alerting_rules.yml:
groups:
  - name: Instances
    rules:
      - alert: TEST ALERT FROM PROMETHEUS PLEASE ACKNOWLEDGE
        expr: prometheus_build_info{instance="localhost:9090"} == 1
        for: 10s
        labels:
          severity: page
        annotations:
          description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
          summary: 'Instance {{ $labels.instance }} down'
          action: TESTING PLEASE ACKNOWLEDGE, NO FURTHER ACTION REQUIRED ONLY A TEST
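One common way to make alerts like this cluster-aware (a sketch under stated assumptions, not necessarily the author's setup) is to give each cluster's Prometheus server a distinct external label. Prometheus attaches external labels to every alert it sends to Alertmanager, and it also adds them to series served on the `/federate` endpoint, so a federating Prometheus that scrapes with `honor_labels: true` keeps them. The label name `cluster`, the value `cluster-a`, and the Alertmanager address below are hypothetical placeholders.

```yaml
# prometheus.yml for the Prometheus server running in cluster A (sketch).
# "cluster", "cluster-a" and the Alertmanager address are hypothetical placeholders.
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: cluster-a   # added to every alert this server sends to Alertmanager

rule_files:
  - alerting_rules.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager.example.com:9093   # hypothetical Alertmanager host:port
```

The Prometheus server in cluster B would use the same file with `cluster: cluster-b`, so the same rule produces alerts that can be told apart.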

In such cases, every alert should be routed to the proper team based on its labels: if there is a problem with application A on cluster B, the team responsible for it should be notified. In the above case, two alerts are triggered by the same rule, so you will have to deduplicate them. And if you don't wish to be alerted on every trigger of very similar alerts, you can treat them as a group.
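Label-based routing lives in the `route` tree of `alertmanager.yml`. The sketch below assumes alerts carry a `cluster` label (for example, set through each Prometheus server's `external_labels`) and a hypothetical `team` label added in the rule's `labels` section; receiver names and webhook URLs are placeholders. Alertmanager identifies an alert by its full label set, so identical alerts sent by more than one Prometheus server are merged into one, which is also why alerts from different clusters need a distinguishing label. The `matchers` syntax requires Alertmanager 0.22 or later; older versions use `match`.

```yaml
# alertmanager.yml (sketch): route alerts by the "cluster" and "team" labels.
# Receiver names, the "team" label and the webhook URLs are hypothetical placeholders.
route:
  receiver: default                    # fallback when no child route matches
  group_by: ['alertname', 'cluster']
  routes:
    # Application A problems in cluster A go to the team that owns application A.
    - matchers:
        - cluster = "cluster-a"
        - team = "app-a"
      receiver: team-app-a-cluster-a
    # Everything else from cluster B goes to cluster B's on-call receiver.
    - matchers:
        - cluster = "cluster-b"
      receiver: cluster-b-oncall

receivers:
  - name: default
    webhook_configs:
      - url: http://alert-sink.example.com/default
  - name: team-app-a-cluster-a
    webhook_configs:
      - url: http://alert-sink.example.com/app-a
  - name: cluster-b-oncall
    webhook_configs:
      - url: http://alert-sink.example.com/cluster-b
```

Routes are evaluated top to bottom and, without `continue: true`, an alert stops at the first child route it matches.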
For example, if an app on node A has disk issues and all the other apps on that node have the same issue (the same cause), you probably don't want to receive 10 alerts. You would rather be informed once, when the conditions are met (for example, the alerts were triggered by similar rules, in a similar place, and within a given time interval).
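Grouping is configured on the route. A minimal sketch, assuming the alerts carry `cluster` and a hypothetical `node` label that identifies the affected machine: with this `group_by`, the disk alerts from node A fall into one group and arrive as a single notification, provided they fire within the `group_wait` window. The receiver name and URL are placeholders, and the timing values are illustrative rather than recommendations.

```yaml
# alertmanager.yml route excerpt (sketch): batch related alerts into one notification.
# The "node" label, receiver name and URL are hypothetical placeholders.
route:
  receiver: default
  group_by: ['cluster', 'node']   # one group per (cluster, node) pair
  group_wait: 30s                 # wait before the first notification so related alerts can batch
  group_interval: 5m              # wait before notifying about new alerts added to an existing group
  repeat_interval: 4h             # re-send the notification if the group is still firing

receivers:
  - name: default
    webhook_configs:
      - url: http://alert-sink.example.com/default
```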
Read the Alertmanager documentation for more information on alert grouping.
Looking for an end-to-end incident alerting, on-call scheduling and response orchestration platform?
Sign up for a 14-day free trial of Xurrent IMR. No credit card required. Implement modern incident response and SRE best practices within your production operations and provide industry-leading SLAs to your customers.



















