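First, change the alertmanager-main Service to type NodePort, for example with kubectl patch svc alertmanager-main -n monitoring -p '{"spec":{"type":"NodePort"}}'. (In the transcripts below, k is an alias for kubectl.)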
➜ manifests git:(master) ✗ k get svc -n monitoring
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
alertmanager-main       NodePort    10.96.120.87    <none>        9093:31175/TCP               29h
alertmanager-operated   ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   29h
grafana                 NodePort    10.96.163.124   <none>        3000:31263/TCP               29h
kube-state-metrics      ClusterIP   None            <none>        8443/TCP,9443/TCP            29h
node-exporter           ClusterIP   None            <none>        9100/TCP                     29h
prometheus-adapter      ClusterIP   10.96.121.50    <none>        443/TCP                      29h
prometheus-k8s          NodePort    10.96.129.132   <none>        9090:30525/TCP               29h
prometheus-operated     ClusterIP   None            <none>        9090/TCP                     29h
prometheus-operator     ClusterIP   None            <none>        8080/TCP                     29h
➜ manifests git:(master) ✗
Use the NodePort of the alertmanager-main service to open the Alertmanager Status page. To get the URLs for Prometheus and Alertmanager, list the services exposed by minikube:
➜ alertmanager git:(master) minikube service list
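Reading the NodePort off the PORT(S) column is mechanical. The part before the colon is the Service port, and the part after it is the NodePort. The Python sketch below assumes the kubectl output format shown above, extracts the NodePort, and builds a URL. The node IP 192.168.64.2 is a hypothetical value; on minikube, minikube ip prints the real one. The function names are illustrative. A kubectl-only alternative is kubectl get svc alertmanager-main -n monitoring -o jsonpath='{.spec.ports[0].nodePort}'.

```python
from typing import Optional


def node_port(ports_field: str, target_port: int) -> Optional[int]:
    """Return the NodePort mapped to target_port from a PORT(S) value such as
    "9093:31175/TCP", or None if that port is not exposed via a NodePort."""
    for entry in ports_field.split(","):
        port_part = entry.split("/", 1)[0]  # drop the "/TCP" or "/UDP" suffix
        if ":" not in port_part:
            continue  # ClusterIP-only entry such as "9093/TCP"
        port, node = port_part.split(":", 1)
        if int(port) == target_port:
            return int(node)
    return None


def service_url(node_ip: str, ports_field: str, target_port: int) -> Optional[str]:
    """Build http://<node-ip>:<node-port> for the given Service port."""
    node = node_port(ports_field, target_port)
    return f"http://{node_ip}:{node}" if node is not None else None


if __name__ == "__main__":
    # 192.168.64.2 is a hypothetical `minikube ip` result.
    print(service_url("192.168.64.2", "9093:31175/TCP", 9093))
```

For the alertmanager-main row above, this yields port 31175 on whatever node IP you pass in.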
The configuration shown there comes from alertmanager-secret.yaml in the manifests directory:
➜ manifests git:(master) ✗ pwd
/Users/steven/code/prometheus/kube-prometheus/manifests
➜ manifests git:(master) ✗ cat alertmanager-secret.yaml
apiVersion: v1
data:
  alertmanager.yaml: Imdsb2JhbCI6CiAgInJlc29sdmVfdGltZW91dCI6ICI1bSIKInJlY2VpdmVycyI6Ci0gIm5hbWUiOiAibnVsbCIKInJvdXRlIjoKICAiZ3JvdXBfYnkiOgogIC0gIm5hbWVzcGFjZSIKICAiZ3JvdXBfaW50ZXJ2YWwiOiAiNW0iCiAgImdyb3VwX3dhaXQiOiAiMzBzIgogICJyZWNlaXZlciI6ICJudWxsIgogICJyZXBlYXRfaW50ZXJ2YWwiOiAiMTJoIgogICJyb3V0ZXMiOgogIC0gIm1hdGNoIjoKICAgICAgImFsZXJ0bmFtZSI6ICJXYXRjaGRvZyIKICAgICJyZWNlaXZlciI6ICJudWxsIg==
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
type: Opaque
➜ manifests git:(master) ✗
Decoding the alertmanager.yaml value in alertmanager-secret.yaml with base64 shows the same content as the web page:
➜ manifests git:(master) ✗ echo "Imdsb2JhbCI6CiAgInJlc29sdmVfdGltZW91dCI6ICI1bSIKInJlY2VpdmVycyI6Ci0gIm5hbWUiOiAibnVsbCIKInJvdXRlIjoKICAiZ3JvdXBfYnkiOgogIC0gIm5hbWVzcGFjZSIKICAiZ3JvdXBfaW50ZXJ2YWwiOiAiNW0iCiAgImdyb3VwX3dhaXQiOiAiMzBzIgogICJyZWNlaXZlciI6ICJudWxsIgogICJyZXBlYXRfaW50ZXJ2YWwiOiAiMTJoIgogICJyb3V0ZXMiOgogIC0gIm1hdGNoIjoKICAgICAgImFsZXJ0bmFtZSI6ICJXYXRjaGRvZyIKICAgICJyZWNlaXZlciI6ICJudWxsIg==" | base64 -d
"global":
  "resolve_timeout": "5m"
"receivers":
- "name": "null"
"route":
  "group_by":
  - "namespace"
  "group_interval": "5m"
  "group_wait": "30s"
  "receiver": "null"
  "repeat_interval": "12h"
  "routes":
  - "match":
      "alertname": "Watchdog"
    "receiver": "null"
➜ manifests git:(master) ✗
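Each value under data: in a Secret is just the base64-encoded file content, so the decoding can also be done programmatically. The Python sketch below assumes the Secret has already been loaded into a dict, for example from kubectl get secret alertmanager-main -n monitoring -o json. The function names are illustrative.

```python
import base64


def decode_secret_value(value: str) -> str:
    """Decode one entry of a Secret's `data` map back to UTF-8 text."""
    return base64.b64decode(value).decode("utf-8")


def decode_secret_data(secret: dict) -> dict:
    """Decode every entry of a Secret manifest's `data` map."""
    return {key: decode_secret_value(val) for key, val in secret.get("data", {}).items()}


if __name__ == "__main__":
    # The first 48 characters of the alertmanager.yaml value in alertmanager-secret.yaml.
    prefix = "Imdsb2JhbCI6CiAgInJlc29sdmVfdGltZW91dCI6ICI1bSIK"
    print(decode_secret_value(prefix), end="")
```

When run as a script, it decodes only that prefix and prints the first two lines of the default config ("global": and its resolve_timeout). From the shell, kubectl get secret alertmanager-main -n monitoring -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d does the same for the live Secret.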
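Create a new alertmanager.yaml. It sends alerts to WeChat by default and routes the KubeCPUOvercommit alert to email: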
➜ manifests git:(master) ✗ pwd
/Users/steven/code/prometheus/kube-prometheus/manifests
➜ manifests git:(master) ✗ cat alertmanager.yaml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '431054426@qq.com'
  smtp_auth_username: '431054426@qq.com'
  smtp_auth_password: '****' # the generated token
  smtp_require_tls: false # defaults to true; must be set to false
templates:
- "/etc/alertmanager/config/*.tmpl"
route:
  group_by: ['job','cluster','service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'wechat'
  routes:
  - receiver: 'email'
    group_wait: 10s
    match:
      alertname: KubeCPUOvercommit # the alert to route to email
receivers:
- name: 'email'
  email_configs:
  - to: 'zky.linux@gmail.com'
    send_resolved: true
- name: 'wechat'
  wechat_configs:
  - corp_id: '*****' # can be found in the enterprise (WeChat Work) settings
    to_party: '8'
    to_user: 'steven'
    agent_id: '1000028'
    api_secret: '****' # found in the app created in WeChat Work
    send_resolved: true
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'dev', 'instance']
➜ manifests git:(master) ✗
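This is a deliberately simplified Python model of the route, group_by and inhibit_rules sections in this particular file. It is not Alertmanager's implementation. It supports only equality matchers, ignores continue, nested routes and all timers (group_wait, group_interval, repeat_interval), and hard-codes the values from the config above. The function names are hypothetical.

```python
# Values copied from the alertmanager.yaml above.
ROOT_RECEIVER = "wechat"
CHILD_ROUTES = [
    {"receiver": "email", "match": {"alertname": "KubeCPUOvercommit"}},
]
GROUP_BY = ("job", "cluster", "service")
INHIBIT_RULE = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ("alertname", "dev", "instance"),
}


def matches(labels: dict, matchers: dict) -> bool:
    """Equality matchers: every matcher label must have exactly that value."""
    return all(labels.get(name) == value for name, value in matchers.items())


def pick_receiver(labels: dict) -> str:
    """First matching child route wins; otherwise fall back to the root receiver."""
    for route in CHILD_ROUTES:
        if matches(labels, route["match"]):
            return route["receiver"]
    return ROOT_RECEIVER


def group_key(labels: dict) -> tuple:
    """Alerts with the same receiver and group_by values share one notification."""
    return (pick_receiver(labels),) + tuple(labels.get(name, "") for name in GROUP_BY)


def is_inhibited(target: dict, firing: list) -> bool:
    """True if another firing alert matching source_match mutes `target`.
    A label missing on both sides counts as equal (empty value)."""
    if not matches(target, INHIBIT_RULE["target_match"]):
        return False
    for source in firing:
        if source is target or not matches(source, INHIBIT_RULE["source_match"]):
            continue
        if all(source.get(n, "") == target.get(n, "") for n in INHIBIT_RULE["equal"]):
            return True
    return False
```

Under this config, an alert labelled alertname=KubeCPUOvercommit goes to email, and everything else, including Watchdog, goes to wechat. A warning is muted while a critical alert with the same alertname, dev and instance labels is firing.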
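Look at the wechat.tmpl file. It redefines wechat.default.message, the template wechat_configs uses for the message body by default. Alertmanager loads it through the templates path set in alertmanager.yaml: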
➜ manifests git:(master) ✗ cat wechat.tmpl
{{ define "wechat.default.message" }}
{{- if gt (len .Alerts.Firing) 0 -}}
{{- range $index, $alert := .Alerts.Firing -}}
{{- if eq $index 0 -}}
Alert type: {{ $alert.Labels.alertname }}
Severity: {{ $alert.Labels.severity }}
=====================
{{- end }}
=== Alert details ===
Details: {{ $alert.Annotations.message }}
Started at: {{ $alert.StartsAt.Format "2006-01-02 15:04:05" }}
=== Reference info ===
{{ if gt (len $alert.Labels.instance) 0 -}}Instance IP: {{ $alert.Labels.instance }};{{- end -}}
{{- if gt (len $alert.Labels.namespace) 0 -}}Namespace: {{ $alert.Labels.namespace }};{{- end -}}
{{- if gt (len $alert.Labels.node) 0 -}}Node IP: {{ $alert.Labels.node }};{{- end -}}
{{- if gt (len $alert.Labels.pod_name) 0 -}}Pod name: {{ $alert.Labels.pod_name }}{{- end }}
=====================
{{- end }}
{{- end }}
{{- if gt (len .Alerts.Resolved) 0 -}}
{{- range $index, $alert := .Alerts.Resolved -}}
{{- if eq $index 0 -}}
Alert type: {{ $alert.Labels.alertname }}
Severity: {{ $alert.Labels.severity }}
=====================
{{- end }}
=== Alert details ===
Details: {{ $alert.Annotations.message }}
Started at: {{ $alert.StartsAt.Format "2006-01-02 15:04:05" }}
Resolved at: {{ $alert.EndsAt.Format "2006-01-02 15:04:05" }}
=== Reference info ===
{{ if gt (len $alert.Labels.instance) 0 -}}Instance IP: {{ $alert.Labels.instance }};{{- end -}}
{{- if gt (len $alert.Labels.namespace) 0 -}}Namespace: {{ $alert.Labels.namespace }};{{- end -}}
{{- if gt (len $alert.Labels.node) 0 -}}Node IP: {{ $alert.Labels.node }};{{- end -}}
{{- if gt (len $alert.Labels.pod_name) 0 -}}Pod name: {{ $alert.Labels.pod_name }};{{- end }}
=====================
{{- end }}
{{- end }}
{{- end }}
➜ manifests git:(master) ✗
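The Format call in the template follows Go's layout-by-example convention. The string 2006-01-02 15:04:05 is Go's reference time written in the desired shape, and it corresponds to the strftime pattern %Y-%m-%d %H:%M:%S. The Python sketch below roughly imitates the firing branch of the template so the message layout can be previewed without a cluster. Several details are assumptions: the alert dict shape, the English titles and the function name. The whitespace also does not reproduce Go's {{- -}} trimming exactly.

```python
from datetime import datetime, timezone

# Go layout "2006-01-02 15:04:05" expressed as a strftime pattern.
GO_LAYOUT_AS_STRFTIME = "%Y-%m-%d %H:%M:%S"

# (label, title) pairs for the reference-info line; empty labels are skipped,
# like the `if gt (len ...) 0` checks in the template.
REFERENCE_LABELS = (
    ("instance", "Instance IP"),
    ("namespace", "Namespace"),
    ("node", "Node IP"),
    ("pod_name", "Pod name"),
)


def render_firing(alerts: list) -> str:
    """Rough imitation of the firing branch: header once, then one block per alert."""
    lines = []
    for index, alert in enumerate(alerts):
        labels = alert["labels"]
        if index == 0:
            lines.append(f"Alert type: {labels.get('alertname', '')}")
            lines.append(f"Severity: {labels.get('severity', '')}")
            lines.append("=====================")
        lines.append("=== Alert details ===")
        lines.append(f"Details: {alert['annotations'].get('message', '')}")
        lines.append("Started at: " + alert["starts_at"].strftime(GO_LAYOUT_AS_STRFTIME))
        lines.append("=== Reference info ===")
        refs = [f"{title}: {labels[key]};" for key, title in REFERENCE_LABELS if labels.get(key)]
        lines.append("".join(refs))
        lines.append("=====================")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {
        "labels": {"alertname": "KubeCPUOvercommit", "severity": "warning", "namespace": "monitoring"},
        "annotations": {"message": "hypothetical message"},
        "starts_at": datetime(2020, 5, 1, 8, 30, tzinfo=timezone.utc),
    }
    print(render_firing([sample]))
```

Format prints the time in whatever location the timestamp carries. In many containerized Prometheus setups that location is UTC, so the message may not show local time.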
Delete the original Secret and create a new one from the two files we just wrote:
➜ manifests git:(master) ✗ k get secrets -n monitoring | grep aler
alertmanager-main               Opaque                                2      18h
alertmanager-main-token-lqf7k   kubernetes.io/service-account-token   3      2d3h
➜ manifests git:(master) ✗
➜ manifests git:(master) ✗ k delete secrets alertmanager-main -n monitoring
secret "alertmanager-main" deleted
➜ manifests git:(master) ✗ k create secret generic alertmanager-main --from-file=./alertmanager.yaml --from-file=./wechat.tmpl -n monitoring
secret/alertmanager-main created
➜ manifests git:(master) ✗
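kubectl create secret generic --from-file uses each file's base name as the data key and the base64-encoded file content as the value. The Python sketch below builds the same Secret as a dict, which can be printed as JSON and applied. It assumes alertmanager.yaml and wechat.tmpl are in the current directory, and the function names are hypothetical.

```python
import base64
import json
from pathlib import Path


def secret_from_contents(name: str, namespace: str, contents: dict) -> dict:
    """Build a v1 Secret laid out like `kubectl create secret generic --from-file`:
    one data key per file name, value = base64 of the raw file bytes."""
    data = {key: base64.b64encode(raw).decode("ascii") for key, raw in contents.items()}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": data,
    }


def read_files(paths: list) -> dict:
    """Map each file's base name to its bytes (the key --from-file would use)."""
    return {Path(p).name: Path(p).read_bytes() for p in paths}


if __name__ == "__main__":
    contents = read_files(["alertmanager.yaml", "wechat.tmpl"])
    secret = secret_from_contents("alertmanager-main", "monitoring", contents)
    print(json.dumps(secret, indent=2))
```

Piping its output into kubectl apply -f - updates the Secret in place. A kubectl-only equivalent, which avoids the gap between delete and create, is kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml --from-file=wechat.tmpl -n monitoring --dry-run=client -o yaml | kubectl apply -f - (kubectl 1.18 or newer; older versions take plain --dry-run).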
Verify:
➜ manifests git:(master) ✗ k get secrets -n monitoring | grep aler
alertmanager-main               Opaque                                2      6s
alertmanager-main-token-lqf7k   kubernetes.io/service-account-token   3      2d3h
➜ manifests git:(master) ✗
Check that the alert email arrives. Look in the spam folder too, since it may be filtered as spam.
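Instead of waiting for KubeCPUOvercommit to fire, you can push a synthetic alert to Alertmanager's v2 API and watch it follow the email route. POST /api/v2/alerts takes a JSON array of alerts. In the Python sketch below, the NodePort URL is hypothetical, the annotation text is made up for the test, and the function names are illustrative.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(alertname: str, severity: str = "warning", minutes: int = 5) -> list:
    """One synthetic alert in the shape POST /api/v2/alerts expects (a JSON array)."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": alertname, "severity": severity},
        "annotations": {"message": "Synthetic alert to test Alertmanager routing"},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def post_alerts(base_url: str, alerts: list) -> int:
    """POST the alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical address; use the Alertmanager URL from `minikube service list`.
    print(post_alerts("http://192.168.64.2:31175", build_test_alert("KubeCPUOvercommit")))
```

The alert carries alertname=KubeCPUOvercommit, so it should match the email child route and be sent after that route's 10s group_wait. It resolves on its own once endsAt passes. amtool alert add can do the same from the command line.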
The Alertmanager config shown on the Status page has been updated as well.
You can also exec into the pod and see the two files we created:
➜ manifests git:(master) ✗ k exec -it alertmanager-main-0 -n monitoring -- /bin/sh
Defaulting container name to alertmanager.
Use 'kubectl describe pod/alertmanager-main-0 -n monitoring' to see all of the containers in this pod.
/alertmanager $ ls /etc/alertmanager/config/
alertmanager.yaml wechat.tmpl
/alertmanager $
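You can also check the running configuration without opening the web UI. GET /api/v2/status returns it as config.original. The Python sketch below fetches that field and lists receiver names with a rough regex. It assumes the unquoted "- name: X" style that Alertmanager uses when it serializes its config, and it is not a YAML parser. The URL is hypothetical.

```python
import json
import re
import urllib.request

# Top-level "- name: X" entries under `receivers:`; quotes around X are optional.
RECEIVER_NAME = re.compile(r"^-\s+name:\s*['\"]?([^'\"\s]+)", re.MULTILINE)


def fetch_status(base_url: str) -> dict:
    """GET /api/v2/status from Alertmanager and parse the JSON body."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/v2/status", timeout=10) as response:
        return json.load(response)


def receiver_names(config_text: str) -> list:
    """Receiver names found in the config text (rough regex, not a YAML parser)."""
    return RECEIVER_NAME.findall(config_text)


if __name__ == "__main__":
    status = fetch_status("http://192.168.64.2:31175")  # hypothetical NodePort URL
    print(receiver_names(status["config"]["original"]))
```

Once the new Secret has been picked up, the list should contain email and wechat rather than the default null.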