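Host monitoring with Prometheus + Node Exporter
Goal
Use Node Exporter to collect host metrics, have Prometheus scrape and display them, and configure alert rules for CPU, memory, and disk usage.
Full configuration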
prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'production'
    region: 'cn-east'
rule_files:
  - 'alerts.yml'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100', '192.168.1.11:9100', '192.168.1.12:9100']
        labels:
          env: 'production'
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'node_(cpu|memory|disk|network|filesystem).*'
        action: keep
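To see which series the keep rule above retains, the following sketch applies the same pattern to a few metric names. Prometheus anchors relabel regexes at both ends, which `re.fullmatch` mimics here. The sample names and the helper `kept_metrics` are illustrative assumptions, not output captured from a real exporter.

```python
import re

# Same pattern as the `regex` field under metric_relabel_configs.
KEEP_PATTERN = re.compile(r"node_(cpu|memory|disk|network|filesystem).*")


def kept_metrics(names):
    """Return the names that a `keep` action with KEEP_PATTERN would retain."""
    # Relabel regexes are fully anchored, so the whole name must match.
    return [name for name in names if KEEP_PATTERN.fullmatch(name)]


if __name__ == "__main__":
    # Hypothetical sample of metric names exposed on /metrics.
    sample = [
        "node_cpu_seconds_total",
        "node_memory_MemAvailable_bytes",
        "node_filesystem_avail_bytes",
        "node_load1",
        "go_goroutines",
    ]
    for name in kept_metrics(sample):
        print(name)
```

With this pattern, names such as `node_load1` and `go_goroutines` do not match and would be dropped from the `node` job, so dashboards or rules that rely on them would see no data.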
alerts.yml
groups:
  - name: host-alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Host {{ $labels.instance }} CPU usage > 80%"
          description: "Current value: {{ $value }}%, sustained for 5 minutes"
      - alert: HostOutOfMemory
        # NOTE: the threshold after '>' is missing; set a value before loading this file.
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 >
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Host {{ $labels.instance }} is running out of memory"
          description: "Memory usage is at {{ $value | humanize }}%"
      - alert: DiskFull
        # NOTE: the threshold after '<' is missing; set a value before loading this file.
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 <
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk space is running low"
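The arithmetic behind the three alert expressions can be worked through by hand. The sketch below is a simplified model under stated assumptions: `counter_rate` ignores counter resets and the extrapolation that PromQL's `rate()` performs, and all function names and sample values are hypothetical.

```python
def counter_rate(first, last, seconds):
    """Simplified per-second increase of a counter between two samples."""
    return (last - first) / seconds


def cpu_usage_percent(idle_rates):
    """100 - avg(idle rate per core) * 100, as in the HighCPUUsage expression."""
    return 100 - (sum(idle_rates) / len(idle_rates)) * 100


def memory_usage_percent(available_bytes, total_bytes):
    """(1 - MemAvailable / MemTotal) * 100, as in HostOutOfMemory."""
    return (1 - available_bytes / total_bytes) * 100


def disk_free_percent(avail_bytes, size_bytes):
    """avail / size * 100 for one mountpoint, as in DiskFull."""
    return avail_bytes / size_bytes * 100


if __name__ == "__main__":
    # Two cores over a 300 s window: idle counters grew by 60 s and 30 s.
    rates = [
        counter_rate(1000.0, 1060.0, 300),
        counter_rate(1000.0, 1030.0, 300),
    ]
    print(f"CPU usage:    {cpu_usage_percent(rates):.1f}%")
    print(f"Memory usage: {memory_usage_percent(2 * 1024**3, 16 * 1024**3):.1f}%")
    print(f"Disk free:    {disk_free_percent(5e9, 50e9):.1f}%")
```

In this example the two cores were idle 20% and 10% of the time, so the averaged usage works out to about 85%. Note that the first two expressions measure usage (higher is worse), while the disk expression measures free space (lower is worse), which is why its comparison points the other way.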
Steps to run
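# Point Node Exporter at the mounted host /proc and /sys; without these flags the mounts are unused.
docker run -d --name node-exporter -p 9100:9100 \
  -v /proc:/host/proc:ro -v /sys:/host/sys:ro \
  prom/node-exporter \
  --path.procfs=/host/proc --path.sysfs=/host/sys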
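# Host networking (Linux) so that the localhost:9100 target in prometheus.yml reaches the host, not the Prometheus container itself.
docker run -d --name prometheus --network host \
  -v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml" \
  -v "$(pwd)/alerts.yml:/etc/prometheus/alerts.yml" \
  prom/prometheus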
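Prometheus takes a moment to start after `docker run` returns. One way to wait for it is to poll its readiness endpoint, as in this minimal sketch. It assumes Prometheus is reachable at `localhost:9090` as in the steps above; the helper names `http_ready` and `wait_until_ready` are hypothetical.

```python
import time
import urllib.request

# Assumed address of the Prometheus readiness endpoint.
READY_URL = "http://localhost:9090/-/ready"


def http_ready(url=READY_URL, timeout=2.0):
    """Return True when the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and non-2xx HTTP responses.
        return False


def wait_until_ready(probe=http_ready, attempts=30, delay=1.0):
    """Call probe() until it returns True; return the attempt number, or None."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            time.sleep(delay)
    return None
```

Calling `wait_until_ready()` returns the number of the attempt that succeeded, or `None` if the server never became ready. The probe is passed in as a parameter so the retry loop can be exercised without a running server.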
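Expected output
Open http://localhost:9090/targets; every target should show the state UP, provided Node Exporter is also running on 192.168.1.11 and 192.168.1.12. Running a PromQL query on the Graph page should render a line chart.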
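The same target check can be scripted against the Prometheus HTTP API instead of the web UI. The sketch below reads `/api/v1/targets` and lists active targets whose health is not `up`. The address and the helper names are assumptions; only the `data.activeTargets` fields `scrapeUrl`, `health`, and `lastError` from the API response are used.

```python
import json
import urllib.request

# Assumed address of the Prometheus targets API.
TARGETS_URL = "http://localhost:9090/api/v1/targets"


def fetch_targets(url=TARGETS_URL, timeout=5.0):
    """Fetch and decode the JSON body of the targets API."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def unhealthy_targets(payload):
    """Return (scrape URL, last error) pairs for active targets not in state "up"."""
    active = payload["data"]["activeTargets"]
    return [
        (target["scrapeUrl"], target.get("lastError", ""))
        for target in active
        if target["health"] != "up"
    ]
```

An empty list from `unhealthy_targets(fetch_targets())` corresponds to every row on the Targets page showing UP; otherwise each pair names a failing scrape URL and the error Prometheus recorded for it.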