Node Exporter Host Monitoring

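Prometheus + Node Exporter Host Monitoring

Goal

Use Node Exporter to collect host metrics, have Prometheus scrape and display them, and configure alerting rules for CPU, memory, and disk usage.

Full configuration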

prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'production'
    region: 'cn-east'

# Alerting rule files
rule_files:
  - 'alerts.yml'

# Scrape configuration
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100', '192.168.1.11:9100', '192.168.1.12:9100']
        labels:
          env: 'production'
    # Filter scraped metrics (reduces stored data volume)
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'node_(cpu|memory|disk|network|filesystem).*'
        action: keep
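
Prometheus anchors relabel regexes at both ends, so the keep rule above retains only series whose full metric name matches the pattern. The sketch below approximates that behavior with `grep -E -x`. Prometheus uses RE2 rather than POSIX ERE, so this is an illustration under that assumption, and the metric names are a small hand-picked sample, not a full scrape.

```shell
#!/bin/sh
# The keep pattern from metric_relabel_configs above.
KEEP_REGEX='node_(cpu|memory|disk|network|filesystem).*'

# Print only the metric names the keep rule would retain.
# grep -x requires the whole line to match, mimicking Prometheus anchoring.
filter_kept() {
  grep -E -x "$KEEP_REGEX"
}

# Illustrative sample of Node Exporter metric names (hand-picked).
sample_metrics() {
  printf '%s\n' \
    node_cpu_seconds_total \
    node_memory_MemAvailable_bytes \
    node_filesystem_avail_bytes \
    node_load1 \
    node_uname_info
}

sample_metrics | filter_kept
```

In this sample, `node_load1` and `node_uname_info` are filtered out, so any dashboard or rule that depends on them would need the regex extended.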

alerts.yml

groups:
- name: host-alerts
  rules:
  - alert: HighCPUUsage
    expr: 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Host {{ $labels.instance }} CPU usage > 80%"
      description: "Current value: {{ $value }}%, sustained for 5 minutes"

  - alert: HostOutOfMemory
    expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Host {{ $labels.instance }} is running out of memory"
      description: "Memory usage is {{ $value | humanize }}%"

  - alert: DiskFull
    expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Disk space is low"
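
Before starting Prometheus, both files can be checked for syntax errors with `promtool`, which ships in the `prom/prometheus` image. The following is a minimal sketch, assuming `prometheus.yml` and `alerts.yml` sit in the current directory and are world-readable. The function names and the `/work` mount point are illustrative choices, not part of the original setup.

```shell
#!/bin/sh
# Run promtool from the prom/prometheus image against files in the
# current directory (mounted read-only at /work, an arbitrary path).
promtool_in_docker() {
  docker run --rm --entrypoint /bin/promtool \
    -v "$(pwd)":/work:ro -w /work \
    prom/prometheus "$@"
}

# Validate the main config first, then the alerting rules.
check_prometheus_files() {
  promtool_in_docker check config prometheus.yml &&
    promtool_in_docker check rules alerts.yml
}
```

Running `check_prometheus_files` should return a non-zero exit status if either file fails validation, for example when a rule expression contains a stray character.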

Steps to run

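# 1. Start Node Exporter
# Point the exporter at the mounted host /proc and /sys; without these flags the mounts are unused.
docker run -d --name node-exporter -p 9100:9100 \
  -v /proc:/host/proc:ro -v /sys:/host/sys:ro \
  prom/node-exporter \
  --path.procfs=/host/proc --path.sysfs=/host/sys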

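# 2. Start Prometheus
# Note: inside this container, the target localhost:9100 resolves to the container itself;
# if it shows DOWN, use the host's IP in prometheus.yml or run with --network host (Linux).
docker run -d --name prometheus -p 9090:9090 \
  -v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml" \
  -v "$(pwd)/alerts.yml:/etc/prometheus/alerts.yml" \
  prom/prometheus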

# 3. Open the Prometheus UI
# http://localhost:9090

# 4. Example PromQL queries
# CPU usage (averaged over all hosts):  100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory usage:  (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
# Disk usage:    (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100
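
To make the CPU query above concrete: `rate(node_cpu_seconds_total{mode="idle"}[5m])` gives, per core, the fraction of each second spent idle, and `avg` then averages those fractions across cores. The sketch below reproduces that arithmetic from raw counter deltas. The function name `cpu_usage_pct` and the sample numbers are made up for illustration. Real `rate()` also handles counter resets and extrapolation, which this ignores.

```shell
#!/bin/sh
# cpu_usage_pct WINDOW_SECONDS IDLE_DELTA...
# Each IDLE_DELTA is how many idle seconds one core accumulated in the window.
cpu_usage_pct() {
  window="$1"
  shift
  printf '%s\n' "$@" | awk -v w="$window" '
    { idle += $1 / w; cores++ }   # per-core idle fraction
    END { printf "%.1f\n", 100 - (idle / cores) * 100 }'
}

# Two cores over a 300 s window: idle for 240 s and 60 s respectively.
cpu_usage_pct 300 240 60
```

With these inputs the idle fractions are 0.8 and 0.2, so the average idle share is 0.5 and the function prints `50.0`.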

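Expected output

Open http://localhost:9090/targets; every target should show state UP, provided Node Exporter is running and reachable on each listed host. On the Graph page, running a PromQL query displays a line chart.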
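
The same check can be scripted against the Prometheus HTTP API, whose `/api/v1/targets` endpoint reports a `health` field for each active target. This is a rough sketch, assuming the response is compact JSON with no spaces around colons. The helper names and the `PROM_URL` variable are illustrative, and a JSON parser such as `jq` would be more robust than `grep`.

```shell
#!/bin/sh
# Base URL of the Prometheus server (override via the environment).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Count targets with the given health value ("up" or "down")
# in a /api/v1/targets JSON response read from stdin.
count_targets() {
  grep -o "\"health\":\"$1\"" | wc -l | tr -d ' '
}

# Fetch the live target list from Prometheus.
fetch_targets() {
  curl -s "$PROM_URL/api/v1/targets"
}
```

For example, `fetch_targets | count_targets down` should print `0` when every target is healthy.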

Info

Path
/tech-stacks/prometheus/examples/Node Exporter 主机监控.md
Last updated
2026/5/31