Scrape Configuration

Prometheus collects metrics by scraping HTTP endpoints exposed by targets. Understanding scrape configuration is a core part of the PCA exam.

How Scraping Works

  1. Prometheus sends an HTTP GET request to the target's metrics endpoint (default /metrics)
  2. The target responds with metrics in Prometheus exposition format
  3. Prometheus parses and stores the metrics with a timestamp
  4. This repeats at the configured scrape_interval
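
For example, the response body returned in step 2 is plain text in the exposition format. The metric names and values below are illustrative:

# HELP http_requests_total Total number of HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
# HELP process_cpu_seconds_total Total user and system CPU time in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.47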

Basic Scrape Configuration

The scrape_configs section in prometheus.yml defines which targets to scrape:

global:
  scrape_interval: 15s     # Default interval for all jobs
  evaluation_interval: 15s # How often to evaluate rules

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

Key Parameters

Parameter          Default        Description
scrape_interval    1m (global)    How often to scrape targets
scrape_timeout     10s            Timeout for each scrape request
metrics_path       /metrics       HTTP path to scrape
scheme             http           Protocol to use (http or https)
honor_labels       false          If true, keep conflicting labels from the target
honor_timestamps   true           Use timestamps exposed by the target
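
Any of these can be overridden per job. A sketch combining several of them (the target address and metrics path are hypothetical):

scrape_configs:
  - job_name: "custom-app"
    scrape_interval: 30s          # overrides the global default
    scrape_timeout: 5s            # must not exceed scrape_interval
    metrics_path: "/internal/metrics"
    scheme: https
    static_configs:
      - targets: ["app.example.com:8443"]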

Static Configuration

The simplest way to define targets:

scrape_configs:
  - job_name: "node-exporters"
    scrape_interval: 30s
    static_configs:
      - targets:
          - "node1:9100"
          - "node2:9100"
          - "node3:9100"
        labels:
          env: "production"
          dc: "us-east-1"

Service Discovery

For dynamic environments, Prometheus supports many service discovery mechanisms:

File-Based Service Discovery

Watch a JSON or YAML file for target changes:

scrape_configs:
  - job_name: "file-sd"
    file_sd_configs:
      - files:
          - "targets/*.json"
        refresh_interval: 5m

Target file format (targets/app.json):

[
  {
    "targets": ["app1:8080", "app2:8080"],
    "labels": {
      "env": "production",
      "app": "myservice"
    }
  }
]
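
Because file-based discovery also accepts YAML, an equivalent target file could be written as targets/app.yml (the files glob above would then need a targets/*.yml entry as well):

- targets:
    - "app1:8080"
    - "app2:8080"
  labels:
    env: "production"
    app: "myservice"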

Kubernetes Service Discovery

Discover pods, services, nodes, and endpoints in Kubernetes:

scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
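
The prometheus.io/* annotations matched above are a widely used convention, not something Prometheus defines. On the pod side they might look like this (annotation values are illustrative); during relabeling, Prometheus exposes them with unsupported characters replaced by underscores, e.g. __meta_kubernetes_pod_annotation_prometheus_io_scrape:

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"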

Other SD Mechanisms

  • consul_sd_configs — Consul service discovery
  • dns_sd_configs — DNS-based discovery (SRV records)
  • ec2_sd_configs — AWS EC2 instances
  • azure_sd_configs — Azure VMs
  • gce_sd_configs — Google Compute Engine
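
Each mechanism has its own configuration block but the overall shape is the same. As one sketch, a Consul-based job might look like this (the server address and service name are assumptions):

scrape_configs:
  - job_name: "consul-services"
    consul_sd_configs:
      - server: "localhost:8500"
        services: ["web"]       # omit to discover all registered services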

Relabeling

Relabeling lets you rewrite or filter a target's labels before it is scraped (relabel_configs) and rewrite or drop scraped samples before they are stored (metric_relabel_configs).
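
Both stages share the same rule syntax. As a sketch, metric_relabel_configs can drop scraped series you do not want to store; the metric pattern here is just an example:

scrape_configs:
  - job_name: "node-exporters"
    static_configs:
      - targets: ["node1:9100"]
    metric_relabel_configs:
      - source_labels: [__name__]
        action: drop
        regex: "go_gc_duration_seconds.*"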

Common Relabel Actions

Action      Description
keep        Keep targets whose source labels match the regex
drop        Drop targets whose source labels match the regex
replace     Set the target label to the replacement value
labelmap    Map matching label names to new names
labeldrop   Remove labels matching the regex
labelkeep   Keep only labels matching the regex

Example: Filter by Annotation

relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"

Example: Replace Port

relabel_configs:
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
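
Example: Map Pod Labels

A labelmap sketch (a common pattern with Kubernetes discovery, not taken from the configuration above) copies every pod label onto the target, dropping the meta-label prefix:

relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)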

Key Exam Tips

  1. scrape_interval: The global default is 1 minute. Job-level settings override the global default.
  2. honor_labels: When false (default), Prometheus renames conflicting labels with exported_ prefix.
  3. Up metric: Prometheus automatically adds up{job="...", instance="..."} metric (1 = healthy, 0 = failed scrape).
  4. Meta labels: Service discovery provides __meta_* labels that are available during relabeling but not stored.
  5. __address__: The special label that determines the target's host and port.
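
The up metric from tip 3 can be exercised directly in the expression browser, for example as two separate queries:

# Targets that failed their most recent scrape
up == 0

# Fraction of healthy targets per job
avg by (job) (up)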