Scrape Configuration

Prometheus collects metrics by scraping HTTP endpoints exposed by targets. Understanding scrape configuration is a core part of the PCA exam.

How Scraping Works

  1. Prometheus sends an HTTP GET request to the target's metrics endpoint (default /metrics)
  2. The target responds with metrics in Prometheus exposition format
  3. Prometheus parses and stores the metrics with a timestamp
  4. This repeats at the configured scrape_interval
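
For example, the response body returned in step 2 is plain text in the exposition format. The metric names and values below are illustrative:

# HELP http_requests_total Total number of HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
# HELP process_cpu_seconds_total Total user and system CPU time in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.47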

Basic Scrape Configuration

The scrape_configs section in prometheus.yml defines which targets to scrape:

global:
  scrape_interval: 15s     # Default interval for all jobs
  evaluation_interval: 15s # How often to evaluate rules

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

Key Parameters

Parameter          Default        Description
scrape_interval    1m (global)    How often to scrape targets
scrape_timeout     10s            Timeout for each scrape request
metrics_path       /metrics       HTTP path to scrape
scheme             http           Protocol to use (http or https)
honor_labels       false          If true, keep conflicting labels from the target
honor_timestamps   true           Use timestamps exposed by the target
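
Any of these can be overridden per job. A sketch combining several of them (the target address and metrics path are hypothetical):

scrape_configs:
  - job_name: "custom-app"
    scrape_interval: 30s          # overrides the global default
    scrape_timeout: 5s            # must not exceed scrape_interval
    metrics_path: "/internal/metrics"
    scheme: https
    static_configs:
      - targets: ["app.example.com:8443"]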

Static Configuration

The simplest way to define targets:

scrape_configs:
  - job_name: "node-exporters"
    scrape_interval: 30s
    static_configs:
      - targets:
          - "node1:9100"
          - "node2:9100"
          - "node3:9100"
        labels:
          env: "production"
          dc: "us-east-1"

Service Discovery

For dynamic environments, Prometheus supports many service discovery mechanisms:

File-Based Service Discovery

Watch a JSON or YAML file for target changes:

scrape_configs:
  - job_name: "file-sd"
    file_sd_configs:
      - files:
          - "targets/*.json"
        refresh_interval: 5m

Target file format (targets/app.json):

[
  {
    "targets": ["app1:8080", "app2:8080"],
    "labels": {
      "env": "production",
      "app": "myservice"
    }
  }
]
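
Because file-based discovery also accepts YAML, an equivalent target file could be written as targets/app.yml (the files glob above would then need a targets/*.yml entry as well):

- targets:
    - "app1:8080"
    - "app2:8080"
  labels:
    env: "production"
    app: "myservice"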

Kubernetes Service Discovery

Discover pods, services, nodes, and endpoints in Kubernetes:

scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
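
The prometheus.io/* annotations matched above are a widely used convention, not something Prometheus defines. On the pod side they might look like this (annotation values are illustrative); during relabeling, Prometheus exposes them with unsupported characters replaced by underscores, e.g. __meta_kubernetes_pod_annotation_prometheus_io_scrape:

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"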

Other SD Mechanisms

  • consul_sd_configs — Consul service discovery
  • dns_sd_configs — DNS-based discovery (SRV records)
  • ec2_sd_configs — AWS EC2 instances
  • azure_sd_configs — Azure VMs
  • gce_sd_configs — Google Compute Engine
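
Each mechanism has its own configuration block but the overall shape is the same. As one sketch, a Consul-based job might look like this (the server address and service name are assumptions):

scrape_configs:
  - job_name: "consul-services"
    consul_sd_configs:
      - server: "localhost:8500"
        services: ["web"]       # omit to discover all registered services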

Relabeling

Relabeling lets you rewrite or filter a target's labels before it is scraped (relabel_configs) and rewrite or drop scraped samples before they are stored (metric_relabel_configs).
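
Both stages share the same rule syntax. As a sketch, metric_relabel_configs can drop scraped series you do not want to store; the metric pattern here is just an example:

scrape_configs:
  - job_name: "node-exporters"
    static_configs:
      - targets: ["node1:9100"]
    metric_relabel_configs:
      - source_labels: [__name__]
        action: drop
        regex: "go_gc_duration_seconds.*"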

Common Relabel Actions

Action      Description
keep        Keep targets whose source labels match the regex
drop        Drop targets whose source labels match the regex
replace     Set the target label to the replacement value
labelmap    Map matching label names to new names
labeldrop   Remove labels matching the regex
labelkeep   Keep only labels matching the regex

Example: Filter by Annotation

relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"

Example: Replace Port

relabel_configs:
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
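
Example: Map Pod Labels

A labelmap sketch (a common pattern with Kubernetes discovery, not taken from the configuration above) copies every pod label onto the target, dropping the meta-label prefix:

relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)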

Key Exam Tips

  1. scrape_interval: The global default is 1 minute. Job-level settings override the global default.
  2. honor_labels: When false (default), Prometheus renames conflicting labels with exported_ prefix.
  3. Up metric: Prometheus automatically adds up{job="...", instance="..."} metric (1 = healthy, 0 = failed scrape).
  4. Meta labels: Service discovery provides __meta_* labels that are available during relabeling but not stored.
  5. __address__: The special label that determines the target's host and port.
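
The up metric from tip 3 can be exercised directly in the expression browser, for example as two separate queries:

# Targets that failed their most recent scrape
up == 0

# Fraction of healthy targets per job
avg by (job) (up)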