Key Monitoring Methodologies
RED Method — Rate, Errors, Duration
Designed for request-driven services such as APIs and microservices. Focuses on three essential metrics:
Rate — Measures how many requests your service receives (e.g., requests per second). It shows overall demand.
Errors — Tracks the percentage or count of failed requests. It indicates reliability problems.
Duration — Measures how long requests take to complete. It reflects performance and user experience.
If rate is within its expected range, errors are low, and duration is consistent, the service is likely healthy from the caller's point of view.
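As a rough illustration, the three RED metrics can be derived from a batch of request records collected over a fixed window. This is a minimal sketch, not a prescribed implementation: the Request record, its field names, and the red_summary helper are hypothetical, and a real service would normally export these numbers through a metrics library rather than compute them by hand.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Request:
    """Hypothetical record of one handled request."""
    duration_ms: float
    failed: bool


def red_summary(requests, window_seconds):
    """Summarize Rate, Errors, and Duration for one observation window."""
    total = len(requests)
    failures = sum(1 for r in requests if r.failed)
    durations = [r.duration_ms for r in requests]
    return {
        # Rate: requests per second over the window.
        "rate_per_s": total / window_seconds,
        # Errors: fraction of requests that failed.
        "error_ratio": failures / total if total else 0.0,
        # Duration: typical and worst-case request time.
        "median_duration_ms": median(durations) if durations else 0.0,
        "max_duration_ms": max(durations) if durations else 0.0,
    }


sample = [
    Request(duration_ms=10.0, failed=False),
    Request(duration_ms=20.0, failed=False),
    Request(duration_ms=30.0, failed=False),
    Request(duration_ms=400.0, failed=True),
]
summary = red_summary(sample, window_seconds=2.0)
```

Reporting both a typical and a worst-case duration is a deliberate choice here: an average alone can hide a small number of very slow requests.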
USE Method — Utilization, Saturation, Errors
Designed for monitoring infrastructure and system resources. Focuses on each individual resource (CPU, memory, disk, network):
Utilization — How busy the resource is (e.g., CPU usage percentage).
Saturation — Whether the resource is overloaded or queuing work (e.g., disk queue length).
Errors — Whether the resource is failing (e.g., network errors, disk read failures).
Checking all three for every resource helps locate infrastructure bottlenecks, ideally before they affect the services running on that infrastructure.
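The per-resource checklist above can be sketched as a small loop over resource readings. Everything named here is an assumption made for illustration: the ResourceSample fields, the use_findings helper, and the 80% utilization threshold are hypothetical, and the readings themselves would come from whatever system tooling is available.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """Hypothetical reading for one resource."""
    name: str
    utilization: float  # fraction of time busy, 0.0 to 1.0
    saturation: float   # amount of queued work, e.g. queue length
    errors: int         # error events seen in the sampling interval


def use_findings(samples, util_threshold=0.8):
    """Return (resource, signal) pairs that deserve a closer look.

    The threshold is an illustrative assumption, not a recommended value.
    """
    findings = []
    for s in samples:
        if s.errors > 0:
            findings.append((s.name, "errors"))
        if s.saturation > 0:
            findings.append((s.name, "saturation"))
        if s.utilization >= util_threshold:
            findings.append((s.name, "utilization"))
    return findings


samples = [
    ResourceSample("cpu", utilization=0.92, saturation=3.0, errors=0),
    ResourceSample("disk", utilization=0.40, saturation=0.0, errors=2),
    ResourceSample("network", utilization=0.10, saturation=0.0, errors=0),
]
findings = use_findings(samples)
```

In this sample the CPU is flagged for both saturation and utilization, the disk for errors, and the network for nothing.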
Four Golden Signals — Latency, Traffic, Errors, Saturation
Popularized by Google's Site Reliability Engineering (SRE) book. Provides a high-level view of a user-facing system:
Latency — How long requests take
Traffic — How much demand the system handles
Errors — The rate of failed requests
Saturation — How close the system is to its limits
It spans both perspectives: latency, traffic, and errors correspond to RED's duration, rate, and errors (service perspective), while saturation reflects the resource perspective that USE covers.
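A minimal sketch of a combined snapshot is shown below, assuming request latencies and a failure count are available for a window, plus one resource reading to stand in for saturation. The function names, the choice of the 95th percentile, and the in_use / capacity ratio are all illustrative assumptions; the percentile uses the simple nearest-rank method.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct from 1 to 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def golden_signals(latencies_ms, failed_count, window_seconds, in_use, capacity):
    """Build one snapshot of the four signals for an observation window."""
    total = len(latencies_ms)
    return {
        # Latency: a high percentile shows what slower requests experience.
        "latency_p95_ms": percentile(latencies_ms, 95) if total else None,
        # Traffic: demand expressed as requests per second.
        "traffic_per_s": total / window_seconds,
        # Errors: fraction of requests that failed.
        "error_ratio": failed_count / total if total else 0.0,
        # Saturation: how much of a limited resource is already in use.
        "saturation": in_use / capacity,
    }


latencies = [float(ms) for ms in range(1, 101)]  # 1 ms through 100 ms
snapshot = golden_signals(
    latencies, failed_count=5, window_seconds=50.0, in_use=30, capacity=40
)
```

Which resource represents saturation depends on the system; the in_use and capacity values could be worker slots, connections, or memory, whichever limit is reached first.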
Why These Methodologies Matter
They provide a structured way to monitor systems without being overwhelmed by hundreds of metrics. Instead of measuring everything, they help you focus on the signals that most directly indicate system health.