Prometheus Metric Types: Understanding Gauges and Counters
In system monitoring and observability, understanding the differences between metric types is critical for building robust and insightful monitoring solutions.
Prometheus, a powerful open-source monitoring system, offers several metric types, with Gauges and Counters being two of the most fundamental and frequently used.
In this post, we look at the characteristics, use cases, and key distinctions between Gauges and Counters, so that you can pick the right type for each measurement and interpret the resulting data correctly.
Table of Contents:
- What is a counter?
- What is a gauge?
- Practical implementation examples
- Technical differences between counter and gauge
- Choosing between gauge and counter
- Common mistakes and best practices
- Advanced usage: combining gauges and counters
What is a counter?
A counter is a cumulative metric whose value only increases over time, or resets to zero when the process that exposes it restarts. It is ideal for tracking:
- Total number of orders processed
- Number of login attempts
- Total data transferred
- Accumulated error counts
Key characteristics of counters:
- Monotonically increasing: the value can only go up, or reset to zero on restart; it never decreases
- Represents the total occurrences of an event
- Perfect for measuring cumulative values over time
Example scenarios for counters:
- Total number of orders processed by a system
- Number of failed login attempts in an application
- Total data transferred through a network interface
- Number of errors in a system since it started (the count for a specific period is derived at query time with `increase()`)
Prometheus query example:
orders_total{status="success"}
This query returns the current cumulative count of orders processed successfully.
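The "only goes up" rule can be illustrated with a few lines of plain Python. This is a toy sketch, not the Prometheus client: the class name `MonotonicCounter` is made up for illustration, and the only behaviour it models is that negative increments are refused.

```python
class MonotonicCounter:
    """Toy counter (hypothetical): the value can only grow."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        # Refuse negative increments so the series stays monotonic.
        if amount < 0:
            raise ValueError("counters can only be incremented by non-negative amounts")
        self.value += amount


orders_total = MonotonicCounter()
orders_total.inc()       # one order processed
orders_total.inc(4)      # a batch of four more
print(orders_total.value)  # 5.0
```

A restart of the process would create a fresh object starting at zero again, which is the "reset" described above.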
What is a gauge?
A gauge is a metric that represents a single numerical value, which can arbitrarily go up or down. It is ideal for tracking:
- Current memory usage
- Active database queries
- System temperature
- Disk space utilization
Key characteristics of Gauges:
- Can increase or decrease dynamically
- Represents the current state of a system at a given point in time
- Captures instantaneous values
- Ideal for tracking variable metrics or things that fluctuate over time
Example scenarios for Gauges:
- Current system memory usage in bytes
- Active database queries or transactions
- Temperature of a machine
- Disk space utilization
Prometheus query example:
disk_space_utilization{mountpoint="/dev/sda1"}
This query retrieves the current disk space usage for a specific mount point.
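To make the "current state" idea concrete, here is a minimal sketch of where such a gauge value could come from. It uses only the Python standard library; the function name `disk_utilization_percent` and the choice of `/` as the path are assumptions for illustration.

```python
import shutil


def disk_utilization_percent(path="/"):
    """Return the used share of the filesystem containing `path`, in percent."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


# Each call re-reads the current state; the result may be higher or lower
# than the previous reading, which is what a gauge models.
current = disk_utilization_percent("/")
print(f"{current:.1f}% used")
```

Unlike a counter, nothing is accumulated here: every reading replaces the previous one.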
Practical implementation examples
Counter example: Total orders processed
# Total orders processed counter
orders_total{status="success"} 3000
orders_total{status="error"} 10
As shown above, the counter for orders processed will only increase over time. The status label indicates whether the order processing was successful or encountered an error.
Gauge example: Current disk space utilization
# Current disk space utilization (percentage)
disk_space_utilization{mountpoint="/dev/sda1"} 75
This gauge shows the current disk space utilization, which can fluctuate as data is written to or removed from the disk.
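The sample lines in this section follow the Prometheus text exposition format, `name{label="value"} value`. As a rough sketch of how such lines are assembled, the snippet below renders the same samples in plain Python. The helper names `escape_label_value` and `render_sample` are hypothetical; in practice a client library produces this output for you.

```python
def escape_label_value(value):
    # Backslash, double quote and newline must be escaped inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name, labels, value):
    """Render one sample as `name{label="value",...} value`."""
    if labels:
        pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels.items())
        return f"{name}{{{pairs}}} {value}"
    return f"{name} {value}"


lines = [
    "# TYPE orders_total counter",
    render_sample("orders_total", {"status": "success"}, 3000),
    render_sample("orders_total", {"status": "error"}, 10),
    "# TYPE disk_space_utilization gauge",
    render_sample("disk_space_utilization", {"mountpoint": "/dev/sda1"}, 75),
]
print("\n".join(lines))
```

The `# TYPE` comment lines are how the metric type (counter or gauge) is declared to Prometheus alongside the samples.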
Technical differences between counter and gauge
Aspect | Counter | Gauge |
---|---|---|
Nature | Monotonic: Only increases or resets to zero. | Dynamic: Can increase or decrease. |
Usage with Functions | Used with `rate()` or `increase()` to calculate changes over time. | Usually read directly as the current value; functions such as `avg_over_time()`, `delta()`, or `deriv()` apply when a trend is needed. |
Reset Behavior | Resets to zero when the service restarts. | Holds its last reported value until updated. |
Purpose | Tracks cumulative events (e.g., total requests served). | Represents the current state (e.g., memory usage). |
Examples | Total errors, requests, or jobs completed. | Current temperature, memory usage, or CPU load. |
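The "Usage with Functions" and "Reset Behavior" rows are closely related: `rate()` and `increase()` are useful precisely because they tolerate counter resets. The following is a simplified sketch of that idea in plain Python. It is not Prometheus's implementation (which also extrapolates to the window boundaries), and the function names simply mirror the PromQL ones for readability.

```python
def increase(samples):
    """Total growth of a counter across `samples`, tolerating resets.

    `samples` is a list of raw counter readings in scrape order.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # The value dropped, so the process restarted and counted up
            # from zero again: everything in `curr` is new growth.
            total += curr
    return total


def rate(samples, window_seconds):
    """Average per-second growth over the window."""
    return increase(samples) / window_seconds


# Hypothetical readings; the service restarts between the 3rd and 4th scrape.
readings = [2990, 2995, 3000, 4, 10]
print(increase(readings))  # 20.0
print(rate(readings, 60))
```

Subtracting the first raw reading from the last would give -2980 here, which is why raw counter values should not be compared directly across a restart.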
Choosing between gauge and counter
Use a Counter when:
- Tracking total events or occurrences
- Measuring cumulative occurrences over time
- Calculating rates of change
- Monitoring values that only increase (e.g., total number of orders processed, number of login attempts)
Use a Gauge when:
- Tracking the current state of a system
- Measuring fluctuating values (e.g., disk space, memory usage)
- Monitoring instantaneous metrics that can go up and down
- Representing system states that change over time (e.g., current temperature, active database queries)
Common mistakes and best practices
Counter Pitfalls:
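- Incorrect Decrease: A common mistake is attempting to decrease a counter. Client libraries do not allow this (the Python client, for example, raises a `ValueError` on a negative increment); a value that needs to go down should be a gauge.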
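- Neglecting Functions: Not using functions like `rate()` or `increase()` for meaningful analysis can lead to misinterpretation of data, because the raw value of a counter only says how much has accumulated since the last restart.
- Counter Resets: Forgetting that counters reset to zero after a service restart can cause misleading data points. `rate()` and `increase()` detect and compensate for resets; comparisons of raw values do not.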
Gauge Pitfalls:
- Using Gauges for Cumulative Metrics: Gauges should not be used for tracking cumulative events, which should be handled by counters.
- Confusing Current State Representation: Gauges represent the current state, not a cumulative count. Misunderstanding this can lead to incorrect assumptions about the data.
- Overcomplicated Tracking: Avoid using gauges to track overly complex or indirect metrics unless necessary.
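The first two gauge pitfalls can be seen in a small simulation. Prometheus only observes a metric at scrape time, so a gauge reveals nothing about what happened between scrapes, while a counter carries the full total forward. The timeline and variable names below are made up for illustration.

```python
# Hypothetical timeline: (second, event). Scrapes happen at t=15 and t=30.
events = [(3, "start"), (5, "end"), (20, "start"), (22, "end"), (24, "start")]
scrape_times = [15, 30]

in_flight = 0        # gauge: requests currently being handled
started_total = 0    # counter: requests ever started
gauge_seen, counter_seen = [], []

pending = list(events)
for t in scrape_times:
    # Apply every event that happened up to this scrape.
    while pending and pending[0][0] <= t:
        _, kind = pending.pop(0)
        if kind == "start":
            in_flight += 1
            started_total += 1
        else:
            in_flight -= 1
    gauge_seen.append(in_flight)
    counter_seen.append(started_total)

print(gauge_seen)    # [0, 1]
print(counter_seen)  # [1, 3]
```

From the gauge readings alone there is no way to tell that three requests were started; the counter readings make that explicit.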
Advanced usage: combining gauges and counters
Using both metric types together in your monitoring strategy allows you to capture a comprehensive view of system performance.
Practical monitoring strategy example:
# Counter: Total orders processed
orders_total{status="success"} 3000
# Gauge: Current disk space utilization
disk_space_utilization{mountpoint="/dev/sda1"} 75
This example demonstrates the use of both counters (for total orders processed) and gauges (for disk space utilization) to provide a complete picture of system activity.
Code Example: Implementing in Python with Prometheus Client
from prometheus_client import Counter, Gauge

# Counter example: Total orders processed
ORDERS_PROCESSED = Counter(
    'orders_total',
    'Total Orders Processed',
    ['status']
)

# Gauge example: Disk space utilization
DISK_SPACE_UTILIZATION = Gauge(
    'disk_space_utilization',
    'Disk Space Utilization',
    ['mountpoint']
)

# Incrementing a counter (order processing)
ORDERS_PROCESSED.labels(status='success').inc(5)

# Setting a gauge (disk space)
DISK_SPACE_UTILIZATION.labels(mountpoint='/dev/sda1').set(75)
In this example, we define a Counter for total orders processed and a Gauge for tracking disk space utilization. The counter increments as orders are processed, while the gauge reflects the current percentage of disk space used.
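One way to combine the two types is to derive a ratio from counters and check it alongside a gauge. The sketch below is plain Python with hypothetical function names and thresholds (1% errors, 90% disk); it treats the sample values from this article as if they were the growth of each counter over one window.

```python
def error_ratio(success_increase, error_increase):
    """Share of failed orders among all orders in the window."""
    total = success_increase + error_increase
    return error_increase / total if total else 0.0


def needs_attention(success_increase, error_increase, disk_percent,
                    max_error_ratio=0.01, max_disk_percent=90.0):
    # Counters answer "how much happened?"; the gauge answers "where are we now?".
    return (error_ratio(success_increase, error_increase) > max_error_ratio
            or disk_percent > max_disk_percent)


print(round(error_ratio(3000, 10), 4))  # 0.0033
print(needs_attention(3000, 10, 75))    # False
print(needs_attention(3000, 10, 95))    # True
```

In a real setup this kind of check would more likely live in a PromQL alerting rule than in application code; the Python version only shows the reasoning.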
Conclusion:
Understanding the difference between counters and gauges is key to creating effective monitoring strategies. By choosing the right metric type, you ensure that your system monitoring delivers accurate and insightful data.
- Counters capture the totality of an event, continuously incrementing over time.
- Gauges reflect the instantaneous state, tracking values that fluctuate.
Together, they offer a complete view of your application's performance, health, and system behaviour.