Prometheus Metrics Types: Understanding Gauges and Counters

In system monitoring and observability, understanding the differences between metric types is critical for building robust and insightful monitoring solutions.

Prometheus, a powerful open-source monitoring system, offers four metric types: counters, gauges, histograms, and summaries. Counters and gauges are the most fundamental and frequently used.

In this post, we cover the characteristics, use cases, and key distinctions between gauges and counters, so you can pick the right type for each metric you expose.

Table of Contents:

  1. What is a counter?
  2. What is a gauge?
  3. Practical implementation examples
  4. Technical differences between counter and gauge
  5. Choosing between gauge and counter
  6. Common mistakes and best practices
  7. Advanced usage: combining gauges and counters

What is a counter?

A counter is a cumulative metric whose value can only increase, or be reset to zero when the process exporting it restarts. It is ideal for tracking:

  • Total number of orders processed
  • Number of login attempts
  • Total data transferred
  • Accumulated error counts

Key characteristics of counters:

  • Always monotonically increasing (i.e., it can only go up or reset, but never decrease)
  • Represents the total occurrences of an event
  • Perfect for measuring cumulative values over time

Example scenarios for counters:

  • Total number of orders processed by a system
  • Number of failed login attempts in an application
  • Total data transferred through a network interface
  • Number of errors in a system (query it with increase() to get the count for a specific period)

Prometheus query example:

orders_total{status="success"}

This query returns the cumulative number of successfully processed orders since the process exporting the metric last started.
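
On the application side, the value behind that query comes from a counter being incremented. A minimal sketch with the Python client, assuming the orders_total metric and status label from the query above; the private registry is only there to keep the example isolated:

```python
from prometheus_client import CollectorRegistry, Counter

# A private registry keeps this sketch separate from the global default registry.
registry = CollectorRegistry()

orders = Counter(
    "orders_total",
    "Total orders processed",
    ["status"],
    registry=registry,
)

# Each processed order bumps the counter; the value never goes down.
orders.labels(status="success").inc()    # +1
orders.labels(status="success").inc(4)   # +4
orders.labels(status="error").inc()      # +1

# Read back the current sample values, as a scrape would see them.
success_total = registry.get_sample_value("orders_total", {"status": "success"})
error_total = registry.get_sample_value("orders_total", {"status": "error"})
print(success_total, error_total)
```

Each label combination is tracked as its own time series, which is why the success and error totals advance independently.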

What is a gauge?

A gauge is a metric that represents a single numerical value, which can arbitrarily go up or down. It is ideal for tracking:

  • Current memory usage
  • Active database queries
  • System temperature
  • Disk space utilization

Key characteristics of gauges:

  • Can increase or decrease dynamically
  • Represents the current state of a system at a given point in time
  • Captures instantaneous values
  • Ideal for tracking variable metrics or things that fluctuate over time

Example scenarios for gauges:

  • Current system memory usage in bytes
  • Active database queries or transactions
  • Temperature of a machine
  • Disk space utilization

Prometheus query example:

disk_space_utilization{mountpoint="/dev/sda1"}

This query retrieves the current disk space usage for a specific mount point.
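
A gauge can be overwritten with an absolute reading or nudged up and down. The sketch below illustrates this with the Python client; the metric name active_database_queries is an assumption made for illustration and is not taken from a real exporter:

```python
from prometheus_client import CollectorRegistry, Gauge

# A private registry keeps this sketch separate from the global default registry.
registry = CollectorRegistry()

# Hypothetical metric name, chosen only for this example.
active_queries = Gauge(
    "active_database_queries",
    "Database queries currently running",
    registry=registry,
)

active_queries.set(3)    # overwrite with an absolute reading
active_queries.inc()     # one query starts
active_queries.dec(2)    # two queries finish

current = registry.get_sample_value("active_database_queries")
print(current)
```

Unlike a counter, only the latest value matters here: a scrape sees whatever the gauge holds at that moment.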

Practical implementation examples

Counter example: Total orders processed

# Total orders processed counter
orders_total{status="success"} 3000
orders_total{status="error"} 10

As shown above, the counter for orders processed will only increase over time. The status label indicates whether the order processing was successful or encountered an error.

Gauge example: Current disk space utilization

# Current disk space utilization (percentage)
disk_space_utilization{mountpoint="/dev/sda1"} 75

This gauge shows the current disk space utilization, which can fluctuate as data is written to or removed from the disk.
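
The sample lines above are in the text format that Prometheus scrapes. As a rough sketch, the Python client can render that format with generate_latest(); the values are the illustrative ones from the examples above, and the private registry is an assumption made to keep the output small:

```python
from prometheus_client import CollectorRegistry, Counter, Gauge, generate_latest

# A private registry so the output contains only the two example metrics.
registry = CollectorRegistry()

orders = Counter(
    "orders_total", "Total orders processed", ["status"], registry=registry
)
disk = Gauge(
    "disk_space_utilization",
    "Disk space utilization (percent)",
    ["mountpoint"],
    registry=registry,
)

# Illustrative values matching the samples shown above.
orders.labels(status="success").inc(3000)
orders.labels(status="error").inc(10)
disk.labels(mountpoint="/dev/sda1").set(75)

# Render everything in the registry as scrape-ready text.
exposition = generate_latest(registry).decode("utf-8")
print(exposition)
```

The rendered text also carries # HELP and # TYPE lines, which is how Prometheus learns that one metric is a counter and the other a gauge.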

Technical differences between counter and gauge

Counter vs Gauge

| Aspect | Counter | Gauge |
|---|---|---|
| Nature | Monotonic: only increases or resets to zero. | Dynamic: can increase or decrease. |
| Usage with functions | Used with `rate()` or `increase()` to calculate changes over time. | Read directly as the current value; `rate()` and `increase()` do not apply (functions such as `delta()` or `avg_over_time()` are used for trends). |
| Reset behavior | Resets to zero when the service restarts. | Holds its last reported value until updated. |
| Purpose | Tracks cumulative events (e.g., total requests served). | Represents the current state (e.g., memory usage). |
| Examples | Total errors, requests, or jobs completed. | Current temperature, memory usage, or CPU load. |
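
To make the reset behavior concrete, here is a simplified pure-Python sketch of a reset-aware increase calculation. It is not Prometheus's actual implementation (the real increase() also extrapolates to the edges of the time window); the function name counter_increase and the sample values are made up for illustration:

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Sum the growth across samples, treating any drop as a reset to zero."""
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # The counter went backwards, so assume the process restarted
            # from zero: the whole post-reset value is new growth.
            total += current
    return total


# Hypothetical scraped values; the service restarted between 150 and 5.
scraped = [100, 120, 150, 5, 25]

naive = scraped[-1] - scraped[0]         # last minus first, ignores the reset
reset_aware = counter_increase(scraped)  # 20 + 30 + 5 + 20
print(naive, reset_aware)
```

Subtracting the first raw value from the last gives a negative number here, while the reset-aware version recovers the growth that actually happened. This is the reason counters are queried through rate() or increase() rather than read as raw values.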

Choosing between gauge and counter

Use a Counter when:

  • Tracking total events or occurrences
  • Measuring cumulative occurrences over time
  • Calculating rates of change
  • Monitoring values that only increase (e.g., total number of orders processed, number of login attempts)

Use a Gauge when:

  • Tracking the current state of a system
  • Measuring fluctuating values (e.g., disk space, memory usage)
  • Monitoring instantaneous metrics that can go up and down
  • Representing system states that change over time (e.g., current temperature, active database queries)

Common mistakes and best practices

Counter Pitfalls:

  • Incorrect Decrease: A common mistake is attempting to decrease a counter, which client libraries do not allow. If a value must be able to go down, it should be a gauge.
  • Neglecting Functions: Not using functions like rate() or increase() for meaningful analysis can lead to misinterpretation of data.
  • Counter Resets: Counters reset to zero after a service restart, so raw values can drop suddenly. rate() and increase() detect and compensate for resets, but arithmetic on raw counter values does not.
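
The first pitfall is easy to demonstrate. In the Python client, a negative increment raises ValueError, as the sketch below shows; the metric name login_attempts_total is an assumption made for illustration:

```python
from prometheus_client import CollectorRegistry, Counter

# A private registry keeps this sketch separate from the global default registry.
registry = CollectorRegistry()

# Hypothetical metric name, chosen only for this example.
logins = Counter("login_attempts_total", "Login attempts", registry=registry)
logins.inc(2)

try:
    logins.inc(-1)   # counters accept only non-negative increments
    rejected = False
except ValueError:
    rejected = True

# The failed call leaves the stored value untouched.
value_after = registry.get_sample_value("login_attempts_total")
print(rejected, value_after)
```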

Gauge Pitfalls:

  • Using Gauges for Cumulative Metrics: Gauges should not be used for tracking cumulative events, which should be handled by counters.
  • Confusing Current State Representation: Gauges represent the current state, not a cumulative count. Misunderstanding this can lead to incorrect assumptions about the data.
  • Overcomplicated Tracking: Avoid using gauges to track overly complex or indirect metrics unless necessary.

Advanced usage: combining gauges and counters

Using both metric types together in your monitoring strategy allows you to capture a comprehensive view of system performance.

Practical monitoring strategy example:

# Counter: Total orders processed
orders_total{status="success"} 3000

# Gauge: Current disk space utilization
disk_space_utilization{mountpoint="/dev/sda1"} 75

This example demonstrates the use of both counters (for total orders processed) and gauges (for disk space utilization) to provide a complete picture of system activity.

Code Example: Implementing in Python with Prometheus Client

from prometheus_client import Counter, Gauge

# Counter example: Total orders processed
ORDERS_PROCESSED = Counter(
    'orders_total',
    'Total Orders Processed',
    ['status']
)

# Gauge example: Disk space utilization
DISK_SPACE_UTILIZATION = Gauge(
    'disk_space_utilization',
    'Disk Space Utilization',
    ['mountpoint']
)

# Incrementing a counter (order processing)
ORDERS_PROCESSED.labels(status='success').inc(5)

# Setting a gauge (disk space)
DISK_SPACE_UTILIZATION.labels(mountpoint='/dev/sda1').set(75)

In this example, we define a Counter for total orders processed and a Gauge for tracking disk space utilization. The counter increments as orders are processed, while the gauge reflects the current percentage of disk space used.
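
In a running service, these calls usually sit inside application code rather than at module level. The following is a sketch under stated assumptions: process_order and refresh_disk_gauge are hypothetical helpers, the order dictionaries are made-up input, and the gauge is labeled with the filesystem path "/" because shutil.disk_usage() takes a path:

```python
import shutil

from prometheus_client import CollectorRegistry, Counter, Gauge

# A private registry keeps this sketch separate from the global default registry.
registry = CollectorRegistry()

orders = Counter(
    "orders_total", "Total Orders Processed", ["status"], registry=registry
)
disk = Gauge(
    "disk_space_utilization",
    "Disk Space Utilization",
    ["mountpoint"],
    registry=registry,
)


def process_order(order: dict) -> None:
    """Hypothetical handler: count every order under its outcome."""
    status = "success" if order.get("valid", False) else "error"
    orders.labels(status=status).inc()


def refresh_disk_gauge(path: str = "/") -> None:
    """Hypothetical helper: overwrite the gauge with the current usage percentage."""
    usage = shutil.disk_usage(path)
    percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    disk.labels(mountpoint=path).set(percent)


# Made-up input: two valid orders and one invalid order.
for order in ({"valid": True}, {"valid": True}, {"valid": False}):
    process_order(order)
refresh_disk_gauge("/")

success = registry.get_sample_value("orders_total", {"status": "success"})
errors = registry.get_sample_value("orders_total", {"status": "error"})
disk_percent = registry.get_sample_value(
    "disk_space_utilization", {"mountpoint": "/"}
)
print(success, errors, disk_percent)
```

The counter is updated once per event, while the gauge is refreshed by re-measuring the system. That difference in how the two are updated reflects the difference between the two types.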

Conclusion:

Understanding the difference between counters and gauges is key to creating effective monitoring strategies. By choosing the right metric type, you ensure that your system monitoring delivers accurate and insightful data.

  • Counters capture the cumulative total of events, incrementing over time.
  • Gauges reflect the instantaneous state, tracking values that fluctuate.

Together, they offer a complete view of your application's performance, health, and system behaviour.

Sujitha Sakthivel

Technical Writer | Skilled in simplifying complex tech topics!😎
Chennai