Choosing Between Counter, Gauge, Histogram, and Summary

Picking the wrong Prometheus metric type produces graphs that lie: counters read as gauges drop to zero on every deploy, and summaries chosen for fleet SLOs return per-instance nonsense. This guide is a decision walkthrough for selecting among Counter, Gauge, Histogram, and Summary in Python, with a comparison table and a concrete example for each. It is part of the Python Metrics and Instrumentation guide and supports the metric types and cardinality control guide; for keeping the label dimensions of these instruments bounded, see controlling label cardinality in Prometheus.

Figure: Decision tree from the nature of the measurement to the correct metric type.

Prerequisites

pip install "prometheus-client>=0.20.0,<1.0.0"

Python 3.10+ and a Prometheus server (>=2.50,<3.0) scraping the exposition endpoint are assumed. No special environment variables are required for these examples beyond an open exposition port via start_http_server.

The Four Types in One Sentence Each

A Counter is a value that only goes up and resets to zero on process restart; it answers how many events have occurred and is meaningful only as a rate. A Gauge is a value that can move in either direction and represents a snapshot of current state. A Histogram observes values into fixed cumulative buckets and exports the raw bucket counts so the server can derive quantiles at query time. A Summary computes a fixed set of quantiles inside the client process and exports them directly. The two pairs split along a clean line: counters and gauges describe a single number, while histograms and summaries describe a distribution of many numbers.

Implementation

The choice reduces to three questions answered in order, mirroring the decision tree above.

Step 1 — Does the value only ever increase, resetting only on restart? If so, it is a Counter. Counts of events — requests, errors, bytes processed, retries — are the canonical case. Never read a counter's raw value in a query; wrap it in rate() so process restarts are handled as resets rather than data loss.

from prometheus_client import Counter

ERRORS = Counter("app_errors_total", "Errors raised", labelnames=("kind",))
ERRORS.labels(kind="timeout").inc()      # monotonic; rate() in queries

A common trap at this step is reaching for a Gauge because the number "goes up over time and I want the total." If the underlying events are discrete and you care about the rate or the increase over a window, it is still a Counter even though the displayed total grows. The test is not whether the displayed number increases but whether the instrument can ever decrease for a reason other than a restart. Error totals, requests served, and bytes written never legitimately decrease, so they are counters regardless of how you plan to visualise them.
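
To see why rate() and increase() tolerate restarts, the reset handling can be modelled in a few lines of plain Python. This is a simplified sketch, not PromQL's implementation: the function name increase_with_resets and the sample values are hypothetical, and the real functions also extrapolate to the edges of the query window.

```python
def increase_with_resets(samples: list[float]) -> float:
    """Total increase across scraped counter samples, treating any drop as a restart."""
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev   # normal growth between two scrapes
        else:
            total += curr          # counter restarted from zero; count what it reached
    return total


# Hypothetical scrapes: the process restarts after the third sample.
scrapes = [100.0, 130.0, 160.0, 5.0, 25.0]

naive = scrapes[-1] - scrapes[0]            # raw subtraction reports a loss
corrected = increase_with_resets(scrapes)   # reset-aware total
print(naive, corrected)
```

Raw subtraction reports a negative change across the restart, while the reset-aware total keeps counting. A Gauge gives the query layer no such guarantee, because a drop in a gauge is a legitimate value rather than a reset.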

Step 2 — Is it a current value that moves both up and down? That is a Gauge: queue depth, in-flight requests, connection pool size, temperature, memory in use. Use set, inc, and dec. For an expensive-to-poll value, register a callback with set_function so the value is read at scrape time instead of being set on a hot path.

from prometheus_client import Gauge

QUEUE_DEPTH = Gauge("task_queue_depth", "Pending tasks")
QUEUE_DEPTH.set(get_queue_length())      # snapshot; can rise and fall
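
For a value that is costly to poll, prometheus_client lets a Gauge read from a callback at scrape time through set_function. The following is a minimal sketch under stated assumptions: pending_tasks is a hypothetical stand-in for a real queue, and a private CollectorRegistry is used only to keep the example isolated.

```python
from prometheus_client import CollectorRegistry, Gauge

# A private registry keeps this sketch separate from the default global one.
registry = CollectorRegistry()

pending_tasks: list[str] = ["a", "b", "c"]   # hypothetical stand-in for a real queue

QUEUE_DEPTH = Gauge("task_queue_depth", "Pending tasks", registry=registry)
# The callback runs whenever the registry is collected, so the hot path never touches the gauge.
QUEUE_DEPTH.set_function(lambda: len(pending_tasks))

depth_now = registry.get_sample_value("task_queue_depth")
pending_tasks.pop()
depth_after_pop = registry.get_sample_value("task_queue_depth")
print(depth_now, depth_after_pop)
```

Once set_function is attached, the gauge reports whatever the callback returns, so calling set, inc, or dec on the same gauge no longer has any effect.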

Gauges carry a subtlety in aggregation: because a gauge is a point-in-time value, summing it across replicas is only meaningful when the quantity is genuinely additive, such as total memory used by a fleet. Averaging or taking the maximum is correct for quantities like utilisation percentages. Choosing the wrong aggregation function over a gauge produces a number that is technically valid PromQL but semantically meaningless, so decide at design time how the gauge will be combined across instances.
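
The difference between additive and non-additive gauges is easy to check with a few numbers. This sketch uses hypothetical per-replica readings and plain Python in place of PromQL's sum, avg, and max.

```python
from statistics import mean

# Hypothetical per-replica gauge readings taken at the same instant.
memory_bytes = {"replica-a": 512e6, "replica-b": 768e6, "replica-c": 256e6}
cpu_utilisation = {"replica-a": 0.20, "replica-b": 0.90, "replica-c": 0.40}

fleet_memory = sum(memory_bytes.values())            # additive: total fleet footprint
mean_utilisation = mean(cpu_utilisation.values())    # ratio: the average is meaningful
worst_utilisation = max(cpu_utilisation.values())    # ratio: the hottest replica
summed_utilisation = sum(cpu_utilisation.values())   # "150%": valid arithmetic, no meaning
print(fleet_memory, mean_utilisation, worst_utilisation, summed_utilisation)
```

The summed utilisation is the kind of result the paragraph above warns about: the expression evaluates, but the number describes nothing real.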

Step 3 — Is it a distribution of observed values from which you want quantiles, averages, or observation rates? Then it is a Histogram or a Summary, and the deciding sub-question is aggregation. If you need quantiles across many replicas, as almost every latency SLO does, choose a Histogram, because its buckets sum across instances and histogram_quantile() runs on the merged result. Choose a Summary only for a precise quantile from one process that you will never aggregate.

from prometheus_client import Histogram

LATENCY = Histogram(
    "request_duration_seconds", "Request latency",
    buckets=(0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5),  # straddle the SLO
)
with LATENCY.time():
    serve_request()

from prometheus_client import Summary

# Only for single-process measurements you will not aggregate.
# Note: the Python client's Summary exports _sum and _count only, no quantile series.
GC_PAUSE = Summary("gc_pause_seconds", "GC pause duration")
with GC_PAUSE.time():
    run_gc_cycle()

A worked counterexample: for a cache hit ratio, resist a single Gauge holding the percentage. Expose two Counters — hits and total lookups — and divide their rates at query time (rate(cache_hits_total[5m]) / rate(cache_lookups_total[5m])). This preserves correct windowing and aggregation that a precomputed gauge throws away.
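
On the instrumentation side, the two-counter pattern might look like the sketch below. The names cached_get and _cache are hypothetical, and the private registry is there only so the example can read its own values back.

```python
from typing import Optional

from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()
CACHE_LOOKUPS = Counter("cache_lookups_total", "Cache lookups", registry=registry)
CACHE_HITS = Counter("cache_hits_total", "Lookups served from cache", registry=registry)

_cache: dict[str, str] = {"user:1": "alice"}   # hypothetical in-memory cache


def cached_get(key: str) -> Optional[str]:
    """Look up a key, counting every lookup and every hit separately."""
    CACHE_LOOKUPS.inc()
    value = _cache.get(key)
    if value is not None:
        CACHE_HITS.inc()
    return value


for key in ("user:1", "user:2", "user:1", "user:3"):
    cached_get(key)

hits = registry.get_sample_value("cache_hits_total")
lookups = registry.get_sample_value("cache_lookups_total")
print(hits, lookups)
```

The application never divides anything. It exports the numerator and the denominator, and the query chooses the window and the aggregation.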

The histogram-versus-summary decision deserves its own scrutiny because it is the most consequential and the most often gotten wrong. A histogram records nothing more than a count per bucket, so the work it does at observe time is a cheap increment. All the statistical effort — interpolating a quantile — happens later on the server, over whatever set of series the query selects, which is exactly why buckets can be summed across replicas before histogram_quantile() runs. A summary instead runs a streaming quantile estimator inside the client for every observation, producing a precomputed value with a tight per-quantile error bound but tied irrevocably to one process. Those precomputed quantiles cannot be averaged or summed: the p99 of two processes is not the average of their individual p99s, and PromQL has no way to recover the true combined quantile from the published ones. For any objective measured across more than one replica, that property alone forces the histogram. The summary earns its place only for a single long-lived process — a batch job, a daemon — where a precise quantile of that one process is the actual question and aggregation will never be needed.
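
The claim that quantiles cannot be averaged can be checked numerically. The sketch below is a simplified model of bucket interpolation, not the server's implementation: estimate_quantile and merge are hypothetical names, the bucket counts are invented, and it assumes every replica shares the same boundaries with at least one finite bucket and a final +Inf bucket.

```python
import math


def estimate_quantile(q: float, buckets: dict[float, float]) -> float:
    """Estimate a quantile from cumulative bucket counts keyed by upper bound (le)."""
    bounds = sorted(buckets)
    total = buckets[bounds[-1]]          # the +Inf bucket holds the total count
    if total == 0:
        return math.nan
    rank = q * total
    for i, upper in enumerate(bounds):
        if buckets[upper] >= rank:       # first bucket that contains the rank
            break
    if math.isinf(upper):
        return bounds[-2]                # beyond the last finite boundary
    lower = bounds[i - 1] if i > 0 else 0.0
    below = buckets[lower] if i > 0 else 0.0
    in_bucket = buckets[upper] - below
    if in_bucket == 0:
        return lower
    # Linear interpolation inside the bucket that contains the rank.
    return lower + (upper - lower) * (rank - below) / in_bucket


def merge(*replicas: dict[float, float]) -> dict[float, float]:
    """Sum bucket counts across replicas that share the same boundaries."""
    return {le: sum(r[le] for r in replicas) for le in replicas[0]}


# Invented data: replica A is mostly fast, replica B is mostly slow.
replica_a = {0.1: 90, 0.5: 100, math.inf: 100}
replica_b = {0.1: 10, 0.5: 100, math.inf: 100}

fleet_p90 = estimate_quantile(0.9, merge(replica_a, replica_b))
mean_of_p90s = (estimate_quantile(0.9, replica_a) + estimate_quantile(0.9, replica_b)) / 2
print(fleet_p90, mean_of_p90s)
```

With these counts the merged estimate is about 0.42 seconds, while the average of the two per-replica estimates is about 0.28 seconds. Only the first answers the fleet-wide question, and it is available only because the buckets were exported raw.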

A practical note on histogram evolution: bucket boundaries are part of the data, not just the query. If you redeploy with new boundaries, samples recorded under the old boundaries keep them, and a quantile spanning the boundary change blends two resolutions. Plan bucket layouts to outlast the dashboards that consume them, and place a boundary on every threshold you alert against so the interpolation has resolution exactly where it matters.
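
One way to keep alert thresholds and bucket boundaries in step is to derive the bucket tuple from both. This is a small sketch: buckets_with_thresholds is a hypothetical helper, and the threshold values are illustrative only.

```python
def buckets_with_thresholds(
    base: tuple[float, ...], thresholds: tuple[float, ...]
) -> tuple[float, ...]:
    """Return sorted, de-duplicated boundaries that include every alert threshold."""
    return tuple(sorted(set(base) | set(thresholds)))


BASE = (0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5)
ALERT_THRESHOLDS = (0.3, 1.0)   # hypothetical SLO and paging thresholds, in seconds

LATENCY_BUCKETS = buckets_with_thresholds(BASE, ALERT_THRESHOLDS)
print(LATENCY_BUCKETS)
```

The resulting tuple can be passed as the buckets argument of Histogram, so a threshold added to the alerting configuration cannot silently fall between two boundaries.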

Configuration Options

Type	Direction	Query pattern	Aggregates across replicas	Typical use
Counter	Monotonic up	`rate()`, `increase()`	Yes (sum of rates)	Requests, errors, bytes
Gauge	Up and down	raw value, `avg`, `max`	Yes (sum/avg)	Queue depth, in-flight, pool size
Histogram	Distribution	`histogram_quantile()` over buckets	Yes (sum buckets)	Latency and size SLOs
Summary	Distribution	read quantile series directly	No	Single-instance quantiles

Verification

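Scrape the exposition endpoint and confirm each instrument renders with its declared # TYPE line and the expected series shape: counters and gauges as one sample per label set, histograms as _bucket/_sum/_count, and summaries as _sum/_count. The Python client's Summary does not export quantile series, which is why none appear below; clients in some other languages add them.

Expected Output (abbreviated; HELP lines and most histogram buckets are omitted): the four types appear distinctly in the exposition text.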

# TYPE app_errors_total counter
app_errors_total{kind="timeout"} 1.0
# TYPE task_queue_depth gauge
task_queue_depth 7.0
# TYPE request_duration_seconds histogram
request_duration_seconds_bucket{le="0.25"} 1.0
request_duration_seconds_sum 0.031
request_duration_seconds_count 1.0
# TYPE gc_pause_seconds summary
gc_pause_seconds_sum 0.004
gc_pause_seconds_count 1.0
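
The same check can be scripted without a running server by rendering a registry to text with generate_latest. The sketch below assumes a private CollectorRegistry and default histogram buckets, and the declared_types dictionary is an illustrative helper rather than part of the library.

```python
from prometheus_client import (
    CollectorRegistry, Counter, Gauge, Histogram, Summary, generate_latest,
)

registry = CollectorRegistry()
errors = Counter("app_errors_total", "Errors raised", ["kind"], registry=registry)
depth = Gauge("task_queue_depth", "Pending tasks", registry=registry)
latency = Histogram("request_duration_seconds", "Request latency", registry=registry)
gc_pause = Summary("gc_pause_seconds", "GC pause duration", registry=registry)

# Record one value per instrument so every family has samples to render.
errors.labels(kind="timeout").inc()
depth.set(7)
latency.observe(0.031)
gc_pause.observe(0.004)

exposition = generate_latest(registry).decode("utf-8")

# Map each metric family name to its declared type from the "# TYPE" lines.
declared_types: dict[str, str] = {}
for line in exposition.splitlines():
    if line.startswith("# TYPE "):
        _, _, name, metric_type = line.split(" ", 3)
        declared_types[name] = metric_type

print(declared_types)
```

A check like this fits in a unit test, where it catches an instrument that was accidentally declared with the wrong type before it reaches a dashboard.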

Common Mistakes

Reading a Counter without rate()
Symptom: graphs sawtooth to zero on every deployment.
Root cause: querying the raw cumulative value, which resets when the process restarts.
Remediation: wrap counters in rate() or increase(), which interpret resets correctly.

Choosing a Summary for a fleet-wide quantile
Symptom: a p99 panel spanning replicas shows per-instance or nonsensical values.
Root cause: summary quantiles are computed in each process and cannot be merged.
Remediation: use a Histogram and compute the quantile from summed buckets with histogram_quantile().

Storing a ratio or percentage as one Gauge
Symptom: you cannot recompute the ratio over a different time window or aggregate it across instances.
Root cause: the division happened in the application, discarding the numerator and denominator.
Remediation: expose the two underlying Counters and divide their rates in the query.

Using a Gauge for something that only increases
Symptom: deploys leave gaps and the rate is hard to compute.
Root cause: a monotonic quantity was modelled as a Gauge with manual inc(), so it loses the counter's reset semantics and PromQL cannot detect restarts.
Remediation: declare it a Counter and let rate() and increase() handle resets automatically.

A Quick Reference for the Common Cases

To make the decision automatic for the measurements that appear in almost every service, fix these defaults in mind. Request and error totals are Counters, queried with rate(). In-flight requests, queue depth, and connection-pool size are Gauges. Request latency and response size are Histograms with SLO-aligned buckets. Cache hit ratio is two Counters divided at query time, never a Gauge. A garbage-collection pause time on a single daemon is the rare legitimate Summary. Anchoring on these defaults removes most of the per-metric deliberation and leaves only the genuinely novel measurements to reason about from first principles using the decision tree above.

When a measurement does not fit any default, walk the three questions in order — only increases, moves both ways, or a distribution — and let the answer pick the type before you think about labels. Then apply the cardinality lens: if the natural label is unbounded, keep that dimension in traces or logs and instrument the bounded view. The type decision and the label decision together produce telemetry that is cheap to store and correct to query.
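
The three questions can also be written down as a small function, which some teams may find useful as a review checklist. This is an illustrative sketch: choose_metric_type and its parameter names are hypothetical and not part of any library.

```python
def choose_metric_type(
    *,
    only_increases: bool = False,
    moves_both_ways: bool = False,
    is_distribution: bool = False,
    aggregated_across_replicas: bool = True,
) -> str:
    """Answer the three questions in order and return the matching metric type."""
    if only_increases:
        return "Counter"        # Step 1: monotonic, resets only on restart
    if moves_both_ways:
        return "Gauge"          # Step 2: snapshot of current state
    if is_distribution:
        # Step 3: the sub-question is whether quantiles must merge across replicas.
        return "Histogram" if aggregated_across_replicas else "Summary"
    raise ValueError("measurement fits none of the three questions")


print(choose_metric_type(only_increases=True))
print(choose_metric_type(moves_both_ways=True))
print(choose_metric_type(is_distribution=True))
print(choose_metric_type(is_distribution=True, aggregated_across_replicas=False))
```

The order of the checks matters: a measurement is tested for monotonicity first, so a growing total never falls through to Gauge.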

Frequently Asked Questions

When should I use a Counter instead of a Gauge?

Use a Counter for cumulative event totals that only increase, such as requests served or errors raised, and always query them with rate(). Use a Gauge for values that move up and down, such as queue depth or memory in use.

Is a Histogram always better than a Summary?

For fleet-wide latency SLOs, yes, because histogram buckets aggregate across replicas and let you compute quantiles at query time. A Summary is only preferable when you need a precise quantile from a single process and will never aggregate it.

Can I change histogram buckets after deployment?

You can change the bucket definition in code and redeploy, but historical samples keep their original buckets. Quantiles you derive at query time will only be as precise as the buckets that were active when the data was recorded.

What metric type fits a cache hit ratio?

Use two Counters, one for hits and one for total lookups, then divide their rates at query time. Computing the ratio as a single Gauge in the application loses the ability to window and aggregate it correctly.

Related Guides