The OpenTelemetry Metrics SDK in Python
Recording metrics with OpenTelemetry in Python means wiring together a small set of explicit objects: a MeterProvider that owns configuration, one or more Meter instances that mint instruments, the instruments themselves, and a metric reader that periodically collects and exports aggregated data. Unlike ad-hoc counters scattered through a codebase, this SDK gives you a single configuration point for aggregation, temporality, and export, which is what makes it predictable under load. This page is part of the Python Metrics and Instrumentation guide, and it shares its provider-and-exporter mental model with OpenTelemetry SDK setup for tracing; if you have already configured a TracerProvider, the metrics path will feel familiar. Two focused walkthroughs extend this page: exporting OTLP metrics to the Collector and recording counters and histograms with OpenTelemetry.
Prerequisites
Isolate a virtual environment and pin the metrics packages. The SDK and the gRPC OTLP exporter share a release train, so pin them to the same range. The API package is a transitive dependency of the SDK, but pinning it explicitly makes the API version you code against visible in your requirements instead of implied.
```bash
python -m venv .venv && source .venv/bin/activate
pip install \
  "opentelemetry-api>=1.30.0,<2.0.0" \
  "opentelemetry-sdk>=1.30.0,<2.0.0" \
  "opentelemetry-exporter-otlp-proto-grpc>=1.30.0,<2.0.0"
```
You also need a reachable OTLP endpoint. For local development, run an OpenTelemetry Collector with an OTLP receiver listening on port 4317, the default OTLP/gRPC port. The collector receiver wiring is covered in detail in exporting OTLP metrics to the Collector.
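A short preflight script can check both prerequisites before you debug missing data. It confirms that the pinned packages resolved to one release and that something is listening on the Collector port. This is a stdlib-only sketch. `installed_versions`, `aligned`, and `collector_reachable` are hypothetical helper names. It assumes the three core distributions publish matching 1.x version strings, as they currently do. The TCP check only proves that a port is open; it does not speak OTLP.

```python
import socket
from importlib import metadata

# Distributions this guide pins to the same 1.x range.
PINNED = (
    "opentelemetry-api",
    "opentelemetry-sdk",
    "opentelemetry-exporter-otlp-proto-grpc",
)


def installed_versions(names=PINNED):
    """Map each distribution to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def aligned(versions):
    """True when every package is installed and all report the same version."""
    values = set(versions.values())
    return None not in values and len(values) == 1


def collector_reachable(host="localhost", port=4317, timeout=1.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only shows that something is listening; it does not speak OTLP or gRPC.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    found = installed_versions()
    print("versions:", found, "aligned" if aligned(found) else "MISALIGNED")
    print("collector reachable:", collector_reachable())
```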
Concept & architecture
The metrics SDK is built from five collaborating objects. Understanding each one prevents the most common misconfigurations.
The MeterProvider is the root. It holds the Resource, the set of metric readers, and any View objects. You construct it once at process startup and register it globally with metrics.set_meter_provider(). Its lifecycle matters: nothing is exported until a reader is attached, and on shutdown you must call shutdown() (or rely on the registered atexit hook) so the final batch flushes.
A Meter is obtained from the provider via meter_provider.get_meter(name, version). The name and version form the instrumentation scope that travels with every exported metric. Create one Meter per module or library rather than one global Meter, so data is attributable to the code that produced it.
Instruments are the recording surface. Synchronous instruments are called from your code path: Counter (monotonic additive total, e.g. requests served), UpDownCounter (non-monotonic additive value, e.g. queue depth), and Histogram (distribution of values, e.g. request duration). Asynchronous instruments are read on demand through callbacks: ObservableCounter, ObservableUpDownCounter, and ObservableGauge. The observable variants are ideal for values you can sample but not increment, such as memory usage or connection-pool size. The mechanics of recording on each are detailed in recording counters and histograms with OpenTelemetry.
A metric reader collects aggregated state from the instruments and hands it to an exporter. The PeriodicExportingMetricReader runs a background timer thread that collects on a fixed interval and pushes through an OTLPMetricExporter. Because collection runs off-thread, recording on synchronous instruments stays cheap and non-blocking, which makes it safe inside asyncio request handlers.
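The reason recording stays cheap is that the hot path and the export path are split. Recording only touches in-memory state, while a background thread collects and exports. The toy model below illustrates that split using only the standard library. It is not the SDK's implementation. `ToyPeriodicReader` and its methods are hypothetical, and the model keeps cumulative sums only.

```python
import threading


class ToyPeriodicReader:
    """Toy model of record-in-memory, collect-off-thread. NOT the SDK's code."""

    def __init__(self, export, interval_s=15.0):
        self._export = export          # callable receiving a {attributes: total} snapshot
        self._interval_s = interval_s
        self._lock = threading.Lock()
        self._totals = {}
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def add(self, value, attributes=()):
        # Hot path: a short critical section, no I/O.
        key = tuple(sorted(dict(attributes).items()))
        with self._lock:
            self._totals[key] = self._totals.get(key, 0) + value

    def collect(self):
        # Snapshot under the lock, export outside it so recorders never wait on I/O.
        with self._lock:
            snapshot = dict(self._totals)
        self._export(snapshot)

    def _run(self):
        # Event.wait doubles as an interruptible sleep between collections.
        while not self._stop.wait(self._interval_s):
            self.collect()

    def start(self):
        self._thread.start()

    def shutdown(self):
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()
        self.collect()  # final flush, analogous to MeterProvider.shutdown()
```

In the real SDK, `PeriodicExportingMetricReader` plays this role. The design point carries over: exporters can be slow, so they run outside the recording lock.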
A View is an optional transformation applied between an instrument and its aggregation. Views rename metrics, drop or allow specific attribute keys (the single most effective cardinality control), or override the aggregation, most importantly to set explicit histogram bucket boundaries.
Temporality
Temporality decides what a counter value means at export time. Under cumulative temporality each export carries the running total since the start of the process; under delta temporality each export carries only the change since the previous collection. Cumulative is robust to dropped exports because the next export re-states the total; delta is lighter and suits backends that recompute rates per interval. You set a preference per instrument kind on the exporter.
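The two temporalities carry the same information when no export is lost, and a few lines of arithmetic show the difference when one is. This plain-Python sketch uses hypothetical helper names. It ignores counter resets and is not how the SDK converts internally.

```python
from itertools import accumulate


def cumulative_to_delta(cumulative):
    """Per-interval changes from successive running totals (no resets assumed)."""
    previous = 0
    deltas = []
    for total in cumulative:
        deltas.append(total - previous)
        previous = total
    return deltas


def delta_to_cumulative(deltas):
    """Running totals from per-interval changes."""
    return list(accumulate(deltas))


# Three export intervals in which 10, 4 and 6 requests were served.
deltas = [10, 4, 6]
cumulative = delta_to_cumulative(deltas)
print(cumulative)                             # [10, 14, 20]
print(cumulative_to_delta(cumulative))        # [10, 4, 6]

# If the second export is lost, the last cumulative point still states the
# true total, while summing the surviving deltas undercounts.
print(cumulative[-1], deltas[0] + deltas[2])  # 20 16
```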
The choice is not cosmetic: it changes how the receiving backend computes rates and how it handles process restarts. A Prometheus-style store expects monotonic cumulative series and derives rates by differencing scrapes, so feeding it delta data produces nonsensical counters. A hosted OTLP endpoint that aggregates server-side often prefers delta, because each process restart resets a cumulative counter to zero and forces the backend to detect the reset. Pick one preference, encode it on the exporter, and keep it consistent across every service that writes to the same backend, otherwise dashboards mix two interpretations of the same metric name. The export hop and its temporality flag are covered end to end in exporting OTLP metrics to the Collector.
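The restart problem can be made concrete. The sketch below is a simplified version of the heuristic Prometheus-style stores apply to cumulative counters: any drop in value is treated as a restart from zero. `increase_with_resets` is a hypothetical helper, and real backends also extrapolate to the edges of the query window.

```python
def increase_with_resets(samples):
    """Total increase across cumulative counter samples, treating any drop as a restart.

    After a reset the counter restarted from zero, so the post-reset sample
    itself is the increase for that step.
    """
    total = 0
    for previous, current in zip(samples, samples[1:]):
        total += current - previous if current >= previous else current
    return total


# The process restarts between the 3rd and 4th sample (cumulative 30 -> 5).
samples = [10, 20, 30, 5, 12]
print(increase_with_resets(samples))  # 10 + 10 + 5 + 7 = 32
```

Without the reset rule, the step from 30 to 5 would contribute -25. That is the kind of nonsense a backend produces when it mistakes a restart for a decrease.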
Aggregation and Views
Between an instrument and its export sits an aggregation. Counters use a sum aggregation, histograms use an explicit-bucket aggregation, and gauges use a last-value aggregation; these defaults are applied automatically. A View overrides any of them for a matched instrument. The three highest-value uses are:

- pinning histogram bucket boundaries to your real latency profile;
- allow-listing attribute keys to cap the number of time series an instrument can produce;
- renaming or dropping an instrument you do not control.

Views do not short-circuit. Every View that matches an instrument produces its own output stream, and only an instrument that matches no View falls back to the defaults. A wildcard View that also matches an instrument you configured specifically therefore yields two streams for that instrument. Keep each View's selection criteria disjoint, for example by narrowing a broad View with instrument_type.
Step-by-step implementation
Step 1 — Build the Resource. Resource attributes are the top-level dimensions every metric is grouped by. Use semantic conventions for service identity.
```python
import os

from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes

resource = Resource.create({
    ResourceAttributes.SERVICE_NAME: os.getenv("SERVICE_NAME", "checkout-service"),
    ResourceAttributes.SERVICE_VERSION: os.getenv("SERVICE_VERSION", "3.1.0"),
    ResourceAttributes.DEPLOYMENT_ENVIRONMENT: os.getenv("DEPLOYMENT_ENV", "production"),
})
```
Step 2 — Configure the OTLP exporter with a temporality preference. The gRPC exporter targets the Collector. The temporality preference maps each instrument kind to delta or cumulative.
```python
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import Counter, Histogram, ObservableGauge
from opentelemetry.sdk.metrics.export import AggregationTemporality

# Delta for counters and histograms. Gauges report last values, so their
# temporality has no practical effect; instrument kinds not listed here keep
# the exporter's default, which is cumulative out of the box.
temporality = {
    Counter: AggregationTemporality.DELTA,
    Histogram: AggregationTemporality.DELTA,
    ObservableGauge: AggregationTemporality.CUMULATIVE,
}

exporter = OTLPMetricExporter(
    endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "otel-collector:4317"),
    insecure=False,  # TLS on; use True only for a plaintext local Collector
    timeout=10,      # seconds
    preferred_temporality=temporality,
)
```
Step 3 — Wrap the exporter in a periodic reader. The interval is the cadence at which all instruments are collected and observable callbacks fire.
```python
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

reader = PeriodicExportingMetricReader(
    exporter,
    export_interval_millis=15000,  # collect + export every 15 s
    export_timeout_millis=10000,
)
```
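Step 4 — Define Views for aggregation control. These Views pin explicit latency buckets on the duration histogram and allow-list bounded attributes, so high-cardinality keys such as a user ID are dropped before aggregation.

```python
from opentelemetry.sdk.metrics import Counter
from opentelemetry.sdk.metrics.view import ExplicitBucketHistogramAggregation, View

# Pin latency buckets (ms) and keep only bounded attributes on the duration histogram.
latency_view = View(
    instrument_name="http.server.duration",
    aggregation=ExplicitBucketHistogramAggregation(
        boundaries=[5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000]
    ),
    attribute_keys={"http.method", "http.route"},
)
# Allow-list bounded attributes on every Counter; any other key (e.g. a user ID) is dropped.
# Selecting by instrument_type instead of instrument_name="*" keeps this View from also
# matching the histogram, which would then be exported a second time with default buckets.
drop_user_id = View(instrument_type=Counter, attribute_keys={"http.method", "http.route"})
```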
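The boundaries are easier to interpret once you know which bucket a value lands in. OpenTelemetry explicit buckets are upper-inclusive: bucket i covers (boundaries[i-1], boundaries[i]], with one extra overflow bucket at the end. `bisect_left` implements exactly that rule. This stdlib sketch, with hypothetical helper names, reproduces the bucket counts you should expect for a given set of recordings.

```python
from bisect import bisect_left

# Boundaries from latency_view above (milliseconds).
BOUNDARIES = [5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000]


def bucket_index(value, boundaries=BOUNDARIES):
    """Index of the explicit bucket that counts `value`.

    Bucket i covers (boundaries[i-1], boundaries[i]]; the first bucket is
    (-inf, boundaries[0]] and the last is (boundaries[-1], +inf).
    """
    return bisect_left(boundaries, value)


def bucket_counts(values, boundaries=BOUNDARIES):
    counts = [0] * (len(boundaries) + 1)  # one more bucket than boundaries
    for value in values:
        counts[bucket_index(value, boundaries)] += 1
    return counts


print(bucket_counts([4.0, 5.0, 42.5, 42.5, 7000.0]))
# [2, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1]
```

A value exactly on a boundary, such as 5.0 or 50, counts in the bucket that boundary closes. That matters when latencies cluster at round numbers.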
Step 5 — Construct and register the MeterProvider. Pass the resource, the reader, and the Views, then set it globally.
```python
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider

provider = MeterProvider(
    resource=resource,
    metric_readers=[reader],
    views=[latency_view, drop_user_id],
)
metrics.set_meter_provider(provider)
```
Step 6 — Acquire a Meter and instruments. Name the Meter after the module so the instrumentation scope is meaningful.
```python
meter = metrics.get_meter(__name__, "3.1.0")

request_counter = meter.create_counter(
    "http.server.request.count",
    unit="{request}",
    description="Total HTTP requests handled",
)
latency_hist = meter.create_histogram(
    "http.server.duration",
    unit="ms",
    description="HTTP server request duration",
)
```
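A common way to feed latency_hist is to time a block of code. The context manager below is a hypothetical helper, not part of the SDK. It records milliseconds to match `unit="ms"`. It duck-types the histogram, so anything with a `record(value, attributes)` method works, which also makes it easy to test with a fake.

```python
import time
from contextlib import contextmanager


@contextmanager
def record_duration_ms(histogram, attributes=None):
    """Time the enclosed block and record the duration in milliseconds.

    `histogram` is anything with record(value, attributes), such as latency_hist.
    Recording happens in `finally`, so a block that raises is still timed.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        histogram.record(elapsed_ms, attributes or {})
```

To use it, wrap a handler body in `with record_duration_ms(latency_hist, {"http.route": "/checkout"}):`. Failed requests are then timed alongside successful ones instead of silently disappearing from the distribution.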
Configuration reference
| Object / parameter | Purpose | Typical value |
|---|---|---|
| `Resource.create({...})` | Service identity attached to every metric | `service.name`, `service.version`, `deployment.environment` |
| `MeterProvider(metric_readers=...)` | Root config; owns readers and Views | one provider per process |
| `get_meter(name, version)` | Instrumentation scope on exported data | `__name__`, package version |
| `PeriodicExportingMetricReader(export_interval_millis=...)` | Collection cadence | 15000–60000 ms |
| `export_timeout_millis` | Per-export deadline | 10000 ms |
| `OTLPMetricExporter(endpoint=...)` | gRPC target | `otel-collector:4317` |
| `OTLPMetricExporter(insecure=...)` | Disable TLS (dev only) | `False` in production |
| `preferred_temporality={...}` | Delta vs cumulative per instrument kind | delta for counters/histograms |
| `View(aggregation=ExplicitBucketHistogramAggregation(...))` | Custom histogram buckets | latency boundaries in ms |
| `View(attribute_keys={...})` | Allow-list attributes, dropping the rest | bounded label set |
Async & concurrency considerations
Synchronous instrument calls (add, record) are thread-safe and non-blocking; the SDK aggregates in memory and the export happens on the reader's background thread, so calling request_counter.add(1, {...}) inside an asyncio coroutine never awaits network I/O. Observable callbacks, by contrast, are invoked by the reader thread on each interval, so keep them fast and side-effect free; do not perform blocking I/O or acquire contended locks inside a callback, or you will stall collection for every instrument. If you fork worker processes (Gunicorn, Celery prefork), construct the MeterProvider after the fork so each worker owns its own reader thread and exporter connection; a provider created in the parent shares file descriptors and produces merged or duplicated series. This mirrors the post-fork initialization rule from OpenTelemetry SDK setup for tracing.
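One way to honor the post-fork rule is to build the provider lazily and rebuild it whenever the process ID changes. This is a stdlib-only sketch, and `per_process` is a hypothetical helper. You would call it from a post-fork hook such as Gunicorn's `post_fork` or Celery's `worker_process_init` signal. It only helps if the parent never calls `metrics.set_meter_provider()`: the API refuses to replace a global provider that is already set, and a forked child inherits that state.

```python
import os

_instance = None
_owner_pid = None


def per_process(factory):
    """Return factory()'s result, building it at most once per process.

    A forked child inherits _owner_pid from its parent, sees a different
    os.getpid(), and builds its own instance instead of reusing the parent's.
    """
    global _instance, _owner_pid
    pid = os.getpid()
    if _owner_pid != pid:
        _instance = factory()
        _owner_pid = pid
    return _instance
```

For example, you could call `per_process(build_meter_provider)`, where `build_meter_provider` is your own (hypothetical) function that assembles Steps 1 through 5. Each worker then gets its own reader thread and exporter connection.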
Because each worker exports its own series, the backend receives one data point per worker per instrument-attribute combination. That is correct and desirable: you can sum across workers at query time, and a single misbehaving worker stays visible instead of being averaged away. The cost is that resource attributes alone no longer uniquely identify a series, so include a worker or instance identifier in the resource when you need to disambiguate. Do not reach for a per-request identifier to tell series apart. The right granularity is one series per worker per bounded label set, and the attribute discipline that enforces it is the same one described in recording counters and histograms with OpenTelemetry.
Production code examples
End-to-end: record, collect, and export
This program initializes the full pipeline, registers an observable gauge for connection-pool depth, records on a counter and histogram, and forces a flush so the export is visible immediately.
```python
import time

from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.metrics import CallbackOptions, Observation
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes

# 1. Identity
resource = Resource.create({
    ResourceAttributes.SERVICE_NAME: "checkout-service",
    ResourceAttributes.SERVICE_VERSION: "3.1.0",
})

# 2. Exporter -> reader -> provider (insecure=True: plaintext, local Collector only)
exporter = OTLPMetricExporter(endpoint="otel-collector:4317", insecure=True)
reader = PeriodicExportingMetricReader(exporter, export_interval_millis=15000)
provider = MeterProvider(resource=resource, metric_readers=[reader])
metrics.set_meter_provider(provider)

# 3. Instruments
meter = metrics.get_meter(__name__, "3.1.0")
requests = meter.create_counter("http.server.request.count", unit="{request}")
latency = meter.create_histogram("http.server.duration", unit="ms")


# 4. Observable gauge: the reader calls this on every collection
def read_pool(options: CallbackOptions):
    # A fixed value stands in for a real pool query; keep real callbacks fast.
    yield Observation(7, {"pool": "primary"})


meter.create_observable_gauge("db.pool.in_use", callbacks=[read_pool], unit="{connection}")

# 5. Simulate traffic
for _ in range(50):
    requests.add(1, {"http.route": "/checkout", "http.status_code": 200})
    latency.record(42.5, {"http.route": "/checkout"})
    time.sleep(0.01)

# 6. Flush deterministically and shut down
provider.force_flush()
provider.shutdown()
```
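Expected output (abridged). The program prints nothing itself; this is the payload the Collector receives, shown as OTLP/JSON with timestamps and the SDK's default telemetry.sdk.* resource attributes omitted. No View is registered here, so the histogram uses the SDK's default boundaries, and 42.5 ms lands in the (25, 50] bucket.

```json
{
  "resourceMetrics": [{
    "resource": {
      "attributes": [
        {"key": "service.name", "value": {"stringValue": "checkout-service"}},
        {"key": "service.version", "value": {"stringValue": "3.1.0"}}
      ]
    },
    "scopeMetrics": [{
      "scope": {"name": "__main__", "version": "3.1.0"},
      "metrics": [
        {
          "name": "http.server.request.count",
          "unit": "{request}",
          "sum": {
            "isMonotonic": true,
            "aggregationTemporality": "AGGREGATION_TEMPORALITY_CUMULATIVE",
            "dataPoints": [{
              "asInt": "50",
              "attributes": [
                {"key": "http.route", "value": {"stringValue": "/checkout"}},
                {"key": "http.status_code", "value": {"intValue": "200"}}
              ]
            }]
          }
        },
        {
          "name": "http.server.duration",
          "unit": "ms",
          "histogram": {
            "aggregationTemporality": "AGGREGATION_TEMPORALITY_CUMULATIVE",
            "dataPoints": [{
              "count": "50",
              "sum": 2125.0,
              "min": 42.5,
              "max": 42.5,
              "bucketCounts": ["0", "0", "0", "0", "50", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0"],
              "explicitBounds": [0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000],
              "attributes": [
                {"key": "http.route", "value": {"stringValue": "/checkout"}}
              ]
            }]
          }
        },
        {
          "name": "db.pool.in_use",
          "unit": "{connection}",
          "gauge": {
            "dataPoints": [{
              "asInt": "7",
              "attributes": [{"key": "pool", "value": {"stringValue": "primary"}}]
            }]
          }
        }
      ]
    }]
  }]
}
```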
Console export for local debugging
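When you cannot reach a Collector, swap the OTLP exporter for the console exporter, which prints each collection to stdout as JSON. This is the SDK's own serialization rather than OTLP/JSON, so field names differ from the Collector payload above. Everything else in the pipeline is identical.

```python
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

# Drop-in replacement for the OTLP reader in the end-to-end example.
reader = PeriodicExportingMetricReader(
    ConsoleMetricExporter(),
    export_interval_millis=5000,
)
```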
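Expected output (abridged). The exporter pretty-prints each collection; timestamps, descriptions, exemplars, and the SDK's default resource attributes are omitted here.

```
{"resource_metrics": [{"resource": {"attributes": {"service.name": "checkout-service", "service.version": "3.1.0"}},
  "scope_metrics": [{"scope": {"name": "__main__", "version": "3.1.0"},
    "metrics": [{"name": "http.server.request.count", "unit": "{request}", "data":
      {"data_points": [{"attributes": {"http.route": "/checkout", "http.status_code": 200}, "value": 50}],
       "is_monotonic": true}}]}]}]}
```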
Common mistakes
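No MeterProvider configured, so metrics silently disappear. Until set_meter_provider() is called, metrics.get_meter() returns a proxy Meter whose instruments discard every measurement. Once a provider is registered, those proxies forward to it, but anything recorded earlier is lost; if registration never happens, nothing is ever exported. Root cause: instruments were created and used at import time, before bootstrap ran, or bootstrap never ran at all. Remediation: build and register the provider first, then acquire Meters; in frameworks, do it in a startup hook.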
Observable callback raises and the metric vanishes. The SDK catches an exception raised inside a callback, logs it, and skips that instrument's data points for the collection. Root cause: a failing external lookup (network error, missing key), or a callback that returns a single Observation instead of an iterable of them. Remediation: make callbacks pure and fast, yield or return an iterable of Observation objects, and serve external values from a cache refreshed elsewhere.
Histogram buckets look wrong or are the defaults. A View set the buckets but the metric name in the View did not match the instrument. Root cause: instrument_name mismatch (typo or wrong casing). Remediation: set instrument_name to the exact instrument name, or match with a * wildcard plus instrument_type.
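Last batch never arrives. A short-lived script exits before the first export interval elapses, so the final collection never runs. Root cause: no flush on shutdown. The provider's atexit hook covers a normal interpreter exit, but it does not run when the process is killed by a signal or exits through os._exit(). Remediation: call provider.force_flush() and provider.shutdown() explicitly before exit; long-running services rely on the periodic interval instead.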
Exploding series count. Per-request unique values (user IDs, full URLs) become attributes and create one time series each. Root cause: unbounded attribute cardinality. Remediation: use a View with attribute_keys to allow only bounded labels, the same discipline applied to recording counters and histograms with OpenTelemetry.
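For the "last batch never arrives" case, a small context manager makes the flush hard to forget in short-lived jobs. `flushed` is a hypothetical helper, not an SDK API. It relies only on `MeterProvider.force_flush()` and `shutdown()` and runs them in a `finally` clause, so a failing job still exports what it recorded. Like any `finally`, it does not help if the process is killed outright.

```python
from contextlib import contextmanager


@contextmanager
def flushed(provider, timeout_millis=10_000):
    """Yield `provider`, then force_flush() and shutdown() it even if the body raises.

    Intended for scripts and batch jobs that may finish before the first
    export interval elapses.
    """
    try:
        yield provider
    finally:
        provider.force_flush(timeout_millis=timeout_millis)
        provider.shutdown()
```

For example, `with flushed(provider): run_batch_job()` records everything the job emits, where `run_batch_job` stands for your own job function.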
Frequently Asked Questions
When should I use delta temporality instead of cumulative?
Use delta temporality when your backend expects per-interval values, such as hosted OTLP endpoints that aggregate server-side. Use cumulative when the data ends up in Prometheus, whether scraped or sent by remote write, or in any store that tracks monotonic totals. Cumulative also tolerates dropped exports without losing the running sum.
Do I need a separate Meter per module?
Not strictly, but it is the recommended pattern: get one Meter per instrumentation scope, typically named after the module or library via __name__. The scope name and version appear on exported metrics and let you attribute data to the code that produced it.
Why are my observable gauge callbacks never called?
Observable callbacks run only when a metric reader collects. For PeriodicExportingMetricReader, that means once per export interval, plus any explicit force_flush(). If the process exits before the first interval elapses, or the callback was never passed to an instrument through callbacks=[...], nothing is collected. Lower the interval or call force_flush() before shutdown.
Can I change histogram buckets after the SDK is running?
Bucket boundaries are fixed when the MeterProvider is constructed, through a View with an explicit bucket histogram aggregation. The SDK offers no way to swap Views on a running provider, so changing them means building a new provider. The global provider can only be set once per process, so in practice that means restarting the process with the new View.