Sampling Strategies for Distributed Tracing in Python OpenTelemetry

Sampling decides which traces you keep. The wrong configuration either floods your backend and inflates cost or silently drops the one error trace you needed at 3 a.m. This guide gives exact OpenTelemetry Python SDK configurations for head-based, parent-based, and tail-based sampling, and shows how they interact. It is a focused companion to Span Lifecycle and Attributes and sits within the broader Distributed Tracing and OpenTelemetry in Python guide; for the provider bootstrap these examples assume, see OpenTelemetry SDK Setup.

[Figure: Head vs tail sampling, showing where the keep-or-drop decision happens. HEAD path: Python SDK (head sampler, drops 90% at origin) -> backend. TAIL path: Python SDK (always on) -> Collector (tail policy) -> backend.]
Head sampling drops at the SDK before export; tail sampling keeps everything to the Collector and decides after the full trace assembles.

Prerequisites

pip install \
  "opentelemetry-api>=1.30.0,<2.0.0" \
  "opentelemetry-sdk>=1.30.0,<2.0.0"

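Head sampling can also be driven by environment variables that the SDK reads when the TracerProvider is constructed, which is the recommended way to change rates per environment without touching code. They apply only when no sampler is passed explicitly; a provider built with sampler=... ignores them: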

export OTEL_TRACES_SAMPLER="parentbased_traceidratio"
export OTEL_TRACES_SAMPLER_ARG="0.1"   # 10% of root traces

How Sampling Fits the Trace Lifecycle

Sampling answers one question — keep this trace or drop it — but where and when you answer it changes everything. Head sampling answers at the trace origin, the instant the root span is created, using only what is known then: the trace ID, the span name, and the attributes passed at creation. It is cheap and predictable, and because the decision is encoded in the sampled bit of the traceparent header, it propagates for free to every downstream service. The cost is blindness: at span start you do not yet know whether the request will error or run slow, so you cannot preferentially keep the traces you most want.
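
To make the "sampled bit" concrete, here is a minimal standard-library sketch that reads it from a W3C traceparent header. The function name is hypothetical and the parser handles only the version 00 layout; in practice the SDK's propagator does this for you.

```python
def traceparent_is_sampled(header: str) -> bool:
    """Return True when a W3C traceparent header carries the sampled flag."""
    # Version 00 layout: version-traceid-parentid-flags
    parts = header.strip().split("-")
    if len(parts) < 4:
        raise ValueError(f"malformed traceparent: {header!r}")
    _version, trace_id, parent_id, flags = parts[:4]
    if len(trace_id) != 32 or len(parent_id) != 16 or len(flags) != 2:
        raise ValueError(f"malformed traceparent: {header!r}")
    # Bit 0 of the trace-flags byte is the sampled flag.
    return bool(int(flags, 16) & 0x01)


kept = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
dropped = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00"
print(traceparent_is_sampled(kept), traceparent_is_sampled(dropped))
```

Because the flag travels in the header itself, a downstream service learns the edge's decision without any extra coordination.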

Tail sampling answers after the trace completes, in the Collector, where the full set of spans, their statuses, and their latencies are all visible. That visibility is the whole point — keep every error and every slow path, sample the rest — but it requires buffering complete traces in memory and routing every span of a trace to the same Collector instance. Between the two sits parent-based sampling, which is not a third policy so much as the rule that makes head sampling coherent across services: a child honors its parent's decision so a trace is never half-kept. Most production deployments combine all three: ParentBased(ALWAYS_ON) at the edge to capture full traces, and a Collector tail_sampling policy to do the actual keep-or-drop on outcome. Pure head probability is the right choice only when Collector cost or operational complexity rules out tail sampling.

Implementation

Step 1 — Pick the head sampler. TraceIdRatioBased(rate) makes a deterministic decision from the trace ID, so the same trace ID yields the same keep-or-drop outcome in every service configured with the same rate. That determinism is what keeps a trace whole: if the edge keeps it, downstream services computing the same ratio keep it too.
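
The determinism can be illustrated without the SDK. The sketch below is an assumption-laden stand-in, not the SDK's code: it compares the low 64 bits of the trace ID against a bound derived from the rate, which is the general shape of a ratio sampler, and the function and constant names are hypothetical.

```python
import random

LOW_64_BITS = (1 << 64) - 1  # this sketch only looks at the low 64 bits


def keep_trace(trace_id: int, rate: float) -> bool:
    """Deterministic keep-or-drop decision derived purely from the trace ID."""
    bound = round(rate * (LOW_64_BITS + 1))
    return (trace_id & LOW_64_BITS) < bound


rng = random.Random(7)
trace_ids = [rng.getrandbits(128) for _ in range(50_000)]

# Two "services" evaluating the same IDs at the same rate always agree.
edge = [keep_trace(t, 0.1) for t in trace_ids]
downstream = [keep_trace(t, 0.1) for t in trace_ids]
print("identical decisions:", edge == downstream)
print("observed keep rate:", sum(edge) / len(edge))
```

A side effect of the threshold form is that a service running a lower rate keeps a subset of what a higher-rate service keeps, so mismatched rates thin a trace rather than scrambling it.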

Step 2 — Wrap it in ParentBased. ParentBased inspects the incoming span context. If a remote parent already decided to sample, the child honors that decision; only when there is no parent does it fall back to the root sampler. This is the correct default for every service except the very edge. Re-deciding sampling on a child service is the fastest way to fragment a trace, because some spans are dropped while others from the same trace are kept.

ParentBased actually exposes four delegate slots beyond the root: remote_parent_sampled, remote_parent_not_sampled, local_parent_sampled, and local_parent_not_sampled. The defaults are sensible — honor whatever the parent decided — but the slots let you, for example, force-sample whenever a remote parent was sampled while applying a probability to unsampled remote parents. In practice the only reason to touch them is to recover a fraction of unsampled traces for a specific high-value service; for everything else the bare ParentBased(root=...) form is correct. The determinism of TraceIdRatioBased is the other half of why this works: because the decision is a pure function of the 128-bit trace ID, an edge service and a service three hops downstream computing the same ratio against the same trace ID reach the same answer even if the parent flag were somehow lost.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# 10% of root traces; downstream services inherit this decision
sampler = ParentBased(root=TraceIdRatioBased(0.1))
provider = TracerProvider(sampler=sampler)
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

Step 3 — Add a custom sampler when business logic must override probability. Implement the Sampler interface and return a SamplingResult. Keep should_sample free of I/O; it runs on the request thread, and a blocking call there shows up directly in P99 latency. A custom sampler also sees only the attributes passed at span creation through the attributes= argument — anything added later with set_attribute is invisible to the decision — so route, tenant, or any other sampling key must be supplied at start_as_current_span() time, as covered in Span Lifecycle and Attributes. The sampler below always keeps critical routes and otherwise defers to a probabilistic delegate.

from opentelemetry import trace
from opentelemetry.sdk.trace.sampling import (
    Sampler, SamplingResult, Decision, ParentBased, TraceIdRatioBased,
)
from opentelemetry.trace import SpanKind


class PriorityRouteSampler(Sampler):
    """Force-sample critical routes; delegate everything else to a 10% sampler."""

    def __init__(self):
        self._fallback = ParentBased(root=TraceIdRatioBased(0.1))

    def should_sample(self, parent_context, trace_id, name,
                      kind=SpanKind.INTERNAL, attributes=None,
                      links=None, trace_state=None) -> SamplingResult:
        # Only attributes passed at span creation are visible here.
        route = str((attributes or {}).get("http.route", ""))
        if route.startswith("/api/critical"):
            # Carry the parent's tracestate forward instead of dropping it.
            parent_span_context = trace.get_current_span(
                parent_context).get_span_context()
            return SamplingResult(
                Decision.RECORD_AND_SAMPLE, attributes,
                parent_span_context.trace_state)
        return self._fallback.should_sample(
            parent_context, trace_id, name, kind, attributes, links, trace_state)

    def get_description(self) -> str:
        return "PriorityRouteSampler{fallback=parentbased_traceidratio:0.1}"

Step 4 — Add tail sampling in the Collector for outcome-based retention. The SDK cannot tail-sample because it decides at span start, before latency or status is known. Head sampling trades coverage for cost: at 10% you pay a tenth of the storage but you also throw away 90% of your error traces, because an error is no more likely to be sampled than a success. Tail sampling inverts that trade — keep every error and slow trace, sample the boring successful ones — at the cost of buffering whole traces in the Collector until the decision window closes. Run the Collector with a tail_sampling processor, and crucially set the SDK head sampler to ParentBased(ALWAYS_ON) so the Collector receives complete traces to evaluate.

The decision_wait window is the parameter that most often bites teams. It must be longer than your slowest realistic trace, because the processor evaluates a trace once decision_wait has elapsed from the first span it sees. Set it too short and the decision is made before a slow trace's late spans arrive, so the status and latency policies judge an incomplete trace and can drop exactly the trace you wanted. Set it too long and the in-memory num_traces buffer fills, evicting traces before they are decided. Size num_traces to roughly expected_new_traces_per_sec × decision_wait with headroom, and remember that tail sampling is stateful per Collector instance: a load-balanced fleet must route all spans of a trace to the same instance, usually via a loadbalancing exporter keyed on trace ID, or each instance will see only part of the trace.
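
The sizing rule is simple arithmetic. The helper below is a hypothetical convenience, and the headroom factor of 2 is an assumed default, not a value taken from the Collector documentation.

```python
import math


def suggest_num_traces(new_traces_per_sec: float,
                       decision_wait_s: float,
                       headroom: float = 2.0) -> int:
    """Estimate the buffer size: traces in flight during the wait, plus headroom."""
    in_flight = new_traces_per_sec * decision_wait_s
    return math.ceil(in_flight * headroom)


# 100 new traces/s held for a 10 s decision window, doubled for bursts.
print(suggest_num_traces(100, 10))  # prints 2000
```

Re-run the estimate whenever traffic or decision_wait changes, since the buffer requirement scales linearly with both.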

processors:
  tail_sampling:
    decision_wait: 10s              # window to assemble a full trace
    num_traces: 50000               # in-memory trace buffer
    expected_new_traces_per_sec: 100
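    policies:                       # a trace is kept if any policy matches
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow
        type: latency
        latency: {threshold_ms: 500}
      - name: baseline              # sample the remaining traces; 10 is illustrative
        type: probabilistic
        probabilistic: {sampling_percentage: 10}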

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp]
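
The keep-or-drop logic that tail sampling applies can be sketched in a few lines of Python. This is a toy model of the policy idea, not the Collector's implementation: TraceSummary, tail_keep, baseline_rate, and the simulated traffic mix are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class TraceSummary:
    duration_ms: float
    has_error: bool


def tail_keep(t: TraceSummary, threshold_ms: float = 500.0,
              baseline_rate: float = 0.1, rng=random) -> bool:
    """Keep every error and slow trace; sample the rest at baseline_rate."""
    if t.has_error or t.duration_ms >= threshold_ms:
        return True
    return rng.random() < baseline_rate


rng = random.Random(1)
traces = [
    TraceSummary(duration_ms=rng.uniform(5, 400), has_error=rng.random() < 0.02)
    for _ in range(10_000)
]
errors = [t for t in traces if t.has_error]

tail_kept_errors = sum(tail_keep(t, rng=rng) for t in errors)
head_kept_errors = sum(rng.random() < 0.1 for _ in errors)  # blind 10% head sampling
print(f"errors: {len(errors)}, tail kept: {tail_kept_errors}, head kept: {head_kept_errors}")
```

Tail sampling retains every error trace in this model, while blind head sampling at the same overall rate keeps only about a tenth of them.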

Configuration Options

| Mechanism | Where it runs | Decision basis | Cost lever |
| --- | --- | --- | --- |
| TraceIdRatioBased | SDK | Trace ID value, fixed rate | Network + storage at origin |
| ParentBased | SDK | Upstream sampled flag | Preserves trace integrity |
| ALWAYS_ON / ALWAYS_OFF | SDK | Unconditional | Full volume or none |
| Custom Sampler | SDK | Span name / attributes | CPU on request thread |
| tail_sampling | Collector | Full-trace status, latency | Collector memory + CPU |

The matching env vars for the built-in head samplers are OTEL_TRACES_SAMPLER (always_on, always_off, traceidratio, parentbased_always_on, parentbased_always_off, parentbased_traceidratio) and OTEL_TRACES_SAMPLER_ARG for the ratio.

Verification

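Run a provider configured with sampler=PriorityRouteSampler() against a critical route and inspect the console export. A sampled span prints; a dropped one produces no output and reports False from is_recording(). With the plain 10% head sampler from Step 2, the same span would record only about one time in ten.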

with trace.get_tracer(__name__).start_as_current_span(
    "checkout", attributes={"http.route": "/api/critical"}
) as span:
    print("recording:", span.is_recording())

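Expected output (span JSON abbreviated; the console exporter also prints timestamps, events, links, and resource, and the batch processor emits it on flush or shutdown):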

recording: True
{
  "name": "checkout",
  "context": {"trace_id": "0x7b8a...", "span_id": "0x1234...", "trace_state": "[]"},
  "kind": "SpanKind.INTERNAL",
  "parent_id": null,
  "status": {"status_code": "UNSET"},
  "attributes": {"http.route": "/api/critical"}
}

To confirm rates in aggregate, enable debug logging for the sampling module and cross-check retention against your configured probability over a representative window:

import logging
logging.getLogger("opentelemetry.sdk.trace.sampling").setLevel(logging.DEBUG)

A measured keep rate that drifts far from the configured ratio usually means a child service is re-deciding sampling instead of inheriting the parent decision.

Common Mistakes

Mixing head and tail sampling without coordination. Error signature: fragmented trace graphs and trace-ID lookups returning partial or no spans. Remediation: when tail sampling, set the edge SDK to ParentBased(ALWAYS_ON) so the Collector receives whole traces; never head-drop upstream of a tail policy.

Re-deciding sampling in a child service. Error signature: traces with missing middle hops, orphaned spans whose parent was never exported, and sampled flags that disagree between services for the same trace ID. Remediation: use ParentBased everywhere except the edge, and never re-initialize the TracerProvider with a different root sampler downstream. Honoring the inbound decision depends on correct context propagation and baggage.

Blocking I/O inside a custom sampler. Error signature: P99 latency spikes and thread-pool exhaustion under load. Remediation: keep should_sample to attribute lookups only, cache anything derived, and target sub-100-microsecond evaluation; never call a database or HTTP service from it.

Frequently Asked Questions

Does OpenTelemetry Python support dynamic sampling rate changes without restarts?

Not in the built-in samplers, which are fixed at provider construction. You can implement a custom sampler whose rate is refreshed from Redis or etcd by a background thread, never from inside should_sample, or move the decision to the Collector's tail sampling, where policies can be changed without redeploying the application.

How does ParentBased sampling handle a missing parent context?

It delegates to its configured root sampler, typically TraceIdRatioBased, so trace origins still get consistent baseline coverage while in-flight traces inherit the upstream decision.

Can I sample based on HTTP status codes in Python?

Not in the head-based SDK, which decides before the response exists. Use the Collector's tail_sampling status_code policy, or a custom sampler that inspects request attributes available at span start.

Will head sampling at ten percent break tail sampling?

Yes. Head sampling drops spans before the Collector ever sees them, so tail policies can only act on the ten percent that survived. Set the edge sampler to ParentBased(ALWAYS_ON) when you rely on tail sampling.