Setting up OpenTelemetry in FastAPI: Production-Grade Configuration

Implementing distributed observability in asynchronous Python frameworks requires explicit SDK configuration. Auto-instrumentation wrappers frequently break Starlette's middleware chain. This guide delivers a zero-overhead implementation tailored to FastAPI's event loop architecture.

You will configure explicit provider initialization, enforce W3C Trace Context propagation, and tune batch processors for non-blocking export. The workflow aligns with enterprise-grade Distributed Tracing and OpenTelemetry in Python standards.

Prerequisites & Dependency Pinning

Async context loss occurs when instrumentation packages mismatch Starlette's internal routing layer. Pin exact versions to guarantee middleware compatibility.

pip install \
 opentelemetry-sdk>=1.20.0 \
 opentelemetry-instrumentation-fastapi==0.41b0 \
 opentelemetry-exporter-otlp-proto-grpc==1.20.0

Verify installation integrity before proceeding. Mismatched opentelemetry-api and opentelemetry-sdk versions trigger ImportError during provider initialization.

SDK Initialization & Resource Configuration

Resource metadata must be attached before the TracerProvider instantiates. This establishes service topology mapping across your observability backend. Follow the standardized OpenTelemetry SDK Setup workflow for deterministic provider lifecycles.

Environment variables override programmatic defaults. Always define OTEL_SERVICE_NAME and OTEL_RESOURCE_ATTRIBUTES at deployment time.

export OTEL_SERVICE_NAME="fastapi-backend"
export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=production,team=platform"

Programmatic resource creation ensures fallback values when environment variables are absent. This prevents generic unknown_service labels in trace aggregations.

FastAPI Middleware & Async Span Lifecycle

Automatic HTTP instrumentation captures request boundaries but ignores custom business logic. Attach FastAPIInstrumentor after app instantiation to guarantee middleware intercepts all routes.

Manual span injection preserves async context across dependencies. Use tracer.start_as_current_span() inside route handlers or dependency functions. This maintains parent-child relationships across await boundaries.

W3C Trace Context headers propagate automatically. The middleware extracts traceparent and tracestate from incoming requests. Downstream services receive consistent baggage without manual header parsing.

Exporter Configuration & Sampling Strategy

High-throughput endpoints require controlled sampling to prevent collector saturation. ParentBasedSampler respects upstream sampling decisions while TraceIdRatioBased applies probabilistic filtering.

Configure OTLP gRPC endpoints with explicit TLS settings. Disable insecure=True in production deployments. Set OTEL_EXPORTER_OTLP_TIMEOUT to prevent exporter retries from blocking the event loop.

Tune BatchSpanProcessor queue limits to match your concurrency ceiling. Under backpressure, the processor drops spans gracefully instead of raising SpanExportError exceptions.

Production Code Examples

Complete SDK Bootstrap & FastAPI Attachment

import os
import asyncio
from fastapi import FastAPI, Request
from opentelemetry import trace, context
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.trace import SpanKind

# 1. Resource & Provider Initialization
resource = Resource.create({
 "service.name": os.getenv("OTEL_SERVICE_NAME", "fastapi-backend"),
 "deployment.environment": os.getenv("DEPLOYMENT_ENV", "production")
})

provider = TracerProvider(resource=resource)

# 2. Batch Processor Tuning (Non-blocking)
exporter = OTLPSpanExporter(
 endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "otel-collector:4317"),
 insecure=os.getenv("OTEL_EXPORTER_OTLP_INSECURE", "true").lower() == "true"
)

processor = BatchSpanProcessor(
 exporter,
 max_export_batch_size=512,
 max_queue_size=2048,
 schedule_delay_millis=5000
)
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# 3. FastAPI Attachment
app = FastAPI()
FastAPIInstrumentor.instrument_app(
 app,
 tracer_provider=provider,
 excluded_urls="healthz,metrics,docs"
)

# 4. Async Route with Manual Span Injection
tracer = trace.get_tracer(__name__)

@app.get("/process/{item_id}")
async def process_item(item_id: str, request: Request):
 with tracer.start_as_current_span(
 "process_item_logic",
 kind=SpanKind.INTERNAL,
 attributes={"item.id": item_id}
 ) as span:
 # Simulate async I/O
 await asyncio.sleep(0.05)
 span.set_attribute("processing.status", "completed")
 return {"item_id": item_id, "status": "processed"}

# Graceful Shutdown Hook
@app.on_event("shutdown")
async def shutdown():
 provider.force_flush(timeout_millis=5000)
 provider.shutdown()

Expected Output (Collector Side):

{
 "resourceSpans": [{
 "resource": {
 "attributes": [
 {"key": "service.name", "value": {"stringValue": "fastapi-backend"}},
 {"key": "deployment.environment", "value": {"stringValue": "production"}}
 ]
 },
 "scopeSpans": [{
 "spans": [{
 "name": "process_item_logic",
 "kind": "SPAN_KIND_INTERNAL",
 "attributes": [
 {"key": "item.id", "value": {"stringValue": "12345"}},
 {"key": "processing.status", "value": {"stringValue": "completed"}}
 ]
 }]
 }]
 }]
}

Common Mistakes

CLI Auto-Instrumentation Overrides Async Context

Error Signature: RuntimeWarning: coroutine 'Starlette.__call__' was never awaited or broken parent-child span relationships. Root Cause: opentelemetry-instrument CLI wrappers inject synchronous middleware that bypasses FastAPI's async routing stack. Remediation: Remove CLI execution flags. Implement programmatic FastAPIInstrumentor.instrument_app() after app instantiation. This guarantees Starlette's ASGI middleware chain executes natively.

SimpleSpanProcessor Blocks Event Loop

Error Signature: asyncio.exceptions.TimeoutError during peak load, followed by opentelemetry.sdk.trace.export.SpanExportError: Export timed out. Root Cause: SimpleSpanProcessor executes synchronous HTTP/gRPC calls on every request completion. This blocks the Python event loop and exhausts the thread pool. Remediation: Replace with BatchSpanProcessor. Set max_queue_size to 2x your expected concurrent requests. Configure schedule_delay_millis between 2000-5000ms to amortize network I/O.

Missing Service Topology Metadata

Error Signature: Observability backend displays unknown_service or python as service identifiers. Distributed trace aggregation fails across namespaces. Root Cause: SDK defaults to process executable names when OTEL_SERVICE_NAME is absent. Resource attributes are not explicitly attached to the provider. Remediation: Define OTEL_SERVICE_NAME in container orchestration manifests. Pass a Resource object with service.name during TracerProvider initialization. Verify attributes appear in the first exported span payload.