Async Tracing Patterns in Python
Tracing an asyncio service is mostly a context-management problem: spans must follow coroutines across await points and create_task boundaries without leaking into unrelated work, and export must never stall the event loop. This guide collects the production patterns that keep async traces intact, and it is part of the Distributed Tracing and OpenTelemetry in Python guide. It builds on OpenTelemetry SDK setup for provider initialization and links out to two focused tasks: tracing SQLAlchemy async queries and instrumenting aiohttp client requests.
Three priorities run through every pattern below: keep context isolated per asyncio task so spans nest correctly, keep export off the event loop so latency stays honest, and correlate logs with the active span so a trace and its logs can be joined later. Get these right and async tracing behaves identically to synchronous tracing from the backend's perspective.
Prerequisites
Install the SDK and the gRPC OTLP exporter with pinned ranges so the SDK and exporter stay compatible.
pip install \
    "opentelemetry-sdk>=1.30.0,<2.0.0" \
    "opentelemetry-exporter-otlp-proto-grpc>=1.30.0,<2.0.0"
Point the SDK at a collector and name the service before the process starts.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317"
export OTEL_SERVICE_NAME="async-api"
Concept and architecture
OpenTelemetry's active span lives in a contextvars.ContextVar, not thread-local storage. This is the single most important fact for async tracing: contextvars is the only context mechanism that asyncio understands. When you call asyncio.create_task(), the runtime copies the current context into the new task, so a span started before the task becomes the parent of any span started inside it. asyncio.gather() schedules tasks that already carry that copied context, which is why concurrent children correctly nest under one parent without manual plumbing.
How contextvars copy-on-create actually works
A ContextVar is not a global variable; it is a key into a Context object, and exactly one Context is "current" per logical execution at any moment. Reading the active span calls ContextVar.get() against the current context, and start_as_current_span writes a new value with ContextVar.set(), which returns a Token. The token is what lets the span's context manager restore the prior value on exit, so spans nest as a stack rather than overwriting one another.
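The stack behavior can be reproduced with the standard library alone. The following is a minimal sketch, assuming a plain ContextVar named current_span as a stand-in for the SDK's active-span slot; current_span and use_span are illustrative names, not OpenTelemetry APIs.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical stand-in for the SDK's active-span slot.
current_span = contextvars.ContextVar("current_span", default=None)


@contextmanager
def use_span(name):
    token = current_span.set(name)   # bind the new value and keep the token
    try:
        yield name
    finally:
        current_span.reset(token)    # restore whatever was bound before


seen = []
with use_span("outer"):
    seen.append(current_span.get())
    with use_span("inner"):
        seen.append(current_span.get())
    seen.append(current_span.get())  # back to "outer" after the inner exit
seen.append(current_span.get())      # back to the default

print(seen)  # ['outer', 'inner', 'outer', None]
```

Because each exit restores the token taken on entry, the bindings unwind in reverse order, which is the property that makes spans nest.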
The crucial mechanic is that Context snapshots are copy-on-create, not copy-on-write and not shared by reference. contextvars.copy_context() produces a shallow copy of the current context's variable-to-value mapping. Mutating a ContextVar inside that copy with set() rebinds the key only in the copy; the original context is untouched. This is exactly why two asyncio tasks spawned from the same parent can each start their own child span without clobbering each other: each task received its own snapshot at create_task time, and each set() lands in a private mapping. The shared part is the value a key pointed to at copy time, but since a span object is immutable as far as the context is concerned (you replace the binding, you do not mutate the previous span), there is no cross-task interference.
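The per-task snapshot can be observed directly. This sketch again assumes a plain ContextVar (current_span, an illustrative name) in place of the SDK's active span: two sibling tasks rebind the same variable, yield to each other, and each still reads its own value afterward.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the SDK's active-span slot.
current_span = contextvars.ContextVar("current_span", default="root")


async def child(name, seen):
    current_span.set(name)           # lands in this task's private copy
    await asyncio.sleep(0)           # yield so the sibling runs in between
    seen[name] = current_span.get()  # still this task's own value


async def main():
    seen = {}
    current_span.set("parent")
    await asyncio.gather(
        asyncio.create_task(child("a", seen)),
        asyncio.create_task(child("b", seen)),
    )
    seen["parent_after"] = current_span.get()  # untouched by the children
    return seen


result = asyncio.run(main())
print(result)  # {'a': 'a', 'b': 'b', 'parent_after': 'parent'}
```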
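copy_context().run(func, *args) is the explicit form of the same machinery. It enters the copied context, calls func with that context current, and restores the previous context on return. Anything func sets stays inside the copy. This holds for synchronous callables only: calling a coroutine function through run() merely creates the coroutine object inside the copy, and its body still executes under the context of whichever task awaits it. To start a coroutine from a deliberately frozen snapshot, pass the snapshot to asyncio.create_task(coro, context=ctx), available on Python 3.11+.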
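The behavior of Context.run with synchronous and asynchronous callables is easy to check with the standard library. In this sketch, var, sync_mutate, and async_mutate are illustrative names; the point is to show where each set() lands.

```python
import asyncio
import contextvars

var = contextvars.ContextVar("var")


def sync_mutate():
    var.set("changed in copy")


async def async_mutate():
    var.set("changed by coroutine")


async def main():
    var.set("original")

    # Synchronous callable: the set() stays inside the copied context.
    ctx = contextvars.copy_context()
    ctx.run(sync_mutate)
    after_sync = (var.get(), ctx[var])

    # Coroutine function: run() only covers creating the coroutine object.
    # The body executes when awaited, under the awaiting task's context.
    ctx2 = contextvars.copy_context()
    await ctx2.run(async_mutate)
    after_async = (var.get(), ctx2[var])
    return after_sync, after_async


after_sync, after_async = asyncio.run(main())
print(after_sync)   # ('original', 'changed in copy')
print(after_async)  # ('changed by coroutine', 'original')
```

The second pair shows the mutation escaping into the caller while the copy stays untouched, which is the opposite of isolation.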
The corollary is that thread-local context is wrong here. Every coroutine on an event loop runs on the loop's single thread, so thread-local storage is shared by all of them and cannot tell one logical task from another. Anything that stashes the active span in thread-local state leaks it to whichever coroutine runs after the next await. The event loop, by contrast, swaps the current Context in and out as it switches between ready tasks, so each coroutine always resumes with the context it suspended under.
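A short standard-library comparison makes the difference concrete. This is a sketch with illustrative names (tls, cv, handler): both handlers store their name in a thread-local and in a ContextVar, yield once, then read both back.

```python
import asyncio
import contextvars
import threading

tls = threading.local()                     # the wrong tool for asyncio
cv = contextvars.ContextVar("active_span")  # the mechanism the SDK relies on


async def handler(name, seen):
    tls.span = name
    cv.set(name)
    await asyncio.sleep(0)  # the other handler runs on the same thread
    seen[name] = (tls.span, cv.get())


async def main():
    seen = {}
    await asyncio.gather(handler("a", seen), handler("b", seen))
    return seen


result = asyncio.run(main())
print(result)  # {'a': ('b', 'a'), 'b': ('b', 'b')}
```

Handler "a" reads back "b" from the thread-local because "b" overwrote it while "a" was suspended; the ContextVar value is unaffected.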
create_task, gather, and TaskGroup span parenting
asyncio.gather(*coros) wraps any bare coroutine in a task via ensure_future, and every task copies the context at the instant it is created. The practical consequence is timing: the parent context is captured when create_task/gather runs, not when the child coroutine first executes. If you start a parent span, then gather, the children nest correctly. If you gather first and start the parent span afterward, the children captured the earlier (parentless) context and will float as roots.
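The timing rule can be demonstrated without the SDK. In this sketch a plain ContextVar (current_span, an illustrative name) stands in for the active span, and setting it plays the role of starting the parent span.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the SDK's active-span slot.
current_span = contextvars.ContextVar("current_span", default=None)


async def child():
    await asyncio.sleep(0)
    return current_span.get()  # whatever was current when the task was created


async def main():
    early = asyncio.create_task(child())  # created before the "parent span"
    current_span.set("parent")
    late = asyncio.create_task(child())   # created after it
    return await asyncio.gather(early, late)


result = asyncio.run(main())
print(result)  # [None, 'parent']
```

The early task never sees the parent even though its body runs after the set() call, because the snapshot was taken at creation.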
asyncio.TaskGroup (Python 3.11+) behaves identically for context capture — each tg.create_task(...) snapshots the current context — but adds structured-concurrency semantics that interact with tracing in a useful way. The group's async with block does not exit until every child task completes, so a span opened around the TaskGroup reliably outlives all its children, and a child raising an exception cancels its siblings. Those cancellations surface as CancelledError inside the sibling spans, which you should record (set the span status to error or add an event) rather than swallow, so a partial fan-out failure is visible in the trace rather than appearing as silently truncated children.
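The record-then-re-raise shape for cancellations can be sketched with the standard library. Here the events list is a hypothetical stand-in for span events, and the explicit cancel() call imitates what a TaskGroup does to siblings when one child fails; in real code the except branch would call span.set_status or span.add_event instead of appending to a list.

```python
import asyncio

events = []  # hypothetical stand-in for span events


async def worker(name, delay):
    try:
        await asyncio.sleep(delay)
        events.append((name, "ok"))
    except asyncio.CancelledError:
        events.append((name, "cancelled"))  # record it on the span...
        raise                               # ...then let cancellation proceed


async def main():
    fast = asyncio.create_task(worker("fast", 0))
    slow = asyncio.create_task(worker("slow", 10))
    await fast
    slow.cancel()  # imitates TaskGroup cancelling a sibling
    await asyncio.gather(slow, return_exceptions=True)


asyncio.run(main())
print(events)  # [('fast', 'ok'), ('slow', 'cancelled')]
```

Re-raising matters: swallowing CancelledError would leave the task running and hide the cancellation from the group.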
Export has its own constraint. The exporter performs network I/O, and you must never do that on the event loop. The BatchSpanProcessor solves this by buffering finished spans in an in-memory queue and draining them from a dedicated background thread. Span creation and ending are cheap, synchronous, non-blocking operations on the loop; the slow gRPC write happens elsewhere. The background thread holds no event-loop reference and never awaits anything, which is precisely why it can block on the gRPC socket without affecting request latency.
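The queue-plus-worker arrangement can be modeled in a few lines. The following is a toy model under stated assumptions, not the SDK's implementation: ToyBatchProcessor and its parameters are hypothetical, spans are plain strings, and the export callable stands in for the gRPC write.

```python
import queue
import threading
import time


class ToyBatchProcessor:
    """Toy model of queue-plus-worker export; not the SDK's implementation."""

    def __init__(self, export, max_queue_size=2048, batch_size=2):
        self._queue = queue.Queue(maxsize=max_queue_size)
        self._export = export
        self._batch_size = batch_size
        self.dropped = 0
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_end(self, span):
        # Called from the event-loop thread: never blocks, drops when full.
        try:
            self._queue.put_nowait(span)
        except queue.Full:
            self.dropped += 1

    def _drain(self):
        batch = []
        while len(batch) < self._batch_size:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        if batch:
            self._export(batch)  # the slow call happens off the loop
        return bool(batch)

    def _run(self):
        while not self._stop.is_set():
            if not self._drain():
                time.sleep(0.01)  # idle wait between drains

    def shutdown(self):
        self._stop.set()
        self._worker.join()
        while self._drain():  # flush whatever is still queued
            pass


exported = []
proc = ToyBatchProcessor(exported.append)
for i in range(5):
    proc.on_end(f"span-{i}")
proc.shutdown()

flat = [span for batch in exported for span in batch]
print(flat)  # ['span-0', 'span-1', 'span-2', 'span-3', 'span-4']
```

The producer side only ever touches put_nowait, so the cost on the loop is constant; the trade-off is that a full queue drops spans rather than applying backpressure.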
Step-by-step implementation
- Initialize the provider once at import time. Build the TracerProvider and attach a BatchSpanProcessor before any coroutine runs. Provider setup is synchronous and belongs at module scope, following the OpenTelemetry SDK setup workflow.
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(
    OTLPSpanExporter(endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT")),
    max_export_batch_size=512,
    schedule_delay_millis=2000,
    max_queue_size=2048,
))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)
- Use the context manager for every span. start_as_current_span attaches the span to the context on enter and detaches it on exit, even when the body raises. It is a regular context manager, so use a plain with statement; awaiting inside its body is fine. Never call span.end() by hand inside a coroutine; an early await that raises will leak the span otherwise.
import asyncio
from opentelemetry.trace import Status, StatusCode

async def process_item(item: dict) -> None:
    with tracer.start_as_current_span("process_item") as span:
        span.set_attribute("item.id", item["id"])
        await asyncio.sleep(0.1)  # non-blocking I/O; loop stays free
        span.set_status(Status(StatusCode.OK))
- Let create_task carry the parent context. Start a parent span, then spawn the tasks inside it. Each task inherits the copied context, so every child nests under the parent on the same trace.
async def main() -> None:
    items = [{"id": 1}, {"id": 2}, {"id": 3}]
    with tracer.start_as_current_span("batch") as parent:
        parent.set_attribute("batch.size", len(items))
        # Each task copies the current context, so children nest under "batch"
        tasks = [asyncio.create_task(process_item(i)) for i in items]
        await asyncio.gather(*tasks)
Expected Output (collector spans):
batch trace_id=a1b2... span_id=1111... parent_id=null
process_item trace_id=a1b2... span_id=2222... parent_id=1111... item.id=1
process_item trace_id=a1b2... span_id=3333... parent_id=1111... item.id=2
process_item trace_id=a1b2... span_id=4444... parent_id=1111... item.id=3
- Prefer TaskGroup for structured fan-out. On Python 3.11+, asyncio.TaskGroup gives the same context-copy behavior as create_task but guarantees the parent span outlives every child and propagates failures. Record cancellations on the affected spans so a partial failure is legible in the trace.
async def process_all(items: list[dict]) -> None:
    with tracer.start_as_current_span("batch") as parent:
        parent.set_attribute("batch.size", len(items))
        try:
            async with asyncio.TaskGroup() as tg:
                # Each create_task snapshots the current context here,
                # so every child nests under "batch".
                for item in items:
                    tg.create_task(process_item(item))
        except* Exception as eg:  # one sibling failed; the rest were cancelled
            parent.set_status(Status(StatusCode.ERROR))
            parent.record_exception(eg.exceptions[0])
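Expected Output (collector spans, illustrative; assumes the item.id=2 task raises after item 1 has finished and while item 3 is still running):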
batch parent_id=null status=ERROR
process_item parent_id=<batch> item.id=1 status=OK
process_item parent_id=<batch> item.id=2 status=ERROR
process_item parent_id=<batch> item.id=3 status=ERROR (cancelled)
- Carry context into thread-pool work. loop.run_in_executor runs the target on a worker thread that does not inherit the calling task's context, so the active span is invisible there. Capture the context first and run the blocking function through ctx.run so any span it opens nests under the caller. This is the correct pattern for wrapping a synchronous, blocking library you cannot make async.
import asyncio
import contextvars
import functools

def blocking_lookup(key: str) -> str:
    # Runs on a worker thread; ctx.run restored the parent span here.
    with tracer.start_as_current_span("blocking_lookup") as span:
        span.set_attribute("cache.key", key)
        # _legacy_sync_client is a placeholder for your blocking library.
        return _legacy_sync_client.get(key)  # blocking, but off the loop

async def lookup(key: str) -> str:
    loop = asyncio.get_running_loop()
    ctx = contextvars.copy_context()  # snapshot includes the active span
    # functools.partial binds args; ctx.run makes the snapshot current
    # inside the worker thread before blocking_lookup executes.
    return await loop.run_in_executor(
        None, lambda: ctx.run(functools.partial(blocking_lookup, key))
    )
Expected Output:
blocking_lookup parent_id=<caller span> cache.key=user:42
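- Isolate branches that must not share context. Every task already runs in its own copy, so mutations never flow back to the caller. When a branch must start from an explicit snapshot, pass that snapshot to create_task (Python 3.11+). A copied context still contains the active span, so spans opened in the branch keep nesting under the caller; to start a fresh root instead, pass an empty OpenTelemetry Context to start_as_current_span, as the audit task later in this guide does.
import contextvars

async def isolated(item: dict) -> None:
    ctx = contextvars.copy_context()
    # ctx.run(process_item, item) would only create the coroutine inside ctx;
    # context= (Python 3.11+) makes the task body run under the snapshot.
    await asyncio.create_task(process_item(item), context=ctx)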
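For the thread-pool step above, the three ways of reaching a worker thread can be compared with the standard library alone. This sketch assumes a plain ContextVar (current_span, an illustrative name) in place of the active span and a standard CPython build; asyncio.to_thread (Python 3.9+) is included because it propagates the current context on its own.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the SDK's active-span slot.
current_span = contextvars.ContextVar("current_span", default=None)


def blocking_read():
    return current_span.get()  # executes on a worker thread


async def main():
    current_span.set("caller")
    loop = asyncio.get_running_loop()

    # Bare executor call: the worker does not see the task's context.
    bare = await loop.run_in_executor(None, blocking_read)

    # Explicit snapshot: ctx.run makes the copy current on the worker.
    ctx = contextvars.copy_context()
    wrapped = await loop.run_in_executor(None, ctx.run, blocking_read)

    # asyncio.to_thread copies the current context automatically.
    via_to_thread = await asyncio.to_thread(blocking_read)
    return bare, wrapped, via_to_thread


result = asyncio.run(main())
print(result)  # (None, 'caller', 'caller')
```

When the default executor is sufficient, asyncio.to_thread is the shorter option; the explicit ctx.run form remains necessary for a custom executor.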
Configuration reference
| Setting | Default | Effect when tuned |
|---|---|---|
| max_export_batch_size | 512 | Larger batches cut export calls but raise per-flush latency and memory. |
| schedule_delay_millis | 5000 | Lower for fresher data under steady load; higher to coalesce bursts. |
| max_queue_size | 2048 | Caps in-memory spans; spans are dropped once full under a traffic spike. |
| export_timeout_millis | 30000 | Bounds how long a single flush waits on the collector. |
| OTEL_TRACES_SAMPLER | parentbased_always_on | Set to parentbased_traceidratio for head sampling on busy services. |
| OTEL_TRACES_SAMPLER_ARG | (none) | Sampling ratio, e.g. 0.1, when using the ratio sampler. |
Async and concurrency considerations
The fastest way to corrupt async traces is to block the event loop inside an open span. A synchronous time.sleep, a blocking DB driver, or a CPU-bound loop holds the loop while the span stays open, so the span records the entire stall as its own latency and every other coroutine starves. Keep blocking work off the loop with await loop.run_in_executor(...) or an async driver, and confine in-span work to genuinely awaitable I/O. Database access is a common offender here; see tracing SQLAlchemy async queries for the async-engine-specific pattern, and instrumenting aiohttp client requests for outbound calls that must propagate the active context downstream.
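The inflation is easy to measure. In this sketch, perf_counter timestamps stand in for span start and end (timed, blocker, and polite are illustrative names): a coroutine that should take about 10 ms is held hostage by a sibling that calls time.sleep.

```python
import asyncio
import time


async def timed(name, work, durations):
    start = time.perf_counter()  # stand-in for span start
    await work()
    durations[name] = time.perf_counter() - start  # stand-in for span end


async def blocker():
    time.sleep(0.2)  # blocks the whole event loop


async def polite():
    await asyncio.sleep(0.01)  # should take about 10 ms


async def main():
    durations = {}
    await asyncio.gather(
        timed("polite", polite, durations),
        timed("blocker", blocker, durations),
    )
    return durations


durations = asyncio.run(main())
print({name: round(value, 2) for name, value in durations.items()})
```

Both recorded durations come out near 0.2 seconds: the polite coroutine cannot resume until the blocking call returns control to the loop, so its measured latency absorbs the stall.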
Sampling decisions also matter more under high concurrency because each span you keep costs queue space. Apply a parent-based ratio sampler at the SDK to drop most traces cheaply at creation, then lean on the collector for tail sampling so slow or errored traces survive regardless of the head decision. The tradeoffs are covered in depth alongside span lifecycle and attributes tuning. Set attribute limits strictly: unbounded attributes on high-volume spans inflate every payload the background thread ships.
Manual context handling needs discipline. If you call context.attach() directly rather than using the span context manager, you own the matching context.detach(token) and it must run in a finally block. A missed detach leaves the span active for the rest of that task and for every task it creates afterward, producing spans with the wrong parent.
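The effect of a missed detach can be shown with ContextVar.set and reset, which follow the same token discipline. This is a sketch with illustrative names (current_span, careful, leaky, read), not OpenTelemetry calls.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the SDK's active-span slot.
current_span = contextvars.ContextVar("current_span", default=None)


async def careful():
    token = current_span.set("scoped")
    try:
        await asyncio.sleep(0)
    finally:
        current_span.reset(token)  # mirrors context.detach(token)


async def leaky():
    current_span.set("leaked")  # token discarded, never reset
    await asyncio.sleep(0)


async def read():
    return current_span.get()


async def main():
    await careful()
    after_careful = current_span.get()
    await leaky()  # awaited directly, so it shares this task's context
    after_leaky = current_span.get()
    inherited = await asyncio.create_task(read())  # copies the leaked value
    return after_careful, after_leaky, inherited


result = asyncio.run(main())
print(result)  # (None, 'leaked', 'leaked')
```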
Background tasks are the subtlest case. A fire-and-forget asyncio.create_task(do_work()) that you never await still captures the current context at creation, so its spans nest under whatever was active at that line — which is rarely what you want once the originating request span has closed. The task may outlive the request, leaving a child span whose parent has already ended; backends render this as an orphaned or out-of-order span. Two remedies apply. If the background work is genuinely independent, sever the link deliberately by starting it under a copied context and opening a fresh root span inside the task, so it stands alone with its own trace. If it is logically part of the request, either keep the request span open until the task completes (a TaskGroup enforces this) or carry an explicit link via trace.Link to the originating span so the relationship is recorded without implying parent-child timing. Whichever you choose, hold a reference to every background task — a task with no live reference can be garbage-collected mid-flight, which both loses its spans and raises a "Task was destroyed but it is pending" warning.
Fan-out amplifies a second hazard: shared session and connection objects. A single aiohttp.ClientSession or AsyncEngine reused across gathered tasks is correct and recommended, but each per-request span must still be opened inside its own task so the instrumentation reads that task's context, not the parent's. Creating the client span outside the task and merely issuing the request inside it produces a single client span shared by every branch instead of one per call. The outbound and database cases each have a dedicated walkthrough — instrumenting aiohttp client requests and tracing SQLAlchemy async queries — and both depend on this per-task context rule to attribute spans correctly under concurrency.
Production code examples
This end-to-end example correlates logs with the active span so a trace and its log lines can be joined downstream. The formatter reads the current span context inside format, which works across await boundaries because the active span lives in contextvars. Emitting the W3C-formatted trace_id and span_id lets aggregators join logs to traces on deterministic keys, the same correlation strategy used for adding trace IDs to log records.
import logging
import json
from opentelemetry import trace

class AsyncTraceFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        ctx = trace.get_current_span().get_span_context()
        # contextvars makes the active span visible across awaits
        if ctx.is_valid:
            record.trace_id = format(ctx.trace_id, "032x")
            record.span_id = format(ctx.span_id, "016x")
        else:
            record.trace_id = "0" * 32
            record.span_id = "0" * 16
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": record.trace_id,
            "span_id": record.span_id,
        })

logger = logging.getLogger("async_app")
logger.setLevel(logging.INFO)
_handler = logging.StreamHandler()
_handler.setFormatter(AsyncTraceFormatter())
logger.addHandler(_handler)

async def handle() -> None:
    with tracer.start_as_current_span("handle"):
        logger.info("processing request")
Expected Output:
{"level": "INFO", "message": "processing request", "trace_id": "a1b2c3d4e5f60718293a4b5c6d7e8f90", "span_id": "1234567890abcdef"}
The second example shows a clean shutdown path. Because the BatchSpanProcessor queue may hold unflushed spans when the loop stops, force a flush before exit so in-flight spans are not lost.
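async def serve() -> None:
    try:
        await main()
    finally:
        # Drain the background queue before the process exits.
        # force_flush() blocks the calling thread, which is acceptable
        # here because the loop is about to stop.
        trace.get_tracer_provider().force_flush()

if __name__ == "__main__":
    asyncio.run(serve())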
Expected Output:
batch parent_id=null
process_item parent_id=<batch span_id> (x3, all flushed before exit)
The third example is a realistic fan-out: a request handler queries several upstream shards concurrently, then kicks off a fire-and-forget audit task that must not nest under the (soon-to-close) request span. It demonstrates per-task span creation under gather, a deliberately detached background span, and a held reference set so the task is not collected mid-flight.
from opentelemetry.context import Context

_background: set[asyncio.Task] = set()  # keep references alive

async def query_shard(shard: int, key: str) -> dict:
    # Opened inside the task, so the instrumentation reads THIS task's context.
    with tracer.start_as_current_span("query_shard") as span:
        span.set_attribute("db.shard", shard)
        await asyncio.sleep(0.05)  # stand-in for the async driver call
        return {"shard": shard, "key": key}

async def audit(key: str) -> None:
    # Detached: an empty context makes "audit" a fresh root on its own trace,
    # so it never dangles under the request span after that span has ended.
    with tracer.start_as_current_span(
        "audit", context=Context()
    ) as span:
        span.set_attribute("audit.key", key)
        await asyncio.sleep(0.2)

async def handle_request(key: str) -> list[dict]:
    with tracer.start_as_current_span("handle_request") as span:
        span.set_attribute("request.key", key)
        # Per-shard spans each nest under handle_request via gather's copy.
        results = await asyncio.gather(*(query_shard(s, key) for s in range(3)))
        # Fire-and-forget; hold the reference so it is not GC'd mid-flight.
        task = asyncio.create_task(audit(key))
        _background.add(task)
        task.add_done_callback(_background.discard)
        return results
Expected Output (collector spans):
handle_request parent_id=null trace_id=aaaa...
query_shard parent_id=<handle_request> db.shard=0 trace_id=aaaa...
query_shard parent_id=<handle_request> db.shard=1 trace_id=aaaa...
query_shard parent_id=<handle_request> db.shard=2 trace_id=aaaa...
audit parent_id=null audit.key=user:42 trace_id=bbbb...
The audit span surfaces on its own trace_id, so it never appears as a child of an already-closed request span, while the three shard spans nest cleanly under the request.
Common mistakes
Thread-local context in asyncio. Storing the active span in thread-local state fails across await because every coroutine on the loop shares one thread, so a value written by one coroutine is read by whichever coroutine runs next. The result is leaked context and spans attached to the wrong parent. Rely on the SDK's contextvars-based context exclusively.
Blocking I/O inside an open span. A synchronous network call or CPU-bound loop inside start_as_current_span blocks the loop, inflates the span's recorded duration, and starves other coroutines. Move blocking work to an executor or switch to an async driver.
Unbounded span creation in tight loops. Starting a span per iteration of a hot for/while loop without sampling exhausts memory and overflows the exporter queue once max_queue_size is hit, dropping traces silently. Sample at the SDK and only span meaningful units of work.
Forgetting to flush on shutdown. Letting the process exit without force_flush() discards spans still sitting in the batch queue, so the last requests before shutdown vanish from the backend. Flush in a finally block around the event loop.
Losing context across run_in_executor. Offloading blocking work with loop.run_in_executor without first capturing the context means the worker thread starts from an empty context, so any span it opens becomes a parentless root on a new trace. Snapshot with contextvars.copy_context() and invoke the target through ctx.run(...) so the active span travels into the worker.
Dropping references to background tasks. A fire-and-forget task created with asyncio.create_task and not stored anywhere can be garbage-collected before it finishes, which both discards its spans and emits a "Task was destroyed but it is pending" warning. Keep every background task in a module-level set and discard it from a done callback so the reference lives exactly as long as the task does.
Frequently Asked Questions
How do I prevent trace context loss when using asyncio.gather?
asyncio.create_task() copies the current contextvars context into each new task at creation time, and asyncio.gather() wraps bare coroutines in tasks the same way, so spans started before the gather become parents of spans started inside the tasks. Start the parent span before creating the tasks; tasks created earlier captured the earlier context and export as roots. Each task mutates only its own copy, so parallel branches cannot overwrite each other's active span.
Does OpenTelemetry support async exporters natively in Python?
The SDK does not run the exporter on the event loop. BatchSpanProcessor buffers spans in memory and flushes them from a dedicated background thread using the OTLP exporter, so span creation never blocks the loop. Tune schedule_delay_millis and max_queue_size to your traffic so the queue does not fill and drop spans.
What is the recommended sampling strategy for high-throughput async services?
Combine head sampling at the SDK with tail sampling at the collector. A parent-based ratio sampler drops most traces cheaply at creation, while the collector retains slow or errored traces regardless of the head decision, preserving the traces you actually investigate.
Why does using time.sleep instead of asyncio.sleep inflate my span durations?
time.sleep blocks the event loop, so every other coroutine, including the spans they own, is stalled until it returns. The blocked spans stay open and record the wait as their own latency. Use asyncio.sleep or run blocking calls in a thread executor to keep durations honest.
Do I need OpenTelemetry context detach in async code?
Only when you attach context manually. start_as_current_span used as a context manager (a plain with statement, which works inside coroutines) attaches and detaches for you. If you call context.attach() directly, you must call context.detach(token) in a finally block or the span stays active for the rest of that task and any tasks it creates afterward.
How do I keep the active span when offloading work to run_in_executor?
run_in_executor runs the function on a thread-pool worker that does not inherit the calling task's contextvars, so the active span is lost. Capture the context with contextvars.copy_context() before submitting and invoke the function through ctx.run(...), or attach the parent context inside the worker, so the span started there nests under the caller.