Structured Logging with the Python Standard Library: Zero-Dependency JSON Output

You need machine-parseable JSON logs but do not want to add a logging dependency to the service. This guide is part of the Python Logging Fundamentals and Structured Data reference and the formatter configuration guide, and it walks through the one technique that solves the problem with nothing but the standard library: subclass logging.Formatter, serialize the LogRecord into a deterministic dictionary, and wire it through logging.config.dictConfig.

LogRecord to OTel-aligned JSON Selected LogRecord attributes on the left map to standardized JSON field names on the right that match OpenTelemetry log conventions. LogRecord levelname levelno created getMessage() exc_info JSON fields severity_text severity_number timestamp message exception
The formatter maps raw record attributes onto OpenTelemetry-aligned JSON field names.

Prerequisites

This technique uses only modules that ship with CPython, so there is nothing to install.

# No third-party packages required; Python 3.10+ recommended.
python --version
# Optional: set a service name that the formatter can read from the environment.
export SERVICE_NAME="payment-service"

Implementation

Step 1 — Subclass logging.Formatter and override format(). Build a dictionary from the attributes you want on the wire. Select fields explicitly rather than dumping record.__dict__, which carries internal bookkeeping (relativeCreated, msg, args, levelno) and any attribute a third-party library may have attached. An explicit projection is the only way to keep the schema stable enough for a query layer to depend on, and it keeps the serialized line small. Note that the base class's format() also fills record.message and record.asctime as a side effect when called; overriding format() outright, as here, skips that machinery and is both faster and free of the local-timezone trap that formatTime carries.

Step 2 — Map fields to OpenTelemetry conventions. Emit severity_text from record.levelname and severity_number from record.levelno so collectors interpret the record without custom ingestion rules. Convert the timestamp from record.created to RFC 3339 with datetime.fromtimestamp(..., tz=timezone.utc). Aligning the numeric severity here is the same translation described in log levels and severity mapping.

Step 3 — Handle exceptions and extra fields safely. Call self.formatException(record.exc_info) and store the result as a single escaped string. The traceback is multi-line, and a raw newline in the byte stream breaks any consumer that splits on \n to find record boundaries, so either let json.dumps escape it inside a string value or replace \n explicitly as shown. The base class caches its rendering in record.exc_text after the first call, so two handlers sharing a record do not re-render the traceback twice. Merge caller-supplied context from a dedicated extra_fields key, guarded with if k not in log_obj, so user data can extend the payload but never overwrite the reserved schema.

Step 4 — Wire the formatter through dictConfig and propagate context. Reference the formatter by its callable path, attach it to a StreamHandler, and read trace identifiers from contextvars so async tasks emit correlated records. Avoid basicConfig() in production; declarative configuration fails fast on a bad class path.

import json
import logging
import logging.config
import asyncio
from contextvars import ContextVar
from datetime import datetime, timezone

# Async-safe context: each task and thread gets its own copy.
trace_id_var: ContextVar[str] = ContextVar("trace_id", default="")
span_id_var: ContextVar[str] = ContextVar("span_id", default="")


class OTelJSONFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        log_obj = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "severity_text": record.levelname,       # OTel severity_text
            "severity_number": record.levelno,       # OTel severity_number (Python scale)
            "logger": record.name,
            "message": record.getMessage(),
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno,
            "trace_id": trace_id_var.get(),
            "span_id": span_id_var.get(),
        }
        # Store the traceback as one escaped string so line-delimited parsers stay intact.
        if record.exc_info and record.exc_info[0]:
            log_obj["exception"] = self.formatException(record.exc_info).replace("\n", "\\n")
        # Merge caller context without clobbering reserved keys.
        extra = getattr(record, "extra_fields", {})
        log_obj.update({k: v for k, v in extra.items() if k not in log_obj})
        return json.dumps(log_obj, default=str, sort_keys=True)


LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,  # keep third-party loggers alive
    "formatters": {"otel_json": {"()": "__main__.OTelJSONFormatter"}},
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "otel_json", "level": "INFO"}
    },
    "root": {"level": "INFO", "handlers": ["console"]},
}
logging.config.dictConfig(LOGGING_CONFIG)


async def simulate_request():
    # Populate context as if extracted from a W3C traceparent header at ingress.
    trace_id_var.set("4bf92f3577b34da6a3ce929d0e0e4736")
    span_id_var.set("00f067aa0ba902b7")
    logger = logging.getLogger("payment.service")
    logger.info("Transaction processed", extra={"extra_fields": {"amount": 49.99, "currency": "USD"}})
    try:
        raise ValueError("Invalid payment gateway response")
    except Exception:
        logger.exception("Payment routing failed")


if __name__ == "__main__":
    asyncio.run(simulate_request())

Why subclassing beats string templates

The base logging.Formatter interpolates a %-style template such as "%(levelname)s %(message)s", which can fake JSON by writing a template full of quotes and braces. That approach breaks the moment a message or a field value contains a quote, a backslash, or a newline, because string interpolation does no escaping. Subclassing and serializing with json.dumps is the only stdlib technique that produces valid JSON for arbitrary input, because the serializer escapes every value at the boundary. The default=str argument is the second half of that guarantee: it coerces a datetime, Decimal, or UUID that would otherwise raise TypeError into a string instead of crashing the formatter mid-line.

Two ways to attach extra fields

There are two complementary mechanisms for getting application context into a record, and they suit different scopes. The extra= keyword on a single call carries data that changes per statement; the logging module copies each key onto the LogRecord as an attribute, so the formatter reads it back with getattr. Nest it under a single key — the example uses extra_fields — so user data can never collide with a reserved attribute like message or args, which would raise KeyError. A LoggerAdapter, by contrast, binds a fixed dict to a logger so every call from that component carries the same context without repeating it.

import logging

logger = logging.getLogger("payment.service")
# Component-scoped context: every call from this adapter carries worker_id.
worker = logging.LoggerAdapter(logger, {"extra_fields": {"worker_id": "w-3"}})
# Per-call context layered on top for this one statement.
worker.info("charge settled", extra={"extra_fields": {"amount": 12.0}})

Expected Output:

{"amount": 12.0, "message": "charge settled", "severity_text": "INFO", "worker_id": "w-3", ...}

Injecting request context with contextvars

The formatter reads trace_id_var.get() and span_id_var.get() at serialization time, which works because a ContextVar is copied per asyncio task and per thread. Set the values once at ingress — typically in middleware that parses a W3C traceparent header — and every record emitted anywhere downstream in that task picks them up with no call-site plumbing. The critical discipline is to reset the variable when the request ends, using the token returned by set(), so a worker that handles many requests on one thread never bleeds one request's trace ID into the next.

def begin(trace_id: str, span_id: str):
    # Return tokens so the caller can reset cleanly in a finally block.
    return trace_id_var.set(trace_id), span_id_var.set(span_id)


def end(tokens):
    t_tok, s_tok = tokens
    trace_id_var.reset(t_tok)
    span_id_var.reset(s_tok)

Expected Output:

# After end(tokens), the next record outside the request scope logs trace_id ""
{"message": "idle", "trace_id": "", "span_id": "", ...}

For the deeper rationale on why contextvars beats thread-local storage and how the record factory fits in, see formatter configuration, and route the formatted output through a non-blocking sink as described in handler architecture.

Configuration options

Option Location Effect
() dictConfig formatter Callable path that builds the formatter
disable_existing_loggers dictConfig root Keep (False) or silence (True) pre-existing loggers
sort_keys json.dumps Stable field ordering for diffs and golden tests
default=str json.dumps Coerce datetime, Decimal, UUID to strings
extra_fields logger call extra= Per-record context merged into the payload
handler level dictConfig handler Per-sink threshold independent of logger level

Verification

Run the module with python app.py. Each line is a single JSON object; piping through python -m json.tool confirms it parses.

Expected Output (two newline-delimited JSON records):

{"amount": 49.99, "currency": "USD", "function": "simulate_request", "line": 53, "logger": "payment.service", "message": "Transaction processed", "module": "__main__", "severity_number": 20, "severity_text": "INFO", "span_id": "00f067aa0ba902b7", "timestamp": "2026-06-19T10:30:00+00:00", "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"}
{"exception": "Traceback (most recent call last):\\n ...\\nValueError: Invalid payment gateway response", "function": "simulate_request", "line": 58, "logger": "payment.service", "message": "Payment routing failed", "module": "__main__", "severity_number": 40, "severity_text": "ERROR", "span_id": "00f067aa0ba902b7", "timestamp": "2026-06-19T10:30:00+00:00", "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"}

A quick assertion confirms the contract in a unit test:

import json, logging


def test_emits_valid_json():
    rec = logging.LogRecord("t", logging.INFO, __file__, 1, "hi", None, None)
    line = OTelJSONFormatter().format(rec)
    parsed = json.loads(line)            # raises if not valid JSON
    assert parsed["severity_number"] == 20
    assert parsed["severity_text"] == "INFO"


def test_extra_fields_and_context():
    # Context set in this task must surface on the record.
    trace_id_var.set("4bf92f3577b34da6a3ce929d0e0e4736")
    rec = logging.LogRecord("t", logging.INFO, __file__, 1, "ok", None, None)
    rec.extra_fields = {"amount": 5}     # simulate extra= injection
    parsed = json.loads(OTelJSONFormatter().format(rec))
    assert parsed["amount"] == 5
    assert parsed["trace_id"] == "4bf92f3577b34da6a3ce929d0e0e4736"
    # Reserved keys must win over user-supplied collisions.
    rec.extra_fields = {"message": "spoofed"}
    parsed = json.loads(OTelJSONFormatter().format(rec))
    assert parsed["message"] == "ok"     # formatter guards reserved keys

Common mistakes

Returning the serialized string only inside the if record.exc_info block. The formatter returns None for normal records, which surfaces as null output or a TypeError in the handler. Make return json.dumps(...) the final, unconditional statement of format().

Serializing non-primitive objects directly. A bare json.dumps on a dict containing a datetime raises TypeError: Object of type datetime is not JSON serializable. Pass default=str or convert values before serialization.

Using module-level globals for request context. Global dictionaries leak data across concurrent requests in WSGI and ASGI servers. Replace them with contextvars, which isolate state per task and per thread.

Overwriting reserved fields from extra=. Passing a reserved attribute name directly, as in extra={"message": "x"}, raises KeyError: "Attempt to overwrite 'message' in LogRecord" because the logging module refuses to clobber its own attributes. Nest caller data under one namespaced key such as extra_fields and merge it with a k not in log_obj guard so user values can never shadow the reserved schema.

Forgetting to reset a ContextVar after the request. A set() without a matching reset() leaves the previous request's trace ID live on a reused worker thread, so an idle or cross-request log line inherits a stale identifier. Capture the token returned by set() and call reset(token) in a finally block at the request boundary.

Frequently Asked Questions

Can I use the standard library for structured logging in async applications?

Yes. Pair logging with contextvars to propagate trace identifiers across event loops without thread-local storage. Because each asyncio task gets its own copy of the context, records stay correctly attributed even under concurrent I/O.

How do I handle multi-line exception tracebacks in JSON logs?

Render the traceback with formatException and store it as a single string field. Escape embedded newlines, or split the text into a list field, so line-delimited log parsers do not break the record across multiple lines.

Does this approach impact performance compared to third-party libraries?

The overhead is negligible. A json.dumps call on a small dictionary adds roughly 50 to 100 microseconds per record, and avoiding an extra dependency keeps the import graph and memory footprint smaller. Third-party libraries are a convenience, not a performance requirement.

Should I pass extra fields with extra= or a LoggerAdapter?

Use extra= for per-call data that varies each log statement, and a LoggerAdapter when a fixed set of context applies to every call from a component. The adapter injects its dict into every record so call sites do not repeat it, while extra= keeps one-off values local to the statement.