Python Formatter Configuration for Production Observability
Formatter configuration decides how every log event is serialized before a handler ships it to stdout, a file, or a collector, so it is the single most consequential setting for whether your telemetry is machine-parseable. This guide is part of the Python Logging Fundamentals and Structured Data reference, and it focuses on the standard library's logging.Formatter because that is the layer every other library ultimately renders through. A correctly built formatter produces deterministic JSON, enforces UTC timestamps, and attaches distributed-trace context without forcing application code to pass identifiers by hand. For the zero-dependency serialization details see structured logging with Python standard library, and for the factory-based trace plumbing see adding trace IDs to log records.
The priorities below shape every decision in this guide:
- Balance serialization cost against readability, defaulting to compact JSON in production.
- Enforce a deterministic field set and ordering so SIEM and query layers stay stable.
- Inject trace and span identifiers for correlation without blocking the calling thread.
- Degrade gracefully when optional context attributes are missing.
Prerequisites
The core technique needs only the standard library, which ships with every supported CPython release. The optional accelerator is a maintained JSON logging formatter; pin it so reproducible builds do not drift.
# Python 3.11+ recommended for taskName on LogRecord and faster json
python --version
# Optional, only if you prefer a maintained formatter over a hand-rolled subclass
pip install "python-json-logger>=2.0.7,<4.0.0"
Concept and architecture
A logging.Formatter is invoked by a handler, on the thread that emitted the record, after the level filter has already passed. It receives a fully populated LogRecord and must return a string. Because formatting is synchronous and in-band, every microsecond it spends is added to the latency of the request that logged. That single fact drives the rest of this guide: keep the formatter cheap, keep it deterministic, and resolve any expensive context before the log call rather than inside format().
The record itself is constructed by a factory. By default that factory is logging.LogRecord, but logging.setLogRecordFactory() lets you install a subclass that captures ambient state, such as a trace identifier, at creation time. This is the cleanest place to attach correlation data because it runs once per record and requires no change to call sites. The formatter then simply reads the attribute. Align the emitted severity field with the conventions described in log levels and severity mapping so a single record is interpretable by both human readers and automated routing.
Handlers and formatters are decoupled by design: one formatter can serve many handlers, and you should pre-allocate it once rather than constructing a new instance per record. The handler owns delivery and buffering; for non-blocking delivery to a central sink, route formatted output through the patterns in handler architecture.
Formatter versus filter responsibilities
A recurring design error is to overload the formatter with logic that belongs in a filter. The two have strictly separate jobs, and the logging module runs them at different points in the pipeline. A logging.Filter is consulted on the logger and again on the handler before anything is serialized; its filter(record) method returns a falsy value to drop the record entirely, and it is also the sanctioned place to mutate the record because it can attach attributes that later stages will read. A logging.Formatter is invoked last, by the handler, with the single contract of returning a string. It must never decide whether a record survives, and it must never perform expensive enrichment, because by the time it runs the drop decision is already made and any work it does is unconditional.
This separation has concrete consequences. If you need to suppress health-check noise, scrub a credential, or tag every record with the current request's tenant ID, do it in a filter (or the record factory), not the formatter. Putting a drop condition inside format() is impossible — the method has no way to signal "skip this line" — and putting enrichment there means it runs even for records that a downstream handler at a higher level would have discarded. The table later in this guide treats the record factory as the enrichment site of choice for ambient context because it runs exactly once per record, but a filter is the right tool when the enrichment decision depends on the record's own contents.
A minimal redaction filter shows the boundary clearly: the filter mutates and returns True, and the formatter never knows redaction happened.
import logging
import re
_CARD = re.compile(r"\b\d{13,16}\b")
class RedactCardNumbers(logging.Filter):
"""Mutate the message in place, then approve the record."""
def filter(self, record: logging.LogRecord) -> bool:
record.msg = _CARD.sub("[REDACTED]", record.getMessage())
record.args = () # message is now fully interpolated
return True # never drops; only sanitizes
Expected Output:
{"level":"INFO","message":"charged card [REDACTED] for 42.50"}
Step-by-step implementation
Step 1 — Capture trace context in a record factory. Read the ambient trace and span identifiers from contextvars so the value is correct across both threads and coroutines, and attach them to every record.
import contextvars
import logging
# Async-safe context propagation: contextvars are copied per task and per thread.
trace_id_ctx: contextvars.ContextVar[str | None] = contextvars.ContextVar("trace_id", default=None)
span_id_ctx: contextvars.ContextVar[str | None] = contextvars.ContextVar("span_id", default=None)
_base_factory = logging.getLogRecordFactory()
def context_factory(*args, **kwargs) -> logging.LogRecord:
record = _base_factory(*args, **kwargs)
# Capture at creation time so the value reflects the emitting context, not format() time.
record.trace_id = trace_id_ctx.get()
record.span_id = span_id_ctx.get()
return record
logging.setLogRecordFactory(context_factory)
Step 2 — Subclass logging.Formatter and build a deterministic dictionary. Select only the fields you want on the wire, in a fixed order, and serialize with compact separators.
import json
from datetime import datetime, timezone
class ProductionJSONFormatter(logging.Formatter):
"""Compact, deterministic JSON with UTC timestamps and optional trace context."""
def format(self, record: logging.LogRecord) -> str:
log_obj = {
"timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
"level": record.levelname,
"logger": record.name,
"message": record.getMessage(),
"module": record.module,
"line": record.lineno,
}
# Exception text is rendered once and stored as a single escaped string field.
if record.exc_info and record.exc_info[0] is not None:
log_obj["exception"] = self.formatException(record.exc_info)
# Trace context is present only when a request scope set it.
trace_id = getattr(record, "trace_id", None)
if trace_id:
log_obj["trace_id"] = trace_id
log_obj["span_id"] = getattr(record, "span_id", None)
# default=str protects against Decimal, UUID, datetime in extra fields.
return json.dumps(log_obj, separators=(",", ":"), default=str)
Step 3 — Wire the formatter through dictConfig. Declarative configuration keeps logging setup out of business logic and fails fast on startup if a class path is wrong.
import logging.config
LOGGING_CONFIG = {
"version": 1,
"disable_existing_loggers": False, # keep third-party library loggers alive
"formatters": {
"json": {"()": "__main__.ProductionJSONFormatter"},
},
"handlers": {
"console": {"class": "logging.StreamHandler", "formatter": "json", "level": "INFO"},
},
"root": {"level": "INFO", "handlers": ["console"]},
}
logging.config.dictConfig(LOGGING_CONFIG)
Step 4 — Set context at the request boundary and reset it after. Populate contextvars in middleware so downstream code logs correlated records without passing identifiers around.
def begin_request(trace_id: str, span_id: str):
return trace_id_ctx.set(trace_id), span_id_ctx.set(span_id)
Configuration reference
| Setting | Where | Purpose | Recommended value |
|---|---|---|---|
separators |
json.dumps |
Strips whitespace from JSON output | (",", ":") |
default |
json.dumps |
Coerces non-serializable values | str |
datefmt |
formatter init | Format for formatTime; ignored if you derive timestamps yourself |
unset; use UTC ISO 8601 |
() |
dictConfig formatter | Callable that builds the formatter | "<module>.ProductionJSONFormatter" |
disable_existing_loggers |
dictConfig root | Whether to silence loggers created before config | False |
style |
formatter init | Interpolation style for the base class | "%" (default) |
| record factory | setLogRecordFactory |
Attaches ambient context per record | context_factory |
LogRecord attribute reference
Every formatter is, at bottom, a projection from LogRecord attributes onto wire fields, so knowing which attributes exist — and which are computed lazily — is what lets you build a deterministic schema. The attributes below are populated by the factory for every record; some are cheap to read and some trigger work the first time you touch them.
| Attribute | Type | Meaning | Cost note |
|---|---|---|---|
name |
str | Logger name (dotted hierarchy) | free |
levelname / levelno |
str / int | Symbolic and numeric severity | free |
getMessage() |
method | Message with %/{} args interpolated |
interpolates on call |
created / msecs |
float | Unix epoch seconds and millisecond fraction | free; prefer over asctime |
module / funcName / lineno |
str / str / int | Source location of the call | free |
pathname / filename |
str | Full and base path of the source file | free |
process / processName |
int / str | PID and process name | free |
thread / threadName |
int / str | Thread identity | free |
taskName |
str / None | asyncio task name (Python 3.12+) | free when present |
exc_info / exc_text |
tuple / str | Exception triple and its cached rendering | rendering is lazy |
stack_info |
str / None | Captured call stack when stack_info=True |
only if requested |
args |
tuple / dict | Raw interpolation arguments | free |
Two attributes deserve emphasis. Always call record.getMessage() rather than reading record.msg directly, because msg holds the un-interpolated template and the positional args; only getMessage() applies them, and it does so once. And prefer record.created over the asctime/formatTime machinery: formatTime calls time.localtime by default, which silently bakes the host's local timezone into your timestamps. Deriving the timestamp yourself from record.created with an explicit UTC zone, as the formatter above does, is both faster and correct across regions.
Avoid the temptation to serialize record.__dict__ wholesale. It carries internal bookkeeping attributes (relativeCreated, levelno, msg, args) you almost never want on the wire, and any custom attribute attached by a third-party library leaks into your schema unannounced. An explicit field projection is the only way to keep the output deterministic enough for a SIEM query layer to depend on.
Exception and stack formatting
Exception rendering is where naive JSON formatters most often corrupt a log stream. record.exc_info is the standard three-tuple (type, value, traceback) populated when you call logger.exception(...) or pass exc_info=True. The base logging.Formatter renders it through formatException, which returns a multi-line string identical to what an unhandled exception prints to stderr. That string contains embedded newlines, which are legal inside a JSON string value but fatal to any consumer that splits the stream on \n to find record boundaries — the single most common log-ingestion pipeline design.
There are two correct strategies. The simplest is to let json.dumps escape the rendered traceback as one string field, as the formatter above does: the newlines become \n escape sequences inside the JSON string, so the physical log line stays unbroken while the logical traceback is preserved. The second, useful when your backend can index arrays, is to split the traceback into a list of frame strings so each frame is individually queryable. Either way, the cardinal rule is that the traceback must not introduce a raw newline into the byte stream.
The base class also caches its work: after the first call, formatException stores its result in record.exc_text and reuses it, so two handlers sharing a record do not re-render the traceback twice. If you build your own renderer, respect that cache or you pay the rendering cost per handler. For non-exception diagnostics, stack_info=True on the log call captures the current call stack into record.stack_info; treat it exactly like exc_text, escaping it into a single field. The helper below renders both consistently and degrades to None when neither is present.
import logging
def render_diagnostics(record: logging.LogRecord, fmt: logging.Formatter) -> dict:
"""Return escaped exception/stack fields, or an empty dict if absent."""
out: dict[str, str] = {}
if record.exc_info and record.exc_info[0] is not None:
# formatException caches into record.exc_text after the first call.
out["exception"] = record.exc_text or fmt.formatException(record.exc_info)
if record.stack_info:
out["stack"] = fmt.formatStack(record.stack_info)
return out
Expected Output:
{"exception": "Traceback (most recent call last):\n File \"app.py\", line 49, in charge\n raise ValueError(\"declined\")\nValueError: declined"}
Structured extras
Application code attaches per-record context through the extra= keyword on any logging call. The logging module copies each key of that dict directly onto the LogRecord as an attribute, which is powerful and dangerous in equal measure: logger.info("x", extra={"amount": 42}) makes record.amount available to the formatter, but extra={"message": "x"} raises KeyError because message is a reserved attribute name. There are two robust patterns for consuming extras safely.
The first is to nest everything under a single namespaced key — extra={"fields": {...}} — so the formatter reads getattr(record, "fields", {}) and merges it under a known prefix, eliminating any chance of collision with name, msg, args, levelname, or the other reserved attributes. The second is to diff the record's __dict__ against the set of standard attributes and treat the remainder as user extras; this is what python-json-logger does internally. The first is more explicit and is what production schemas should prefer.
A LoggerAdapter is the right tool when a fixed set of context applies to every call from a component — a worker ID, a queue name — rather than per-call data. The adapter injects its extra dict into every record without the call site repeating it, and unlike the record factory it is scoped to one logger rather than the whole process.
import json
import logging
class ExtrasFormatter(logging.Formatter):
def format(self, record: logging.LogRecord) -> str:
obj = {"level": record.levelname, "message": record.getMessage()}
# Merge namespaced extras without clobbering reserved keys.
obj.update(getattr(record, "fields", {}))
return json.dumps(obj, separators=(",", ":"), default=str)
logger = logging.getLogger("worker")
# Component-scoped fixed context via LoggerAdapter.
bound = logging.LoggerAdapter(logger, {"fields": {"worker_id": "w-7"}})
bound.info("job done", extra={"fields": {"job_id": 1183}})
Expected Output:
{"level":"INFO","message":"job done","worker_id":"w-7","job_id":1183}
Async and concurrency considerations
contextvars are the correct propagation primitive because they are copied per asyncio task and per thread, so a trace identifier set inside one request never leaks into another even under heavy concurrency. Thread-local storage cannot make that guarantee for coroutine code because many coroutines share a thread. Capturing the value in the record factory rather than inside format() matters here too: by the time a handler runs the formatter, the emitting coroutine may have yielded, so reading the context at format time can return the wrong identifier.
Keep the formatter free of I/O. Any network call to resolve context turns a synchronous formatter into a blocking operation on the event loop, which cascades into request timeouts. Resolve everything you need before the log call, and let a non-blocking handler own delivery.
Production code examples
The module below combines the factory, formatter, and dictConfig wiring into one runnable example and exercises it from an asyncio task.
import asyncio
import contextvars
import json
import logging
import logging.config
from datetime import datetime, timezone
trace_id_ctx: contextvars.ContextVar[str | None] = contextvars.ContextVar("trace_id", default=None)
span_id_ctx: contextvars.ContextVar[str | None] = contextvars.ContextVar("span_id", default=None)
_base_factory = logging.getLogRecordFactory()
def context_factory(*args, **kwargs) -> logging.LogRecord:
record = _base_factory(*args, **kwargs)
record.trace_id = trace_id_ctx.get()
record.span_id = span_id_ctx.get()
return record
logging.setLogRecordFactory(context_factory)
class ProductionJSONFormatter(logging.Formatter):
def format(self, record: logging.LogRecord) -> str:
log_obj = {
"timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
"level": record.levelname,
"logger": record.name,
"message": record.getMessage(),
"module": record.module,
"line": record.lineno,
}
if record.exc_info and record.exc_info[0] is not None:
log_obj["exception"] = self.formatException(record.exc_info)
trace_id = getattr(record, "trace_id", None)
if trace_id:
log_obj["trace_id"] = trace_id
log_obj["span_id"] = getattr(record, "span_id", None)
return json.dumps(log_obj, separators=(",", ":"), default=str)
logging.config.dictConfig({
"version": 1,
"disable_existing_loggers": False,
"formatters": {"json": {"()": "__main__.ProductionJSONFormatter"}},
"handlers": {"console": {"class": "logging.StreamHandler", "formatter": "json", "level": "INFO"}},
"root": {"level": "INFO", "handlers": ["console"]},
})
logger = logging.getLogger("observability")
async def handle_payment():
t = trace_id_ctx.set("0af7651916cd43dd8448eb211c80319c")
s = span_id_ctx.set("b7ad6b7169203331")
try:
logger.info("payment authorized", extra={"amount": 42.50})
try:
raise ValueError("gateway declined")
except ValueError:
logger.exception("payment failed")
finally:
trace_id_ctx.reset(t)
span_id_ctx.reset(s)
if __name__ == "__main__":
asyncio.run(handle_payment())
Expected Output:
{"timestamp":"2026-06-19T14:32:01.123456+00:00","level":"INFO","logger":"observability","message":"payment authorized","module":"__main__","line":47,"trace_id":"0af7651916cd43dd8448eb211c80319c","span_id":"b7ad6b7169203331"}
{"timestamp":"2026-06-19T14:32:01.124991+00:00","level":"ERROR","logger":"observability","message":"payment failed","module":"__main__","line":51,"exception":"Traceback (most recent call last):\n File \"...\", line 49, in handle_payment\n raise ValueError(\"gateway declined\")\nValueError: gateway declined","trace_id":"0af7651916cd43dd8448eb211c80319c","span_id":"b7ad6b7169203331"}
If you prefer a maintained library over a hand-written subclass, python-json-logger produces equivalent output with declarative field renaming:
from pythonjsonlogger import jsonlogger
formatter = jsonlogger.JsonFormatter(
"%(asctime)s %(levelname)s %(name)s %(message)s",
rename_fields={"asctime": "timestamp", "levelname": "level"},
)
Expected Output:
{"timestamp": "2026-06-19 14:32:01,123", "level": "INFO", "name": "observability", "message": "payment authorized", "amount": 42.5}
A second production pattern combines a redaction filter, a structured-extras formatter, and a record factory into one configuration so a single declarative block produces sanitized, correlated, deterministic JSON. This is closer to what a real service ships: enrichment in the factory, sanitization in the filter, and serialization in the formatter, each doing exactly one job. Routing this handler through the non-blocking machinery in handler architecture keeps the serialization cost off the request thread.
import json
import logging
import logging.config
import re
from datetime import datetime, timezone
_SECRET = re.compile(r"(api_key=)[A-Za-z0-9]+")
class RedactFilter(logging.Filter):
def filter(self, record: logging.LogRecord) -> bool:
record.msg = _SECRET.sub(r"\1[REDACTED]", record.getMessage())
record.args = ()
return True # sanitize-only, never drops
class ServiceFormatter(logging.Formatter):
def format(self, record: logging.LogRecord) -> str:
obj = {
"timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
"level": record.levelname,
"logger": record.name,
"message": record.getMessage(),
}
obj.update(getattr(record, "fields", {})) # namespaced extras
if record.exc_info and record.exc_info[0] is not None:
obj["exception"] = self.formatException(record.exc_info)
return json.dumps(obj, separators=(",", ":"), default=str)
logging.config.dictConfig({
"version": 1,
"disable_existing_loggers": False,
"filters": {"redact": {"()": "__main__.RedactFilter"}},
"formatters": {"svc": {"()": "__main__.ServiceFormatter"}},
"handlers": {
"console": {
"class": "logging.StreamHandler",
"formatter": "svc",
"filters": ["redact"], # filter runs before the formatter
"level": "INFO",
}
},
"root": {"level": "INFO", "handlers": ["console"]},
})
logging.getLogger("billing").info(
"calling gateway api_key=sk_live_9f2c results pending",
extra={"fields": {"gateway": "stripe", "attempt": 1}},
)
Expected Output:
{"timestamp":"2026-06-19T14:32:01.500000+00:00","level":"INFO","logger":"billing","message":"calling gateway api_key=[REDACTED] results pending","gateway":"stripe","attempt":1}
Common mistakes
Formatter returns None for normal records.
Root cause: the return json.dumps(...) statement sits inside an if record.exc_info: block, so non-exception records fall through and the method implicitly returns None.
Remediation: make the serialization the final, unconditional statement of format(); the exception field is added conditionally, but the return is not.
Local-clock timestamps break cross-region aggregation.
Root cause: datetime.now() or formatTime without a timezone emits naive local time, so records from different regions sort and bucket incorrectly.
Remediation: derive the timestamp from record.created with tz=timezone.utc and emit ISO 8601, as shown above.
Unescaped control characters corrupt the JSON stream.
Root cause: embedding a raw multi-line traceback or a user-supplied newline directly in a value breaks line-delimited parsers that split on \n.
Remediation: let json.dumps escape values, and store exception text as one string field rather than printing it across lines.
Constructing a new formatter per record.
Root cause: building the formatter inside a hot path or per-handler-call multiplies allocation pressure under load.
Remediation: instantiate the formatter once at configuration time and reuse it; dictConfig already does this for you.
Reading trace context inside format().
Root cause: a coroutine may yield between emitting the record and the handler running the formatter, so the ambient context can belong to a different request by then.
Remediation: capture context in the record factory at creation time and have the formatter read the attached attribute.
Frequently Asked Questions
How does formatter configuration impact Python application latency?
Formatters execute synchronously on the thread that called the logger. Heavy string manipulation or JSON serialization increases CPU time per event and raises P99 latency under high request rates. Pre-allocating a single formatter instance and using compact JSON separators keeps the overhead in the low tens of microseconds.
Should I use the standard library formatter or a third-party JSON library?
A stdlib subclass eliminates dependency and supply-chain risk and is fast enough for most services. Reach for a maintained library like python-json-logger only when you need its field-renaming and schema features out of the box. Both approaches produce identical wire output if you control the field set.
How do I safely inject OpenTelemetry trace IDs into log formatters?
Use contextvars to propagate trace context across async boundaries, then capture it in a custom LogRecord factory installed with logging.setLogRecordFactory. The formatter reads the attached attribute, so no call site needs to pass a trace_id explicitly and there is no thread-local contention.
What is the recommended approach for handling exception formatting in JSON logs?
Call logging.Formatter.formatException to render the traceback, then store it as a single escaped string field rather than embedding raw multi-line text. Newlines inside a JSON string value are legal but break line-delimited parsers, so escape them at the serialization boundary.
What is the difference between a logging Formatter and a logging Filter?
A Filter decides whether a record is emitted and may mutate it to add attributes, running before serialization. A Formatter never makes pass or drop decisions; it only turns an already-approved record into the final string. Keep enrichment in a filter or the record factory and keep the formatter a pure serializer.