Non-Blocking Logging with QueueHandler and QueueListener
When a logger writes directly to a file or network sink, the thread or event loop that called log.info pays the full I/O latency, which inflates tail latency under load. The fix is to make logging an in-memory enqueue and push the slow writes onto a background thread using QueueHandler and QueueListener. This page is part of the handler architecture guide within Python Logging Fundamentals and Structured Data.
Prerequisites
# QueueHandler and QueueListener are part of the standard library.
python --version # CPython 3.8+ recommended
No third-party packages and no environment variables are required. Everything here uses logging, logging.handlers, and queue from the standard library.
How QueueHandler and QueueListener work internally
QueueHandler is a deliberately thin handler. Its emit method does two cheap things: it calls self.prepare(record), then self.enqueue(record). The prepare step is the subtle part: it formats the message and, in recent CPython, detaches args, exc_info, and exc_text so that a stale or unpicklable argument cannot blow up later on the listener thread. After prepare, the record is a self-contained, safe-to-move object, and enqueue simply calls self.queue.put_nowait(record). No formatting of the final output, no file write, and no socket call happens on the producer.
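A minimal sketch of what prepare leaves on the queue, assuming CPython 3.8 or later. The logger name demo.prepare and the message are illustrative, and no listener is started so the record can be inspected directly:

```python
import logging
import queue
from logging.handlers import QueueHandler

q = queue.Queue()
log = logging.getLogger("demo.prepare")  # hypothetical logger name
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(QueueHandler(q))

# The producer passes a format string plus args, as usual.
log.info("user %s logged in", 42)

# Nothing is draining the queue, so the prepared record is still in it.
queued = q.get_nowait()
print(queued.msg)   # the message, already merged with its args
print(queued.args)  # cleared by prepare
```

The record that reaches the listener carries a finished message string, which is why the original argument objects no longer need to stay alive or be picklable.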
QueueListener is the consumer half. Its start method spawns a single daemon thread running a _monitor loop that blocks on self.queue.get(), hands each record to self.handle(record), and calls self.queue.task_done() when the queue provides it. handle walks the listener's owned handlers and calls each one's handle method, which is where the real StreamHandler or RotatingFileHandler finally does its blocking I/O, on the background thread and off the hot path. Because the thread is a daemon, it does not keep the interpreter alive, which is why an explicit stop matters at shutdown. Because only one listener thread drains the queue, the owned handlers are never called concurrently, which sidesteps a whole class of interleaving bugs that plague directly shared file handlers.
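One way to confirm that the owned handler runs on the listener thread is to record the thread identity inside emit. This is a sketch; ThreadRecordingHandler and the logger name are hypothetical helpers, not part of the standard library:

```python
import logging
import queue
import threading
from logging.handlers import QueueHandler, QueueListener


class ThreadRecordingHandler(logging.Handler):
    """Hypothetical sink that remembers which thread ran emit()."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def emit(self, record):
        self.seen.append((record.getMessage(), threading.get_ident()))


q = queue.Queue()
sink = ThreadRecordingHandler()
listener = QueueListener(q, sink)

log = logging.getLogger("demo.listener")  # hypothetical logger name
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(QueueHandler(q))

listener.start()
log.info("written off the hot path")
listener.stop()  # drains the queue, then joins the listener thread

producer_ident = threading.get_ident()
message, sink_ident = sink.seen[0]
print(message, sink_ident != producer_ident)
```

The thread identifier captured inside emit differs from the caller's, so the slow part of logging ran somewhere other than the thread that called log.info.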
The single-consumer design also gives you a free ordering guarantee: records are written to the sinks in the exact order they were enqueued, even when dozens of producer threads raced to enqueue them. This is why a queue-fronted file is more trustworthy during an incident than a file handler shared directly across threads, where the GIL plus per-handler locking still admits subtle reordering at flush boundaries. The trade-off is that the listener is a single point of throughput: if your aggregate log rate exceeds what one thread can serialize and write, the queue backs up and your drop policy (below) starts shedding load. In practice one thread comfortably absorbs tens of thousands of records per second to a local file, and the bottleneck only appears with synchronous network sinks, which is itself an argument for exporting via a batching collector rather than a per-record socket handler.
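The ordering property can be checked with a small experiment. The sketch below assumes eight producer threads and an in-memory ListHandler (both illustrative); it verifies that nothing is lost and that each producer's records arrive in the order that producer emitted them. How records from different threads interleave still depends on scheduling.

```python
import logging
import queue
import threading
from logging.handlers import QueueHandler, QueueListener


class ListHandler(logging.Handler):
    """Hypothetical in-memory sink used only for this check."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


WORKERS, PER_WORKER = 8, 200  # illustrative sizes

q = queue.Queue()
sink = ListHandler()
listener = QueueListener(q, sink)
log = logging.getLogger("demo.ordering")  # hypothetical logger name
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(QueueHandler(q))


def worker(worker_id):
    for seq in range(PER_WORKER):
        log.info("%d:%d", worker_id, seq)


listener.start()
threads = [threading.Thread(target=worker, args=(i,)) for i in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
listener.stop()

# Group sequence numbers by worker, in the order the sink received them.
per_worker = {}
for message in sink.messages:
    worker_id, seq = map(int, message.split(":"))
    per_worker.setdefault(worker_id, []).append(seq)

in_order = all(seqs == list(range(PER_WORKER)) for seqs in per_worker.values())
print(len(sink.messages), in_order)
```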
Implementation
Step 1 — Create a bounded queue. An unbounded queue lets memory grow without limit when the downstream sink stalls. A maxsize caps that growth and lets you choose what happens when it fills.
import queue
# Cap at 10k records; tune to absorb a realistic burst during a sink stall.
log_queue: "queue.Queue" = queue.Queue(maxsize=10_000)
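To make the effect of maxsize concrete, here is a small sketch with a deliberately tiny bound (the value 2 is for illustration only). Once the bound is reached, put_nowait raises queue.Full rather than buffering more:

```python
import queue

small = queue.Queue(maxsize=2)  # tiny bound, purely for illustration
small.put_nowait("record-1")
small.put_nowait("record-2")

try:
    small.put_nowait("record-3")
    overflowed = False
except queue.Full:
    overflowed = True  # the bound held; nothing was buffered past maxsize

print(small.qsize(), overflowed)
```

That exception is the hook a drop policy builds on: the producer learns immediately that the sink is behind and can decide what to shed.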
Step 2 — Attach a QueueHandler as the only handler. The QueueHandler.emit method calls put_nowait, so emitting a record is a fast in-memory operation. Make it the sole handler on the application logger so no synchronous sink is reachable from the hot path.
import logging
from logging.handlers import QueueHandler
queue_handler = QueueHandler(log_queue)
app_log = logging.getLogger("app")
app_log.setLevel(logging.DEBUG)
app_log.addHandler(queue_handler)
app_log.propagate = False # avoid duplicate emission via the root logger
Step 3 — Run a QueueListener over the real handlers. The listener owns the slow handlers and drains the queue from a single background thread. respect_handler_level=True makes each downstream handler apply its own level, so a console handler can stay at INFO while a file handler captures DEBUG. This matters because, without it, the listener ignores per-handler levels entirely and every owned handler sees every record that cleared the logger's level — the console would then print the DEBUG lines you meant to keep on disk only.
import sys
from logging.handlers import QueueListener
console = logging.StreamHandler(sys.stdout)
console.setLevel(logging.INFO)
console.setFormatter(logging.Formatter("%(asctime)s %(levelname)-8s %(name)s | %(message)s"))
file_handler = logging.handlers.RotatingFileHandler(
    "app.log", maxBytes=10_485_760, backupCount=5
)
file_handler.setLevel(logging.DEBUG)
file_handler.setFormatter(logging.Formatter(
    "%(asctime)s [%(levelname)s] %(name)s %(threadName)s - %(message)s"
))
listener = QueueListener(
    log_queue, console, file_handler, respect_handler_level=True
)
listener.start() # spawns the background draining thread
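The effect of respect_handler_level is easy to isolate with an in-memory sink set to INFO. In this sketch, ListHandler, levels_seen, and the logger names are hypothetical helpers; the only thing that changes between the two runs is the flag:

```python
import logging
import queue
from logging.handlers import QueueHandler, QueueListener


class ListHandler(logging.Handler):
    """Hypothetical in-memory sink that records the level of each record."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def emit(self, record):
        self.levels.append(record.levelname)


def levels_seen(respect, logger_name):
    q = queue.Queue()
    info_sink = ListHandler()
    info_sink.setLevel(logging.INFO)  # the sink itself only wants INFO and up
    listener = QueueListener(q, info_sink, respect_handler_level=respect)

    log = logging.getLogger(logger_name)
    log.setLevel(logging.DEBUG)
    log.propagate = False
    log.addHandler(QueueHandler(q))

    listener.start()
    log.debug("debug detail")
    log.info("info summary")
    listener.stop()
    return info_sink.levels


ignored = levels_seen(False, "demo.levels.ignored")
respected = levels_seen(True, "demo.levels.respected")
print(ignored)    # the INFO sink still received the DEBUG record
print(respected)  # the sink's own level was applied
```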
Step 4 — Stop the listener on shutdown. listener.stop() enqueues a sentinel (an internal _sentinel, by default None), the _monitor loop drains everything ahead of the sentinel, sees it, exits, and stop joins the thread. The ordering guarantee is the point: every record enqueued before stop is processed before the thread terminates. Skipping stop lets the interpreter tear the thread down mid-drain and lose whatever was still buffered. Wire it into your shutdown path or an atexit hook.
import atexit
atexit.register(listener.stop) # flush buffered records before the process exits
app_log.info("service started")
app_log.debug("warming caches") # reaches the file, not the console
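The drain-before-exit behaviour of stop can be observed with a sink that is slower than the producer. This is a sketch under stated assumptions: SlowHandler simulates I/O with a short sleep, and the record count is arbitrary.

```python
import logging
import queue
import time
from logging.handlers import QueueHandler, QueueListener


class SlowHandler(logging.Handler):
    """Hypothetical sink that simulates slow I/O with a short sleep."""

    def __init__(self):
        super().__init__()
        self.written = 0

    def emit(self, record):
        time.sleep(0.001)
        self.written += 1


TOTAL = 100  # illustrative record count

q = queue.Queue(maxsize=1_000)
sink = SlowHandler()
listener = QueueListener(q, sink)
log = logging.getLogger("demo.stop")  # hypothetical logger name
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(QueueHandler(q))

listener.start()
for i in range(TOTAL):
    log.info("record %d", i)

# The producer finishes long before the sink does, so a backlog is likely here.
listener.stop()  # returns only after the backlog ahead of the sentinel is written

print(sink.written)
```

Because stop joins the thread, the count is complete by the time it returns, regardless of how much backlog existed when it was called.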
Graceful shutdown ordering
Shutdown order is a correctness concern, not a cleanup detail. Stop the producers first (finish serving in-flight requests so no new records are created), then call listener.stop() so the queue drains to completion, and only then close the underlying handlers or flush an OTLP exporter. Reversing this drops the tail of your logs: if you close the file handler before the listener finishes, the listener's final writes hit a closed stream. With multiple listeners (for example, separate stdout and network pipelines), independent pipelines can stop in any order, but where one stage feeds another, stop the upstream stage first so it never feeds a stopped downstream one.
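The ordering described here can be captured in a small helper. This is a sketch, not a prescribed API: shutdown_logging, TrackingHandler, and the stop_producers callable are hypothetical names, and the placeholder lambda stands in for whatever stops your request handling.

```python
import logging
import queue
from logging.handlers import QueueHandler, QueueListener


def shutdown_logging(stop_producers, listener, handlers):
    """Hypothetical helper: stop producers, drain the queue, then close sinks."""
    stop_producers()         # 1. no new records after this returns
    listener.stop()          # 2. drain what is already enqueued, join the thread
    for handler in handlers:
        handler.flush()      # 3. only now flush and close the real sinks
        handler.close()


class TrackingHandler(logging.Handler):
    """Hypothetical sink that records the order of writes and the close call."""

    def __init__(self):
        super().__init__()
        self.events = []

    def emit(self, record):
        self.events.append("write")

    def close(self):
        self.events.append("close")
        super().close()


q = queue.Queue()
sink = TrackingHandler()
listener = QueueListener(q, sink)
log = logging.getLogger("demo.shutdown")  # hypothetical logger name
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(QueueHandler(q))

listener.start()
log.info("last request served")
shutdown_logging(lambda: None, listener, [sink])  # placeholder producer stop

events_at_shutdown = list(sink.events)
print(events_at_shutdown)
```

The write is recorded before the close, which is the property the ordering exists to protect.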
Optional drop policy for a full queue
By default QueueHandler.enqueue calls put_nowait, so a full queue does not block the caller: it raises queue.Full, which emit catches and passes to handleError. The record is discarded whatever its severity, and a traceback goes to stderr when logging.raiseExceptions is true. To degrade more deliberately, override enqueue to drop lower-severity records while preserving errors. This mirrors the backpressure approach in handler architecture, and the severity floor you pick should match your log levels and severity mapping contract.
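import logging
import queue
from logging.handlers import QueueHandler

class DroppingQueueHandler(QueueHandler):
    """When the queue is full, drop records below ERROR and briefly retry ERROR and above."""

    def enqueue(self, record: logging.LogRecord) -> None:
        try:
            self.queue.put_nowait(record)
        except queue.Full:
            if record.levelno >= logging.ERROR:
                try:
                    # Wait up to 50 ms (illustrative) so errors get a second chance.
                    self.queue.put(record, timeout=0.05)
                except queue.Full:
                    pass  # still full: give up rather than stall the caller
            # else: drop the lower-severity record silently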
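As a variation on the handler above, the following sketch bounds the wait for ERROR records and counts what it sheds, so the drop rate can be exported as a metric. CountingDropHandler, its error_timeout parameter, and the tiny queue size are assumptions made for illustration.

```python
import logging
import queue
from logging.handlers import QueueHandler


class CountingDropHandler(QueueHandler):
    """Hypothetical variant: bounded wait for errors, plus a drop counter."""

    def __init__(self, q, error_timeout=0.01):
        super().__init__(q)
        self.error_timeout = error_timeout
        self.dropped = 0

    def enqueue(self, record):
        try:
            self.queue.put_nowait(record)
            return
        except queue.Full:
            pass
        if record.levelno >= logging.ERROR:
            try:
                # Give errors a short, bounded chance to find space.
                self.queue.put(record, timeout=self.error_timeout)
                return
            except queue.Full:
                pass
        self.dropped += 1  # emit() holds the handler lock, so this is serialized


q = queue.Queue(maxsize=2)  # tiny bound, and no listener, to force overflow
handler = CountingDropHandler(q)
log = logging.getLogger("demo.drop")  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.propagate = False
log.addHandler(handler)

log.info("first")    # fits
log.info("second")   # fits, queue is now full
log.debug("third")   # dropped immediately
log.error("fourth")  # waits briefly, then dropped because nothing drains

print(q.qsize(), handler.dropped)
```

With a listener running, the brief wait usually succeeds for errors because the consumer frees a slot within the timeout; here nothing drains, so both overflow records are counted.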
Threading versus multiprocessing queues
queue.Queue is the right choice for the overwhelmingly common case: threads and asyncio coroutines inside a single process share memory, so a plain in-memory queue is fast and lossless. It does not cross process boundaries. When you run a pre-fork server such as Gunicorn or a multiprocessing worker pool and want every worker process to funnel records into one logging process, you need multiprocessing.Queue, which serializes each record and pipes it to the consumer. That serialization is exactly why QueueHandler.prepare strips unpicklable args and exc_info into a pre-formatted message — a record carrying a live socket or a lambda would otherwise fail to pickle on the way across. Run a single QueueListener in the dedicated logging process so only that process touches the files, and let every worker hold just a QueueHandler.
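The pickling point can be checked without starting any processes. The sketch below builds a LogRecord whose args contain a lambda (an illustrative unpicklable value), shows that the raw record fails to pickle, and then pickles the version returned by prepare:

```python
import logging
import pickle
import queue
from logging.handlers import QueueHandler

# A record whose args hold something pickle cannot serialize.
raw = logging.LogRecord(
    name="demo.pickle",      # hypothetical logger name
    level=logging.INFO,
    pathname="demo.py",      # placeholder path
    lineno=1,
    msg="callback=%s",
    args=(lambda: None,),
    exc_info=None,
)

try:
    pickle.dumps(raw)
    raw_picklable = True
except (pickle.PicklingError, AttributeError, TypeError):
    raw_picklable = False

# prepare() merges args into the message and clears them.
prepared = QueueHandler(queue.Queue()).prepare(raw)
restored = pickle.loads(pickle.dumps(prepared))

print(raw_picklable, restored.args)
print(restored.getMessage())
```

The restored record carries only the rendered text of the argument, which is all a remote logging process needs in order to write the line.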
There is a fork-specific hazard to plan for. If you create the queue and start the listener before forking workers (the typical Gunicorn lifecycle), each child inherits a copy of the in-memory queue.Queue, but the listener thread does not survive the fork: only the forking thread is carried into the child. The result is workers happily enqueuing into a queue that nothing drains. An unbounded queue becomes a slow memory leak that ends in an OOM kill, and a bounded one fills up and stops accepting records. The robust pattern is to build the logging stack in a post-fork hook so each process owns a live listener, or to switch to multiprocessing.Queue with the listener pinned to the parent. The same caution applies to file handles: two processes appending to one file without an external lock will interleave partial lines, which is the durability concern covered in best practices for log rotation in Python.
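A per-process setup function is one way to follow the post-fork pattern. The sketch below is an assumption-laden illustration: setup_worker_logging is a hypothetical helper, the stderr sink is a stand-in for your real handlers, and the second call merely simulates the function running again in a freshly forked child.

```python
import logging
import queue
import sys
from logging.handlers import QueueHandler, QueueListener


def setup_worker_logging(logger_name="app"):
    """Hypothetical per-process setup: fresh queue, fresh listener thread."""
    log_queue = queue.Queue(maxsize=10_000)
    sink = logging.StreamHandler(sys.stderr)  # stand-in for the real sinks
    listener = QueueListener(log_queue, sink, respect_handler_level=True)

    log = logging.getLogger(logger_name)
    # Remove any QueueHandler inherited from the parent: its queue has no
    # consumer in this process.
    for handler in list(log.handlers):
        if isinstance(handler, QueueHandler):
            log.removeHandler(handler)
    log.addHandler(QueueHandler(log_queue))
    log.propagate = False

    listener.start()
    return listener


# In a Gunicorn config file this could be called from the post_fork server hook:
#
#     def post_fork(server, worker):
#         worker.log_listener = setup_worker_logging()  # attribute name is hypothetical

parent_listener = setup_worker_logging("demo.postfork")
child_listener = setup_worker_logging("demo.postfork")  # simulates the child re-running setup

queue_handlers = [
    h for h in logging.getLogger("demo.postfork").handlers
    if isinstance(h, QueueHandler)
]
child_listener.stop()
parent_listener.stop()
print(len(queue_handlers))
```

Replacing the inherited QueueHandler matters as much as starting a new listener: otherwise the worker keeps feeding the parent's orphaned queue alongside the new one.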
Use under asyncio and threads
QueueHandler.emit does not block the event loop because put_nowait returns immediately, so the same logger is safe to call from coroutines and threads alike. The blocking writes happen only on the listener thread. For correlating those records with request context across tasks, attach trace identifiers through structured logging with the Python standard library rather than relying on the listener thread, which has no access to per-request context.
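Attaching context on the producer side can be sketched with a contextvars variable and a filter on the QueueHandler. The names trace_id, TraceIdFilter, and ListCollector are hypothetical; the point is that the filter runs in the calling task, before the record is enqueued, so the listener thread only formats a value that is already on the record.

```python
import asyncio
import contextvars
import logging
import queue
from logging.handlers import QueueHandler, QueueListener

trace_id = contextvars.ContextVar("trace_id", default="-")  # hypothetical context key


class TraceIdFilter(logging.Filter):
    """Copies the current trace id onto the record in the producer's context."""

    def filter(self, record):
        record.trace_id = trace_id.get()
        return True


class ListCollector(logging.Handler):
    """Hypothetical in-memory sink that stores formatted lines."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


q = queue.Queue(maxsize=1_000)
producer = QueueHandler(q)
producer.addFilter(TraceIdFilter())  # runs before emit, on the calling task

sink = ListCollector()
sink.setFormatter(logging.Formatter("%(trace_id)s %(message)s"))
listener = QueueListener(q, sink)

log = logging.getLogger("demo.asyncio")  # hypothetical logger name
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(producer)


async def handle_request(request_id):
    trace_id.set(request_id)  # each task gets its own copy of the context
    await asyncio.sleep(0)    # yield so the two tasks interleave
    log.info("handled")


async def main():
    await asyncio.gather(handle_request("req-1"), handle_request("req-2"))


listener.start()
asyncio.run(main())
listener.stop()

print(sorted(sink.lines))
```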
Configuration options
| Option | Where | Effect |
|---|---|---|
| maxsize | queue.Queue | Caps buffered records; 0 means unbounded (avoid in production). |
| respect_handler_level | QueueListener | When true, each downstream handler applies its own level. |
| propagate = False | application logger | Prevents the root logger from re-emitting enqueued records. |
| handler setLevel | each real handler | Per-sink filtering applied on the listener thread. |
| listener.stop() | shutdown | Drains the queue and joins the background thread. |
| multiprocessing.Queue | cross-process | Required when separate processes feed one logging process. |
Verification
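Run the script above. The console (INFO and up) and the file (DEBUG and up) diverge, which shows that respect_handler_level is applied. Note that threadName in the file output is the producer thread that created the record, not the listener thread that wrote it, so the output alone does not show where the I/O ran.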
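Expected Output (stdout; timestamps will differ, and asctime uses the default format with milliseconds because no datefmt is set):
2026-06-19 12:18:44,120 INFO     app | service started
Expected Output (app.log):
2026-06-19 12:18:44,120 [INFO] app MainThread - service started
2026-06-19 12:18:44,120 [DEBUG] app MainThread - warming caches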
The DEBUG line is absent from the console but present in the file, and both records flushed because listener.stop ran via the atexit hook before exit.
Common mistakes
Forgetting to stop the listener
Records buffered in the queue at interpreter exit are lost when the daemon listener thread is torn down mid-drain. Always call listener.stop() during graceful shutdown or register it with atexit so the queue is drained before the thread exits.
Leaving the queue unbounded
A maxsize=0 queue grows without limit during a sink outage and triggers an OOM kill instead of shedding load. Set an explicit bound and pair it with a drop policy that protects ERROR and CRITICAL records.
Attaching real handlers to the logger as well
Adding the console or file handler to the logger in addition to the QueueHandler reintroduces synchronous writes on the hot path and produces duplicate output. The listener must be the only owner of the real handlers.
Closing handlers before the listener drains
Calling handler.close() or shutting down an exporter before listener.stop() means the listener's final writes land on a closed sink and the tail of your logs vanishes. Stop producers, then stop the listener, then close handlers — in that order.
Frequently Asked Questions
Does QueueHandler make logging safe under asyncio?
Yes. Enqueuing a record is a fast in-memory operation that does not block the event loop, and the blocking file or network writes happen on the QueueListener background thread instead.
What happens to buffered logs if I forget to stop the listener?
Records still sitting in the queue at interpreter exit may be lost because the listener thread is killed without draining. Always call listener.stop during shutdown so the sentinel flushes the queue.
How big should the queue be?
Size it to absorb a realistic burst during a sink stall, often a few thousand records. Pair the bound with a drop policy for records below ERROR so a full queue sheds low-severity records first instead of losing errors.
Should I use a multiprocessing queue or a threading queue?
Use queue.Queue for threads and asyncio in one process, which is the common case. Use multiprocessing.Queue only when separate worker processes must funnel records to one logging process, and run the listener in that dedicated process.