Best Practices for Log Rotation in Python
Reliable log rotation prevents a long-running Python service from exhausting its disk, losing records during rollover, or stalling request threads on rename and metadata operations. This guide is a focused walkthrough within the handler architecture guide and the broader Python Logging Fundamentals and Structured Data reference, covering thread-safe rollover, multi-process locking, and keeping rotation off the critical path.
Prerequisites
Rotation uses only the standard library. The fcntl locking shown below is POSIX-only (Linux and macOS). A bounded queue keeps rollover off the request path; the queue pattern itself is detailed in non-blocking logging with QueueHandler.
# Standard library only; Python 3.10+ recommended.
python --version # Python 3.12.x
Set a writable log directory and an explicit umask so log-aggregation agents can read the files:
export APP_LOG_DIR="/var/log/app"
umask 0022
Implementation
Step 1 — Choose the handler by constraint. Use RotatingFileHandler when disk capacity is the binding limit; it caps each file at maxBytes and keeps backupCount archives. Use TimedRotatingFileHandler when retention is defined in time, setting when="midnight" and interval=1 for daily audit windows. Time-based rotation depends on the system clock and only rolls over on the next emit, so an idle process defers its rollover until the next record arrives and the archive for that window appears late.
Step 2 — Serialize rollover with an advisory lock. The stock RotatingFileHandler has no multi-process safety: workers that share a file can interleave lines, and concurrent doRollover() calls can rename backups out from under each other and raise FileNotFoundError partway through the rename cascade. Subclass it to take an exclusive fcntl.flock so that only one worker writes or rolls over at a time.
Step 3 — Keep rollover off the request path. Rollover performs file renames and metadata updates that are slow under load. Place the rotating handler behind a QueueHandler/QueueListener so the request thread only enqueues and the background listener absorbs the rollover latency.
import fcntl
import json
import logging
import os
import queue
from logging.handlers import QueueHandler, QueueListener, RotatingFileHandler

log_queue: queue.Queue = queue.Queue(maxsize=10000)  # bounded to cap memory


class JSONFormatter(logging.Formatter):
    """Render each record as one compact JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            "module": record.module,
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(payload, separators=(",", ":"))


class SafeRotatingFileHandler(RotatingFileHandler):
    """RotatingFileHandler that holds an exclusive flock for each write and rollover."""

    def emit(self, record: logging.LogRecord) -> None:
        # Lock a sidecar file, not the log: the log's inode changes on every rollover.
        # Opening it per emit keeps the lock private to this process, even after fork().
        with open(f"{self.baseFilename}.lock", "a") as lock_file:
            fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)  # block until exclusive
            try:
                self._drop_stale_stream()
                super().emit(record)  # size check, rollover, and write all run locked
            finally:
                fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)

    def _drop_stale_stream(self) -> None:
        """Close our stream if another worker rotated the file since the last write."""
        if self.stream is None:
            return
        try:
            current_inode = os.stat(self.baseFilename).st_ino
        except FileNotFoundError:
            current_inode = None
        if current_inode != os.fstat(self.stream.fileno()).st_ino:
            self.stream.close()
            self.stream = None  # the next write reopens the active file


def setup_production_logger(log_path: str = "/var/log/app/service.log"):
    handler = SafeRotatingFileHandler(
        filename=log_path,
        maxBytes=50 * 1024 * 1024,  # 50 MiB per file
        backupCount=5,              # keep 5 archives, ~300 MiB ceiling
        encoding="utf-8",
        delay=True,                 # defer open until first emit
    )
    handler.setFormatter(JSONFormatter())

    # The listener thread owns the file handler; request threads only enqueue.
    listener = QueueListener(log_queue, handler, respect_handler_level=True)
    listener.start()

    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)
    logger.addHandler(QueueHandler(log_queue))
    logger.propagate = False
    return logger, listener


if __name__ == "__main__":
    logger, listener = setup_production_logger()
    logger.info(
        "Payment processed",
        extra={"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
               "span_id": "00f067aa0ba902b7"},
    )
    listener.stop()  # drain the queue and join the listener thread before exit
Step 4 — Validate under load. Drive enough volume to force several rollovers, then confirm the numbered backups appear in sequence and no line is truncated or interleaved.
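One way to run this check is a small harness like the sketch below. It is an illustration under stated assumptions rather than a production test: the logger name `rotation.loadtest`, the 4 KiB `maxBytes`, and the helpers `force_rollovers` and `check_integrity` are all hypothetical, chosen so several rollovers happen quickly inside a temporary directory.

```python
import json
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path


def force_rollovers(log_dir: Path, records: int = 2000) -> list[Path]:
    """Write enough small records to roll over several times; return the files found."""
    handler = RotatingFileHandler(
        log_dir / "service.log", maxBytes=4096, backupCount=3, encoding="utf-8"
    )
    # Each line is a tiny JSON object such as {"seq":17}.
    handler.setFormatter(logging.Formatter('{"seq":%(message)s}'))
    logger = logging.getLogger("rotation.loadtest")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        for seq in range(records):
            logger.info("%d", seq)
    finally:
        logger.removeHandler(handler)
        handler.close()
    return sorted(log_dir.glob("service.log*"))


def check_integrity(files: list[Path]) -> int:
    """Parse every line as JSON; a truncated or interleaved line raises an error."""
    total = 0
    for path in files:
        for line in path.read_text(encoding="utf-8").splitlines():
            json.loads(line)  # raises json.JSONDecodeError on a damaged line
            total += 1
    return total


with tempfile.TemporaryDirectory() as tmp:
    files = force_rollovers(Path(tmp))
    names = [p.name for p in files]
    parsed = check_integrity(files)

print(names, parsed)
```

The number of parsed lines is lower than the number written because archives beyond `backupCount` are deleted; the check only asserts that every surviving line is intact.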
How the rotation handlers work internally
RotatingFileHandler and TimedRotatingFileHandler share a parent (BaseRotatingHandler) and differ only in when they decide to roll. On every emit, the handler calls shouldRollover(record). For the size-based handler that check formats the record, seeks to the end of the stream, and compares stream.tell() + len(message) against maxBytes; if the next write would cross the limit, it calls doRollover() first. doRollover() closes the current stream, then renames service.log to service.log.1, shifting service.log.1 to .2 and so on up to backupCount, deleting anything past it, and finally reopens a fresh empty service.log. The rename cascade is backupCount filesystem operations, which is why rollover is the expensive moment and why it must stay off the request path.
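The rename order can be modeled in a few lines to make the cost visible. This is a simplified model for reasoning, not the library's own code: `rollover_plan` is a hypothetical helper, and it leaves out the existence checks and deletions the real handler performs around each rename.

```python
def rollover_plan(base: str, backup_count: int) -> list[tuple[str, str]]:
    """Return the (source, destination) renames a size-based rollover attempts, in order."""
    plan = []
    # Oldest archives move first so that no rename overwrites a file still needed.
    for index in range(backup_count - 1, 0, -1):
        plan.append((f"{base}.{index}", f"{base}.{index + 1}"))
    # The active file becomes archive .1 last; a fresh active file is opened afterwards.
    plan.append((base, f"{base}.1"))
    return plan


for source, destination in rollover_plan("service.log", 3):
    print(f"{source} -> {destination}")
```

With `backupCount=3` the plan has three renames, and the count grows linearly with `backupCount`, which is the latency the queue listener is meant to absorb.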
TimedRotatingFileHandler computes a rolloverAt timestamp at construction from when and interval, and shouldRollover simply tests time.time() >= rolloverAt. There are two consequences engineers underestimate. First, rotation is emit-driven: the file rolls only when a record arrives after the deadline, so a process that goes idle over a boundary defers its rollover until traffic resumes. The archive then appears late, and several boundaries that pass without a record collapse into a single rollover. Second, the handler tracks time, not size, so a traffic spike inside one interval can produce a backup far larger than any size cap; pair backupCount with monitoring rather than assuming time-based rotation bounds file size. TimedRotatingFileHandler also accepts utc=True so boundaries do not drift with daylight-saving changes, and both handlers honor delay=True to defer the initial open until the first emit.
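The emit-driven behavior is easy to observe in a temporary directory. The sketch below is a demonstration only: it overwrites the handler's rolloverAt attribute to simulate a boundary that passed while the process was idle, which is a shortcut for illustration and not something production code should do. The file name `audit.log` and the helper `make_record` are hypothetical.

```python
import logging
import os
import tempfile
import time
from logging.handlers import TimedRotatingFileHandler


def make_record(message: str) -> logging.LogRecord:
    """Build a bare record so the demo does not depend on logger configuration."""
    return logging.LogRecord("audit", logging.INFO, "demo.py", 0, message, None, None)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "audit.log")
    handler = TimedRotatingFileHandler(
        path, when="midnight", interval=1, backupCount=14, utc=True, encoding="utf-8"
    )
    handler.emit(make_record("first"))
    due_before = handler.shouldRollover(make_record("probe"))  # deadline still ahead

    # Demo shortcut: pretend the midnight boundary passed while the process was idle.
    handler.rolloverAt = int(time.time()) - 1
    files_while_idle = sorted(os.listdir(tmp))  # nothing has rolled yet

    handler.emit(make_record("second"))  # the first record after the deadline rolls the file
    files_after = sorted(os.listdir(tmp))
    handler.close()

print(files_while_idle)
print(files_after)
```

The archive only appears after the second emit, even though the deadline had already passed.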
Multi-process rotation hazards
Neither stock handler is safe across processes. When several workers each hold an open descriptor to service.log and one calls doRollover(), it renames the file out from under the others; the workers that did not rotate keep writing to the now-renamed inode, so their records land in service.log.1 while the rotating worker writes a fresh service.log. Concurrent doRollover() calls can also collide on the rename cascade: one worker renames or removes a backup between another worker's existence check and its rename, which raises FileNotFoundError. If the advisory lock is taken in non-blocking mode (LOCK_EX | LOCK_NB), a contended lock raises BlockingIOError with errno EAGAIN (11 on Linux, "Resource temporarily unavailable") and the caller must retry. The flock-based subclass above serializes writes and rollover within a single host, which is the right fix when one machine runs several workers.
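The stale-descriptor hazard can be reproduced without spawning processes. In the sketch below, two stock handlers in one process stand in for two workers (an assumption made to keep the demo deterministic; real workers would be separate processes), and `make_record` is a hypothetical helper. It relies on POSIX rename semantics.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def make_record(message: str) -> logging.LogRecord:
    """Build a bare record so the demo does not depend on logger configuration."""
    return logging.LogRecord("demo", logging.INFO, "demo.py", 0, message, None, None)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "service.log")
    # Two handlers on the same path stand in for two worker processes.
    worker_a = RotatingFileHandler(path, maxBytes=10**6, backupCount=2, encoding="utf-8")
    worker_b = RotatingFileHandler(path, maxBytes=10**6, backupCount=2, encoding="utf-8")

    worker_a.emit(make_record("a-before"))
    worker_b.emit(make_record("b-before"))

    worker_a.doRollover()  # A renames service.log to service.log.1 and opens a new file

    worker_a.emit(make_record("a-after"))
    worker_b.emit(make_record("b-after"))  # B still holds the renamed inode

    worker_a.close()
    worker_b.close()

    with open(path, encoding="utf-8") as active_file:
        active = active_file.read().split()
    with open(path + ".1", encoding="utf-8") as archive_file:
        archived = archive_file.read().split()

print("active:", active)
print("archived:", archived)
```

Worker B's post-rotation record ends up in the archive rather than the active file, which is the split the text describes.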
When the operating system owns rotation instead, use WatchedFileHandler. It never rotates; on every emit it stats the path and compares the current inode and device against the descriptor it holds. If an external rotator such as logrotate has moved or recreated the file, the inode changes, the handler reopens the path, and writing continues into the new file with no lost records, which copytruncate cannot guarantee. The pairing is logrotate with create on the OS side and WatchedFileHandler on the Python side, so rotation and reopening are cleanly separated across the process boundary. A postrotate SIGHUP is only needed for handlers that cannot detect the inode change themselves.
import logging
from logging.handlers import WatchedFileHandler
# Python writes; logrotate rotates; WatchedFileHandler reopens on inode change.
handler = WatchedFileHandler("/var/log/app/service.log", encoding="utf-8")
handler.setFormatter(JSONFormatter())  # JSONFormatter: the class from the Implementation listing
logging.getLogger("app").addHandler(handler)
Expected Output:
# After `logrotate --force`, writes continue into the new inode with no gap:
service.log # new, active (current inode)
service.log.1.gz # rotated by logrotate
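The reopen-on-inode-change behavior can be checked locally without logrotate. In this sketch an os.rename call stands in for the external rotator (an assumption for the demo; logrotate with create would also recreate the file), and `make_record` is a hypothetical helper.

```python
import logging
import os
import tempfile
from logging.handlers import WatchedFileHandler


def make_record(message: str) -> logging.LogRecord:
    """Build a bare record so the demo does not depend on logger configuration."""
    return logging.LogRecord("demo", logging.INFO, "demo.py", 0, message, None, None)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "service.log")
    handler = WatchedFileHandler(path, encoding="utf-8")
    handler.emit(make_record("before rotation"))

    os.rename(path, path + ".1")  # stand-in for the external rotator moving the file

    # The next emit stats the path, sees it no longer matches, and reopens it.
    handler.emit(make_record("after rotation"))
    handler.close()

    with open(path + ".1", encoding="utf-8") as rotated_file:
        rotated = rotated_file.read().strip()
    with open(path, encoding="utf-8") as active_file:
        active = active_file.read().strip()

print(rotated)
print(active)
```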
Disk-space and retention math
Sizing rotation is arithmetic, not guesswork. With RotatingFileHandler, the worst-case footprint per log file is maxBytes * (backupCount + 1): the active file plus its archives. The configuration above (maxBytes=50 MiB, backupCount=5) caps one file at roughly 300 MiB; an eight-worker host where each worker writes its own file therefore needs about 2.3 GiB (2,400 MiB) of headroom for that one logger, before compression. If logrotate gzips archives, assume a 6–10x reduction on JSON logs and budget for the brief window where both the uncompressed and compressed copies exist during postrotate.
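The size-based ceiling can be written as a small helper for capacity planning. The function name `size_rotation_ceiling` and its `files` parameter are hypothetical; the formula is the one stated above, multiplied by the number of independently rotated files.

```python
MIB = 1024 ** 2


def size_rotation_ceiling(max_bytes: int, backup_count: int, files: int = 1) -> int:
    """Worst-case bytes on disk: the active file plus its archives, per rotated file."""
    return max_bytes * (backup_count + 1) * files


per_file = size_rotation_ceiling(50 * MIB, backup_count=5)
eight_files = size_rotation_ceiling(50 * MIB, backup_count=5, files=8)

print(per_file // MIB, "MiB for one file")
print(eight_files // MIB, "MiB for eight files")
```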
For TimedRotatingFileHandler, size is a function of throughput rather than a hard cap. Estimate average bytes per record (a structured JSON line with trace context is commonly 300–600 bytes), multiply by records per second and the interval length, and that is the expected daily file. The formula daily_bytes = rate * avg_record_bytes * 86400 turns an SLO ("retain 14 days") into a concrete reservation: with 200 records/second at 400 bytes, a day is roughly 6.4 GiB and a 14-day backupCount reserves about 90 GiB uncompressed. Always set backupCount to a finite number; the default of 0 keeps every archive forever and is the most common cause of a disk filling silently weeks after deploy.
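The same reservation can be computed directly from the formula. This is a planning sketch under the stated assumptions (a steady average rate and record size); `timed_rotation_reservation` is a hypothetical helper name.

```python
GIB = 1024 ** 3


def timed_rotation_reservation(
    records_per_second: float,
    avg_record_bytes: float,
    retention_days: int,
    interval_seconds: int = 86400,
) -> tuple[float, float]:
    """Return (bytes per interval, bytes for the whole retention window), uncompressed."""
    per_interval = records_per_second * avg_record_bytes * interval_seconds
    return per_interval, per_interval * retention_days


daily_bytes, retained_bytes = timed_rotation_reservation(200, 400, retention_days=14)

print(f"{daily_bytes / GIB:.1f} GiB per day")
print(f"{retained_bytes / GIB:.1f} GiB for 14 days")
```

Real traffic is rarely steady, so treat the result as a baseline and add headroom for peaks.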
Configuration Options
| Parameter | Handler | Purpose | Production guidance |
|---|---|---|---|
| maxBytes | RotatingFileHandler | Size threshold to roll | 10–100 MiB; 0 disables size rotation |
| backupCount | both | Archives retained | Sets the storage ceiling with maxBytes |
| when / interval | TimedRotatingFileHandler | Time-based trigger | "midnight", 1 for daily windows |
| encoding | both | File text encoding | Always "utf-8" |
| delay | both | Defer file open | True to avoid opening files in idle workers |
| mode | RotatingFileHandler | File open mode | "a" to append across restarts (forced when maxBytes > 0; the timed handler always appends) |
Verification
Force a rollover with a tight write loop and inspect the resulting files:
ls -1 /var/log/app/
Expected Output:
service.log
service.log.1
service.log.2
The active file holds one valid JSON object per line:
{"timestamp":"2026-06-19 10:00:00,123","level":"INFO","message":"Payment processed","module":"__main__","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736","span_id":"00f067aa0ba902b7"}
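Confirm no file-descriptor leak after repeated rollovers; the count for the service process (not for the shell running the command) should stay flat. This check is Linux-only, and SERVICE_PID is a placeholder for the service's process ID:
ls /proc/"$SERVICE_PID"/fd | wc -l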
Common Mistakes
Using copytruncate with OS logrotate on a Python process
Python keeps the original file descriptor open. copytruncate copies the file and then truncates it in place, so any record the process writes between the copy and the truncate is deleted without ever reaching the archive. Configure logrotate with create plus a postrotate SIGHUP instead, and reopen the handler on that signal.
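One way to reopen on SIGHUP is to have the signal handler set a flag and let the next emit do the actual reopen, so no file I/O runs inside the signal handler. The sketch below is a minimal illustration under that assumption; `ReopenOnSignalHandler`, `request_reopen`, and `install_sighup_reopen` are hypothetical names, and the demo calls `request_reopen()` directly instead of sending a real signal.

```python
import logging
import os
import signal
import tempfile


class ReopenOnSignalHandler(logging.FileHandler):
    """FileHandler that reopens its path on the next emit after a reopen request."""

    def __init__(self, filename, **kwargs):
        super().__init__(filename, **kwargs)
        self._reopen_requested = False

    def request_reopen(self, signum=None, frame=None):
        # Safe to call from a signal handler: it only flips a flag.
        self._reopen_requested = True

    def emit(self, record):
        if self._reopen_requested:
            self._reopen_requested = False
            if self.stream is not None:
                self.stream.close()
                self.stream = None  # FileHandler.emit reopens the path lazily
        super().emit(record)


def install_sighup_reopen(handler: ReopenOnSignalHandler) -> None:
    """Register the reopen request for SIGHUP (POSIX only, main thread only)."""
    signal.signal(signal.SIGHUP, handler.request_reopen)


def make_record(message: str) -> logging.LogRecord:
    return logging.LogRecord("demo", logging.INFO, "demo.py", 0, message, None, None)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "service.log")
    handler = ReopenOnSignalHandler(path, encoding="utf-8")
    handler.emit(make_record("one"))

    os.rename(path, path + ".1")  # stand-in for logrotate moving the file
    handler.request_reopen()      # what the SIGHUP handler would trigger

    handler.emit(make_record("two"))
    handler.close()

    with open(path + ".1", encoding="utf-8") as rotated_file:
        rotated_text = rotated_file.read().strip()
    with open(path, encoding="utf-8") as active_file:
        active_text = active_file.read().strip()
```

In a real service, `install_sighup_reopen(handler)` would be called once at startup from the main thread, and the logrotate postrotate script would send SIGHUP to the process.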
Synchronous rotation blocking the request thread
Rollover renames files and updates metadata, which is slow under load and spikes latency if it runs inline. Place the rotating handler behind a QueueHandler so a background QueueListener thread absorbs the rollover cost.
Assuming os.rename is atomic on network filesystems
Rollover relies on atomic renames. NFS and EFS mounts can break that guarantee, leaving partial or duplicated files. Keep logs on a local volume, or stage writes in a local directory and ship from there.
Leaving backupCount at its default of zero
With backupCount=0 RotatingFileHandler never rolls over, whatever maxBytes is set to, so the single log file grows without bound. TimedRotatingFileHandler still rolls over with backupCount=0 but never deletes an archive. Either way the disk fills silently. Always set a finite, deliberate value and derive it from your retention SLO and the disk-space math above, so the storage ceiling is a decision rather than an accident.
Frequently Asked Questions
How do I prevent log loss during rotation in multi-process Python applications?
Use a handler that takes an exclusive fcntl advisory lock around writes and rollover, and never use copytruncate with an external rotator. Each worker must reopen its file descriptor after rotation, either programmatically or on SIGHUP.
Should I use Python's built-in rotation or rely on OS-level logrotate?
For containerized or ephemeral environments, write JSON to stdout and let the platform handle it, or use Python's RotatingFileHandler with explicit size limits. For long-lived VM and bare-metal deployments, OS logrotate with a postrotate signal is preferred for centralized management.
How can I verify rotation integrity without impacting production performance?
Keep logging.raiseExceptions at its default of True in staging so that errors raised inside handlers are printed to stderr, monitor that the numbered backups up to backupCount are created in order, and track open file descriptors. Run a synthetic load test that forces several rollovers and measure the rollover latency before production rollout.
What is WatchedFileHandler and when should I use it instead of RotatingFileHandler?
WatchedFileHandler does not rotate itself; it watches the file's inode and device and reopens the file when an external tool like logrotate moves it. Use it on VMs where the OS rotator owns rotation, and use RotatingFileHandler when Python should own the size or time trigger directly.