Instrumenting Flask with Prometheus Metrics
You need request counts and latency histograms out of a Flask app, exposed on /metrics, and correct when gunicorn runs several workers. This guide does it with the official prometheus_client and a before/after request hook, no third-party exporter required. It is a focused companion to the broader Prometheus client instrumentation guide and part of the Python Metrics and Instrumentation guide.
The core idea is small: define instruments once, record them in before_request/after_request so you do not touch every view, and mount the client's WSGI exposition app alongside Flask so scraping uses the same port.
Prerequisites
Pin the client and the WSGI server. Flask itself is WSGI, so the client's make_wsgi_app() mounts directly with no adapter.
pip install "prometheus-client>=0.20.0,<1.0.0" \
"flask>=3.0.0,<4.0.0" \
"gunicorn>=21.2.0,<23.0.0"
For multi-worker deployments, export the multiprocess directory before gunicorn starts. It must exist, be writable, and be cleared on each boot so stale worker files do not resurrect old series.
export PROMETHEUS_MULTIPROC_DIR=/tmp/flask_prom
mkdir -p "$PROMETHEUS_MULTIPROC_DIR" && rm -f "$PROMETHEUS_MULTIPROC_DIR"/*
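If the boot sequence is driven from Python rather than a shell wrapper, roughly the same cleanup can be sketched with the standard library. The helper name reset_multiproc_dir and the file name counter_1234.db are hypothetical, and the demo runs against a throwaway directory. As with the shell version, this has to run before any worker starts.

```python
import os
import tempfile
from pathlib import Path


def reset_multiproc_dir(path: str) -> Path:
    """Create the directory if needed and delete leftover files inside it."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    for entry in directory.iterdir():
        if entry.is_file():
            entry.unlink()  # stale per-worker file from a previous boot
    return directory


# Demo against a temporary directory, not the real metrics directory.
demo_root = tempfile.mkdtemp()
demo_dir = os.path.join(demo_root, "flask_prom")
os.makedirs(demo_dir)
Path(demo_dir, "counter_1234.db").write_bytes(b"stale")  # made-up leftover file

cleaned = reset_multiproc_dir(demo_dir)
remaining = list(cleaned.iterdir())
```

After the call the directory still exists and is empty, which is the state the multiprocess collector expects on a fresh boot.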
Implementation
Step 1 — Define instruments in their own module. Constructing a metric twice raises ValueError: Duplicated timeseries, so declare the request counter and latency histogram once and import them. Choose the endpoint label from the route rule rather than the raw path, and tune histogram buckets to web latency in seconds.
# metrics.py
from prometheus_client import Counter, Histogram

REQUEST_COUNT = Counter(
    "flask_http_requests_total",
    "Total Flask HTTP requests",
    ["method", "endpoint", "status"],
)

REQUEST_LATENCY = Histogram(
    "flask_http_request_duration_seconds",
    "Flask request latency in seconds",
    ["method", "endpoint"],
    buckets=(0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0),
)
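To see the duplicate-registration failure in isolation, the sketch below registers the same metric name twice. It uses a private CollectorRegistry and a made-up metric name (demo_requests_total) so it does not touch the default registry that metrics.py relies on.

```python
from prometheus_client import CollectorRegistry, Counter

# Private registry for the demo; the real module uses the default registry.
registry = CollectorRegistry()

first = Counter("demo_requests_total", "Demo counter", registry=registry)

try:
    # Registering the same name on the same registry a second time fails.
    Counter("demo_requests_total", "Demo counter", registry=registry)
except ValueError as exc:
    duplicate_error = str(exc)
else:
    duplicate_error = ""
```

Keeping the instruments in one module that is imported everywhere avoids this error, because Python executes the module body only once per process.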
Step 2 — Record latency with before/after hooks. before_request stamps a monotonic start time onto Flask's request-scoped g. after_request reads it back, observes the elapsed seconds, and increments the counter. Using request.url_rule.rule keeps the endpoint label bounded to your route patterns instead of one series per concrete URL, which is the practice detailed in controlling label cardinality in Prometheus.
# app.py
import time

from flask import Flask, g, request, Response

from metrics import REQUEST_COUNT, REQUEST_LATENCY

app = Flask(__name__)

@app.before_request
def _start_timer():
    g._start = time.perf_counter()  # request-scoped start time

@app.after_request
def _record(response: Response):
    # request.url_rule is None for 404s; fall back to a constant label
    endpoint = request.url_rule.rule if request.url_rule else "<unmatched>"
    elapsed = time.perf_counter() - getattr(g, "_start", time.perf_counter())
    REQUEST_LATENCY.labels(request.method, endpoint).observe(elapsed)
    REQUEST_COUNT.labels(request.method, endpoint, response.status_code).inc()
    return response

@app.route("/users/<int:user_id>")
def get_user(user_id: int):
    return {"id": user_id}
The 404 fallback matters: for an unmatched route request.url_rule is None, so a scanner hitting random paths would otherwise crash the hook on the .rule lookup. Folding all unmatched requests into one <unmatched> series both fixes the crash and caps cardinality.
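One way to check the hooks without starting a server is Flask's test client. The sketch below repeats the hooks in a single file and registers the instruments on a private CollectorRegistry, which is an assumption made here to keep the check isolated; the guide's metrics.py uses the default registry.

```python
import time

from flask import Flask, g, request
from prometheus_client import CollectorRegistry, Counter, Histogram

registry = CollectorRegistry()  # private registry, for this check only

REQUEST_COUNT = Counter(
    "flask_http_requests_total",
    "Total Flask HTTP requests",
    ["method", "endpoint", "status"],
    registry=registry,
)
REQUEST_LATENCY = Histogram(
    "flask_http_request_duration_seconds",
    "Flask request latency in seconds",
    ["method", "endpoint"],
    registry=registry,
)

app = Flask(__name__)


@app.before_request
def _start_timer():
    g._start = time.perf_counter()


@app.after_request
def _record(response):
    endpoint = request.url_rule.rule if request.url_rule else "<unmatched>"
    elapsed = time.perf_counter() - getattr(g, "_start", time.perf_counter())
    REQUEST_LATENCY.labels(request.method, endpoint).observe(elapsed)
    REQUEST_COUNT.labels(request.method, endpoint, response.status_code).inc()
    return response


@app.route("/users/<int:user_id>")
def get_user(user_id: int):
    return {"id": user_id}


client = app.test_client()
client.get("/users/1")
client.get("/users/2")
client.get("/no-such-path")  # no route matches, so url_rule is None

users_labels = {"method": "GET", "endpoint": "/users/<int:user_id>", "status": "200"}
pattern_count = registry.get_sample_value("flask_http_requests_total", users_labels)
unmatched_count = registry.get_sample_value(
    "flask_http_requests_total",
    {"method": "GET", "endpoint": "<unmatched>", "status": "404"},
)
latency_count = registry.get_sample_value(
    "flask_http_request_duration_seconds_count",
    {"method": "GET", "endpoint": "/users/<int:user_id>"},
)
```

Both user requests land in the same series because the label is the route pattern, and the unknown path is counted under <unmatched> instead of raising.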
Step 3 — Mount the metrics endpoint on the same port. Wrap the Flask WSGI app with DispatcherMiddleware so /metrics is served by the client's exposition app while everything else routes to Flask. This avoids opening a second port and inherits Flask's bind, TLS, and any front-proxy auth.
# wsgi.py: the gunicorn entrypoint
from werkzeug.middleware.dispatcher import DispatcherMiddleware
from prometheus_client import make_wsgi_app

from app import app as flask_app

application = DispatcherMiddleware(flask_app, {
    "/metrics": make_wsgi_app(),  # client renders the exposition
})
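The dispatch behaviour can be checked in isolation with Werkzeug's test client. In this sketch a tiny hand-written WSGI callable (hello_app, hypothetical) stands in for the Flask app, and the counter name demo_hits_total and the private registry are assumptions made for the demo.

```python
from prometheus_client import CollectorRegistry, Counter, make_wsgi_app
from werkzeug.middleware.dispatcher import DispatcherMiddleware
from werkzeug.test import Client


def hello_app(environ, start_response):
    """Stand-in for the Flask app; any WSGI callable can be the default app."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


registry = CollectorRegistry()  # private registry for the demo
hits = Counter("demo_hits_total", "Demo counter", registry=registry)
hits.inc()

# /metrics goes to the exposition app, everything else to the default app.
application = DispatcherMiddleware(hello_app, {"/metrics": make_wsgi_app(registry)})

client = Client(application)
metrics_body = client.get("/metrics").get_data(as_text=True)
root_body = client.get("/").get_data(as_text=True)
```

The /metrics response carries the exposition text while the root path still reaches the wrapped application, which is the split that wsgi.py sets up for gunicorn.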
Step 4 — Aggregate across gunicorn workers. With PROMETHEUS_MULTIPROC_DIR set, instruments write to per-worker files instead of memory, and the scrape must merge them through a MultiProcessCollector. Build a fresh registry inside a custom /metrics app so the default per-process collectors do not double-count, and register the child_exit hook so a dead worker's files are cleaned up.
# wsgi.py: multiprocess-aware variant
import os

from werkzeug.middleware.dispatcher import DispatcherMiddleware
from prometheus_client import (
    CollectorRegistry, make_wsgi_app, multiprocess,
)

from app import app as flask_app

def metrics_app(environ, start_response):
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)  # merge per-worker files
    return make_wsgi_app(registry)(environ, start_response)

target = metrics_app if os.environ.get("PROMETHEUS_MULTIPROC_DIR") else make_wsgi_app()
application = DispatcherMiddleware(flask_app, {"/metrics": target})
# gunicorn.conf.py
from prometheus_client import multiprocess

def child_exit(server, worker):
    multiprocess.mark_process_dead(worker.pid)  # clean dead-worker files
Run it:
gunicorn -c gunicorn.conf.py --workers 4 --bind 0.0.0.0:8000 wsgi:application
This per-hook approach replaces what prometheus-flask-exporter would do automatically; building it by hand keeps you in control of the endpoint label, which is the single biggest cardinality risk in a Flask deployment.
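Why before/after instead of a decorator. A decorator on each view would miss requests handled by Flask itself (404s, 405s, error handlers, static files) and would force you to remember to wrap every new route. The before_request/after_request pair runs for the whole request lifecycle regardless of which view fires, so coverage is complete by construction and new routes are instrumented automatically. The one gap is exceptions that escape the view. In Flask's default production configuration an unhandled exception is converted into a 500 response that still passes through after_request, so it is already counted. When exceptions propagate instead (debug mode, testing mode, or PROPAGATE_EXCEPTIONS enabled), after_request is skipped, so pair it with teardown_request if you must count those failures, and guard the hook so the production path is not counted twice.
@app.teardown_request
def _record_failure(exc):
    # after_request already counted the 500 unless the exception propagated
    propagate = app.config["PROPAGATE_EXCEPTIONS"]
    if propagate is None:  # Flask's default: propagate only in debug/testing
        propagate = app.debug or app.testing
    if exc is not None and propagate:  # an exception escaped the view
        endpoint = request.url_rule.rule if request.url_rule else "<unmatched>"
        REQUEST_COUNT.labels(request.method, endpoint, 500).inc()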
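Both hooks derive the endpoint label with the same guard, so one option is to factor it into a small helper so the two cannot drift apart. The name endpoint_label is hypothetical, and SimpleNamespace stands in for the rule object Flask attaches to request.url_rule.

```python
from types import SimpleNamespace

UNMATCHED = "<unmatched>"  # constant label for requests that matched no route


def endpoint_label(url_rule) -> str:
    """Return the route pattern, or a constant when no route matched."""
    return url_rule.rule if url_rule is not None else UNMATCHED


# Stand-in for a matched rule; in the app this would be request.url_rule.
matched = SimpleNamespace(rule="/users/<int:user_id>")

label_for_match = endpoint_label(matched)
label_for_404 = endpoint_label(None)
```

In the hooks, the call would be endpoint_label(request.url_rule) in place of the inline conditional.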
Configuration Options
| Option | Where | Purpose | Notes |
|---|---|---|---|
| PROMETHEUS_MULTIPROC_DIR | env | Per-worker metric files | Required for gunicorn aggregation; clear on boot |
| request.url_rule.rule | hook | endpoint label value | Route pattern, not raw path; bounds cardinality |
| make_wsgi_app(registry) | mount | Renders exposition | Mount via DispatcherMiddleware on /metrics |
| Histogram(buckets=...) | instrument | Latency buckets in seconds | Tune to your SLO thresholds |
| child_exit hook | gunicorn | Cleans dead-worker files | Calls mark_process_dead(worker.pid) |
| multiprocess_mode | Gauge only | Cross-worker reduction | Use livesum for in-flight gauges |
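The multiprocess_mode row is the only option not exercised elsewhere in this guide. A minimal sketch of an in-flight gauge follows; the metric name flask_http_requests_in_flight is hypothetical, and the private registry is used only so the demo is isolated.

```python
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()  # private registry for the demo only

IN_FLIGHT = Gauge(
    "flask_http_requests_in_flight",  # hypothetical metric name
    "Requests currently being handled",
    multiprocess_mode="livesum",  # sum over live workers in multiprocess mode
    registry=registry,
)

# In an app, inc() would go in before_request and dec() in teardown_request.
IN_FLIGHT.inc()
IN_FLIGHT.inc()
IN_FLIGHT.dec()

in_flight_now = registry.get_sample_value("flask_http_requests_in_flight")
```

With livesum, values from workers that have exited are dropped from the total, which suits a gauge that should read zero for a dead process.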
Verification
After a few requests to /users/1 and /users/2, scrape the endpoint and confirm the counter is labeled by route pattern, not concrete ID, and the histogram exposes bucket/sum/count series.
curl -s localhost:8000/metrics | grep flask_http
Expected output (abridged to a few bucket lines):
# HELP flask_http_requests_total Total Flask HTTP requests
# TYPE flask_http_requests_total counter
flask_http_requests_total{endpoint="/users/<int:user_id>",method="GET",status="200"} 7.0
# HELP flask_http_request_duration_seconds Flask request latency in seconds
# TYPE flask_http_request_duration_seconds histogram
flask_http_request_duration_seconds_bucket{endpoint="/users/<int:user_id>",le="0.005",method="GET"} 5.0
flask_http_request_duration_seconds_bucket{endpoint="/users/<int:user_id>",le="0.05",method="GET"} 7.0
flask_http_request_duration_seconds_bucket{endpoint="/users/<int:user_id>",le="+Inf",method="GET"} 7.0
flask_http_request_duration_seconds_sum{endpoint="/users/<int:user_id>",method="GET"} 0.041
flask_http_request_duration_seconds_count{endpoint="/users/<int:user_id>",method="GET"} 7.0
The single endpoint="/users/<int:user_id>" series across both IDs is the signal that cardinality is controlled. If you instead saw /users/1 and /users/2 as separate series, the hook is labeling by request.path and must be fixed.
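The _bucket lines are cumulative: each le value counts every observation at or below that bound. The sketch below reimplements that counting with the standard library purely as an illustration. It is not the client library's code, and the latencies are made-up values chosen to be consistent with the sample output.

```python
from bisect import bisect_left

BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0)


def cumulative_bucket_counts(observations, buckets=BUCKETS):
    """Map each upper bound (plus +Inf) to the count of observations <= bound."""
    per_bucket = [0] * (len(buckets) + 1)  # last slot is the +Inf overflow
    for value in observations:
        # Index of the first bound that is >= value, so bounds are inclusive.
        per_bucket[bisect_left(buckets, value)] += 1
    counts, running = {}, 0
    for bound, n in zip(list(buckets) + [float("inf")], per_bucket):
        running += n
        counts[bound] = running
    return counts


# Illustrative latencies in seconds: five fast requests and two slower ones.
latencies = [0.003, 0.004, 0.004, 0.002, 0.001, 0.012, 0.015]
counts = cumulative_bucket_counts(latencies)
total_seconds = sum(latencies)
```

Because the counts accumulate, the +Inf bucket always equals the _count series, and a quantile estimate only needs the differences between neighbouring buckets.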
Common Mistakes
Hook crashes on 404 requests
Error signature: AttributeError: 'NoneType' object has no attribute 'rule' whenever a scanner hits an unknown path.
Root cause: for unmatched requests request.url_rule is None, so reading .rule raises.
Remediation: guard with request.url_rule.rule if request.url_rule else "<unmatched>", which both fixes the crash and collapses all unmatched traffic into a single low-cardinality series.
Metrics differ on every scrape under gunicorn
Error signature: totals fluctuate and never accumulate; numbers look far too small for the real request rate.
Root cause: prefork workers each own a private registry and the scrape lands on one of them.
Remediation: set PROMETHEUS_MULTIPROC_DIR, render the scrape through MultiProcessCollector on a fresh CollectorRegistry, and add the child_exit hook. Detailed wiring lives in the Prometheus client instrumentation guide.
Frequently Asked Questions
Do I need prometheus-flask-exporter to instrument Flask?
No. The official prometheus-client gives you everything: define a Counter and Histogram, record them in before_request and after_request hooks, and mount make_wsgi_app on a route. The exporter is a convenience wrapper, not a requirement.
Where do I get the route label without exploding cardinality?
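Use request.url_rule.rule, which is the route pattern such as /users/<int:user_id>, rather than request.path, which is the concrete URL such as /users/1. Fall back to a constant such as <unmatched> when request.url_rule is None.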
Why are my Flask metrics inconsistent across gunicorn workers?
Each prefork worker keeps its own registry, so a scrape sees one worker. Set PROMETHEUS_MULTIPROC_DIR and render the scrape through a MultiProcessCollector, and call mark_process_dead in the gunicorn child_exit hook.