Replay a Production Incident

Verify your fix against the exact requests that caused the incident — before you deploy.

Scenario: you've fixed the Stripe timeout bug. You want to confirm the incident is resolved without deploying to production blind.

Step 1 — Identify the incident time window

From flux tail, your alert, or your monitoring tool, find the start and end time of the incident:

$ flux tail --filter status=500 --since 2h

  POST /signup  500  44ms  req:550e8400  14:22:01
  POST /signup  500  51ms  req:7a8b9c0d  14:22:44
  POST /signup  500  38ms  req:1b2c3d4e  14:23:10

  # Incident window: 14:00 → 14:30
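
If you would rather derive the window than read it off by eye, the failing timestamps can be turned into a padded range. The following is a minimal sketch, not part of flux: it assumes the column layout seen in the sample output above (method, path, status, duration, request id, time), and the `failure_window` helper and its five-minute padding are hypothetical choices. It also assumes all lines fall on the same day.

```python
import re
from datetime import datetime, timedelta

# Sample lines copied from the `flux tail` output above. The column layout is
# assumed from that sample, not taken from a documented output format.
TAIL_OUTPUT = """\
  POST /signup  500  44ms  req:550e8400  14:22:01
  POST /signup  500  51ms  req:7a8b9c0d  14:22:44
  POST /signup  500  38ms  req:1b2c3d4e  14:23:10
"""

LINE_RE = re.compile(
    r"^\s*(?P<method>[A-Z]+)\s+(?P<path>\S+)\s+(?P<status>\d{3})\s+"
    r"(?P<ms>\d+)ms\s+req:(?P<req>[0-9a-f]+)\s+(?P<time>\d{2}:\d{2}:\d{2})\s*$"
)


def failure_window(text, padding_minutes=5):
    """Return (start, end) as HH:MM strings covering every 5xx line, padded."""
    times = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("status").startswith("5"):
            times.append(datetime.strptime(match.group("time"), "%H:%M:%S"))
    if not times:
        raise ValueError("no failing requests found")
    pad = timedelta(minutes=padding_minutes)
    # Seconds are dropped so the result matches the HH:MM form used in Step 2.
    start = (min(times) - pad).replace(second=0)
    end = (max(times) + pad).replace(second=0)
    return start.strftime("%H:%M"), end.strftime("%H:%M")


print("..".join(failure_window(TAIL_OUTPUT)))
```

How much padding to use is a judgment call; the window in this guide (14:00 to 14:30) is wider than the failing requests alone, which also replays the surrounding healthy traffic.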

Step 2 — Replay that window against your current code

$ flux incident replay 14:00..14:30

  Replaying 47 requests from 14:00–14:30…

  Side-effects: hooks off · events off · cron off
  Database writes: on · mutation log: on

  ✔  req:4f9a3b2c  POST /create_user   200  81ms
  ✔  req:a3c91ef0  GET  /list_users    200  12ms
  ✔  req:550e8400  POST /signup        200  88ms  ← was 500
  ✔  req:7a8b9c0d  POST /signup        200  91ms  ← was 500

  47 replayed · 47 passing · 0 still failing  ✔ incident resolved

All passing — safe to deploy.
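
To use this check as a gate in a deploy script, the summary line can be parsed. This is a sketch under assumptions: this page does not say whether `flux incident replay` sets an exit code or offers machine-readable output, so the example matches the human-readable summary text shown above, and `replay_is_clean` is a hypothetical helper name.

```python
import re

# Matches the summary line format shown in the replay output above.
SUMMARY_RE = re.compile(
    r"(?P<replayed>\d+) replayed · (?P<passing>\d+) passing · "
    r"(?P<failing>\d+) still failing"
)


def replay_is_clean(output):
    """True only if a summary exists, something was replayed, and nothing failed."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return False  # no summary line found: treat the fix as unverified
    replayed = int(match.group("replayed"))
    passing = int(match.group("passing"))
    failing = int(match.group("failing"))
    return replayed > 0 and failing == 0 and passing == replayed


sample = "  47 replayed · 47 passing · 0 still failing  ✔ incident resolved"
print(replay_is_clean(sample))
```

Treating a missing summary as a failure is deliberate: an empty or truncated replay run should block the deploy rather than pass silently.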

What is safe during replay

Side-effect type                          During replay             Notes
Outbound HTTP (webhooks, Stripe, Slack)   Stubbed                   Returns the recorded response from the original trace
Email / SMS sending                       Disabled                  Send calls return { status: "stubbed" }
Cron / scheduled jobs                     Disabled                  No jobs are dispatched during replay
Async job enqueue                         Recorded, not dispatched  Jobs appear in mutation log but don't run
Database reads                            Live                      Reads your current database state
Database writes                           Enabled                   Mutations are recorded in the log for comparison

Important: because database writes are enabled, replay produces real mutations in whatever database it runs against. Replay against a development database if your production data is sensitive. Replaying against production is reasonable for read-heavy workloads, but any writes the replayed requests perform will be applied for real.
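
The "recorded, not dispatched" behavior for async jobs can be pictured with a small wrapper. This is an illustrative sketch only: `ReplayJobQueue`, its `replay_mode` flag, and the shape of the log entries are hypothetical and do not describe how flux implements its mutation log.

```python
class ReplayJobQueue:
    """Hypothetical queue wrapper: every enqueue is logged, but jobs are
    handed to the dispatcher only when replay mode is off."""

    def __init__(self, dispatch, replay_mode):
        self._dispatch = dispatch
        self._replay_mode = replay_mode
        self.mutation_log = []

    def enqueue(self, name, payload):
        # Record the job so it can be compared against the original run.
        self.mutation_log.append(
            {"type": "job_enqueue", "job": name, "payload": payload}
        )
        if not self._replay_mode:
            self._dispatch(name, payload)


ran = []
queue = ReplayJobQueue(
    dispatch=lambda name, payload: ran.append(name),
    replay_mode=True,
)
queue.enqueue("send_welcome_email", {"user_id": 42})
print(len(queue.mutation_log), len(ran))
```

In replay mode the job shows up in the log but the dispatcher is never called, which mirrors the table row above.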

Replay a single request

If you only want to replay one specific request:

$ flux incident replay --request 550e8400

  Replaying req:550e8400…
  ✔  POST /signup  200  88ms  ← was 500

How determinism works

Replay uses the request_input from the original execution record (HTTP body, headers, auth context) as the input to the re-execution. Outbound API calls return the same recorded responses from the original trace, so your function sees identical data at every step — even if the external service would return different data today.
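
The idea can be sketched in a few lines. Everything here is an assumption made for illustration: the `ExecutionRecord` and `ReplayClient` types, the `signup` handler, the example URL, and the choice to match recorded calls by order and by method plus URL. The page does not describe how flux matches outbound calls to recorded responses.

```python
from dataclasses import dataclass, field


@dataclass
class RecordedCall:
    method: str
    url: str
    response: dict


@dataclass
class ExecutionRecord:
    request_input: dict  # HTTP body, headers, auth context
    outbound: list = field(default_factory=list)  # RecordedCall, original order


class ReplayClient:
    """Stands in for an HTTP client: serves recorded responses in order and
    never touches the network."""

    def __init__(self, recorded):
        self._recorded = list(recorded)
        self._cursor = 0

    def request(self, method, url, **kwargs):
        if self._cursor >= len(self._recorded):
            raise RuntimeError(f"unrecorded outbound call: {method} {url}")
        call = self._recorded[self._cursor]
        if (call.method, call.url) != (method, url):
            raise RuntimeError(
                f"diverged from trace: expected {call.method} {call.url}, "
                f"got {method} {url}"
            )
        self._cursor += 1
        return call.response


def signup(request_input, http):
    """Hypothetical handler that makes one outbound payment call."""
    charge = http.request(
        "POST",
        "https://payments.example/charge",
        data={"email": request_input["body"]["email"]},
    )
    return {"status": 200 if charge["status"] == "succeeded" else 500}


record = ExecutionRecord(
    request_input={"body": {"email": "a@example.com"}, "headers": {}, "auth": None},
    outbound=[
        RecordedCall("POST", "https://payments.example/charge", {"status": "succeeded"})
    ],
)

# Two replays of the same record see the same inputs and the same responses.
first = signup(record.request_input, ReplayClient(record.outbound))
second = signup(record.request_input, ReplayClient(record.outbound))
print(first == second)
```

Because the handler only ever sees the recorded input and the recorded responses, any change in its result between replays comes from your code change, not from the external service.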

See the production guide for a detailed breakdown of the determinism guarantees.

