Why Flux Exists
The problem with production debugging today — and the idea that fixes it.
The problem
A request fails in production. You get an alert. Now what?
With a traditional backend, you open five tools in five tabs:
| Tool | What you find | What's missing |
|---|---|---|
| Log aggregator | A stack trace with a timestamp | No request body, no DB state |
| Metrics dashboard | A spike in p99 latency | No idea which request caused it |
| Trace UI | Spans — if you remembered to instrument | Requires SDK setup, no DB mutations |
| DB console | Current row state | No history of what changed it |
| Queue dashboard | A failed job | No link back to the HTTP request that enqueued it |
Each tool holds a fragment of evidence. None of them share a common identifier. You spend more time correlating evidence than understanding the bug.
This is the normal production debugging experience in 2026. It is slow, frustrating, and completely avoidable.
The insight
Every production bug is caused by a request that did something unexpected.
If you had a complete record of exactly what that request did — every span, every database write, every input and output — you could root-cause any bug in seconds instead of hours.
The pieces of that record already exist in your system, scattered across separate tools. Flux assembles them into one.
The solution: execution history
Flux runs your backend inside a runtime that records every request automatically — no instrumentation, no SDK, no configuration. The result is an execution record: a single object that contains everything that request did.
Request: POST /signup req:550e8400
INPUT { email: "a@b.com" }
SPANS gateway → create_user → stripe.charge → db.insert
MUTATIONS users.plan: free → null (rolled back)
ERROR Stripe API timeout at payments/create.ts:42
RESPONSE 500 44ms
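To make the idea of "a single object" concrete, here is a minimal TypeScript sketch of what such a record could look like as data. The type and field names (`ExecutionRecord`, `Mutation`, `summarize`) are hypothetical illustrations and are not taken from Flux's actual schema; the values mirror the example above.

```typescript
// Hypothetical shape of an execution record; names are illustrative only.
interface Mutation {
  table: string;
  column: string;
  before: unknown;
  after: unknown;
  rolledBack: boolean;
}

interface ExecutionRecord {
  requestId: string;
  method: string;
  path: string;
  input: unknown;
  spans: string[]; // span names in execution order
  mutations: Mutation[];
  error?: { message: string; location: string };
  status: number;
  durationMs: number;
}

// The example record from the text, expressed as data.
const record: ExecutionRecord = {
  requestId: "550e8400",
  method: "POST",
  path: "/signup",
  input: { email: "a@b.com" },
  spans: ["gateway", "create_user", "stripe.charge", "db.insert"],
  mutations: [
    { table: "users", column: "plan", before: "free", after: null, rolledBack: true },
  ],
  error: { message: "Stripe API timeout", location: "payments/create.ts:42" },
  status: 500,
  durationMs: 44,
};

// Build a one-line summary from the record alone, with no second tool involved.
function summarize(r: ExecutionRecord): string {
  const outcome = r.error ? `${r.error.message} at ${r.error.location}` : "ok";
  return `${r.method} ${r.path} req:${r.requestId} -> ${r.status} (${outcome})`;
}
```

Because input, spans, mutations, and error all hang off one request ID, any question about the request can be answered from this one value.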
With that record, debugging collapses into one command:
$ flux why 550e8400
ROOT CAUSE Stripe API timeout after 10s
LOCATION payments/create.ts:42
DATA CHANGES users.id=42 plan: free → null (rolled back)
SUGGESTION → Add 5s timeout + idempotency key retry
And if you want to verify your fix before deploying, you can replay the exact requests that caused the incident against your current code:
$ flux incident replay 14:00..14:05
23 replayed · 22 passing · 1 still failing ← fix not complete yet
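Conceptually, a replay feeds each recorded input to the current code while answering outbound calls from the recording instead of the real service. The sketch below illustrates that idea under stated assumptions: the `Recorded`, `Outbound`, and `Handler` types and the `replay` function are hypothetical, and the handler is kept synchronous for brevity. It is not Flux's implementation.

```typescript
// Hypothetical: a recorded request plus the outbound results captured with it.
interface Recorded {
  id: string;
  input: { email: string };
  outbound: Record<string, unknown>; // recorded results keyed by call name
}

type Outbound = (call: string) => unknown;
type Handler = (input: { email: string }, outbound: Outbound) => { status: number };

// Run every recorded request against the current handler. Outbound calls are
// served from the recording, so no real side effect (charge, email) happens.
function replay(records: Recorded[], handler: Handler) {
  const failing: string[] = [];
  for (const rec of records) {
    const stub: Outbound = (call) => {
      if (!(call in rec.outbound)) throw new Error(`no recording for ${call}`);
      return rec.outbound[call];
    };
    try {
      // Treat any 5xx status as a request that is still failing.
      if (handler(rec.input, stub).status >= 500) failing.push(rec.id);
    } catch {
      // A thrown error also counts as a failure.
      failing.push(rec.id);
    }
  }
  return {
    replayed: records.length,
    passing: records.length - failing.length,
    failing,
  };
}
```

The returned counts correspond to the "replayed / passing / still failing" summary shown above.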
The analogy
Git records the history of your code. Every change is stored, attributable, diffable, and revertable.
Flux records the history of your code executing. Every request is stored, attributable, diffable, and replayable.
Git → what your code looked like
Flux → what your code did
These are complementary. Git tells you what changed. Flux tells you what happened when it ran.
What this enables
| Capability | Command | What was previously required |
|---|---|---|
| Root-cause any failure | flux why <id> | Grep logs, correlate trace IDs manually |
| See every DB change a request made | flux state history | DB console + guesswork |
| Replay an incident safely | flux incident replay | Not possible: side effects make it unsafe |
| Find the breaking commit | flux bug bisect | Manual git bisect + redeploy loop |
| Compare two executions | flux trace diff | Not possible across tools |
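The idea behind finding a breaking commit can be pictured as a binary search over commits, re-running a recorded request at each step. The sketch below is an illustration only: the `bisect` function and the `passesAt` callback are hypothetical, and how `flux bug bisect` works internally is not described in this document.

```typescript
// Find the first commit at which a replayed request fails.
// Assumptions: commits are ordered oldest to newest, the first commit is
// known to pass, the last is known to fail, and once the bug appears it
// stays (the same monotonicity assumption that git bisect relies on).
function bisect(commits: string[], passesAt: (sha: string) => boolean): string {
  let good = 0; // index of a commit known to pass
  let bad = commits.length - 1; // index of a commit known to fail
  while (bad - good > 1) {
    const mid = Math.floor((good + bad) / 2);
    if (passesAt(commits[mid])) {
      good = mid; // bug was introduced after mid
    } else {
      bad = mid; // bug was introduced at or before mid
    }
  }
  return commits[bad];
}
```

Each step halves the candidate range, so the number of re-executions grows with the logarithm of the number of commits rather than linearly.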
Open source and self-hosted
Flux is open source (MIT). Its components (gateway, runtime engine, data engine, queue) ship as a single binary. Run it on your own infrastructure with one Postgres database as the only dependency.
$ flux init my-app && cd my-app
$ flux dev # starts Flux + embedded Postgres on localhost:4000
No cloud account. No API keys. No telemetry. Your data stays on your server.
Why a runtime, not an SDK?
The first question developers ask: "Can't I just add a library to my Express app?"
No. Here's why:
| Capability | Bolt-on SDK | Flux runtime |
|---|---|---|
| Trace spans | ✔ (manual instrumentation) | ✔ (automatic) |
| Record DB mutations in the same transaction | ✗ Sits outside the database driver | ✔ Data engine wraps the query |
| Replay production traffic with side effects disabled | ✗ Can't intercept outbound calls | ✔ Runtime controls all I/O |
| Bisect across git history | ✗ Can't redeploy and re-execute per commit | ✔ Runtime manages deploys |
| Link every span to exact code version | ✗ No deploy awareness | ✔ Knows the code SHA for every execution |
An SDK can tell you that something failed. A runtime can tell you exactly what happened and let you replay it.
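The "data engine wraps the query" row is easiest to see in miniature. In the sketch below, every write goes through one method that both applies the change and records it against a request ID. `RecordingStore` and `MutationLog` are hypothetical names, and an in-memory `Map` stands in for the database; a real implementation would presumably perform the write and the record inside one database transaction.

```typescript
// Hypothetical mutation record: which request changed which key, and how.
interface MutationLog {
  requestId: string;
  key: string;
  before: unknown;
  after: unknown;
}

class RecordingStore {
  private rows = new Map<string, unknown>();
  readonly log: MutationLog[] = [];

  // The only write path: the change and its record are made together,
  // so a write cannot happen without leaving history behind.
  set(requestId: string, key: string, value: unknown): void {
    const before = this.rows.get(key) ?? null;
    this.rows.set(key, value);
    this.log.push({ requestId, key, before, after: value });
  }

  get(key: string): unknown {
    return this.rows.get(key) ?? null;
  }

  // All recorded changes to one key, oldest first.
  history(key: string): MutationLog[] {
    return this.log.filter((m) => m.key === key);
  }
}
```

A library bolted onto application code can only approximate this, because it does not own the write path.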
The tradeoff is explicit: your code runs inside Flux. But migration is wrapping, not rewriting — req.body becomes input, res.json() becomes return, db.query() becomes ctx.db.query(). Start with one endpoint. Run Flux alongside your existing stack. Migrate at your own pace.
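As a rough illustration of that mapping, the sketch below shows an Express-style handler as a comment, followed by the same logic written as a function of `input` and `ctx`. The `Ctx` interface, the `signup` function, the SQL, and the table name are assumptions made for this example; only the three substitutions (`req.body` to `input`, `res.json()` to `return`, `db.query()` to `ctx.db.query()`) come from the text above.

```typescript
// Before (Express-style, shown as a comment for comparison):
//
//   app.post("/signup", async (req, res) => {
//     const rows = await db.query(
//       "INSERT INTO users (email) VALUES ($1) RETURNING id",
//       [req.body.email],
//     );
//     res.json({ id: rows[0].id });
//   });

// After: a hypothetical handler shape inferred from the mapping above.
// The real Flux context type may differ.
interface Ctx {
  db: {
    query: (sql: string, params: unknown[]) => Promise<Array<{ id: number }>>;
  };
}

async function signup(input: { email: string }, ctx: Ctx): Promise<{ id: number }> {
  // req.body.email becomes input.email; db.query becomes ctx.db.query.
  const rows = await ctx.db.query(
    "INSERT INTO users (email) VALUES ($1) RETURNING id",
    [input.email],
  );
  // res.json(...) becomes a plain return value.
  return { id: rows[0].id };
}
```

The business logic is unchanged; only the way the handler receives input, reaches the database, and produces output is different.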
What Flux is not
- Not a log aggregator. Logs are a loose stream of lines. Execution records are structured around request IDs and span trees.
- Not an APM / metrics platform. APMs tell you something is wrong. Flux tells you exactly what happened and lets you replay it.
- Not a distributed tracing system. Tracing typically requires SDK instrumentation. Flux records automatically at the runtime level.
- Not a serverless platform. Flux runs your functions, but it's not a cloud compute layer. It's a self-hosted runtime where the primary value is execution history and debugging — not autoscaling to zero.