Your agent re-sends the same file dump five times and drags stale tool output through every turn. tokdiet is a local proxy that puts that context on a diet — and then proves the model didn't get dumber.
One real agent request. Same file pasted 5×, stale logs, duplicate configs. Press run: nothing is deleted. Duplicates are collapsed and cold context is paged out, and both stay recoverable.
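To make "collapsed, not deleted" concrete, here is a minimal sketch of one way duplicate blocks could be collapsed. It is an illustration only, not tokdiet's actual algorithm: the function name `collapse_duplicates`, the exact-match hashing, and the `[duplicate of block N]` placeholder format are all assumptions.

```python
import hashlib

def collapse_duplicates(blocks):
    """Replace exact repeats with a short pointer to the first copy.

    Hypothetical helper: exact-match hashing is an assumption, and a real
    compactor may well match near-duplicates too.
    """
    seen = {}   # content hash -> index of the first occurrence
    out = []
    for i, text in enumerate(blocks):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            # The first copy is still in the request, so nothing is lost.
            out.append(f"[duplicate of block {seen[digest]}]")
        else:
            seen[digest] = i
            out.append(text)
    return out

blocks = ["FILE main.py ...", "log line", "FILE main.py ...", "FILE main.py ..."]
collapsed = collapse_duplicates(blocks)
```

The output has the same number of blocks as the input, so the index in each placeholder always points at a block that is still present in full.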
The meter is real arithmetic on the blocks at left — not a canned number.
"If I cut the context, does the model get dumber?" So we measured it — twice, on two models.
| metric | baseline | tokdiet | Δ |
|---|---|---|---|
Reproduce it yourself: node bench/run.mjs · needs an API key in env.
Hover the pipeline. Hot context stays resident; cold context is paged out to SQLite — recoverable, by id.
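Paging cold context out to SQLite and recovering it by id can be sketched with Python's standard `sqlite3` module. The `ColdStore` class, the table layout, and the placeholder text are assumptions made for illustration. The source says only that cold context goes to SQLite and is recoverable by id.

```python
import sqlite3
import uuid

class ColdStore:
    """Hypothetical cold-context store: one row per paged-out block."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cold (id TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def page_out(self, body):
        # Store the block and hand back an id that can stand in for it.
        block_id = uuid.uuid4().hex
        self.db.execute("INSERT INTO cold (id, body) VALUES (?, ?)", (block_id, body))
        self.db.commit()
        return block_id

    def page_in(self, block_id):
        # Return the original block, or None if the id is unknown.
        row = self.db.execute(
            "SELECT body FROM cold WHERE id = ?", (block_id,)
        ).fetchone()
        return row[0] if row else None

store = ColdStore()
stale_log = "build log from 40 turns ago"
block_id = store.page_out(stale_log)
placeholder = f"[paged out: {block_id}]"  # what would remain in the request
```

Calling `store.page_in(block_id)` returns the original text unchanged, which is the property "recoverable, by id" depends on.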
Savings stop before quality does.
Re-runs a sampled 5% of compacted requests against the un-compacted baseline and scores the divergence, 0 (identical) to 100. The measurement, not a guess.
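A rough sketch of that sampling-and-scoring loop follows. The 5% rate and the 0 to 100 scale come from the text above. The scoring method is not stated, so character-level similarity from `difflib` is swapped in here purely as a stand-in, and the function names are hypothetical.

```python
import difflib
import random

SAMPLE_RATE = 0.05  # from the text: 5% of compacted requests are re-run

def should_sample(rng):
    """Decide whether this compacted request also gets a baseline re-run."""
    return rng.random() < SAMPLE_RATE

def divergence_score(baseline, compacted):
    """Score two responses from 0 (identical) to 100 (nothing in common).

    Assumption: plain text similarity. A real scorer might compare
    meaning rather than characters.
    """
    ratio = difflib.SequenceMatcher(None, baseline, compacted).ratio()
    return round(100 * (1 - ratio))

rng = random.Random(0)
sampled = sum(should_sample(rng) for _ in range(10_000))
same = divergence_score("return 42", "return 42")
disjoint = divergence_score("abc", "xyz")
```

Identical responses score 0 and responses with no characters in common score 100. Everything in between depends on the similarity measure, which is why that choice is flagged as an assumption.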
A hard ceiling on measured degradation — 2% by default. As it nears the line, the compactor restricts itself to its safest strategies.
Cross the budget and the offending strategy is switched off, per-strategy. The proxy would rather spend your tokens than spend your quality.
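The budget behaviour above could look something like the sketch below. The 2% ceiling is from the text. The 80% caution threshold, the safe/aggressive tag, and the names `Strategy` and `allowed_strategies` are assumptions, since the source does not say how "nearing the line" is defined.

```python
from dataclasses import dataclass

BUDGET = 2.0    # from the text: 2% measured degradation by default
CAUTION = 0.8   # assumption: "nearing the line" means 80% of the budget

@dataclass
class Strategy:
    name: str
    safe: bool                 # assumption: strategies carry a safe/aggressive tag
    degradation: float = 0.0   # measured degradation for this strategy, in percent
    enabled: bool = True

def allowed_strategies(strategies, budget=BUDGET):
    """Return the names of strategies the compactor may still use."""
    # Over budget: switch the offending strategy off, one strategy at a time.
    for s in strategies:
        if s.degradation > budget:
            s.enabled = False
    live = [s for s in strategies if s.enabled]
    # Near the line: fall back to the safest strategies only.
    worst = max((s.degradation for s in live), default=0.0)
    if worst >= CAUTION * budget:
        live = [s for s in live if s.safe]
    return [s.name for s in live]

strategies = [
    Strategy("dedupe", safe=True, degradation=0.1),
    Strategy("summarize", safe=False, degradation=2.5),
    Strategy("page_out", safe=False, degradation=0.5),
]
allowed = allowed_strategies(strategies)
```

Here `summarize` is over the 2% line and gets switched off, while the other two keep running because the worst remaining figure is well under the caution threshold.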
A token diet only cuts a bill that's priced per token. We'd rather say so up front.
MiniMax, the Anthropic API, OpenAI — fewer input tokens is a smaller invoice. The −71% lands on your bill.
On a flat Claude plan there are no per-token charges to cut. The value is the live dashboard, budgets, and seeing exactly where tokens go.
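The per-token case is simple arithmetic, worked through below. Only the 71% reduction comes from this page. The token count and the price are hypothetical placeholders, so substitute your own usage and your provider's actual rate. The sketch covers input tokens only and leaves output-token charges out.

```python
def input_cost(tokens, price_per_million):
    """Cost of input tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million

baseline_tokens = 1_000_000   # hypothetical monthly input tokens
price = 3.00                  # hypothetical price per million input tokens
reduction = 0.71              # from the text: the -71% figure

before = input_cost(baseline_tokens, price)
after = input_cost(baseline_tokens * (1 - reduction), price)
saved = before - after
```

On a flat plan the same reduction changes nothing on the invoice, because no line item scales with `tokens`.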
No account. No telemetry. The proxy + a live dashboard on :7878.
# proxy on :7787, dashboard on :7878 (loopback only)
$ npx tokdiet start
$ export ANTHROPIC_BASE_URL=http://localhost:7787
$ export OPENAI_BASE_URL=http://localhost:7787/v1