measured, not mocked

66-task A/B · 2 models
MiniMax-M3 · M2.5

An AI agent's context, on a diet.
It lost 71% of its weight —
and kept its mind.

Your agent re-sends the same file dump five times and drags stale tool output through every turn. tokdiet is a local proxy that puts that context on a diet — and then proves the model didn't get dumber.

↓ watch one request lose the weight

Watch one request
lose the weight.

One real agent request. Same file pasted 5×, stale logs, duplicate configs. Press run — nothing is deleted, it's collapsed or paged out, recoverably.

CONTEXT · before diet

Input tokens, this request

The meter is real arithmetic on the blocks at left — not a canned number.

0 tok

weightfull context

kept / saved waste removed

The scary question, answered.

"If I cut the context, does the model get dumber?" So we measured it — twice, on two models.

Primary model · 198 paired runs · ×3 majority-voted
metric	baseline	tokdiet	Δ

Reproduce it yourself: node bench/run.mjs · needs an API key in env.

Not "delete and pray."
Context as virtual memory.

Hover the pipeline. Hot context stays resident; cold context is paged out to SQLite — recoverable, by id.

● resident — hot

system prompt current question latest file read pinned answer on-topic results

◇ paged out — cold (recoverable)

stale test logs old git output dup file copies ×4 repeated paste ×1

the quality brake

Savings stop before quality does.

Shadow-eval

Re-runs a sampled 5% of compacted requests against the un-compacted baseline and scores the divergence, 0 (identical) to 100. The measurement, not a guess.

Quality budget

A hard ceiling on measured degradation — 2% by default. As it nears the line, the compactor restricts itself to its safest strategies.

Safe-mode

Cross the budget and the offending strategy is switched off, per-strategy. The proxy would rather spend your tokens than spend your quality.

measured degradationbudget 2.0%

● safe-mode armed · 0.4% observed

Where it's worth it.
Where it isn't.

A token diet only cuts a bill that's priced per token. We'd rather say so up front.

real dollar cut

Pay-per-token keys

MiniMax, the Anthropic API, OpenAI — fewer input tokens is a smaller invoice. The −71% lands on your bill.

metering, not money

Flat subscriptions

On a flat Claude plan there are no per-token charges to cut. The value is the live dashboard, budgets, and seeing exactly where tokens go.

→ keys forwarded only, never stored or logged → loopback bind only → fail-open: a hiccup never breaks your request → cache- & thinking-safe for Claude Code

Sixty seconds.

No account. No telemetry. The proxy + a live dashboard on :7878.

Start the proxy — nothing to install.

copy# proxy on :7787, dashboard on :7878 (loopback only)
$ npx tokdiet start

Point your agent at it instead of the real API.

copy$ export ANTHROPIC_BASE_URL=http://localhost:7787
$ export OPENAI_BASE_URL=http://localhost:7787/v1

Run Claude Code, Cursor, Codex — as usual. Open localhost:7878 and watch it tick.

AnthropicOpenAIGeminiMiniMax+ any compatible API