Intelligence Vault SOP

API Cost Optimization Protocol

Reducing Zed Claude API Cost by 70%+: Tactical Breakdown

The Core Problem

Tokens IN: 150,104
Tokens OUT: 1,194
Ratio: 125:1
Cost: $1.18

A 125:1 input/output ratio means you are paying almost entirely for context, not for answers.
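
The ratio can be checked directly from the two token counts in the sample above. This is a minimal sketch; the function name `token_breakdown` is illustrative, and it reports the share of tokens, not the share of cost (output tokens are usually priced higher, so the cost share of input is somewhat lower).

```python
def token_breakdown(tokens_in: int, tokens_out: int) -> tuple[float, float]:
    """Return (input:output ratio, fraction of all tokens that are input)."""
    ratio = tokens_in / tokens_out
    input_share = tokens_in / (tokens_in + tokens_out)
    return ratio, input_share


# Token counts taken from the sample session above.
ratio, share = token_breakdown(150_104, 1_194)
print(f"ratio ~ {ratio:.1f}:1, input share of tokens = {share:.1%}")
```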

The 5 Execution Levers

Lever 1: Switch Model (65% Savings)

Sonnet 4.6 Thinking is the most expensive configuration. Switch to Haiku 4.5 for routine coding tasks (73% cheaper on input).

Claude Sonnet 4.6 (Thinking): Avoid ($3 / MTok)
Claude Haiku 4.5: Use This ($0.80 / MTok)
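
The 73% figure follows from the two input prices listed above. A small sketch of the arithmetic, using only those prices and the sample session's input token count; the helper names are hypothetical, and output pricing is deliberately left out because this document does not list it.

```python
def input_savings(old_price: float, new_price: float) -> float:
    """Fractional saving on input cost when moving from old_price to new_price."""
    return (old_price - new_price) / old_price


def input_cost(tokens: int, price_per_mtok: float) -> float:
    """Input cost in dollars for a token count at a per-million-token price."""
    return tokens / 1_000_000 * price_per_mtok


SONNET_INPUT = 3.00  # $ / MTok, as listed in this document
HAIKU_INPUT = 0.80   # $ / MTok, as listed in this document

savings = input_savings(SONNET_INPUT, HAIKU_INPUT)
print(f"input saving: {savings:.0%}")
print(f"150,104 input tokens: ${input_cost(150_104, SONNET_INPUT):.2f} "
      f"vs ${input_cost(150_104, HAIKU_INPUT):.2f}")
```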

Lever 2: Stop Using 1M Context

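You have no use for a 1M context window on files of 18–50 KB, which come to only a few thousand tokens each (roughly 5k–13k at about 4 characters per token). Switch to the standard context window variant to avoid paying the premium.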
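
To see how far these files are from needing a large window, a rough size-to-token estimate helps. This sketch assumes about 4 bytes per token, which is a common rule of thumb and not a real tokenizer count, and assumes a 200,000-token standard window; both values are assumptions and should be replaced with the figures for the model actually in use.

```python
BYTES_PER_TOKEN = 4        # assumption: rough heuristic, not a tokenizer count
STANDARD_WINDOW = 200_000  # assumption: standard context window size in tokens


def estimate_tokens(size_kb: float, bytes_per_token: float = BYTES_PER_TOKEN) -> int:
    """Estimate the token count of a text file from its size in KB."""
    return round(size_kb * 1024 / bytes_per_token)


for kb in (18, 50):
    tokens = estimate_tokens(kb)
    print(f"{kb} KB ~ {tokens:,} tokens "
          f"({tokens / STANDARD_WINDOW:.1%} of the assumed standard window)")
```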

Lever 5: Mid-Session Switching

In Zed, you can switch models per message. Start complex planning with Sonnet, switch to Haiku for generation, and return to Sonnet only if Haiku gets stuck.

Lever 3: Reduce Context Re-Sending

Giant context blocks (15–25k tokens) attached at session start are re-sent with every message, so a 20-message session pays for the same block 20 times.

  • Use `CLAUDE.md` files instead of attaching entire session thread summaries. They are dense, structured, and small.
  • Start a fresh session when switching between major tasks to flush the context accumulated from reading large files.
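
The overhead of a re-sent block grows linearly with session length. The sketch below compares a 20k-token attached block with a hypothetical 2k-token `CLAUDE.md` over 20 messages. The 2k size and the 20-message session are illustrative assumptions, and the model ignores any prompt caching, which would lower the billed cost of a repeated prefix.

```python
def resend_overhead(block_tokens: int, messages: int) -> int:
    """Input tokens spent re-sending a fixed context block with every message."""
    return block_tokens * messages


MESSAGES = 20             # assumption: illustrative session length
BIG_BLOCK = 20_000        # mid-range of the 15-25k blocks described above
SMALL_CLAUDE_MD = 2_000   # assumption: hypothetical size of a compact CLAUDE.md

big = resend_overhead(BIG_BLOCK, MESSAGES)
small = resend_overhead(SMALL_CLAUDE_MD, MESSAGES)
print(f"attached block: {big:,} tokens over {MESSAGES} messages")
print(f"CLAUDE.md:      {small:,} tokens over {MESSAGES} messages")
print(f"reduction:      {1 - small / big:.0%}")
```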

Lever 4: Right Tool For Task

Reading files, small edits: Haiku 4.5
Writing HTML/JS extensions: Haiku 4.5
Writing session logs / docs: Haiku 4.5
Architecture decisions: Sonnet 4.5
Complex multi-file planning: Sonnet 4.5
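
The task-to-model mapping above can be kept as a simple lookup. This is only a sketch: `pick_model` is a hypothetical helper, the values are the display names used in this document and not API model identifiers, and defaulting unknown tasks to the cheaper model is an assumption.

```python
# Task categories and model names copied from the mapping above.
ROUTING = {
    "reading files, small edits": "Haiku 4.5",
    "writing html/js extensions": "Haiku 4.5",
    "writing session logs / docs": "Haiku 4.5",
    "architecture decisions": "Sonnet 4.5",
    "complex multi-file planning": "Sonnet 4.5",
}


def pick_model(task: str, default: str = "Haiku 4.5") -> str:
    """Return the model for a task category; unknown tasks get the default."""
    return ROUTING.get(task.strip().lower(), default)


print(pick_model("Architecture decisions"))
print(pick_model("Reading files, small edits"))
```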

Projected Impact

Before: ~$15–25 / mo
After: ~$3–6 / mo

That is a 70–80% reduction in operating costs, without compromising output quality or execution speed.