Master Claude Code's Context Window: Avoid the Performance Cliff
A 1M-token context window sounds enormous, but the quality of the model’s output degrades long before you reach the limit.
By around 60% capacity, the model starts repeating itself and forgetting earlier constraints. Instructions get skipped. You find yourself re-explaining decisions you thought were settled.
Managing context keeps your reasoning sharp and your sessions productive. Here’s how to approach it.
Where Tokens Go
Several things draw on your 1M window, and some of them are loaded before you type your first message:
- System overhead: ~33K tokens (Claude Code internals)
- MCP tool definitions: Each connected server (Playwright, database client) loads its tool definitions on every request
- CLAUDE.md: Project instructions are included on every message
- Memory files: Your persistent notes and summaries
- Conversation history: Each message, including code blocks and outputs, adds up
- Code context: Reading the codebase consumes tokens based on file size
In a 4-hour session, it’s easy to use 40-50% of your window without consciously spending tokens on anything expensive. The overhead is real but hard to track.
Monitor Context With /context
Before you can optimize, you need to see what’s happening.
Type /context in Claude Code to get a breakdown (see the /context command docs):
system overhead: 33K
MCP servers: 12K (Playwright, postgres-client)
CLAUDE.md: 18K
memory files: 8K
conversation history: 142K
free space: 787K
total usage: 213K (21% of 1M window)
Run this every 30 minutes during a long session. When you hit 600K tokens used (60% capacity), that’s your signal to intervene.
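To see how the numbers in a breakdown like this add up, here is a minimal Python sketch. It assumes the `name: 12K` text format of the sample above (the real /context output may be formatted differently), and `parse_breakdown`, `summarize`, and `INTERVENE_AT` are hypothetical names for illustration, not part of Claude Code.

```python
import re

WINDOW_TOKENS = 1_000_000   # 1M-token context window
INTERVENE_AT = 600_000      # 60% capacity, the signal to intervene

# Component lines copied from the sample breakdown above (format assumed).
SAMPLE = """\
system overhead: 33K
MCP servers: 12K (Playwright, postgres-client)
CLAUDE.md: 18K
memory files: 8K
conversation history: 142K
"""

def parse_breakdown(text: str) -> dict[str, int]:
    """Turn 'name: 12K ...' lines into a {name: tokens} dict."""
    usage: dict[str, int] = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?):\s*(\d+(?:\.\d+)?)K\b", line)
        if match:
            usage[match.group(1)] = round(float(match.group(2)) * 1000)
    return usage

def summarize(usage: dict[str, int], window: int = WINDOW_TOKENS) -> tuple[int, int, float]:
    """Return (used tokens, free tokens, fraction of the window used)."""
    used = sum(usage.values())
    return used, window - used, used / window

used, free, fraction = summarize(parse_breakdown(SAMPLE))
print(f"used {used:,} tokens, free {free:,}, {fraction:.1%} of window")
if used >= INTERVENE_AT:
    print("Past 600K: compact or clear now.")
```

With the sample figures, the components sum to 213K, leaving 787K free in a 1M window, which matches the roughly 21% total reported above.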
The /context command also shows actionable optimizations:
- Disconnect idle MCP servers: If you’re not using Playwright in this session, run /mcp and disconnect it. Each idle server wastes tokens on tool definitions you’re not calling.
- Trim CLAUDE.md: If your project instructions are eating 20K+ tokens, move task-specific rules into skills. Skills load on demand, only when invoked.
- Clear stale conversation: If you switched from refactoring auth.ts to building a new API endpoint, the old context is noise. Run /clear to start fresh.
Three Checkpoints
Under 40%
Plenty of room. Read freely, ask for multiple approaches, explore directions. Early system prompts and CLAUDE.md still carry full weight.
40-60%
You’re still in a workable range, but degradation is starting. Check /context every 30 minutes. When you hit 50%, run /compact to compress conversation history. This preserves continuity while freeing up 30-50K tokens.
60%+
Output quality noticeably drops. Repeated instructions get skipped. At this point, commit your work, document what you’ve learned in CLAUDE.md or git, then run /clear to start fresh. You lose conversation history but regain a clean, fast window.
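The three checkpoints can be expressed as a small decision function. This is a sketch: `checkpoint` is a hypothetical helper, and the thresholds are the rules of thumb from this section, not limits that Claude Code enforces.

```python
def checkpoint(used_tokens: int, window: int = 1_000_000) -> tuple[str, str]:
    """Map token usage to (checkpoint band, suggested action).

    Thresholds are this article's rules of thumb, not limits enforced by Claude Code.
    """
    fraction = used_tokens / window
    if fraction < 0.40:
        return "under 40%", "read freely and explore"
    if fraction < 0.50:
        return "40-60%", "check /context every 30 minutes"
    if fraction < 0.60:
        return "40-60%", "run /compact to compress conversation history"
    return "60%+", "commit, document decisions, then run /clear"

for used in (213_000, 450_000, 520_000, 640_000):
    band, action = checkpoint(used)
    print(f"{used:>9,} tokens -> {band}: {action}")
```

Splitting the 40-60% band at 50% mirrors the advice to keep monitoring until 50%, then compact.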
Reduce Baseline Context With .claudeignore
Before you optimize token spending, reduce the baseline burn.
Create a .claudeignore file in your project root and list directories you don’t want Claude reading:
node_modules/
dist/
build/
.git/
.next/
venv/
__pycache__/
*.log
*.tmp
coverage/
.env
.env.local
The .claudeignore file works like .gitignore: it tells Claude Code to exclude matching files from search results and deny read operations. A well-configured ignore file can cut per-request context by 40-70% depending on what you exclude.
Most codebases have a lot of files that Claude shouldn’t touch: test fixtures, compiled output, vendored dependencies, generated code, caches. Excluding them saves tokens on every message.
Start with common excludes and add exceptions only if you hit cases where Claude legitimately needs something.
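To preview which paths a pattern list like the one above would hide, the sketch below approximates gitignore-style matching in Python with `fnmatch`. It is a deliberate simplification (no `!` negation, no anchored `/` patterns, no `**`), not Claude Code’s actual matcher; `is_ignored` and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns from the example .claudeignore above.
PATTERNS = [
    "node_modules/", "dist/", "build/", ".git/", ".next/", "venv/",
    "__pycache__/", "*.log", "*.tmp", "coverage/", ".env", ".env.local",
]

def is_ignored(path: str, patterns: list[str] = PATTERNS) -> bool:
    """Simplified gitignore-style check for a file path (no '!', anchoring, or '**')."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: matches a parent directory with this name at any depth.
            name = pattern.rstrip("/")
            if any(fnmatchcase(part, name) for part in parts[:-1]):
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            # Name pattern: matches a file or directory name at any depth.
            return True
    return False

files = [
    "src/auth.ts", "node_modules/react/index.js", "dist/bundle.js",
    "logs/server.log", ".env", "src/config/.env.local", "tests/auth.test.ts",
]
print([f for f in files if not is_ignored(f)])  # ['src/auth.ts', 'tests/auth.test.ts']
```

Running a check like this over your repository’s file list is a cheap way to confirm the patterns exclude build output and secrets without hiding source files Claude legitimately needs.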
Keep CLAUDE.md Focused
Your project instructions are loaded on every message. Keep them essential.
Target: 300-500 lines. Include only:
- Conventions: Language, framework, naming, structure
- Constraints: What breaks the system, permission boundaries
- Links: Where to find key docs (not the docs themselves)
Move specialized knowledge into skills. Skills load on demand: until one is invoked, only its short description sits in context. This keeps your baseline context small while keeping expert knowledge available when needed.
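A quick way to check CLAUDE.md against the 300-500 line target is to count lines and estimate tokens. The sketch assumes roughly 4 characters per token, a common rule of thumb for English text rather than Claude’s real tokenizer, so rely on /context for exact numbers; `claude_md_report` is a hypothetical helper.

```python
from dataclasses import dataclass
from pathlib import Path

TARGET_MAX_LINES = 500  # upper end of the 300-500 line target
CHARS_PER_TOKEN = 4     # rough heuristic for English text, not Claude's tokenizer

@dataclass
class ClaudeMdReport:
    lines: int
    approx_tokens: int
    over_target: bool

def claude_md_report(text: str) -> ClaudeMdReport:
    """Count lines and roughly estimate the tokens a CLAUDE.md adds to every message."""
    line_count = len(text.splitlines())
    return ClaudeMdReport(
        lines=line_count,
        approx_tokens=len(text) // CHARS_PER_TOKEN,
        over_target=line_count > TARGET_MAX_LINES,
    )

if __name__ == "__main__":
    path = Path("CLAUDE.md")
    if path.exists():
        print(claude_md_report(path.read_text(encoding="utf-8")))
```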
/compact vs /clear
/compact summarizes conversation history to free space while maintaining context continuity. Use it when you’re working on the same task for hours and approaching 50% capacity.
/clear wipes everything and starts fresh. Use it when switching to unrelated work. You lose context from the previous task, so document it first (git commit, update CLAUDE.md, etc.).
Quick Wins
Start with these three—they’re high-impact and low-friction:
- Configure .claudeignore: Exclude build artifacts, node_modules, .git, and test fixtures. See .claudeignore patterns for examples. This saves tokens on every request.
- Trim CLAUDE.md: Keep it to 300-500 lines of essentials only. Move specialized knowledge to skills.
- Disconnect idle MCP servers: If you’re not using Playwright in a session, run /mcp to disconnect it. Each idle server wastes tokens on tool definitions you’re not calling.
Example: A Refactoring Session
You’re refactoring an authentication module. Your context fills like this:
- Early phase (20-30% capacity): Load the module, understand the architecture, plan the refactoring. Plenty of room.
- Mid-phase (40-50% capacity): Running /context shows idle MCP servers. Disconnect them. Conversation history is growing.
- Late phase (50-60% capacity): Refactoring is mostly done. Run /compact to free space. Then you need to start a different feature.
- Exit strategy: Commit your work, document decisions in git or CLAUDE.md, run /clear, and start fresh on the new feature.
Without this pattern, you’d be at 70%+ capacity by hour 3, with Claude forgetting earlier architectural decisions and getting confused about why tests are failing.
A 1M-token window feels infinite until it isn’t. By monitoring usage, cutting baseline waste, and knowing when to compact or clear, you keep your sessions sharp. The threshold is real: at 60% capacity, output quality noticeably drops. Stay below that, and you can work productively for hours without Claude forgetting your earlier constraints.
Frequently Asked Questions
At what point does Claude Code performance start degrading?
Output quality degrades noticeably around 60% context capacity, and attention to earlier instructions already starts to weaken at around 40%. Compact around 50% and clear once you pass 60% to maintain peak performance.
How do I check how many tokens I'm using?
Type /context in Claude Code to see a detailed breakdown of token usage across all components—system overhead, MCP tools, memory files, conversation history, and free space.
What's the fastest way to reduce token usage?
Configure a .claudeignore file. A well-configured ignore file can reduce per-request context by 40-70% without changing your workflow—it's the single highest-impact optimization.
Should I keep my CLAUDE.md short or comprehensive?
Keep CLAUDE.md to 300-500 lines of essential instructions. Move specialized or task-specific rules into skills: skills load on demand, only when invoked, which preserves baseline context.
When should I use /clear vs. /compact?
/compact summarizes earlier conversation to free space while keeping context. /clear starts completely fresh. Use /compact for ongoing work; use /clear when switching to unrelated tasks.
Ready to ship faster?
Download spacecake and start building with Claude Code.