The Prompt Caching Habit That Doubles Your Claude Code Session Limit
Nate Herk breaks down how Claude Code's prompt caching system works — cached tokens cost only 10% of normal input, but the cache expires after 1 hour of inactivity on a subscription plan. Three habits cover 95% of users: avoid pausing sessions longer than an hour, start fresh when switching tasks using /compact or /clear, and store large documents in Projects rather than pasting them into chat. A free session handoff skill bridges the gap by summarizing work-in-progress before clearing, so nothing is lost.
"The One Habit That Doubles Your Claude Code Session Limit" by Nate Herk — Watch on YouTube →
Key Takeaways
- Cached tokens cost 10% of normal input tokens — a 91M-token cached session effectively bills as only ~9M tokens processed.
- Claude Code subscription cache TTL is 1 hour; raw API access defaults to 5 minutes (extendable at extra cost); sub-agents always get 5 minutes regardless of plan.
- Switching models mid-session (e.g., via
/model) breaks the cache entirely — every prior token gets re-read as fresh input on the next turn. - "model opus plan" mode (Opus for planning, Sonnet for execution) also breaks cache because the model prefix changes — keep one model per session.
- Session handoff skill: ask Claude to summarize the session, copy the output, run
/clear, paste the summary into a new session — you pick up exactly where you left off without context loss.
Commands & Code Mentioned
/compact
/clear
/model
/session-handoff
cache_control: { type: "ephemeral" } (API prompt caching marker)