Published: 2026-05-21

The Prompt Caching Habit That Doubles Your Claude Code Session Limit

Nate Herk breaks down how Claude Code's prompt caching system works — cached tokens cost only 10% of normal input, but the cache expires after 1 hour of inactivity on a subscription plan. Three habits cover 95% of users: avoid pausing sessions longer than an hour, start fresh when switching tasks using /compact or /clear, and store large documents in Projects rather than pasting them into chat. A free session handoff skill bridges the gap by summarizing work-in-progress before clearing, so nothing is lost.

Source video

"The One Habit That Doubles Your Claude Code Session Limit" by Nate HerkWatch on YouTube →

Key Takeaways

  • Cached tokens cost 10% of normal input tokens — a 91M-token cached session effectively bills as only ~9M tokens processed.
  • Claude Code subscription cache TTL is 1 hour; raw API access defaults to 5 minutes (extendable at extra cost); sub-agents always get 5 minutes regardless of plan.
  • Switching models mid-session (e.g., via /model) breaks the cache entirely — every prior token gets re-read as fresh input on the next turn.
  • "model opus plan" mode (Opus for planning, Sonnet for execution) also breaks cache because the model prefix changes — keep one model per session.
  • Session handoff skill: ask Claude to summarize the session, copy the output, run /clear, paste the summary into a new session — you pick up exactly where you left off without context loss.

Commands & Code Mentioned

/compact
/clear
/model
/session-handoff
cache_control: { type: "ephemeral" }  (API prompt caching marker)