Managing context

June 13, 2026 AI

A terminal Claude Code session showing the output of the /session command

This is the third post in a series on the AI engineering toolkit. The first one moved you into the terminal, and the second taught the agent your project’s conventions. Both of those write something down for the agent to read. This post is about the place it does the reading: the context window, which is the model’s working memory for a single turn and the one resource you spend every time you say anything to it.

The model has no memory of its own, so the agent rebuilds everything it knows from scratch every turn, and the window that holds all of it is finite. Fill it carelessly and the answers degrade.

Long sessions go dumb

You have probably had the session that starts brilliant and ends senile. The first hour it reads your code and fixes the failing test. Three hours in it reintroduces a bug you watched it fix and asks again for a file it has already read twice. The model has not changed and neither has the project.

What changed is how much the agent is dragging around. Early on the window was nearly empty. By hour three it is full, and a full window causes trouble of its own even when the model behind it is fine. To see why, look at what the agent actually sends the model on each turn.

Rebuilt every turn

The model remembers nothing between turns. Each time you press enter, the agent assembles a fresh package and sends the whole thing: the system prompt, your AGENTS.md, the conversation so far, and the files and command output the agent has pulled in along the way. Your new one-line question comes at the end of all of it. The model answers, forgets everything, and the next turn rebuilds the package from scratch with your latest message added to the pile.

This is why the instruction file from the last post is not free. Every line of it is re-sent on every turn, sharing space with the actual conversation. It also explains the resume feature both earlier posts mention. Resuming is the agent loading an earlier package back up, cruft and all, which is handy right up until that baggage is what slows you down.

How much fits

So how big is this package allowed to get? It is measured in tokens, the model’s unit of text, each one worth roughly three quarters of a word. The ceiling is the context window, and as of June 2026 it is large. Anthropic’s Claude models run up to a million tokens on the Opus and Sonnet tiers, with the lighter Haiku at 200,000. OpenAI’s flagship GPT-5.5 runs just past a million, while its coding-tuned Codex line is smaller, with GPT-5.2-Codex at 400,000. OpenCode inherits whatever the model you connected allows, since it ships with no model of its own.

A million tokens sounds like all the room you could want. Treat it as a budget anyway. The biggest line items are rarely your own words. They are the things the agent fetches on your behalf, a fat grep across the whole repo or a thousand lines of test output from one failed run. Those swamp the window fast. A crowded window is where the agent starts making those careless mistakes again.

Why dumping everything backfires

Overfilling the window costs you more than tokens. It also leaves the model worse at using what is already in there, and two findings explain why.

Lost in the middle came first. A 2023 study by Liu et al. found that models recall information best at the very start or end of the input and worst when it is buried in the middle, even the ones built for long contexts. Drop the fact that matters into the middle of a huge paste and the model may miss it.

Chroma’s 2025 context rot report is blunter still. It ran eighteen models, the frontier ones included, and watched accuracy slide as the input grew, long before the window was anywhere near full. The slide was uneven and depended on the task, but every model showed it. More tokens quietly cost you accuracy.

So a long paste hurts twice. The fact you need gets buried in that weak middle, and the length alone drags the answer quality down a notch. It’s better to stop treating the window as a dumping ground and get choosier about what goes in.

Reading the gauge

The window is invisible by default, which is half the reason it sneaks up on you. Every one of these tools will show it to you, in its own way. OpenCode keeps a context percentage on screen the whole session, so you watch the number climb as you work. Claude Code’s /context prints a breakdown of what is eating the window, with /usage for the running token total. Codex folds the same figure into /status.

Glance at it the way you glance at a fuel gauge. When one fat grep or a verbose test run jumps the number, that is your cue that the next several turns will haul that weight around whether they need it or not.

Let the agent fetch

The agent already reads files for you. You saw it in the first post, where it opened the failing test and traced it back to the function on its own, with nothing pasted in. So name the file, or the function inside it, and let the agent pull in only what it needs. Naming a path costs a handful of tokens. Pasting the whole file in spends every line it holds, including the ones the agent would have skipped.

Picture it in tokens. Asking the agent to read src/auth/session.py and walk you through the token-refresh path pulls in that one file. Pasting the module yourself, plus the two neighbouring files you grabbed in case they mattered, can burn several thousand tokens before the agent has lifted a finger, most of it on files that turn out to be beside the point. The agent judges which lines it needs better than you do, because it can go and look first.

All three give you a shorthand for it. Type @ in the prompt and a fuzzy file finder pops up, so @src/auth/session.py drops that path into your message and the agent reads it when it runs. OpenCode, and Claude Code through its editor extension, let you go finer and name a line range, as in @session.py#40-60, so you hand over the refresh function and not the three hundred lines around it. Codex keeps its @ to whole files.

The same habit reaches past your repo. Point the agent at a URL and it can fetch the page for you, which saves pasting a wall of documentation out of a browser tab. Claude Code and OpenCode both read a link you give them, Claude through a WebFetch that asks the first time it reaches a new domain and OpenCode through a built-in webfetch tool. Codex is the exception, and it follows from the sandbox the first post described: network access is off by default, so a pasted link goes nowhere until you enable it, and even its web search serves a cached index until you ask for live results with --search.

Pasting still wins in a narrow set of cases. A snippet from a repo the agent cannot open, or a stack trace from a run it did not do itself, has to be handed over because the agent has no way to go and find it. When what you want it to see lives somewhere it can reach, point at it. Otherwise, paste. The default, though, should be pointing, because that is the version that keeps the window lean.

Clearing and compacting

Eventually a session fills up no matter how disciplined you are. You have two ways to take the window back, and they do different jobs.

Clearing wipes it. You drop the whole conversation and start clean, keeping your instruction file and nothing else. It is the right move at a task boundary, when the next thing you do has nothing to do with the last. Stale context from a finished job is pure cost on the new one, paid every turn.

Compacting is the gentler option. The agent writes itself a short summary of everything so far and carries on from that, so the decisions survive while the bulk goes. Reach for it mid-task, when you are deep in something, do not want to lose the thread, and the window is starting to crowd.

All three give you both. Claude Code and Codex use the obvious names, /clear and /compact, and Claude lets you aim the summary with a focus, like /compact focus on the test output (Claude, Codex). OpenCode calls the fresh start /new and the summary /compact, with /clear and /summarize as aliases and a /sessions list for hopping back to an earlier thread (OpenCode).

All three will also compact for you. As the conversation nears the window’s edge each one auto-compacts, summarising in the background so a long task does not just slam into a wall. OpenCode lets you turn its auto-compaction off in config if you would rather drive by hand. The automatic save is convenient and occasionally infuriating: the agent decides on its own which half of the conversation you can live without, and every so often it bins the one line you needed. That is the reason to know the manual controls even when the automatic one usually has you covered.

A note on caching

Keeping the front of your context stable (the system prompt and instruction file that lead every turn) pays off in money and speed, even though the machinery mostly runs out of sight. Models can cache that stable prefix, so when the front arrives unchanged it gets served from a cache instead of reprocessed, and a cached read is far cheaper. Anthropic charges about a tenth of the normal input price for cached tokens, per its prompt-caching docs, and OpenAI caches automatically for any prompt past about a thousand tokens.

Prefix is the operative word. The cache builds from the front and stops at the first thing that changed, so editing or reordering anything near the start forces a full reprocess of everything after it. You never touch this directly, the agent does, so the only thing to remember is to avoid reshuffling the front of your context for no reason. It is the same point as last post’s, that a stable instruction file is a cheaper one to keep. The full economics of caching are a later post.

Resume without the cruft

Resuming a session is the lever people misuse most often. Picking yesterday’s thread back up brings its full window along with it, summary or not. Sometimes that is exactly what you want. Often the cleaner move is to start fresh and let your instruction file carry the project knowledge across, because that is the part worth keeping and the rest was scaffolding for a task you already finished. Clearing, compacting, and resuming are three answers to one question, which is what to carry forward, and the right choice depends on whether your next task builds on the one you just finished.

Conclusions

Treat the window as working memory you actively tend. Point the agent at files and symbols instead of pasting them, so the signal does not drown in a wall of text. Clear at task boundaries and compact when a long task starts to crowd. Keep the front of your context stable so the cache keeps paying out, and let the automatic compaction catch what you miss, knowing the manual controls for the times it summarises away something you needed.

Do this and the agent holds up through a long working session, and you stop paying full freight for context you never use. This was only the mechanics, what the window is and which lever does what. Knowing what to pull into the window and what to leave out of it is a discipline of its own, and it gets a full post further along in the series.

All of this leaned on the agent being able to go and fetch what it needs, reading a file the moment you name it. That habit is worth a closer look on its own. The tools the agent already ships with, before you bolt on a single one, are the next post.