Plan.md

Ode to the software engineer

The plan is what separates spec-driven development from vibe coding. It is the most important lever for the long-term health of your codebase, and the place where the role of a modern AI engineer actually shows up: at the intersection of architecture, foresight, and decision making.

If spec.md captures what the product needs, plan.md captures the engineering judgment behind how it gets built.

If you outsource all of your technical decision making to the AI, you are operating as a technician - executing known solutions. Engineers reason from first principles to choose which trade-offs are worth making. The plan stage is where you earn the title.

If engineers are so valuable, why do we use AI-generated code at all?

punch cards -> hexedit -> assembly -> compilers -> preprocessors -> garbage collection -> standard libraries -> linters -> the latest malware flavor on npm -> AI tools

Right? These are all just tools. They enhance our productivity. We get to focus on different things, not “should this variable be in a register or on the stack”. AI tooling feels very seductive because it can (kind of) do a lot of things. Kind of...

AI is finally good at writing code. It is poor at making trade-offs. Asked to fix a bug, it throws in some global state. Asked to satisfy the requirements, it writes the world’s narrowest unit test. The small picture is where AI looks competent. The big picture is where you decide whether the codebase you’re left with in six months is one you’d want to work in.

The Developer’s intuition

In the before-times, engineering decisions largely happened out of sight. A lot of the time, they happened through writing code. Does this sound familiar? Hmm, I need to implement some widget. Let me scaffold the code. Oh, I need this data source from over here. I can connect it via the shared bus. But wait - the data flow won’t work properly when updates come from the widget back to the bus.

Used well, AI can help speed up this prototyping and discovery phase. It doesn’t need to make decisions; it needs to uncover what the challenges are and what potential solutions might exist.

Different factors will guide what types of decisions you’re looking to make. How important is the feature, or the codebase for that matter? Do you have pre-established patterns that you are trying to follow, or is this a new type of behavior? What’s already contributing to complexity or brittleness? Is this change an opportunity to address those, or will time pressure make them worse?

These factors combine into what I’ll call the Developer’s intuition. It draws from your overall experience, your knowledge of prior features, how the company works, and the obstacles you’ve already identified, and gives you guidance for how to proceed.

You tap into your developer’s intuition every time you touch a keyboard (or a dry-erase marker, or a hallway conversation), regardless of whether you’re using AI, spec-driven development, or any other tool.

Your job is to capture the essence of this intuition for your AI tools in this odd, AI-friendly spec format. This frame of reference is really important, because the /plan stage produces large amounts of text. Without this framing, you’ll focus on the wrong parts, and it will feel like a mountain of pseudo-code to wade through.

What’s in the plan, anyways?

The purpose of the planning stage is to narrow the scope of possibilities for how to accomplish the goals set out by spec.md. By focusing on the highest leverage decision points, a plan author (with their AI doc assistant) can most directly influence the direction and shape of the implementation.

Some specific points of high leverage are built into speckit’s template as a starting point: the data model, research, and API contract. We’ll look at each of these in turn to see why it is high leverage, and how to set a direction that is both precise and terse.

Once the high-leverage direction has been established, the plan may include more implementation details needed to ensure outcomes consistent with the goals. These secondary, detailed instructions are usually AI-generated. That means you can review them as expressions of intent rather than giving them a line-by-line final code review.

Finally, plan documents also let you add engineering-rooted requirements for a successful solution, drawing on your developer’s intuition about which criteria matter. This is a double-edged sword: it’s now easier than ever to add helper functions, abstractions, protocols, and other techniques of data and software modeling. AI-generated code lowers the old barriers of refactoring and maintenance cost. That’s a great thing, but it also removes the natural caution against making wholesale changes without proper justification.

research.md

The research file is a great piece of upward feedback from your AI harness to you. It tells you: “These are the things the AI is unclear about.” The AI will still pick a path forward. Even so, this is the clearest point for oversight, refinement, and prototyping (if need be).

For example, the research may call out which dependency to use, or whether to build a custom version of some functionality instead. The AI is often wrong here, and the choices you make with your software stack have a compounding impact. The APIs you adopt will influence your future plans. A dependency’s reliability and integration will affect how easy updates are, and so forth.

There’s no “right” way to generate the research, but the final output does matter. If the AI made the wrong choice, you need to correct it. You also need to make sure the research captures the priorities behind the decision. Don’t just say “Go with dependency B” - explain why. Then ask the AI to rethink the decision, along with its related implications. Make sure the changes are reflected in all of the outputs, not just the research.md file. If this after-the-fact change is too large, consider running git stash and starting over, providing the guidance and justification up front.

On the other hand, the research may waste its time on a dead end, an obvious point, or an unrelated detail. That’s an opportunity for quick copy-editing. The context window matters, not just in the number of tokens, but also in how much attention each part of it gets. Feel free to be snippy with your AI assistant: tell it that certain things are obvious and not worth researching. You can simply mark something as a requirement or an obvious decision. Keep your robots focused on what matters.

data-model.md

The data model is perhaps the most pragmatic way to improve the quality of AI-generated output. It allows very targeted feedback: create this variable, update this data type, plumb this value over here. You don’t have to write the full cookbook recipe for how to actually wire up and use the data being generated.

In a sense, data-model.md becomes your “check your intern’s output early and often” page. AI will often over-fixate on getting a specific feature working, or on the most expedient way to “fix” a bug, without considering the proper system architecture. I have repeatedly told the AI to remove variables and instead compute them from a single source of truth. While small, these changes add up over time to produce software that remains understandable and maintainable.
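Here is a minimal Python sketch of the “remove variables, compute them instead” feedback. The Cart and LineItem names are hypothetical, invented for illustration. The list of line items is the single source of truth. The item count and total are derived on read, not stored as separate fields that every mutation would have to keep in sync.

```python
from dataclasses import dataclass, field


@dataclass
class LineItem:
    name: str
    unit_price_cents: int
    quantity: int


@dataclass
class Cart:
    # The single source of truth: nothing else about the cart is stored.
    items: list[LineItem] = field(default_factory=list)

    # Derived values are computed on read, so they can never drift out of
    # sync with `items` the way a separately stored total could.
    @property
    def item_count(self) -> int:
        return sum(item.quantity for item in self.items)

    @property
    def total_cents(self) -> int:
        return sum(item.unit_price_cents * item.quantity for item in self.items)
```

Because nothing is cached, appending to cart.items updates both derived values with no extra bookkeeping. If the computation later proves too slow, adding a cache becomes an explicit, reviewable decision.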

As with interns and junior developers, repeated incorrect data access can also point to structural changes that would improve the default pattern (the golden path) of the codebase. For example, consider a store abstraction with pure selectors and an orderly “change data, then run effects” lifecycle. For a project considered in a vacuum, it might be overkill. However, if it produces standardized patterns that are easy to get right and difficult to get wrong, the extra abstraction layer may be worth it. This kind of consolidation also creates single points to pay close attention to, such as new data being added to the store. Of course, the usual drawbacks of global, connected state and extra abstraction layers still apply. This again is why we need adults driving the ship :)
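If you haven’t used this pattern, here is a rough Python sketch of such a store. Everything in it (Store, update, subscribe, select_unread_count) is hypothetical and deliberately tiny. All writes go through one update method. Effects run only after the new state is fully in place. Selectors are plain functions of state. It is not thread-safe, and it does not guard against an effect calling update re-entrantly.

```python
from typing import Any, Callable

State = dict[str, Any]
Effect = Callable[[State], None]


class Store:
    """Hypothetical minimal store with a "change data, then run effects" lifecycle."""

    def __init__(self, initial: State) -> None:
        self._state: State = dict(initial)
        self._effects: list[Effect] = []

    @property
    def state(self) -> State:
        # Shallow copy: callers can't reassign top-level keys behind the store's back.
        return dict(self._state)

    def subscribe(self, effect: Effect) -> None:
        self._effects.append(effect)

    def update(self, changes: State) -> None:
        # Phase 1: change data. The new state is built in full before anyone sees it.
        self._state = {**self._state, **changes}
        # Phase 2: run effects, all against the same settled snapshot.
        snapshot = self.state
        for effect in self._effects:
            effect(snapshot)


# Pure selector: derives a value from state, with no side effects.
def select_unread_count(state: State) -> int:
    return sum(1 for message in state["messages"] if not message["read"])
```

The single update method is the “single point to pay close attention to”. Any new data entering the store passes through it, which makes it a natural place for review.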

So, sometimes you will really care about the data model, because it’s the physical manifestation of the architecture and patterns you’re trying to build towards. Other times you won’t care as much, as long as the chosen solution is sane. Either way, the data model is where the earliest warning signs of low-quality AI-generated code show up.

contracts/api.md

“API” is a bit of a misnomer here. It could mean HTTP endpoints, but it could also mean function signatures, class definitions, and so on. The usual aspects of API design come into play, such as how easy the functions are to call and how easy they are to compose. It’s a place to think beyond the immediate task and piece together a foundation that makes future changes easy. As with the data model, the question becomes how much effort is worth spending now vs. refactoring at a later date.

Beyond ease of use, another important part of the API contract is error handling and other non-optimal codepaths. This is where a broad notion of “API” helps. It’s not just about what the caller sees, but about how your system increases resilience and observability through reusable patterns. For example, your codebase might use a lot of exceptions. If so, do the exceptions have structure and hierarchy? Where are they supposed to be caught and handled? How does resource cleanup happen on the error path? These may feel like implementation details, but they can be specified well at the contract level. Consider this sentence: “When the file cannot be read, a FileNotFound error is thrown, the database entry is never committed, and the exception is processed at the MainRouting layer.” It packs a lot of guidance into the API layer without being prescriptive about code.
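Here is one way that example contract might look in code, as a Python sketch. The exception hierarchy (AppError, SourceFileMissing) and import_document are hypothetical. main_routing is a hypothetical stand-in for the MainRouting layer. Python’s built-in FileNotFoundError plays the role of the FileNotFound error. The sqlite3 connection’s context manager provides the “never committed” guarantee, since it rolls back when an exception escapes.

```python
import sqlite3


class AppError(Exception):
    """Root of a hypothetical application exception hierarchy."""


class SourceFileMissing(AppError):
    """Contract-level error: the import source file could not be read."""


def import_document(conn: sqlite3.Connection, path: str) -> None:
    # `with conn:` commits if the block succeeds and rolls back if an exception
    # escapes, so the row inserted below is never committed on the error path.
    with conn:
        conn.execute("INSERT INTO documents (path, body) VALUES (?, NULL)", (path,))
        try:
            with open(path, encoding="utf-8") as source:
                body = source.read()
        except FileNotFoundError as exc:
            # Translate the low-level error into the contract's error type.
            raise SourceFileMissing(path) from exc
        conn.execute("UPDATE documents SET body = ? WHERE path = ?", (body, path))


def main_routing(conn: sqlite3.Connection, path: str) -> str:
    # The one layer that catches and handles errors from the hierarchy.
    try:
        import_document(conn, path)
    except SourceFileMissing as exc:
        return f"missing file: {exc}"
    return "ok"
```

The inner with block handles the file handle’s cleanup, and the database transaction handles the partial write. Only main_routing needs to know that SourceFileMissing exists, which is exactly the split the one-sentence contract describes.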

How to create a good plan

A plan only starts when we understand what is important. What does your prior experience in the codebase tell you? Where is the project headed? For the technical areas, which parts of the code are already amenable to the changes you want to make, and which parts are more difficult? SDD encourages iterative exploration as well. Code generation is quick, and you can use the output to revise your specifications. Try asking the AI to make a subset of changes to see where it gets stuck. Ask it to analyze the data flow and try to get from A to B without implementing the rest of the changes. Try creating a sample integration or unit test to verify the behavior you expect.

Develop your intuition.

From there, develop your own set of technical requirements. As someone who is going to have to live with the results (and make further changes later), what are must-haves for you? A user will never see the abstraction layers of the code, but the engineering team will feel them, so these requirements are for you.

Then, pass these requirements into the /plan prompt:

/plan The API must be idempotent, because the frontend may make interwoven calls.

If there’s a lot of guidance, I suggest writing your thoughts down in a temporary .md file. Then reference that file in your prompt, with something like:

/plan Follow all of my requirements in @plan_input.md
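To show what an idempotency requirement might turn into, here is a Python sketch of one common approach, the idempotency key. The client sends a key with each request. A repeated call with the same key returns the original result instead of repeating the side effect. PaymentService and create_charge are hypothetical names. This in-memory version assumes a single caller at a time. A real service facing interwoven calls would need the check-and-store step to be atomic, for example by backing it with a unique constraint in the database.

```python
import uuid


class PaymentService:
    """Hypothetical service whose create_charge call is idempotent per key."""

    def __init__(self) -> None:
        self._results_by_key: dict[str, str] = {}
        self.charges: list[int] = []  # side effects actually performed

    def create_charge(self, idempotency_key: str, amount_cents: int) -> str:
        # A retried or duplicated call with a key we've already seen returns
        # the original result instead of charging a second time.
        existing = self._results_by_key.get(idempotency_key)
        if existing is not None:
            return existing
        charge_id = f"ch_{uuid.uuid4().hex[:12]}"
        self.charges.append(amount_cents)
        self._results_by_key[idempotency_key] = charge_id
        return charge_id
```

Calling create_charge("order-42", 500) twice returns the same charge ID and records only one charge.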

If you have remaining questions, you can also ask the AI to follow up with you interactively.

I am not sure whether OO inheritance is correct for these properties, or whether we should use composable helper functions. Ask me probing questions about the requirements and help me come up with the best solution.

Afterwards, the AI will spawn some subagents, do the research, and output the plan documents. Now what? I suggest going straight for the research.md file. Look over what decision points were considered, and which choices were made. In the same conversation, revise.

You correctly identified that choosing between the realtime and batch APIs was a key decision point. However, we need to use the batch API, because we are going to use Kafka in the future to tie the codebase to user events. Add batching support as a top-level technical requirement to the plan, then redo the research with this new information. Finally, make sure to update all related planning documents with any changes found after the research completes.

If the AI invents a decision point when there is only one realistic way forward, tell it so:

There’s no realistic argument for forking the Redis codebase and standing up our own version, because the maintenance costs are too high. Add a technical requirement that code dependencies must be maintainable, so we can keep up with security updates and other downstream requirements. Then, re-evaluate the research. Are there any other pragmatic solutions to the data syncing problem? If not, we don’t need to waste time considering which database to use. Just mark Postgres as the path forward in the technical requirements, without researching alternatives.

Sometimes, the research will miss something you were worried about. That’s not always a bad thing. In the name of efficiency, we don’t want to put everything into the planning prompt, since that also anchors the LLM on certain topics. Instead, you can follow up after generation with more prompts:

I don’t see any research about whether we should continue to use the requests library given our new requirements. What would be the added cost of switching to an async library, and is it worth the cost of doing that now? You MUST consider this question objectively and not bias towards pleasing me

The sycophancy is real, y’all. The good news is that the weights know this, and various prompting techniques can help counteract it.

Once you are happy with the research, move on to the data-model.md file. This file is where modern code review happens. Ask yourself:

If I were implementing this myself, would I find this approach reasonable?

The goal is not necessarily to coax the AI into producing code identical to what you would write. It is to decide whether the proposal covers all the important pieces you care about, using a sound pattern. Here’s a quick rubric:

  1. Is there a single source of truth for the data?
  2. Is the data being stored at the right layer?
  3. Do the data changes properly encapsulate the type of information I need the system to hold onto?
  4. Should the code be following existing patterns, or do we need to invent a new mechanism for the task?

Iterate with your AI as needed. As it makes changes, make sure it updates not only the data model but also all other relevant files. Once the overall direction is correct, do a copy-editing pass on the file:

  1. Is it redundant with other documentation? (API definitions belong in contracts/api.md.)
  2. Does it have full code snippets? Bad. Tell it to revise them to demonstrate only the new capabilities.
  3. Is the copy otherwise over-specified? We are trying to describe the overall shape, not to dictate every single change that needs to be made in document form.

Finally, move on to contracts/api.md. The pattern is very similar to the data-model, except the questions here are about calling and using the data, rather than the data itself. Probe it as if it were a design review meeting. Are edge cases considered? How about errors? Are the abstraction layers correct?

Once you’ve completed your revisions to all of the files, you can finish up with a /speckit.analyze command to uncover any gaps that have formed during plan generation.

One more thing: pair-programming the planning stage works really well. When two engineers co-author a plan, they’re also co-reviewing it as they go, which sidesteps the harder problem of dropping into someone else’s planning doc cold. If you have the option, grab a partner.

Rubric for evaluating planning documents

Code reviewing someone else’s plan has a different feel from authoring one. It’s harder to get into the mindset of parsing all of the document lines, especially without sitting in the space for a while. When it comes to formal review, the trick is to be efficient: look for gaps and tunnel vision, and lean on a stance that gives you something to react to. The rubrics below give each artifact a perspective to read it from, then a set of questions to answer.

research.md

Read it as: upward feedback from your AI assistant about what it was unsure about

Questions to ask:

  • Does each decision point reflect a real fork in the road, or did the AI invent alternatives just to perform consideration?
  • For each chosen path, is the rationale tied to your codebase’s specific constraints, or is it generic best-practice talk?
  • Are there decisions you would have wanted considered that aren’t here?
  • For dependencies and libraries chosen: do you trust the maintenance burden, the licensing, the integration cost?
  • Is anything here prescribing implementation details that belong in contracts/api.md instead?
  • Is the research wasting context on dead-ends, obvious points, or unrelated details that should be trimmed?

data-model.md

Read it as: a junior engineer’s first-draft architectural proposal

Questions to ask:

  • Is there a single source of truth for each piece of data?
  • Is the data being stored at the right layer of the system?
  • Do the proposed types capture the information the system needs to hold onto - no more, no less?
  • Should this follow an existing pattern in the codebase, or does the feature genuinely warrant a new mechanism?
  • Are full code snippets bleeding in here? They belong in code, not in a plan - push them out.
  • Is anything here specifying API behavior that belongs in contracts/api.md?
  • If you were implementing this yourself, would you find this approach reasonable?

contracts/api.md

Read it as: a design-review meeting for a public interface

Questions to ask:

  • Are the calls easy to use correctly and hard to use incorrectly?
  • How does this compose with existing APIs in the codebase?
  • Are error paths and non-optimal codepaths specified, or only the happy path?
  • For exceptions and errors: do they have structure, hierarchy, and a clear handling location?
  • Is resource cleanup on the error path described?
  • What does this API force future code to look like? Are you okay with that?
  • Is the contract leaking internal data-model details that belong in data-model.md?