Spec-Driven Development Will Collapse

Kapil Viren Ahuja 2026-05-03 Software Development Practices

Kapil Viren Ahuja argues that vibe coding has already collapsed and spec-driven development is collapsing next — both jam three distinct layers (Intent, Spec, Implementation) into one document. The fusion is the Four Crafts (Intent, Spec, Context, Prompt) running on a five-level Substrate Stack, with empirical memory as the prerequisite for intent-driven systems.

Contents

The industry's fight is the wrong fight
The three-layer schematic
What "the spec" is actually hiding
The Four Crafts
The Substrate Stack
The CXO question
Memory is the prerequisite
Three practitioner confessions
What rises from the fusion
Why this connects to the rest of the wiki
Related

Summary: Vibe coding has already collapsed. Spec-driven development is collapsing next. Both fail for the same reason — they treat one document as if it were three. The fusion is a clean separation of Intent, Spec, and Implementation layers, operated through Four Crafts (Intent, Spec, Context, Prompt) — two owned by the human, two owned by the system — and located on a five-level Substrate Stack from Vibe to Dark Factory.

The industry's fight is the wrong fight

For most of 2025–2026 the argument has been spec versus vibe.

Spec-driven development (SDD): AWS Kiro pulled 250,000 developers in three months. GitHub spec-kit crossed 16,000 stars in its first week. ThoughtWorks placed SDD in the "Assess" ring of their Technology Radar and called it 2025's "key new engineering practice." The pitch: AI is non-deterministic — tame it with structure.
Vibe coding: Cursor at $2B ARR, $50B valuation, used by ~70% of the Fortune 1000. Lovable $330M Series B at $6.6B. Replit $10M → $100M ARR in nine months. Karpathy coined the term in February 2025; it won Collins Dictionary's Word of the Year. The pitch: stop writing specs, just talk to the model.

Both confident. Both visibly working for a slice of users. Both wrong in the same way.

Andrej Karpathy himself has now called vibe coding "passé." Martin Fowler called Kiro-style SDD "like using a sledgehammer to crack a nut." ThoughtWorks paired its "Assess" ring placement with a warning: handcrafting detailed rules for AI doesn't scale — the "bitter lesson" returning in fresh clothes.

The waterfall framing misses the real failure. SDD doesn't fail because it's waterfall, and vibe doesn't fail because it's ungoverned. Both fail because neither side understands that what they're doing operates across three different layers jammed into one.

SDD: intent is buried inside the spec, spec is buried inside the spec, implementation is pre-locked inside the spec. Everything is spec, nothing is separable.
Vibe: there is no spec. Intent is implicit in whatever you said last. Implementation is whatever the model generated this minute. No memory, no contract, no audit trail.

Different compressions of the same structural failure: the three-layer collapse.

The three-layer schematic

A user says: "I want red shoes between ₹3,000 and ₹5,000, and they need to last eight hours on my feet without hurting my soles." That single sentence contains three things SDD documents usually fuse and lose:

Intent — authored by the user

What the user wants, under what constraints, with what success and failure conditions. Red shoes. ₹3,000–5,000. Eight hours, no sole pain. "If no red shoe is available, does the system recommend alternatives or fail silently?" — that branch is intent, not spec.

Critically, non-functional requirements belong in the intent, not the spec — they drive architecture decisions the spec never makes. "One million concurrent shoppers" is intent. "Must recover within thirty seconds of an AZ failure" is intent.

Two levels of intent run inside a real system:

Consumer intents — "find me red shoes, eight hours of comfort" — become epics.
Engineering intents — "commit grouped by concern, conventional commits, halt when an issue can't be mapped" — become plays.

Spec — the evaluable contract

Given the intent, what must be testable? Litmus test: can you write a spec that can be converted into an eval that will validate that this has been delivered? If not, it isn't a spec — it's either intent disguised as spec or noise disguised as spec.

A real spec is test-shaped. It passes or fails. It doesn't describe; it verifies.

Implementation — owned by the system

Which architecture delivers it? Microservice or monolith? In-process or distributed cache? This is not in the user's sentence and can't be, because the architecture depends on facts the user never said — scale, organizational history, existing stack, risk tolerance.

The decision has to stay within the system. The user should not put that into the spec.

Push implementation back into the spec and you destroy the separability that makes the whole system evolvable.

What "the spec" is actually hiding

When a team writes SDD's single "spec document," they typically produce something like:

"Build a microservice to search red shoes for users in ₹3,000–5,000 range."

Intent is buried. Spec is gone. Implementation is pre-locked (a microservice — based on what scale assumption?). The document is technically correct and methodologically useless.

The uncomfortable reference point: OpenAI's published Symphony specification runs to roughly 1,400 lines. Every agent role, every handoff, every failure condition. That is what complete enough actually means. Nobody else writes that — not the 250,000 Kiro adopters, not the SDD camps inside large enterprises. If you can write at Symphony-grade fidelity before touching the agent, you are no longer doing SDD — you are at Level 4 of the #substrate-stack, and the spec itself has become the system.

Everyone else is producing "Build a microservice." That gap is where the methodology collapses — and it's the same wall every team adopting Kiro, spec-kit, or Tessl will hit.

The Four Crafts

The vocabulary — Intent, Spec, Context, Prompt — comes from Nate B. Jones's 2026 framing "prompting has split into four skills." The reorganization — which crafts sit with the human, which sit with the system, and how all four map to the three-layer schematic and onto Kapil's P-CAM (Perception, Cognition, Agency, Manifestation) — is Kapil's.

Crafts the human performs

Intent Crafting — expressing what you want with enough structure that a machine can reason about it. Not a one-liner. A complex schema of constraints, successes, and failure conditions. "User wants to buy shoes" is a headline for intent — not intent itself.
Spec Crafting — writing the test. Every claim in the spec must be convertible to an evaluation. Spec crafting demands a tighter contract than prose.

Crafts the system performs

Context Crafting — the decision layer, backed by empirical memory. Given this intent, under these constraints, with this organizational history — which architecture, which patterns, which tools? The knowledge base is the differentiator. Without it, context crafting collapses back to architect guesswork, and architectural decisions get pre-locked at the top of the methodology again.
Prompt Crafting — the execution layer. In Kapil's system this manifests as plays and playbooks — structured, reusable, intent-encoded interaction patterns that give the system determinism. Plays are scaffolding, not destination: in a fully intent-driven model, prompt crafting thins as models mature enough to resolve intent directly. Until then, plays bridge the maturity gap.

The Substrate Stack

Agentic software development is moving through five maturity levels. Each has a distinct technological bet — not just a behavioral difference. Which level a team is at isn't a matter of opinion; it's a matter of what the organization has built underneath it.

Level	Name	Substrate	Status
1	Vibe	Model + IDE plugin (Copilot, Cursor, Claude Code). No memory, no structure, no contract.	Already collapsed as an enterprise methodology. Works solo on small surfaces.
2	Spec-driven	Structured tooling layered on the model (Kiro, spec-kit, Tessl).	The bet of 2025–early 2026. Collapsing now — one document doing three jobs.
3	Intent-driven	Plays + knowledge base + Four Crafts + clean three-layer separation.	Where the next wave sits, including the platform Kapil is building.
4	Autonomous	Guardrails move from individual plays to shared substrate; the system reasons across plays.	Theoretical for most teams today.
5	Dark Factory	Methodology runs itself. Intents in, software out. The precompiled plan inside the play is gone; live resolution replaces it.	Term coined by Dan Shapiro (January 2026). Nobody has lived it yet.

The CXO question

Are you jamming all three layers into one document and calling it "the spec"?

If yes — and your organization has adopted Kiro, spec-kit, Tessl, or an internal equivalent — you are inside the collapse. The ceremony will feel heavy. The specifiers will drown. The builders will revolt and try to vibe past the process. The agents will ignore instructions. The documents will get longer. The outcomes will not improve.

The math runs honestly: an epic that should save 50% in delivery time gives back 30% to drift recovery. Real savings land at 20%, not 50. The missing 30 doesn't feel like inefficiency — it feels like something was taken.

Memory is the prerequisite

Kapil's three-month build illustrates what SDD misses entirely:

"The code I burned three months on? It worked. But when I came back, the spec had drifted — and a drifted spec is worse than no spec, because it lies with confidence. The system had forgotten what it was building, why, and how. It had gone into a kind of dreamstate."

SDD solves for structure at the moment of creation. It has no answer for continuity across time. Reaching Level 4 requires the system to know where it's standing — not just what it was told to build on day one. Memory isn't a feature. It's the prerequisite.

Three practitioner confessions

Kapil writes these openly to model the honesty he says the industry lacks:

My own corpus drifted. The Four Crafts in his internal pitch deck listed "intent, context, spec, model" for months — the fourth craft should have been prompt. Model selection is a sub-decision inside prompt crafting, not a craft. Separation is harder than it sounds — SDD-style drift happens to intent-driven writers too.
We're at Level 2.5, not Level 3. The tool still makes pre-lock decisions at prepare-time that a truly Level 3 system would resolve at implement-time. A conscious compromise, not a failure — a bridge across the maturity gap.
Nobody knows what the dark factory looks like. Every time someone paints a confident picture of Level 5, a reader should be able to point back and say: Kapil admitted this was still theory.

What rises from the fusion

Vibe coding collapsed because it had no contract. Spec-driven development is collapsing because it has three contracts pretending to be one.

What rises isn't a new brand or a better tool. It's a separation of concerns — the oldest principle in software engineering — applied one layer up, to the documents we use to instruct the machines that write the documents.

The methodology is the product now. The discipline is unsexy and doesn't make YouTube videos: split one document into three, give the human the two crafts they own, give the system the two crafts it owns, let empirical memory deepen. "Get it right, or watch your specifiers drown."

Why this connects to the rest of the wiki

It is the next move in the conversation Return of the spec started — David Knott noted that spec-driven development would resurrect upfront design but warned that specs themselves could carry bugs. Kapil names the deeper bug: specs that compress three jobs into one.
It sharpens Design-First Collaboration — design-first emphasized intent over implementation; the three-layer schematic gives that intuition a structural home.
It rounds out the AI-and-development cluster: How AI coding companions will change the way developers work, The Learning Loop and LLMs, The Age of AI has begun, VS Code Custom AI SE Assistant, @ttgcacustom-ai-code-analyzer.
It reframes Encoding Team Standards — the standards become plays in the Prompt Crafting layer, not narrative documents.

Return of the spec — the predecessor argument from David Knott.
Design-First Collaboration — design as the upstream of intent.
How AI coding companions will change the way developers work — vibe-coding from inside AWS.
The Learning Loop and LLMs — why memory matters for any AI-assisted workflow.
Encoding Team Standards — pre-existing standards as candidate plays.
Framework vs Methodology vs Best Practice Key Differences in Architecture — distinguishing methodology from tooling.
VS Code Custom AI SE Assistant and @ttgcacustom-ai-code-analyzer — Noah's own intent-driven AI tooling experiments.
Modular Monolith Instead of Microservices — recommends SDD as the discipline process paired with the modular monolith in the AI-agent era.
Context Engineering — the hub page for Context Crafting and the "memory is the prerequisite" claim, connected to the vault's other context threads.

ai dev-practices architecture

Source: https://medium.com/activated-thinker/spec-driven-development-isnt-broken-it-will...