AI & Development April 17, 2026 9 min read

The AI Cascade Effect: Why One Bad Spec Costs You Forever

A small mistake in your data model used to cost you a week of refactoring. With AI, it costs you every prompt you'll ever write. Meet the token tax — and the single mental model that stops it.

An engineer on a team I advise spent two days building a “simple” notifications feature with Claude Code. Day one: the agent produced a working prototype in about forty minutes. The next day and a half went into fixing it.

The feature stored notifications in a single table with a polymorphic target column — a string field holding a JSON blob that could be a user ID, a restaurant ID, or a reservation ID depending on the notification type. It was a defensible choice at 3pm on Monday. By Wednesday morning it was a nightmare.

Every new notification type required a new conditional branch. Every fix generated two more bugs. The agent kept suggesting helpers to parse the JSON, then helpers for the helpers. By the end, the prompts were 4,000 tokens long just to describe what the schema was supposed to mean. Every iteration got slower, more expensive, and more wrong.

Eventually the engineer did what he should have done on Monday afternoon: he stopped, threw away the code, and rebuilt it in thirty minutes with three separate foreign-key columns and a proper type discriminator.
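The two designs can be sketched side by side. A minimal sketch using SQLite from Python; the table and column names are illustrative, since the article doesn't show the original schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Before: one polymorphic column. What `target` means depends on `type`,
# so every prompt (and every reader) needs the decoder ring.
conn.execute("""
    CREATE TABLE notifications_v1 (
        id     INTEGER PRIMARY KEY,
        type   TEXT NOT NULL,
        target TEXT NOT NULL  -- JSON blob: user id, restaurant id, or reservation id
    )
""")

# After: three explicit foreign-key columns plus a type discriminator.
# The schema explains itself; no per-prompt glossary required.
conn.execute("""
    CREATE TABLE notifications_v2 (
        id             INTEGER PRIMARY KEY,
        type           TEXT NOT NULL
                       CHECK (type IN ('user', 'restaurant', 'reservation')),
        user_id        INTEGER,   -- FK to users(id)
        restaurant_id  INTEGER,   -- FK to restaurants(id)
        reservation_id INTEGER    -- FK to reservations(id)
    )
""")

conn.execute(
    "INSERT INTO notifications_v2 (type, reservation_id) VALUES (?, ?)",
    ("reservation", 42),
)
```

In the second design, a query like `WHERE reservation_id = ?` needs no explanation, and the CHECK constraint rejects any notification type the agent might invent.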

The lesson was not that the engineer was bad at data modeling. The lesson was that AI has a specific failure mode that nobody warned him about: when the foundation is wrong, AI doesn’t slow down. It speeds up — toward the wall.

The Hierarchy of Technical Impact

Not all decisions in a software project carry the same weight. The ones that are easiest to change are the ones everybody argues about. The ones that are impossible to change are the ones nobody looks at.

| Level | What | How Visible | Cost of Error |
|-------|------|-------------|---------------|
| 3 | Aesthetics — colors, copy, spacing | 🟢 High | 🟢 Low |
| 2 | Behavior — flows, states, edge cases | 🟡 Medium | 🟡 Medium |
| 1 | Foundation — entities, business logic, integration contracts | 🔴 Low | 🔴 Critical |

The pattern: what’s easiest to see is cheapest to fix. What’s invisible is expensive to change.

Level 3 is the button color. Level 3 is whether the submit button says “Confirm” or “Submit.” You can A/B test it. You can change it on a Friday afternoon without a ceremony.

Level 2 is behavior. What happens when the network drops during submit? What’s the confirmation flow? Which states can transition into which? Fixing a Level 2 mistake usually means a few PRs and some regression testing.

Level 1 is the foundation. Entities and their lifecycle. Business rules that are actual rules, not UI copy. Contracts between systems. Get these wrong and you’re not fixing bugs — you’re migrating data in production at 2am.

This hierarchy isn’t new. Every experienced engineer knows it. What’s new is what AI does to it.

What Changed With AI

In the old world, a Level 1 mistake had a natural speed limit. Fixing a bad data model took a week. That week was painful, but it was bounded. You did the migration, you updated the ORM, you fixed the queries, you redeployed. The pain ended.

AI removed the natural speed limit. Which sounds good — until you realize it removed it in both directions.

On a clean Level 1 foundation, AI compounds beautifully. Every new feature reuses the entities you already defined. Prompts stay short because the schema is self-explanatory. The agent suggests code that fits because the shape of the codebase is coherent. You get the 10x speed story that everyone talks about at conferences.

On a dirty Level 1 foundation, AI also compounds — in the opposite direction. Each subsequent prompt to fix a Level 1 mistake consumes more context. Explanations become more convoluted (“the target field is a string, but it’s actually a JSON, but if the type is ‘reservation’ then it might be a number, except when…”). The model starts patching patches. What was a small data model mistake becomes a tax you pay on every single future interaction with that part of the codebase.

This is the AI Cascade Effect. And the tax it produces has a name:

The Token Tax

Every ambiguous Level 1 decision becomes a line item in every future prompt.

Consider the notifications example. With the polymorphic target design, every prompt about notifications had to include:

  • A sentence explaining that target is a string.
  • A sentence clarifying that it’s actually serialized JSON.
  • A table mapping notification types to expected JSON shapes.
  • A warning about the type coercion in the legacy migration.
  • A reminder that the UI expects specific error messages when parsing fails.
  • An example of how other services have integrated with it.

All of that — maybe 400 tokens — in every prompt, forever. Multiply by the number of prompts the team will run against this area over the next year. Multiply again by the number of Level 1 mistakes embedded in the codebase. You’re looking at tens of millions of tokens in pure explanation overhead, billed monthly to your Anthropic invoice and paid daily in cognitive load.
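Back-of-the-envelope, the math looks like this. Every figure below is an illustrative assumption, not a measurement from the article:

```python
# Token tax: fixed explanation overhead paid on every prompt that touches
# the flawed area. All inputs are assumed for illustration.
overhead_per_prompt = 400   # tokens of schema-explaining boilerplate
prompts_per_dev_day = 20    # agent interactions touching affected code
devs = 4
working_days = 220          # per year
level1_mistakes = 5         # similar flaws scattered through the codebase

yearly_tax = (overhead_per_prompt * prompts_per_dev_day
              * devs * working_days * level1_mistakes)
print(f"{yearly_tax:,} tokens/year of pure explanation overhead")
# → 35,200,000 tokens/year
```

Tune any input to taste; with plausible numbers for a mid-sized team, the overhead lands in the tens of millions of tokens per year.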

And that’s the cheap version. The expensive version is when the tax compounds: Level 1 mistakes generate Level 2 workarounds, which the agent misremembers as Level 1 truths, which propagate into new features, which create new Level 1 mistakes.

Pay the token tax once and you lose some productivity. Pay it at scale across a codebase and you’ve built a system where AI makes things worse the more you use it.

Why Nobody Catches This In Review

The cruel part is that the output still looks correct. The code compiles. The tests pass (tests written by the same agent, against the same flawed mental model). The PR description is coherent. The demo works.

This is the single most underrated risk of AI-assisted development: confidently wrong code at unprecedented speed.

In the pre-AI world, a bad data model showed up as obviously-bad code. The queries were gnarly. The if/else trees were ugly. A senior reviewer would squint at the PR, ask three questions, and the rot would surface.

With AI, the rot is hidden. The agent wraps the gnarly logic in clean code. The queries are formatted. The if/else trees are refactored into strategy patterns. The ugliness that used to be a smoke signal has been polished away. A senior reviewer might skim the PR and approve it, because nothing looks wrong on the surface.

The only way to catch Level 1 mistakes is to prevent them upstream — at the spec.

The Validation Paradox

Here’s the contradiction that breaks naive spec-driven approaches:

The Foundation must be defined first because it’s expensive to change. However, you often only realize the Foundation is wrong once you interact with the Visuals.

Engineers define “perfect” data models on paper all the time. The data model survives contact with the whiteboard. It survives the design review. It survives the first three sprints. Then the first real user clicks through the first real prototype and announces, “Wait, this flow makes no sense.”

You realize the user thinks about this problem as two entities, not one. Or as one entity with a status, not two entities with a relationship. Or as something that doesn’t fit your model at all. And now the Foundation — the thing you were supposed to lock down first — needs to change after three sprints of work built on top of it.

The paradox is real. The resolution is not to define Level 1 first and ignore validation, and it’s not to skip Level 1 definition and wing it. The resolution is to prototype the visible layer before locking the Foundation.

Prototype Before You Architect

The rule the best teams follow:

  1. Prototype the visible behavior first. Wireframes, clickable mockups, Claude-generated HTML prototypes. Whatever lets users interact with the shape of the thing before you commit to its skeleton.
  2. Use the prototype as a specification tool for the data model. Screens stress-test abstract entity decisions better than any whiteboard session. If a user’s natural flow doesn’t map to your data model, the data model is wrong.
  3. Then lock the Foundation. Once the prototype validates the user experience, now you commit. Now you define entities, business rules, integration contracts. Now you give the AI a sharp, correct target to hit.

This is why the framework I work with separates Discovery (validate what to build) from Development (build it right). Discovery is where you burn the cheap iterations. Development is where you build on what survived.

The AI angle matters here too: prototypes are now free. Claude can generate a clickable mock of a flow in fifteen minutes. There is no longer an excuse for locking a data model before a human has clicked through the screens it’s meant to support.

What This Means For Your Stack

If you’re rolling out AI tooling to an engineering team, the Hierarchy of Technical Impact gives you a triage order:

  • For Level 3 work (UI polish, copy, cosmetics), let the AI run freely. Mistakes are cheap. Iteration is the point.
  • For Level 2 work (flows, states, edge cases), use skills and guided procedures to inject context and check assumptions. Mistakes are medium-cost; guardrails pay for themselves.
  • For Level 1 work (entities, business logic, integration contracts), require a validated spec before any prompt reaches a code-writing agent. This is where you spend humans, not tokens.

The teams I’ve seen do this well treat AI the way a surgical team treats a scalpel: unbelievably useful for the parts of the procedure where you’ve already done the imaging, the planning, the prep. Also the fastest way to cut the wrong thing if somebody skipped a step upstream.

The One Mental Model

If you remember one sentence from this article, make it this:

AI doesn’t slow down when your spec is wrong. It speeds up — in the wrong direction.

Every hour you spend getting Level 1 right is an hour you don’t spend paying the token tax for the rest of the project’s life. Every prototype you build before locking the Foundation is a sprint of refactoring you didn’t need.

This is not a plea for waterfall. This is not a plea for heavier process. It’s a plea for putting the thinking where the leverage is — upstream of the thing that now writes code faster than any human ever will.



“AI can write code 10 times faster than you. It can also take you to the wrong place 10 times faster.”


John Macias

Author of The Broken Telephone