The Dashboards Are Thrilled. The Clients Are Gone.
An agency slashed their content team from 8 writers to 3. AI filled the gap. Output tripled in 90 days. Cost-per-piece dropped 60%. Every metric pointed up and to the right.
Then two of their biggest accounts called. The work felt generic. One wanted to renegotiate. The other just left.
What looked like efficiency on a dashboard was structural damage in real time. The agency had automated writing—the easy part. They'd eliminated the judgment about what to write, for whom, and why this particular client would hate that approach. That judgment lived in eighteen-month-old Slack threads, in memory of failed campaigns, in the institutional knowledge of which executive preferred data and which one wanted stories.
The writing was scaffolding. The judgment was structure. They kept the scaffolding and stripped out the structure.
I can't stop thinking about this because I've watched this exact pattern play out across three disruption cycles. Different technology. Same mistake.
We've Done This Before (And We'll Do It Again)
Fifteen years ago, I watched cybersecurity teams automate compliance checklists. We built beautiful dashboards that turned green when the boxes got checked. Firewalls: configured. Patches: deployed. Training: completed. The board presentations were immaculate.
The actual threats got worse.
We'd automated the measurable parts and lost the judgment about what was actually risky. A firewall rule that technically complied with policy but left the billing system exposed to the internet because someone misunderstood the exception process. Patches deployed on schedule to systems that didn't matter while critical infrastructure ran months behind because "it required downtime coordination." Training completion rates at 98% while phishing success rates climbed because nobody was testing whether people actually retained anything.
The dashboard said we were safer. The red team assessments said otherwise.
Same pattern, different decade. We optimized what we could measure and assumed the unmeasurable parts didn't matter. They mattered most.
The Thing That Looks Like Overhead Is Sometimes The Load-Bearing Wall
Here's what I'm wrestling with: how do you identify load-bearing judgment before you automate it away?
It sits in the same person as the replaceable execution. It costs the same on the spreadsheet. It doesn't show up in job descriptions. The person doing it often can't articulate it cleanly because it's pattern recognition built over years, not a documented process.
I was advising a financial services client last year on their fraud detection workflow. They wanted to automate the initial screening—made perfect sense on paper. The analyst's job was reviewing transactions, flagging suspicious patterns, escalating to investigators. Eighty percent of the work was routine. The AI could handle it faster and cheaper.
Except when we mapped the actual workflow, we found the analysts were doing something else entirely. They were maintaining an informal network of context about specific accounts. "This customer always does wire transfers on Wednesdays to this vendor because of their payroll cycle—flag it any other day." "This account spikes in December every year for inventory—not fraud, just seasonal." That context lived nowhere except in their heads and in the Slack channel where they'd swap notes.
The transaction review was scaffolding. The context maintenance was structure.
Automate the review, lose the context. The AI couldn't tell the difference between a legitimate pattern and a suspicious one without that layer of judgment. And the judgment layer wasn't in any training manual or standard operating procedure. It was emergent knowledge that developed through repetition and communication.
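For illustration only, here's a minimal sketch of what that judgment layer might look like if you forced it into code. Every account name and rule below is hypothetical; the point is that the context is structured knowledge sitting outside the model, not a threshold you can tune.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Transaction:
    account_id: str
    vendor: str
    amount: float
    when: date
    kind: str  # e.g. "wire"

# Hypothetical: one analyst's rules made explicit. In the real workflow
# this lived in Slack threads and memory, not in any system of record.
def is_expected(txn: Transaction) -> bool:
    """Context the screening model never sees."""
    # "This customer wires this vendor every Wednesday (payroll cycle)."
    if txn.account_id == "ACME-001" and txn.vendor == "PayrollCo":
        return txn.kind == "wire" and txn.when.weekday() == 2  # Wednesday
    # "This account spikes every December (inventory), not fraud."
    if txn.account_id == "RETAIL-042" and txn.when.month == 12:
        return True
    return False

def screen(txn: Transaction, anomaly_score: float) -> str:
    # A pure anomaly model flags the Wednesday wire and the December
    # spike every single time. The context layer is what suppresses
    # the noise without suppressing the real signal.
    if anomaly_score > 0.8 and not is_expected(txn):
        return "escalate"
    return "clear"
```

Notice what the sketch exposes: those rules only exist because someone wrote them down. Strip out the analysts before that happens and the layer is simply gone.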
We didn't automate it. But I left wondering: how many other companies would have automated it, discovered the problem three months later when false positives spiked or actual fraud slipped through, and only then realized what they'd lost?
Every AI Deployment Treats The Org Chart Like A Parts List
This is where I get uncomfortable, because I don't have a clean answer.
The economics of AI make perfect sense in a spreadsheet. Take a $200K/year function, replace it with a $20K/year AI tool, redeploy the human to higher-value work. Efficiency gains compound. The CFO is happy. The board is happy.
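The spreadsheet version, with invented numbers to show how the naive math hides the exposure:

```python
# Hypothetical figures: the replacement math a CFO sees, versus the
# same math with second-order risk priced in.
human_cost = 200_000      # $/year, the function being replaced
tool_cost = 20_000        # $/year, the AI tool

naive_savings = human_cost - tool_cost  # $180,000. Up and to the right.

# What the spreadsheet omits: probability-weighted structural damage.
# These inputs are guesses by construction; that's the point. Nobody
# models them because nobody can measure them before the client calls.
p_lost_account = 0.15     # chance the "generic" work costs you an account
account_value = 600_000   # $/year revenue from that account

expected_savings = naive_savings - p_lost_account * account_value
print(expected_savings)   # $90,000: half the headline number, if you're lucky
```

Two made-up inputs cut the headline in half. The deeper problem is that nobody in the approval chain is asked to estimate them at all.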
But we're running structural analysis on a system we don't fully understand, using a framework designed for parts replacement, not load distribution.
When a bridge engineer analyzes a structure, they don't just look at which beams carry the most obvious weight. They model load paths. They identify which elements are under tension, which are under compression, which are redundant, which are quietly preventing catastrophic failure. Strip the wrong beam and the bridge doesn't just get weaker—it collapses in a way you didn't predict because you didn't understand the system.
We're not doing that with organizations. We're looking at job titles and output metrics and saying "this function produces widgets at this cost—can we produce the same widgets cheaper?" Without mapping the invisible load paths of judgment, context, relationship maintenance, institutional memory, and pattern recognition that flow through the same people.
I watched this play out on trading floors when electronic markets disrupted the NYSE. The floor traders weren't just executing orders—they were absorbing volatility, providing liquidity during weird market conditions, maintaining relationships that kept order flow stable. When the algorithms took over, the routine trades got faster and cheaper. The weird situations got worse, because the system had lost its shock absorbers. (Hello, Flash Crash. We perfected the math. We forgot the humans.)
The Gap I Can't Close
Here's the question that keeps me up: Is there a test for load-bearing judgment that works before you strip it out?
I've tried a few frameworks with clients:
The "What happens if nobody does this for six months?" test. If the answer is "immediate crisis," it's probably structural. If the answer is "things get messy but don't break," it might be scaffolding. But judgment often fails quietly and shows up as second-order effects—lost clients, degraded quality, increased risk—that take quarters to surface.
The "Can you write a complete procedure for this?" test. If the answer requires phrases like "use your judgment" or "it depends on the situation," you're looking at something that might not automate cleanly. But that's not a test for whether it's load-bearing, just whether it's automatable.
The "Who would notice first if we got this wrong?" test. If the answer is "our customers" or "our regulators" rather than "our internal dashboard," tread carefully. But by the time they notice, you've already done the damage.
None of these are satisfying. They're directional, not definitive. And I've been through enough disruption cycles to know that the confident frameworks are usually the dangerous ones. (Remember when we were certain that credit default swaps had "distributed risk efficiently"? Good times.)
What To Do Monday Morning
I don't have a silver bullet. But here's what I'm telling clients who are deploying AI into roles that mix execution with judgment:
Map the informal information flows first. Before you automate the content writer or the fraud analyst or the customer service rep, spend a week watching what they actually do. Not the job description—the Slack messages, the hallway conversations, the "Hey, quick question about this account" exchanges. That's where the load-bearing judgment lives.
Pilot with a canary, not a wedge. Don't cut 60% of the team on day one. Automate one person's workload while keeping them in the loop to QA the outputs and flag the edge cases. See what breaks. The problems will show up in what the AI doesn't escalate—the subtle wrongness it can't detect because it doesn't have the context.
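As a sketch (hypothetical names throughout), the canary pattern is just a routing rule plus a disagreement log:

```python
# Hypothetical sketch of the canary pattern: the AI does the work, the
# displaced human reviews everything, and the disagreements are the
# real output of the pilot, not the drafts.
disagreement_log = []

def canary_pipeline(task, ai_draft_fn, human_review_fn):
    draft = ai_draft_fn(task)
    verdict = human_review_fn(task, draft)  # "approve", or a reason to rework
    if verdict != "approve":
        # This log is the map of the load-bearing judgment: every entry
        # is context the AI didn't have and no procedure documented.
        disagreement_log.append({"task": task, "draft": draft, "why": verdict})
    return draft if verdict == "approve" else None
```

When the log goes quiet, either the AI has absorbed the context or the reviewer has stopped looking. Only one of those is safe to scale.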
Define "failure" before you define "success." Everyone builds dashboards for output metrics. How many people build dashboards for "client complained the work felt off" or "we missed a risk we would have caught last year"? The lagging indicators matter more than the leading ones, but they're harder to instrument.
Budget for re-learning. When you strip out the humans who held the judgment, that judgment doesn't automatically transfer to the AI or to the remaining humans. You've created a knowledge gap. Either you fill it deliberately (documentation, training, system design) or it fills itself with failure modes. There's no third option.
And maybe most importantly: stay skeptical of your own dashboards. They measure what's easy to measure, not what matters most. Three months of great metrics followed by a lost client is a system trying to tell you something. Listen before the next client calls.
Three disruption cycles in and I still don't have a clean test for identifying load-bearing judgment before you automate it away. It looks identical to replaceable execution on a spreadsheet. It sits in the same salary line. And the people who hold it often can't articulate it cleanly.
But I know what it looks like when you strip it: the dashboards stay green while the structure quietly fails.
Do you have a test that works? I'm genuinely asking—because if we can't solve this, every AI deployment is a structural gamble dressed up as an efficiency gain.
And the house always wins when you're gambling without knowing the odds.
