Who Owns the Value When Your Data Trains AI? (Spoiler: Not You)
Authors are suing AI companies for training on pirated books. They'll lose. But in losing, they're stumbling onto the most important unanswered question in enterprise AI.
Let me tell you why this matters more than you think.
The Legal Case Is Already Dead
The lawsuits make headlines, but the legal reality is brutal. Courts have already ruled that training AI models on pirated copies is perfectly legal—as long as you can't prove the AI company itself did the pirating. And good luck proving that when training datasets contain billions of documents sourced from third parties.
The recent Anthropic settlement sounds impressive at first: $1.5 billion split across authors whose work was used without permission. Then you do the math. That works out to roughly $3,000 per covered book. Three thousand dollars. That's not compensation for fueling a trillion-dollar industry. That's a rounding error. That's shut-up money.
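For anyone who wants to check that arithmetic, here it is, with the caveat that the roughly 500,000 covered works is the estimate that's been reported, not a precise count:

```python
# Back-of-the-envelope math on the reported settlement figures.
# The ~500,000 covered works number is an estimate from press coverage.
settlement_total = 1_500_000_000     # $1.5 billion
covered_works = 500_000              # approximate

per_work = settlement_total / covered_works
print(f"Payout per covered work: ${per_work:,.0f}")  # roughly $3,000
```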
Here's the uncomfortable truth: copyright law is the wrong tool for this fight.
Copyright was designed for a world where copying meant reproduction. Where infringement looked like bootleg DVDs or plagiarized passages. It wasn't built for a world where your work gets atomized, digested, and reconstituted as statistical weights in a neural network that generates billions in enterprise value.
The Real Problem: We Have No Framework for Data as Input
Think about what actually happens when AI companies train on copyrighted material. Your book—the product of months or years of work—gets scraped from the internet. It's tokenized, fed into a massive language model, and becomes part of the statistical substrate that makes that model valuable. The model then generates billions in licensing fees, enterprise contracts, and market valuation.
You get nothing.
Not because the law explicitly says you shouldn't be compensated. But because the law has no vocabulary for what just happened to your work. It doesn't know how to measure "data as training input" versus "data as copied work." There's no legal framework for the value transfer that occurred when your intellectual property became someone else's training data.
The law is fighting with 20th-century weapons in a 21st-century battle.
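If the mechanics of that value transfer feel abstract, here's a toy sketch of the first step: a book becoming training examples. Real pipelines use subword tokenizers and context windows of thousands of tokens; the whitespace splitting and tiny window below are simplifications for illustration, not how any production system actually works.

```python
# Toy illustration of how a manuscript becomes training data.
# Real systems use subword tokenizers (BPE and friends) and much larger
# context windows; this sketch only shows the shape of the pipeline.

def tokenize(text: str) -> list[str]:
    """Stand-in for a real tokenizer: just split on whitespace."""
    return text.split()

def to_training_windows(tokens: list[str], window: int = 8) -> list[list[str]]:
    """Chunk the token stream into fixed-size training examples."""
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]

manuscript = "Months of an author's work, scraped from the web, tokenized, and folded into model weights."
examples = to_training_windows(tokenize(manuscript))
print(f"{len(examples)} training windows from one sentence; a full book yields on the order of 100,000 tokens.")
```

Once the book is reduced to those windows, nothing in the pipeline remembers where they came from, which is exactly why attribution becomes so hard after the fact.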
This Isn't Just an Author Problem—It's Your Problem
Still think this is just about novelists and journalists? Think again.
Consider every company currently building proprietary AI models on internal data. You're training models on employee emails, customer communications, strategic documents, partner contracts, and years of accumulated institutional knowledge. These models will create enormous value—automating workflows, generating insights, driving competitive advantage.
Now ask yourself: What rights do your employees have to the value their communications create when they become training data? What about your customers? Your partners? The contractors who contributed to those documents?
Your lawyers don't have answers because these questions didn't exist until now.
Employment contracts cover work product, not the downstream value of that work when it's transformed into AI training data. Customer agreements cover data privacy and usage, not compensation when that data trains models that generate millions in efficiency gains. Partner NDAs protect confidentiality, not ownership of value created when confidential information becomes model weights.
The legal infrastructure simply doesn't exist.
The Solution Nobody Wants to Hear
Here's where I'm going to lose some of you: Blockchain actually solves this problem.
I can feel the eye-rolls already. "Crypto bro" is about to start echoing in your head. But stay with me.
The fundamental challenge is provenance and attribution at scale. When a model trains on millions of documents from thousands of sources, how do you track what data contributed to which outputs? How do you create an auditable, tamper-proof record of what trained on what? How do you enable micropayments to thousands of contributors without creating prohibitive transaction costs?
Blockchain technology, specifically distributed ledger systems, was built for exactly this class of problem. Provenance tracking. Immutable records. Micropayments enabled by smart contracts. The infrastructure exists right now to create transparent, auditable systems where data contributors can be compensated based on verified usage.
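To be concrete about what I mean, here is a minimal sketch, purely a thought experiment, of the kind of record such a system would keep: a hash-chained log of who contributed what to which training run, plus a naive pro-rata payout rule. The field names, the token-count attribution, and the payout formula are all illustrative assumptions, not a real protocol, a real chain, or anyone's actual smart contract.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field

@dataclass
class ProvenanceEntry:
    """One tamper-evident record: whose data went into which training run."""
    contributor: str
    document_sha256: str        # fingerprint of the contributed document (placeholders below)
    training_run: str           # identifier of the model / training run
    tokens_contributed: int
    prev_hash: str              # links entries into a hash chain
    timestamp: float = field(default_factory=time.time)

    def entry_hash(self) -> str:
        payload = json.dumps(self.__dict__, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

def append_entry(ledger: list[ProvenanceEntry], **fields) -> ProvenanceEntry:
    """Append a new entry, chained to the hash of the previous one."""
    prev = ledger[-1].entry_hash() if ledger else "GENESIS"
    entry = ProvenanceEntry(prev_hash=prev, **fields)
    ledger.append(entry)
    return entry

def pro_rata_payout(ledger: list[ProvenanceEntry], run: str, pool: float) -> dict[str, float]:
    """Naive payout rule: split a revenue pool by share of tokens contributed to a run."""
    totals: dict[str, int] = {}
    for e in ledger:
        if e.training_run == run:
            totals[e.contributor] = totals.get(e.contributor, 0) + e.tokens_contributed
    grand_total = sum(totals.values()) or 1
    return {c: pool * t / grand_total for c, t in totals.items()}

# Usage sketch with made-up contributors and placeholder fingerprints
ledger: list[ProvenanceEntry] = []
append_entry(ledger, contributor="author-a", document_sha256="placeholder-1",
             training_run="run-1", tokens_contributed=120_000)
append_entry(ledger, contributor="author-b", document_sha256="placeholder-2",
             training_run="run-1", tokens_contributed=80_000)
print(pro_rata_payout(ledger, "run-1", pool=10_000.0))
# {'author-a': 6000.0, 'author-b': 4000.0}
```

The specific split rule matters far less than the underlying point: once contributions are recorded at training time, a payout rule becomes computable at all.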
But because "blockchain" triggers the same reflexive dismissal as "crypto," we'll keep pretending there's no technical solution. We'll keep acting like the problem is unsolvable while trillion-dollar models train on everyone's work for free.
The technology isn't the problem. Our collective hangover from crypto hype is.
The Question Every Company Should Be Asking
The authors will lose their lawsuit. That's almost guaranteed. But the underlying problem isn't going anywhere. In fact, it's about to get significantly worse as more companies realize that their competitive advantage depends on AI models trained on proprietary data.
If you're a CISO, a General Counsel, or anyone responsible for AI governance, you need to be asking one critical question right now:
Who owns the value the model creates?
Not in theory. Not according to what seems fair. But according to existing contracts, employment agreements, and partnership terms that were written before anyone thought about AI training data.
Because right now—whether you've explicitly thought about it or not—the answer is: whoever builds the model. Not whoever built the data. Not the employees who wrote the emails. Not the customers who generated the support tickets. Not the partners who shared the documents.
The model builder captures 100% of the value. The data contributors get zero.
The Coming Reckoning
This can't last. The same logic that makes authors feel cheated when their books train ChatGPT will eventually make employees feel cheated when their institutional knowledge trains their employer's AI—especially if that AI later gets licensed to competitors or sold as a product.
The same resentment building in creative industries will spread to enterprise contexts. What happens when laid-off employees realize the AI that replaced them was trained on their own work? What happens when customers discover that their data didn't just train models to serve them better, but to create products sold to others?
We need new legal frameworks. We need new contractual language. We need technical infrastructure that makes attribution and compensation possible at scale.
And we need to have these conversations now—before the lawsuits multiply, before the resentment builds, before we've trained a generation of models on data whose provenance we can't prove and whose contributors we can't compensate.
The authors suing AI companies will lose their case. But they're asking exactly the right question: When data creates value, who should benefit?
Right now, we don't have an answer. We desperately need one.
Because the future of enterprise AI depends not just on better models—but on sustainable, equitable models for value creation that don't leave data contributors with nothing.
And that's a problem no amount of computing power can solve.