AI licensing becomes a platform battle
Microsoft and Amazon launch AI licensing marketplaces
Photo/Video: AI-generated, Freepik
Microsoft announced its Publisher Content Marketplace (PCM) last week, and Amazon is set to follow this week. The two tech giants are building centralized platforms for licensing publisher content for AI training, an attempt to channel years of copyright disputes into orderly commercial arrangements.
Microsoft's PCM was co-designed with major U.S. content providers, including the Associated Press, Condé Nast, Hearst Magazines, Vox Media, and USA Today. Book publishers are notably absent from the initial lineup. Publishers set licensing and usage terms on the platform, while AI builders discover and license content for specific use cases. Usage-based reporting is meant to show publishers how their content is being used. Initially, the platform will supply content to Microsoft Copilot.
According to The Information, Amazon Web Services plans to announce its own AI content licensing marketplace this week. Presentation slides circulated ahead of an AWS conference list the marketplace alongside core products such as Bedrock and Quick Suite. Amazon itself has said little publicly, beyond emphasizing its long-standing relationships with publishers.
Context: The difficult path to licensing
Microsoft's and Amazon's initiatives fit into a broader context in which the AI industry is attempting to legitimize what publishing analyst Thad McIlroy calls AI's "original sin"—the non-consensual use of copyrighted materials for training large language models—through retroactive licensing deals. As McIlroy lays out in a comprehensive analysis, AI companies are following a proven tech playbook: "just do what we want and pretend that laws don't really apply to our technology because it's new, and then get so big and rich that we're able to sort of buy our way out of it on the back end."
To date, licensing deals have focused primarily on news organizations. OpenAI, Microsoft, Amazon, and Perplexity have struck agreements with AP, The Financial Times, The New York Times, News Corp, Springer, and The Atlantic. On the book publishing side, the list is short: Wiley has been by far the most aggressive deal seeker, Taylor & Francis/Informa appears three times on the Ithaka S+R tracking list, and beyond them there are only three university presses and one trade publisher. "Is that all there is?" asks McIlroy.
The dilemma: Invisibility versus capitulation
The question of whether licensing represents a solution or a capitulation divides the industry. Authors feel, as one industry player quoted by McIlroy puts it, that AI companies have "not just broken into their homes — they then kidnapped the children." The anger stems from the use of the Books3 corpus, along with pirated copies downloaded from Library Genesis and the Pirate Library Mirror, to train several major language models.
Yet McIlroy points to a fundamental problem: without licensing, invisibility looms. "If Google Search can't find your blog post/article/research/etc. is it visible? Technically perhaps yes, but in practical terms, simply, no. So too in the emerging world of AI interactions." In a world where user interactions are increasingly mediated by LLM-based AI, failing to license could mean de facto non-existence.
Two types of training, different prospects
McIlroy distinguishes two types of AI training: general LLM training, where content contributes to language capabilities as "bags of words," and RAG (retrieval-augmented generation), which retrieves specific facts. For publishers, RAG is the more promising prospect: here the actual content matters more than word count, and attribution becomes part of the model.
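To make the RAG side of this distinction concrete, here is a toy sketch of the retrieval step, assuming a tiny in-memory corpus. Simple word-overlap scoring stands in for the embedding search that production systems typically use, and the `Passage` type, its `source` field, and the sample passages are all hypothetical. What it illustrates is that in RAG a specific passage is fetched at query time, so its source can travel with the answer, whereas in general training the text is absorbed into model weights without any such link.

```python
from __future__ import annotations

from dataclasses import dataclass

PUNCT = ".,;:!?\"'()"


@dataclass
class Passage:
    text: str
    source: str  # hypothetical attribution field (publisher, title, license ID, ...)


def tokenize(text: str) -> set[str]:
    """Lower-case words with surrounding punctuation stripped."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query: str, corpus: list[Passage], k: int = 1) -> list[Passage]:
    """Rank passages by word overlap with the query (a stand-in for vector search)."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda p: len(q & tokenize(p.text)), reverse=True)
    return ranked[:k]


def answer_with_attribution(query: str, corpus: list[Passage], k: int = 1) -> str:
    """Return retrieved context plus its sources.

    A real system would pass the context to an LLM as part of the prompt;
    this toy version just returns it with the attribution attached.
    """
    hits = retrieve(query, corpus, k)
    context = " ".join(p.text for p in hits)
    sources = "; ".join(p.source for p in hits)
    return f"{context} [Source: {sources}]"


corpus = [
    Passage("Licensing terms are set by the publisher.", "Hypothetical Publisher A"),
    Passage("Retrieval-augmented generation fetches passages at query time.",
            "Hypothetical Publisher B"),
]

result = answer_with_attribution("How does retrieval-augmented generation work?", corpus)
print(result)
```

Running it prints the retrieved passage followed by `[Source: Hypothetical Publisher B]`, which is the kind of per-answer attribution that a bag-of-words training run cannot provide.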
The financial realities remain sobering: while the widely cited HarperCollins offer of $5,000 for a children's book is considered an outlier, an industry-wide rate of around $100 per book has emerged. For word-based deals involving "tonnage" content, rates are even lower: approximately $0.001 (a tenth of a cent) per word, which would come to $75 for an average book. McIlroy estimates the total value of all publicly known text licensing deals at $300 million, "just a drop in the bucket of overall AI training costs."
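The per-word and per-book figures can be reconciled with a quick calculation. This sketch assumes an average book of 75,000 words, a length not stated in the text but implied by $75 at $0.001 per word; it also shows that a rate of 0.001 cent per word, read literally, would yield only 75 cents for the same book. The helper name `payout_per_book` is hypothetical.

```python
# Hypothetical helper: what one book earns under a per-word licensing rate.
# Assumption: an "average book" of 75,000 words, the length implied by
# a $75 payout at a rate of $0.001 per word.
AVERAGE_BOOK_WORDS = 75_000  # assumption, not a figure from the source


def payout_per_book(rate_usd_per_word: float, words: int = AVERAGE_BOOK_WORDS) -> float:
    """Return the licensing payment in US dollars for a book of `words` words."""
    return rate_usd_per_word * words


tenth_of_a_cent = 0.001             # $0.001 per word
thousandth_of_a_cent = 0.001 / 100  # 0.001 cent per word, i.e. $0.00001

print(f"$0.001 per word:     ${payout_per_book(tenth_of_a_cent):,.2f}")       # $75.00
print(f"0.001 cent per word: ${payout_per_book(thousandth_of_a_cent):,.2f}")  # $0.75
```

Under this assumption, the per-word model pays about three quarters of the roughly $100 flat rate per book.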
The perfect-information problem
A structural issue persists: AI companies aspire to "perfect information"—comprehensive coverage of subject areas. For broad domains like economics, already captured data may suffice. For specialized topics—McIlroy cites the "no-trade theorem" in economics—30 to 40 targeted licenses would be needed, many behind paywalls. If AI truly wants to be authoritative rather than merely appearing intelligent, it needs access to everything—including the 56 bibliographic databases that, according to researcher Michael Gusenbauer, can contain authoritative data.
Competition among intermediaries
The uncertainty has spawned its own ecosystem: McIlroy lists 30 intermediaries seeking to broker deals between rights holders and AI companies, ranging from the Copyright Clearance Center and startups such as Amlet, Cashmere, and Created by Humans to specialized services like Wiley and Bookwire. Microsoft's and Amazon's own marketplaces now compete with these providers.
Central questions remain unanswered: How will the frequently cited three-year contract terms work when content, once trained into a model, cannot be "removed" again? Will the usage-based compensation models that publishers increasingly demand prevail? And can the industry agree on common licensing standards, or will it, as McIlroy fears, remain fragmented and thereby cede control?
The Microsoft and Amazon platforms could set a de facto standard, though whether that works to publishers' advantage or disadvantage remains to be seen. By creating their own marketplaces, the two tech giants are entering the arena directly and could bypass the existing intermediaries.