Are audiobook files about to go the way of the cassette tape?
An analysis by Carlo Carrenho
AI text-to-speech is advancing so rapidly that it raises an uncomfortable question for the publishing industry: what if the next format to disappear isn’t just the delivery mechanism, but the recorded audiobook itself?
Photo: AI-generated, Freepik
The audiobook industry has survived format after format: vinyl gave way to cassette tapes, which yielded to CDs, then MP3s, then streaming. Each technological shift disrupted the market, but the fundamental product remained constant—a human voice narrating a text, captured and delivered as an audio recording or file. That constancy may be ending. AI-powered text-to-speech technology is advancing rapidly enough to raise a provocative question: could audiobook files themselves become obsolete, replaced by on-demand AI narration of ebooks?
This is a hypothesis, not a prediction. But it’s worth examining seriously, because the early signs are starting to appear, and the implications for publishers, narrators, and the entire audiobook ecosystem could be profound.
The history of audiobooks is a history of format succession. In 1952, Caedmon Records pioneered the term “audiobook” with Dylan Thomas reading his poetry on vinyl. The cassette tape arrived in 1963, followed by the commercial CD in 1982 and the MP3 in 1995. Each technology replaced its predecessor. In 2006, Storytel pioneered streaming audiobooks, eliminating even the need to download files. The pattern suggests that new formats don’t just supplement old ones; they make them obsolete. But does that pattern necessarily continue? And if audiobook files did disappear, what would that mean?

Rights ultimately tend to adapt to market realities
The shift would fundamentally alter the industry’s structure. Consider RB Media, the world’s largest audiobook publisher, with over 100,000 titles. Its entire business rests on contracts licensing audiobook format rights and recordings. For most titles, the company is likely to hold no ebook rights or comprehensive digital rights. If audiobook files were to disappear, this substantial segment of the digital publishing industry could potentially be absorbed into ebook publishing. The specialized infrastructure, the narrator networks, the production studios—all potentially rendered redundant. Of course, this assumes that AI narration becomes truly competitive with human performance, which remains an open question.
The industry has confronted this threat before. In 2009, Amazon introduced text-to-speech functionality on Kindle devices. Users could plug in headphones and listen to their ebooks, albeit with primitive technology that couldn’t even distinguish titles and subtitles from body text. Despite its limitations, the feature served people with visual impairments or dyslexia. The industry response was swift and forceful. The major American publishers argued that Amazon held no audiobook licenses and that audio rights were separate from ebook rights. Within two months, Amazon capitulated, allowing publishers to opt out. Most commercial publishers still exercise this option today.
That was the rights battle—and it worked in 2009. But would the same strategy succeed today? Rights ultimately tend to adapt to market realities. The pattern is visible now with AI training negotiations. Publishers initially refused to allow their content on AI platforms but are now negotiating terms. History suggests that if the technology becomes compelling enough, and if consumer demand materializes, rights frameworks will adjust. The 2009 battle was won by the publishers, but it may have been just a delaying action. Then again, the audiobook industry is far larger and more established now than it was fifteen years ago, which could make rights holders more determined to protect their territory.
Current limitations may prove temporary, or they may represent deeper challenges
The more substantial barrier may be technological. AI narration isn’t fully mature yet. For English-language nonfiction with single-voice narration, it has essentially reached commercial viability. Listeners seeking content about personal finance or self-improvement may prioritize information over voice quality—the technology might already serve this market adequately. But fiction presents greater challenges. The emotional range doesn’t yet match human narration. The human connection that many listeners value in their narrators may not be there. Non-English languages, such as Portuguese, appear to lag behind English. Multi-voice automation remains unsolved—extensive editing is still required. In fact, AI audiobooks often demand more editing than human-narrated productions. While narrator costs disappear, editing costs increase. Whether this represents a temporary gap or a fundamental limitation remains unclear.
When AI narration first emerged, skepticism was widespread. That skepticism has diminished as the technology has improved. But the question isn’t simply whether AI can narrate books—it’s whether it can do so well enough that listeners won’t notice, or won’t care about, the difference. Current limitations may prove temporary, or they may represent deeper challenges than technology optimists expect.
Will platforms like Audible soon acquire ebook rights?
Meanwhile, commercial infrastructure is being built. ElevenLabs has launched ElevenReader, an app that enables users to upload DRM-free ebooks or PDFs for automatic narration. More significantly, the company is building a commercial store with two business models: a Storytel-like subscription for unlimited listening, or à la carte purchases at publisher-set prices. Not all books are available yet, but everything in the store is ebook-based. Publishers upload ebooks, edit the text to remove front matter, split chapters as needed, and when users listen, the audio is rendered in real time on their devices. Users can select their preferred narrator voice, including well-known broadcast voices that can narrate in multiple languages without accent artifacts.
This raises questions worth considering. Could we see a world in the near future where platforms like Audible, Spotify, and Nextory acquire ebook rights rather than audiobook rights, rendering them as audio on demand? What would happen to the infrastructure, expertise, and business relationships built around the audiobook file? What would become of the narrator profession, the recording studios, and the specialized audiobook producers who’ve built careers on understanding the unique demands of audio storytelling?
The question isn’t how audiobook publishing adapts
These questions don’t have clear answers. The hypothesis may prove wrong—listener preference for human narration may be more durable than technology optimists expect, or AI limitations may persist longer than current progress suggests. The human voice carries emotional weight that algorithms may never fully replicate. Or perhaps that’s what every disrupted industry told itself.
The point isn’t to predict the future with certainty. It’s to raise the question: if audiobook files aren’t permanent, what does the industry need to be thinking about now? Because if this hypothesis proves correct, the question isn’t how audiobook publishing adapts. It’s whether it survives as a distinct industry at all.

Carlo Carrenho is a publishing consultant based in Sweden. This article is based on the presentation “The End of the Audiobook File: A Hypothesis” delivered at the Shifting Sounds seminar at Johannes Gutenberg-Universität in Mainz on January 30, 2025.
