"Every new book will be published in sync with an audio version"
Inside the ElevenLabs mission to bridge the global audio gap
In this interview, Madeline Shue, head of publisher partnerships at ElevenLabs, outlines the rapid metamorphosis of the company from its 2022 debut as a text-to-speech startup into a comprehensive infrastructure for the global production and distribution of audio.

Photo/video: AI-generated, Freepik
Addressing the "audio gap" where up to 90% of books remain unrecorded, Shue explores how the company’s evolution—spanning long-form narration in over 70 languages, the Iconic Voices marketplace, and a payout of over $11 million to voice actors—is shifting the industry toward a hybrid future. By balancing the scale of AI-driven accessibility for backlist titles with a continued reverence for human "craft and prestige," she argues that the conversation is moving away from a binary "AI vs. Human" conflict and toward a new era of immersive, multilingual storytelling where every written word finds its voice.
"We can enable publishers to do things that simply weren't possible before"
ElevenLabs is increasingly evolving from a speech generator to a complete ecosystem for production and distribution. How does this step change your company's self-image?
ElevenLabs launched in 2022 with a leading text-to-speech (TTS) model, and we've scaled quickly since then: TTS, speech to text, music, sound effects, voice agents, and now image and video capabilities. Our vision has always been to make content accessible globally in audio, and it's increasingly focused on changing how we interact with technology through audio.
When it comes to audiobooks specifically, the opportunity is significant. Around 90% of books have never been produced in audio. We believe we can close that gap, and that any written text should have an audio counterpart available in any language. That future isn't as far off as people might think. We've seen firsthand how complex it is to scale audiobook production, so we're building the tools to make it seamless, both for production with Audiobooks in ElevenCreative and for distribution through ElevenReader and our partner platforms.
How do you position yourself in relation to traditional publishers – more as a technical service provider or as a new competitor in the digital book market?
The nice thing is we can do both: partner with publishers, who bring deep expertise, and connect consumers with books in new ways. On the service side, we offer human quality-checked audiobook production at a scale the industry has never seen with ElevenProductions, alongside publishing-specific creative tools for authors and smaller publishers who couldn't previously afford professional audio production. And on ElevenReader, we partner with major publishers to distribute eBooks with lifelike AI narration — for the first time, it's as simple as distributing an eBook, with no upfront costs or exclusivity requirements.
What's unique about ElevenLabs is that we can enable publishers to do things that simply weren't possible before — including dynamic narration, where the audio is generated from the eBook in real time as listeners consume it, rather than produced ahead of time. This lets publishers bring their entire catalog into audio essentially overnight. Previous players who entered this space had to invest heavily in acquiring and owning content; our technology means publishers can capture this opportunity without those trade-offs, keeping full ownership of their content while we give them the tools and distribution to do more with it.
The central challenge in the field of AI-generated audiobooks is consistency over many hours and multiple languages. What specific technological innovations enable this new stability with the "Audiobooks" module?
Our latest text-to-speech models are specifically trained for long-form narration, and we back that with human quality review as part of our enterprise production workflow, because getting every word to sound right across a full-length book is a high bar. Our most recent model expanded language support from 32 to over 70 languages, while bringing greater emotional range and control to the output.
Fundamentally, ElevenLabs is a research company. That means as soon as new model capabilities come out of our research team, they become available in our publishing tools and don’t rely on third-party providers or integrations to catch up. Because we've built the technology ourselves, users benefit from every improvement we make in real time.
What's particularly exciting right now is how our v3 model is being refined specifically for narration consistency across languages — including nuances like a German word within an English story being pronounced with the correct original accent. It's still in development, but when fully released it will be a meaningful step forward for publishers and storytellers across global markets.
"Human taste, judgment, and quality remain central to what makes a great audiobook"
Users can set different speaker profiles and accents. Could this granularly controllable voice design replace the traditional concept of "one speaker voice" for an audiobook?
This is where voice technology starts to introduce genuinely new capabilities for the industry. Multi-cast and dual narration have always been compelling for listeners, but they introduce real production complexity for traditional audiobook creation — coordinating multiple voice actors, managing consistency across a long recording, the cost and logistics of it all.
What we're seeing now is that publishers and producers can assign and design voices with a level of control that simply didn't exist before — across accents, tone, emotion, pacing — and do it consistently at scale. That opens up new possibilities not just for individual titles, but across entire catalogs and languages. A publisher can bring a multi-cast experience to a book that never would have justified that investment before.
And ultimately, what that unlocks is new experiences for consumers. Listeners who prefer a certain narration style, or who want to hear a story come alive with distinct character voices, or who are accessing a title in a language that might not have had a human narrator available — all of that becomes possible in a way it wasn't before. In fact, on ElevenReader we see that consumer preferences are extremely diverse – regional dialects, age, gender and beyond – with no single narrator ever capturing more than 10% of all listens.
Despite studio promises, post-production remains necessary. Do you see a point in the future where AI could completely replace human editors – or will humans remain indispensable in this process?
If you think about what AI does well, it scales manual processes — the kind of repetitive, time-intensive work that has historically made audiobook production challenging and expensive. That's where we're seeing the biggest gains. Even over the last six months, our internal team is spending less time perfecting each audiobook as the baseline quality keeps improving. For backlist titles and translated editions, that quality control step may be largely removed within the next few years.
At the same time, human taste, judgment, and quality remain central to what makes a great audiobook. There will always be a spectrum of involvement — complex fiction, rich voice design, multi-cast narration — these benefit from more hands-on creative direction. I'd think of it less as "AI vs. human" and more as AI enabling scale and production, and humans enabling quality, taste, and refinement, tailored to what the content demands.
You have some high-profile partners with whom you have produced AI-generated audiobooks, including Melania Trump. Your voice portfolio includes Sir Michael Caine, Liza Minnelli, and Art Garfunkel. What attracts such celebrities to work with ElevenLabs?
One thing that makes ElevenLabs unique has been our commitment to partnering with high-profile names and bringing them into the process — not just licensing their voice, but working with them as genuine collaborators in how their identity is used and protected.
These are people who have built extraordinary careers around their voice and their likeness, and many of them have watched AI be used to replicate voices without consent. There's a real recognition that this change has arrived, and a desire to be part of shaping how it's used responsibly rather than reacting to it after the fact. Working with a company that takes IP protection seriously, and that is committed to ethical, fully licensed data, matters to them. The consumer opportunity is interesting too: there’s something unique about listening to Michael Caine read a British novel to you, or Judy Garland’s voice reading a production of the Wizard of Oz.
There's also a new commercial opportunity here that didn't exist before. Our Iconic Voices marketplace is a good example — it allows high-profile voices to share their likeness or voice replica for commercial use, earning royalties from its limited and verified use in advertising, audiobooks or other mediums. For many of these figures, that's an entirely new model for how their voice and identity can be monetized in the AI era, on their own terms.
"Voice libraries on ElevenLabs become strategic assets for publishers"
Professional narrators are increasingly feeling the pressure. How does ElevenLabs meet the ethical requirement of complementing human labour rather than replacing it?
We’ve worked hard to bring voice actors into the process and allow them to earn from the use of the voices they share. It is something we think about a lot and care deeply about getting right. Our Voice Marketplace is probably the most concrete example: narrators can earn every time their voice is used, and we've paid out over $11 million to voice actors to date. This is real income from an entirely new revenue stream that didn't exist before.
Beyond that, we're seeing voice libraries on ElevenLabs become strategic assets for publishers, who license voice clones from prominent actors and authors in their network and deploy them across multiple titles.
You plan to transparently label AI-generated content. How will you ensure that this labelling does not lead to a devaluation, but rather to a new content category in the market?
Transparent labeling is something we're committed to and already practice. In our partnership with Spotify, we've ensured that AI narration is labeled as such when new titles are distributed. That aligns with the APA's guidelines for how AI narration should be marketed, and we believe it's simply the right thing to do so consumers understand what they're listening to.
But beyond that, I think the framing matters. AI narration isn't a lesser version of human narration; it's a different product category that unlocks things that were never possible before: instant multilingual editions, personalized narration, dynamic voices, consumer choice, accessibility at scale. As those benefits become more tangible to consumers, the label may over time signal innovation rather than quality compromise.
With production costs falling, a new door is opening for self-publishers. What support does ElevenLabs offer to raise the quality of their works to a professional level?
This is an area we're very excited about. Less than 10% of books exist in audio overall, but for self-published authors that number drops to <1%.
We want to level that playing field, and we've built an end-to-end pathway to do it. Authors can produce studio-quality narration through ElevenCreative, with professional post-processing built in on export, and distribute directly to Spotify, InAudio, or ElevenReader. On ElevenReader, they earn 60% royalties — higher than most traditional distribution models — and get access to audience trend data to understand how their content is actually performing.
We think professional audiobook production should be accessible to everyone, and have been pleased to see the adoption from the self-publishing community – dating back to ElevenLabs’ earliest days.
"It won't be a binary choice between fully AI or fully human"
You just mentioned that up to 90% of publishers' book content has not yet been made available as audiobooks. How will this change by 2030?
By 2030, I think the landscape looks dramatically different. Every new book will be published in sync with an audio version – that'll become the standard. Full backlist catalogs will have been transformed. Self-published authors will have real, accessible pathways to audio. And the gap will have shrunk significantly, very likely approaching an inversion where the majority of books do have an audio counterpart. Consumers will have more choice about where to find books and how to listen. The infrastructure to make that happen is being built right now.
Do you see a future in which publishers use "hybrid models" – for example, a prominent human narrator for the introduction, while the remaining 500 pages of non-fiction text are handled by AI?
Publishers are going to have a lot of options in the years ahead, and what's exciting is that we're giving them the tools and flexibility to decide what makes the most sense for their audience, their content, their authors, and their narrators. There's no single right answer — different titles, different formats, and different commercial goals will call for different approaches.
A hybrid model is certainly one of those options, and a compelling one. Some publishers are already building libraries of voice clones from voice actors in their network, or from authors who want to narrate their own books. That opens up a range of possibilities — an author could record a short introduction in their own voice, with AI handling the rest consistently across hundreds of pages, or even in their voice across other languages through literary translation. Things that simply weren't economically viable before.
But the broader point is that it won't be a binary choice between "fully AI" or "fully human." Publishers will have the ability to mix and match based on what serves the content and the listener, and our role is to make sure every option along that spectrum is accessible, high-quality, and easy to execute.
Beyond that, does the audiobook with human narrators still have a future?
I think it does, yes. From my view, I can see a bifurcation of the market, similar to what happened in other industries when transformative technology arrived. Human narration will become even more premium and valued – a signal of craft and prestige — while technology makes content more widely available and accessible. In a way, AI enables that shift: when publishers can easily scale and generate revenue from their entire catalog in audio, they may gain more resources and creative bandwidth to invest in human-narrated productions that can become cultural moments and define their brand.
Looking a few years ahead, what will the audiobook ecosystem look like in 2030? Will there still be a clear difference between AI-narrated and human-narrated formats?
At the end of the day, consumers want the best stories in the highest quality audio available. They want to be brought into the story — and how that audio is produced, whether single narrator, full cast, AI, human, or some blend of all of those, becomes secondary to the experience itself.
What I do think gains real prominence is high-production, immersive audio: full cast, rich sound design, the kind of experience we're already seeing with productions like the recent Harry Potter audiobooks. That format showcases what the medium can do at its best, and it'll likely blend human and AI talent in ways that are increasingly seamless.
The more interesting question is what entirely new formats emerge: personalized narration, conversational audio, real-time translation. The technology is going to unlock creative experimentation that wasn't economically viable before, and publishers who lean into that will find new ways to reach listeners they couldn't reach before. By 2030, the conversation will probably be less about AI versus human, and more about what kinds of audio experiences are resonating with audiences and why.

Madeline Shue is Head of Publisher Partnerships at ElevenLabs, the world's leading AI audio research and product company. She leads the company's end-to-end audiobook production and distribution services, powered by ElevenLabs' natural AI voices and enabled through ElevenReader, its integrated distribution platform. Madeline works closely with publishers and authors to unlock new creative and commercial opportunities in audio — faster production, serialized formats, immersive experiences with multicast, music, and sound design, and global reach through multilingual capabilities. She was previously a technology investor at Bessemer Venture Partners.