Deepgram debuts Aura-2 TTS to tackle real-time enterprise voice AI challenges


New TTS model aims to outperform ElevenLabs and OpenAI with low-latency, natural speech for professional applications

Aura-2 is Deepgram’s latest push into enterprise-grade voice AI, designed to deliver scalable, context-aware speech for critical business interactions.

The company says it is targeting a gap in the market where entertainment-driven TTS tools fall short for professional use.

Enterprise-ready performance and deployment flexibility

Deepgram, a voice AI platform known for its speech-to-text (STT) and speech-to-speech (STS) capabilities, has announced the launch of Aura-2—a next-generation text-to-speech (TTS) model designed for customer support, virtual agents, and enterprise AI assistants. The model runs on Deepgram Enterprise Runtime and supports deployment via cloud, private cloud, or on-premises environments.
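For developers integrating the hosted service, a text-to-speech request to Deepgram's REST API generally follows the pattern sketched below. This is a minimal illustration only: the voice model name, sample text, and output handling are assumptions for demonstration rather than confirmed Aura-2 defaults, so check Deepgram's documentation for exact identifiers and parameters.

```python
import requests

# Minimal sketch of a request to Deepgram's hosted /v1/speak TTS endpoint.
# The model name below is an illustrative placeholder, not a confirmed
# Aura-2 voice identifier.
DEEPGRAM_API_KEY = "YOUR_API_KEY"  # placeholder credential

response = requests.post(
    "https://api.deepgram.com/v1/speak",
    params={"model": "aura-2-thalia-en"},  # assumed example voice persona
    headers={
        "Authorization": f"Token {DEEPGRAM_API_KEY}",
        "Content-Type": "application/json",
    },
    json={"text": "Your appointment is confirmed for 3:30 PM on Tuesday."},
    timeout=30,
)
response.raise_for_status()

# The endpoint returns synthesized audio bytes; write them to a playable file.
with open("confirmation.mp3", "wb") as f:
    f.write(response.content)
```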

According to Deepgram, Aura-2 delivers sub-200-millisecond latency and handles thousands of concurrent requests, making it suitable for businesses requiring both speed and scale. The platform includes features such as model hot-swapping, real-time personalization, and aggressive compression to minimize compute load.

“Aura-2 delivers the perfect balance of natural speech and enterprise-grade accuracy,” says Scott Stephenson, CEO of Deepgram.

These capabilities are intended to help organizations automate routine voice interactions at scale while freeing staff for higher-value work.

Clarity and control for real-world use

Preference testing shows Aura-2 outperforming competitors such as ElevenLabs, Cartesia, and OpenAI nearly 60 percent of the time in enterprise scenarios. The system supports 40+ professional voice personas and handles domain-specific terminology without manual tagging. It also adjusts pacing, tone, and emphasis automatically depending on context, whether it’s delivering a phone number or handling a support escalation.

Nikhil Gupta, CTO at Vapi, says the unified voice stack is key: “Having both STT and TTS from a single provider significantly reduces integration complexity and latency.”

Deepgram offers a flat rate of $0.030 per 1,000 characters, undercutting ElevenLabs Turbo and Cartesia Sonic, and includes access to all available voices at no extra cost.

Infrastructure tailored to enterprise

Aura-2 benefits from Deepgram's custom-built infrastructure layer, Deepgram Enterprise Runtime (DER), which supports automation, compression, model customization, and security compliance. DER allows for model adaptation and runs identically across public cloud, private VPC, and on-premises environments.

Natalie Rutgers, VP of Product at Deepgram, says the shared runtime environment enables continuous learning across tools: “Aura-2 directly leverages our acoustic models and pronunciation datasets to deliver precise, industry-specific speech synthesis in real time.”

Integrated voice AI stack

Aura-2 is part of Deepgram’s broader strategy to unify voice applications within a single infrastructure. Its integration with Nova-3 for STT and the Voice Agent API for conversational AI allows shared learning across the voice stack, reducing latency and enhancing consistency across applications.

Bernardo Aceituno, Co-Founder at Stack AI, describes the impact: “Aura-2 sets a new bar for enterprise-grade TTS. The clarity, consistency, and low latency it delivers have been game changers for our AI agent experiences.”

Stephenson adds: “Our customers need more than just voices that sound good—they need voices that communicate precisely and reliably in professional contexts.”
