Sarvam AI

Bulbul V3

India's most accurate Text-to-Speech API

Lowest character error rates across Indian languages. 35+ natural voices, 11 languages, sub-250ms streaming.

Trusted by leading teams

Built for real workloads, not demos

Production-grade TTS with predictable latency, enterprise SLAs, and developer-first APIs.

Low latency streaming

Sub-250ms first byte with WebSocket streaming for real-time voice applications

Configurable controls

Fine-tune voice pace, expressiveness, and tone to match your brand

Plug-and-play integrations

Deploy a voice agent in under 10 minutes with SDKs for Python and Node.js

11 Indian languages

Native support for Hindi, Tamil, Telugu, Bengali, Marathi, and more

35+ unique voices

Choose from a wide range of voices across different styles and tones

Built for every use case

From voice agents to content platforms. Real use cases, already in production.

Mann Ki Baat

Dubbing & localization

Natural voiceovers for multilingual media and public communication.

Public announcements

Educational content

Marketing promos & ads

Podcast and informational videos

Customer Interaction

Voice agents

Real-time, human-like speech for customer-facing and internal agents.

Customer support

Sales & lead qualification

Edtech tutors

Social & companion bots

Training & Education

Enterprise training & communications

Clear, consistent voice for structured, informational content.

Company-wide announcements

Product walkthroughs

Employee training & enablement

Tata Capital

"Our partnership with Sarvam has enabled us to scale highly personalized, multilingual conversations across the customer lifecycle."

Shallu Kaushik

Chief Digital Officer, Tata Capital

The most accurate text to speech for Indian languages

Bulbul V3 delivers the lowest character error rates, outperforming global competitors across every category.

Listener preference rate (8kHz), higher is better

ElevenLabs Flash V2.5: competitor win rate 10.37%, tie rate 11.68%, Bulbul V3 win rate 77.95%

ElevenLabs V3 Alpha: competitor win rate 28.14%, tie rate 28.21%, Bulbul V3 win rate 43.64%

Cartesia Sonic-3: competitor win rate 29.43%, tie rate 30.49%, Bulbul V3 win rate 40.08%

Hear the difference

Expressive, accurate voices built for every Indian language.

Expressive

Emotion-rich and human-like voices

Code-switching

Effortless language switching

Pronunciation

Authentic pronunciation of Indian names

Abbreviations

Natural abbreviations, acronyms and numbers

Developer-first platform

OpenAI-compatible APIs. Drop-in SDKs for Python and Node.js. Go from zero to first audio in under 5 minutes.

REST & WebSocket APIs

Standard REST for batch, WebSocket for real-time streaming with sub-250ms first byte.

SDKs & libraries

Official Python and Node.js SDKs with TypeScript support. Install the Python SDK with pip install sarvamai.

Complete documentation

Interactive API reference, code samples, and integration guides for every endpoint.

Free tier included

Start building immediately. No credit card, no sales call, no minimum commitment.

from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Digitize a document
response = client.document_digitization.digitize(
    file_path="invoice.pdf",
    language="en-IN",
    output_format="md"
)

# Access extracted content
for page in response.pages:
    for block in page.blocks:
        print(f"[{block.layout_tag}] {block.text}")

Enterprise-ready. Responsible AI.

Built with safety, compliance, and data sovereignty at the core.

SOC 2 Type II & ISO 27001

Enterprise-grade security certifications. Annual audits, documented controls, continuous monitoring.

Data sovereignty

All data processed and stored in India. No cross-border transfers. Full compliance with Indian data regulations.

No training on your data

Your API inputs are never used for model training. Zero data retention after processing unless explicitly requested.

Consent-based voice cloning

Voice cloning requires verified consent from the voice owner. Built-in safeguards against unauthorized use.

Content safety filters

Automated detection and filtering of harmful, abusive, or misleading content before speech generation.

Audit-ready logging

Comprehensive API usage logs, access controls, and RBAC for enterprise governance and compliance reporting.

Simple, transparent pricing

Start free. Scale as you grow. No hidden costs.

Base plan

₹30 for 10K characters

Free trial included

No credit card required. Get API keys instantly.

Volume discounts available
Enterprise pricing available
Flexible pricing plans
Usage analytics
Integration with APIs
Best for startups
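
As a rough illustration of the base plan rate above (₹30 for 10K characters), the sketch below estimates spend for a given character count. It assumes simple pro-rata billing with no free-trial credits, volume discounts, or rounding to 10K blocks; none of those details are stated here, and the function name estimate_cost_inr is hypothetical.

```python
from decimal import Decimal

# Base plan rate quoted on this page: ₹30 per 10,000 characters.
RATE_INR = Decimal("30")
CHARS_PER_UNIT = 10_000


def estimate_cost_inr(num_characters: int) -> Decimal:
    """Estimate cost in rupees, assuming linear (pro-rata) billing.

    Assumption: no free credits, discounts, or block rounding are applied.
    """
    if num_characters < 0:
        raise ValueError("num_characters must be non-negative")
    cost = RATE_INR * num_characters / CHARS_PER_UNIT
    # Round to paise (two decimal places).
    return cost.quantize(Decimal("0.01"))


if __name__ == "__main__":
    for n in (2_500, 10_000, 1_000_000):
        print(f"{n:>9} characters -> ₹{estimate_cost_inr(n)}")
```

Under these assumptions, one million characters comes to ₹3,000. Actual invoices may differ once discounts or enterprise pricing apply.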

Frequently asked questions

Our Text-to-Speech API, powered by Bulbul V3, supports 11 Indian languages: Hindi, Bengali, Tamil, Telugu, Gujarati, Kannada, Malayalam, Marathi, Punjabi, Odia, and English (Indian accent). Each language supports multiple speaker voices with different characteristics.

Bulbul V3 offers 35+ distinct speaker voices, a significant upgrade from the previous version. Our top speaker voices are: Aditya, Ritu, Priya, Neha, Rahul, Pooja, Rohan, Simran, Kavya, Amit, Dev, Ishita, Shreya, Ratan, Varun, Manan, Sumit, Roopa, Kabir, Aayan, Shubh, Ashutosh, Advait, Amelia, and Sophia.

Bulbul V3 provides control over voice parameters, including pace (0.5x to 2x speed) and temperature (0.01 to 1.0), for fine-tuned output quality. Text preprocessing is enabled automatically for better handling of numbers, dates, currencies, and mixed-language content.

The TTS API supports 8 audio formats: MP3, WAV, AAC, OPUS, FLAC (lossless), PCM (LINEAR16), MULAW (μ-law), and ALAW (A-law). You can also configure the sample rate at 8kHz, 16kHz, 22.05kHz, or 24kHz, depending on your quality requirements.

We offer two API types: a REST API for instant audio generation (best for quick conversions of up to 500 characters), and a Streaming API over WebSocket for real-time, low-latency audio generation, ideal for voice agents and live applications. Streaming supports up to 2,500 characters per request.
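
The limits described in these answers can be checked on the client before any request is sent. The following is a minimal sketch, not part of the Sarvam SDK: VoiceSettings and split_text are hypothetical helpers, the lowercase format identifiers and default values are illustrative assumptions, and character counts use Python's len(), which may not match how the API counts characters in Indic scripts.

```python
from dataclasses import dataclass

# Limits taken from the FAQ answers above.
REST_MAX_CHARS = 500       # REST API: up to 500 characters
STREAM_MAX_CHARS = 2500    # Streaming API: up to 2,500 characters per request
SAMPLE_RATES_HZ = {8000, 16000, 22050, 24000}
# Assumption: lowercase identifiers; the real API may spell these differently.
AUDIO_FORMATS = {"mp3", "wav", "aac", "opus", "flac", "pcm", "mulaw", "alaw"}


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container that validates the documented ranges."""
    pace: float = 1.0            # documented range: 0.5 to 2.0
    temperature: float = 0.5     # documented range: 0.01 to 1.0 (default is arbitrary)
    sample_rate_hz: int = 24000
    audio_format: str = "wav"

    def __post_init__(self) -> None:
        if not 0.5 <= self.pace <= 2.0:
            raise ValueError("pace must be between 0.5 and 2.0")
        if not 0.01 <= self.temperature <= 1.0:
            raise ValueError("temperature must be between 0.01 and 1.0")
        if self.sample_rate_hz not in SAMPLE_RATES_HZ:
            raise ValueError(f"sample rate must be one of {sorted(SAMPLE_RATES_HZ)}")
        if self.audio_format not in AUDIO_FORMATS:
            raise ValueError(f"format must be one of {sorted(AUDIO_FORMATS)}")


def split_text(text: str, max_chars: int = REST_MAX_CHARS) -> list[str]:
    """Split text on whitespace into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    settings = VoiceSettings(pace=1.2, temperature=0.4, sample_rate_hz=8000)
    parts = split_text("This is a long announcement. " * 40)
    print(settings, len(parts), [len(p) for p in parts])
```

Splitting on whitespace keeps words intact, but a production pipeline would likely prefer sentence boundaries so that prosody is not broken mid-sentence. Pass max_chars=STREAM_MAX_CHARS when preparing text for the Streaming API.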

Start building with India's best TTS. Go live in minutes.