Sarvam AI

Saaras V3

Sarvam Speech to Text API

Transcribe noisy audio with high accuracy across 22 Indian languages. Real-time streaming, native code-switching, speaker diarization.

Production-grade automatic speech recognition

Handles noise, accents, and code-switching. Go live in under 30 minutes.

Streaming-first architecture

Sub-150ms time to first token. Configurable Accurate, Balanced, and Fast modes for every latency requirement.

Code-switching & noise robust

Trained on 1M+ hours of real-world audio. Handles code-mixed speech, noisy telephony, and diverse accents.

Drop-in SDKs

Go live in under 10 minutes with official Python and Node.js SDKs. Pipecat & LiveKit ready.

23 languages

All 22 scheduled languages plus English. Unified multilingual model with automatic language detection.

Beyond raw transcripts

Speaker diarization, word-level timestamps, output format control, and automatic language detection built in.
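As a rough illustration of how diarization and word-level timestamps combine downstream, the sketch below merges consecutive words from the same speaker into turns. The `Word` record and its `speaker` field are hypothetical shapes chosen for this example, not the documented response schema.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Word:
    # Hypothetical record shape; the real response schema may differ.
    text: str
    start: float  # seconds
    end: float  # seconds
    speaker: str


def speaker_turns(words):
    """Merge consecutive same-speaker words into (speaker, start, end, text) turns."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w.speaker):
        chunk = list(group)
        text = " ".join(w.text for w in chunk)
        turns.append((speaker, chunk[0].start, chunk[-1].end, text))
    return turns


# Made-up sample data for demonstration only.
words = [
    Word("namaste", 0.00, 0.42, "S1"),
    Word("sir", 0.45, 0.70, "S1"),
    Word("hello", 0.95, 1.30, "S2"),
    Word("ji", 1.32, 1.50, "S2"),
    Word("order", 1.80, 2.20, "S1"),
]

for speaker, start, end, text in speaker_turns(words):
    print(f"[{start:.2f}s-{end:.2f}s] {speaker}: {text}")
```

Grouping only adjacent words keeps the turn order intact, so a speaker who talks twice produces two separate turns.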

Powering real-world audio experiences

From contact centers to voice agents. Real use cases, already in production.

Code Mixing

Seamless code-mixing

Understands when speakers switch between Hindi, English, and regional languages mid-sentence.

Cross-language detection

Mid-sentence switching

Natural transcription
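To make "code-mixed" concrete, here is a minimal sketch that labels each token of a transcript by its script. It assumes the transcript keeps each language in its native script, which this page does not state, and it is only a downstream illustration using Python's standard library, not a description of how the model detects languages.

```python
import unicodedata


def script_of(token: str) -> str:
    """Label a token by the script of its first alphabetic character."""
    for ch in token:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("DEVANAGARI"):
                return "Devanagari"
            if name.startswith("LATIN"):
                return "Latin"
            return "Other"
    return "Other"


# Made-up Hindi-English sentence for demonstration only.
transcript = "मुझे refund चाहिए please"
print([(tok, script_of(tok)) for tok in transcript.split()])
```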

Call Center

Telephony-optimized

Handles real call center audio: 8kHz, background noise, multiple speakers.

8kHz audio support

Multi-speaker handling

Call center grade
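Before sending telephony recordings, it can help to confirm what you actually have on disk. The sketch below reads the sample rate, channel count, and duration of a WAV file with Python's standard `wave` module. The in-memory silent file is just a stand-in for a real call recording.

```python
import io
import wave


def wav_info(source):
    """Return (sample_rate_hz, channels, duration_seconds) for a WAV path or file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnchannels(), wav.getnframes() / rate


def make_silent_wav(rate=8000, seconds=2):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buffer.seek(0)
    return buffer


rate, channels, duration = wav_info(make_silent_wav())
print(f"{rate} Hz, {channels} channel(s), {duration:.1f}s")
```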

Noisy Audio

Handle noisy audio

Background noise, cross-talk, poor connections. Our models maintain accuracy even in challenging acoustic conditions.

Noise robust

Cross-talk handling

Poor connection tolerant

Developer-first platform

Drop-in SDKs for Python and Node.js. Go from zero to first transcription in under 5 minutes.

REST & WebSocket APIs

Standard REST for batch transcription, WebSocket for real-time streaming with sub-150ms time to first token.
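Streaming clients typically send audio in small fixed-duration frames rather than whole files. The sketch below shows only the client-side slicing of raw PCM. The 100 ms frame size is an illustrative assumption, and the frame size and message format the WebSocket endpoint expects are not specified on this page.

```python
def pcm_frames(pcm: bytes, sample_rate: int = 8000, sample_width: int = 2, frame_ms: int = 100):
    """Yield consecutive fixed-duration slices of mono PCM audio; the last may be shorter."""
    frame_bytes = sample_rate * sample_width * frame_ms // 1000
    for offset in range(0, len(pcm), frame_bytes):
        yield pcm[offset:offset + frame_bytes]


one_second = b"\x00\x00" * 8000  # 1 s of 16-bit mono silence at 8 kHz
frames = list(pcm_frames(one_second))
print(len(frames), len(frames[0]))
```

Smaller frames reduce the delay before the first audio reaches the server, at the cost of more messages per second.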

SDKs & libraries

Official Python and Node.js SDKs with TypeScript support. pip install sarvamai.

Streaming modes

Choose Accurate, Balanced, or Fast modes depending on your latency vs. accuracy needs.

Free tier included

Start building immediately. No credit card, no sales call, no minimum commitment.

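from sarvamai import SarvamAI

# Create a client with your API subscription key.
client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Transcribe a local Hindi audio file with the Saaras v3 model.
response = client.speech_to_text.transcribe(
    file_path="audio.wav",
    language="hi-IN",
    model="saaras:v3"
)

# Full transcript as a single string.
print(response.transcript)

# Word-level timestamps: start time in seconds, then the word.
for word in response.words:
    print(f"[{word.start:.2f}s] {word.text}")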

Battle-tested at scale

Saaras v3 runs in production across call centers, voice agents, and live applications.

<250ms median latency
100M+ minutes transcribed
>99.5% uptime
22 Indian languages

Works with your stack

Plug Sarvam ASR into LiveKit, Pipecat, n8n, and more. Pre-built integrations, ready to go.

Enterprise-ready. Data stays in India.

Compliance, control, and data sovereignty. Not bolted on. Built in from day one.

No training on your data

Your API inputs are never used for model training. Zero data retention after processing unless you explicitly request it.

  • Data deleted after processing by default
  • Opt-in retention with configurable TTL
  • Separate data and model training pipelines
  • Full DPDP compliance

Deploy on your terms

All processing happens within India. No cross-border transfers. For regulated workloads, we support VPC and on-premise deployment.

  • India-only data processing
  • VPC and on-premise options
  • Consent-based voice cloning
  • Content safety filters built in

Security and governance

Every API call is logged and traceable. Role-based access, audit trails, and data residency controls built into the platform.

  • SOC 2 Type II
  • ISO 27001
  • DPDP compliant
  • Role-based access
  • Full audit trail
  • Data residency controls

Simple, transparent pricing

Start free. Scale as you grow. No hidden costs.

Base plan

₹1.5 per minute
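For budgeting, the base rate translates into a simple calculation. The sketch below assumes usage is prorated by the second, which is an assumption; this page does not state the billing granularity, and volume discounts are not modeled.

```python
from decimal import Decimal

RATE_PER_MINUTE = Decimal("1.5")  # INR, base plan rate quoted above


def estimate_cost(total_seconds: int) -> Decimal:
    """Estimated charge in INR, assuming per-second proration (an assumption)."""
    return (Decimal(total_seconds) / 60 * RATE_PER_MINUTE).quantize(Decimal("0.01"))


print(estimate_cost(10_000 * 60))  # estimate for 10,000 minutes of audio
```

`Decimal` is used instead of floats so currency amounts round predictably.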

Free trial included

No credit card required. Get API keys instantly.

  • Volume discounts available
  • Enterprise pricing available
  • Flexible pricing plans
  • Usage analytics
  • Integration with APIs
  • Best for startups

Start building with India's best speech recognition. Get API keys in 30 seconds.