Question 1

What languages does Saaras V3 support?

Accepted Answer

Saaras V3 supports all 22 scheduled Indian languages plus English within a single multilingual model. These are Hindi, Bengali, Tamil, Telugu, Gujarati, Kannada, Malayalam, Marathi, Punjabi, Odia, Assamese, Konkani, Maithili, Dogri, Santali, Kashmiri, Sindhi, Nepali, Manipuri, Bodo, Sanskrit, and Urdu, along with Indian-accented English. It also natively handles code-switching, where a speaker mixes two languages within the same utterance.
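For client code that lets users pick a language, a lookup table like the following can reject unsupported choices before a request is sent. This is a minimal sketch: the ISO 639 codes are an assumption for illustration, and the identifiers the Saaras V3 API actually expects may differ. `validate_language` is a hypothetical helper. The table spells out the 22 scheduled languages plus English, which is where the count of 23 comes from.

```python
# Hypothetical lookup table. The ISO 639 codes are illustrative only;
# the language identifiers Saaras V3 accepts may be different.
SCHEDULED_LANGUAGES = {
    "Assamese": "as", "Bengali": "bn", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "gu", "Hindi": "hi", "Kannada": "kn", "Kashmiri": "ks",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "ml", "Manipuri": "mni",
    "Marathi": "mr", "Nepali": "ne", "Odia": "or", "Punjabi": "pa",
    "Sanskrit": "sa", "Santali": "sat", "Sindhi": "sd", "Tamil": "ta",
    "Telugu": "te", "Urdu": "ur",
}

# English is supported in addition to the 22 scheduled languages.
SUPPORTED_LANGUAGES = {**SCHEDULED_LANGUAGES, "English": "en"}


def validate_language(name: str) -> str:
    """Return the assumed code for a language name, or raise ValueError."""
    try:
        return SUPPORTED_LANGUAGES[name]
    except KeyError:
        raise ValueError(f"unsupported language: {name!r}") from None
```

For example, `validate_language("Hindi")` returns `"hi"`, while an unknown name raises `ValueError` before any network call is made.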

Question 2

What are the streaming latency modes?

Accepted Answer

Saaras V3 offers three configurable streaming modes: Realtime Fast delivers sub-150 ms time to first token for voice agents; Realtime Balanced provides strong accuracy at reduced latency; and Realtime Accurate optimizes for transcription fidelity. A simulated streaming mode, which combines voice activity detection (VAD) with batch transcription, is also available today through the WebSocket API.
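The simulated streaming mode pairs voice activity detection with batch transcription. One way to picture it: cut incoming audio into short frames, group voiced frames into utterances, and send each completed utterance to a batch endpoint. The sketch below uses a simple energy (RMS) threshold as a stand-in VAD, which is not necessarily what Saaras uses internally, and assumes 16 kHz, 16-bit mono little-endian PCM. `transcribe_batch` is a hypothetical callable standing in for the real API, and the thresholds are illustrative values, not Saaras defaults.

```python
import array
import math
from typing import Callable, Iterable, Iterator, List

# Assumed input format: 16 kHz, 16-bit signed, mono PCM (little-endian).
SAMPLE_RATE_HZ = 16000
FRAME_MS = 20
FRAME_BYTES = SAMPLE_RATE_HZ * 2 * FRAME_MS // 1000  # 640 bytes per 20 ms frame

# Illustrative tuning values, not Saaras defaults.
ENERGY_THRESHOLD = 500.0  # RMS level at or above which a frame counts as speech
MAX_SILENCE_FRAMES = 15   # about 300 ms of silence closes an utterance


def split_frames(pcm: bytes) -> List[bytes]:
    """Cut raw PCM into fixed-size frames, dropping any incomplete tail."""
    usable = len(pcm) - len(pcm) % FRAME_BYTES
    return [pcm[i:i + FRAME_BYTES] for i in range(0, usable, FRAME_BYTES)]


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of one frame (assumes a little-endian host)."""
    samples = array.array("h", frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def segment_utterances(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Energy-based VAD: group voiced frames, plus short pauses, into utterances."""
    buffer: List[bytes] = []
    silence = 0
    for frame in frames:
        if frame_rms(frame) >= ENERGY_THRESHOLD:
            buffer.append(frame)
            silence = 0
        elif buffer:
            buffer.append(frame)  # keep brief pauses inside the utterance
            silence += 1
            if silence >= MAX_SILENCE_FRAMES:
                yield b"".join(buffer)
                buffer, silence = [], 0
    if buffer:
        yield b"".join(buffer)  # flush whatever is left at end of stream


def simulated_stream(
    frames: Iterable[bytes], transcribe_batch: Callable[[bytes], str]
) -> Iterator[str]:
    """Send each detected utterance to a hypothetical batch transcription callable."""
    for utterance in segment_utterances(frames):
        yield transcribe_batch(utterance)
```

The trade-off this illustrates is why the mode is called simulated: nothing is returned until an utterance ends, so latency depends on the silence window rather than on token-by-token decoding.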

Question 3

Can it handle noisy audio and telephony?

Accepted Answer

Yes. Saaras V3 is trained on more than 1 million hours of curated multilingual audio spanning diverse acoustic conditions, accents, and recording qualities. It maintains high accuracy on telephony audio (8 kHz), field recordings, audio with ambient noise or crosstalk, and mobile recordings.

Question 4

What audio formats and sample rates are supported?

Accepted Answer

The STT API accepts WAV, MP3, FLAC, OGG, WebM, and raw PCM audio. Supported sample rates include 8 kHz (telephony), 16 kHz, 22.05 kHz, 44.1 kHz, and 48 kHz. Audio is automatically resampled when needed.
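Before uploading or streaming, a client may want to read a WAV file's header and work out how many bytes make up one streaming chunk. This sketch uses only Python's standard-library `wave` module. `SUPPORTED_RATES_HZ` mirrors the rates listed in this answer, while `inspect_wav` and `chunk_bytes` are hypothetical helpers, not part of any Saaras SDK. Because the service resamples automatically, the rate check here is informational only.

```python
import io
import wave

# Sample rates listed as supported in this FAQ answer.
SUPPORTED_RATES_HZ = {8000, 16000, 22050, 44100, 48000}


def inspect_wav(data: bytes) -> dict:
    """Read basic WAV header fields with the standard-library wave module."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": wav.getnframes() / rate,
            "listed_rate": rate in SUPPORTED_RATES_HZ,  # informational only
        }


def chunk_bytes(sample_rate: int, sample_width: int, channels: int, chunk_ms: int) -> int:
    """Size in bytes of one chunk of raw PCM lasting chunk_ms milliseconds."""
    return sample_rate * sample_width * channels * chunk_ms // 1000
```

As a worked example, 20 ms of 16 kHz, 16-bit mono PCM is 16000 × 2 × 1 × 20 / 1000 = 640 bytes; the same chunk of 8 kHz telephony audio is 320 bytes.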

Question 5

What structured output features are available?

Accepted Answer

Saaras V3 provides word-level timestamps, speaker diarization with stable attribution across conversational turns, automatic language detection, and fine-grained output format control for text and numeral formatting. These features work in both batch and streaming modes.
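Word-level timestamps combined with speaker labels make it straightforward to rebuild conversational turns. The sketch below assumes a hypothetical response shape in which each word carries its text, start and end times in seconds, and a speaker label; the real Saaras V3 output schema may use different field names. Consecutive words from the same speaker are merged into one turn.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One word from a hypothetical diarized, timestamped transcript."""
    text: str
    start_s: float
    end_s: float
    speaker: str


@dataclass
class Turn:
    """A run of consecutive words attributed to the same speaker."""
    speaker: str
    start_s: float
    end_s: float
    text: str


def words_to_turns(words: List[Word]) -> List[Turn]:
    """Merge time-ordered words into speaker turns."""
    turns: List[Turn] = []
    for w in sorted(words, key=lambda w: w.start_s):
        if turns and turns[-1].speaker == w.speaker:
            last = turns[-1]
            last.end_s = w.end_s
            last.text += " " + w.text
        else:
            turns.append(Turn(w.speaker, w.start_s, w.end_s, w.text))
    return turns
```

Sorting by start time first keeps the grouping correct even if words arrive out of order, as can happen when streaming partial results.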

India's most accurate Speech-to-Text API

Built for real workloads, not demos

Streaming-first architecture

Code-switching & noise robust

Drop-in SDKs

23 Indian languages

Beyond raw transcripts

Same audio, multiple formats

Transcribe

Translate

Transliteration

Verbatim

Powering real-world audio experiences

Seamless code-mixing

Telephony-optimized

Handle noisy audio

Developer-first platform

Enterprise-ready. Responsible AI.

SOC 2 Type II & ISO 27001

Data sovereignty

No training on your data

PII redaction

Content safety filters

Audit-ready logging

Simple, transparent pricing
