Now supporting 22 Indian languages and structured long-form text
Download the model from Hugging Face, try it on our playground, and build with our APIs.
Making content available across languages is a key consideration for digital accessibility. While translation systems have continued to evolve, work remains in three broad directions: one, support for more languages; two, more natural translation of stylised (such as idiomatic) long-form text; and three, support for structured text in different formats. These formats vary with the source: a math textbook with equations, a web page with HTML markup around the content, or the output of digitising an image, with potential OCR-related errors.
Although multilingual large language models have demonstrated the ability to do long-form translation, their performance on Indian languages still trails behind. At Sarvam, we’re working to address this by focusing on the 22 Scheduled Indian languages, with an emphasis on more natural translation while supporting long-form structured content.
We are happy to share a significant step forward on this front with our latest model, Sarvam-Translate. Trained by fine-tuning Gemma3-4B-IT, Sarvam-Translate supports 22 Indian languages: Hindi, Bengali, Marathi, Telugu, Tamil, Gujarati, Urdu, Kannada, Odia, Malayalam, Punjabi, Assamese, Maithili, Santali, Kashmiri, Nepali, Sindhi, Dogri, Konkani, Manipuri (Meitei), Bodo, and Sanskrit. It supports paragraph-level translation for all 22 languages, and translation of diverse structured content for 15 of them.
In human evaluation by language experts, Sarvam-Translate was judged significantly better than much larger models such as Gemma3-27B-IT, Llama4 Scout, and Llama-3.1-405B-FP8. Further, in automated evaluation of its ability to preserve structure in long-form content, Sarvam-Translate shows high accuracy (> 4.9 out of 5) for 15 languages.
Sarvam-Translate is available to try out and use in your applications via our API store. Further, to enable others to use and build upon the model, we are releasing an open-weights model on Hugging Face. This continues our commitment to building in the open to enable a sovereign AI ecosystem for India.
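To make this concrete, here is a minimal sketch of what loading the open weights with the Hugging Face transformers library could look like. The repository id, model class, and prompt wording are illustrative assumptions; the model card on Hugging Face documents the exact usage.

```python
# A minimal sketch, assuming the model follows the standard Gemma3 chat
# format. The repository id and instruction wording are assumptions;
# consult the Hugging Face model card for the exact usage. For the
# multimodal Gemma3 checkpoints, the Auto class may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sarvamai/sarvam-translate"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "Translate the text below to Hindi."},
    {"role": "user", "content": "Be the change you wish to see in the world."},
]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```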
Below is a feature matrix for each language supported by Sarvam-Translate.
Example Use-Cases of Sarvam-Translate
Below, we share examples to highlight the different ways in which Sarvam-Translate could be used.
Translating web pages without breaking the HTML structure
Translating web content is one of the central use-cases of translation models. However, extracting, translating, and reinserting text is often error-prone and tedious. Sarvam-Translate streamlines this by translating only the visible textual content, preserving all HTML tags and structure. As seen in the image below, elements such as emphasis are maintained in the text.
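As a downstream sanity check on this behaviour, one could compare the tag sequences of the source and translated pages. The sketch below is an illustrative assumption using only the Python standard library, not part of the product; note it is strict and would also flag legitimately translated attribute values such as alt text.

```python
# A minimal sketch: verify a translated page kept its markup by
# comparing tag sequences of source and output HTML.
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []
    def handle_starttag(self, tag, attrs):
        self.tags.append(("start", tag, tuple(attrs)))
    def handle_endtag(self, tag):
        self.tags.append(("end", tag))

def tag_sequence(html: str):
    collector = TagCollector()
    collector.feed(html)
    return collector.tags

def html_structure_preserved(source: str, translated: str) -> bool:
    # True when every tag (and its attributes) survives in order.
    return tag_sequence(source) == tag_sequence(translated)
```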
Translating LaTeX documents while preserving syntax
Academic and technical documents in LaTeX combine human-readable text with formatting and command syntax. Maintaining the integrity of LaTeX code during translation is challenging. Sarvam-Translate identifies and translates only the human-readable content while preserving LaTeX syntax and structure. In the example below, tables and formatting are retained. Also notice the model's choice to retain the author names of cited papers in English.
Translating chemistry documents
Chemistry documents often mix prose with specialised chemical notation and equations. Ensuring these remain unaltered during translation requires fine control. Sarvam-Translate accurately translates the surrounding text while preserving chemical equations and formatting. Note also that variables such as x and y are retained in Roman characters.
Translating idioms, slang and cultural references
Idiomatic expressions, figures of speech, and culturally specific phrases often lose meaning when translated literally. We find that Sarvam-Translate often produces translations that preserve the original tone, intent, and nuance. Notice in the following how several idiomatic phrases are correctly translated. For example, 'being behind the eight ball', an analogy from the game of pool meaning being in trouble, is translated appropriately in Telugu as 'జయేష్ అప్పటికే చాలా కష్టాల్లో ఉన్నాడు'.
Translating folk tales while preserving cultural nuance
Narrative text such as folk tales and fiction carries cultural depth and stylistic nuance. Capturing this while maintaining readability across languages is complex. Our model handles narrative flow, cultural expressions, and stylistic elements to produce faithful translations. For example, in Maithili the phrase "One who interferes in others' work" has been translated to "जे दोसरक काजमे टाँग अड़ाबैत अछि", which is an appropriate use of a common idiom.
Translating social media posts with slang & emojis
Social media posts frequently contain informal language, slang, emojis, and unconventional structure, posing unique translation challenges. We find that Sarvam-Translate is often able to handle such content, accurately conveying both meaning and tone. Notice in the example below both the use of emojis and the stylised content. For example, "Wut u doing rn?" is translated appropriately in Kannada as "ಏನ್ ಮಾಡ್ತಾ ಇದೀಯಾ?"
Translating subtitle files while maintaining timing and formatting
Subtitle (SRT) files require precise alignment of translated text with timing and formatting cues. Our model translates only the spoken dialogue while maintaining subtitle structure and synchronisation. This enables producing accessible SRT files with a single API call.
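A simple way to verify this downstream is to check that every timing cue survives translation verbatim and in order. The sketch below is an illustrative assumption, not part of the API.

```python
# A minimal sketch: confirm that SRT timing cues are unchanged in the
# translated output. The regex matches standard SRT timestamps.
import re

TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

def timing_cues(srt_text: str) -> list[str]:
    return TIMING.findall(srt_text)

def srt_timing_preserved(source: str, translated: str) -> bool:
    # Timing lines must survive translation verbatim and in order.
    return timing_cues(source) == timing_cues(translated)
```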
Translating documents with embedded foreign-language text
Multilingual documents often contain embedded foreign-language text that, in most situations, is expected to remain untranslated. Sarvam-Translate demonstrates the ability to identify such segments and preserve them while translating the surrounding content. In the example below, the traditional Chinese characters are preserved.
Translating legal documents with precision
Legal texts demand high precision in terminology and structure. Translating such documents requires preserving legal references, clauses, and formatting. We find that Sarvam-Translate is able to ensure accuracy and consistency suitable for professional legal use. For example, consider the complex sentence "Upon a meticulous perusal of the statutory scheme, the legislative history of the Parent Act, and the precedents cited at the Bar, this Court is disinclined to accede to the appellant's submissions". The model is able to translate this in Assamese as "বিধিবদ্ধ আঁচনি, মূল আইনৰ বিধিবদ্ধ ইতিহাস, আৰু বাৰত উল্লেখ কৰা পূৰ্ব উদাহৰণসমূহৰ এক নিখুঁত পৰ্যালোচনাৰ পিছত, এই আদালতে আপীলকাৰীৰ যুক্তিসমূহ মানি ল'বলৈ অনিচ্ছুক" which is a more natural sentence structure and correctly conveys the technical content.
Translating code files while protecting syntax
Translating code files requires distinguishing between executable code and human-readable text, such as comments or documentation. We find that Sarvam-Translate selectively translates natural-language content while leaving code unchanged. For example, the comment "Function to perform Bubble Sort" is translated in Bengali as "বাবল সর্ট করার ফাংশন", retaining the technical term 'bubble sort' rather than forcing an unnecessarily complicated translation.
Given the generic abilities of the model, we believe new use-cases can be unlocked. We encourage you to try Sarvam-Translate and share your findings with us on social media or our discord.
In the remainder of this blog, we walk through technical details of the model's evaluation and training.
Automatic Evaluation on Structured Content
To test the abilities of Sarvam-Translate, we conducted a large-scale evaluation that spans multiple languages, document styles, and content formats. For this evaluation, we used a curated dataset of articles covering a range of real-world formats and content types. The dataset included GitHub Markdown files, scanned PDFs that were digitized into Markdown using Vision-Language Models (VLMs), documents containing mathematical equations written in LaTeX, and chemistry documents featuring complex chemical notations. It also incorporated code files with embedded comments and documentation, as well as web page content extracted from HTML. We created 1,000 documents in each of these categories.
Given the scale of the evaluation, conducting human reviews for every document type and language pair would be impractical. To address this, we used Gemini 2.5 Flash to perform automatic evaluations. For each document type, we designed prompts that direct Gemini to focus on the specific aspects of translation quality that matter most for that format. For example, the evaluation criteria for math equations differ from those for code or HTML. We describe these criteria below.
Markdown Content (GitHub)
Goal: Ensure that the translated document preserves the Markdown structure including headings, bullet points, links, code blocks, and that the translated content fits naturally within the structure. The formatting should remain exactly as in the source.
Digitized Markdown (VLM / OCR output)
Goal: Evaluate how robust the translation is when the source document comes from a digitized, OCR-extracted source which may contain slight errors or inconsistencies. The structure (tables, headings) should be preserved, and the translation should handle noise gracefully without introducing additional errors.
Math Content (LaTeX equations)
Goal: Ensure that LaTeX equations are preserved exactly in the translated document, while surrounding text is translated naturally. No part of the equation syntax should be altered or corrupted. The translation must maintain the integrity of mathematical notation.
Chemistry Content (Equations, Symbols)
Goal: Validate that chemical equations, including subscripts, superscripts, arrows, and special symbols, are retained correctly. The text around the equations should be translated naturally, while chemical notations should remain untouched and precisely formatted.
Code Content (Code with Comments)
Goal: Ensure that code remains exactly the same in the translated document, and that only the comments and documentation are translated. The programming syntax must not be altered. The evaluation should also check that no unintended changes (indentation, special characters) were introduced.
HTML Content (Web pages)
Goal: Verify that HTML tags and structure are preserved exactly. The visible text content within the tags should be translated fluently, but the HTML itself should remain unchanged. If certain elements (e.g. italics, bold) are present in the original, the corresponding translation should maintain the same styling.
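Putting the criteria above together, the judging setup can be pictured as a format-aware prompt builder. The sketch below is illustrative only: the rubric wording and the 1-5 scale are paraphrased assumptions, and the resulting prompt would be sent to the evaluator model (Gemini 2.5 Flash).

```python
# An illustrative sketch of the format-aware, LLM-as-judge setup.
# Rubric texts are paraphrased assumptions, not the actual prompts.
RUBRICS = {
    "markdown": "Headings, bullets, links, and code blocks must be preserved.",
    "latex": "Equations must be unchanged; score only the surrounding prose.",
    "chemistry": "Formulae, subscripts, and arrows must remain untouched.",
    "code": "Code must be identical; score only translated comments/docs.",
    "html": "Tags and attributes must be preserved; score the visible text.",
}

def build_judge_prompt(content_type: str, source: str, translation: str) -> str:
    # Compose a single grading prompt for the evaluator model.
    return (
        f"You are grading a translation of {content_type} content.\n"
        f"Criterion: {RUBRICS[content_type]}\n"
        f"Rate the translation from 1 to 5 and justify briefly.\n\n"
        f"SOURCE:\n{source}\n\nTRANSLATION:\n{translation}"
    )
```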
Evaluation Results
The table below summarises evaluation scores (out of 5) across languages, averaged over the content types described above:
Human Evaluations
While automatic evaluations are useful at scale, human evaluations are necessary given the subjectivity of translation quality and limited abilities of frontier models in specific languages. To do this, we curated 100 English documents covering a diverse mix of content types, including technical material such as scientific, mathematical, and chemistry-based content; informal and spoken text drawn from speech transcripts and conversational blogs; structured content such as Markdown documents, HTML pages, and code snippets; and formal content like news articles and textbook excerpts.
These documents were translated using Sarvam-Translate, as well as with leading open-source LLMs, such as Gemma3-27B-IT, Llama-3.1-405B-FP8 and Llama4 Scout. The translated outputs were then evaluated by professional human annotators. The evaluators were professional language experts, each with multiple years of professional experience in translation creation and validation, and with deep proficiency in both English and their target Indian language.
The human evaluators assessed the translations on several key dimensions, including fluency, adequacy, faithfulness to the source structure, and inclusivity. They were shown two translations in random order and asked whether one was preferred or both were equally good. The results of the human evaluation are summarised in the following tables.
Gemma3-27B-IT vs Sarvam-Translate
Llama4 Scout vs Sarvam-Translate
Llama-3.1-405B-FP8 vs Sarvam-Translate
Across all Indian languages, Sarvam-Translate consistently outperformed other models, particularly in its ability to handle structured content, maintain coherence over longer contexts, and deliver inclusive and culturally sensitive translations.
How Sarvam-Translate was trained
This journey has been years in the making, built on sustained effort and deep expertise in developing Indian language technologies in the open. We invested heavily in building robust data-cleaning pipelines and sophisticated annotation workflows to ensure the highest-quality datasets. We also leveraged the Gemma 3 open models, which, in our comparisons, provided a better starting point for building Sarvam-Translate than any other model.
Data
Sarvam-Translate was trained on a rich and diverse dataset of translation pairs between English and 22 Indian languages. This dataset combines multiple sources. First, we incorporated cleaned data from past open-data efforts, including BPCC, which itself contains both mined and manually validated data. We carefully cleaned this data using robust internal pipelines. Second, we generated new translation pairs from carefully curated English source content. This spanned a wide range of domains: scientific and historical content, conversational and modern text, and structurally complex formats such as code, LaTeX, HTML, and chemistry equations. In this process, we recognised the need for very high-quality filters on the data; even large models such as Llama-3.1-405B-FP8 make many errors when generating output in Indian languages.
Training Process
We trained Sarvam-Translate on top of Gemma3-4B-IT, fine-tuning it in a two-stage process. In the first stage, we fine-tuned the full model on a larger dataset with broad coverage, including some noisier but domain-diverse data, to establish wide-ranging translation capability. This stage is also required to build basic language ability in languages the base model is not already fluent in. In the second stage, we used LoRA to fine-tune the model further on a smaller, highly curated, format-diverse dataset, paying careful attention to format preservation and style consistency. Through various ablations we found this two-stage process to be effective.
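For illustration, the second stage might look like the following sketch using the peft library. All hyperparameters (rank, scaling, target modules) and the base-model id are assumptions for illustration, not Sarvam's actual training configuration.

```python
# A minimal sketch of a LoRA second-stage fine-tune with peft.
# Hyperparameters below are illustrative assumptions; for the
# multimodal Gemma3 checkpoints, the Auto class may differ.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-4b-it")

lora_config = LoraConfig(
    r=16,                      # assumed rank
    lora_alpha=32,             # assumed scaling factor
    lora_dropout=0.05,         # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# Train on the curated, format-diverse dataset with a standard
# supervised fine-tuning loop (e.g. trl's SFTTrainer).
```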
Inference Efficiency
We quantised Sarvam-Translate using Post-Training Quantization (PTQ), leveraging a large and diverse calibration dataset to ensure robust 8-bit inference performance. The inference system is finely tuned to run on NVIDIA NIM with the TensorRT engine, utilising full FP8 kernels for improved throughput and efficiency. This optimised setup is available for use in our API store and can be tried in the dashboard.
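For intuition, the calibration step of PTQ can be pictured as running a calibration set through the model, recording per-tensor activation ranges, and deriving FP8 scales from them. The code below is a conceptual sketch only; the production pipeline uses TensorRT/NIM tooling rather than hand-rolled hooks.

```python
# A conceptual sketch of max-abs calibration for FP8 PTQ scales.
# Not the production quantization path; illustration only.
import torch

FP8_E4M3_MAX = 448.0  # largest representable value in float8 e4m3

@torch.no_grad()
def calibrate_scales(model, calibration_batches):
    amax = {}
    hooks = []

    def record(name):
        def hook(_module, _inputs, output):
            if isinstance(output, torch.Tensor):
                amax[name] = max(amax.get(name, 0.0), output.abs().max().item())
        return hook

    # Observe activations at every linear layer.
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear):
            hooks.append(module.register_forward_hook(record(name)))
    for batch in calibration_batches:
        model(batch)
    for h in hooks:
        h.remove()
    # Scale maps the observed dynamic range onto the FP8 range.
    return {name: a / FP8_E4M3_MAX for name, a in amax.items()}
```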
Known Limitations
While this model supports 22 languages across a variety of tasks, performance can vary depending on the language. These differences stem from the balance of pre-training data, post-training resources, and each language’s representation in the tokeniser. Document translation is a key capability, but performance is more limited for certain languages such as Bodo, Dogri, Kashmiri, Manipuri, Santali, Sanskrit, and Sindhi, where we have observed lower translation quality and occasional incomplete outputs.
For better-supported languages, the model performs well on most document formats. However, it has not been extensively trained on long-form LaTeX or HTML documents. As a result, it may sometimes miss tags or other structural elements in very large .tex or .html files. To maintain accuracy, we recommend splitting large code or LaTeX files into smaller sections and translating them individually, when possible.
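One minimal way to implement this workaround is sketched below. The split heuristic (\section boundaries) is an illustrative assumption; any splitter that keeps LaTeX environments intact would work, and `translate` stands in for a call to the model.

```python
# A minimal sketch of the recommended workaround: split a large LaTeX
# file at \section boundaries and translate each chunk independently.
import re

def split_latex(tex: str) -> list[str]:
    # Lookahead split keeps each \section heading with the text after it.
    parts = re.split(r"(?=\\section\{)", tex)
    return [p for p in parts if p.strip()]

def translate_large_tex(tex: str, translate) -> str:
    # `translate` is a hypothetical callable wrapping the model/API.
    return "".join(translate(chunk) for chunk in split_latex(tex))
```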
In addition, we have infrequently observed that some outputs may include transliterations or code-mixed segments, particularly in low-resource or highly inflected languages.
Conclusion
Translation technology has progressed substantially over the past few years. With Sarvam-Translate, we extend these advancements to 22 Indian languages, ensuring their representation across a wide range of content types.
What makes this achievement especially meaningful is the model’s ability to handle real-world, mixed-format documents such as Markdown, HTML, scientific notation, code, and more. The model not only preserves structure but also respects context, style, and gender nuances, ensuring that every translation feels authentic and natural.
We believe that this is a key enabler for:
• Making the web more accessible in Indian languages
• Supporting education and research in native languages
• Empowering government and public services to reach citizens in their preferred language
• Catalysing the creation of Indian language digital content at scale
This is just the beginning. As content formats continue to evolve, our mission remains the same: to ensure that Indian languages are well-represented in the digital landscape.
We remain committed to advancing this work through collaboration with the open-source community, researchers, industry, and government partners, making high-quality translation accessible for all 22 Indian languages.
We invite you to explore Sarvam-Translate, try it out, give us feedback, and join us in shaping the next frontier of Indian language technology.