
June 16, 2026

Gnani.ai Launches Prisma v2.5

The most accurate speech-to-text model for India. 15% lower WER for rural Hindi. 18% lower WER across Dravidian languages. Available now via API.

8 / 9 languages ranked #1 (real-world & noisy)
14M hours of proprietary Indic speech training data
27 / 49 benchmark evaluations ranked #1 overall
The Context

Most ASR models were not built for India.

Indian enterprise calls happen over GSM lines compressed to 8kHz. On factory floors and in bank branches where ambient noise is not an exception but a constant. In sentences that shift between Hindi and English mid-clause. In accents from Lucknow, Coimbatore, Patna, and Surat that no studio corpus has attempted to capture.

The consequence of this mismatch is not a higher error rate. It is operational failure. Consider a single sentence from a loan origination call.

Real-world transcription error: what 10% WER costs in production
Prisma v2.5: mereko loan chahiye thees karod rupaye ka. Kab tak milega? (WER 0%)
Competitor: mereko loan chahiye theen karod rupaye ka. Kab tak milega? (WER 10%)
One word substitution. In a loan origination call, that error turns a request for thirty crore rupees ("thees") into one for three crore ("theen"). At 30 million calls per day, error rates are not a quality metric. They are a huge business risk.
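The 10% figure follows from the standard definition of word error rate: word-level edit distance divided by the number of reference words. The sketch below reproduces it for the sentence above. The function names `edit_distance` and `wer`, and the normalisation used (lowercasing, stripping punctuation), are assumptions for illustration; the benchmark's actual text normalisation is not described in this article.

```python
import string


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    # Assumed normalisation: lowercase and drop ASCII punctuation.
    strip = str.maketrans("", "", string.punctuation)
    ref_words = reference.lower().translate(strip).split()
    hyp_words = hypothesis.lower().translate(strip).split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


reference = "mereko loan chahiye thees karod rupaye ka. Kab tak milega?"
competitor = "mereko loan chahiye theen karod rupaye ka. Kab tak milega?"

print(f"Competitor WER: {wer(reference, competitor):.0%}")
```

The reference has ten words and the competitor transcript differs in one of them, so the ratio is 1/10. WER counts words, not consequences: it scores this error the same as a misheard filler word.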
The Foundation

14 million hours. 14x more than any competitor.

Gnani Prisma v2.5 is trained on 14 million hours of proprietary Indic speech — 14x more than any competing model. The corpus spans 12 languages with real dialect variation, ambient noise, and natural code-switching baked into the training distribution. This is a significant milestone towards ensuring pan-India coverage with Sovereign AI models.

14M hours of proprietary Indic speech data
14x more Indic data than any competitor
12 Indian languages with real dialect variation
“Most ASR models are built for ideal studio conditions. Indian calls happen over compressed network lines, in at least two languages inside a single sentence, in accents no studio corpus has ever captured. Gnani Prisma v2.5 is built for that reality.”
“Consider a single sentence: ‘mereko loan chahiye thees karod rupaye ka. Kab tak milega.’ A model that mishears ‘thees’ as ‘theen’ produces a 10% WER on that utterance. In a loan origination call, that single error misrepresents the amount by thirty crore rupees. At 30 million calls per day, error rates are not a quality metric. They are a huge business risk.”
“CODEC handling for GSM and VoIP is native. Code-switching across Hindi-English, Tamil-English, and regional-English pairs works at the word level without language tagging. Architectural improvements through post-training optimization have doubled throughput over the previous version without accuracy loss.”
Full Benchmark Results

Interactive benchmark explorer

All results from independent evaluations on publicly available datasets. Same audio input evaluated across all four models. Select any language, dataset, and metric.

Figure: Prisma v2.5 Benchmark Explorer (interactive; selectable by language, dataset, and metric).
9 languages  ·  8 datasets  ·  4 metrics  ·  vs ElevenLabs Scribe v2 · Microsoft Azure STT · Sarvam Saaras v3
Overall Performance

27 wins out of 49 evaluations.

Across all 9 languages and all Standard WER datasets, Prisma v2.5 ranked #1 in 27 of 49 benchmark evaluations — more than ElevenLabs, Microsoft, and Sarvam combined.

Figure: Win distribution across 49 evaluations.
Figure: Model comparison. 5-axis normalised performance radar. Higher = better. Best performer on each axis = 1.0.
Table: Head-to-head model comparison matrix. Prisma v2.5 vs each competitor, Standard WER, all 9 languages. Columns: Matchup, Prisma wins, Avg WER delta, Best language, Worst language.
Complete Benchmark Heatmaps

Every number, all at once.

All 9 languages × all datasets × all 4 models. Winner highlighted per cell. Hover any cell for the full four-way comparison.

Figure: Full benchmark heatmap. Winner highlighted per cell. Legend: Prisma leads, ElevenLabs leads, Microsoft leads, Sarvam leads.
Acoustic Robustness

When audio degrades, competing models collapse.

Average WER degradation moving from clean (kathbath) to noisy (kathbath_noisy) conditions across all 9 languages. The gap is structural, not marginal.

Figure: Main slope chart. Average WER, clean vs noisy, 9 languages (Prisma v2.5, ElevenLabs Scribe v2, Microsoft Azure STT, Sarvam Saaras v3).
Prisma degradation: +3.6%
Sarvam degradation: +2.6%
Microsoft degradation: +7.8%
ElevenLabs degradation: +12.2%
Figure: Small multiples. Per-language noise degradation, all 9 languages. Each panel: clean WER (left) to noisy WER (right), all 4 models.
Dataset Analysis

Not all benchmarks are equal.

Average WER across all 4 models, ranked hardest to easiest. Context for why Gramvaani results matter far more than IndicTTS when evaluating production readiness.

Figure: Difficulty ranking. Average WER across all models per dataset. Higher = harder. Gramvaani and kathbath_noisy are most production-relevant.
LLM Correction

Prisma starts clean. The gap only widens.

Standard WER vs LLM WER for each model across 9 languages on kathbath. Prisma starts so clean that its lead increases post-correction on most languages.

Figure: LLM correction lift. Standard (raw) WER shown faded vs LLM-corrected WER shown solid, per language, on the Kathbath dataset. Lower = better. Models: Prisma v2.5, ElevenLabs, Microsoft, Sarvam.
Dravidian Language Leadership

South Indian languages require a different metric.

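Tamil, Telugu, Kannada, and Malayalam are agglutinative: a single long word can carry what other languages spread over several words, so one wrong character counts as a full word error. WER is structurally broken for these languages (NAACL 2025; Jain & Bhowmick, Springer 2025). On CER, Prisma v2.5 leads across all four South Indian languages in both clean and noisy conditions.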

⚠  WER inflates error rates for agglutinative Dravidian languages. CER is methodologically correct.
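To make the difference between the two metrics concrete, the sketch below scores the same one-character slip at word level and at character level. The two strings are illustrative romanised tokens written for this example, not benchmark data, and the helper names `edit_distance` and `error_rate` are assumptions. Counting spaces as characters is also an assumption; real evaluations on native scripts may normalise text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalised by reference length, for any unit (words or characters)."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


# Illustrative (hypothetical) pair: the hypothesis drops one character
# from a long agglutinated word.
reference = "naan veettukkuppoganum"
hypothesis = "naan veettukkupoganum"

word_error_rate = error_rate(reference.split(), hypothesis.split())
char_error_rate = error_rate(list(reference), list(hypothesis))

print(f"WER: {word_error_rate:.1%}")
print(f"CER: {char_error_rate:.1%}")
```

At word level the slip marks one of two words as wrong. At character level it is one deletion out of 22 reference characters. The longer the words, the wider this gap becomes, which is the argument for reporting CER on these languages.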
Real India Coverage

The Hindi that 500 million people actually speak.

Gramvaani captures spontaneous, conversational Hindi from semi-urban and rural speakers — the only benchmark that approximates how Hindi is spoken by the majority of Hindi speakers in India. On this benchmark, the performance gap is the widest in the entire dataset.

Gramvaani WER:
Prisma v2.5: 22.0%
ElevenLabs: 27.5% (5.5pp worse)
Microsoft: 26.4% (4.4pp worse)

On standard Hindi (kathbath), Prisma v2.5 achieves 7.6% WER — the lowest of any model tested. Under noise it holds at 7.9%, a degradation of just 0.3pp where competitors lose 1–3pp.
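The figures in this section are percentage-point (pp) differences, which are not the same as relative reductions. The sketch below computes both from the WER values quoted here. The helper names `pp_gap` and `relative_reduction` are assumptions for illustration, and the rounding is chosen only to keep floating-point noise out of the printout.

```python
def pp_gap(higher_wer, lower_wer):
    """Absolute difference between two WER values, in percentage points."""
    return round(higher_wer - lower_wer, 2)


def relative_reduction(baseline_wer, model_wer):
    """WER reduction relative to a baseline, as a percentage of the baseline."""
    return round(100 * (baseline_wer - model_wer) / baseline_wer, 1)


# Gramvaani WER values quoted in this section (percent).
gramvaani = {"Prisma v2.5": 22.0, "ElevenLabs": 27.5, "Microsoft": 26.4}
prisma = gramvaani["Prisma v2.5"]

for name, wer_value in gramvaani.items():
    if name == "Prisma v2.5":
        continue
    print(f"{name}: {pp_gap(wer_value, prisma)}pp worse, "
          f"Prisma is {relative_reduction(wer_value, prisma)}% lower in relative terms")

# Noise degradation for Hindi on kathbath: noisy WER minus clean WER.
hindi_degradation = pp_gap(7.9, 7.6)
print(f"Hindi noise degradation: {hindi_degradation}pp")
```

A 5.5pp gap against a 27.5% baseline is a larger relative improvement than the raw number suggests, so it helps to state which of the two is meant whenever a "lower WER" figure is quoted.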

Business Impact

What the WER gap costs at your call volume.

Enter your daily call volume and per-event cost assumptions. The calculator uses actual WER deltas from benchmark data to estimate misrouted calls and escalations generated by each competitor versus Prisma v2.5.

At-Scale Impact Calculator: daily cost of competitor WER gap
Based on Gramvaani WER differentials; assumptions are editable.
Inputs: daily call volume (default 5M), WER threshold for misroute (default 10%), cost per misrouted call (₹), cost per agent escalation (₹).
Model Capabilities

What else Prisma v2.5 improves.

Beyond aggregate WER, the v2.5 release addresses specific transcription failure modes that generate downstream errors in enterprise workflows.

01. Native CODEC handling
GSM and VoIP compression handled at the model level. Native to the architecture, not a pre-processing step.
02. Word-level code-switching
Hindi-English, Tamil-English, regional-English pairs resolved at the word level without language tagging or pre-detection.
03. Numerals & alphanumerics
Spoken numbers, policy IDs, account references, and mixed alphanumeric strings. The category most likely to cause CRM and compliance errors.
04. Named entity accuracy
Person names, company names, and locations in Indian languages with significantly improved recognition in BFSI and insurance vocabulary.
05. Short utterance recognition
“Haan”, “nahi”, “theek hai”: confirmations, negations, and short commands with higher reliability, reducing IVR failure rates.
06. 2× throughput improvement
Doubled inference throughput over v2.0 with no accuracy regression. High-concurrency deployments require no additional infrastructure scaling.
What this means by deployment type
BFSI & Collections: Lower WER on loan amounts, account numbers, and EMI figures means fewer misrouted cases and fewer compliance flags from transcription errors.
Insurance: Named entity accuracy on policyholder names and policy IDs reduces manual correction rates in post-call CRM workflows.
Telecom IVR: Short utterance recognition and code-switching performance directly reduce containment failure rates on Hindi-English mixed calls.
Rural & Tier-2 markets: Gramvaani-grade accuracy means Prisma v2.5 is the first model that can reliably serve spoken Hindi at the edges of the Hindi belt.
Per-language results

Win-loss scorecard, all 9 languages.

Kathbath and kathbath_noisy — the two most production-representative datasets — side by side. Green = Prisma v2.5 ranked #1.

Complete results

Full benchmark table.

Every language, every dataset, all four models. Filter by language or metric. Click any column header to sort. Prisma v2.5 wins highlighted in orange.

Transparency

Methodology and dataset notes.

All benchmark results use publicly available evaluation datasets computed against the same audio inputs for all four models. No post-processing specific to any one model was applied.

Datasets used in this evaluation:

Kathbath (AI4Bharat): real-world read speech
Kathbath Noisy: Kathbath with added acoustic noise
Gramvaani: conversational rural Hindi
CommonVoice: Mozilla crowdsourced
FLEURS (Google): few-shot eval
IndicTTS: studio TTS read speech
MUCS 2021: code-switching corpus
IITM Eval: IIT Madras eval set

CER is reported as the primary metric for Tamil, Telugu, Kannada, and Malayalam per NAACL 2025 recommendations. WER reported for all languages for completeness. English benchmarks excluded. IndicTTS results represent studio-synthesised speech and should not be used as the primary production performance indicator.

Citations

References

  1. Kathbath, AI4Bharat. “Towards Building ASR Systems for the Next Billion Users.” AAAI 2022. arxiv.org/abs/2111.03945
  2. “Advocating Character Error Rate for Multilingual ASR Evaluation.” NAACL 2025. arxiv.org/abs/2410.07400
  3. Jain & Bhowmick. “Evaluating ASR for Indic Languages: WER vs CER.” Springer Nature, Feb 2025.
  4. SCRIBE. “WER is Structurally Broken for Agglutinative Indic Languages.” arxiv.org/abs/2605.20712
  5. Gramvaani: spontaneous Hindi speech from rural and semi-urban speakers.
  6. FLEURS: Few-Shot Learning Evaluation of Universal Representations of Speech. Google Research, 2022.
  7. MUCS 2021: Multilingual and Code-Switching ASR Challenges. Interspeech 2021.
Gnani.ai Research Lab