June 16, 2026
The most accurate speech-to-text model for India. 15% lower WER for rural Hindi. 18% lower WER across Dravidian languages. Available now via API.
Indian enterprise calls happen over GSM lines compressed to 8 kHz. On factory floors and in bank branches where ambient noise is not an exception but a constant. In sentences that shift between Hindi and English mid-clause. In accents from Lucknow, Coimbatore, Patna, and Surat that no studio corpus has attempted to capture.
The consequence of this mismatch is not a higher error rate. It is operational failure. Consider a single sentence from a loan origination call.
Gnani Prisma v2.5 is trained on 14 million hours of proprietary Indic speech — 14x more than any competing model. The corpus spans 12 languages with real dialect variation, ambient noise, and natural code-switching baked into the training distribution. This is a significant milestone towards ensuring pan-India coverage with Sovereign AI models.
All results are from independent evaluations on publicly available datasets, with the same audio input evaluated across all four models.
Across all 9 languages and all Standard WER datasets, Prisma v2.5 ranked #1 in 27 of 49 benchmark evaluations — more than ElevenLabs, Microsoft, and Sarvam combined.
| Matchup | Prisma wins | Avg WER delta | Best language | Worst language |
|---|---|---|---|---|
All 9 languages × all datasets × all 4 models. Winner highlighted per cell.
Average WER degradation moving from clean (kathbath) to noisy (kathbath_noisy) conditions across all 9 languages. The gap is structural, not marginal.
Average WER across all 4 models, with datasets ranked from hardest to easiest. This is the context for why Gramvaani results matter far more than IndicTTS results when evaluating production readiness.
Standard WER vs LLM WER for each model across 9 languages on kathbath. Prisma's raw transcripts are clean enough that its lead widens after LLM correction on most languages.
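Tamil, Telugu, Kannada, and Malayalam are agglutinative: a single long word can carry what other languages spread across several words, so one wrong character marks the whole word as an error. WER is structurally broken for these languages (NAACL 2025, Jain & Bhowmick, Springer 2025). On CER, Prisma v2.5 leads across all four South Indian languages in both clean and noisy conditions.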
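To make the WER versus CER distinction concrete, here is a minimal sketch that computes both metrics with a plain Levenshtein distance. It is not the evaluation code behind these benchmarks. The function names are illustrative, and the example strings are made-up romanized placeholders rather than real transcripts.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces.

    Assumption: characters are compared as Python code points. For native
    Indic scripts a production scorer may prefer grapheme clusters.
    """
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# Placeholder strings: one long agglutinated word with a single dropped letter.
reference = "veettukkuppogiren naalai"
hypothesis = "veettukkupogiren naalai"

print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

In this toy case a single dropped character costs half the words under WER but only one character in 23 under CER, which is the effect that makes CER the more informative metric for long agglutinated word forms.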
Gramvaani captures spontaneous, conversational Hindi from semi-urban and rural speakers — the only benchmark that approximates how Hindi is spoken by the majority of Hindi speakers in India. On this benchmark, the performance gap is the widest in the entire dataset.
On standard Hindi (kathbath), Prisma v2.5 achieves 7.6% WER — the lowest of any model tested. Under noise it holds at 7.9%, a degradation of just 0.3pp where competitors lose 1–3pp.
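The headline claims are relative reductions (percent), while the noise figure here is an absolute change in percentage points (pp). A small sketch of the arithmetic, using only the two Hindi WER values quoted above and illustrative function names:

```python
def degradation_pp(clean_wer_pct, noisy_wer_pct):
    """Absolute WER degradation in percentage points."""
    return noisy_wer_pct - clean_wer_pct


def degradation_relative(clean_wer_pct, noisy_wer_pct):
    """Degradation expressed as a percentage of the clean WER."""
    return 100.0 * (noisy_wer_pct - clean_wer_pct) / clean_wer_pct


# Values quoted in the text for Prisma v2.5 on kathbath and kathbath_noisy.
clean, noisy = 7.6, 7.9

print(f"{degradation_pp(clean, noisy):.1f} pp")       # 0.3 pp
print(f"{degradation_relative(clean, noisy):.1f} %")  # 3.9 %
```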
Enter your daily call volume and per-event cost assumptions. The calculator uses actual WER deltas from benchmark data to estimate misrouted calls and escalations generated by each competitor versus Prisma v2.5.
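The page does not state the calculator's exact formula. As a rough sketch, one plausible linear model treats the WER gap as the extra chance that a critical word (an amount, a name, an intent keyword) is wrong, and counts each such miss as one misrouted call or escalation. Every name, parameter, and input below is a hypothetical placeholder, not a benchmark figure.

```python
def extra_errors_per_day(daily_calls, competitor_wer_pct, prisma_wer_pct,
                         critical_words_per_call=1):
    """Estimate additional error events per day attributable to a WER gap.

    Hypothetical model: the expected number of extra failed critical words
    scales linearly with the WER delta, and each failure produces one
    misrouted call or escalation. Real call flows will be less tidy.
    """
    delta = max(competitor_wer_pct - prisma_wer_pct, 0.0) / 100.0
    return daily_calls * critical_words_per_call * delta


def extra_cost_per_day(daily_calls, competitor_wer_pct, prisma_wer_pct,
                       cost_per_event, critical_words_per_call=1):
    """Convert the extra error events into a daily cost estimate."""
    events = extra_errors_per_day(daily_calls, competitor_wer_pct,
                                  prisma_wer_pct, critical_words_per_call)
    return events * cost_per_event


# Placeholder inputs chosen for easy arithmetic, not measured results.
events = extra_errors_per_day(10_000, competitor_wer_pct=10.0, prisma_wer_pct=8.0)
cost = extra_cost_per_day(10_000, 10.0, 8.0, cost_per_event=50.0)
print(f"{events:.0f} extra events/day, {cost:.0f} cost units/day")
```

A linear model keeps the estimate easy to audit. It likely overstates the impact for calls where a single wrong word does not change routing, so the per-event cost is best set conservatively.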
Beyond aggregate WER, the v2.5 release addresses specific transcription failure modes that generate downstream errors in enterprise workflows.
Kathbath and kathbath_noisy — the two most production-representative datasets — side by side. Green = Prisma v2.5 ranked #1.
Every language, every dataset, all four models. Prisma v2.5 wins highlighted in orange.
All benchmark results are computed on publicly available evaluation datasets, using the same audio inputs for all four models. No post-processing specific to any one model was applied.
Datasets used in this evaluation:
CER is reported as the primary metric for Tamil, Telugu, Kannada, and Malayalam per NAACL 2025 recommendations. WER is reported for all languages for completeness. English benchmarks are excluded. IndicTTS results represent studio-synthesised speech and should not be used as the primary production performance indicator.