Gnani.ai Launches Prisma v2.5 — Frontier Indic Speech Recognition

The Context

Most ASR models were not built for India.

Indian enterprise calls happen over GSM lines compressed to 8kHz. On factory floors and in bank branches where ambient noise is not an exception but a constant. In sentences that shift between Hindi and English mid-clause. In accents from Lucknow, Coimbatore, Patna, and Surat that no studio corpus has attempted to capture.

The consequence of this mismatch is not a higher error rate. It is operational failure. Consider a single sentence from a loan origination call.

Real-world transcription error — what 10% WER costs in production

Prisma v2.5 (WER 0%): mereko loan chahiye thees karod rupaye ka. Kab tak milega?

Competitor (WER 10%): mereko loan chahiye theen karod rupaye ka. Kab tak milega?

One word substituted out of ten. In a loan origination call, that error turns a request for thirty crore rupees ("thees") into one for three crore ("theen"). At 30 million calls per day, error rates are not a quality metric. They are a huge business risk.
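The 10% figure follows from the standard WER definition: word-level edit distance divided by the number of reference words. A minimal sketch, assuming whitespace tokenisation with punctuation left attached (the page does not say how the benchmark normalises text):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)


reference = "mereko loan chahiye thees karod rupaye ka. Kab tak milega?"
hypothesis = "mereko loan chahiye theen karod rupaye ka. Kab tak milega?"
print(f"WER = {wer(reference, hypothesis):.0%}")
```

The reference has ten words and the hypothesis differs in one, so the ratio is 1/10. The metric weights every word equally, which is why a low WER can still hide a costly error in the one word that carries the amount.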

The Foundation

14 million hours. 14x more than any competitor.

Gnani Prisma v2.5 is trained on 14 million hours of proprietary Indic speech — 14x more than any competing model. The corpus spans 12 languages with real dialect variation, ambient noise, and natural code-switching baked into the training distribution. This is a significant milestone towards ensuring pan-India coverage with Sovereign AI models.

14M hours of proprietary Indic speech data

14x more Indic data than any competitor

12 Indian languages with real dialect variation

“Most ASR models are built for ideal studio conditions. Indian calls happen over compressed network lines, in at least two languages inside a single sentence, in accents no studio corpus has ever captured. Gnani Prisma v2.5 is built for that reality.”

“Consider a single sentence: ‘mereko loan chahiye thees karod rupaye ka. Kab tak milega.’ A model that mishears ‘thees’ as ‘theen’ produces a 10% WER on that utterance. In a loan origination call, that single error misrepresents the amount by thirty crore rupees. At 30 million calls per day, error rates are not a quality metric. They are a huge business risk.”

“CODEC handling for GSM and VoIP is native. Code-switching across Hindi-English, Tamil-English, and regional-English pairs works at the word level without language tagging. Architectural improvements through post-training optimization have doubled throughput over the previous version without accuracy loss.”

Full Benchmark Results

Interactive benchmark explorer

All results come from independent evaluations on publicly available datasets, with the same audio input evaluated across all four models.

Prisma v2.5 Benchmark Explorer

9 languages · 8 datasets · 4 metrics · vs ElevenLabs Scribe v2 · Microsoft Azure STT · Sarvam Saaras v3


Overall Performance

27 wins out of 49 evaluations.

Across all 9 languages and all Standard WER datasets, Prisma v2.5 ranked #1 in 27 of 49 benchmark evaluations — more than ElevenLabs, Microsoft, and Sarvam combined.

[Chart: win distribution across 49 evaluations]

[Chart: model comparison, 5-axis normalised performance radar. Higher = better. Best performer on each axis = 1.0.]
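The page does not state how the radar axes are normalised. One common choice for lower-is-better metrics such as WER, shown here purely as an assumption, is to divide the best (lowest) value by each model's value, so that the leader scores 1.0 and everyone else falls below it. The function name is hypothetical; the input figures are the Gramvaani WER values reported elsewhere on this page.

```python
def normalise_lower_is_better(scores):
    """Scale scores so the lowest raw value maps to 1.0 (assumed scheme)."""
    best = min(scores.values())
    return {model: best / value for model, value in scores.items()}


# Gramvaani WER (%) as reported on this page.
gramvaani_wer = {
    "Prisma v2.5": 22.0,
    "ElevenLabs Scribe v2": 27.5,
    "Microsoft Azure STT": 26.4,
}

normalised = normalise_lower_is_better(gramvaani_wer)
for model, score in normalised.items():
    print(f"{model}: {score:.2f}")
```

Under this scheme the axis shows relative standing only. Two models can sit close together on the radar while still being several percentage points apart in absolute WER.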

Head-to-head

[Table: model comparison matrix. Prisma v2.5 vs each competitor, Standard WER, all 9 languages. Columns: Matchup, Prisma wins, Avg WER delta, Best language, Worst language.]

Complete Benchmark Heatmaps

Every number, all at once.

All 9 languages × all datasets × all 4 models. Winner highlighted per cell.

[Chart: full benchmark heatmap. Legend: Prisma leads, ElevenLabs leads, Microsoft leads, Sarvam leads.]

Acoustic Robustness

When audio degrades, competing models collapse.

Average WER degradation moving from clean (kathbath) to noisy (kathbath_noisy) conditions across all 9 languages. The gap is structural, not marginal.

[Chart: average WER, clean vs noisy, 9 languages. Models: Prisma v2.5, ElevenLabs Scribe v2, Microsoft Azure STT, Sarvam Saaras v3.]

Average WER degradation, clean to noisy:

Prisma: +3.6%
Sarvam: +2.6%
Microsoft: +7.8%
ElevenLabs: +12.2%

[Chart: per-language noise degradation, all 9 languages. Each panel shows clean WER (left) to noisy WER (right) for all 4 models.]

Dataset Analysis

Not all benchmarks are equal.

Average WER across all 4 models, ranked hardest to easiest. Context for why Gramvaani results matter far more than IndicTTS when evaluating production readiness.

[Chart: difficulty ranking, average WER across all models per dataset. Higher = harder. Gramvaani and kathbath_noisy are most production-relevant.]

LLM Correction

Prisma starts clean. The gap only widens.

Raw (Standard) WER vs LLM-corrected WER for each model across 9 languages on kathbath. Prisma starts so clean that its lead increases post-correction on most languages.

[Chart: LLM correction lift on the Kathbath dataset. Raw WER (faded) vs LLM-corrected WER (solid) per language, for Prisma v2.5, ElevenLabs, Microsoft, and Sarvam. Lower = better.]

Dravidian Language Leadership

South Indian languages require a different metric.

Tamil, Telugu, Kannada, and Malayalam are agglutinative: a single long word can carry several morphemes, so one wrong character marks the whole word as an error. WER is structurally broken for these languages (NAACL 2025; Jain & Bhowmick, Springer Nature, 2025). On CER, Prisma v2.5 leads across all four South Indian languages in both clean and noisy conditions.

⚠ WER inflates error rates for agglutinative Dravidian languages. CER is methodologically correct.
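The inflation is easy to see on a short example. The sketch below computes WER and CER for the same pair of strings. The romanised strings are illustrative only and are not drawn from any benchmark. Counting spaces as characters in CER is an assumed convention, since the page does not specify one.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance; works on lists of words or on strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, counting spaces as characters (assumption)."""
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative romanised strings: one character dropped from a long word.
reference = "veettilirundhu vandhen"
hypothesis = "veettilirundu vandhen"

print(f"WER = {wer(reference, hypothesis):.1%}")
print(f"CER = {cer(reference, hypothesis):.1%}")
```

One missing character makes one of the two words wrong, so WER is 50%, while CER is 1 edit over 22 characters. The longer the words in a language, the wider this gap between the two metrics becomes.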

Real India Coverage

The Hindi that 500 million people actually speak.

Gramvaani captures spontaneous, conversational Hindi from semi-urban and rural speakers — the only benchmark that approximates how Hindi is spoken by the majority of Hindi speakers in India. On this benchmark, the performance gap is the widest in the entire dataset.

Gramvaani WER:

Prisma v2.5: 22.0%
ElevenLabs: 27.5% (5.5pp worse)
Microsoft: 26.4% (4.4pp worse)

On standard Hindi (kathbath), Prisma v2.5 achieves 7.6% WER — the lowest of any model tested. Under noise it holds at 7.9%, a degradation of just 0.3pp where competitors lose 1–3pp.
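The figures above are quoted in percentage points (pp), which is an absolute difference between two WER values. A short sketch, using only the numbers reported in this section, shows the same gaps both in percentage points and as a relative increase. The helper names are hypothetical.

```python
def gap_pp(value, baseline):
    """Absolute gap in percentage points, rounded to one decimal."""
    return round(value - baseline, 1)


def relative_increase(value, baseline):
    """Gap expressed as a fraction of the baseline WER."""
    return (value - baseline) / baseline


# Gramvaani WER (%) as reported above.
prisma_gramvaani = 22.0
competitors = {"ElevenLabs": 27.5, "Microsoft": 26.4}

for name, value in competitors.items():
    print(f"{name}: +{gap_pp(value, prisma_gramvaani)}pp, "
          f"{relative_increase(value, prisma_gramvaani):.0%} more errors than Prisma v2.5")

# Hindi kathbath, clean vs noisy, for Prisma v2.5 as reported above.
clean, noisy = 7.6, 7.9
print(f"Noise degradation: +{gap_pp(noisy, clean)}pp "
      f"({relative_increase(noisy, clean):.1%} relative)")
```

Reporting both views helps because a small percentage-point gap on an already low WER can still be a noticeable relative change, and the reverse holds on hard datasets such as Gramvaani.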

Business Impact

What the WER gap costs at your call volume.

Enter your daily call volume and per-event cost assumptions. The calculator uses actual WER deltas from benchmark data to estimate misrouted calls and escalations generated by each competitor versus Prisma v2.5.

At-Scale Impact Calculator: daily cost of the competitor WER gap, based on Gramvaani WER differentials.

Editable assumptions: daily call volume, WER threshold for misroute (shown as 10%), cost per misrouted call (₹), cost per agent escalation (₹).
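The page does not publish the calculator's formula, so the following is only a rough sketch of how such an estimate could be put together. It assumes, purely for illustration, that the share of additional misrouted calls equals the WER gap, and that a fixed fraction of misrouted calls escalates to an agent. The `escalation_share` parameter, the class and function names, and all rupee values are hypothetical. The WER threshold input is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Assumptions:
    daily_calls: int
    cost_per_misroute_inr: float
    cost_per_escalation_inr: float
    escalation_share: float  # hypothetical: fraction of misroutes that escalate


def daily_gap_cost(prisma_wer, competitor_wer, a):
    """Estimated extra daily cost (INR) attributable to the WER gap.

    Assumption: extra misrouted calls = daily calls x WER gap (as a fraction).
    """
    wer_gap = max(competitor_wer - prisma_wer, 0.0) / 100.0
    extra_misroutes = a.daily_calls * wer_gap
    extra_escalations = extra_misroutes * a.escalation_share
    return (extra_misroutes * a.cost_per_misroute_inr
            + extra_escalations * a.cost_per_escalation_inr)


# Placeholder inputs; only the two WER values come from this page (Gramvaani).
example = Assumptions(daily_calls=100_000,
                      cost_per_misroute_inr=50.0,
                      cost_per_escalation_inr=200.0,
                      escalation_share=0.2)
cost = daily_gap_cost(prisma_wer=22.0, competitor_wer=27.5, a=example)
print(f"Estimated extra daily cost: ₹{cost:,.0f}")
```

The output scales linearly with every input, so the estimate is only as good as the per-event costs and the assumed link between WER and misrouting. Treat it as a way to explore sensitivity, not as a forecast.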

Model Capabilities

What else Prisma v2.5 improves.

Beyond aggregate WER, the v2.5 release addresses specific transcription failure modes that generate downstream errors in enterprise workflows.

Native CODEC handling

GSM and VoIP compression handled at the model level. Native to the architecture, not a pre-processing step.

Word-level code-switching

Hindi-English, Tamil-English, regional-English pairs resolved at the word level without language tagging or pre-detection.

Numerals & alphanumerics

Spoken numbers, policy IDs, account references, and mixed alphanumeric strings. The category most likely to cause CRM and compliance errors.

Named entity accuracy

Person names, company names, and locations in Indian languages with significantly improved recognition in BFSI and insurance vocabulary.

Short utterance recognition

“Haan”, “nahi”, “theek hai” — confirmations, negations, and short commands with higher reliability, reducing IVR failure rates.

2× throughput improvement

Doubled inference throughput over v2.0 with no accuracy regression. High-concurrency deployments require no additional infrastructure scaling.

What this means by deployment type

BFSI & Collections

Lower WER on loan amounts, account numbers, and EMI figures means fewer misrouted cases and fewer compliance flags from transcription errors.

Insurance

Named entity accuracy on policyholder names and policy IDs reduces manual correction rates in post-call CRM workflows.

Telecom IVR

Short utterance recognition and code-switching performance directly reduce containment failure rates on Hindi-English mixed calls.

Rural & Tier-2 markets

Gramvaani-grade accuracy means Prisma v2.5 is the first model that can reliably serve spoken Hindi at the edges of the Hindi belt.

Per-language results

Win-loss scorecard, all 9 languages.

Kathbath and kathbath_noisy — the two most production-representative datasets — side by side. Green = Prisma v2.5 ranked #1.

Complete results

Full benchmark table.

Every language, every dataset, all four models. Prisma v2.5 wins are highlighted in orange.

Transparency

Methodology and dataset notes.

All benchmark results are computed on publicly available evaluation datasets, using the same audio inputs for all four models. No post-processing specific to any one model was applied.


Datasets used in this evaluation:

Kathbath (AI4Bharat): real-world read speech

Kathbath Noisy: Kathbath with added acoustic noise

Gramvaani: conversational rural Hindi

CommonVoice: Mozilla crowdsourced speech

FLEURS (Google): few-shot evaluation set

IndicTTS: studio TTS read speech

MUCS 2021: code-switching corpus

IITM Eval: IIT Madras evaluation set

CER is reported as the primary metric for Tamil, Telugu, Kannada, and Malayalam, per NAACL 2025 recommendations. WER is reported for all languages for completeness. English benchmarks are excluded. IndicTTS results come from studio-recorded read speech and should not be used as the primary indicator of production performance.

Citations

References

01. Kathbath (AI4Bharat). “Towards Building ASR Systems for the Next Billion Users.” AAAI 2022. arxiv.org/abs/2111.03945
02. “Advocating Character Error Rate for Multilingual ASR Evaluation.” NAACL 2025. arxiv.org/abs/2410.07400
03. Jain & Bhowmick. “Evaluating ASR for Indic Languages: WER vs CER.” Springer Nature, Feb 2025.
04. SCRIBE. “WER is Structurally Broken for Agglutinative Indic Languages.” arxiv.org/abs/2605.20712
05. Gramvaani. Spontaneous Hindi speech from rural and semi-urban speakers.
06. FLEURS: Few-Shot Learning Evaluation of Universal Representations of Speech. Google Research, 2022.
07. MUCS 2021: Multilingual and Code-Switching ASR Challenges. Interspeech 2021.
