Indian ASR that actually understands Bharat
Real-time transcription for 30+ Indian languages. Under 90ms latency. Domain-adapted for banking, healthcare, and retail — right out of the box. Trained on 50,000+ hours of real Indian audio, not translated corpora.
Enterprise-grade ASR features
Every feature is production-tested across millions of minutes of real Indian enterprise audio.
Sub-90ms Streaming ASR
WebSocket and gRPC streaming endpoints deliver first-word hypothesis in under 90ms. Character-level incremental output lets you start processing before the speaker finishes. Designed for real-time call centre and voice agent pipelines where latency is revenue.
30+ Indian Languages & Dialects
Hindi, Tamil, Telugu, Kannada, Malayalam, Marathi, Bengali, Gujarati, Punjabi, Odia, Urdu, Assamese, Maithili, Bhojpuri, Rajasthani, Haryanvi, and 14 more. Each language model is trained on native speaker data — not translated from English corpora.
Code-Switching Intelligence
Hinglish, Tanglish, Manglish — India's natural multilingual speech is handled natively. No post-processing hacks. Our models are trained on real code-switched call recordings from BPO, banking, and retail environments, covering 200M+ code-switching instances.
Domain Vocabulary Adaptation
Pre-trained domain models for BFSI (account numbers, IFSC codes, EMI vocabulary), healthcare (ICD codes, drug names, clinical terms), retail (SKUs, order IDs, courier jargon), and legal. Reduces domain WER by up to 35% vs generic models.
Automatic PII Redaction
Real-time redaction of Aadhaar numbers, PAN cards, bank account numbers, IFSC codes, credit card numbers, phone numbers, and UPI IDs before transcript storage. Configurable redaction policies. Full DPDP Act 2023 compliance with audit trails.
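To make the idea concrete, here is a minimal pattern-based sketch of redaction for a few of the identifiers listed above. It is an illustration only, not the redaction engine described here: the `PII_PATTERNS` table and the `redact` helper are hypothetical names, and the patterns cover only the well-known surface formats (PAN as five letters, four digits, one letter; IFSC as four letters, a zero, six alphanumerics; Aadhaar as twelve digits; Indian mobile numbers as ten digits starting with 6 to 9).

```python
import re

# Hypothetical, simplified patterns. A production redactor would also need
# checksum validation, spoken-digit handling, and context checks.
PII_PATTERNS = [
    ("PAN", re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")),
    ("IFSC", re.compile(r"\b[A-Z]{4}0[A-Z0-9]{6}\b")),
    ("AADHAAR", re.compile(r"\b[2-9][0-9]{3}[ -]?[0-9]{4}[ -]?[0-9]{4}\b")),
    ("PHONE", re.compile(r"(?<!\d)(?:\+91[ -]?)?[6-9][0-9]{9}(?!\d)")),
]


def redact(text: str) -> str:
    """Replace each matched identifier with a bracketed label."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "PAN ABCDE1234F, Aadhaar 2345 6789 0123, IFSC HDFC0001234, call +91 9876543210"
print(redact(sample))
```

Replacing values with typed labels rather than deleting them keeps the transcript readable for QA while removing the sensitive value itself.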
Speaker Diarization
Automatically separate and label agent vs customer speech in dual-channel call recordings. Outputs speaker-timestamped JSON. Critical for call QA, compliance, and AHT calculation. Supports multi-speaker meeting transcription up to 8 participants.
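As a rough sketch of how speaker-timestamped output can feed call analytics, the snippet below sums talk time per speaker. It assumes segments shaped like the `result.speakers` example on this page (`speaker`, `start`, `end` in seconds); `talk_time_by_speaker` is a hypothetical helper, not part of the SDK. Talk time is only one input to AHT, which normally also includes hold and wrap-up time.

```python
from collections import defaultdict


def talk_time_by_speaker(segments):
    """Sum the duration of every segment, grouped by speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


# Segment shape assumed from the SDK example on this page.
segments = [
    {"speaker": "agent", "start": 0.0, "end": 2.1},
    {"speaker": "customer", "start": 2.3, "end": 5.6},
]

totals = talk_time_by_speaker(segments)
for speaker, seconds in totals.items():
    print(f"{speaker}: {seconds:.1f}s")
```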
How our ASR pipeline works
Send raw PCM audio over WebSocket or upload a file via REST. Supports 8kHz telephony and 16kHz wideband audio.
Voice Activity Detection filters silence, segments speech, and handles overlapping speech in real time.
Automatic language identification per utterance. No need to pre-declare language for multilingual calls.
CTC-based acoustic model + language model beam search. Domain vocabulary injected at decode time.
Punctuation, number normalization, entity formatting, PII redaction applied in a streaming post-processor.
JSON response with text, confidence scores, speaker labels, timestamps, and redacted fields.
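For readers unfamiliar with CTC, the decoding step above can be illustrated with greedy decoding, a deliberately simpler stand-in for the beam search with a language model that the pipeline describes. The sketch picks the best label per frame, collapses consecutive repeats, and drops the blank symbol. The toy vocabulary and scores are invented for illustration.

```python
def ctc_greedy_decode(frame_scores, vocab, blank=0):
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    decoded = []
    prev = None
    for idx in best_path:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank label.
        if idx != prev and idx != blank:
            decoded.append(vocab[idx])
        prev = idx
    return "".join(decoded)


vocab = ["_", "a", "b"]  # index 0 is the CTC blank
frames = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank separates the two a's
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.2, 0.7],    # b
    [0.1, 0.1, 0.8],    # b (repeat, collapsed)
]
print(ctc_greedy_decode(frames, vocab))  # → aab
```

The blank frame is what allows a genuinely doubled character to survive the collapse step. Beam search keeps several candidate paths instead of one, which is where language model scores and injected domain vocabulary can influence the result.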
Native models for every Indian language
Each language model is independently trained on native speaker audio — not derived from cross-lingual transfer or English model fine-tuning. This means our models capture real phonetic patterns, regional accents, and dialectal variation that generic multilingual models miss.
- Separate acoustic + language models per language for maximum accuracy
- Dialect variant support: Bhojpuri-Hindi, Haryanvi-Hindi, Tulu-Kannada
- Script-aware: Devanagari, Tamil, Telugu, Kannada, Bengali scripts
- Automatic Language Identification per utterance — no need to declare language
- Continuous language model updates based on new enterprise feedback data
| Language | BCP-47 | WER | Key Domains |
|---|---|---|---|
| Hindi | hi-IN | 3.2% | BFSI, BPO |
| Tamil | ta-IN | 4.1% | Healthcare, Retail |
| Telugu | te-IN | 4.5% | Agri, Govt |
| Kannada | kn-IN | 5.0% | EdTech, IT |
| Malayalam | ml-IN | 4.8% | Healthcare |
| Marathi | mr-IN | 4.3% | BFSI, BPO |
| Bengali | bn-IN | 4.7% | EdTech, Govt |
| Gujarati | gu-IN | 4.4% | BFSI, Trade |
| + 22 more languages available | | | |
Integrate in minutes, not weeks
Official SDKs for Python and Node.js. REST API for any language. WebSocket for real-time streaming. All APIs documented with runnable examples.
- Python SDK (pip install verbalyze)
- Node.js SDK (npm install @verbalyze/sdk)
- REST API — works with any language
- WebSocket streaming endpoint
- Postman collection available
import verbalyze as vb
client = vb.Client(api_key="vb_sk_...")
# Batch transcription
result = client.transcribe(
    audio="call_recording.wav",
    language="hi-IN",
    domain="banking",
    diarize=True,      # separate speakers
    pii_redact=True,   # auto-redact PII
)
print(result.text)
# → "नमस्ते, मेरा [ACCOUNT] बंद हो गया है"
print(result.speakers)
# → [{"speaker": "agent", "start": 0.0, "end": 2.1},
# {"speaker": "customer", "start": 2.3, "end": 5.6}]
print(f"Latency: {result.latency_ms}ms | Confidence: {result.confidence}")
Built for India's most demanding voice use cases
Live Call Transcription
Real-time agent+customer transcript for live call monitoring, QA scoring, and supervisor alerts. Reduces manual QA effort by 80%.
EMI & Loan Collections
Transcribe collection calls, extract promise-to-pay commitments, and auto-populate CRM with disposition outcomes. 47% AHT reduction.
Doctor Dictation
Clinical vocabulary ASR for doctor notes, OPD prescriptions, and discharge summaries. ICD-10 code recognition built-in.
Customer Support Automation
Transcribe and classify inbound support calls to auto-route to the right resolution flow. Handles 10,000+ calls/day.
Voice-Based Assessments
Evaluate spoken answers in Hindi and regional languages for language learning, pronunciation scoring, and oral exams.
Government Field Surveys
Field data collection in regional languages. Voice forms for census, agriculture, and healthcare surveys in rural India.
Common questions
What audio formats do you support?
WAV, MP3, FLAC, OGG, M4A, WebM, and raw PCM. Streaming accepts 16-bit PCM at 8kHz (telephony) or 16kHz (wideband). Automatic format detection for batch uploads.
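If you plan to stream audio that starts life as a WAV file, a quick local check of the header can catch mismatches early. The sketch below uses Python's standard `wave` module to verify the two streaming constraints stated above (16-bit samples, 8kHz or 16kHz). `check_streamable` and `make_wav` are hypothetical helper names for this example, not SDK functions.

```python
import io
import wave

SUPPORTED_RATES = {8000, 16000}  # telephony and wideband, per the answer above


def check_streamable(wav_file):
    """Return a list of problems; an empty list means the header looks fine."""
    problems = []
    with wave.open(wav_file, "rb") as wav:
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() not in SUPPORTED_RATES:
            problems.append(f"unsupported sample rate: {wav.getframerate()} Hz")
    return problems


def make_wav(sample_rate, sample_width=2, seconds=1):
    """Build an in-memory mono WAV of silence for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * sample_rate * sample_width * seconds)
    buf.seek(0)
    return buf


print(check_streamable(make_wav(8000)))
print(check_streamable(make_wav(44100)))
```

`check_streamable` also accepts a file path, since `wave.open` takes either a path or a file-like object.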
How does streaming work?
Connect to wss://api.verbalyze.in/v2/stt/stream via WebSocket. Send audio chunks and receive incremental transcription tokens in real time. Supports backpressure and reconnection.
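The client-side half of this, splitting raw PCM into fixed-duration chunks before sending, can be sketched as follows. The 20 ms frame size is an assumption chosen for illustration, and the message framing the endpoint expects is not specified here, so the example stops short of opening a connection. `pcm_frames` is a hypothetical helper.

```python
def pcm_frames(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 20,
               sample_width: int = 2):
    """Yield fixed-duration chunks of raw mono PCM (the last may be shorter)."""
    # bytes per frame = samples per frame * bytes per sample
    frame_bytes = sample_rate * frame_ms // 1000 * sample_width
    for offset in range(0, len(pcm), frame_bytes):
        yield pcm[offset:offset + frame_bytes]


# One second of 16-bit silence at 8 kHz telephony rate.
one_second = b"\x00" * 8000 * 2
chunks = list(pcm_frames(one_second, sample_rate=8000))
print(len(chunks), len(chunks[0]))  # → 50 320
```

Smaller frames lower the delay before the first hypothesis can arrive, at the cost of more messages per second.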
Can I get word-level timestamps?
Yes. Set word_timestamps=true in your request to receive start and end times for each word in the transcript. Useful for subtitle generation and call analytics.
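As an illustration of the subtitle use case, the sketch below turns word-level timestamps into SRT cues. The word shape (`word`, `start`, `end` in seconds) is an assumption for this example, since the exact response schema is not shown on this page, and `words_to_srt` is a hypothetical helper.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group words into cues of at most max_words and render SRT text."""
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        text = " ".join(w["word"] for w in group)
        start = srt_time(group[0]["start"])
        end = srt_time(group[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Assumed word-timestamp shape; values are invented for illustration.
words = [
    {"word": "namaste", "start": 0.0, "end": 0.6},
    {"word": "ji", "start": 0.6, "end": 0.9},
]
print(words_to_srt(words))
```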
How is accuracy measured?
We report Word Error Rate (WER) on a held-out benchmark dataset of native Indian audio. Our Hindi model achieves 3.2% WER. Domain-fine-tuned models reduce WER on domain vocabulary by 15–35% compared with the generic models.
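For reference, WER is the word-level edit distance between the reference transcript and the ASR output (substitutions, deletions, and insertions), divided by the number of reference words. The sketch below is a plain textbook implementation for checking results on your own audio. It is not the benchmark harness behind the figures above, and it skips the text normalization a real evaluation would apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "mera account band ho gaya hai"
hypothesis = "mera account bandh ho gaya"
# One substitution (band -> bandh) and one deletion (hai) over 6 words.
print(f"{word_error_rate(reference, hypothesis):.1%}")  # → 33.3%
```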
Is on-premise deployment available?
Yes. Our ASR models are available as Docker containers for on-premise or private cloud deployment. Contact us for GPU requirements and deployment support.
Ready to transcribe India's voice?
Get 10,000 free API minutes. No credit card required.