Voice AI Glossary
Key terms in Indian speech AI, NLP, and enterprise voice technology — clearly defined.
Automatic Speech Recognition (ASR)
Technology that converts spoken audio into written text. Also called Speech-to-Text (STT). Verbalyze's ASR models are fine-tuned on 50,000+ hours of native Indian audio to handle accents, dialects, and code-switching.
Text-to-Speech (TTS)
Technology that synthesises human-sounding speech from written text. Verbalyze offers 60+ native Indian voices across 30+ languages with SSML support and sub-200ms synthesis latency.
Neural Machine Translation (NMT)
Deep learning-based translation between languages. Verbalyze's NMT is optimised for Indian language pairs — including transliteration-aware models that handle mixed-script text.
Natural Language Understanding (NLU)
A branch of NLP focused on machine comprehension of human language — extracting intent, entities, and context from spoken or written utterances. Used in Verbalyze's voice agents to interpret what callers mean, not just what they say.
Large Language Model (LLM)
A deep learning model trained on massive text corpora that can generate, understand, and reason about language. Verbalyze uses Indic fine-tuned LLMs (e.g. Llama 3 Indic, Gemma-Indic) to power voice agent reasoning.
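Word Error Rate (WER)
The standard metric for ASR accuracy: the number of word substitutions (S), deletions (D), and insertions (I) found when aligning the ASR output against a reference transcript, divided by the number of words in the reference (N), i.e. WER = (S + D + I) / N. It is usually reported as a percentage. Lower is better, and because insertions count as errors, WER can exceed 100%. Verbalyze achieves 3.2% WER on Hindi ASR benchmarks.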
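WER can be computed as a word-level edit distance: the minimum number of substitutions, deletions, and insertions that align the hypothesis with the reference, divided by the number of reference words. The sketch below is a plain-Python illustration that assumes simple whitespace tokenisation; it is not Verbalyze's scoring code, and real benchmarks usually normalise punctuation, casing, and numerals before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using a word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev is the previous DP row: prev[j] = edits to align ref[:i-1] with hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (number -> nambar) and one insertion (please) over 4 reference words.
print(word_error_rate("mera account number batao", "mera account nambar batao please"))  # 0.5
```

The single rolling row keeps memory proportional to the hypothesis length, which matters when scoring long call transcripts.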
Speaker Diarization
The process of segmenting and labelling audio by speaker — answering 'who spoke when'. Critical for call centre analytics where agent and customer speech must be separated for downstream analysis.
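Diarization output is typically a list of time-stamped segments, each tagged with an anonymous speaker label. As a hedged illustration of the downstream call-centre use, the sketch below totals talk time per speaker. The `Segment` type and the `spk_0`/`spk_1` labels are hypothetical, and deciding which label belongs to the agent and which to the customer is a separate step not shown here.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Segment(NamedTuple):
    start: float   # segment start, in seconds
    end: float     # segment end, in seconds
    speaker: str   # anonymous label assigned by the diarizer


def talk_time(segments: List[Segment]) -> Dict[str, float]:
    """Sum the speaking duration of each speaker across all segments."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        if seg.end < seg.start:
            raise ValueError(f"segment ends before it starts: {seg}")
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


segments = [
    Segment(0.0, 4.5, "spk_0"),
    Segment(4.5, 10.0, "spk_1"),
    Segment(10.0, 12.0, "spk_0"),
]
print(talk_time(segments))  # {'spk_0': 6.5, 'spk_1': 5.5}
```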
Code-Switching / Code-Mixing
The practice of alternating between two or more languages in a single conversation. Common in India: Hinglish (Hindi + English), Tanglish (Tamil + English). Verbalyze's models are trained on real code-switched audio.
IETF BCP-47 Language Tag
A standardised format for identifying human languages. Used in all Verbalyze API calls to specify language and locale. Example: hi-IN (Hindi, India), ta-IN (Tamil, India), en-IN (English, India).
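For simple tags like hi-IN, the language and region subtags can be separated with a small parser. The sketch below handles only the language, optional script, and optional region subtags, normalising them to the conventional casing (lowercase language, title-case script, uppercase region). It ignores variants and extensions, so it is not a full BCP-47 validator, and whether the Verbalyze API accepts script subtags such as hi-Latn-IN is an open assumption.

```python
import re
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


# language (2-3 letters), optional script (4 letters), optional region (2 letters or 3 digits)
_SIMPLE_TAG = re.compile(r"([A-Za-z]{2,3})(?:-([A-Za-z]{4}))?(?:-([A-Za-z]{2}|[0-9]{3}))?")


def parse_tag(tag: str) -> LanguageTag:
    match = _SIMPLE_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"unsupported or malformed language tag: {tag!r}")
    language, script, region = match.groups()
    # BCP-47 tags are case-insensitive; apply the conventional casing.
    return LanguageTag(
        language.lower(),
        script.title() if script else None,
        region.upper() if region else None,
    )


print(parse_tag("hi-IN"))       # LanguageTag(language='hi', script=None, region='IN')
print(parse_tag("hi-latn-in"))  # LanguageTag(language='hi', script='Latn', region='IN')
```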
Speech Synthesis Markup Language (SSML)
An XML-based markup language, standardised by the W3C, that controls TTS output: inserting pauses (<break>), adjusting pitch and rate (<prosody>), and specifying pronunciation with phonemes (<phoneme>). Fully supported in Verbalyze TTS.
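A minimal sketch of composing SSML programmatically with the W3C SSML 1.1 namespace. The helper name and its parameters are hypothetical, user-supplied text is XML-escaped so characters like & cannot break the document, and the exact elements and attribute values Verbalyze accepts should be checked against its own documentation.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(lang: str, greeting: str, details: str, pause_ms: int = 400) -> str:
    """Return an SSML document: greeting, a pause, then details read slowly."""
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    return (
        f'<speak version="1.1" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"{escape(greeting)}"                                # escape &, <, > in user text
        f'<break time="{int(pause_ms)}ms"/>'                 # explicit pause
        f'<prosody rate="slow">{escape(details)}</prosody>'  # slower delivery for key details
        "</speak>"
    )


print(build_ssml("hi-IN", "Namaste!", "Aapka balance 5,000 rupaye hai."))
```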
4-bit / 8-bit Integer Quantization
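Model compression techniques that reduce the precision of LLM or ASR model weights from 32-bit (or 16-bit) floating point to 4-bit or 8-bit integers. Relative to 32-bit floats, 8-bit weights need a quarter of the memory and 4-bit weights an eighth, which can also lower inference latency and enables on-premise deployment on smaller GPU setups.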
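To make the arithmetic concrete, here is a minimal symmetric int8 scheme with a single scale per tensor: each weight w is stored as q = round(w / s) with s = max|w| / 127, and recovered as q × s, so the rounding error per weight is at most s / 2. This is an illustrative sketch, not the scheme Verbalyze uses; production LLM quantizers typically use per-channel or per-group scales plus calibration to limit accuracy loss, and 4-bit schemes pack two values per byte.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximated by q * scale."""
    if not weights:
        return [], 1.0
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 0.031]
q, scale = quantize_int8(weights)
print(q)                      # integer codes in [-127, 127]
print(dequantize(q, scale))   # approximations of the original weights
```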
Digital Personal Data Protection (DPDP) Act
India's primary data privacy law (2023). It governs how organisations collect, process, and store the digital personal data of individuals in India. Verbalyze is DPDP-compliant, offering PII redaction, consent APIs, data residency in India, and audit trails.
Voice Activity Detection (VAD)
An algorithm that detects the presence or absence of human speech in an audio stream. Used to segment audio into speech and silence regions before ASR processing, improving accuracy and reducing compute cost.
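The simplest form of voice activity detection thresholds the short-term energy of fixed-length frames and merges consecutive speech frames into regions. The sketch below assumes mono float samples in [-1, 1] at 16 kHz, 30 ms frames, and an arbitrary RMS threshold of 0.02; it is only an illustration, since production detectors (for example WebRTC VAD or neural VAD models) are far more robust to background noise.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_flags(samples: Sequence[float], sample_rate: int = 16000,
                frame_ms: int = 30, threshold: float = 0.02) -> List[bool]:
    """Mark each full frame as speech (True) if its RMS energy reaches the threshold."""
    frame_len = sample_rate * frame_ms // 1000
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        flags.append(rms >= threshold)
    return flags


def speech_regions(flags: List[bool], frame_ms: int = 30) -> List[Tuple[int, int]]:
    """Merge consecutive speech frames into (start_ms, end_ms) regions."""
    regions = []
    start: Optional[int] = None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            regions.append((start * frame_ms, i * frame_ms))
            start = None
    if start is not None:
        regions.append((start * frame_ms, len(flags) * frame_ms))
    return regions


# 60 ms of silence, 90 ms of a 440 Hz tone standing in for speech, 30 ms of silence.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1440)]
audio = [0.0] * 960 + tone + [0.0] * 480
print(speech_regions(frame_flags(audio)))  # [(60, 150)]
```

Only the returned regions would then be passed to ASR, which is where the compute saving comes from.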
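Real-Time Factor (RTF)
A ratio measuring ASR processing speed: processing time divided by audio duration. An RTF of 0.1 means the system processes 10 seconds of audio in 1 second, and any value below 1 is faster than real time. Verbalyze's streaming ASR operates at RTF < 0.05, which helps it reach sub-90ms first-word latency.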
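Because RTF is processing time divided by audio duration, it can be measured by timing a transcription call. In the sketch below, `transcribe` is a hypothetical stand-in for whatever ASR call is being benchmarked. RTF measures throughput; first-word latency in a streaming system is a separate measurement.

```python
import time
from typing import Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe: Callable[[], object], audio_seconds: float) -> float:
    """Time one call to a (hypothetical) transcribe function and return its RTF."""
    start = time.perf_counter()
    transcribe()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


print(real_time_factor(1.0, 10.0))  # 0.1
```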
Mutual TLS (Transport Layer Security)
A security protocol where both client and server authenticate each other using certificates. Verbalyze supports mTLS for enterprise on-premise deployments requiring maximum API security.
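With Python's standard `ssl` module, the difference between ordinary TLS and mTLS comes down to the server requiring a client certificate and the client presenting one. This is a generic sketch, not Verbalyze's configuration: the function names and certificate paths are hypothetical placeholders, and certificate loading is skipped when no path is given so the contexts can be inspected without real certificates.

```python
import ssl
from typing import Optional


def server_context(ca_file: Optional[str] = None,
                   cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS server context that also demands and verifies a client certificate."""
    # ca_file should be the CA that issues the *client* certificates.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # requiring a client cert is what makes it "mutual"
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # server's own identity
    return ctx


def client_context(ca_file: Optional[str] = None,
                   cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS client context that verifies the server and presents a client certificate."""
    # SERVER_AUTH defaults already verify the server certificate and hostname.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # client's identity
    return ctx
```

In a real deployment the paths would point at PEM files from the organisation's private CA, for example `client_context("ca.pem", "client.pem", "client.key")` (hypothetical file names), and the resulting context would be used via `ctx.wrap_socket(sock, server_hostname=...)` or handed to an HTTP client.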
Personally Identifiable Information (PII)
Any data that can be used to identify an individual — Aadhaar numbers, PAN, phone numbers, bank accounts. Verbalyze's DPDP compliance layer automatically detects and redacts PII from transcripts before storage.
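A minimal regex-based sketch of PII redaction for three Indian identifiers: 12-digit Aadhaar numbers (which do not start with 0 or 1), PAN in the AAAAA9999A pattern, and 10-digit mobile numbers starting with 6 to 9, optionally prefixed with +91. This is not Verbalyze's detector. The patterns are simplified assumptions that do not validate checksums or cover bank account numbers, and they will miss identifiers spoken as words in real transcripts.

```python
import re

# Simplified patterns (assumptions, not an official specification).
_PII_PATTERNS = [
    # Aadhaar: 12 digits, first digit 2-9, optionally grouped 4-4-4;
    # not preceded by a digit or '+', so +91 mobile numbers are not mistaken for it.
    ("[AADHAAR]", re.compile(r"(?<![\d+])[2-9]\d{3}[ -]?\d{4}[ -]?\d{4}(?!\d)")),
    # PAN: five letters, four digits, one letter.
    ("[PAN]", re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b")),
    # Mobile: optional +91 prefix, then 10 digits starting with 6-9.
    ("[PHONE]", re.compile(r"(?:\+91[ -]?|(?<!\d))[6-9]\d{9}(?!\d)")),
]


def redact_pii(text: str) -> str:
    """Replace each matched identifier with a placeholder tag, in pattern order."""
    for placeholder, pattern in _PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


print(redact_pii("My PAN is ABCDE1234F, mobile +91 9876543210, Aadhaar 2345 6789 0123."))
# My PAN is [PAN], mobile [PHONE], Aadhaar [AADHAAR].
```

Running the Aadhaar pattern first matters: a 12-digit run would otherwise be partly consumed by the shorter phone pattern.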