Reference

Voice AI Glossary

Key terms in Indian speech AI, NLP, and enterprise voice technology — clearly defined.

ASR

Automatic Speech Recognition

Technology that converts spoken audio into written text. Also called Speech-to-Text (STT). Verbalyze's ASR models are fine-tuned on 50,000+ hours of native Indian audio to handle accents, dialects, and code-switching.

TTS

Text-to-Speech

Technology that synthesises human-sounding speech from written text. Verbalyze offers 60+ native Indian voices across 30+ languages with SSML support and sub-200ms synthesis latency.

NMT

Neural Machine Translation

Deep learning-based translation between languages. Verbalyze's NMT is optimised for Indian language pairs — including transliteration-aware models that handle mixed-script text.

NLU

Natural Language Understanding

A branch of NLP focused on machine comprehension of human language — extracting intent, entities, and context from spoken or written utterances. Used in Verbalyze's voice agents to interpret what callers mean, not just what they say.

LLM

Large Language Model

A deep learning model trained on massive text corpora that can generate, understand, and reason about language. Verbalyze uses Indic fine-tuned LLMs (e.g. Llama 3 Indic, Gemma-Indic) to power voice agent reasoning.

WER

Word Error Rate

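The standard metric for ASR accuracy: the number of word substitutions, deletions, and insertions needed to turn the system's transcript into a reference transcript, divided by the number of words in the reference. Lower is better, and heavy insertion errors can push WER above 100%. Verbalyze achieves 3.2% WER on Hindi ASR benchmarks.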
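
As a rough illustration of how WER is usually computed, the sketch below runs a word-level edit distance between a reference and a hypothesis transcript. The function name and the whitespace tokenisation are assumptions made for this example; production scoring normally also normalises casing, punctuation, and numerals first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()   # assumption: plain whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```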

Diarization

Speaker Diarization

The process of segmenting and labelling audio by speaker — answering 'who spoke when'. Critical for call centre analytics where agent and customer speech must be separated for downstream analysis.
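
To make the "who spoke when" idea concrete, here is a small sketch that takes diarization output as a list of (speaker, start, end) segments and totals the talk time per speaker. The segment format and the speaker labels are hypothetical, chosen only for illustration; they are not a documented Verbalyze response shape.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical segment shape: (speaker label, start time in seconds, end time in seconds)
Segment = Tuple[str, float, float]


def talk_time_by_speaker(segments: List[Segment]) -> Dict[str, float]:
    """Sum the duration of every segment attributed to each speaker."""
    totals: Dict[str, float] = defaultdict(float)
    for speaker, start, end in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {speaker} {start}-{end}")
        totals[speaker] += end - start
    return dict(totals)


if __name__ == "__main__":
    call = [
        ("agent", 0.0, 4.0),
        ("customer", 4.0, 10.0),
        ("agent", 10.0, 12.0),
    ]
    print(talk_time_by_speaker(call))
```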

Code-Switching

Code-Switching / Code-Mixing

The practice of alternating between two or more languages in a single conversation. Common in India: Hinglish (Hindi + English), Tanglish (Tamil + English). Verbalyze's models are trained on real code-switched audio.

BCP-47

IETF BCP-47 Language Tag

A standardised format for identifying human languages. Used in all Verbalyze API calls to specify language and locale. Example: hi-IN (Hindi, India), ta-IN (Tamil, India), en-IN (English, India).
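
The sketch below shows one way to pull the language and region out of tags like the ones above. It is a simplified reading of BCP-47 that only covers the common language, optional script, optional region shape; the helper name is made up for this example, and a full implementation would need the complete subtag grammar and registry.

```python
from typing import Optional, Tuple


def split_language_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Return (language, region) from a simple BCP-47 tag such as 'hi-IN'."""
    parts = tag.split("-")
    language = parts[0].lower()
    region = None
    for subtag in parts[1:]:
        if len(subtag) == 1:
            # A single-character subtag starts an extension or private-use
            # section, so no region can follow it.
            break
        is_alpha_region = len(subtag) == 2 and subtag.isalpha()   # e.g. IN
        is_numeric_region = len(subtag) == 3 and subtag.isdigit() # e.g. 419
        if is_alpha_region or is_numeric_region:
            region = subtag.upper()
            break
    return language, region


if __name__ == "__main__":
    for tag in ("hi-IN", "ta-in", "hi-Latn-IN", "en"):
        print(tag, split_language_tag(tag))
```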

SSML

Speech Synthesis Markup Language

An XML-based markup language that controls TTS output: adding pauses (<break>), adjusting pitch and rate (<prosody>), and specifying pronunciation (<phoneme>). Fully supported in Verbalyze TTS.
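
For illustration, the following sketch builds a small SSML fragment with a pause and a slower speaking rate using Python's standard XML library. The helper name and the sample text are assumptions. The root element here is a bare <speak>; a strictly conforming standalone SSML document also carries version and namespace attributes, which are left out for brevity.

```python
import xml.etree.ElementTree as ET


def build_ssml(text_before: str, pause_ms: int, text_after: str, rate: str = "slow") -> str:
    """Build <speak> text, a <break> pause, then text wrapped in <prosody>."""
    speak = ET.Element("speak")
    speak.text = text_before
    # <break time="..."/> inserts a pause of the given length.
    ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    # <prosody rate="..."> changes the speaking rate of the enclosed text.
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})
    prosody.text = text_after
    # ElementTree escapes characters such as & and < in the text for us.
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Namaste.", 500, "Your order has shipped."))
```

Building the markup through an XML library rather than string concatenation keeps the output well-formed even when the text contains characters that need escaping.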

INT4 / INT8

4-bit / 8-bit Integer Quantization

Model compression techniques that reduce LLM/ASR model weight precision from 16- or 32-bit floats to 4- or 8-bit integers. This sharply reduces memory footprint and can lower inference latency, enabling on-premise deployment on smaller GPU setups.
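
The sketch below shows the core idea on a handful of weights, using symmetric per-tensor INT8 quantization: every weight is divided by a single scale and rounded to an integer in [-127, 127]. This scheme is an assumption picked for simplicity; real toolchains typically use per-channel or per-group scales, calibration data, and packed storage, and nothing here describes how Verbalyze quantizes its models.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0]
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    # The rounding error per weight is at most half a quantization step.
    for w, r in zip(weights, restored):
        print(f"{w:+.4f} -> {r:+.4f}")
```

Each INT8 weight occupies one byte instead of four for a 32-bit float, which is where the memory saving comes from; the price is the small rounding error visible in the printout.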

DPDP

Digital Personal Data Protection Act

India's primary data privacy law, enacted in 2023. Governs how organisations collect, process, and store the digital personal data of individuals in India. Verbalyze is DPDP-compliant, offering PII redaction, consent APIs, data residency in India, and audit trails.

VAD

Voice Activity Detection

An algorithm that detects the presence or absence of human speech in an audio stream. Used to segment audio into speech and silence regions before ASR processing, improving accuracy and reducing compute cost.
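
As a minimal sketch of the idea, the code below labels fixed-length frames as speech or silence by comparing each frame's RMS energy with a threshold. The energy-based approach, the 30 ms frame length, and the threshold value are all assumptions chosen for illustration; production VADs are usually model-based and far more robust to background noise.

```python
import math
from typing import List, Sequence


def energy_vad(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,
    threshold: float = 0.02,
) -> List[bool]:
    """Return one flag per full frame: True if the frame's RMS energy
    reaches the threshold (treated as speech), False otherwise."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    decisions = []
    # Trailing samples that do not fill a whole frame are ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        decisions.append(rms >= threshold)
    return decisions


if __name__ == "__main__":
    rate = 16000
    silence = [0.0] * 960  # two 30 ms frames of silence
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(960)]
    print(energy_vad(silence + tone + silence, rate))
```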

RTF

Real-Time Factor

A ratio measuring ASR processing speed: processing time divided by audio duration. An RTF of 0.1 means the system processes 10 seconds of audio in 1 second, and any RTF below 1 is faster than real time. Verbalyze's streaming ASR operates at RTF < 0.05, enabling sub-90ms first-word latency.
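
The arithmetic is simple enough to sketch directly. In the example below, `transcribe` stands in for any ASR call and is purely hypothetical; only the ratio itself comes from the definition above.

```python
import time
from typing import Any, Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent processing / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe: Callable[[Any], Any], audio: Any, audio_seconds: float) -> float:
    """Time a single call to a (hypothetical) transcribe function and return its RTF."""
    started = time.perf_counter()
    transcribe(audio)
    elapsed = time.perf_counter() - started
    return real_time_factor(elapsed, audio_seconds)


if __name__ == "__main__":
    # 10 seconds of audio processed in 1 second.
    print(real_time_factor(1.0, 10.0))
```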

mTLS

Mutual TLS (Transport Layer Security)

A way of using TLS in which both client and server authenticate each other using certificates, rather than only the server proving its identity. Verbalyze supports mTLS for enterprise on-premise deployments requiring maximum API security.
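
What makes TLS "mutual" is that the server refuses connections from clients that do not present a trusted certificate. The sketch below configures that on the server side with Python's standard `ssl` module. The function name and the file-path parameters are hypothetical, and nothing here reflects how a Verbalyze deployment is actually configured.

```python
import ssl
from typing import Optional


def make_mtls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that requires a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # The step that turns ordinary TLS into mutual TLS: the handshake fails
    # unless the client presents a certificate the server can verify.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        # The server's own certificate and private key (hypothetical paths).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # The CA bundle used to validate client certificates.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx


if __name__ == "__main__":
    context = make_mtls_server_context()
    print(context.verify_mode == ssl.CERT_REQUIRED)
```

On the client side, the counterpart is to call `load_cert_chain` on the client's own context so that it has a certificate to present during the handshake.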

PII

Personally Identifiable Information

Any data that can be used to identify an individual, such as Aadhaar numbers, PAN, phone numbers, and bank account numbers. Verbalyze's DPDP compliance layer automatically detects and redacts PII from transcripts before storage.
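
As a rough sketch of what pattern-based redaction can look like, the code below masks PAN, Aadhaar, and Indian mobile numbers in a transcript using regular expressions. The patterns are simplified assumptions about the common written formats (for example, no Aadhaar checksum validation), and this is not Verbalyze's detection logic; real systems usually combine patterns with trained entity recognisers.

```python
import re

# Simplified, assumed formats. Order matters: Aadhaar is masked before phone
# numbers so that a 12-digit number is never partially matched as a phone.
PII_PATTERNS = {
    "PAN": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),            # e.g. ABCDE1234F
    "AADHAAR": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}\b"),    # 12 digits, optionally grouped
    "PHONE": re.compile(r"(?<!\d)(?:\+91[ -]?)?[6-9]\d{9}(?!\d)"),  # 10-digit mobile, optional +91
}


def redact_pii(text: str) -> str:
    """Replace each matched value with a bracketed label such as [PAN]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    transcript = "My PAN is ABCDE1234F, Aadhaar 1234 5678 9012, call 9876543210."
    print(redact_pii(transcript))
    # prints: My PAN is [PAN], Aadhaar [AADHAAR], call [PHONE].
```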