# Models, Voice & Transcriber

> Configure LLM models, voice providers, and transcribers for your AI agents

## Overview

Callkaro AI provides flexibility in choosing the right combination of **LLM Model**, **Voice Provider**, and **Transcriber** for your AI agents.

* **LLM Model** - Core intelligence powering your agent's conversation and decision-making capabilities
* **Voice Provider** - Text-to-speech engines that give your agent its unique voice and personality
* **Transcriber** - Speech-to-text engines that convert customer speech into text for processing

***

## LLM Models

The Large Language Model (LLM) is the brain of your AI agent. It processes the conversation, makes decisions, and generates appropriate responses based on your system prompt.

### Available Models

#### OpenAI Models

OpenAI's flagship model with superior reasoning capabilities, multimodal understanding, and excellent performance across diverse tasks.

**Realtime Compatible:** Yes (use `gpt-4o-realtime-preview`)

Compact version of GPT-4o offering excellent balance between cost and performance.

**Realtime Compatible:** Yes (use `gpt-4o-mini-realtime-preview`)

**Recommended for most use cases.**

Advanced GPT-4 variant with improved performance and reliability.

**Recommended for most use cases.**

Optimized mini variant offering strong performance.

Ultra-lightweight model for simple, straightforward interactions.

#### Open Source Models

Large parameter Meta model offering versatile performance across various tasks.

Efficiency-optimized Llama 4 variant with extended context window.

Fast and affordable Llama 4 model for standard conversational tasks.

Lightweight, fast model for quick responses.

Google's Gemma model optimized for instruction-following and conversation.
#### Realtime Models

**Realtime models require OpenAI voice provider.** When selecting a realtime model, your voice provider will automatically switch to OpenAI.

Premium realtime model with ultra-low latency and natural conversation flow.

**Features:**

* Sub-200ms response time
* Natural interruptions and turn-taking
* Real-time voice streaming
* Advanced emotion detection

Cost-effective realtime model for most voice applications.

**Features:**

* Low latency responses
* Real-time voice streaming
* Natural conversation flow

### Model Parameters

#### Temperature

**Range:** 0.0 to 1.0

Temperature controls the randomness and creativity of the model's responses.

| Value         | Behavior                            |
| ------------- | ----------------------------------- |
| **0.0 - 0.3** | Deterministic, focused, predictable |
| **0.4 - 0.7** | Balanced, natural, consistent       |
| **0.8 - 1.0** | Creative, varied, spontaneous       |

***

## Voice Providers

Voice providers convert your agent's text responses into natural-sounding speech.

### Available Providers

Cartesia offers high-quality, low-latency voice synthesis with extensive multilingual support and emotion control.
### Features

* Ultra-low latency (\< 300ms)
* Extensive Indian language support
* Emotion and style control
* Multiple voice models (sonic, sonic-turbo, sonic-2, sonic-3)

### Available Voices

#### English (Indian Accent)

* **Janvi** - Slower, conversational female voice (customer support, hotel reception)
* **Kiara** - Versatile, engaging female voice (commercials, narrations, promos)
* **Aditi** - Slower female voice (commercials, narrations)
* **Devansh** - Friendly, neutral male voice (call center support)
* **Neil** - Clear and crisp male voice (customer support, sales, reception)
* **Indian Lady** - Young, rich, curious voice (narrator, fictional character)
* **Indian Man** - Smooth male voice (narrator)

#### Hindi & Hinglish

* **Apoorva** - Warm, friendly female (Hinglish sales, commercials)
* **Ananya** - Warm, friendly female (Hinglish sales)
* **Hinglish Speaking Woman** - Versatile bilingual voice
* **Ishan** - Conversational male (Hinglish sales, support)
* **Ayush** - Confident young male (Hindi demos, instructions)
* **Rupali** - Firm young female (Hindi natural conversation)
* **Aadhya** - Slower Hindi female conversational voice
* **Amit** - Calm, clear Hindi male (narration, conversation)
* **Indian Conversational Woman** - Warm feminine voice
* **Parvati** - Young, friendly female (customer support)
* **Mihir** - Deeper toned male (casual conversation, support)
* **Hindi Reporter Man** - Clear, authoritative (news, documentaries)
* **Hindi Narrator Man** - Warm, authoritative (audiobooks, documentaries)

#### Regional Languages

* **Prakash** (Kannada) - Instructor voice
* **Divya** (Kannada) - Joyful narrator
* **Suresh** (Marathi) - Instruction voice
* **Anika** (Marathi) - Enthusiastic seller
* **Vikram** (Telugu) - Folk narrator
* **Sindhu** (Telugu) - Conversational partner
* **Amit** (Gujarati) - Sports student
* **Isha** (Gujarati) - Learner voice

#### American Accent

* **Brooke** - Friendly, natural female
* **Wise Lady** - Authoritative narrator
* **Corinne** - Smooth, conversational (phone calls, support)
* **Cathy** - Enthusiastic coder
* **Friendly Sidekick** - Supportive male (games, videos)

### Voice Models

* **sonic-3** - Latest model, best quality
* **sonic-turbo** - Optimized for speed
* **sonic-2** - Stable, reliable
* **sonic** - Original model

### Voice Parameters

#### Speed (-1.0 to 1.0)

Adjusts speaking rate relative to normal.

### Emotion Control

Cartesia supports emotion control allowing you to adjust:

* Curiosity
* Positivity
* Surprise
* Anger
* Sadness

Premium voice synthesis with industry-leading quality and extensive multilingual support.

### Features

* Highest voice quality
* Extensive language support (32+ languages)
* Voice cloning capabilities
* Advanced voice customization

### Voice Parameters

#### Stability (0.0 - 1.0)

Controls consistency and predictability of voice generation.

#### Similarity Boost (0.0 - 1.0)

Enhances similarity to the original voice sample.

#### Style (0.0 - 1.0)

Controls style exaggeration and expressiveness.

#### Speed (0.5 - 2.0)

Adjusts speaking rate.

### Voice Models

* **eleven\_flash\_v2\_5** - Latest, fastest (32 languages)
* **eleven\_turbo\_v2\_5** - High quality, low latency (32 languages)
* **eleven\_turbo\_v2** - English-optimized, fast
* **eleven\_flash\_v2** - English-optimized, ultra-fast

### Supported Languages

* Hindi
* English
* Kannada
* Marathi
* Tamil
* Telugu
* Bengali
* Gujarati
* Malayalam

**Need another language?** Contact support to request additional language support.

Indian-focused voice provider with excellent support for Indian languages.
### Features

* Optimized for Indian accents
* 11 Indian languages
* Built for Indian market

### Available Voices

* **Anushka** - Clear and professional
* **Manisha** - Warm and friendly
* **Vidya** - Articulate and precise
* **Arya** - Young and energetic
* **Abhilash** - Deep and authoritative
* **Karun** - Natural and conversational
* **Hitesh** - Professional and engaging

### Voice Models

* **bulbul:v2** - Latest model with best quality

### Voice Parameters

#### Pitch (0.0 - 1.0)

Adjusts voice pitch.

#### Speed (0.5 - 2.0)

Adjusts speaking rate.

### Supported Languages

* Hindi (hi-IN)
* English (en-IN)
* Kannada (kn-IN)
* Marathi (mr-IN)
* Tamil (ta-IN)
* Telugu (te-IN)
* Bengali (bn-IN)
* Gujarati (gu-IN)
* Malayalam (ml-IN)

**Need another language?** Contact support to request additional language support.

Microsoft Azure's reliable Text-to-Speech service with extensive global language support.

### Features

* Wide language coverage
* Voice style support
* Enterprise-grade reliability

### Voice Parameters

#### Speed (-1.0 to 1.0)

Adjusts speaking rate relative to normal.

#### Pitch (-1.0 to 1.0)

Adjusts voice pitch.

#### Volume (0 - 100)

Controls voice volume level.

#### Voice Style

Available styles vary by voice. Common styles include:

* default
* cheerful
* calm
* empathetic
* newscast
* customerservice

#### Style Degree (0.01 - 2.0)

Controls intensity of the voice style.

OpenAI's built-in voices for realtime models only.

### Features

* Ultra-low latency
* Natural conversation flow
* Automatic emotion detection

### Available Voices

* **sage** - Balanced, professional
* **alloy** - Neutral, versatile
* **ash** - Clear, articulate
* **ballad** - Smooth, melodic
* **coral** - Warm, friendly
* **echo** - Deep, resonant
* **shimmer** - Light, energetic
* **verse** - Expressive, dynamic

**OpenAI voices are only available with realtime models** (`gpt-4o-realtime-preview` or `gpt-4o-mini-realtime-preview`).
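
The pairing rule between realtime models and OpenAI voices can be sketched as a small pre-save check. This is only an illustration: the function name `resolve_voice_provider` and the provider labels such as `"openai"` and `"cartesia"` are hypothetical and are not Callkaro API identifiers. Only the two realtime model names come from this page.

```python
# Realtime model names as listed on this page.
REALTIME_MODELS = {"gpt-4o-realtime-preview", "gpt-4o-mini-realtime-preview"}

# Hypothetical label for the OpenAI voice provider (an assumption, not an API value).
OPENAI_PROVIDER = "openai"


def resolve_voice_provider(model: str, voice_provider: str) -> str:
    """Return the voice provider that would be in effect for the given model.

    Mirrors the two documented rules:
    - realtime models always end up on the OpenAI voice provider;
    - OpenAI voices cannot be combined with a non-realtime model.
    """
    if model in REALTIME_MODELS:
        # The page states the provider switches to OpenAI automatically.
        return OPENAI_PROVIDER
    if voice_provider == OPENAI_PROVIDER:
        raise ValueError(
            f"OpenAI voices need a realtime model, but got model '{model}'"
        )
    return voice_provider
```

For example, `resolve_voice_provider("gpt-4o-realtime-preview", "cartesia")` returns `"openai"`, while pairing a non-realtime model with `"openai"` raises a `ValueError` instead of silently picking another provider.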
***

## Transcribers

Transcribers convert customer speech into text that the LLM can process.

### Available Providers

Industry-leading speech recognition with high accuracy and extensive model options.

### Features

* Highest accuracy for English
* Domain-specific models
* Multilingual support
* Keyword boosting

### Models

#### Nova Series

* **nova-3-general** - Latest general-purpose model, best overall accuracy
* **nova-3** - Advanced multi-language support
* **nova-3-medical** - Optimized for medical terminology
* **nova-2-general** - Stable, reliable general-purpose
* **nova-2-phonecall** - Optimized for phone call quality
* **nova-2-meeting** - Best for meetings and conference calls
* **nova-2-conversationalai** - Optimized for AI conversations
* **nova-2-medical** - Medical terminology support
* **nova-2-finance** - Financial terminology support
* **nova-2-voicemail** - Voicemail optimization
* **nova-2-drivethru** - Drive-through environments
* **nova-2-automotive** - In-vehicle environments

#### Other Models

* **flux-general** - Fast, lightweight option
* **voicemail** - Voicemail-specific

### Supported Languages

* Hindi
* English
* Kannada
* Marathi
* Tamil
* Telugu
* Bengali
* Gujarati
* Malayalam

**Need another language?** Contact support to request additional language support.

### Keyword Boosting

Deepgram supports **keyword boosting** to improve recognition of specific terms like company names, product names, and industry-specific jargon.

Ultra-fast transcription powered by Whisper models on specialized hardware.
### Features

* Extremely fast inference
* Whisper model quality
* Good multilingual support

### Models

* **whisper-large-v3-turbo** - Fastest, recommended for most use cases
* **whisper-large-v3** - Highest quality
* **distil-whisper-large-v3-en** - Optimized for English

### Supported Languages

* Hindi
* English
* Kannada
* Marathi
* Tamil
* Telugu
* Bengali
* Gujarati
* Malayalam

**Need another language?** Contact support to request additional language support.

Indian-language-focused transcription with excellent Hindi and regional language support.

### Features

* Best-in-class Hindi recognition
* 12 Indian languages
* Code-mixing support (Hinglish)
* Unknown language detection

### Models

* **saarika:v2.5** - Latest, highest accuracy (Recommended)
* **saaras:v2.5** - Alternative model variant
* **saarika:v2.0** - Stable version
* **saarika:v2** - Standard version
* **saarika:v1** - Legacy version
* **saarika:flash** - Fast inference

### Supported Languages

* Hindi (hi-IN)
* English (en-IN)
* Kannada (kn-IN)
* Marathi (mr-IN)
* Tamil (ta-IN)
* Telugu (te-IN)
* Bengali (bn-IN)
* Gujarati (gu-IN)
* Malayalam (ml-IN)

**Need another language?** Contact support to request additional language support.

Microsoft Azure Speech-to-Text with strong Indian language support.

### Features

* Multi-language detection
* Reliable accuracy
* Extensive Indian language support
* Enterprise-grade service

### Multi-Language Support

Azure transcriber supports **simultaneous multi-language detection**, allowing your agent to automatically detect and transcribe multiple languages within the same conversation.

### Supported Languages

* Hindi (hi-IN)
* English (en-IN)
* Kannada (kn-IN)
* Marathi (mr-IN)
* Tamil (ta-IN)
* Telugu (te-IN)
* Bengali (bn-IN)
* Gujarati (gu-IN)
* Malayalam (ml-IN)

**Need another language?** Contact support to request additional language support.

Eleven Labs' transcription service with multilingual support.
### Features

* Integrated with Eleven Labs voice
* Good multilingual support
* Decent Indian language coverage

### Models

* **scribe\_v2** - Latest transcription model

### Supported Languages

* Hindi
* English
* Kannada
* Marathi
* Tamil
* Telugu
* Bengali
* Gujarati
* Malayalam

**Need another language?** Contact support to request additional language support.
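
The numeric ranges given under Model Parameters and Voice Parameters on this page can be gathered into one lookup and checked before an agent is saved. The sketch below is an assumption-laden illustration: the provider keys (`"cartesia"`, `"elevenlabs"`, `"azure"`), the parameter key spellings, and the function `out_of_range` are hypothetical names, not Callkaro configuration fields. Only the numeric bounds are taken from the text, and only for providers the page names explicitly.

```python
from typing import Dict, List, Tuple

# Temperature range from the Model Parameters section.
TEMPERATURE_RANGE: Tuple[float, float] = (0.0, 1.0)

# Voice parameter ranges from the Voice Parameters sections.
# Keys are hypothetical labels chosen for this sketch.
VOICE_PARAM_RANGES: Dict[str, Dict[str, Tuple[float, float]]] = {
    "cartesia": {"speed": (-1.0, 1.0)},
    "elevenlabs": {
        "stability": (0.0, 1.0),
        "similarity_boost": (0.0, 1.0),
        "style": (0.0, 1.0),
        "speed": (0.5, 2.0),
    },
    "azure": {
        "speed": (-1.0, 1.0),
        "pitch": (-1.0, 1.0),
        "volume": (0.0, 100.0),
        "style_degree": (0.01, 2.0),
    },
}


def out_of_range(
    provider: str, params: Dict[str, float], temperature: float
) -> List[str]:
    """Return human-readable problems; an empty list means all values fit."""
    problems: List[str] = []

    lo, hi = TEMPERATURE_RANGE
    if not lo <= temperature <= hi:
        problems.append(f"temperature={temperature} outside [{lo}, {hi}]")

    ranges = VOICE_PARAM_RANGES.get(provider, {})
    for name, value in params.items():
        if name not in ranges:
            # Unknown parameter or provider: report it rather than guess a range.
            problems.append(f"{name}: no documented range for provider '{provider}'")
            continue
        lo, hi = ranges[name]
        if not lo <= value <= hi:
            problems.append(f"{name}={value} outside [{lo}, {hi}]")
    return problems
```

Returning a list of problems rather than raising on the first one lets a settings form show every out-of-range field at once. Other providers can be added to `VOICE_PARAM_RANGES` in the same shape.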